# <center>Big Data &ndash; Exercises &amp; Solutions</center>
## <center>Fall 2018 &ndash; Week 7 &ndash; ETH Zurich</center>

## Introduction
This exercise will cover XML and JSON well-formedness.

For the next four weeks you will be using [oXygen](https://www.oxygenxml.com/xml_editor/software_archive_editor.html), an XML/JSON development IDE. You should have received a license key for it by email. Before starting, make sure oXygen is installed and working on your computer.

**Note for macOS users:**

With the latest version of macOS, we noticed that simply clicking on the oXygen app icon after downloading the software does not work (the app crashes immediately). However, the software itself still runs. To launch it, follow this procedure:
1. Download the software from the link above.
2. Paste the license key file into the downloaded oXygen folder.
3. Open a terminal.
4. Change the current directory to the oXygen folder: `cd path/to/oxygen`
5. Run `sh oxygen.sh`

oXygen should then open.

## 1 XML
### 1.1 Well-formedness
Correct the following XML documents so that they are well-formed. Try to "parse" them in your head first, then use oXygen to check.


1.

```
<?xml version="1.0"?>
<catalog>
    <!-- Start book list --to de defined -->
   <Book id=`bk101`>
      <author>&cright; Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95€</price>
      <publish_date version='hard' version='soft'>2000-10-01</publish_date>
      <_description lang=en>An `in-depth look` at creating applications 
      with XML <for dummies>.</_description>
      <xml_parse>true</xml_parse>
   </book>
</>
```


**Solution**

Document 1 has the following problems:
1. attribute values must be enclosed in single (`'`) or double (`"`) quotes; backticks (\`) and typographic ("Word-style") quotes such as 〝 and 〞 are not allowed;
1. the `Book` start tag does not match the `book` end tag (names are case-sensitive);
1. the `catalog` element is not closed correctly (`</>` is not a valid end tag);
1. the entity `&cright;` is not predefined in XML. You have to declare it explicitly, e.g. in an internal DTD;
1. the `<` sign cannot appear in text content (here, `<for dummies>` would be read as the start of a tag); use the predefined entity `&lt;` instead. The `>` sign is technically allowed, but it is advisable to escape it as `&gt;` as well;
1. the attribute `version` appears twice on `publish_date`, which is forbidden;
1. comments `<!-- -->` cannot contain the string `--`;
1. the value of the `lang` attribute must be quoted;
1. names beginning with `xml` (in any combination of upper and lower case) are reserved by the W3C. Avoid them, except where the W3C specifies them (e.g. `xml:space`, `xml:lang`, `xmlns`...).

Here is the corrected document:

```xml
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE catalog [
<!ENTITY cright "&#169;">
]>
<catalog>
    <!-- Start book list - -to de defined -->
   <Book id='bk101'>
      <author>&cright; Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95€</price>
      <publish_date version='hard' version2='soft'>2000-10-01</publish_date>
      <_description lang='en'>An `in-depth look` at creating applications 
      with XML &lt;for dummies&gt;</_description>
      <parse>true</parse>
   </Book>
</catalog>
```
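To double-check a document outside oXygen, you can use any conforming XML parser. The sketch below assumes Python's standard-library `xml.etree.ElementTree`. It runs small fragments that each reproduce one error from the list above, and then a repaired fragment. `parse_error` is a hypothetical helper written for this example. A parser only enforces well-formedness rules, so a reserved name such as `xml_parse` is discouraged but still accepted.

```python
import xml.etree.ElementTree as ET

def parse_error(xml_text: str):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

# Each fragment reproduces one mistake from the list above.
BROKEN = [
    "<Book></book>",                         # names are case-sensitive
    "<a version='hard' version='soft'/>",    # duplicate attribute
    "<a lang=en/>",                          # unquoted attribute value
    "<a><!-- list --to be defined --></a>",  # '--' inside a comment
    "<a>&cright;</a>",                       # entity never declared
    "<a>XML <for dummies>.</a>",             # raw '<' starts a tag
]
for fragment in BROKEN:
    print(f"{fragment:40} -> {parse_error(fragment)}")

# Declaring the entity and escaping '<' and '>' repairs the last two cases.
FIXED = """<!DOCTYPE a [ <!ENTITY cright "&#169;"> ]>
<a lang='en'>&cright; XML &lt;for dummies&gt;.</a>"""
print(ET.fromstring(FIXED).text)  # © XML <for dummies>.

# Reserved names such as xml_parse are discouraged but not rejected.
print(parse_error("<xml_parse>true</xml_parse>"))  # None
```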


2.

```
<?xml version="1.0" encoding="utf-16"?>
<h:library xmlns:xdc="http://www.xml.com/books" xmlns:h="http://xml.com/library">
    <head><h:title>Book Review</title></head>
    <body/>
        <_xdc:bookreview>
            <xdc:title>XML: A Primer</xdc:title>
            <_table _style='container'>
                <h:tr align="#center">
                    <h:td>Author<h:span>St. Laurent & Tom Faron</h:td></h:span>
                </h:tr>
                <h:tr align="#left">
                    <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
                    <h:td><xdc:price>31.98</xdc:price></h:td>
                    <h:td><xdc:#pages>352</xdc:#pages></h:td>
                    <h:td><xdc:_date>1998/01</xdc:_date></h:td>
                    <h:td><xdc:-comment>Love it</xdc:-comment></h:td>
                </h:tr>
            </_table>
        </_xdc:bookreview>
    </body>
</h:library>
```

**Solution**

Document 2 has the following problems:
1. `<h:title>` opening tag does not match the closing tag `</title>`;
1. `<body/>` is an empty-element tag, so the later `</body>` end tag has no matching start tag;
1. in `<_xdc:bookreview>`, the namespace prefix `_xdc` is not declared;
1. the `<h:span>` element containing the author names must be closed before its parent `<h:td>`;
1. the `&` in the author text must be escaped as `&amp;`;
1. `<xdc:#pages>` is not a valid tag name (`#` is not allowed in XML names);
1. `<xdc:-comment>` is not a valid tag name (the local name after the prefix cannot start with `-`).

(In a very theoretical discussion, you could say that the document is well-formed according to the XML core specification without namespaces, even if problems 3 and 7 are not corrected. In that case, `_xdc:bookreview` would be an element name with a colon in it. The colon has no special meaning there and, in particular, does not refer to a non-existent namespace. Also, the hyphen in `xdc:-comment` would just be in the middle of the element name, where it is allowed. However, nobody *ever* uses XML without namespaces; that separation only exists for historical reasons.)

Here is the corrected document:


```xml
<?xml version="1.0" encoding="utf-16"?>
<h:library xmlns:xdc="http://www.xml.com/books" xmlns:h="http://xml.com/library">
    <head><h:title>Book Review</h:title></head>
    <body>
    <xdc:bookreview>
        <xdc:title>XML: A Primer</xdc:title>
        <_table _style='container'>
            <h:tr align="#center">
                <h:td>Author<h:span>St. Laurent &amp; Tom Faron</h:span></h:td>
            </h:tr>
            <h:tr align="#left">
                <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
                <h:td><xdc:price>31.98</xdc:price></h:td>
                <h:td><xdc:pages>352</xdc:pages></h:td>
                <h:td><xdc:_date>1998/01</xdc:_date></h:td>
                <h:td><xdc:comment>Love it</xdc:comment></h:td>
            </h:tr>
        </_table>
    </xdc:bookreview>
    </body>
</h:library>
```
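Namespace errors only show up with a namespace-aware parser. The sketch below assumes Python's `xml.etree.ElementTree` as a stand-in for oXygen, since that parser is namespace-aware. It rejects the undeclared `_xdc` prefix. It also reports every element of a shortened version of the corrected document as `{namespace-URI}local-name`. `parse_error` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

H = "http://xml.com/library"
XDC = "http://www.xml.com/books"

def parse_error(xml_text: str):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

# The prefix _xdc is never declared, so a namespace-aware parser rejects it.
error = parse_error(f'<h:library xmlns:h="{H}"><_xdc:bookreview/></h:library>')
print(error)  # unbound prefix: ...

# Shortened corrected document: tags come back as {namespace-URI}local-name;
# unprefixed names (no default namespace is declared) are in no namespace.
fixed = ET.fromstring(
    f'<h:library xmlns:xdc="{XDC}" xmlns:h="{H}">'
    '<head><h:title>Book Review</h:title></head>'
    '<body><xdc:bookreview><xdc:title>XML: A Primer</xdc:title></xdc:bookreview></body>'
    '</h:library>'
)
tags = [el.tag for el in fixed.iter()]
print("\n".join(tags))
```

The output makes the point of the parenthetical remark above concrete: `head` and `body` carry no namespace at all, while the prefixed elements are tied to their declared URIs.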

### 1.2 Create your own XML
1. Copy the text of the introduction above (from the title up to 'your computer') and paste it into oXygen as plain text. Create an XML document with the same content that also captures its formatting (title, sections, style, links, etc.). Make sure your XML is well-formed and save it as `doc1.xml`.

1. Copy the same text into Microsoft Word or OpenOffice and save it as XML (both programs can export to `.xml` via *Save as...*).

**Questions**
1. Compare the two XML documents. What differences do you notice?
1. Is this data structured, unstructured, or semi-structured?

**Solution**

There is no unique way to create an XML encoding the text above and its structure. This is one possible solution:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<document lang="en">
    <title font-size='15px' font-style='bold'>Introduction</title>
    <paragraph font-size='12px'>This exercise will cover XML and JSON well-formedness.</paragraph>
    <paragraph font-size='12px'>For the next four weeks, you will be using <link url="https://www.oxygenxml.com/xml_editor/software_archive_editor.html">oXygen</link>, an XML/JSON development IDE. You should have received a license key for it by email. Before starting, make sure oXygen is installed and working on your computer.</paragraph>
</document>
```

1. The differences depend on which program you used to export the document. The file exported by Word or OpenOffice probably contains much more information, including the font family, page layout, paragraph layout, etc.
1. This data is semi-structured: there is some structure, but not all of the content is structured as in a flat database.


### 1.3 XML Names
Which of the following are well-formed XML tags (i.e., which tags contain a valid XML name)? 
1. `<_bar/>`
1. `<123foo/>`
1. `<Foo/>`
1. `<foo_123/>`
1. `<foo#123/>`
1. `<foo-123/>`


**Solution**

1, 3, 4, 6 are valid names. Remember:
1. element names are case-sensitive.
1. element names must start with a letter or underscore.
1. element names starting with the letters xml (or XML, or Xml, etc.) are reserved and should not be used.
1. element names can contain letters, digits, hyphens, underscores, and periods.
1. element names cannot contain spaces.
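These rules can be approximated with a regular expression. The pattern below is a simplified, ASCII-only assumption: the real `Name` production of the XML specification also admits `:` and many non-ASCII letters. The helpers `is_xml_name` and `parser_accepts` are hypothetical. The cross-check with a real parser at the end confirms the verdicts for the six candidates.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of an XML name: a letter or underscore,
# followed by letters, digits, hyphens, underscores or periods.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_xml_name(name: str) -> bool:
    return NAME.fullmatch(name) is not None

def parser_accepts(name: str) -> bool:
    """Ask a real XML parser whether <name/> is a well-formed document."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

CANDIDATES = ["_bar", "123foo", "Foo", "foo_123", "foo#123", "foo-123"]
valid = [n for n in CANDIDATES if is_xml_name(n)]
print(valid)  # ['_bar', 'Foo', 'foo_123', 'foo-123']
assert all(is_xml_name(n) == parser_accepts(n) for n in CANDIDATES)
```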

### 1.4 Predefined entities
XML has only 5 predefined entities. Connect each escape code to the corresponding value.
1. `&lt;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     >
1. `&amp;`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;           "
1. `&gt;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     '
1. `&quot;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           &
1. `&apos;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           <

**Solution**
1. `&lt;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     <
1. `&amp;`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;           &
1. `&gt;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;     >
1. `&quot;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           "
1. `&apos;` &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           '
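The five predefined entities work without any declaration, and most XML libraries can produce them for you. The minimal sketch below assumes Python's standard library. `xml.sax.saxutils.escape` and `unescape` handle `&`, `<` and `>` by default and take an extra mapping for the two quote characters. The sample string is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# The five predefined entities need no DTD declaration: any parser expands them.
PREDEFINED = {"&lt;": "<", "&amp;": "&", "&gt;": ">", "&quot;": '"', "&apos;": "'"}
for ref, char in PREDEFINED.items():
    assert ET.fromstring(f"<a>{ref}</a>").text == char

# escape() replaces &, < and > by default; quotes must be requested explicitly.
text = "St. Laurent & Tom's \"XML\" <primer>"
escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
print(escaped)  # St. Laurent &amp; Tom&apos;s &quot;XML&quot; &lt;primer&gt;
assert unescape(escaped, {"&quot;": '"', "&apos;": "'"}) == text
```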



## 2. JSON

### 2.1 Well-formedness
Correct the following JSON document so that it is well-formed. Try to "parse" it in your head first, then use oXygen to check.

```
{
  "firstName": "John",
  "lastName": "Smith",
  "-isAlive": true,
  age: 25,
  "isRetired",
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100",
    'is verified' : "true"
  }
  'phoneNumbers': [
    {
      "type": [["home"]],
      "@number": "212 555-1234"
    },
    {
      "type": [["office"]],
      "@number": "646 555-4567"
    },
    {
      "type": [["mobile"[],
      "@number": "123 456-7890"
    }
  ],
  "children": [],
  "settings": {},
  "spouse": Null
}
```

**Solution**
1. The name `age` must be quoted.
1. `isRetired` must have a value.
1. The `address` object must be followed by a comma.
1. `is verified` and `phoneNumbers` must be enclosed in double quotes (JSON does not allow single quotes).
1. The nested array in the `type` member of the last element of `phoneNumbers` is unbalanced (`[["mobile"[]`).
1. `Null` is not a valid value: JSON literals are lowercase (`null`, `true`, `false`).

Using whitespace and non-ASCII characters in key names is allowed, although not recommended. Mixing proper boolean values with strings used as booleans (i.e., `"true"`) is also considered bad practice.

Corrected document:

```json
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 25,
  "isRetired": false,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100",
    "isVerified" : true
  },
  "phoneNumbers": [
    {
      "type": [["home"]],
      "@number": "212 555-1234"
    },
    {
      "type": [["office"]],
      "@number": "646 555-4567"
    },
    {
      "type": [["mobile"]],
      "@number": "123 456-7890"
    }
  ],
  "children": [],
  "settings": {},
  "spouse": null
}
```
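The same check can be done for JSON with a parser. The sketch below assumes Python's `json` module. `json.loads` is slightly more lenient than the JSON grammar (it accepts `NaN` and `Infinity` by default), so treat it as a convenient check rather than a strict validator. Each fragment is a reduced, hypothetical example of one mistake from the list above.

```python
import json

def parse_error(text: str):
    """Return json's error message, or None if the text parses."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return str(err)

# One fragment per mistake in the list above.
BROKEN = [
    '{age: 25}',                           # unquoted name
    '{"isRetired", "age": 25}',            # name without a value
    '{"address": {} "phoneNumbers": []}',  # missing comma between members
    "{'phoneNumbers': []}",                # single quotes
    '{"type": [["mobile"[]}',              # unbalanced brackets
    '{"spouse": Null}',                    # literals are lowercase
]
for fragment in BROKEN:
    print(f"{fragment:38} -> {parse_error(fragment)}")

# Unusual but legal: whitespace and symbols in names, a string "true".
LEGAL = '{"is verified": "true", "-isAlive": true, "@number": "212 555-1234", "spouse": null}'
print(json.loads(LEGAL))
```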

## 3 Conversions from a relational database

Messages from conversations between users are stored in a SQL table. Translate this table into XML and JSON.

|conversation_id | people | sender | content | timestamp | is_read | attachment_id|
|----------------|--------|--------|---------|-----------|---------|--------------|
|42|charlie,ari,jesse|charlie|hey, here's the doc ><|1510410193|TRUE|NULL|
|42|charlie,ari,jesse|charlie|NULL|1510410244|TRUE|doc_6492|
|42|charlie,ari,jesse|ari|thanks! \o/|1510432987|FALSE|NULL|
|17|rudy,sage|rudy|look at this cute "bat-cat"! 😻|1500897189|TRUE|img_91847|
|17|rudy,sage|NULL|aww ♥|1506610190|TRUE|NULL|
    
    
**Solution**

There are, of course, many possible solutions.

JSON:

```json
[
    {
        "conversation_id": 42,
        "people": ["charlie", "ari", "jesse"],
        "messages": [
            {
                "sender": "charlie",
                "content": "hey, here's the doc ><",
                "timestamp": 1510410193,
                "is_read": true
            },
            {
                "sender": "charlie",
                "timestamp": 1510410244,
                "is_read": true,
                "attachment_id": "doc_6492"
            },
            {
                "sender": "ari",
                "content": "thanks! \\o/",
                "timestamp": 1510432987,
                "is_read": false
            }
        ]
    },
    {
        "conversation_id": 17,
        "people": ["rudy", "sage"],
        "messages": [
            {
                "sender": "rudy",
                "content": "look at this cute \"bat-cat\"! 😻",
                "timestamp": 1500897189,
                "is_read": true,
                "attachment_id": "img_91847"
            },
            {
                "content": "aww ♥",
                "timestamp": 1506610190,
                "is_read": true
            }
        ]
    }
]

```

XML:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<conversations>
    <conversation id="42">
		<people>charlie</people>
		<people>ari</people>
		<people>jesse</people>
		<message>
			<sender>charlie</sender>
			<content>hey, here's the doc &gt;&lt;</content>
			<timestamp>1510410193</timestamp>
			<is_read>true</is_read>
		</message>
		<message>
			<sender>charlie</sender>
			<timestamp>1510410244</timestamp>
			<is_read>true</is_read>
			<attachment_id>doc_6492</attachment_id>
		</message>
		<message>
			<sender>ari</sender>
			<content>thanks! \o/</content>
			<timestamp>1510432987</timestamp>
			<is_read>false</is_read>
		</message>
    </conversation>
    <conversation id="17">
		<people>rudy</people>
		<people>sage</people>
		<message>
			<sender>rudy</sender>
			<content>look at this cute &quot;bat-cat&quot;! 😻</content>
			<timestamp>1500897189</timestamp>
			<is_read>true</is_read>
			<attachment_id>img_91847</attachment_id>
		</message>
		<message>
			<content>aww ♥</content>
			<timestamp>1506610190</timestamp>
			<is_read>true</is_read>
		</message>
    </conversation>
</conversations>
```

It is important not to emit tags for NULL values. Otherwise, the value is interpreted as the empty string (an empty `<content/>` element contains `""`) rather than as a missing value. In practice, one could define a schema specifying the type of each value, convert timestamps into `xs:dateTime`, etc.
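The translation follows a simple recipe:

1. Group rows by `conversation_id`.
2. Split `people` into a list.
3. Drop every NULL field instead of emitting it.

The sketch below applies that recipe in Python. The row tuples (with `None` standing for SQL `NULL`) and the helpers `to_conversations` and `to_xml` are assumptions for illustration, not part of any database API. ElementTree escapes `<`, `>` and `&` in text automatically.

```python
import json
import xml.etree.ElementTree as ET

# Rows of the SQL table above; None stands for SQL NULL.
ROWS = [
    (42, "charlie,ari,jesse", "charlie", "hey, here's the doc ><", 1510410193, True, None),
    (42, "charlie,ari,jesse", "charlie", None, 1510410244, True, "doc_6492"),
    (42, "charlie,ari,jesse", "ari", "thanks! \\o/", 1510432987, False, None),
    (17, "rudy,sage", "rudy", 'look at this cute "bat-cat"! 😻', 1500897189, True, "img_91847"),
    (17, "rudy,sage", None, "aww ♥", 1506610190, True, None),
]
MESSAGE_FIELDS = ("sender", "content", "timestamp", "is_read", "attachment_id")

def to_conversations(rows):
    """Group rows by conversation_id (first-seen order), dropping NULL fields."""
    conversations = {}
    for conv_id, people, *values in rows:
        conv = conversations.setdefault(conv_id, {
            "conversation_id": conv_id,
            "people": people.split(","),
            "messages": [],
        })
        conv["messages"].append(
            {k: v for k, v in zip(MESSAGE_FIELDS, values) if v is not None})
    return list(conversations.values())

def to_xml(conversations):
    """Build the XML layout above; ElementTree escapes <, > and & in text."""
    root = ET.Element("conversations")
    for conv in conversations:
        c = ET.SubElement(root, "conversation", id=str(conv["conversation_id"]))
        for person in conv["people"]:
            ET.SubElement(c, "people").text = person
        for msg in conv["messages"]:
            m = ET.SubElement(c, "message")
            for key, value in msg.items():
                # write booleans as lowercase true/false, like JSON
                ET.SubElement(m, key).text = str(value).lower() if isinstance(value, bool) else str(value)
    return ET.tostring(root, encoding="unicode")

data = to_conversations(ROWS)
print(json.dumps(data, indent=4, ensure_ascii=False))
print(to_xml(data))
```

Dropping `None` values in one place (`to_conversations`) keeps both outputs consistent: a field that is NULL in the table is simply absent from the JSON object and from the XML `message` element.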

## 4 More XML
### 4.1 HTML vs XHTML
Is the following correct HTML? Is it correct XML? XHTML?

```html
<html>
  <head>
    <title>Untitled</title>
  </head>
  Dear jane <br>
  <p>You are invited at the weekly meeting
  <p>Yours sincerely, <br>
  John
</html>
```

**Solution**

This will be displayed correctly by most browsers. However, it is not well-formed XML: the `br` and `p` elements are not closed. The following would be well-formed XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html>
    <head>
        <title>Untitled</title>
    </head>
    <body>
        Dear jane <br/>
        <p>You are invited at the weekly meeting</p>
        <p>Yours sincerely, <br/>
            John</p>
    </body>
</html>
```

But XHTML is more than just XML: a document also has to follow a certain structure, defined by the XHTML DTD (a document that does is called "valid"). Among other things, the elements have to be in the XHTML namespace:

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Untitled</title>
    </head>
    <body>
        <div>Dear jane 
            <p>You are invited at the weekly meeting</p>
            <p>Yours sincerely, <br/>
                John</p>
        </div>
    </body>
</html>
```

As to whether this is correct HTML, we first have to pick an HTML version. With HTML5, you would need to add `<!DOCTYPE html>` at the beginning of the file, and most validators will complain if you don't add a charset declaration (like `<meta charset="utf-8">`). Then it becomes a valid file: closing the `<p>` tags is optional in HTML5, and `<br>` is a void element that has no end tag. It is good practice to close the `<p>` tags, though, for consistency and predictability when using CSS.
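You can observe the difference between "accepted by a browser", "well-formed XML" and "in the XHTML namespace" with any XML parser. The sketch below assumes Python's `xml.etree.ElementTree` and uses condensed versions of the documents above for brevity. It only checks well-formedness and namespaces, not validity against the XHTML DTD, and it says nothing about HTML5 conformance.

```python
import xml.etree.ElementTree as ET

# Condensed versions of the three documents above.
ORIGINAL_HTML = """<html><head><title>Untitled</title></head>
  Dear jane <br>
  <p>You are invited at the weekly meeting
  <p>Yours sincerely, <br>
  John
</html>"""
WELL_FORMED = """<html><head><title>Untitled</title></head>
<body>Dear jane <br/><p>You are invited at the weekly meeting</p>
<p>Yours sincerely, <br/> John</p></body></html>"""
XHTML = WELL_FORMED.replace("<html>", '<html xmlns="http://www.w3.org/1999/xhtml">', 1)

try:
    ET.fromstring(ORIGINAL_HTML)
    html_is_xml = True
except ET.ParseError as err:
    html_is_xml = False
    print("HTML version:", err)             # caused by the unclosed <br> and <p>

print(ET.fromstring(WELL_FORMED).tag)       # html: no namespace, so not XHTML
print(ET.fromstring(XHTML).tag)             # {http://www.w3.org/1999/xhtml}html
```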

### 4.2 XML Namespaces

1. Is the following XML file well-formed?
1. What are the namespaces of each attribute and each element?
1. What's wrong with this file? Fix it so it is well-formed, follows best practices, and each element uses the correct namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<foo
xmlns="http://xmlrepo.test/foo.xml"
xmlns:foo="http://xmlrepo.test/foo.xml"
xmlns:math="http://xmlrepo.test/math.xml">
    <bar:baz xmlns:bar="http://xmlrepo.test/bar.xml" bar:attr="some attribute" lalala="some other attribute">
        <svg xmlns="http://xmlrepo.test/svg.xml">
            <textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <math:msub>17</math:msub>
            </textbox>
            <foo_value id="748">some value</foo_value>
        </svg>
        <svg xmlns:svg="http://xmlrepo.test/svg.xml">
            <svg:textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <msub>17</msub>
            </svg:textbox>
            <bar_value id="867">some other value</bar_value>
        </svg>
        <math:othermath/>
    </bar:baz>
</foo>

```

**Solution**

1. The document is weird and is full of "obvious" mistakes (see answer to question 3), but it is technically well-formed.
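2. The root element `foo` is in namespace `foo`, and `bar:baz` is in namespace `bar`. In the first `<svg>` element, the `svg` element itself, `textbox` and `foo_value` are in namespace `svg`, because that element redeclares the default namespace. In the second, the `svg` element itself, `msub` and `bar_value` are in namespace `foo`, because `xmlns:svg` only binds the `svg` prefix and leaves the default namespace (`foo`, declared on the root) unchanged. Elements prefixed with `math:` or `foo:` are in the corresponding namespaces. Prefixed attributes are also in the corresponding namespaces. Unprefixed attributes (`lalala` and `id`) are in no namespace.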
3. Let's declare everything in the root, not use any default namespace at all (although leaving `foo` as the default namespace could be reasonable), and prefix everything.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<foo:foo
xmlns:foo="http://xmlrepo.test/foo.xml"
xmlns:bar="http://xmlrepo.test/bar.xml"
xmlns:svg="http://xmlrepo.test/svg.xml"
xmlns:math="http://xmlrepo.test/math.xml">
    <bar:baz bar:attr="some attribute" bar:bar="some other attribute">
        <svg:svg>
            <svg:textbox svg:svg-style="dotted">
                <math:msup>42</math:msup>
                <foo:plus/>
                <math:msub>17</math:msub>
            </svg:textbox>
            <svg:textbox/>
            <foo:foo_value id="748">some value</foo:foo_value>
        </svg:svg>
        <svg:svg>
            <svg:textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <math:msub>17</math:msub>
            </svg:textbox>
            <bar:bar_value id="867">some other value</bar:bar_value>
        </svg:svg>
        <math:formula/>
    </bar:baz>
</foo:foo>
```
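You can check the answer to question 2 mechanically by letting a namespace-aware parser resolve every name in the question's document. The sketch below assumes Python's `xml.etree.ElementTree`, which writes expanded names as `{namespace-URI}local-name`. A name without braces is in no namespace, and `xmlns` declarations do not appear among the attributes. The XML declaration is omitted from the string because it does not change the tree.

```python
import xml.etree.ElementTree as ET

# The question's document (XML declaration omitted).
DOC = """<foo
xmlns="http://xmlrepo.test/foo.xml"
xmlns:foo="http://xmlrepo.test/foo.xml"
xmlns:math="http://xmlrepo.test/math.xml">
    <bar:baz xmlns:bar="http://xmlrepo.test/bar.xml" bar:attr="some attribute" lalala="some other attribute">
        <svg xmlns="http://xmlrepo.test/svg.xml">
            <textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <math:msub>17</math:msub>
            </textbox>
            <foo_value id="748">some value</foo_value>
        </svg>
        <svg xmlns:svg="http://xmlrepo.test/svg.xml">
            <svg:textbox>
                <math:msup>42</math:msup>
                <foo:plus/>
                <msub>17</msub>
            </svg:textbox>
            <bar_value id="867">some other value</bar_value>
        </svg>
        <math:othermath/>
    </bar:baz>
</foo>"""

root = ET.fromstring(DOC)
# Print each element's expanded name and its (expanded) attribute names.
for el in root.iter():
    print(el.tag, el.attrib or "")
```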



### 4.3 General entities

Suppose we have a DTD containing the following, stored as a local file named `example.dtd` and accessible online at address `http://dtd.example/example.dtd`:

```dtd
<!ENTITY tp "@TedOnPrivacy">
<!ENTITY tw "twitter">
```

Now, consider the following "base" XML document:

```xml
<?xml version="1.0"?>
<document orig="twitter">
    <author>&lt;@TedOnPrivacy&gt;</author>
</document>
```

Determine which of the following XML documents are equivalent to the base document. Fix every incorrect example. You can find all answers in the [official XML specification](https://www.w3.org/TR/xml/).

1.

```xml
<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY tp "@TedOnPrivacy">
  <!ENTITY tw "twitter">
]>
<document orig="&tw;">
    <author>&lt;&tp;&gt;</author>
</document>
```

**Solution**: Strictly speaking, this would be well-formed and equivalent to the reference document. 
Note: even though oXygen accepts this DTD, there is a validity constraint ("Root Element Type") stating that the name in the DOCTYPE declaration must match the root element, here `document`. If we used this DTD to validate the document (after adding the corresponding element declarations to it), a validating parser would reject the document, because the DOCTYPE declaration names `author` instead.

2.

```xml
<?xml version="1.0"?>
<!DOCTYPE document [
  <!ENTITY at "@">
  <!ENTITY tp "&at;TedOnPrivacy">
  <!ENTITY recursive "&recursive;">
  <!ENTITY tp "@torproject">
]>
<document orig="twitter">
    <author>&lt;&tp;&gt;</author>
</document>
```

**Solution**: This is correct, although it will probably issue warnings. The
parser does not expand `recursive`, since it is never referenced in the document.
Declaring two entities with the same name is discouraged, but not forbidden; the first
declaration is binding, and referencing entities inside other entities is allowed.

3.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE document SYSTEM "http://dtd.example/example.dtd">
<document orig="&tw;">
    <author>&lt;&tp;&gt;</author>
</document>
```

**Solution**: This is incorrect. The `standalone="yes"` pseudo-attribute declares that the document does not depend on external markup declarations. The entities `tw` and `tp`, which are declared only in the external DTD, therefore count as undeclared. Changing it to `standalone="no"`, or removing it completely (the default value is `"no"`), would fix it. Note that the system identifier after the SYSTEM keyword can be either a URL or a local path.

4.

```xml
<?xml version="1.0"?>
<!DOCTYPE document PUBLIC "-//dtd.example/en/example.dtd" "sample.dtd">
<document orig="&tw;">
    <author>&lt;&tp;&gt;</author>
</document>
```

**Solution**: This is incorrect. The PUBLIC keyword tells the parser to first look up the public identifier `-//dtd.example/en/example.dtd` in its catalog, to check whether it is a known public ID mapped to an existing DTD. Here this first step fails, so the parser falls back to the system identifier and tries to load the local file `sample.dtd`, which also fails. However, with `<!DOCTYPE document PUBLIC "-//dtd.example/en/example.dtd" "example.dtd">`, it would be equivalent to the reference document.

5.

```xml
<?xml version="1.0"?>
<!DOCTYPE document [
	<!ENTITY example SYSTEM "example.dtd">
	&example;
]>
<document orig="&tw;">
    <author>&lt;&tp;&gt;</author>
</document>
```

**Solution**: This is incorrect: for an entity to be referenced inside the DTD itself, it has to be declared as a parameter entity and referenced using `%`. To fix it, use `<!ENTITY % example SYSTEM "example.dtd">`, then `%example;`.

6.

```xml
<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY tw "twitter">
]>
<document orig="&tw;">
    <author><![CDATA[
        <@TedOnPrivacy>
    ]]></author>
</document>
```

**Solution**: This is well-formed XML, but not equivalent to the base document: the whitespace
inside the CDATA section becomes part of the `author` text.

7.

Assuming `ted.txt` is a text file containing `<@TedOnPrivacy>`:


```xml
<?xml version="1.0"?>
<!DOCTYPE document [
	<!ENTITY text SYSTEM "ted.txt" NDATA text>
]>
<document orig="twitter">
    <author>&text;</author>
</document>
```

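**Solution**: This will fail. `NDATA` declares an *unparsed* entity, and the XML specification forbids referencing an unparsed entity in content (well-formedness constraint "Parsed Entity"). Unparsed entities can only be named in attribute values of type `ENTITY` or `ENTITIES`. In addition, the notation `text` is not declared by any `<!NOTATION ...>` declaration, which is a validity error, so adding one would not make `&text;` work either. To include the file's content, `text` would have to be a parsed external entity, `<!ENTITY text SYSTEM "ted.txt">`. Since its replacement text is then parsed as XML, `ted.txt` would have to contain `&lt;@TedOnPrivacy&gt;` rather than the raw `<@TedOnPrivacy>`.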
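One way to check the internal-subset variants is to parse them and compare what an application would actually see. The sketch below assumes Python's `xml.etree.ElementTree`, which expands entities declared in the internal subset but does not fetch external DTDs, so it only covers items 1, 2 and 6. `summary` is a hypothetical helper that looks only at the root name, its attributes and the author text. That is a simplification of full document equivalence. It does not check validity either, which is why the DOCTYPE name mismatch in items 1 and 6 goes unnoticed.

```python
import xml.etree.ElementTree as ET

BASE = """<?xml version="1.0"?>
<document orig="twitter">
    <author>&lt;@TedOnPrivacy&gt;</author>
</document>"""

ITEM_1 = """<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY tp "@TedOnPrivacy">
  <!ENTITY tw "twitter">
]>
<document orig="&tw;">
    <author>&lt;&tp;&gt;</author>
</document>"""

ITEM_2 = """<?xml version="1.0"?>
<!DOCTYPE document [
  <!ENTITY at "@">
  <!ENTITY tp "&at;TedOnPrivacy">
  <!ENTITY recursive "&recursive;">
  <!ENTITY tp "@torproject">
]>
<document orig="twitter">
    <author>&lt;&tp;&gt;</author>
</document>"""

ITEM_6 = """<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY tw "twitter">
]>
<document orig="&tw;">
    <author><![CDATA[
        <@TedOnPrivacy>
    ]]></author>
</document>"""

def summary(xml_text):
    """Root name, root attributes and author text after entity expansion."""
    root = ET.fromstring(xml_text)
    return root.tag, root.attrib, root.find("author").text

for name, doc in [("1", ITEM_1), ("2", ITEM_2), ("6", ITEM_6)]:
    print(name, summary(doc) == summary(BASE), repr(summary(doc)[2]))
```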





## 5. From XML to JSON - back to the REST API request result from previous exercise sessions.
In this exercise, you are asked to translate the following XML document into a JSON document. 
Remember the Postman request we made during the tutorial about Azure Blob Storage: the result was an XML file. Below you can find the result of that request (with some elements removed for simplicity and a second, fake blob added to the response). Now that you can parse it, transform it into a JSON file. 
```xml 
<EnumerationResults ContainerName="https://melaniestorage.blob.core.windows.net/exercise02">
    <Blobs>
        <Blob>
            <Name>picture</Name>
            <Url>https://melaniestorage.blob.core.windows.net/exercise02/picture</Url>
            <Properties>
                <Last-Modified>Wed, 03 Oct 2018 07:22:16 GMT</Last-Modified>
                <Content-Length>136356</Content-Length>
                <Content-Encoding />
                <BlobType>BlockBlob</BlobType>
            </Properties>
        </Blob>
        <Blob>
            <Name>music</Name>
            <Url>https://melaniestorage.blob.core.windows.net/exercise02/music</Url>
            <Properties>
                <Last-Modified>Wed, 03 Oct 2018 07:23:16 GMT</Last-Modified>
                <Content-Length>222222</Content-Length>
                <Content-Encoding />
                <BlobType>BlockBlob</BlobType>
            </Properties>
        </Blob>
    </Blobs>
</EnumerationResults>
```
**Solution**
```json
{"EnumerationResults": 
    {"ContainerName": "https://melaniestorage.blob.core.windows.net/exercise02", 
            "Blobs": 
            [{"Blob": {"Name": "picture", 
                       "Url": "https://melaniestorage.blob.core.windows.net/exercise02/picture", 
                       "Properties": 
                                {"Last-Modified": "Wed, 03 Oct 2018 07:22:16 GMT", 
                                "Content-Length": 136356, 
                                "Content-Encoding": null, 
                                "BlobType": "BlockBlob"}
                        }
                },
                {"Blob": {"Name": "music", 
                          "Url": "https://melaniestorage.blob.core.windows.net/exercise02/music", 
                          "Properties": 
                                {"Last-Modified": "Wed, 03 Oct 2018 07:23:16 GMT", 
                                "Content-Length": 222222, 
                                "Content-Encoding": null, 
                                "BlobType": "BlockBlob"}
                         }
                 }]
    }
} 
```
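The conversion can also be automated. There is no single standard XML-to-JSON mapping, so the sketch below assumes one possible set of rules:

- attributes become ordinary keys;
- empty elements become `null`;
- digit-only text becomes an integer;
- an element whose children all share one name becomes an array of `{name: value}` objects.

With these rules, it reproduces the solution above. `convert` is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET

XML = """<EnumerationResults ContainerName="https://melaniestorage.blob.core.windows.net/exercise02">
    <Blobs>
        <Blob>
            <Name>picture</Name>
            <Url>https://melaniestorage.blob.core.windows.net/exercise02/picture</Url>
            <Properties>
                <Last-Modified>Wed, 03 Oct 2018 07:22:16 GMT</Last-Modified>
                <Content-Length>136356</Content-Length>
                <Content-Encoding />
                <BlobType>BlockBlob</BlobType>
            </Properties>
        </Blob>
        <Blob>
            <Name>music</Name>
            <Url>https://melaniestorage.blob.core.windows.net/exercise02/music</Url>
            <Properties>
                <Last-Modified>Wed, 03 Oct 2018 07:23:16 GMT</Last-Modified>
                <Content-Length>222222</Content-Length>
                <Content-Encoding />
                <BlobType>BlockBlob</BlobType>
            </Properties>
        </Blob>
    </Blobs>
</EnumerationResults>"""

def convert(el):
    """Map an element to a JSON value using the (hypothetical) rules above."""
    children = list(el)
    if not children and not el.attrib:
        text = (el.text or "").strip()
        if not text:
            return None                       # e.g. <Content-Encoding />
        return int(text) if text.isdigit() else text
    tags = [child.tag for child in children]
    if not el.attrib and len(tags) > 1 and len(set(tags)) == 1:
        # repeated siblings such as <Blob> become an array
        return [{child.tag: convert(child)} for child in children]
    obj = dict(el.attrib)                     # attributes become ordinary keys
    for child in children:
        obj[child.tag] = convert(child)       # text mixed with children is ignored
    return obj

root = ET.fromstring(XML)
result = {root.tag: convert(root)}
print(json.dumps(result, indent=4))
```

A known weakness of such rule-based mappings is that their output shape depends on the data. If the listing contained a single `<Blob>`, these rules would produce an object instead of a one-element array, so a consumer could not rely on the shape of `Blobs`. Resolving this requires schema knowledge, or a fixed rule such as "`Blobs` is always an array".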

## 6. XML vs CSV - the limits of tables for heterogeneous data
If your document consists of a collection of heterogeneous objects with different attributes, XML or JSON turns out to be better suited than a comma-separated values (CSV) format to store the data. Indeed, in this setting denormalization is a good idea. This is what we want to show in this exercise. You are given the following XML document representing the catalog of an online shop selling all kinds of products. In this catalog, each product has different attributes. You are asked to turn this data into a CSV file.
```xml
<productscatalog>
    <product>
        <id> 1 </id>
        <category> BBQ </category>
        <type> Gaz </type>
        <height> 120cm </height>
    </product>
    <product>
        <id> 2 </id>
        <category> notebook </category>
        <brand> Apple </brand>
        <specs>
            <RAM> 16Go </RAM>
            <storage> 128Go </storage>
        </specs>
    </product>
    <product>
        <id> 3 </id>
        <category> shoes </category>
        <size> 39 </size>
        <model> Heels </model>
    </product>
</productscatalog>
```

### SOLUTION
**1. Write the documents in a CSV format (i.e. in a table).**
```
id,category,type,height,brand,specs:RAM,specs:storage,size,model
1,BBQ,Gaz,120cm,,,,,
2,notebook,,,Apple,16Go,128Go,,
3,shoes,,,,,,39,Heels
```

This solution is not unique. 

You could for example also store it in the following way:
```
id,AttributeName,AttributeValue
1,category,BBQ
1,type,Gaz
1,height,120cm
2,category,notebook
2,brand,Apple
2,specs:RAM,16Go
2,specs:storage,128Go
3,category,shoes
3,size,39
3,model,Heels
```
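Both CSV layouts can be derived mechanically from the XML. The sketch below uses only the Python standard library. It flattens each product, naming nested fields `parent:child` as in the tables above. It then writes the wide table, with one column per attribute name in first-seen order, followed by the long attribute-value table. The `flatten` helper and the column-naming convention are assumptions chosen to match the solution.

```python
import csv
import io
import xml.etree.ElementTree as ET

CATALOG = """<productscatalog>
    <product>
        <id> 1 </id> <category> BBQ </category> <type> Gaz </type> <height> 120cm </height>
    </product>
    <product>
        <id> 2 </id> <category> notebook </category> <brand> Apple </brand>
        <specs> <RAM> 16Go </RAM> <storage> 128Go </storage> </specs>
    </product>
    <product>
        <id> 3 </id> <category> shoes </category> <size> 39 </size> <model> Heels </model>
    </product>
</productscatalog>"""

def flatten(el, prefix=""):
    """Yield (name, value) pairs; nested fields are named like 'specs:RAM'."""
    for child in el:
        name = prefix + child.tag
        if len(child):                        # has sub-elements: recurse
            yield from flatten(child, name + ":")
        else:
            yield name, (child.text or "").strip()

products = [dict(flatten(p)) for p in ET.fromstring(CATALOG)]

# Wide layout: one column per attribute name, most cells empty.
columns = []
for product in products:
    columns += [name for name in product if name not in columns]
wide = io.StringIO()
writer = csv.DictWriter(wide, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(products)
print(wide.getvalue())

# Long (attribute-value) layout: one row per product attribute.
eav = io.StringIO()
rows = csv.writer(eav, lineterminator="\n")
rows.writerow(["id", "AttributeName", "AttributeValue"])
for product in products:
    for name, value in product.items():
        if name != "id":
            rows.writerow([product["id"], name, value])
print(eav.getvalue())
```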


**2. What are the disadvantages of the CSV format compared to the XML format in this case?**

For the first solution:
We have different attributes for each category of products, so most of the cells in the table are empty. The resulting table is extremely sparse and hard for humans to read. 

For the second solution: 
It is not convenient to read with several lines for the same product. You have to store the id multiple times. And you need to make sure the table is sorted by id if you want to see all the attributes for one product as a group.

Also if we have a lot of nested attributes it can be cumbersome to put them in the table (there are way more extreme examples available). 


**3. Give an example of a use case where the CSV format would be more appropriate than the XML format.**

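If all the rows have the same fixed set of attributes and there is no nesting, it is more natural to describe the data as a table. One example is a log of sensor readings in which every row has a timestamp, a sensor id and a value.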