# PPY lecture #11, May 2 2023

* Exporting data from Python I — XML and JSON
* Course credits (zápočty)
* Q & A...

# XML

From [Wikipedia](https://en.wikipedia.org/wiki/XML):

> Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data

Python's standard library includes the `xml` package, which provides functions and classes to parse, create, and modify XML data. Its `xml.etree.ElementTree` module is the usual starting point.

1. Creating XML data: use the `xml.etree.ElementTree` module to build a tree of elements and write it to a file. Here's an example:

In [1]:
import xml.etree.ElementTree as ET

In [2]:
root = ET.Element('root')

child1 = ET.SubElement(root, 'child1')
child1.text = 'This is child 1'

child2 = ET.SubElement(root, 'child2', attrib={'name': 'test'})
child2.text = 'This is child 2'

tree = ET.ElementTree(root)
tree.write('output.xml')
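
To check what gets written without opening the file, the same tree can be serialized to a string. This is a minimal sketch that rebuilds the tree from the example above and assumes Python 3.9 or newer, where `ET.indent()` is available:

```python
import xml.etree.ElementTree as ET

# Rebuild the same small tree as in the example above
root = ET.Element('root')
child1 = ET.SubElement(root, 'child1')
child1.text = 'This is child 1'
child2 = ET.SubElement(root, 'child2', attrib={'name': 'test'})
child2.text = 'This is child 2'

# Add line breaks and indentation in place (Python 3.9+)
ET.indent(root, space='  ')

# Serialize to a str instead of writing a file
xml_str = ET.tostring(root, encoding='unicode')
print(xml_str)
```

Without the `ET.indent()` call, the whole document ends up on a single line, which is valid XML but harder to read.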

2. Modifying XML data: parse the file, change elements with methods such as [iter()](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.iter) and `set()`, then write the tree back. Here's an example:

In [3]:
!cp output.xml input.xml

In [4]:
tree = ET.parse('input.xml')
root = tree.getroot()

# Modify an attribute of an element
for elem in root.iter('child1'):
    elem.set('new_attribute', 'new_value')

tree.write('output.xml')
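
Attributes are not the only thing that can change. The sketch below also edits text, removes an element, and appends a new one. It parses from a string so that it does not depend on any file, and the `child3` element is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Parse from a string so the sketch is independent of input.xml
root = ET.fromstring(
    '<root><child1>This is child 1</child1>'
    '<child2 name="test">This is child 2</child2></root>'
)

# Change the text of the first matching element
child1 = root.find('child1')
child1.text = 'Updated text'

# Remove an element; remove() must be called on the direct parent
child2 = root.find('child2')
root.remove(child2)

# Append a new element (hypothetical name, for illustration only)
child3 = ET.SubElement(root, 'child3')
child3.text = 'This is child 3'

result = ET.tostring(root, encoding='unicode')
print(result)
```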

3. Parsing XML data: `ET.parse()` reads a file into a tree, and `getroot()` returns its top-level element, which you can iterate over. Here's an example:

In [5]:
tree = ET.parse('output.xml')
root = tree.getroot()

# Iterate over child elements of root
for child in root:
    print(child.tag, child.attrib)

child1 {'new_attribute': 'new_value'}
child2 {'name': 'test'}
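
Besides iterating over all children, individual elements can be looked up by tag. A short sketch, assuming XML content shaped like the `output.xml` produced above (given here as a string so the example runs on its own):

```python
import xml.etree.ElementTree as ET

xml_data = (
    '<root>'
    '<child1 new_attribute="new_value">This is child 1</child1>'
    '<child2 name="test">This is child 2</child2>'
    '</root>'
)
root = ET.fromstring(xml_data)

# find() returns the first direct child with the given tag, or None
first_text = root.find('child1').text
missing = root.find('no_such_tag')

# get() reads a single attribute; findtext() reads the text directly
name_attr = root.find('child2').get('name')
second_text = root.findtext('child2')

# findall('*') returns all direct children as a list
tags = [elem.tag for elem in root.findall('*')]

print(first_text, name_attr, second_text, missing, tags)
```

Because `find()` returns `None` for a missing tag, it is worth checking the result before accessing `.text` on real data.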


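Alternatively, you can use the `pandas` package, which can write a DataFrame to XML with `to_xml()` and read it back with `read_xml()`. By default both use the `lxml` parser; passing `parser='etree'` selects the standard library instead.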

In [6]:
import pandas as pd
import numpy as np

# create a sample dataframe
df = pd.DataFrame({'number_column': [1.0, 2.0, np.nan], 'text_column': ['foo', 'bar', 'baz']})

# write it as xml
df.to_xml('table.xml', index=False)

In [7]:
!cat table.xml

<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <number_column>1.0</number_column>
    <text_column>foo</text_column>
  </row>
  <row>
    <number_column>2.0</number_column>
    <text_column>bar</text_column>
  </row>
  <row>
    <number_column/>
    <text_column>baz</text_column>
  </row>
</data>


In [8]:
df = pd.read_xml('table.xml')

In [9]:
df

|   | number_column | text_column |
|---|---------------|-------------|
| 0 | 1.0           | foo         |
| 1 | 2.0           | bar         |
| 2 | NaN           | baz         |


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   number_column  2 non-null      float64
 1   text_column    3 non-null      object 
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
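
If pandas is not available, a similar row-oriented layout can be approximated with `ElementTree` alone. This is only a rough sketch: the helper `rows_to_xml` is hypothetical and simply mimics the `<data>`/`<row>` layout of `table.xml` above, leaving the element empty for missing values.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, root_name='data', row_name='row'):
    """Hypothetical helper: build <data><row>...</row></data> from a list of dicts."""
    root = ET.Element(root_name)
    for row in rows:
        row_el = ET.SubElement(root, row_name)
        for key, value in row.items():
            cell = ET.SubElement(row_el, key)
            # Leave the element empty for missing values, as in table.xml
            if value is not None:
                cell.text = str(value)
    return root


rows = [
    {'number_column': 1.0, 'text_column': 'foo'},
    {'number_column': 2.0, 'text_column': 'bar'},
    {'number_column': None, 'text_column': 'baz'},
]

table_root = rows_to_xml(rows)
print(ET.tostring(table_root, encoding='unicode'))
```

Note that every value is written as text, so type information (float vs. string) has to be restored by whoever reads the file.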


# JSON

From [Wikipedia](https://en.wikipedia.org/wiki/JSON):
    
> JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values)

In [11]:
df.to_json('table.json')

In [12]:
!cat table.json

{"number_column":{"0":1.0,"1":2.0,"2":null},"text_column":{"0":"foo","1":"bar","2":"baz"}}

Or, even better, write one record per line (the JSON Lines format):

In [13]:
df.to_json('table.jsonl', orient='records', lines=True)

In [14]:
!cat table.jsonl

{"number_column":1.0,"text_column":"foo"}
{"number_column":2.0,"text_column":"bar"}
{"number_column":null,"text_column":"baz"}
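
A JSON Lines file can also be processed without pandas, because each line is a complete JSON document. A minimal sketch using only the `json` module; the records are the ones shown above, held in a string instead of a file:

```python
import json

# The three records from table.jsonl, one JSON document per line
jsonl_text = (
    '{"number_column":1.0,"text_column":"foo"}\n'
    '{"number_column":2.0,"text_column":"bar"}\n'
    '{"number_column":null,"text_column":"baz"}\n'
)

# Decode line by line, skipping blank lines
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# JSON null becomes Python None
print(records[2])

# Writing goes the other way: one json.dumps() call per record
jsonl_out = ''.join(json.dumps(rec) + '\n' for rec in records)
```

With a real file, the same loop works on the file object directly (`for line in f`), so large files never have to be loaded into memory at once.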


Python also has built-in support for JSON through the `json` module, which provides functions to encode Python objects as JSON strings and to decode JSON strings back into Python objects.

Here's an example of how to encode a Python object as a JSON string using the json module:

In [15]:
import json

data = {'name': 'John', 'age': 30, 'city': 'New York'}
json_str = json.dumps(data)
print(json_str)

{"name": "John", "age": 30, "city": "New York"}


And here's an example of how to decode a JSON string back into a Python object:

In [16]:
json_str = '{"name": "John", "age": 30, "city": "New York"}'
data = json.loads(json_str)
print(data)
print(type(data))

{'name': 'John', 'age': 30, 'city': 'New York'}
<class 'dict'>
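
`json.dumps()` accepts a few options that are often useful when exporting data. The sketch below pretty-prints the output and shows one possible way to handle a type that JSON does not know. The `joined` field and the `encode_extra` function are made up for illustration:

```python
import json
from datetime import date

data = {'name': 'John', 'age': 30, 'joined': date(2023, 5, 2)}


def encode_extra(obj):
    """Hypothetical fallback: turn dates into ISO 8601 strings."""
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f'Cannot serialize {type(obj).__name__}')


# indent and sort_keys make the output readable and stable;
# default is called only for objects json cannot encode itself
pretty = json.dumps(data, indent=2, sort_keys=True, default=encode_extra)
print(pretty)
```

Decoding does not reverse this step: `json.loads()` returns the date as a plain string, so converting it back is up to the reader of the file.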


You can also read and write files directly with `json.load()` and `json.dump()`:

```
with open('data.json') as f:
    data = json.load(f)
    print(data)
```

and

```
with open('data.json', 'w') as f:
    json.dump(data, f)
```
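
Put together, a full round trip could look like the sketch below. It writes first, so the file exists before it is read, and uses a temporary directory (an assumption made here to keep the example from leaving files behind):

```python
import json
import tempfile
from pathlib import Path

data = {'name': 'John', 'age': 30, 'city': 'New York'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'data.json'

    # Write the object to a file
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2)

    # Read it back into a new Python object
    with open(path, encoding='utf-8') as f:
        loaded = json.load(f)

print(loaded == data)
```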