## Structured Data

### Structured data

CSV files can only model data in which each record has a fixed set of fields and each field is a simple datatype,
such as a string or a number.
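
To make the limitation concrete, here is a minimal sketch using the standard-library `csv` module. The `record` below is a made-up example whose second field is a list; the writer simply stores the list's string form, so the nesting is lost when the row is read back.

```python
import csv
import io

# A hypothetical record whose second field is a list rather than a simple value.
record = ["living", ["James", "Sue"]]

# Write the record to an in-memory text buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer).writerow(record)
csv_text = buffer.getvalue()
print(repr(csv_text))

# Reading the row back gives a plain string, not the original list.
row_back = next(csv.reader(io.StringIO(csv_text)))
print(row_back)
print(type(row_back[1]))
```

The list comes back as text that would have to be parsed by hand, which is exactly the problem structured formats avoid.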

We often want to store data which is more complicated than this, with nested structures of lists and dictionaries.
Structured data formats like JSON, YAML, and XML are designed for this.

### JSON

A very common structured data format is JSON.

This allows us to represent data made up of combinations of lists and dictionaries as a text file which
looks a bit like a JavaScript (or Python) data literal.

In [1]:
import json

Any nested group of dictionaries and lists can be saved:

In [2]:
mydata =  {'key': ['value1', 'value2'], 
           'key2': {'key4':'value3'}}

In [3]:
json.dumps(mydata)

'{"key": ["value1", "value2"], "key2": {"key4": "value3"}}'
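
By default `json.dumps` produces a compact single line. As a small sketch, the standard `indent` and `sort_keys` arguments give a more readable layout; the data is the same `mydata` dictionary as above, repeated here so the example runs on its own.

```python
import json

mydata = {'key': ['value1', 'value2'],
          'key2': {'key4': 'value3'}}

compact = json.dumps(mydata)
pretty = json.dumps(mydata, indent=4, sort_keys=True)
print(pretty)

# Both strings decode to the same Python object.
assert json.loads(compact) == json.loads(pretty) == mydata
```

The indented form is easier to read and to compare in version control, at the cost of a larger file.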

Loading data is also really easy:

In [118]:
%%writefile myfile.json
{
    "somekey": ["a list", "with values"]
}

Overwriting myfile.json


In [123]:
with open('myfile.json', 'r') as f:
    mydataasstring = f.read()

In [124]:
mydataasstring

'{\n    "somekey": ["a list", "with values"]\n}\n'

In [125]:
mydata = json.loads(mydataasstring)
type(mydata)
mydata.keys()
mydata.values()
mydata.items()

dict_items([('somekey', ['a list', 'with values'])])

In [18]:
mydata['somekey']

['a list', 'with values']

This is a very nice solution for loading and saving Python data structures.

It's a very common way of transferring data on the internet, and of saving datasets to disk.

There's good support in most languages, so it's a nice inter-language file interchange format.
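
One caveat worth knowing when using JSON as an interchange format: it only has objects, arrays, strings, numbers, booleans and null, so some Python types change on a round trip. The sketch below uses a made-up dictionary to show that tuples come back as lists and non-string keys come back as strings.

```python
import json

# Hypothetical data with an integer key and a tuple value.
original = {1: ('a', 'b'), 'flag': True, 'nothing': None}

text = json.dumps(original)
restored = json.loads(text)

print(text)
print(restored)
print(restored == original)
```

If exact Python types matter, convert them back explicitly after loading.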

### YAML

YAML is a very similar data format to JSON, with some nice additions:

* You don't need to quote strings unless they contain special characters
* You can have comment lines, beginning with a #
* You can write dictionaries without the curly brackets: it just notices the colons.
* You can write lists like this:

In [63]:
%%writefile myfile.yaml
somekey:
    - a list # Look, this is a list
    - with values
someotherkey:
    - 8
andyetanotherkey:
    - maybetryadict: 
        - this
        - that

Overwriting myfile.yaml


In [64]:
import yaml  # Provided by the PyYAML package; this may need to be installed as pyyaml

In [65]:
with open('myfile.yaml') as f:
    mydata = yaml.safe_load(f)  # parses directly from the open file object; safe_load needs no Loader argument
print(mydata)

{'somekey': ['a list', 'with values'], 'someotherkey': [8], 'andyetanotherkey': [{'maybetryadict': ['this', 'that']}]}
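
The features listed above can also be tried without writing a file, because `yaml.safe_load` accepts a string as well as an open file. This is a minimal sketch assuming PyYAML is installed; the keys and values in `text` are made up for illustration.

```python
import yaml  # third-party PyYAML package

text = """
# Comments start with a hash
name: unquoted string
count: 8
tags: [inline, list]
nested:
    - item one
    - item two
"""

data = yaml.safe_load(text)
print(data)
```

Note that `count` is loaded as an integer and both list notations produce ordinary Python lists.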




In [66]:
# Try loading the Bruker settings.yml file as plain text
import yaml

with open('settings.yml', 'r') as loaded:
    ll = loaded.read()

# print(ll)
# This works but is suboptimal: ll is just a str, which would have to be parsed by hand.
# It is better to let the yaml module do the parsing.

In [67]:
with open('settings.yml', 'r') as loaded:
    ll = yaml.safe_load(loaded)

# Much preferred: the file is loaded as a dictionary, so values can be accessed by key
# (see also ll.keys(), ll.values() and ll.items()).
ll['FOVsize_UM_1x']  # look up a single setting by key
print(ll['FOVsize_PX'])

512

YAML is a popular format for ad-hoc data files, but the library is not part of the Python standard library (though it is
included in distributions such as Anaconda and Canopy), so some people still prefer JSON for its universality.

Because YAML gives the **option** of serialising a list either as newlines with dashes *or* with square brackets,
you can control this choice:

In [26]:
yaml.safe_dump(mydata)

'somekey: [a list, with values]\n'

In [27]:
yaml.safe_dump(mydata, default_flow_style=True)

'{somekey: [a list, with values]}\n'

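`default_flow_style=True` writes flow style, which marks structure with explicit indicators (`[ ]`, `{ }`, `,`), while `False` writes block style, which uses indentation and leading dashes. The first output above reflects older PyYAML versions, where the default (`None`) wrote innermost collections in flow style; PyYAML 5.1 and later default to `False`. [See the YAML docs for more details](http://yaml.org/spec/1.2/spec.html)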
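
To see the two styles side by side, here is a small sketch that passes `default_flow_style` explicitly, so the result does not depend on the installed PyYAML version's default. It assumes PyYAML is installed and reuses the small `mydata` dictionary from the JSON example.

```python
import yaml  # third-party PyYAML package

mydata = {'somekey': ['a list', 'with values']}

# Flow style: brackets and braces, everything on one line.
flow = yaml.safe_dump(mydata, default_flow_style=True)
# Block style: indentation and leading dashes.
block = yaml.safe_dump(mydata, default_flow_style=False)

print(flow)
print(block)

# Both styles describe the same data.
assert yaml.safe_load(flow) == yaml.safe_load(block) == mydata
```

Block style is usually easier to read and edit by hand, while flow style is more compact.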

### XML

*Supplementary material*: [XML](http://www.w3schools.com/xml/) is another popular choice when saving nested data structures.
It is very precise, but verbose. If your field uses XML data, you'll need to learn how XML works and how to use a
[Python XML parser](https://docs.python.org/2/library/xml.etree.elementtree.html) (there are a few).
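
As a brief taste, here is a minimal sketch using the standard-library `xml.etree.ElementTree` parser. The element and attribute names (`room`, `person`, `exit`, `direction`) are invented for this illustration and do not follow any particular XML schema.

```python
import xml.etree.ElementTree as ET

# Build a small tree: a room with one person and one exit.
room = ET.Element('room', attrib={'name': 'living'})
person = ET.SubElement(room, 'person')
person.text = 'James'
exit_north = ET.SubElement(room, 'exit', attrib={'direction': 'north'})
exit_north.text = 'kitchen'

# Serialise the tree to a str, then parse that str back into a tree.
xml_text = ET.tostring(room, encoding='unicode')
print(xml_text)

parsed = ET.fromstring(xml_text)
people = [p.text for p in parsed.findall('person')]
exits = {e.get('direction'): e.text for e in parsed.findall('exit')}
print(parsed.get('name'), people, exits)
```

Unlike JSON and YAML, XML does not map directly onto lists and dictionaries, so you choose yourself how elements and attributes correspond to your data.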

### Exercise: Saving and loading data

Use YAML or JSON to save your maze data structure to disk and load it again.

In [41]:
house = {
    'living' : {
        'exits': {
            'north' : 'kitchen',
            'outside' : 'garden',
            'upstairs' : 'bedroom'
        },
        'people' : ['James'],
        'capacity' : 2
    },
    'kitchen' : {
        'exits': {
            'south' : 'living'
        },
        'people' : [],
        'capacity' : 1
    },
    'garden' : {
        'exits': {
            'inside' : 'living'
        },
        'people' : ['Sue'],
        'capacity' : 3
    },
    'bedroom' : {
        'exits': {
            'downstairs' : 'living',
            'jump' : 'garden'
        },
        'people' : [],
        'capacity' : 1
    }
}

### Summary

In [140]:
# Summary: the best way to write JSON and YAML files is to first convert the dict
# to a correctly formatted str using json.dumps or yaml.dump
type(json.dumps(house))
type(yaml.dump(house))

str

In [141]:
# write json files 
import json
with open('house_write.json','w') as q:
    q.write(json.dumps(house))

In [142]:
# write yml files 
import yaml
with open('house_write.yml','w') as q:
    q.write(yaml.dump(house))

And this is how you read them in again:

In [149]:
# json read in 
with open('house_write.json','r') as q:
    p=json.load(q)
out1=p

# alternatively
with open('house_write.json','r') as q:
    p=q.read()
out2 = json.loads(p)
 
print(out1)
print(out2)

# Both approaches give the same result. 1: json.load parses directly from the open file object;
# 2: the file is read into a str, which json.loads then parses.

{'living': {'exits': {'north': 'kitchen', 'outside': 'garden', 'upstairs': 'bedroom'}, 'people': ['James'], 'capacity': 2}, 'kitchen': {'exits': {'south': 'living'}, 'people': [], 'capacity': 1}, 'garden': {'exits': {'inside': 'living'}, 'people': ['Sue'], 'capacity': 3}, 'bedroom': {'exits': {'downstairs': 'living', 'jump': 'garden'}, 'people': [], 'capacity': 1}}
{'living': {'exits': {'north': 'kitchen', 'outside': 'garden', 'upstairs': 'bedroom'}, 'people': ['James'], 'capacity': 2}, 'kitchen': {'exits': {'south': 'living'}, 'people': [], 'capacity': 1}, 'garden': {'exits': {'inside': 'living'}, 'people': ['Sue'], 'capacity': 3}, 'bedroom': {'exits': {'downstairs': 'living', 'jump': 'garden'}, 'people': [], 'capacity': 1}}
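
The write and read steps can also be done without the intermediate string, using `json.dump` and `json.load` on the open file. The sketch below uses a shortened, hypothetical version of `house` and writes into a temporary directory so that it does not touch any existing file.

```python
import json
import os
import tempfile

# A shortened stand-in for the house dictionary.
house = {'kitchen': {'exits': {'south': 'living'}, 'people': [], 'capacity': 1}}

# Write to a temporary directory so that no existing file is overwritten.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'house_write.json')

    with open(path, 'w') as f:
        json.dump(house, f, indent=2)   # serialise straight into the file

    with open(path, 'r') as f:
        reloaded = json.load(f)         # parse straight from the file

print(reloaded == house)
```

Comparing the reloaded object with the original is a quick way to check that nothing was lost on the round trip.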


In [151]:
# read yml files 
with open('house_write.yml','r') as r:
    t = yaml.safe_load(r)
print(t)

{'bedroom': {'capacity': 1, 'exits': {'downstairs': 'living', 'jump': 'garden'}, 'people': []}, 'garden': {'capacity': 3, 'exits': {'inside': 'living'}, 'people': ['Sue']}, 'kitchen': {'capacity': 1, 'exits': {'south': 'living'}, 'people': []}, 'living': {'capacity': 2, 'exits': {'north': 'kitchen', 'outside': 'garden', 'upstairs': 'bedroom'}, 'people': ['James']}}




In [45]:
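# save using yaml:

import yaml

with open('savemyhouse.yml', 'w') as y:
    # yaml.safe_dump writes real YAML; str(house) would only write the Python repr,
    # which just happens to be readable as flow-style YAML in this case.
    yaml.safe_dump(house, y, sort_keys=False)  # sort_keys=False keeps the original key order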

In [48]:
# read again

yr = yaml.safe_load(open('savemyhouse.yml', 'r'))  # note: the file is not closed explicitly here

#print(yr)
yr['bedroom']



{'exits': {'downstairs': 'living', 'jump': 'garden'},
 'people': [],
 'capacity': 1}

In [82]:
# alternative reading strategy: the with block closes the file automatically
with open('savemyhouse.yml','r') as q:
    w = yaml.safe_load(q)
print(w)

{'living': {'exits': {'north': 'kitchen', 'outside': 'garden', 'upstairs': 'bedroom'}, 'people': ['James'], 'capacity': 2}, 'kitchen': {'exits': {'south': 'living'}, 'people': [], 'capacity': 1}, 'garden': {'exits': {'inside': 'living'}, 'people': ['Sue'], 'capacity': 3}, 'bedroom': {'exits': {'downstairs': 'living', 'jump': 'garden'}, 'people': [], 'capacity': 1}}




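Alternatively, write the YAML file by hand as plain text. Note that the leading dashes turn each level into a list of single-key dictionaries, so the loaded result does not have the same structure as `house`: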

In [96]:
%%writefile savemyhouse2.yml 

house:
    - 'living' : 
        - 'exits': 
            - 'north' : 'kitchen'
            - 'outside' : 'garden'
            - 'upstairs' : 'bedroom'
        - 'people' : ['James']
        - 'capacity' : 2
    - 'kitchen' :
        - 'exits':
            - 'south' : 'living'
        - 'people' : []
        - 'capacity' : 1
    - 'garden' : 
        - 'exits': 
            - 'inside' : 'living'
        - 'people' : ['Sue']
        - 'capacity' : 3
    - 'bedroom' : 
        - 'exits': 
            - 'downstairs' : 'living'
            - 'jump' : 'garden'
        - 'people' : []
        - 'capacity' : 1

Overwriting savemyhouse2.yml


In [97]:
yr = yaml.safe_load(open('savemyhouse2.yml','r'))
print(yr['house'])

[{'living': [{'exits': [{'north': 'kitchen'}, {'outside': 'garden'}, {'upstairs': 'bedroom'}]}, {'people': ['James']}, {'capacity': 2}]}, {'kitchen': [{'exits': [{'south': 'living'}]}, {'people': []}, {'capacity': 1}]}, {'garden': [{'exits': [{'inside': 'living'}]}, {'people': ['Sue']}, {'capacity': 3}]}, {'bedroom': [{'exits': [{'downstairs': 'living'}, {'jump': 'garden'}]}, {'people': []}, {'capacity': 1}]}]
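
The output above is a list of one-key dictionaries because every entry in the hand-written file starts with a dash. As a sketch of the difference, the shortened, hypothetical YAML below drops the dashes to get nested dictionaries, and then shows what the dashes do. It assumes PyYAML is installed.

```python
import yaml  # third-party PyYAML package

# Without dashes, each level is a mapping (a Python dict).
text = """
kitchen:
    exits:
        south: living
    people: []
    capacity: 1
garden:
    exits:
        inside: living
    people: [Sue]
    capacity: 3
"""

rooms = yaml.safe_load(text)
print(rooms['garden']['people'])

# With dashes, each entry becomes a one-key dictionary inside a list.
as_list = yaml.safe_load("""
- kitchen:
    capacity: 1
- garden:
    capacity: 3
""")
print(as_list)
```

With the mapping form, a room can be looked up directly by name, as with the original `house` dictionary.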




**Now: JSON**



In [None]:
# best solution for writing
import json
with open('maze.json', 'w') as json_maze_out:
    json_maze_out.write(json.dumps(house))

In [99]:
json.dumps(house)  # this only builds a JSON-formatted str in memory; nothing is saved to disk

'{"living": {"exits": {"north": "kitchen", "outside": "garden", "upstairs": "bedroom"}, "people": ["James"], "capacity": 2}, "kitchen": {"exits": {"south": "living"}, "people": [], "capacity": 1}, "garden": {"exits": {"inside": "living"}, "people": ["Sue"], "capacity": 3}, "bedroom": {"exits": {"downstairs": "living", "jump": "garden"}, "people": [], "capacity": 1}}'
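
Note the double quotes in the string above. Unlike YAML, JSON does not accept the single-quoted form that `str()` produces for a dictionary, so `json.dumps` is needed to get text that `json.loads` can read. A minimal sketch, using a small made-up `room` dictionary:

```python
import json

room = {'people': ['James'], 'capacity': 2}

as_repr = str(room)         # Python repr, uses single quotes
as_json = json.dumps(room)  # JSON text, uses double quotes

# Try to parse the Python repr as if it were JSON.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

print(as_repr)
print(as_json)
print(repr_is_valid_json)
```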

In [129]:
%%writefile myhouse.json
{"living" : {
        "exits": {
            "north" : "kitchen",
            "outside" : "garden",
            "upstairs" : "bedroom"
        },
        "people" : ["James"],
        "capacity" : 2
    },
    "kitchen" : {
        "exits": {
            "south" : "living"
        },
        "people" : [],
        "capacity" : 1
    },
    "garden" : {
        "exits": {
            "inside" : "living"
        },
        "people" : ["Sue"],
        "capacity" : 3
    },
    "bedroom" : {
        "exits": {
            "downstairs" : "living",
            "jump" : "garden"
        },
        "people" : [],
        "capacity" : 1
    }
}

Overwriting myhouse.json




In [130]:
with open('myhouse.json', 'r') as f:
    mydataasstring = f.read()
mydata = json.loads(mydataasstring)

In [131]:
print(mydata)

{'living': {'exits': {'north': 'kitchen', 'outside': 'garden', 'upstairs': 'bedroom'}, 'people': ['James'], 'capacity': 2}, 'kitchen': {'exits': {'south': 'living'}, 'people': [], 'capacity': 1}, 'garden': {'exits': {'inside': 'living'}, 'people': ['Sue'], 'capacity': 3}, 'bedroom': {'exits': {'downstairs': 'living', 'jump': 'garden'}, 'people': [], 'capacity': 1}}


In [128]:
with open('myhouse.json','r') as hl:
    out1 = hl.read()
    
print(out1)
out2 = json.loads(out1)


{"living" : {
        "exits": {
            "north" : "kitchen",
            "outside" : "garden",
            "upstairs" : "bedroom"
        },
        "people" : ["James"],
        "capacity" : 2
    }
}

