# Dictionaries

A dictionary is a data structure for storing _pairs_ of data. Each pair consists of a _key_ and a _value_. Dictionaries have the type `dict` and are surrounded by curly brackets, `{...}`.
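As a minimal sketch, a dictionary can also be written directly as a literal inside curly brackets, with a colon between each key and its value. The name `elements` is just an illustrative choice.

```python
# a dictionary literal: key: value pairs inside curly brackets
elements = {'Na': 'sodium', 'K': 'potassium'}

print(type(elements))   # <class 'dict'>
print(elements['Na'])   # sodium
print(len(elements))    # 2, the number of key-value pairs
```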

This way of storing data is useful in several ways. Typical uses are:

* Lookup: a key gives direct access to its data
* Structured storage, for example of large, nested data sets
* An unknown number of entries, since new entries can be added under their own names at any time
* Linking axes and data, e.g. for plots
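To illustrate the point about an unknown number of entries, here is a small sketch that counts how often each element symbol occurs in a list. The dictionary grows with every new symbol it meets, without knowing in advance which symbols will appear. The list `symbols` is made-up example data.

```python
symbols = ['Na', 'Cl', 'Na', 'K', 'Cl', 'Na']  # made-up example data

counts = {}                      # starts empty: the symbols are not known in advance
for symbol in symbols:
    if symbol in counts:         # symbol seen before: increase its count
        counts[symbol] += 1
    else:                        # new symbol: create the entry on the fly
        counts[symbol] = 1

print(counts)  # {'Na': 3, 'Cl': 2, 'K': 1}
```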

## Creation and Common Operations

```python
# a few imports to be used later
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# (Jupyter magic: show plots directly inside the notebook)
```

```python
d = {}                   # an empty dictionary
d['Na'] = 'sodium'       # 'Na' is the key, 'sodium' is the value
d['K'] = 'potassium'
d
```

```
{'Na': 'sodium', 'K': 'potassium'}
```

With that in place, I can access the data through its key: `d['K']` returns the entry stored under `'K'`.

```python
d['K']
```

```
'potassium'
```
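Asking for a key that does not exist raises a `KeyError`. A minimal sketch of two common ways to guard a lookup, the `in` operator and the `get()` method with a default value, using the same dictionary `d` (rebuilt here so the snippet runs on its own):

```python
d = {'Na': 'sodium', 'K': 'potassium'}

try:
    d['Li']                      # 'Li' is not a key
except KeyError as e:
    print('missing key:', e)     # missing key: 'Li'

print('Li' in d)                 # False: membership tests the keys
print(d.get('K'))                # potassium
print(d.get('Li', 'unknown'))    # unknown: the default replaces the KeyError
```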

To access all elements, I can loop over the keys and, as in the following example, use a condition to treat particular keys differently:

```python
for key in d:
    print(key)
    if key == 'Na':
        print('Do not drop into water because we found %s' % d[key])
    else:
        print('We found', d[key])
```

```
Na
Do not drop into water because we found sodium
K
We found potassium
```
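When both the key and its value are needed, looping over `d.items()` hands out the pairs directly and saves the extra `d[key]` lookup. A sketch of the loop above written that way:

```python
d = {'Na': 'sodium', 'K': 'potassium'}

for key, value in d.items():     # each item is a (key, value) tuple
    print(key)
    if key == 'Na':
        print('Do not drop into water because we found %s' % value)
    else:
        print('We found', value)
```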


This is already quite useful for looking up values. In addition, you can use a dictionary to store data and add new keys later.
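A minimal sketch of how a dictionary can change after it has been created: assigning to a new key adds an entry, assigning to an existing key overwrites it, and `pop()` or `del` removes it. The lithium entry and the name `'kalium'` are only examples.

```python
d = {'Na': 'sodium', 'K': 'potassium'}

d['Li'] = 'lithium'        # new key: the entry is added
d['K'] = 'kalium'          # existing key: the old value is overwritten
removed = d.pop('Na')      # removes the entry and returns its value
del d['Li']                # removes the entry without returning it

print(removed)  # sodium
print(d)        # {'K': 'kalium'}
```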

A clever approach is to store several pieces of information under one central name, in other words to put e.g. a list or a dictionary inside a dictionary.

```python
data = {}
data['Na'] = {}                # the entry in the dictionary is itself another dictionary
data['Na']['name'] = 'sodium'  # now we add the actual data to this inner dictionary; each entry can have its own type
data['Na']['Z'] = 11
data['Na']['mass'] = 23

data['Cl'] = {}
data['Cl']['name'] = 'chlorine'
data['Cl']['Z'] = 17
data['Cl']['mass'] = 35.45

data
```

```
{'Na': {'name': 'sodium', 'Z': 11, 'mass': 23},
 'Cl': {'name': 'chlorine', 'Z': 17, 'mass': 35.45}}
```
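The same nested structure can also be written in one go as a nested literal. A short sketch for the sodium entry only, comparing the literal with a step-by-step construction like the one above:

```python
# the Na entry written in one go as a nested literal
literal = {'Na': {'name': 'sodium', 'Z': 11, 'mass': 23}}

# the same entry built step by step
stepwise = {}
stepwise['Na'] = {}
stepwise['Na']['name'] = 'sodium'
stepwise['Na']['Z'] = 11
stepwise['Na']['mass'] = 23

print(literal == stepwise)       # True: dictionaries compare by content
print(literal['Na']['mass'])     # 23: chain the keys to reach an inner value
```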

Now I can do calculations with this table, and it is easy to see how useful functions could be built on top of it.

```python
total_mass = 0
molecule = [('Na', 1), ('Cl', 1)]   # (element symbol, number of atoms) pairs

for element, count in molecule:     # unpack each pair
    total_mass += data[element]['mass'] * count
total_mass
```

```
58.45
```
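As a sketch of such a function: `molar_mass` below (a name chosen here for illustration) wraps the loop above so it can be reused for any list of `(symbol, count)` pairs. It assumes every symbol has a `'mass'` entry in the table passed in; a missing symbol raises a `KeyError`.

```python
def molar_mass(molecule, table):
    """Sum mass * count over (symbol, count) pairs, looking masses up in table."""
    total = 0
    for symbol, count in molecule:          # unpack each (symbol, count) tuple
        total += table[symbol]['mass'] * count
    return total


# only the masses are needed for this calculation
masses = {'Na': {'mass': 23}, 'Cl': {'mass': 35.45}}
print(molar_mass([('Na', 1), ('Cl', 1)], masses))  # 58.45
```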

## Task
Add hydrogen and oxygen to the dictionary and calculate the molecular mass of water.<br>
Hint: you don't need to add all the entries for the new elements; adding the mass of each element is enough.

## File handling with dictionaries
Finally, dictionaries are also excellent for handling many files, as they can hold any number of entries, and that number does not have to be known in advance.

In the following code example we use **listdir** from the **os** module to get the names of all files in a folder. We then read each file in turn and add its contents to our dictionary, with the filename as key.<br>
In this example we are using one more useful trick to handle convoluted paths.<br>
Test what `os.sep` and `os.getcwd()` return, and what `os.sep.join?` shows. <br>
We will discuss opening and reading files in the next session; for now, simply look at all the keys in `data_in_folder` and think about what the for loop does.

```python
import os            # a module to handle files, paths and more
path_to_files = os.sep.join((os.getcwd(), 'Data', 'subset'))  # <current folder>/Data/subset
filelist = os.listdir(path_to_files)      # names of all entries in that folder
data_in_folder = {}
for filename in filelist:
    temp_path_to_file = os.sep.join((path_to_files, filename))
    with open(temp_path_to_file, 'r') as f:       # open a file; it is closed automatically at the end of the block
        data_in_folder[filename] = f.readlines()  # read all lines into a list, stored under the filename
```
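Because the cell above needs an existing `Data/subset` folder, here is a self-contained sketch of the same pattern. It first writes two small example files into a temporary folder (created with the standard `tempfile` module) and then reads every file back into a dictionary keyed by filename. It uses `os.path.join`, which for simple parts like these gives the same result as `os.sep.join`. The file names and contents are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:   # the folder is deleted again at the end of the block
    # write two small example files
    for name, text in [('a.txt', 'first line\nsecond line\n'), ('b.txt', 'only line\n')]:
        with open(os.path.join(folder, name), 'w') as f:
            f.write(text)

    # read every file in the folder into a dictionary keyed by filename
    data_in_folder = {}
    for filename in sorted(os.listdir(folder)):   # sorted for a reproducible order
        with open(os.path.join(folder, filename), 'r') as f:
            data_in_folder[filename] = f.readlines()

print(data_in_folder)
# {'a.txt': ['first line\n', 'second line\n'], 'b.txt': ['only line\n']}
```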

## Advanced

Now this was the last of the "standard" Python data types we needed to look at. The next sessions will focus on data handling, plotting and fitting. By now you have gotten a little feeling for what Python is and how to do things in it, so this is the perfect time for some Python nerdism. The following import prints the "Zen of Python", the philosophy behind the programming language. Admittedly this is nerdy, but it is also good advice for better coding. In time, most of the aphorisms will become clear.

```python
import this
```

```
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
```


Let's pick one of them, "Errors should never pass silently", and discuss it.

When reading files as in the previous code example, things can go wrong. In the current code, a single unreadable file in the list would stop the code with an error message. To keep a larger program from crashing, one would place the reading section of this code into a **try: except:** block as follows:

```python
import os            # a module to handle files, paths and more
path_to_files = os.sep.join((os.getcwd(), 'Data', 'subset'))
filelist = os.listdir(path_to_files)
data2 = {}
for filename in filelist:
    temp_path_to_file = os.sep.join((path_to_files, filename))
    try:
        with open(temp_path_to_file, 'r') as f:  # open a file; it is closed automatically at the end of the block
            data2[filename] = f.readlines()     # read all lines into a list, stored under the filename
    except:                                     # catches any error ...
        continue                                # ... and silently moves on to the next file
```

If something goes wrong while opening or reading the file, the code jumps to the `except:` branch, and `continue` silently moves on to the next file.

This is nice because the code does not crash, but bad because you do not know that something went wrong. In the spirit of the **Zen**, the following version is better: it reports the error but still does not crash the code.

```python
import os            # a module to handle files, paths and more
path_to_files = os.sep.join((os.getcwd(), 'Data', 'subset'))
filelist = os.listdir(path_to_files)
data3 = {}
for filename in filelist:
    temp_path_to_file = os.sep.join((path_to_files, filename))
    try:
        with open(temp_path_to_file, 'r') as f:  # open a file; it is closed automatically at the end of the block
            data3[filename] = f.readlines()     # read all lines into a list, stored under the filename
    except Exception as e:                      # keep the error object to report it
        print('There was a reading error for file', filename, 'it produced the following error:')
        print(e)
```
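One way to take "Errors should never pass silently" a step further is to keep the errors instead of only printing them, so the program can report or act on them afterwards. A minimal sketch, assuming a temporary folder with one readable file and one subfolder (opening a folder with `open()` raises an `OSError`); the dictionary `errors` is an illustrative addition. It catches `OSError` rather than every `Exception`, so unrelated bugs still surface.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'good.txt'), 'w') as f:
        f.write('some data\n')
    os.mkdir(os.path.join(folder, 'subfolder'))   # a folder cannot be read like a file

    data3 = {}
    errors = {}                                    # filename -> the exception it raised
    for filename in sorted(os.listdir(folder)):
        try:
            with open(os.path.join(folder, filename), 'r') as f:
                data3[filename] = f.readlines()
        except OSError as e:                       # file-system errors only
            errors[filename] = e

print(list(data3))    # ['good.txt']
for filename, e in errors.items():
    print('could not read', filename, '->', type(e).__name__)
```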

## Structured File Format
This kind of nested, structured storage is also the basis of the **HDF5** file format, which has become a standard at many large-scale research facilities. An HDF5 file is essentially a hierarchy of nested dictionaries (groups that contain datasets or further groups), much like our dictionary `data` from above.
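To make the analogy concrete: inside an HDF5 file, a nested entry is addressed by a slash-separated path such as `Na/mass`, much like `data['Na']['mass']`. The sketch below is a hypothetical helper `to_paths` in plain Python (it does not use HDF5 itself) that walks a nested dictionary with string keys and lists every value under such a path.

```python
def to_paths(nested, prefix=''):
    """Return {path: value} for every non-dictionary value (assumes string keys)."""
    paths = {}
    for key, value in nested.items():
        path = prefix + '/' + key if prefix else key
        if isinstance(value, dict):      # like an HDF5 group: descend further
            paths.update(to_paths(value, path))
        else:                            # like an HDF5 dataset: store the value
            paths[path] = value
    return paths


data = {
    'Na': {'name': 'sodium', 'Z': 11, 'mass': 23},
    'Cl': {'name': 'chlorine', 'Z': 17, 'mass': 35.45},
}
for path, value in to_paths(data).items():
    print(path, '=', value)
```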

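```python
import os
import h5py   # Python interface to the HDF5 file format

if not os.path.exists('Data'):
    os.mkdir('Data')
filename = os.sep.join(['Data', 'my-pse.hdf5'])
with h5py.File(filename, 'w') as f:          # 'w': create the file, overwriting an existing one
    for key, value in data.items():
        print(key)
        if isinstance(value, dict):
            grp = f.create_group(key)        # an inner dictionary becomes an HDF5 group ...
            for key2, value2 in value.items():
                print(key + '-' + key2)
                try:
                    grp.create_dataset(name=key2, data=value2)   # ... and each of its entries a dataset
                    print('written [%s]-[%s] to file' % (key, key2))
                except Exception as e:
                    print('while writing [%s]-[%s] the following error occurred' % (key, key2))
                    print(e)
        else:
            try:
                f.create_dataset(name=key, data=value)   # a plain value becomes a dataset directly
                print('written %s to file' % key)
            except Exception as e:
                print('while writing [%s] the following error occurred' % key)
                print(e)
```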

```
Na
Na-name
written [Na]-[name] to file
Na-Z
written [Na]-[Z] to file
Na-mass
written [Na]-[mass] to file
Cl
Cl-name
written [Cl]-[name] to file
Cl-Z
written [Cl]-[Z] to file
Cl-mass
written [Cl]-[mass] to file
```
