# A Survey of I/O Formats

By the end of this lesson, you should be able to list at least three different ways that data can be stored and to use the corresponding tools to load and save data in those formats.

## Required Preparation

- Read [File handling](https://www.pythonforbeginners.com/cheatsheet/python-file-handling)
- Read the documentation about NumPy's [loadtxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) and [savetxt](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html) functions.  Also peruse this neat, advanced [example](https://scipython.com/book/chapter-6-numpy/examples/using-numpys-loadtxt-method/) using the same.
- Read the [JSON module](https://docs.python.org/3/library/json.html) documentation.
- Read the [h5py Quick Start](http://docs.h5py.org/en/stable/quick.html) and execute `conda install h5py`.

## Basic File I/O via File Handles

```python
f = open(filename, 'r')
s = f.read() # reads whole file as one string
# lines = f.readlines() # produces a list of strings for each line
f.close()

f = open(filename, 'w') # opening with 'w' overwrites any existing file
f.write(s) # write a string to file
f.close()
```
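The open/close pattern above leaves the file open if an error occurs before `close()` is reached. A common alternative is the `with` statement, which closes the file automatically. The sketch below assumes a throwaway file called `demo.txt` in a temporary directory; that name and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory, used only for this demo.
filename = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# The file is closed automatically when the 'with' block ends,
# even if an exception is raised inside it.
with open(filename, 'w') as f:
    f.write('first line\nsecond line\n')

with open(filename, 'r') as f:
    s = f.read()            # whole file as one string

with open(filename, 'r') as f:
    lines = f.readlines()   # list of strings, one per line, newlines kept

with open(filename, 'r') as f:
    # a file handle can also be iterated line by line
    stripped = [line.rstrip('\n') for line in f]
```

Iterating over the handle avoids loading the whole file at once, which can matter for large files.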

## Processing Data Requires `str` Manipulation

```python
token in line  # e.g., "hello" in "i always say hello, world!"
line.find(token) # index of the first match, or -1 if absent; e.g., line.find("h")
line.split()
line.split(',')
line.replace(',', '')
```
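To make the operations above concrete, here is a small worked sketch applied to the sample sentence from the first comment. The variable names are illustrative, and `strip()` is an extra method not listed above that is often handy for trimming whitespace.

```python
line = "i always say hello, world!"
token = "hello"

found = token in line            # membership test, gives a bool
index = line.find(token)         # position of the first match
missing = line.find("goodbye")   # find returns -1 when there is no match

words = line.split()             # split on any whitespace
parts = line.split(',')          # split on commas only
cleaned = line.replace(',', '')  # remove every comma

# Not in the list above: strip() trims leading and trailing whitespace.
trimmed = '  padded  '.strip()

print(found, index, missing)
print(words)
print(parts)
print(cleaned)
print(repr(trimmed))
```

Note that `split(',')` keeps the space after the comma in the second piece, so `strip()` is often applied to each piece afterwards.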

## Exercise 1

Run the following code:

```python
s = """
  inp.put_int("number_groups",                      2)
  inp.put_int("dimension",                          2)
  inp.put_str("equation",                           "diffusion")
  inp.put_str("bc_west",                            "reflect")
  inp.put_str("bc_east",                            "vacuum")
  inp.put_str("bc_south",                           "reflect")
  inp.put_str("bc_north",                           "vacuum")
"""
f = open('ex1.txt', 'w')
f.write(s)
f.close()
```

Then, write a short program to read this file and produce a dictionary of the form `{'number_groups': 2, 'bc_south': 'reflect'}`, etc.  Remember, given a dictionary `d = {}`, add a key-value pair by doing, e.g., `d['dimension'] = 2`.

## I/O Using NumPy

If you have data that looks like 

```
 T [K],     P [MPa],  rho [kg/m^3],
 293.150,   2.000,     999.073,
 293.150,   4.000,     999.982,
 293.150,   6.000,    1000.888,
 293.150,   8.000,    1001.791,
 313.150,   2.000,     993.053,
 313.150,   4.000,     993.924,
```

then `np.loadtxt` is your friend.

## Exercise 2

Execute the following:

```python
f = open('ex2.txt', 'w')
f.write("""T [K],   P [MPa],  rho [kg/m^3],
 293.150,   2.000,    999.073,
 293.150,   4.000,    999.982,
 293.150,   6.000,   1000.888,
 293.150,   8.000,   1001.791,
 313.150,   2.000,    993.053,
 313.150,   4.000,    993.924,""")
f.close()
```

Now, use `np.loadtxt` to produce two arrays `T` and `rho`.  Can you do it in just one line? (Hint: look through the optional arguments!)
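The task above covers reading. For the opposite direction, the following is a minimal sketch of a `np.savetxt` and `np.loadtxt` round trip. The array values and the file name `demo.csv` are made up for illustration, and the file deliberately has no trailing commas, so it is simpler than `ex2.txt`.

```python
import os
import tempfile

import numpy as np

# Made-up values, used only for illustration.
data = np.array([[300.0, 1.0],
                 [310.0, 2.0],
                 [320.0, 3.0]])

# Hypothetical file name inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# The header is written as a comment line (prefixed with '# ' by default).
np.savetxt(path, data, fmt='%.3f', delimiter=',', header='T [K], P [MPa]')

# loadtxt skips lines that start with '#' by default, so the header is ignored.
loaded = np.loadtxt(path, delimiter=',')

print(loaded.shape)
```

Because the header is stored as a comment, the file can be read back without telling `loadtxt` to skip any rows.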

## Human Readable but Flexible: `json`

There are lots of human-readable formats out there: XML (sorta ugly), JSON, and YAML.  JSON is great because its structure maps almost directly onto a Python dictionary.

```python
import json

d = {'dog': "Fido", 
     'x': [1.0, 2.0, 3.0],
     'pi': 3.14}

f = open('test.json', 'w')
json.dump(d, f)
f.close()

# shell escape; works in IPython/Jupyter only
!more test.json
```
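JSON resembles a Python dictionary, but the match is not exact. The sketch below uses the string-based `json.dumps` and `json.loads` to show pretty-printing with `indent` and two conversions that happen on a round trip. The `original` dictionary is a made-up example.

```python
import json

d = {'dog': "Fido",
     'x': [1.0, 2.0, 3.0],
     'pi': 3.14}

# dumps returns a str instead of writing to a file; indent makes it readable.
text = json.dumps(d, indent=2)
print(text)

# Made-up example: an integer key and a tuple value.
original = {1: (1, 2)}
restored = json.loads(json.dumps(original))

# JSON object keys are always strings, and JSON has arrays but no tuples,
# so the key comes back as '1' and the tuple comes back as a list.
print(restored)
```

If exact Python types matter, the data may need converting back after loading.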

## Exercise 3

Look up `json.load` and produce a new dictionary object `D` that contains the stuff stored in `test.json`.

## Easy Peasy With Pickle

```python
import pickle
pickle.dump(d, open('test.p', 'w'))  # try 'wb'
D = pickle.load(open('test.p', 'r'))  # try 'rb'
```

## Exercise 4

Use the above code to write out the pickle file.  Look at it (use `more`).  Then read it.  Repeat using `wb` and `rb`.  Any differences?
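One reason to reach for pickle is that it can store Python objects that JSON cannot represent directly. The sketch below works in memory with `pickle.dumps` and `pickle.loads`, so it does not replace the file-based exercise. The `record` dictionary is a made-up example.

```python
import pickle

# Made-up record holding types with no direct JSON equivalent.
record = {'point': (1, 2), 'tags': {'a', 'b'}, 'z': 1 + 2j}

blob = pickle.dumps(record)      # serialize to an in-memory object
restored = pickle.loads(blob)    # rebuild the original Python object

print(restored == record)
print(type(restored['point']), type(restored['tags']))
```

The tuple, set, and complex number all come back with their original types. Keep in mind that unpickling data from an untrusted source can execute arbitrary code, so pickle is best kept for files you created yourself.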

## HDF5 for Large-Scale Data

```python
import h5py
import numpy as np

name = 'mytestfile.hdf5'
f = h5py.File(name, 'w')

# some "attributes" for the "root" group
f.attrs['file_name'] = name
f.attrs['author'] = 'roberts'

# make a new group and give it an attribute
day1 = f.create_group('day1')
day1.attrs['note'] = 'Experiments from day 1'
# add some datasets
day1.create_dataset("array", data=np.random.rand(10))

f.close()
```

```python
# shell escape; works in IPython/Jupyter only and requires the h5dump tool
!h5dump mytestfile.hdf5
```
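The file can also be inspected from Python instead of with `h5dump`. The following is a minimal sketch, assuming h5py and NumPy are installed. To stay self-contained it first writes a small file with the same layout as above, using a made-up name (`readback_demo.hdf5`) and deterministic data in place of the random array.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), 'readback_demo.hdf5')

# Write a file with the same structure as the lesson's example.
with h5py.File(path, 'w') as f:
    f.attrs['author'] = 'roberts'
    day1 = f.create_group('day1')
    day1.attrs['note'] = 'Experiments from day 1'
    day1.create_dataset('array', data=np.arange(10.0))

# Read it back; groups behave much like dictionaries.
with h5py.File(path, 'r') as f:
    groups = list(f.keys())          # names of the top-level groups
    note = f['day1'].attrs['note']   # attribute attached to a group
    array = f['day1/array'][:]       # [:] copies the dataset into a NumPy array

print(groups)
print(note)
print(array)
```

Slicing with `[:]` inside the `with` block matters: once the file is closed, the dataset object can no longer be read, but the copied NumPy array remains usable.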