# I/O with files 

Today we cover some practical aspects of Python that make it a great language for scripting. While the same is possible with bash and other shells, Python's object-oriented features and high-level semantics make it very easy for a beginner to write their first scripts.

- Input/Output with files
  - with statement
  - parsing lines
  - lines as lists
  - splitting lines into lists
  
- JSON format for storage
  - example of JSON usage
  - storing Python objects and reading them back
  - storing custom Python objects
    ([custom serialize](https://realpython.com/python-json/))
  
- functions with variable number of arguments
  - example of printf


# File handling

Basic I/O with files is almost identical to C, at first.

- You need to open a file object on disk before writing information into it
- Opening a file can fail
  - location does not exist
  - no write privilege for the location

By default a file is opened in **read** mode, so opening a file that does not exist fails:

In [12]:
fname = '/data.txt'
f = open(fname)

FileNotFoundError: [Errno 2] No such file or directory: '/data.txt'

You must specify the **write** mode to store data in a file:

In [14]:
fname = '/data.txt'
open?
f = open(fname,mode='w')

PermissionError: [Errno 13] Permission denied: '/data.txt'

In this case the user does not have write permissions in `/`.
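
Because opening a file can fail, scripts usually catch these errors instead of crashing. Below is a minimal sketch. `try_open` is a hypothetical helper, and the missing path is built inside a fresh temporary directory so it is guaranteed not to exist. Both exceptions are subclasses of `OSError`, so catching `OSError` alone would also work.

```python
import os
import tempfile

def try_open(path, mode='r'):
    """Hypothetical helper: return an open file object, or None if opening fails."""
    try:
        return open(path, mode)
    except FileNotFoundError:
        print("no such file or directory: %s" % path)
    except PermissionError:
        print("no permission to open %s in mode %r" % (path, mode))
    return None

# 'missing' is a subdirectory we never create, so the file cannot exist
tmpdir = tempfile.mkdtemp()
missing = os.path.join(tmpdir, 'missing', 'data.txt')

f = try_open(missing)
print(f)   # None
```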

Possible modes are:
```
========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r'       open for reading (default)
'w'       open for writing, truncating the file first
'x'       create a new file and open it for writing
'a'       open for writing, appending to the end of the file if it exists
'b'       binary mode
't'       text mode (default)
'+'       open a disk file for updating (reading and writing)
========= ===============================================================
```
So by default a file is opened for reading in text mode (`'rt'`).
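
The difference between `'w'`, `'a'` and `'x'` is easiest to see on one file. This sketch works in a scratch directory from `tempfile` (an assumption made only to keep the working directory clean):

```python
import os
import tempfile

fname = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(fname, 'w') as f:     # 'w' truncates: the file now holds one line
    f.write('line 1\n')

with open(fname, 'a') as f:     # 'a' appends after the existing content
    f.write('line 2\n')

with open(fname) as f:          # default mode 'rt'
    content = f.read()
print(repr(content))            # 'line 1\nline 2\n'

try:
    open(fname, 'x')            # 'x' refuses to touch a file that already exists
except FileExistsError:
    print('%s already exists' % fname)
```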

We now store some values in a file to simulate data.

It is important to close the file. This makes sure all data are flushed from memory to disk and the file handle is released properly.

In [31]:
import os

fname = 'data.txt'
f = open(fname,'w')

f.write('first file in python\n')

f.close()

os.listdir()

['lec28.ipynb', 'examples', '.ipynb_checkpoints', 'data.txt']

## getting rid of `close()`

To make the code less C-like and more Pythonic, we can get rid of `close()` by using the `with` statement

In [33]:
import os

fname = 'data.txt'
with open(fname,'w') as ofile:
    ofile.write('first file in python\n')

os.listdir()

['lec28.ipynb', 'examples', '.ipynb_checkpoints', 'data.txt']

`with` makes sure that `ofile` is an open file handle inside the `with` block. Once the block ends, `close()` is called automatically and the handle can no longer be used.
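
Here is a minimal check of this behaviour using the file object's `closed` attribute. The temporary directory from `tempfile` only keeps the working directory untouched. Roughly, the `with` block behaves like a `try`/`finally` that calls `ofile.close()` in the `finally` clause.

```python
import os
import tempfile

fname = os.path.join(tempfile.mkdtemp(), 'data.txt')

with open(fname, 'w') as ofile:
    ofile.write('first file in python\n')
    print(ofile.closed)     # False: still open inside the block

print(ofile.closed)         # True: close() was called when the block ended

try:
    ofile.write('too late\n')
except ValueError as err:   # writing to a closed file raises ValueError
    print('error:', err)
```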

## Storing lists and multiple values

You can use C-style `%` formatting to format and store the elements of a list

In [47]:
import random

nevents = 3

fname = 'data1.txt'
with open(fname,'w') as f:
    for i in range(nevents):
        measurements = [ random.random() for j in range(10) ]
        for val in measurements:
            f.write("%.3f\t"%val)
        f.write('\n')


A more Pythonic style is to use the `writelines()` method with a generator expression

In [48]:
import random

nevents = 3

fname = 'data2.txt'
with open(fname,'w') as f:
    for i in range(nevents):
        measurements = [ random.random() for j in range(10) ]
        f.writelines("%.3f\t"%val for val in measurements)
        f.write('\n')

which can be further reduced

In [49]:
import random

nevents = 3

fname = 'data2.txt'
with open(fname,'w') as f:
    for i in range(nevents):
        f.writelines("%.3f\t"%val for val in [ random.random() for j in range(10) ] )
        f.write('\n')
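
`writelines()` adds no separators or newlines of its own. As a result, the loops above leave a trailing `\t` at the end of every line, and that tab shows up as `\t\n` when the file is read back. One alternative is `'\t'.join()`, which puts the separator only *between* values. A sketch, writing to a hypothetical `data3.txt` in a temporary directory:

```python
import os
import random
import tempfile

nevents = 3
fname = os.path.join(tempfile.mkdtemp(), 'data3.txt')

with open(fname, 'w') as f:
    for i in range(nevents):
        measurements = [random.random() for j in range(10)]
        # join puts '\t' only between values; we end the line ourselves
        f.write('\t'.join('%.3f' % val for val in measurements) + '\n')

with open(fname) as f:
    lines = f.readlines()
print(lines[0].endswith('\t\n'))   # False: no trailing separator
```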


## Input from file

A file can be read into a single string, which can then be split into lines and columns.

In [50]:
fname = 'data2.txt'

with open(fname) as f:
    file = f.read()
print(file)

vals = file.split()
print(vals)

0.520	0.666	0.448	0.425	0.540	0.792	0.076	0.134	0.522	0.663	
0.055	0.315	0.623	0.555	0.567	0.465	0.651	0.105	0.017	0.149	
0.936	0.860	0.257	0.487	0.362	0.913	0.381	0.399	0.349	0.268	

['0.520', '0.666', '0.448', '0.425', '0.540', '0.792', '0.076', '0.134', '0.522', '0.663', '0.055', '0.315', '0.623', '0.555', '0.567', '0.465', '0.651', '0.105', '0.017', '0.149', '0.936', '0.860', '0.257', '0.487', '0.362', '0.913', '0.381', '0.399', '0.349', '0.268']


You could also read the file as a list of lines, each one ending with a newline `\n`

In [51]:
fname = 'data2.txt'
lines = [l for l in open(fname)]
print(lines)

['0.520\t0.666\t0.448\t0.425\t0.540\t0.792\t0.076\t0.134\t0.522\t0.663\t\n', '0.055\t0.315\t0.623\t0.555\t0.567\t0.465\t0.651\t0.105\t0.017\t0.149\t\n', '0.936\t0.860\t0.257\t0.487\t0.362\t0.913\t0.381\t0.399\t0.349\t0.268\t\n']


However, note that the trailing `\t` and `\n` are part of the strings being read in! Fixing this is easy

In [52]:
fname = 'data2.txt'
lines = [l.strip() for l in open(fname)]
print(lines)

['0.520\t0.666\t0.448\t0.425\t0.540\t0.792\t0.076\t0.134\t0.522\t0.663', '0.055\t0.315\t0.623\t0.555\t0.567\t0.465\t0.651\t0.105\t0.017\t0.149', '0.936\t0.860\t0.257\t0.487\t0.362\t0.913\t0.381\t0.399\t0.349\t0.268']


`strip()` has removed the trailing `\t` and `\n`. We now split each line using `\t` as the separator

In [55]:
fname = 'data2.txt'
lines = [l.strip() for l in open(fname)]
data = [ l.split('\t') for l in lines ]
print(data)
print(data[2:])

[['0.520', '0.666', '0.448', '0.425', '0.540', '0.792', '0.076', '0.134', '0.522', '0.663'], ['0.055', '0.315', '0.623', '0.555', '0.567', '0.465', '0.651', '0.105', '0.017', '0.149'], ['0.936', '0.860', '0.257', '0.487', '0.362', '0.913', '0.381', '0.399', '0.349', '0.268']]
[['0.936', '0.860', '0.257', '0.487', '0.362', '0.913', '0.381', '0.399', '0.349', '0.268']]


even more concisely

In [56]:
fname = 'data2.txt'
data = [ l.split('\t') for l in [line.strip() for line in open(fname)] ]
print(data)

[['0.520', '0.666', '0.448', '0.425', '0.540', '0.792', '0.076', '0.134', '0.522', '0.663'], ['0.055', '0.315', '0.623', '0.555', '0.567', '0.465', '0.651', '0.105', '0.017', '0.149'], ['0.936', '0.860', '0.257', '0.487', '0.362', '0.913', '0.381', '0.399', '0.349', '0.268']]
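
The values read this way are still strings. To do arithmetic with them, convert each entry with `float()`. The sketch below writes a tiny file with made-up values in the same tab-separated layout as `data2.txt`, then reads it back as numbers:

```python
import os
import tempfile

# tiny file in the same layout as data2.txt (values made up for the demo)
fname = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(fname, 'w') as f:
    f.write('0.5\t1.25\t\n')
    f.write('2.0\t-3.5\t\n')

with open(fname) as f:
    # strip() drops the trailing '\t\n', split('\t') separates the columns,
    # float() turns each string into a number
    data = [[float(x) for x in line.strip().split('\t')] for line in f]

print(data)           # [[0.5, 1.25], [2.0, -3.5]]
print(sum(data[0]))   # 1.75
```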


### exercise
- change the separator and use `,` or `:` to store and read back data files

## Storing Lists, Dicts, and Tuples

As you have seen in the examples above, objects are not written automatically. For a dictionary you need to take care of formatting the output file yourself.

In [61]:
import random

datum = {'val':-1.1, 'err':0.2}

fname = 'data4.txt'

with open(fname,'w') as f:
    f.writelines("%s\t"%v for v in datum.keys())
    f.write('\n')
    for i in range(10):
        datum['val'] = random.uniform(-3.,3.)
        datum['err'] = random.normalvariate(0., 0.2)
        f.writelines("%.3f\t"%val for val in datum.values() )
        f.write('\n')
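
Reading such a file back is the reverse operation: the header line gives the keys, and each following line becomes one dictionary via `zip()`. This is a sketch for the two float keys used above, assuming the same header-plus-values layout. It writes its own copy of the file in a temporary directory:

```python
import os
import random
import tempfile

fname = os.path.join(tempfile.mkdtemp(), 'data4.txt')

# same layout as above: a header line with the keys, then one line of values per datum
datum = {'val': -1.1, 'err': 0.2}
with open(fname, 'w') as f:
    f.writelines("%s\t" % k for k in datum.keys())
    f.write('\n')
    for i in range(10):
        datum['val'] = random.uniform(-3., 3.)
        datum['err'] = random.normalvariate(0., 0.2)
        f.writelines("%.3f\t" % v for v in datum.values())
        f.write('\n')

# read back: first line -> keys, every following line -> one dictionary
with open(fname) as f:
    keys = f.readline().strip().split('\t')
    records = [dict(zip(keys, map(float, line.strip().split('\t')))) for line in f]

print(keys)           # ['val', 'err']
print(len(records))   # 10
print(records[0])
```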


### Exercise
- use a dictionary to store data for 3 keys of different type
- store 100 dictionary instances in file
- read back and populate dictionary objects from file

## Storing NumPy objects

NumPy provides built-in functions to easily store and read ndarrays in binary and text format, without iterating over each element.

In [74]:
import numpy as np
import os

matrix = np.random.randn(100,10)

fname = 'npdata1'
np.save(fname+'.npy', matrix)
np.savetxt(fname+'.txt', matrix)


Reading the file is also simple with `load()`

In [75]:
vals = np.load(fname+'.npy')

print(vals.shape)
print(vals[:1,])

(100, 10)
[[ 1.49193655  0.86816299 -1.01754419  0.53341083 -1.67983499  2.1464185
   1.3642971   0.24185052 -0.10550519 -0.86854152]]
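
The text file written by `savetxt()` is read back with `np.loadtxt()`. The sketch below compares both round trips. `np.save`/`np.load` give back an identical array. The text version goes through the `'%.18e'` format that `savetxt()` uses by default, so it is compared with `np.allclose()`. Paths are in a temporary directory:

```python
import os
import tempfile

import numpy as np

fname = os.path.join(tempfile.mkdtemp(), 'npdata1')
matrix = np.random.randn(100, 10)

np.save(fname + '.npy', matrix)      # binary format
np.savetxt(fname + '.txt', matrix)   # text format, one row per line

from_npy = np.load(fname + '.npy')
from_txt = np.loadtxt(fname + '.txt')

print(from_npy.shape, from_txt.shape)     # (100, 10) (100, 10)
print(np.array_equal(matrix, from_npy))   # True
print(np.allclose(matrix, from_txt))      # True
```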


The [Python Data Analysis Library (pandas)](http://pandas.pydata.org) provides even more efficient tools and data formats to handle data for analysis and their storage to file.

## Data storage with pickle and JSON 

With NumPy we saw the first example of using a binary format to easily store an array.

Previously we had only saved data in text files, by iterating over the elements of lists and dictionaries.

Python provides a built-in [pickle]() library for easy storage of lists and other built-in python objects in binary format. 

In [105]:
import random
import pickle
import os

datum = {'val':-1.1, 'err':0.2}

fname = 'pickle1.data'
with open(fname,'wb') as f:
    pickle.dump(datum,f)

os.listdir()

['npdata1.npy',
 'pickle1.data',
 'lec28.ipynb',
 'examples',
 '.ipynb_checkpoints',
 'data1.txt',
 'data2.txt',
 'npdata1.txt',
 'data.txt',
 'data4.txt']

Reading back is also easy

In [106]:

fname = 'pickle1.data'
with open(fname,'rb') as f:
    indata = pickle.load(f)

print(indata)


{'val': -1.1, 'err': 0.2}
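
Unlike the text files above, pickle keeps Python types such as tuples and sets intact. `pickle.dumps()`/`pickle.loads()` do the same job as `dump()`/`load()` but work on a `bytes` object in memory instead of a file. The dictionary below is only an example. Note that the official documentation warns that unpickling can execute arbitrary code, so only load pickles you trust.

```python
import pickle

# example object with types a plain text file would not preserve
obj = {'point': (1.0, 2.0), 'tags': {'a', 'b'}, 'counts': [1, 2, 3]}

blob = pickle.dumps(obj)     # bytes, no file involved
back = pickle.loads(blob)

print(type(blob))            # <class 'bytes'>
print(back == obj)           # True
print(type(back['point']))   # <class 'tuple'>
```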


## JSON 

However, pickle is specific to Python. A commonly used format for data storage that is cross-platform and cross-language is [JSON (JavaScript Object Notation)](https://www.json.org).

The `json` library in Python converts built-in Python objects (dictionaries, lists, strings, numbers, booleans and `None`) into JSON for storage. Your own custom classes can be stored too, but they need an extra encoding function.

Converting, or encoding, an object into JSON is commonly called **serialization**. Converting from JSON to Python objects is referred to as **deserialization**. For more details and an introduction, see this nice webpage on [working with JSON](https://realpython.com/python-json/).
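
For a custom class, `json.dumps()` accepts a `default=` function that converts unknown objects into something JSON can store. `json.loads()` accepts an `object_hook=` function that sees every decoded JSON object. The `Measurement` class and the `'__measurement__'` marker key below are hypothetical choices for this sketch, not a convention of the library:

```python
import json

class Measurement:
    """Hypothetical example class: a value with its uncertainty."""
    def __init__(self, val, err):
        self.val = val
        self.err = err

def encode(obj):
    # called by json.dumps for objects it cannot serialize by itself
    if isinstance(obj, Measurement):
        return {'__measurement__': True, 'val': obj.val, 'err': obj.err}
    raise TypeError('%r is not JSON serializable' % obj)

def decode(d):
    # called by json.loads for every JSON object (dict) it decodes
    if d.get('__measurement__'):
        return Measurement(d['val'], d['err'])
    return d

m = Measurement(-1.1, 0.2)
s = json.dumps(m, default=encode)
print(s)    # {"__measurement__": true, "val": -1.1, "err": 0.2}

m2 = json.loads(s, object_hook=decode)
print(type(m2).__name__, m2.val, m2.err)   # Measurement -1.1 0.2
```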

Here is an example of dictionary and list stored in JSON files.

There are two commonly used functions:
- `dump()`: converts an object into JSON and writes it to a file object
- `dumps()` (note the extra **s**): converts an object into a JSON string, without interacting with any file

Apart from the file interaction, the two functions behave the same.

In [120]:
import json
import os

datum = {'val':-1.1, 'err':0.2}

x = json.dumps(datum)
print(x)

data = [z for z in range(10)]
y = json.dumps(data)

print(y)

with open('data.json','w') as of:
    json.dump([datum, data], of)
 
os.listdir()

{"val": -1.1, "err": 0.2}
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


['npdata1.npy',
 'pickle1.data',
 'lec28.ipynb',
 'data.json',
 'examples',
 '.ipynb_checkpoints',
 'data1.txt',
 'data2.txt',
 'npdata1.txt',
 'data.txt',
 'data4.txt']

Now we read back or deserialize the data from file

In [122]:
with open('data.json') as infile:
    indata = json.load(infile)
print(indata)
datum = indata[0]
data = indata[1]
print(datum, data)

[{'val': -1.1, 'err': 0.2}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
{'val': -1.1, 'err': 0.2} [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
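
A JSON round trip does not always give back exactly the same Python object. JSON has no tuples, so they come back as lists. JSON object keys are always strings, so integer keys come back as strings. A small example:

```python
import json

original = {'point': (1.0, 2.0), 1: 'one'}

text = json.dumps(original)
print(text)               # {"point": [1.0, 2.0], "1": "one"}

back = json.loads(text)
print(back)               # {'point': [1.0, 2.0], '1': 'one'}
print(back == original)   # False: the tuple became a list, the key 1 became '1'
```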


# Functions with arbitrary number of arguments

As you have seen, the `print()` function can take a variable number of arguments. The same behaviour can easily be defined for any custom function, for both positional and keyword arguments.

## positional arguments

Additional positional arguments are collected by the special `*arg` parameter, which is a tuple of the extra positional arguments

In [146]:
def myfunc(a, *arg):
    print("positional arguments: %s %s"%(a,arg))
    if len(arg):
        for x in arg:
            print('[%s]\t'%x)
        print('\n')

myfunc(1.1)
myfunc('ciao')
myfunc(-0.2, 0.3, 'ciao')
myfunc(-0.2, 0.3, 'ciao', 'hello', -2, 100)

positional arguments: 1.1 ()
positional arguments: ciao ()
positional arguments: -0.2 (0.3, 'ciao')
[0.3]	
[ciao]	


positional arguments: -0.2 (0.3, 'ciao', 'hello', -2, 100)
[0.3]	
[ciao]	
[hello]	
[-2]	
[100]	
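
Because `*arg` is a tuple, and the `%` operator expects a tuple of values, a C-like `printf` takes only a few lines. This is a sketch; the name `printf` and its return value (the number of characters written, as in C) are choices made here, not a standard Python function:

```python
import sys

def printf(fmt, *args):
    """C-like printf sketch: args is a tuple, exactly what the % operator expects."""
    text = fmt % args
    sys.stdout.write(text)
    return len(text)   # like C's printf, return the number of characters written

printf("%d events, mean = %.3f\n", 3, 0.51234)   # 3 events, mean = 0.512
printf("no arguments\n")
```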




## keyword arguments

For optional keyword arguments the `**karg` feature is used (the name is free; `**kwargs` is a common convention)

In [147]:
def myf2(a,mu=0.0, sig=0.1, **karg):
    print("a: %s"%(a))
    print("keyword arguments: %s %s %s"%(mu,sig,karg))
    if len(karg):
        for x in karg:
            print('[%s]\t'%x)
            
        print('\n')
myf2(0.1)
myf2(0.3, sig=0.5)
myf2(0.3, color='red')
myf2(0.3, color='red', mu=0.6)

a: 0.1
keyword arguments: 0.0 0.1 {}
a: 0.3
keyword arguments: 0.0 0.5 {}
a: 0.3
keyword arguments: 0.0 0.1 {'color': 'red'}
[color]	


a: 0.3
keyword arguments: 0.6 0.1 {'color': 'red'}
[color]	




The additional keyword arguments are stored as a dictionary.

In [148]:
def myf3(a,mu=0.0, sig=0.1, **karg):
    print("a: %s"%(a))
    print("keyword arguments: %s %s %s"%(mu,sig,karg))
    if len(karg):
        for x in karg.keys():
            print('[%s = %s]\t'%(x, karg[x]))
        print('\n')
myf3(0.1)
myf3(0.3, color='red', mu=0.6)

a: 0.1
keyword arguments: 0.0 0.1 {}
a: 0.3
keyword arguments: 0.6 0.1 {'color': 'red'}
[color = red]	
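
The same symbols also work at the call site. `*` unpacks a list into positional arguments, and `**` unpacks a dictionary into keyword arguments. In this sketch `describe` is a hypothetical function with the same signature as `myf3`, but it returns its arguments instead of printing them:

```python
def describe(a, mu=0.0, sig=0.1, **karg):
    # hypothetical helper: same signature as myf3, returns instead of printing
    return a, mu, sig, karg

args = [0.3]
options = {'mu': 0.6, 'color': 'red'}

# *args fills the positional parameters, **options the keyword ones;
# 'mu' matches a named parameter, 'color' ends up in karg
print(describe(*args, **options))   # (0.3, 0.6, 0.1, {'color': 'red'})
```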




You can also combine both positional and keyword arguments for the most generic function

In [164]:
def myf4(a,*arg, mu=0.0, sig=0.1, **karg):
    print("function called")
    print("positional a: %s %s"%(a,arg))
    if len(arg):
        for x in arg:
            print('[%s]\t'%x)
        print('\n')
    print("keyword: %s %s %s"%(mu,sig,karg))    
    if len(karg):
        for x in karg.keys():
            print('[%s = %s]\t'%(x, karg[x]))
        print('\n')
    print('\n')
myf4(-0.1)
myf4(-0.1,10.1)
myf4(-0.1,mu=10.1)

function called
positional a: -0.1 ()
keyword: 0.0 0.1 {}


function called
positional a: -0.1 (10.1,)
[10.1]	


keyword: 0.0 0.1 {}


function called
positional a: -0.1 ()
keyword: 10.1 0.1 {}




In [165]:
myf4(0.3,'x','y', 0.9, color='red', mu=0.6, thick=1.1, fill='true')

function called
positional a: 0.3 ('x', 'y', 0.9)
[x]	
[y]	
[0.9]	


keyword: 0.6 0.1 {'color': 'red', 'thick': 1.1, 'fill': 'true'}
[color = red]	
[thick = 1.1]	
[fill = true]	
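
In `myf4`, every parameter after `*arg` is **keyword-only**: a value passed by position can never reach `mu` or `sig`, and lands in `arg` instead. A bare `*` gives the same effect without collecting extra positional arguments. `myf5` and `myg` below are hypothetical functions for this sketch:

```python
def myf5(a, *arg, mu=0.0, sig=0.1, **karg):
    # same signature as myf4, returning the arguments instead of printing them
    return a, arg, mu, sig, karg

print(myf5(-0.1, 10.1))      # (-0.1, (10.1,), 0.0, 0.1, {}): 10.1 went into arg
print(myf5(-0.1, mu=10.1))   # (-0.1, (), 10.1, 0.1, {})

def myg(a, *, mu):           # bare *: mu is keyword-only and required
    return a + mu

try:
    myg(1.0, 2.0)            # mu cannot be given by position
except TypeError as err:
    print('TypeError:', err)
print(myg(1.0, mu=2.0))      # 3.0
```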






## Command line arguments for python programs

The `sys` module gives easy access to the command line arguments as a list of strings, `sys.argv`. An example is in [app1.py](examples/app1.py)

In [None]:
# %load examples/app1.py
import sys, os

print("Running "+__file__)

print("Running "+os.path.basename(__file__))


print("program called with %d arguments"%len(sys.argv))

for a in sys.argv:
    print(a)
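
All entries of `sys.argv` are strings, and `sys.argv[0]` is the script name, so numeric arguments have to be converted explicitly. One possible layout puts the work in a `main(argv)` function, which can also be called with a hand-made list for testing. The script below is hypothetical (it sums its arguments) and is not the content of `app1.py`:

```python
import sys

def main(argv):
    """Hypothetical script body: sum the numbers given on the command line."""
    if len(argv) < 2:
        print("usage: %s x1 [x2 ...]" % argv[0])
        return 1
    values = [float(x) for x in argv[1:]]   # skip argv[0], the script name
    print("sum of %d values: %g" % (len(values), sum(values)))
    return 0

if __name__ == '__main__':
    # a real script would typically pass this status to sys.exit()
    main(sys.argv)
```

Running it as, say, `python sum_args.py 1 2.5` would print `sum of 2 values: 3.5`.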
