# I/O with files 

Today we cover some practical aspects of Python that make it a great language for scripting. While the same is possible with bash and other shells, Python's object-oriented features and high-level semantics make it very easy for a beginner to write their first scripts.

- Input/Output with files
  - with statement
  - parsing lines
  - lines as lists
  - splitting lines into lists
  
- json format for storage
  - example of json usage
  - storing python objects and reading them back
  - storing custom python objects
   [custom serialize](https://realpython.com/python-json/)
  
- functions with variable number of arguments
  - example of printf


# File handling

Basic I/O with files is almost identical to C, at first.

- You need to open a file object on disk before writing information into it
- Opening a file can fail
  - location does not exist
  - no write privilege for the location

By default a file is opened in **read** mode

In [None]:
fname = '/tmp/data.txt'
f = open(fname)

You must specify the **write** mode to store data in a file

In [None]:
fname = '/tmp/data.txt'

f = open(fname,mode='w')

Opening in write mode fails if the user does not have write permission for the location, as is usually the case in `/`.
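
Since opening a file can fail, here is a minimal sketch of catching the two failure cases listed above. The directory name `no_such_dir` and the variable `status` are made up for this illustration; the exception classes are the standard built-in ones.

```python
import os
import tempfile

# Hypothetical target: a file inside a directory that does not exist
base = tempfile.mkdtemp()
missing = os.path.join(base, "no_such_dir", "data.txt")

try:
    f = open(missing, "w")
except FileNotFoundError as err:
    # The parent directory does not exist
    status = "missing directory"
    print("cannot open file:", err)
except PermissionError as err:
    # The user is not allowed to write in that location
    status = "permission denied"
    print("cannot open file:", err)
else:
    # Only reached when open() succeeded
    status = "ok"
    f.close()

print(status)
```

Catching the specific exceptions keeps the script running and lets it report which problem occurred.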

Possible modes are:
```
========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r'       open for reading (default)
'w'       open for writing, truncating the file first
'x'       create a new file and open it for writing
'a'       open for writing, appending to the end of the file if it exists
'b'       binary mode
't'       text mode (default)
'+'       open a disk file for updating (reading and writing)
========= ===============================================================
```
So by default a text file is opened in read mode.
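
The difference between the `'w'`, `'a'` and `'x'` modes can be checked with a short experiment. This is only a sketch: it writes to a temporary directory, and the file name `modes.txt` is an arbitrary choice.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:      # 'w' creates the file (or truncates it)
    f.write("first\n")

with open(path, "a") as f:      # 'a' keeps the content and appends
    f.write("second\n")

with open(path) as f:           # default mode: 'r' and text
    after_append = f.read()

try:
    open(path, "x")             # 'x' refuses to overwrite an existing file
except FileExistsError:
    exclusive_failed = True
else:
    exclusive_failed = False

with open(path, "w") as f:      # 'w' again: the old content is lost
    f.write("third\n")

with open(path) as f:
    after_truncate = f.read()

print(repr(after_append))
print(exclusive_failed)
print(repr(after_truncate))
```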

We now store some values in a file to simulate data.

It is important to close the file to make sure all data are flushed from memory to disk and the file handle is released properly.

In [None]:
import os
import numpy as np

l0 =  set(os.listdir())

print(len(l0))


In [None]:
print("print one file at a time")
for i in l0:
    print(i)

Note how `listdir()` lists both files and directories, including hidden files starting with `.`

In [None]:
fname = 'data.txt'
f = open(fname,'w')

f.write('first file in python\n')
    
f.close()


By default `write()` does not append a newline, so you need to add `\n` to start a new line.

Check the new files:

In [None]:
lnew = set(os.listdir())

new_items = lnew.difference(l0)
print(new_items)

Use the Jupyter shell escape `!` to look at the file

In [None]:
!cat data.txt

## getting rid of `close()`

To make the code less C-like and more Pythonic, we can get rid of `close()` by using the `with` statement

In [None]:
fname = 'data2.txt'
with open(fname,'w') as ofile:
  ofile.write('a new file in python\n')


`with` makes sure that `ofile` is an open file handle inside the `with` block. Once the block ends you can no longer use the handle, because `close()` has been called automatically.
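
A small sketch to verify this behaviour, using the `closed` attribute of file objects. The temporary path and the `failed` flag are assumptions made for the illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(path, "w") as ofile:
    ofile.write("inside the with block\n")
    print(ofile.closed)      # still open inside the block

print(ofile.closed)          # closed automatically when the block ends

try:
    ofile.write("too late\n")
    failed = False
except ValueError as err:
    # Writing to a closed file raises ValueError
    failed = True
    print("write failed:", err)
```

The file is also closed if an exception is raised inside the block, which is the main reason to prefer `with` over an explicit `close()`.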

## Storing lists and multiple values

You can use the C-style output to format and store elements of a list

In [None]:
import random

nevents = 3

fname = 'data1.txt'
with open(fname,'w') as f:
    for i in range(nevents):
        measurements = [ random.random() for j in range(5) ]
        for val in measurements:
            f.write("%.3f\t"%val)
        f.write('\n')


In [None]:
!cat data1.txt

A more Pythonic style is to use the `writelines()` method together with a generator expression

In [None]:
import random

nevents = 3

fname = 'data2.txt'
with open(fname,'w') as f:
    for i in range(nevents):
        measurements = [ random.random() for j in range(5) ]
        f.writelines("%.3f\t"%val for val in measurements)
        f.write('\n')

In [None]:
!cat data2.txt

which can be further reduced

In [None]:
import random

nevents = 10

fname = 'data2.txt'
with open(fname,'w') as f:
    for i in range(nevents):
        f.writelines("%.3f\t"%val for val in [ random.random() for j in range(3) ] )
        f.write('\n')


## Input from file

A file can be read into a single string and then split into lines and columns.

In [None]:
fname = 'data2.txt'

# use with so that the file is closed after reading
with open(fname) as f:
    file = f.read()
print(file)

In [None]:
vals = file.split()
print(type(vals))
print(len(vals))


In [None]:
print(vals[0])
print(vals[-1])

### Reading line by line
You could also read the file as a list of lines, each line terminated by a newline `\n`

In [None]:
fname = 'data2.txt'
lines = [l for l in open(fname)]
print(type(lines))
print(len(lines))

In [None]:
print(lines[-1])

However, note that `\t` and `\n` are part of each string being read in. Fixing this is easy with the `strip()` method

In [None]:
fname = 'data2.txt'
lines = [l.strip() for l in open(fname)]
print(lines)

This has removed the trailing `\t` and `\n`. We now split each line using `\t` as the separator

In [None]:
fname = 'data2.txt'

lines = [l.strip() for l in open(fname)]

data = [ l.split('\t') for l in lines ]


In [None]:
print(data)
print(data[2:])


even more concisely

In [None]:
fname = 'data2.txt'
data = [ l.split('\t') for l in [line.strip() for line in open(fname)] ]
print(data)
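
The lists built above still contain strings. A minimal sketch of converting them to numbers follows; the `sample` string is made up here and stands in for the content of `data2.txt`, so the block runs without the file.

```python
# Two lines in the same tab-separated layout used for data2.txt (made-up values)
sample = "0.123\t0.456\t0.789\t\n0.111\t0.222\t0.333\t\n"

# split() without arguments splits on any whitespace and ignores the trailing tab
rows = [[float(v) for v in line.split()] for line in sample.splitlines()]

print(rows)
print(type(rows[0][0]))
```

For a real file, `sample.splitlines()` would be replaced by iterating over the open file object.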

### exercise
- change the separator and use `,` or `:` to store and read back data files

### Reading data into arrays

Often you will have a file with the following structure
```
#x  #y  #z  #t  #energy
.. ..  ..   ..  ....
.. ..  ..   ..  ....
.. ..  ..   ..  ....
.. ..  ..   ..  ....
```

with data stored per event (line) in columns (variables).

With NumPy we can read the data directly into one array per column, instead of reading in all the data and then manipulating it
- [Input/Output with numpy](https://numpy.org/doc/stable/reference/routines.io.html)

In particular [numpy.loadtxt](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html#numpy.loadtxt) is very handy for quick reading of data from files


In [None]:
!cat data2.txt

In [None]:
fname = 'data2.txt'

data = np.loadtxt(fname)

print(data.shape)

In [None]:
print(data)

In [None]:
print(data[0])

In [None]:
print(data[:,0])

Now use slicing to extract the columns

In [None]:
x = data[:,0]
y = data[:,1]
z = data[:,2]

print(z)

Since this is a typical use case, `loadtxt` has options to do it for you

In [None]:
x,z = np.loadtxt(fname, unpack=True, usecols=(0,2))

print(z)

- `usecols=(a,b,c)` or `usecols=d` reads only the specified columns
- `unpack=True` transposes the data so that each column is returned as a separate array
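
The effect of the two options can be seen on a tiny made-up table. This sketch reads from an in-memory `io.StringIO` object instead of a file on disk (the values are invented), and it also shows that lines starting with `#` are treated as comments by default.

```python
import io
import numpy as np

# Made-up table with a header line; '#' marks a comment for loadtxt
text = """# x y z
1.0 10.0 100.0
2.0 20.0 200.0
3.0 30.0 300.0
"""

# Without options: one 2D array, one row per line of the file
full = np.loadtxt(io.StringIO(text))

# usecols selects columns 0 and 2, unpack returns one array per column
x, z = np.loadtxt(io.StringIO(text), unpack=True, usecols=(0, 2))

print(full.shape)
print(x)
print(z)
```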

In [None]:
y = np.loadtxt(fname, unpack=True, usecols=1)
print(y)

In [None]:
%matplotlib notebook
import matplotlib.pyplot as pl

pl.plot(x,y, '^', label="y")
pl.plot(x,z, '.', label="z")
pl.xlabel('x')
pl.grid()
pl.legend()
pl.show()



# Fitting data

A typical use case for data input is to fit the data to some model and estimate the model parameters.
This can be done easily with the [scipy.optimize.curve_fit](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html) function

In this example we

- generate some data according to a model
- add some random noise 
- write the data to file
- read back the data with numpy
- fit the data with `curve_fit`

### generate data

In [None]:
nevents = 100

x = np.linspace(10, 100, nevents)

def background(x, A=100, tau=10):
    return A*np.exp(-x/tau)

y = background(x)

import matplotlib.pyplot as pl

%matplotlib notebook
pl.plot(x,y, 'b-', label='background')
pl.xlabel('x')
pl.legend()

### add some random noise

In [None]:
pedestal = 3

noise = pedestal * np.random.normal(size=x.size)

y_noise = y+noise

%matplotlib notebook
pl.plot(x,y_noise, 'r-', label='background with noise')
pl.xlabel('x')
pl.legend()

### write data to file

This time we use [numpy.savetxt](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html)

In [None]:
fname = 'fitdata.txt'

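# stack the arrays as columns so that each line of the file holds one event: x y y_noise
np.savetxt(fname, np.column_stack((x, y, y_noise)), delimiter=" ", fmt="%.4f")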

In [None]:
!cat fitdata.txt

Alternatively, you can loop and write to the file as before

In [None]:
fname = 'fitdata2.txt'

with open(fname,'w') as f:
    for i in range(len(x)):
        f.write("%.3f %.3f %.3f\n"%(x[i],y[i],y_noise[i]))


In [None]:
!cat fitdata2.txt

### read data from file

In [None]:
t, z, w = np.loadtxt(fname, unpack=True)
print(t.shape)

In [None]:
%matplotlib notebook
pl.plot(t,z, 'r-', label='z')
pl.plot(t,w, 'b--', label='w')

pl.xlabel('time [s]')
pl.legend()

### Fit data

In [None]:
def fitfunc(x, N, alpha, c):
    return N*np.exp(-alpha*x) + c


In [None]:
from scipy.optimize import curve_fit

pars, pars_cov = curve_fit(fitfunc, t, w)

print(pars)
print(pars_cov)

In [None]:
%matplotlib notebook
pl.plot(t, w, 'b-', label='data points w')
pl.plot(t, fitfunc(t, *pars), 'm-', label='fit N:%.2f alpha: %.3f  c: %.3f'%tuple(pars))
pl.legend()
pl.xlabel('t')
pl.grid()
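
The covariance matrix returned by `curve_fit` can be turned into parameter uncertainties by taking the square root of its diagonal. The sketch below is self-contained and makes several assumptions that are not part of the example above: a fixed random seed, a different `x` range, a noise level of 1, and an explicit starting guess `p0`, which often helps the fit converge.

```python
import numpy as np
from scipy.optimize import curve_fit

def fitfunc(x, N, alpha, c):
    return N * np.exp(-alpha * x) + c

# Synthetic data with known parameters N=100, alpha=0.1, c=0 (assumed values)
rng = np.random.default_rng(0)
t = np.linspace(0, 50, 100)
w = fitfunc(t, 100.0, 0.1, 0.0) + rng.normal(scale=1.0, size=t.size)

# p0 is a rough starting guess for (N, alpha, c)
pars, pars_cov = curve_fit(fitfunc, t, w, p0=(80.0, 0.08, 0.0))

# One standard deviation uncertainty on each parameter
errors = np.sqrt(np.diag(pars_cov))

for name, val, err in zip(("N", "alpha", "c"), pars, errors):
    print("%s = %.3f +/- %.3f" % (name, val, err))
```

The off-diagonal elements of the covariance matrix describe the correlations between the parameters and are ignored in this simple estimate.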

## Storing Lists, Dicts, and Tuples

As you have seen with the example above, there is no automatic writing of objects. So for a dictionary you need to take care of formatting the output file. 

In [None]:
import random

datum = {'val':-1.1, 'err':0.2}

fname = 'data4.txt'

with open(fname,'w') as f:
    f.writelines("%s\t"%v for v in datum.keys())
    f.write('\n')
    for i in range(10):
        datum['val'] = random.uniform(-3.,3.)
        datum['err'] = random.normalvariate(0., 0.2)
        f.writelines("%.3f\t"%val for val in datum.values() )
        f.write('\n')


### Exercise
- use a dictionary to store data for 3 keys of different type
- store 100 dictionary instances in file
- read back and populate dictionary objects from file

## Storing NumPy objects

NumPy provides built-in functions to easily store and read ndarrays in binary and text format without iterating over each element

In [None]:
import numpy as np
import os

matrix = np.random.randn(100,10)

fname = 'npdata1'
np.save(fname+'.npy', matrix)
np.savetxt(fname+'.txt', matrix)


Reading the file is also simple with `load()`

In [None]:
vals = np.load(fname+'.npy')

print(vals.shape)
print(vals[:1,])
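
A quick round-trip check of both formats, as a sketch. The file names and the small array shape are arbitrary choices, and the files go to a temporary directory.

```python
import os
import tempfile
import numpy as np

matrix = np.random.randn(4, 3)
base = os.path.join(tempfile.mkdtemp(), "npdata_check")

np.save(base + ".npy", matrix)        # binary format
np.savetxt(base + ".txt", matrix)     # text format

from_binary = np.load(base + ".npy")
from_text = np.loadtxt(base + ".txt")

# The binary file stores the values bit for bit
print(np.array_equal(matrix, from_binary))
# The text file goes through a decimal representation, so compare with a tolerance
print(np.allclose(matrix, from_text))
print(os.path.getsize(base + ".npy"), os.path.getsize(base + ".txt"))
```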

The [Python Data Analysis Library (pandas)](http://pandas.pydata.org) provides even more efficient tools and data formats to handle data for analysis and their storage to file.

## Data storage with pickle and JSON 

With NumPy we saw the first example of using a binary format to easily store an array.

Previously we had only saved data in text files by iterating over the elements of lists and dictionaries.

Python provides the built-in `pickle` library for easy storage of lists and other built-in Python objects in binary format.

In [None]:
import random
import pickle
import os

datum = {'val':-1.1, 'err':0.2}

fname = 'pickle1.data'
with open(fname,'wb') as f:
    pickle.dump(datum,f)

os.listdir()

Reading back is also easy

In [None]:

fname = 'pickle1.data'
with open(fname,'rb') as f:
    indata = pickle.load(f)

print(indata)
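
The same round trip can be done in memory with `pickle.dumps()` and `pickle.loads()`, which is handy for checking what pickle preserves. The `record` dictionary below is made up for the illustration.

```python
import pickle

# Made-up record mixing several built-in types
record = {"val": -1.1, "err": 0.2, "tags": ("raw", "run1"), "samples": [1, 2, 3]}

blob = pickle.dumps(record)       # bytes, the same data pickle.dump() writes to a file
restored = pickle.loads(blob)

print(type(blob))
print(restored == record)
print(type(restored["tags"]))     # the tuple comes back as a tuple
```

Pickle files are specific to Python and should only be loaded from sources you trust, since loading can execute arbitrary code.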


## JSON 

However, a commonly used format for data storage that is cross-platform and cross-language is [JSON (JavaScript Object Notation)](https://www.json.org).

The `json` library in Python allows you to convert Python objects (including your custom classes) into JSON for storage.

Converting or encoding an object into JSON is commonly called **serialization**. Converting from JSON to Python objects is referred to as **deserialization**. For more details and an introduction see this nice webpage on [working with JSON](https://realpython.com/python-json/).

Here is an example of dictionary and list stored in JSON files.

There are two functions commonly used
- `dump()`: converts an object into JSON and writes it to a file
- `dumps()` (note the extra **s**): converts an object into a JSON string, without any file interaction

The two functions are identical except for the file interaction.
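
A minimal sketch to compare the two functions. An in-memory `io.StringIO` object stands in for a file on disk here, which is an assumption made only to keep the example self-contained.

```python
import io
import json

datum = {"val": -1.1, "err": 0.2}

# dumps() returns the JSON text as a Python string
text = json.dumps(datum)

# dump() writes the same text to a file-like object
buffer = io.StringIO()
json.dump(datum, buffer)

print(text)
print(buffer.getvalue() == text)
```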

In [None]:
import json
import os

datum = {'val':-1.1, 'err':0.2}

x = json.dumps(datum)
print(x)

data = [z for z in range(10)]
y = json.dumps(data)

print(y)

with open('data.json','w') as of:
    json.dump([datum, data], of)
 
os.listdir()

Now we read back or deserialize the data from file

In [None]:
with open('data.json') as infile:
    indata = json.load(infile)
print(indata)
datum = indata[0]
data = indata[1]
print(datum, data)
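
Custom classes need a little help, because `json` only knows the built-in types. One possible approach, sketched here, uses the `default` argument of `dumps()` and the `object_hook` argument of `loads()`. The class `Measurement`, the two helper functions and the `__measurement__` marker key are all hypothetical names chosen for this illustration.

```python
import json

class Measurement:
    """Hypothetical custom class used only for this illustration."""
    def __init__(self, val, err):
        self.val = val
        self.err = err

def encode_measurement(obj):
    # Called by json for objects it cannot serialize by itself
    if isinstance(obj, Measurement):
        return {"__measurement__": True, "val": obj.val, "err": obj.err}
    raise TypeError("cannot serialize %r" % obj)

def decode_measurement(dct):
    # Called for every JSON object (dict) found while decoding
    if dct.get("__measurement__"):
        return Measurement(dct["val"], dct["err"])
    return dct

points = [Measurement(-1.1, 0.2), Measurement(0.5, 0.1)]

text = json.dumps(points, default=encode_measurement)
restored = json.loads(text, object_hook=decode_measurement)

print(text)
print(restored[0].val, restored[0].err)
```

The marker key lets the decoder distinguish serialized `Measurement` objects from ordinary dictionaries.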

# Functions with arbitrary number of arguments

As you have seen, the `print()` function can take a variable number of arguments. The same behaviour can easily be defined for any custom function, for both positional and keyword arguments

## positional arguments

Additional arguments are collected by the special `*arg` parameter, which is a tuple of the additional positional arguments

In [None]:
def myfunc(a, *arg):
    print("positional arguments: %s %s"%(a,arg))
    if len(arg):
        for x in arg:
            print('[%s]\t'%x)
        print('\n')


In [None]:
myfunc(1.1)


In [None]:
myfunc('ciao')


In [None]:
myfunc(-0.2, 0.3, 'ciao')

In [None]:
myfunc(-0.2, 0.3, 'ciao', 'hello', -2, 100)
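
As a further illustration, a C-like `printf` can be written in a few lines with `*args`. The helper below is a hypothetical sketch, not a standard function. It also shows the reverse operation: unpacking a list into positional arguments with `*` at the call site.

```python
def printf(fmt, *args):
    """Hypothetical printf-like helper: format fmt with the extra arguments."""
    text = fmt % args        # args is a tuple, which is what % expects
    print(text, end="")      # like C printf, no newline is added automatically
    return text

line = printf("%d events, mean %.2f\n", 3, 0.5)

# A list can be expanded into positional arguments with *
values = [1.5, 2.5, 3.5]
row = printf("%.1f\t%.1f\t%.1f\n", *values)
```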

## keyword arguments

For optional keyword arguments the `**karg` feature is used

In [None]:
def myf2(a,mu=0.0, sig=0.1, **karg):
    print("a: %s"%(a))
    print("keyword arguments: %s %s %s"%(mu,sig,karg))
    if len(karg):
        for x in karg:
            print('[%s]\t'%x)
            
        print('\n')


In [None]:
myf2(0.1)


In [None]:
myf2(0.3, sig=0.5)


In [None]:
myf2(0.3, color='red')


In [None]:
myf2(0.3, color='red', mu=0.6)

The additional keyword arguments are stored as a dictionary.

In [None]:
def myf3(a,mu=0.0, sig=0.1, **karg):
    print("a: %s"%(a))
    print("keyword arguments: %s %s %s"%(mu,sig,karg))
    if len(karg):
        for x in karg.keys():
            print('[%s = %s]\t'%(x, karg[x]))
        print('\n')
myf3(0.1)

In [None]:
myf3(0.3, color='red', mu=0.6)

You can also combine both positional and keyword arguments for the most generic function

In [None]:
def myf4(a,*arg, mu=0.0, sig=0.1, **karg):
    print("myf4 called")
    print("positional:  a: %s    optional: %s"%(a,arg))
    if len(arg):
        for x in arg:
            print('[%s]\t'%x)
        print('\n')
    print("keyword: %s %s %s"%(mu,sig,karg))    
    if len(karg):
        for x in karg.keys():
            print('[%s = %s]\t'%(x, karg[x]))
        print('\n')
    print('\n')
myf4(-0.1)


In [None]:
myf4(-0.1,10.1)


In [None]:
myf4(-0.1,mu=10.1)

In [None]:
myf4(0.3,'x','y', 0.9, color='red', mu=0.6, thick=1.1, fill='true')
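
The same `*` and `**` syntax also works at the call site, to expand a list into positional arguments and a dictionary into keyword arguments. In this sketch the function `describe` and the variables `positional` and `options` are made-up names; the function returns what it received so that the result can be inspected.

```python
def describe(a, *args, mu=0.0, sig=0.1, **kwargs):
    """Return the arguments as received, to show where each one ends up."""
    return {"a": a, "args": args, "mu": mu, "sig": sig, "kwargs": kwargs}

positional = ['x', 'y']
options = {'color': 'red', 'mu': 0.6}

# Equivalent to describe(0.3, 'x', 'y', color='red', mu=0.6)
result = describe(0.3, *positional, **options)

print(result)
```

Keys that match a named parameter (`mu` here) are bound to it, while the remaining ones end up in the `kwargs` dictionary.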

## Command line arguments for python programs

The `sys` module gives easy access to the command line arguments as a list of strings, with the program name as first element. An example is in [app1.py](../examples/python/app1.py)

In [None]:
# %load examples/app1.py
import sys, os

print("Running "+__file__)

print("Running "+os.path.basename(__file__))


print("program called with %d arguments"%len(sys.argv))

for a in sys.argv:
    print(a)
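
All elements of `sys.argv` are strings, so numeric arguments must be converted explicitly. A minimal sketch with a hypothetical helper `parse_args`, which takes the argument list as a parameter so that it can be tried out with a made-up list instead of a real command line:

```python
import sys

def parse_args(argv):
    """Split an argv-style list into the program name and a list of floats."""
    program, *rest = argv
    values = [float(a) for a in rest]
    return program, values

# In a script this would be called as parse_args(sys.argv);
# here a made-up argument list is used instead
program, values = parse_args(["app1.py", "1.5", "2", "-0.25"])

print(program)
print(values)
print(sum(values))
```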
