# From HPL, Ch. 4: File Input and Output (File I/O)

## Reading data from files

Scientific data are often stored in files. To compute with the data,
we must first read them into objects in a program.



**Example on a data file.**

        21.8
        18.1
        19
        23
        26
        17.8


One number on each line. How can we read these numbers?



<!-- Have to make a data.txt file with the right content -->

In [14]:
f = open('data.txt', 'w')
f.write("""21.8
18.1
19
23
26
17.8
""")
f.close()

## Reading a file line by line

Basic file reading:

```Python
infile = open('data.txt', 'r')    # open file
for line in infile:
    print(line, end='')           # do something with line
infile.close()                    # close file
```

Compute the mean value of the numbers in the file:

In [15]:
infile = open('data.txt', 'r')    # open file
mean = 0
n = 0
for line in infile:
    n += 1
    number = float(line)          # line is string
    mean = mean + number
    print('number={:g}'.format(number))
infile.close()                    # close file

mean = mean/n
print("Mean: %.3f" % mean)

number=21.8
number=18.1
number=19
number=23
number=26
number=17.8
Mean: 20.950


## Alternative ways to read a file

Read all lines at once into a list of strings (lines):

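```Python
infile = open('data.txt', 'r')
lines = infile.readlines()        # list of strings, one per line
infile.close()                    # lines stay in memory after closing
for line in lines:
    print(line, end='')           # process line
```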

The modern `with` statement:

```Python
with open('data.txt', 'r') as infile:
    for line in infile:
        print(line, end='')       # process line
# infile is closed automatically here
```
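As a minimal sketch of how the `with` form could be applied to the mean computation above: the block first recreates the six-number data file under the hypothetical name `demo_data.txt` so that it runs on its own. The file is closed automatically when the `with` block ends, even if `float` raises an exception.

```python
# Recreate the example data under a hypothetical file name
with open('demo_data.txt', 'w') as outfile:
    outfile.write('21.8\n18.1\n19\n23\n26\n17.8\n')

total = 0.0
n = 0
with open('demo_data.txt', 'r') as infile:
    for line in infile:
        total += float(line)   # float ignores the trailing newline
        n += 1

mean = total/n
print('Mean: {:.3f}'.format(mean))
print(infile.closed)           # the file was closed when the with block ended
```

Keeping the running sum in `total` and dividing afterwards avoids reusing one name for both the sum and the mean.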

The old-fashioned `while` construction:

```Python
infile = open('data.txt', 'r')
while True:
    line = infile.readline()
    if not line:                  # empty string: end of file
        break
    print(line, end='')           # process line
infile.close()
```

Reading the whole file into a string:

```Python
infile = open('data.txt', 'r')
text = infile.read()              # the whole file as one string
infile.close()
# process the string text
```

## Demo of file reading

**File:**

Line 1.
Line 2.
Line 3.
Line 4.



<!-- Have to make a tmp.txt file with the right content -->

In [16]:
f = open('tmp.txt', 'w')
f.write("""Line 1.
Line 2.
Line 3.
Line 4.
""")
f.close()

**Session:**

In [17]:
infile = open('tmp.txt', 'r')
lines = infile.readlines()  # read all lines
lines

['Line 1.\n', 'Line 2.\n', 'Line 3.\n', 'Line 4.\n']

In [18]:
infile.readline()  # no more to read

''

In [19]:
infile = open('tmp.txt', 'r')
infile.readline()    # read one line

'Line 1.\n'

In [20]:
infile.readline()    # read next line

'Line 2.\n'

In [21]:
for line in infile:  # read the next lines to the end
    print(line)

Line 3.

Line 4.
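The blank lines in the output above appear because each `line` still ends with `'\n'` and `print` adds a newline of its own. A minimal sketch of two common ways to avoid this, using the hypothetical file name `demo_lines.txt` (written first so the block is self-contained):

```python
with open('demo_lines.txt', 'w') as outfile:
    outfile.write('Line 1.\nLine 2.\nLine 3.\nLine 4.\n')

stripped = []
with open('demo_lines.txt', 'r') as infile:
    for line in infile:
        print(line, end='')                  # option 1: print adds no extra newline
        stripped.append(line.rstrip('\n'))   # option 2: remove the newline from the string

print(stripped)
```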



## More demo of file reading and string splitting

In [22]:
infile = open('tmp.txt', 'r')
filestr = infile.read()
filestr

'Line 1.\nLine 2.\nLine 3.\nLine 4.\n'

In [23]:
filestr.split()  # split out all words

['Line', '1.', 'Line', '2.', 'Line', '3.', 'Line', '4.']

In [24]:
line = 'Line 3.\n'
line.split()

['Line', '3.']

In [25]:
line.split('e')

['Lin', ' 3.\n']
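Combining `read` and `split` gives a compact way to load a file that contains only numbers separated by whitespace. A minimal sketch, assuming a file with the six numbers from the first example; the name `demo_numbers.txt` is hypothetical and the file is written first so the block runs on its own:

```python
with open('demo_numbers.txt', 'w') as outfile:
    outfile.write('21.8\n18.1\n19\n23\n26\n17.8\n')

with open('demo_numbers.txt', 'r') as infile:
    filestr = infile.read()                    # whole file as one string

# split() without arguments splits on any whitespace, including newlines
numbers = [float(w) for w in filestr.split()]
print(numbers)
print(sum(numbers)/len(numbers))
```

This reads the whole file into memory at once, which is fine for small files; for very large files the line-by-line loop is the safer choice.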

## Most data files contain text mixed with numbers

**File with data about rainfall:**

        Average rainfall (in mm) in Rome: 1188 months between 1782 and 1970
        Jan  81.2
        Feb  63.2
        Mar  70.3
        Apr  55.7
        May  53.0
        Jun  36.4
        Jul  17.5
        Aug  27.5
        Sep  60.9
        Oct  117.7
        Nov  111.0
        Dec  97.9
        Year 792.9


How do we read such a file?



<!-- Have to make a rainfall.dat file with the right content -->

In [30]:
f = open('rainfall.dat', 'w')
f.write("""Average rainfall (in mm) in Rome: 1188 months between 1782 and 1970
Jan  81.2
Feb  63.2
Mar  70.3
Apr  55.7
May  53.0
Jun  36.4
Jul  17.5
Aug  27.5
Sep  60.9
Oct  117.7
Nov  111.0
Dec  97.9
Year 792.9
""")
f.close()

## Reading a mixture of text and numbers

The key idea is to split each line into words, after skipping the
header line:

```Python
months = []
values = []
infile = open('rainfall.dat', 'r')
infile.readline()             # skip the header line
for line in infile:
    words = line.split()      # split into words
    if words[0] != 'Year':
        months.append(words[0])
        values.append(float(words[1]))
infile.close()
```

A line can also be split with respect to any separator string `s`: `line.split(s)`.

In [31]:
line = 'Oct  117.7'
words = line.split()
words

['Oct', '117.7']

In [32]:
type(words[1])   # string, not a number!

str
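Because `split` returns strings, each number must be converted explicitly before computing with it. The sketch below shows the conversion and also illustrates splitting with respect to a separator string; the comma-separated line is a hypothetical example and not part of the rainfall file.

```python
line = 'Oct  117.7'
words = line.split()
value = float(words[1])                # convert the string '117.7' to a float
print(type(value), value + 1)          # arithmetic now works

# Hypothetical comma-separated line: split with respect to ','
csv_line = 'Oct,117.7,mm\n'
fields = csv_line.strip().split(',')   # strip removes the trailing newline
print(fields)
```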

## Complete program for reading rainfall data

In [33]:
def extract_data(filename):
    infile = open(filename, 'r')
    infile.readline() # skip the first line
    months = []
    rainfall = []
    for line in infile:
        words = line.split()
        # words[0]: month, words[1]: rainfall
        months.append(words[0])
        rainfall.append(float(words[1]))
    infile.close()
    months = months[:-1]      # Drop the "Year" entry
    annual_avg = rainfall[-1] # Store the annual average
    rainfall = rainfall[:-1]  # Redefine to contain monthly data
    return months, rainfall, annual_avg

months, values, avg = extract_data('rainfall.dat')
print('The average rainfall for the months:')
for month, value in zip(months, values):
    print(month, value)
print('The average rainfall for the year:', avg)

The average rainfall for the months:
Jan 81.2
Feb 63.2
Mar 70.3
Apr 55.7
May 53.0
Jun 36.4
Jul 17.5
Aug 27.5
Sep 60.9
Oct 117.7
Nov 111.0
Dec 97.9
The average rainfall for the year: 792.9
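Parallel lists can be awkward when a value has to be looked up by month name. As a variation, and only a sketch rather than the program used above, the hypothetical function `extract_data_dict` stores the monthly values in a dictionary and uses a `with` statement so the file is closed automatically. The block writes the rainfall data to the hypothetical file `demo_rainfall.dat` so that it runs on its own:

```python
def extract_data_dict(filename):
    """Return ({month: rainfall}, annual value) read from filename."""
    monthly = {}
    annual = None
    with open(filename, 'r') as infile:
        infile.readline()                 # skip the header line
        for line in infile:
            words = line.split()
            if not words:                 # tolerate blank lines
                continue
            if words[0] == 'Year':
                annual = float(words[1])
            else:
                monthly[words[0]] = float(words[1])
    return monthly, annual

# Same content as the rainfall file shown earlier
with open('demo_rainfall.dat', 'w') as outfile:
    outfile.write("""Average rainfall (in mm) in Rome: 1188 months between 1782 and 1970
Jan  81.2
Feb  63.2
Mar  70.3
Apr  55.7
May  53.0
Jun  36.4
Jul  17.5
Aug  27.5
Sep  60.9
Oct  117.7
Nov  111.0
Dec  97.9
Year 792.9
""")

monthly, annual = extract_data_dict('demo_rainfall.dat')
wettest = max(monthly, key=monthly.get)   # month with the largest value
driest = min(monthly, key=monthly.get)    # month with the smallest value
print(wettest, monthly[wettest])
print(driest, monthly[driest])
print('Year:', annual)
```

Testing for `'Year'` inside the loop removes the need to slice the last entry off afterwards.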


## Writing data to file

Basic pattern:

```Python
outfile = open(filename, 'w')        # 'w' for writing (overwrites an existing file)

for data in somelist:
    outfile.write(str(data) + '\n')  # write takes a string; add '\n' yourself

outfile.close()
```

Text can be *appended* to the end of an existing file with `open(filename, 'a')`.
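The difference between the `'w'` and `'a'` modes can be sketched as follows, assuming a hypothetical file name `demo_log.txt`: opening with `'w'` starts from an empty file, while `'a'` keeps the existing content and adds new text at the end.

```python
with open('demo_log.txt', 'w') as outfile:   # 'w' creates or empties the file
    outfile.write('first line\n')

with open('demo_log.txt', 'a') as outfile:   # 'a' adds to the end
    outfile.write('second line\n')

with open('demo_log.txt', 'r') as infile:
    content = infile.read()
print(content, end='')
```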



## Example: Writing a table to file

**Problem:**

We have a nested list (rows and columns):

In [34]:
data = \
[[ 0.75,        0.29619813, -0.29619813, -0.75      ],
 [ 0.29619813,  0.11697778, -0.11697778, -0.29619813],
 [-0.29619813, -0.11697778,  0.11697778,  0.29619813],
 [-0.75,       -0.29619813,  0.29619813,  0.75      ]]

In [35]:
data

[[0.75, 0.29619813, -0.29619813, -0.75],
 [0.29619813, 0.11697778, -0.11697778, -0.29619813],
 [-0.29619813, -0.11697778, 0.11697778, 0.29619813],
 [-0.75, -0.29619813, 0.29619813, 0.75]]

Write these data to file in tabular form.



**Solution:**

In [36]:
outfile = open('tmp_table.dat', 'w')
for row in data:
    for column in row:
        outfile.write('{:14.8f}'.format(column))
    outfile.write('\n')
outfile.close()

**Resulting file:**

            0.75000000    0.29619813   -0.29619813   -0.75000000
            0.29619813    0.11697778   -0.11697778   -0.29619813
           -0.29619813   -0.11697778    0.11697778    0.29619813
           -0.75000000   -0.29619813    0.29619813    0.75000000
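A file written this way can be read back with the splitting technique from the earlier sections, because the columns are separated by whitespace. The following is a minimal sketch of such a round trip; the file name `demo_table.dat` is hypothetical, and the values are recovered exactly here only because they have at most eight decimals, matching the `.8f` format.

```python
data = [[ 0.75,        0.29619813, -0.29619813, -0.75      ],
        [ 0.29619813,  0.11697778, -0.11697778, -0.29619813],
        [-0.29619813, -0.11697778,  0.11697778,  0.29619813],
        [-0.75,       -0.29619813,  0.29619813,  0.75      ]]

# Write the table: one row per line, each number in a field of width 14
with open('demo_table.dat', 'w') as outfile:
    for row in data:
        for column in row:
            outfile.write('{:14.8f}'.format(column))
        outfile.write('\n')

# Read it back into a nested list of floats
table = []
with open('demo_table.dat', 'r') as infile:
    for line in infile:
        table.append([float(w) for w in line.split()])

print(table == data)
```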


## What is a file?

  * A file is a sequence of characters

  * For simple text files, each character is one byte (=8 bits, a bit is 0 or 1), which gives $2^8=256$ different characters

  * (Text files in, e.g., Chinese and Japanese need several bytes for each character)

  * Save the text "ABCD" to file in Gedit/Emacs and OpenOffice/Word and examine the file!

  * File reading in Python is about either reading all characters (`read`) or
    reading line by line (`readline`, `readlines`, `for line in fileobj`)
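The bytes behind a text file can also be inspected from Python by opening the file in binary mode (`'rb'`). A minimal sketch, assuming the hypothetical file name `demo_abcd.txt` and the UTF-8 encoding (an assumption made here; the default encoding depends on the platform):

```python
# Write the four characters without a trailing newline
with open('demo_abcd.txt', 'w', encoding='utf-8') as outfile:
    outfile.write('ABCD')

with open('demo_abcd.txt', 'rb') as infile:   # 'rb': read raw bytes
    raw = infile.read()

print(raw)          # the bytes stored in the file
print(list(raw))    # each byte as an integer between 0 and 255
print(len(raw))     # one byte per character for plain ASCII text

# Characters outside ASCII need more than one byte in UTF-8,
# here the Chinese character with code point U+4E2D
n_bytes = len('\u4e2d'.encode('utf-8'))
print(n_bytes)
```

A word processor file with the same visible text contains many more bytes, since formatting information is stored along with the characters.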


## Summary of file reading and writing

```Python
        infile  = open(filename, 'r')   # read
        outfile = open(filename, 'w')   # write
        outfile = open(filename, 'a')   # append
        
        # Reading
        line    = infile.readline()   # read the next line
        filestr = infile.read()       # read rest of file into string
        lines   = infile.readlines()  # read rest of file into list
        for line in infile:           # read rest of file line by line
        
        # Writing
        outfile.write(s)   # add \n if you need it
        
        # Closing
        infile.close()
        outfile.close()
```