# File I/O

<div align='center'><img src='https://raw.githubusercontent.com/eitanlees/ISC-3313/master/images/scribe.gif' width='35%'/></div>

In today's lecture we will learn how to read and write files in Python.

There is a wide variety of file types out there, but we will focus on two today:

- text data
- numerical data

We will use Python's built-in features to explore text data and numpy for numerical data.

## Reading text files

The basic syntax for reading a text file is as follows:

```python
file_object = open(filename, mode)
```

Here `filename` is a string containing the path to the file you want to open, and `mode` tells the `open` function what you want to do with the file.

There are a few things one might want to do with a file:

- 'w' – **Write Mode**: This mode is used when the file needs to be created or completely rewritten. Keep in mind that this erases the existing file to create a new one. The file pointer is placed at the beginning of the file.

- 'r' – **Read Mode**: This mode is used when the information in the file is only meant to be read and not changed in any way. The file pointer is placed at the beginning of the file.

- 'a' – **Append Mode**: This mode adds information to the end of the file automatically. The file pointer is placed at the end of the file.

There are also some hybrid modes:

- 'r+' – **Read/Write Mode**: This is used when you will be making changes to the file and reading information from it. The file pointer is placed at the beginning of the file.

- 'a+' – **Append and Read Mode**: The file is opened so that data can be added to the end of it, and your program can read from it as well. The file pointer is placed at the end of the file.
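
As a rough illustration of the three basic modes, the sketch below writes, appends to, and reads back a small file. The file name `modes_demo.txt` and its contents are hypothetical, and the file lives in a temporary directory so nothing in the lecture folder is touched.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not part of the lecture data).
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

f = open(path, 'w')        # 'w' creates the file, or erases it if it already exists
f.write('first line\n')
f.close()

f = open(path, 'a')        # 'a' keeps the existing contents and writes at the end
f.write('second line\n')
f.close()

f = open(path, 'r')        # 'r' only allows reading
contents = f.read()
f.close()
print(contents)

f = open(path, 'w')        # opening with 'w' again discards both lines
f.write('replaced\n')
f.close()

f = open(path, 'r')
after_rewrite = f.read()
f.close()
print(after_rewrite)
```

The second read shows only the replacement text, which is why `'w'` should be used with care on files you want to keep.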

Let's open a file and see what methods are available for the `file_object`.

In [4]:
f = open('data/ArthurAndDennis.txt', 'r')
f

<_io.TextIOWrapper name='data/ArthurAndDennis.txt' mode='r' encoding='UTF-8'>

Now we can use the file object

In [6]:
script = f.readlines()
script

["Dennis: I'm 37! I'm not old!\n",
 'Arthur: Well, I can\'t just call you "man".\n',
 'Dennis: You could say "Dennis".\n',
 "Arthur: I didn't know you were called Dennis.\n",
 "Dennis: Well you didn't bother to find out, did you?\n",
 'Arthur: I did say I\'m sorry about the "old woman", but from behind you looked...\n']

Once you are done working with a file you should always close it using the `close` method:

In [7]:
f.close()

After a file is closed it can no longer be read from or written to:

In [8]:
f.read()

ValueError: I/O operation on closed file.

Sometimes people forget to close files, which can cause trouble. The best way to avoid this is to use the `with` keyword, which closes the file automatically when the indented block ends.

In [10]:
with open('data/ArthurAndDennis.txt', 'r') as text_file:
    for line in text_file:
        print(line)

Arthur: Old woman!

Dennis: MAN!

Arthur: Man, sorry. What knight lives in that castle over there?

Dennis: I'm 37.

Arthur: What?

Dennis: I'm 37! I'm not old!

Arthur: Well, I can't just call you "man".

Dennis: You could say "Dennis".

Arthur: I didn't know you were called Dennis.

Dennis: Well you didn't bother to find out, did you?

Arthur: I did say I'm sorry about the "old woman", but from behind you looked...

Dennis: What I object to is you automatically treatin' me like an inferior.

Arthur: Well, I am king.

Dennis: Oh, king, eh - very nice. And how'd you get that, then? By exploiting the

workers! By hanging on to outdated imperialist dogma which perpetuates the

economic and social differences in our society. If there's ever going to be any

progress...

Dennis' Mother: Dennis, Dennis! There's some lovely filth down here. Oh, how'd

you do?

Arthur: How'd you do good lady? I am Arthur, king of the Britons. Whose castle

is that?

Dennis' Mother: King of the who?

Arthur: T
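
The blank lines between the printed lines above appear because each line read from the file still ends with a newline character, and `print` adds a second one. A minimal sketch of one way around this is to strip the trailing newline before printing. The two-line file used here is a hypothetical stand-in written to a temporary directory. The last line also checks that the file is closed once the `with` block ends.

```python
import os
import tempfile

# Hypothetical two-line file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'two_lines.txt')
with open(path, 'w') as f:
    f.write('Arthur: Old woman!\nDennis: MAN!\n')

cleaned = []
with open(path, 'r') as text_file:
    for line in text_file:
        # rstrip('\n') drops the trailing newline so print does not add a second one
        stripped = line.rstrip('\n')
        cleaned.append(stripped)
        print(stripped)

# Leaving the with block closes the file automatically.
print(text_file.closed)
```

An alternative with the same effect is `print(line, end='')`, which tells `print` not to add its own newline.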

## Exercise

Read the file `data/gettysburg.txt`. Print out each line along with the line number. (hint: use `enumerate`). For example: 

    0: Four score and seven years ago our fathers brought forth on this

    1: continent a new nation, conceived in liberty and dedicated to the

    2: proposition that all men are created equal. Now we are engaged in
    
    etc ...

## Exercise

Read the file `data/sonnets.txt`, which contains all of the sonnets by William Shakespeare.

Count the number of times the word "love" or "Love" appears. 

## Writing to a file

So far we have only read the contents of a file, but we can also write to a file.

When creating a file for the first time, you should use either the `a+` or the `w+` mode.

Often it's preferable to use the `a+` mode because new data is always added to the end of the file, so any existing contents are preserved.

Using `w+` will clear out any existing data in the file and give you a "blank slate" to start from.
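
The difference between the two modes can be sketched as follows, using a hypothetical file `log_demo.txt` in a temporary directory. Because both modes leave the file pointer at the end after writing, the example calls `seek(0)` to move back to the beginning before reading.

```python
import os
import tempfile

# Hypothetical scratch file; it does not exist before the first open call.
path = os.path.join(tempfile.mkdtemp(), 'log_demo.txt')

with open(path, 'a+') as f:     # 'a+' creates the file because it is missing
    f.write('first entry\n')

with open(path, 'a+') as f:     # opening again keeps the first entry
    f.write('second entry\n')
    f.seek(0)                   # move the pointer to the start before reading
    appended = f.read()

with open(path, 'w+') as f:     # 'w+' wipes the file before writing
    f.write('fresh start\n')
    f.seek(0)
    overwritten = f.read()

print(appended)
print(overwritten)
```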

Let's create `grade_book.txt` based on the example from last class

In [11]:
students = ["Emily", "Jacob", "Emma", "Madison", "Matthew", "Hailey", "Nicholas", "Sarah", "Joshua"]
grades = [81, 81, 92, 79, 79, 82, 99, 94, 87]

with open('grade_book.txt', 'w+') as file:
    for index, (person, grade) in enumerate(zip(students, grades)):
        file.write(f'{index + 1}. {person}: {grade} \n')


In [12]:
ls

17-Inclass-Exercises.ipynb  data.zip
17-file_io.ipynb            grade_book.txt
Untitled.ipynb              hello_with_input.py
commandline_arguments.py    images/
count_down.py               magic_number.py
data/                       my_data.txt


In [13]:
cat grade_book.txt

1. Emily: 81 
2. Jacob: 81 
3. Emma: 92 
4. Madison: 79 
5. Matthew: 79 
6. Hailey: 82 
7. Nicholas: 99 
8. Sarah: 94 
9. Joshua: 87 


The `\n` acts as the new line indicator, moving subsequent writes to the next line.

If you want to write something that isn't a string to a text file, such as a series of numbers, you have to convert ("cast") it to a string first, for example with `str()` or an f-string.
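
A minimal sketch of this conversion, assuming a hypothetical list of numbers and a scratch file in a temporary directory:

```python
import os
import tempfile

values = [3, 1.5, 42]   # hypothetical numbers to store
path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')

with open(path, 'w') as f:
    for value in values:
        # f.write(value) would raise a TypeError, so convert to a string first
        f.write(str(value) + '\n')
    # an f-string performs the same conversion inside the braces
    f.write(f'total: {sum(values)}\n')

with open(path, 'r') as f:
    written = f.read()
print(written)
```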

## Exercise

Let's revisit an example from before. 

A leap year is exactly divisible by 4 except for century years (years ending with 00). The century year is a leap year only if it is perfectly divisible by 400. For example,

    2017 is not a leap year
    1900 is not a leap year
    2012 is a leap year
    2000 is a leap year
    
This time loop over all years in `year_list` and instead of printing, write out the results to a file `leap_year.txt`

In [51]:
year_list = [2017, 1900, 2012, 2000, 1111, 1554, 1315, 1342, 1520, 1988, 1834, 1319, 1739, 1846, 1282,
       1891, 1260, 1488, 1911, 2009, 1772, 1048, 1428, 1150]

## Numerical data

Often we are only interested in numerical data, and luckily numpy has some convenient functions to help with file I/O:

- `np.loadtxt()`: Load data from a text file.
- `np.savetxt()`: Save an array to a text file.

In [14]:
import numpy as np
x = np.arange(1, 13).reshape(4, 3)
print(x)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [15]:
np.savetxt('my_data.txt', x)

In [16]:
cat my_data.txt

1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00
7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00
1.000000000000000000e+01 1.100000000000000000e+01 1.200000000000000000e+01


We can also pass `np.savetxt` optional parameters for formatting as well as delimiters.

In [17]:
np.savetxt('my_data.csv', x, fmt='%d', delimiter=', ')

In [18]:
cat my_data.csv

1, 2, 3
4, 5, 6
7, 8, 9
10, 11, 12


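`np.loadtxt` performs the opposite action: it reads data from a text file into a numpy array.

In [19]:
# Recent numpy versions require a single-character delimiter; the space after each comma is ignored
x = np.loadtxt('my_data.csv', delimiter=',')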

In [20]:
print(x)

[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]]
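
Note that the values come back as floats even though the file contains integers. As a small sketch, the `dtype` parameter of `np.loadtxt` can request integers instead. The in-memory `io.StringIO` object below is a hypothetical stand-in for a file on disk.

```python
import io
import numpy as np

# In-memory stand-in for a small comma-separated file (hypothetical data).
text = io.StringIO('1,2,3\n4,5,6\n')

# By default loadtxt returns floats; dtype=int asks for integers instead.
data = np.loadtxt(text, delimiter=',', dtype=int)
print(data)
print(data.shape)
```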


## Exercise

Load the file `data/blobs.txt` directly into a numpy array and create a plot. Note there is a non-standard delimiter. 

## Exercise

Create a plot of 

$$
y=\frac{1}{4}x^3+\frac{3}{4}x^2-\frac{3}{2}x-2
$$


for $x \in [-5, 3]$ and save the `xy` data points in a single file called `inclass_data.txt` with a comma as a delimiter.

## User Interaction

As a final topic for today, I want to show how Python can be used with user input.

The simplest way to get user input into a python program is to use the `input` function.

In [21]:
name = input('What is your name?  ')
print('Hello', name)

What is your name?  Eitan
Hello Eitan


Note that the `input` function always returns a string, so if you want to input numbers you will have to convert them before using them. 
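
One possible way to handle the conversion is sketched below. The helper name `to_number` is hypothetical, and a literal string stands in for the user's answer so the example runs without a keyboard.

```python
def to_number(text):
    """Convert a string such as the one returned by input() to an int, or a float if needed."""
    try:
        return int(text)
    except ValueError:
        return float(text)

# In an interactive session the string would come from input(), e.g.
#     age = to_number(input('How old are you?  '))
# A literal string stands in for the user's answer here.
answer = '37'
age = to_number(answer)

print(age + 1)        # arithmetic on the converted number
print(answer + '1')   # the same operator on the raw string concatenates instead
```

If the text is not a valid number at all, `float` raises a `ValueError`, which a real program would need to catch.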

## Exercise 

Create a program which asks the user to guess what number it is thinking of between 1 and 10. 

Let the user know if their guess is above, below, or correct. 

Prompt the user until they answer correctly. 

When running Python on the command line, it is often useful to include extra command-line arguments to define parameters, such as:

    python my_program.py argument1 argument2

The simplest way to access these extra command line arguments is with the `sys` module. 

This technique is only applicable when working with standalone scripts, so we will flip back and forth for the examples.

The `sys` module keeps track of all of the parameters from the command line and stores them in a list called `sys.argv`.

This list contains all of the words that come after the `python` command, separated by spaces.

That means that the first element of `sys.argv` will always be the name of the program you are running. 
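
A minimal sketch of a script that inspects `sys.argv` is shown below. It is not the lecture's `commandline_arguments.py`, and the function name `summarize` is hypothetical. It only assumes the list layout described above.

```python
import sys

def summarize(argv):
    """Split an argv-style list into the script name and the remaining arguments."""
    script_name = argv[0]     # first element: the name of the program being run
    arguments = argv[1:]      # everything typed after the script name, as strings
    return script_name, arguments

if __name__ == '__main__':
    name, args = summarize(sys.argv)
    print('script:', name)
    print('arguments:', args)
```

Every element of `sys.argv` is a string, so numeric arguments need to be converted with `int` or `float` before use.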

Let's look at some examples:

- `commandline_arguments.py`
- `count_down.py`

## Exercise 

Rewrite the previous number guessing program as a standalone script which uses two command-line arguments to define the lower and upper bounds of the range of numbers.

For example:

    $ python number_guess.py 5 30
    
would ask the user to guess a number between 5 and 30.