# Reading and Writing Files

This notebook will teach you the skills necessary to read and write text files.

## Introduction

On your computer, you have probably opened a .txt file at some point. 
A .txt file is just a file that contains plain text, much like a Python string.

They are a common way of storing information on computers, and it is very easy to open and read from them.

For example, in the same folder as this notebook, there should be a file called romeo_and_juliet.txt

If you open this .txt file, you should see the entirety of Romeo and Juliet in your text editor.

Let's start with something simple. There's another text file called Grades.txt

The text file is just a list of people's names, with their grade after their name. 

If you wanted to open it in Python to do some analysis on it, here's how you would do it.

In [None]:
f = open("Grades.txt") # First, open the file and save the opened file in a variable

In [None]:
f.read() # Then, you have to call f.read() to get the string version of the text file.

It is worth saving the result of f.read() in a variable.

In [None]:
f = open("Grades.txt") # First, open the file and save the opened file in a variable
s = f.read()
print(s)

Notice that when we printed s, the weird \n's from the f.read() output didn't show up.
In a string, \n is the code that means "new line", and when you give it to print, it prints an actual new line. Look at this, for example:

In [None]:
print("Hello\nMy\nName\nIs\nArya")
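
To make the difference concrete, here is a small sketch that shows the same string both ways. The string itself is made up for illustration; `repr` is the built-in that produces the quoted form a notebook shows when a cell ends with a bare value.

```python
# A made-up string containing two newline characters
greeting = "Hello\nWorld\n"

# repr() gives the quoted form: each newline is written out as \n
shown = repr(greeting)
print(shown)

# print() turns each \n into an actual line break
print(greeting)

# A newline counts as a single character in the string
newline_length = len("a\nb")
print(newline_length)
```

So the \n is always in the string; only the way it is displayed changes.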

For files, the thing you probably want to do first is use the "splitlines" method on strings. This will split your string into a list of all the lines:

In [None]:
split = s.splitlines()
print(split)
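
You might wonder why we use splitlines rather than splitting on "\n" directly. A short sketch, using a made-up two-line string rather than the real Grades.txt, shows the difference when the text ends with a newline:

```python
# Made-up sample text that ends with a newline, as most text files do
text = "Ann 90\nBob 75\n"

with_splitlines = text.splitlines()
with_split = text.split("\n")

print(with_splitlines)  # ['Ann 90', 'Bob 75']
print(with_split)       # ['Ann 90', 'Bob 75', '']
```

The extra empty string at the end of the second list would cause trouble later when we try to pull a score out of every line.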

In [None]:
# We can combine these three steps into one line
all_the_lines = open("Grades.txt").read().splitlines()
print(all_the_lines)
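
The one-liner never closes the file explicitly. A common alternative is Python's with statement, which closes the file for you when the indented block ends. The sketch below is only an illustration: it writes its own small stand-in file (sample_grades.txt, with made-up names and scores) into a temporary folder so that it runs anywhere.

```python
import os
import tempfile

# Create a small stand-in file so this example is self-contained.
# The file name and its contents are made up, not the real Grades.txt.
folder = tempfile.mkdtemp()
sample_path = os.path.join(folder, "sample_grades.txt")
with open(sample_path, "w") as out:
    out.write("Ann 90\nBob 75\n")

# The with block closes the file as soon as the indented code finishes
with open(sample_path) as sample_file:
    sample_lines_read = sample_file.read().splitlines()

print(sample_lines_read)
print(sample_file.closed)  # True: the file was closed automatically
```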

Now we can do some data analysis on it. Let's first use a list comprehension to turn this into a list of lists.

In [None]:
grades = [x.split() for x in all_the_lines] # .split() splits each string on whitespace, so now this is a list of lists.
grades

This is much easier to work with! Now let's see what the average grade is.

In [None]:
# first let's create a list of just the scores themselves, without the names,
# and let's convert the scores from strings to Python integers
scores = [int(line[1]) for line in grades] # why do we need to use the int(...) function??
print('the quiz scores are',scores)
print() # this prints a blank line!

# now that we have a list of the scores, we can easily find the average by dividing the sum by the length of the list
avg_score = sum(scores)/len(scores)
print('the sum of the scores is',sum(scores),'the number of scores is',len(scores),'the average score is',avg_score)

Looks like we have a solid C average. I think we'll do better after taking the makeups!!
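
The comment in the cell above asks why int(...) is needed. Here is a small sketch of what goes wrong without it, using two made-up rows in the same shape as our list of lists:

```python
# Made-up rows shaped like the grades list of lists: [name, score-as-string]
rows = [["Ann", "90"], ["Bob", "75"]]

# Without int(), + glues the two strings together instead of adding numbers
glued = rows[0][1] + rows[1][1]
print(glued)  # 9075

# sum() refuses to add strings at all
try:
    sum([row[1] for row in rows])
except TypeError as err:
    print("TypeError:", err)

# After converting with int(), we get real arithmetic
total = int(rows[0][1]) + int(rows[1][1])
print(total)  # 165
```

Everything read from a text file starts out as a string, so any number you want to calculate with has to be converted first.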

## Files1
Add a few lines to the Grades.txt file using a code editor,
then write some code to read the file and find the highest grade!

## Digital Humanities
Now let's look at some examples of working with Romeo and Juliet, which we downloaded from the
www.gutenberg.org website.

In [None]:
# Let's find every line in Romeo and Juliet with the word "dog"

rj_lines = open("romeo_and_juliet.txt").read().splitlines() 
# In one line, this opens the file, turns it into a string, and splits it into lines

# next we filter the list of lines to find only those containing the word 'dog'
result_lines = [line for line in rj_lines if 'dog' in line]  # note the use of a list comprehension!

# now we print out the results
print('there are',len(result_lines),'lines in Romeo and Juliet containing the word "dog"')
print('\nHere they are:\n')
for line in result_lines:
    print(line)
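
One thing to keep in mind is that the in test is case-sensitive, so it would skip a line where the word is capitalized. A possible variation, sketched here on a few sample lines (only the first comes from the play, the other two are made up), lowercases each line before testing it:

```python
# The first line is from the play; the other two are made up for this example
sample_lines = [
    "A dog of the house of Montague moves me.",
    "Dog days are hot.",
    "No animals here.",
]

# Case-sensitive: only matches the lowercase spelling
exact = [line for line in sample_lines if "dog" in line]

# Case-insensitive: compare against a lowercased copy of each line
any_case = [line for line in sample_lines if "dog" in line.lower()]

print(len(exact), "case-sensitive matches")
print(len(any_case), "case-insensitive matches")
```

Note that both versions also match longer words that contain those letters, such as "dogs".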

## Line Numbers!
Here is a modification that finds the line numbers of the lines in Romeo and Juliet containing the word "dog".  Observe that we create a list of the indices k of lines rj_lines[k] which contain the word "dog" and we can use those indices to print out the original lines...

In [None]:
# Let's find every line in Romeo and Juliet with the word "dog"
rj_lines = open("romeo_and_juliet.txt").read().splitlines() # In one line - opens file, turns it into string, and splits lines

# next we find out how many lines are in the Romeo and Juliet manuscript
num_lines = len(rj_lines)  # let's see how many indexes we need to use

# here we find the line numbers (indexes) of the lines which contain the word 'dog'
# notice that range(0,num_lines) gives all the line numbers (indexes) for Romeo and Juliet

result_lines = [k for k in range(0,num_lines) if 'dog' in rj_lines[k]]  # note the use of a list comprehension to filter

print('there are',len(result_lines),'lines in Romeo and Juliet containing the word "dog"')

print('\nHere they are:\n')
for k in result_lines:
    print(k,rj_lines[k])
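
As an aside, Python's built-in enumerate gives you each index together with its line, which is another way to write the same kind of filter. This is just a sketch on three sample lines, not a replacement for the code above:

```python
# Three sample lines standing in for the full play
sample_lines = [
    "SAMPSON",
    "A dog of the house of Montague moves me.",
    "GREGORY",
]

# enumerate yields (index, line) pairs, so we don't need range and indexing
hits = [(k, line) for k, line in enumerate(sample_lines) if "dog" in line]

for k, line in hits:
    print(k, line)
```

Keeping the index list, as in the cell above, is still handy when you want to look at neighboring lines such as sample_lines[k-1].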

### Files 2: Modify the code in the previous example to also print the line before and the line after each occurrence of the word "dog".

So it should print

    there are 6 lines in Romeo and Juliet containing the word "dog"

    Here they are:

    36 SAMPSON
    37 A dog of the house of Montague moves me.
    38 GREGORY
    
    41 SAMPSON
    42 A dog of that house shall move me to stand: I will
    43 take the wall of any man or maid of Montague's.
    
    ....
    
with three consecutive lines around each occurrence of the word.

Cut/paste your code into TeachBack, as usual!

In [None]:

print('there are',len(result_lines),'lines in Romeo and Juliet containing the word "dog"')

print('\nHere they are:\n')
for k in result_lines:
    print(k,rj_lines[k])

## How to store information in files. 

Let's say you want to send some results to your boss in a text file. It's very easy to write things to text files. Here's how you do it. 

In [None]:
f = open("newfile.txt", "w") 

Notice the second argument, "w". This tells Python to open the file for writing ("w" for write!). If the file doesn't exist yet, Python creates it; if it already exists, its old contents are erased.
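
Here is a small sketch of what "w" does to a file that already exists, compared with the "a" (append) mode, which this notebook doesn't otherwise use. It works on a throwaway file (demo.txt, a made-up name) in a temporary folder and uses its own variable names, so it won't disturb the f we just opened.

```python
import os
import tempfile

# A throwaway file in a temporary folder; the name demo.txt is made up
folder = tempfile.mkdtemp()
demo_path = os.path.join(folder, "demo.txt")

demo = open(demo_path, "w")   # creates the file
demo.write("first\n")
demo.close()

demo = open(demo_path, "w")   # opening with "w" again erases the old contents
demo.write("second\n")
demo.close()

demo = open(demo_path)
after_w = demo.read()
demo.close()
print(after_w)                # only "second" is left

demo = open(demo_path, "a")   # "a" keeps the contents and adds to the end
demo.write("third\n")
demo.close()

demo = open(demo_path)
after_a = demo.read()
demo.close()
print(after_a)
```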

Now to write a line to it, you can just write to the file like this:

In [None]:
f.write("Hello\n") # Don't forget the \n, that's how you start a new line in the text file.
f.write("World\n")

You can also "print" to a file by passing the file as a keyword argument, file=f (where f is the variable holding your open file). This is an example of "overriding a default parameter": by default, print sends its output to the screen. You can define your own functions with parameters that have default values and allow the user to override them. We'll talk about this later.

Notice that you don't need the \n here, since print adds it automatically.

In [None]:
print("Hello", file=f)

In [None]:
# Let's print a table of square roots
import math
for i in range(10):
    print(i,math.sqrt(i), file=f)
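
If you want the table to line up in neat columns, one option is to build each line with an f-string before printing it. This is only a sketch: it writes to its own throwaway file (sqrt_table.txt, a made-up name) with its own variable names, and the column width and number of decimals are arbitrary choices.

```python
import math
import os
import tempfile

# A throwaway file in a temporary folder; the name sqrt_table.txt is made up
folder = tempfile.mkdtemp()
table_path = os.path.join(folder, "sqrt_table.txt")

table_file = open(table_path, "w")
for n in range(4):
    # :2d pads the integer to width 2, :.3f keeps three decimal places
    print(f"{n:2d} {math.sqrt(n):.3f}", file=table_file)
table_file.close()

# Read the file back to check what was written
table_file = open(table_path)
table_text = table_file.read()
table_file.close()
print(table_text)
```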

When you're done writing to the file, you have to close it so that everything you wrote is actually saved. Like this:

In [None]:
f.close()
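
Once a file is closed, you can't write to it any more. The sketch below shows this on a throwaway file (closed_demo.txt, a made-up name) with its own variable, so it doesn't touch newfile.txt:

```python
import os
import tempfile

# A throwaway file in a temporary folder; the name closed_demo.txt is made up
folder = tempfile.mkdtemp()
closed_path = os.path.join(folder, "closed_demo.txt")

demo = open(closed_path, "w")
demo.write("Hello\n")
demo.close()

print(demo.closed)  # True once close() has been called

# Writing after close() raises a ValueError
error_message = None
try:
    demo.write("more\n")
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```

If you need to add more later, open the file again.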

Now let's see the file! You can also open it in Atom.

In [None]:
print(open("newfile.txt").read())