# Introduction to Python

## Section 3: File I/O

Python makes it very easy to read from and write to files.  In most cases, we will build on the following lines of code:

```python
with open(filename, mode) as my_file:
    for line in my_file:
        # do stuff
        # do some more stuff
        ...
        # do the last stuff
```

`open` is a function that opens a file for your use.  The first argument, `filename`, is the name of the file you want to open.  The second argument, `mode`, is a string giving the mode that you want to open that file in.  By default, files are opened in read mode (i.e., `mode='r'`); you are *only* able to read from the file.  If you would like to open a file *only* to write to it, set `mode='w'`.  Opening a file with `mode='w'` will overwrite whatever is currently in the file.  You can open a file in append mode by passing `mode='a'`; doing so will add to the end of the file instead of overwriting everything.  Finally, `mode='r+'` opens an existing file for both reading and writing.  In summary:

|      Mode     | Access |
|:-------------:|:------:|
| `r` (default) | Read only |
| `w`           | Write only (overwrites existing contents) |
| `r+`          | Read and write |
| `a`           | Append (writes go to the end of the file) |
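
To make the difference between the modes concrete, here is a minimal sketch.  It assumes a throwaway file called `modes_demo.txt` (a made-up name, not one of the tutorial's data files) created inside a temporary directory, so nothing on your computer is overwritten.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    demo_path = os.path.join(tmp_dir, "modes_demo.txt")

    # 'w' creates the file (or empties an existing one) before writing.
    with open(demo_path, "w") as fout:
        fout.write("first\n")

    # Opening with 'w' a second time discards the previous contents.
    with open(demo_path, "w") as fout:
        fout.write("second\n")

    # 'a' keeps the existing contents and writes at the end.
    with open(demo_path, "a") as fout:
        fout.write("third\n")

    # 'r' is the default, so the mode can be left out when reading.
    with open(demo_path) as fin:
        contents = fin.read()

print(contents)
```

The word `first` never shows up in the output because the second `'w'` open wiped it out; `third` survives because it was appended.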



When you begin a block of code with the `with open(filename) as my_file` syntax, Python opens the file for you and automatically closes it when control leaves the code block begun by the `with` statement.  The `for line in my_file` line sets up a for loop to iterate through every line in the file.  Finally, write the code you want to run on each line in the innermost code block.
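
If you want to convince yourself that the `with` statement really closes the file, one way is to check the file object's `closed` attribute inside and after the block.  This is a small sketch that uses a made-up file name (`closing_demo.txt`) in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    demo_path = os.path.join(tmp_dir, "closing_demo.txt")

    with open(demo_path, "w") as my_file:
        my_file.write("hello\n")
        closed_inside = my_file.closed  # still open inside the block

    # The with block has ended, so Python has closed the file for us.
    closed_after = my_file.closed

print(closed_inside, closed_after)  # False True
```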

Go [here][git-files] to download the files we'll be working with.  Click `compressed_data.zip > Download`, unzip it on your computer, then use the Spyder file explorer to navigate into the new folder.  We'll be using this folder as our working directory for the remainder of the tutorial.

[git-files]: https://github.com/pommevilla/p3.bootcamp.python.2018/tree/master/lessons/data

### Section 3.1: Reading from Files

The most basic thing you can do with a file is print it out:

In [1]:
with open('./data/stories/woods.txt') as fin:
    for line in fin:
        print(line)

Stopping by Woods on a Snowy Evening by Robert Frost



Whose woods these are I think I know.   

His house is in the village though;   

He will not see me stopping here   

To watch his woods fill up with snow.   



My little horse must think it queer   

To stop without a farmhouse near   

Between the woods and frozen lake   

The darkest evening of the year.   



He gives his harness bells a shake   

To ask if there is some mistake.   

The only other sound’s the sweep   

Of easy wind and downy flake.   



The woods are lovely, dark and deep,   

But I have promises to keep,   

And miles to go before I sleep,   

And miles to go before I sleep.


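Notice how there's an extra blank line between each line.  This is because each line we read from the file already ends with a newline character, and `print` adds another one of its own.  We can remove the trailing newline (along with any other trailing whitespace) by calling `str.rstrip()`: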

In [2]:
with open('./data/stories/woods.txt') as fin:
    for line in fin:
        line = line.rstrip()
        print(line)

Stopping by Woods on a Snowy Evening by Robert Frost

Whose woods these are I think I know.
His house is in the village though;
He will not see me stopping here
To watch his woods fill up with snow.

My little horse must think it queer
To stop without a farmhouse near
Between the woods and frozen lake
The darkest evening of the year.

He gives his harness bells a shake
To ask if there is some mistake.
The only other sound’s the sweep
Of easy wind and downy flake.

The woods are lovely, dark and deep,
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.
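
To see exactly what `rstrip()` is removing, it can help to look at the lines with `repr`, which shows invisible characters such as `\n`.  The sketch below writes a tiny two-line file (the name `two_lines.txt` and its contents are made up for illustration) and compares the raw lines with two ways of stripping them.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file, created just for this illustration.
    demo_path = os.path.join(tmp_dir, "two_lines.txt")
    with open(demo_path, "w") as fout:
        fout.write("first line   \nsecond line\n")

    with open(demo_path) as fin:
        raw_lines = [line for line in fin]

# rstrip() with no arguments removes ALL trailing whitespace.
stripped_lines = [line.rstrip() for line in raw_lines]

# rstrip("\n") removes only the trailing newline and keeps the spaces.
newline_only = [line.rstrip("\n") for line in raw_lines]

print(repr(raw_lines[0]))       # 'first line   \n'
print(repr(stripped_lines[0]))  # 'first line'
print(repr(newline_only[0]))    # 'first line   '
```

Which one you want depends on whether trailing spaces matter for what you're doing; for printing a poem, plain `rstrip()` is fine.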


Better.  Now suppose we want to count the number of lines in the file.  The simplest way to do it is to initialize a counter variable outside the `for` loop and increment it as we read through the file (we'll suppress the output of the poem for now).

In [3]:
line_counter = 0
with open('./data/stories/woods.txt') as fin:
    for line in fin:
        line_counter +=1

print("There are {} lines in woods.txt.".format(line_counter))

There are 21 lines in woods.txt.
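
If all you need is the count, a more compact alternative is to add up a 1 for every line with `sum`.  This is only a sketch of that idea, run on a small made-up file (`count_demo.txt`) rather than `woods.txt`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical three-line file; the middle line is blank.
    demo_path = os.path.join(tmp_dir, "count_demo.txt")
    with open(demo_path, "w") as fout:
        fout.write("one\n\nthree\n")

    with open(demo_path) as fin:
        # Produce a 1 for each line in the file and add them all up.
        line_count = sum(1 for _ in fin)

print("There are {} lines in the file.".format(line_count))
```

Blank lines count too, which is why `woods.txt` has 21 lines even though the poem itself only has 16.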


Now let's say we want to label the lines as we print them out.  We can use string formatting to do so:

In [4]:
line_counter = 0
with open('./data/stories/woods.txt') as fin:
    for line in fin:
        line_counter += 1
        line = line.rstrip()
        print("{} {}".format(line_counter, line))
        
    print("There are {} lines in woods.txt.".format(line_counter))

1 Stopping by Woods on a Snowy Evening by Robert Frost
2 
3 Whose woods these are I think I know.
4 His house is in the village though;
5 He will not see me stopping here
6 To watch his woods fill up with snow.
7 
8 My little horse must think it queer
9 To stop without a farmhouse near
10 Between the woods and frozen lake
11 The darkest evening of the year.
12 
13 He gives his harness bells a shake
14 To ask if there is some mistake.
15 The only other sound’s the sweep
16 Of easy wind and downy flake.
17 
18 The woods are lovely, dark and deep,
19 But I have promises to keep,
20 And miles to go before I sleep,
21 And miles to go before I sleep.
There are 21 lines in woods.txt.


Using a `line_counter` variable is a perfectly acceptable way of keeping track of the line number when reading a file - if it makes sense to you and it's easy to remember, use it.  However, Python provides the built-in `enumerate` function to do just this:

In [5]:
with open('./data/stories/woods.txt') as fin:
    for line_num, line in enumerate(fin):
        line = line.rstrip()
        print("{} {}".format(line_num, line))
    
    # enumerate counts from 0, so the number of lines is line_num + 1
    print("There are {} lines in woods.txt.".format(line_num + 1))

0 Stopping by Woods on a Snowy Evening by Robert Frost
1 
2 Whose woods these are I think I know.
3 His house is in the village though;
4 He will not see me stopping here
5 To watch his woods fill up with snow.
6 
7 My little horse must think it queer
8 To stop without a farmhouse near
9 Between the woods and frozen lake
10 The darkest evening of the year.
11 
12 He gives his harness bells a shake
13 To ask if there is some mistake.
14 The only other sound’s the sweep
15 Of easy wind and downy flake.
16 
17 The woods are lovely, dark and deep,
18 But I have promises to keep,
19 And miles to go before I sleep,
20 And miles to go before I sleep.
There are 21 lines in woods.txt.


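The `enumerate` function essentially takes care of the `line_counter` variable for you.  Note that `enumerate` starts counting at 0 by default, so the last value of `line_num` is one less than the number of lines in the file.  Use whichever method you're more comfortable with.  There are times where you'll want to declare an independent `line_counter` variable, but for today, I'm going to stick with `enumerate`.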
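
If you'd rather have the numbering start at 1, `enumerate` accepts an optional `start` argument.  The sketch below uses a short list of made-up strings as a stand-in for the lines of a file; it works the same way on a file object.

```python
# A stand-in for the lines of a file (made-up contents).
lines = ["alpha\n", "beta\n", "gamma\n"]

numbered = []
for line_num, line in enumerate(lines, start=1):
    line = line.rstrip()
    numbered.append("{} {}".format(line_num, line))

for entry in numbered:
    print(entry)

# Because we started at 1, the last line number is also the line count.
print("There are {} lines.".format(line_num))
```

One thing to watch out for: if there are no lines at all, the loop body never runs and `line_num` is never created, so using it after the loop raises a `NameError`.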

Now let's say we want to find the longest line in the file.  Since each line in the file is just a string, we can easily see how long it is by calling `len(line)`.  The process of locating the longest (or shortest, or whatever extreme measurement you're looking for) entry will typically follow this procedure:

* Create a variable called `longest_line_size` and set it equal to 0
* Create a variable called `longest_line` and set it equal to the empty string `""`
* Create a variable called `longest_line_number` and set it equal to 0
* For each line in the file:
    * If the length of `current_line` is larger than `longest_line_size`:
        * Set `longest_line_size` = `len(current_line)`
        * Set `longest_line` = `current_line`
        * Set `longest_line_number` = `current_line_number`
* Report the results.

Translating this to our current file:

In [6]:
with open('./data/stories/woods.txt') as fin:
    longest_line = ""
    longest_line_size = 0
    longest_line_number = 0
    for line_num, line in enumerate(fin):
        line = line.rstrip()
        print("{} {}".format(line_num, line))
        current_line_size = len(line)
        if current_line_size > longest_line_size:
            longest_line = line
            longest_line_size = current_line_size
            longest_line_number = line_num
              
    print("The longest line was line number {}: {}".format(longest_line_number, longest_line))
    print("It was {} characters long.".format(longest_line_size))
    

0 Stopping by Woods on a Snowy Evening by Robert Frost
1 
2 Whose woods these are I think I know.
3 His house is in the village though;
4 He will not see me stopping here
5 To watch his woods fill up with snow.
6 
7 My little horse must think it queer
8 To stop without a farmhouse near
9 Between the woods and frozen lake
10 The darkest evening of the year.
11 
12 He gives his harness bells a shake
13 To ask if there is some mistake.
14 The only other sound’s the sweep
15 Of easy wind and downy flake.
16 
17 The woods are lovely, dark and deep,
18 But I have promises to keep,
19 And miles to go before I sleep,
20 And miles to go before I sleep.
The longest line was line number 0: Stopping by Woods on a Snowy Evening by Robert Frost
It was 52 characters long.
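
Once the loop version makes sense, it's worth knowing that the built-in `max` function can do the same search in one step when you give it a `key` function to compare by.  This is a sketch on a small list of made-up lines rather than `woods.txt`.

```python
# A stand-in for the lines of a file (made-up contents).
lines = ["short\n", "the longest line here\n", "\n", "medium line\n"]
stripped = [line.rstrip() for line in lines]

# enumerate gives (line_number, line) pairs; the key tells max to
# compare the pairs by the length of the line, not by line number.
longest_line_number, longest_line = max(
    enumerate(stripped), key=lambda pair: len(pair[1])
)
longest_line_size = len(longest_line)

print("The longest line was line number {}: {}".format(
    longest_line_number, longest_line))
print("It was {} characters long.".format(longest_line_size))
```

Like the loop with `>`, `max` keeps the first line it finds if several lines tie for the longest.  The loop is still the better choice for a very large file, since this version holds every line in memory at once.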


#### Exercise 3.1

* Modify the above code to find the shortest line in `woods.txt`.  What was it?
* Modify the code again to report both the shortest *and* longest lines.  Try changing the file name to other files in the data folder and see what lines you get.

### Section 3.2: Writing to Files

The syntax for writing to a file is similar to that for reading from a file:

```python
with open(filename, 'w') as my_file:
    # Stuff...
    my_file.write(str)
```

If `filename` doesn't exist at the time you open it for writing, Python will create it for you.  Let's go ahead and test it.

In [7]:
with open('test', 'w') as fout:
    fout.write("Dude, nice!")

Now look in the File Explorer pane in Spyder: Python created the file `test` for you.  Dude, nice!

Let's write a few more lines - we'll just overwrite what's in `test`.

In [8]:
with open('test', 'w') as fout:
    fout.write("He stuck a feather in his cap")
    fout.write("And called it macaroni")

Open up the file.  What's wrong?

`file.write()` takes a string and writes it as-is to `file`.  Since we didn't include any newline characters, it just mashed the two strings together on one line.  Let's fix that now.

In [9]:
with open('test', 'w') as fout:
    fout.write("He stuck a feather in his cap\n")
    fout.write("And called it macaroni\n")

Dude, nice!
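
If typing `\n` at the end of every string gets tedious, another option is the `print` function, which can send its output to a file through its `file` argument and adds the newline for you.  Here is a minimal sketch that writes to a file called `test` inside a temporary directory (so it won't touch the `test` file you just made) and reads it back.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "test")

    with open(demo_path, "w") as fout:
        # print adds a newline after each string, just like on the console.
        print("He stuck a feather in his cap", file=fout)
        print("And called it macaroni", file=fout)

    # Read the file back to check what was written.
    with open(demo_path) as fin:
        written = fin.read()

print(repr(written))
```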

Since `write` just takes a string, you can use all the string formatting you've been doing:

In [10]:
best_songs = {
    "Britney Spears": "Lucky",
    "Bruno Mars": "When I Was Your Man",
    "Backstreet Boys": "Larger Than Life",
    "Drake": "HYFR",
    "Jay Park": "All I Wanna Do",
    "Nero": "Into the Night",
    "Rihanna": "Where Have You Been",
    "The Weeknd": "What You Need"
}

with open('best_songs.txt', 'w') as fout:
    for artist in best_songs:
        fout.write("The best song by {} is {}.\n".format(
            artist, best_songs[artist]))
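
The same loop can be written a little more compactly by iterating over `best_songs.items()`, which hands you each artist and song together, and by using an f-string instead of `str.format`.  This sketch assumes Python 3.7 or newer (f-strings need 3.6, and 3.7 guarantees the dictionary order), uses a shortened copy of the dictionary, and writes to a temporary directory so it doesn't clobber your `best_songs.txt`.

```python
import os
import tempfile

# A shortened copy of the dictionary from the cell above.
best_songs = {
    "Britney Spears": "Lucky",
    "Drake": "HYFR",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "best_songs.txt")

    with open(out_path, "w") as fout:
        # .items() yields (key, value) pairs, so no second lookup is needed.
        for artist, song in best_songs.items():
            fout.write(f"The best song by {artist} is {song}.\n")

    # Read the file back to check what was written.
    with open(out_path) as fin:
        written_lines = fin.read().splitlines()

for written_line in written_lines:
    print(written_line)
```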

#### Exercise 3.2

* You can open multiple files at once when using the `with` statement.  Search online for how to do it, then modify your code from **Exercise 3.1** so that it opens an additional output file.  Instead of printing the results to the console, write them to your output file.

[Previous Section: Control Flow](P3Bootcamp2018-02.ipynb)<br>
[Next Section: Functions](P3Bootcamp2018-04.ipynb)