## File Input / Output

Data is often stored in files, and to work with it you need a way to read it from the file into one or more data structures.  I'm going to deal primarily with text files containing human-readable characters, but Python can also handle binary files.  For binary files, however, you need to know a lot more about the `bytes` type, encoding/decoding, and Unicode, and potentially the actual structure of the data file.

For now we are going to ignore those complexities.  Text files are easier to deal with when you are just starting out, and the concepts you'll learn here are a good foundation for working with all types of files, text or binary.

In [12]:
%ls *.txt      # NOTE: anything with a leading % (or %%) is a Jupyter command NOT Python

gettysburg.txt  hamlet.txt      student.txt     test.txt


In [13]:
%cat hamlet.txt

To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
the slings and arrows of outrageous fortune.
Or to take arms against a sea of troubles
And by opposing end them.  To die-to sleep,
No more: and by a sleep to say we end.

In [14]:
# First we need to open a file and assign an object to represent it
# I'm going to use fp.  Why? That's for me to know and you to find out (as the kids say)
#
fp = open('hamlet.txt', 'r')   # 'r' is the mode (read); it is redundant here because 'r' is the default

In [15]:
# fp is an object and it has methods like readline()
fp.readline()

'To be, or not to be, that is the question:\n'

The `readline` method returns a string (`str`).  Note the `\n` at the end of the string: it is returned by the `readline` method and is part of the string.  When we print the string, however, the `\n` is rendered as an actual line break.  There are several "escaped" characters like this; the ones you'll see most often are `\n` (newline) and `\t` (tab).
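
To make the difference between the stored string and the printed string concrete, here is a minimal sketch. The sample text is made up for illustration and is not taken from any of the files above.

```python
# A made-up string containing two escaped characters: a tab and a newline
text = "name:\tHamlet\n"

print(repr(text))   # repr() shows the escapes as written: 'name:\tHamlet\n'
print(text)         # print() renders them as an actual tab and line break
print(len(text))    # each escape counts as ONE character → 13
```

In a notebook, evaluating a string as the last expression of a cell shows its `repr`, which is why the `\n` was visible in the output above.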

In [16]:
fp.readline()

"Whether 'tis nobler in the mind to suffer\n"

The newline character may or may not be useful to you.  How can you get rid of it?

Well, since `readline()` returns a string, we can turn to a string method, the rather indelicately named `strip()`, which removes leading and trailing whitespace (including the newline).

In [17]:
next_line = fp.readline()
next_line

'the slings and arrows of outrageous fortune.\n'

In [18]:
next_line.strip()

'the slings and arrows of outrageous fortune.'
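
One caveat worth sketching: `strip()` removes whitespace from both ends, so it also eats leading indentation. If you only want to drop the trailing newline, `rstrip("\n")` is a narrower tool. The sample line below is invented for illustration.

```python
# An invented line with leading spaces and a trailing newline
line = "   indented text with trailing newline\n"

stripped = line.strip()               # whitespace removed from BOTH ends
newline_removed = line.rstrip("\n")   # only trailing newlines removed

print(repr(stripped))
print(repr(newline_removed))
print(repr(line))                     # strings are immutable: the original is unchanged
```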

Now, we've been reading line by line, and this is a very repetitious operation, so it shouldn't be too much of a surprise to learn that **a file object is also an iterable**; it can appear in a loop.  Note that the loop picks up where the last `readline()` left off:

In [19]:
for line in fp:
    print(line)

Or to take arms against a sea of troubles

And by opposing end them.  To die-to sleep,

No more: and by a sleep to say we end.


Whoa, wait a minute, who made it all double spaced?  Well, `print` did it: we get one `\n` from every line we read and a BONUS newline added silently by the `print` function.  But we just saw a way to fix this:

In [20]:
for line in fp:
    print(line.strip())

No output, hmmm?  Why?

In [22]:
# let's re-open the file
fp = open('hamlet.txt')
for line in fp:
    print(line.strip())

To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
the slings and arrows of outrageous fortune.
Or to take arms against a sea of troubles
And by opposing end them.  To die-to sleep,
No more: and by a sleep to say we end.


The earlier loop printed nothing because the loop before it had already read to the end of the file.  Either we have to *rewind* to the beginning of the file (see `seek`) or re-open the file.  It is also a good idea to make sure every file you use is closed after you are done with it, so you don't surprise yourself (unless you like these types of surprises).  The Pythonic way to do this is:

In [23]:
with open('gettysburg.txt') as fp:    # the with statement uses a context manager, which closes the file when the block ends
    lines = fp.readlines()            # read all lines at once
    print(f"Type of lines: {type(lines)} with length: {len(lines)}")

Type of lines: <class 'list'> with length: 17
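
The two ideas just mentioned, rewinding with `seek` and letting `with` close the file, can be sketched together. This is a small illustration that writes its own throwaway file (the name `demo.txt` and its contents are assumptions) so it does not depend on the files in this directory.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on hamlet.txt
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as fp:
    fp.write("first line\nsecond line\n")

with open(path) as fp:
    first_pass = [line.strip() for line in fp]    # reads to the end of the file
    second_pass = [line.strip() for line in fp]   # nothing left: empty list
    fp.seek(0)                                    # rewind to the beginning
    third_pass = [line.strip() for line in fp]    # all lines again

print(first_pass, second_pass, third_pass)
print(fp.closed)   # True: the with block closed the file on exit
```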


In [24]:
print(''.join(lines))

Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, 
and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and 
so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate 
a portion of that field as a final resting place for those who here gave their lives that that nation 
might live. It is altogether fitting and proper that we should do this.

But in a larger sense we cannot dedicate, we cannot consecrate, we cannot hallow this ground. The brave men, 
living and dead, who struggled here have consecrated it, far above our poor power to add or detract. The 
world will little note, nor long remember, what we say here, but it can never forget what they did here. 
It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here 
have thus far

### Steps to Read a file

So to read a file you need to:

1. Open the file and assign it to a reference, preferably inside a `with` block so the file is closed for you.  I use `fp`; you can use whatever you like.
2. Read the file in a way that makes sense to you: line by line with a `for` loop, or all at once (this may depend on what you want to do with the lines).
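
As a rough sketch of these two steps, the example below counts words in a file twice: once line by line and once after reading everything with `read()`. The file name and its contents are hypothetical and are created on the fly so the code runs anywhere.

```python
import os
import tempfile

# Hypothetical throwaway file so the sketch runs anywhere
path = os.path.join(tempfile.mkdtemp(), "steps_demo.txt")
with open(path, "w") as fp:
    fp.write("to be or not to be\nthat is the question\n")

# Open, then read line by line: only one line is held in memory at a time
word_count = 0
with open(path) as fp:
    for line in fp:
        word_count += len(line.split())

# Open, then read all at once: read() returns the whole file as one string
with open(path) as fp:
    text = fp.read()

print(word_count)          # → 10
print(len(text.split()))   # → 10
```

Line-by-line reading tends to suit large files; reading everything at once is convenient when the file is small and you need the whole text together.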

## Reading a CSV file

Often you can get data in CSV (or TSV) format, which you can read nicely with Python.  There are many ways to do this, so let's look at a few, which get easier and easier for you as you go down the list.

### Method 1 - Bare Bones Python

In [43]:
data_table = []
with open("../data/periodic_table.csv") as fp:
    for line in fp:
        data_table.append(line.split(','))

In [44]:
print(data_table[0])
print()
print(data_table[1])
print()
print(data_table[2])

['AtomicNumber', 'Element', 'Symbol', 'AtomicMass', 'NumberofNeutrons', 'NumberofProtons', 'NumberofElectrons', 'Period', 'Group', 'Phase', 'Radioactive', 'Natural', 'Metal', 'Nonmetal', 'Metalloid', 'Type', 'AtomicRadius', 'Electronegativity', 'FirstIonization', 'Density', 'MeltingPoint', 'BoilingPoint', 'NumberOfIsotopes', 'Discoverer', 'Year', 'SpecificHeat', 'NumberofShells', 'NumberofValence\n']

['1', 'Hydrogen', 'H', '1.007', '0', '1', '1', '1', '1', 'gas', '', 'yes', '', 'yes', '', 'Nonmetal', '0.79', '2.2', '13.5984', '8.99E-05', '14.175', '20.28', '3', 'Cavendish', '1766', '14.304', '1', '1\n']

['2', 'Helium', 'He', '4.002', '2', '2', '2', '1', '18', 'gas', '', 'yes', '', 'yes', '', 'Noble Gas', '0.49', '', '24.5874', '1.79E-04', '', '4.22', '5', 'Janssen', '1868', '5.193', '1', '\n']
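
Notice the `\n` stuck to the last field of every row above. One possible fix, staying with bare-bones Python, is to remove the newline before splitting. The sketch below uses `io.StringIO` as a stand-in for the opened file (an assumption made so the example is self-contained) with a three-column excerpt of the data.

```python
import io

# io.StringIO stands in for the open file; it iterates line by line the same way
fake_file = io.StringIO("AtomicNumber,Element,Symbol\n1,Hydrogen,H\n2,Helium,He\n")

data_table = []
for line in fake_file:
    # Drop the trailing newline first, then split on commas
    data_table.append(line.rstrip("\n").split(","))

print(data_table[0])   # → ['AtomicNumber', 'Element', 'Symbol']
print(data_table[2])   # → ['2', 'Helium', 'He']
```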


### Method 2 - Python Standard Library

- [csv - Reading and Writing CSV Files](https://docs.python.org/3/library/csv.html)
- [Python Module of the Week - csv](https://pymotw.com/3/csv/index.html)

In [45]:
import csv

data_table = []
with open("../data/periodic_table.csv", newline="") as fp:   # the csv docs recommend newline="" when opening files for csv
    csv_reader = csv.reader(fp)
    for row in csv_reader:
        data_table.append(row)

In [46]:
print(data_table[0])
print()
print(data_table[1])
print()
print(data_table[2])

['AtomicNumber', 'Element', 'Symbol', 'AtomicMass', 'NumberofNeutrons', 'NumberofProtons', 'NumberofElectrons', 'Period', 'Group', 'Phase', 'Radioactive', 'Natural', 'Metal', 'Nonmetal', 'Metalloid', 'Type', 'AtomicRadius', 'Electronegativity', 'FirstIonization', 'Density', 'MeltingPoint', 'BoilingPoint', 'NumberOfIsotopes', 'Discoverer', 'Year', 'SpecificHeat', 'NumberofShells', 'NumberofValence']

['1', 'Hydrogen', 'H', '1.007', '0', '1', '1', '1', '1', 'gas', '', 'yes', '', 'yes', '', 'Nonmetal', '0.79', '2.2', '13.5984', '8.99E-05', '14.175', '20.28', '3', 'Cavendish', '1766', '14.304', '1', '1']

['2', 'Helium', 'He', '4.002', '2', '2', '2', '1', '18', 'gas', '', 'yes', '', 'yes', '', 'Noble Gas', '0.49', '', '24.5874', '1.79E-04', '', '4.22', '5', 'Janssen', '1868', '5.193', '1', '']
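
Besides dropping the trailing `\n`, the `csv` module also copes with fields that contain commas inside quotes, which a plain `split(',')` does not. Here is a small comparison on made-up data (the column names and row are invented, not from the periodic table file):

```python
import csv
import io

# Made-up CSV text with a quoted field that contains a comma
sample_text = 'name,note\nexample,"contains, a comma"\n'

# Bare-bones approach: the quoted field is wrongly split in two
naive = [line.split(",") for line in sample_text.splitlines()]

# csv.reader understands the quoting rules and keeps the field whole
parsed = list(csv.reader(io.StringIO(sample_text)))

print(naive[1])    # → ['example', '"contains', ' a comma"']
print(parsed[1])   # → ['example', 'contains, a comma']
```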


### Method 2++ - Python Standard Library++


In [47]:
import csv

data_table = []
with open("../data/periodic_table.csv", newline="") as fp:   # the csv docs recommend newline="" when opening files for csv
    csv_reader = csv.DictReader(fp)
    for row in csv_reader:
        data_table.append(row)

In [48]:
print(data_table[0])
print()
print(data_table[1])
print()
print(data_table[2])

{'AtomicNumber': '1', 'Element': 'Hydrogen', 'Symbol': 'H', 'AtomicMass': '1.007', 'NumberofNeutrons': '0', 'NumberofProtons': '1', 'NumberofElectrons': '1', 'Period': '1', 'Group': '1', 'Phase': 'gas', 'Radioactive': '', 'Natural': 'yes', 'Metal': '', 'Nonmetal': 'yes', 'Metalloid': '', 'Type': 'Nonmetal', 'AtomicRadius': '0.79', 'Electronegativity': '2.2', 'FirstIonization': '13.5984', 'Density': '8.99E-05', 'MeltingPoint': '14.175', 'BoilingPoint': '20.28', 'NumberOfIsotopes': '3', 'Discoverer': 'Cavendish', 'Year': '1766', 'SpecificHeat': '14.304', 'NumberofShells': '1', 'NumberofValence': '1'}

{'AtomicNumber': '2', 'Element': 'Helium', 'Symbol': 'He', 'AtomicMass': '4.002', 'NumberofNeutrons': '2', 'NumberofProtons': '2', 'NumberofElectrons': '2', 'Period': '1', 'Group': '18', 'Phase': 'gas', 'Radioactive': '', 'Natural': 'yes', 'Metal': '', 'Nonmetal': 'yes', 'Metalloid': '', 'Type': 'Noble Gas', 'AtomicRadius': '0.49', 'Electronegativity': '', 'FirstIonization': '24.5874', 'Den
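
With `DictReader` each row is keyed by column name, but every value is still a string, and missing values arrive as empty strings. A minimal sketch of looking up columns by name and converting them follows. It uses a few columns from the rows shown above fed in through `io.StringIO`, and `to_float` is a hypothetical helper, not part of the `csv` module.

```python
import csv
import io

# A few columns of the first two data rows, fed in through StringIO
sample = io.StringIO(
    "Element,Symbol,AtomicMass,Electronegativity\n"
    "Hydrogen,H,1.007,2.2\n"
    "Helium,He,4.002,\n"
)

def to_float(text):
    """Hypothetical helper: convert to float, mapping empty strings to None."""
    return float(text) if text else None

rows = list(csv.DictReader(sample))
masses = {row["Symbol"]: to_float(row["AtomicMass"]) for row in rows}
electronegativity = {row["Symbol"]: to_float(row["Electronegativity"]) for row in rows}

print(masses)              # → {'H': 1.007, 'He': 4.002}
print(electronegativity)   # → {'H': 2.2, 'He': None}
```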

## Writing to Files

In [49]:
numbers = [2, 4, 6, 8, 10]

In [50]:
with open('test.txt', 'w') as fp:
    for i in numbers:
        fp.write(f"Number: {i} - square {i**2}")

In [51]:
%cat test.txt

Number: 2 - square 4Number: 4 - square 16Number: 6 - square 36Number: 8 - square 64Number: 10 - square 100

In [52]:
with open('test.txt', 'w') as fp:
    for i in numbers:
        fp.write(f"Number: {i} - square {i**2}\n")

In [53]:
%cat test.txt

Number: 2 - square 4
Number: 4 - square 16
Number: 6 - square 36
Number: 8 - square 64
Number: 10 - square 100
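
Two things are worth noting from the cells above: `write()` does not add a newline for you, and opening with `'w'` replaces whatever was in the file. The sketch below shows two related options, `print(..., file=fp)`, which adds the newline itself, and mode `'a'`, which appends instead of overwriting. It writes to a temporary file (the name `squares.txt` is an assumption) rather than `test.txt`.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "squares.txt")
numbers = [2, 4, 6]

# 'w' creates the file, or empties it first if it already exists
with open(path, "w") as fp:
    for i in numbers:
        print(f"Number: {i} - square {i**2}", file=fp)   # print adds the "\n"

# 'a' keeps the existing contents and adds to the end
with open(path, "a") as fp:
    fp.write("Number: 8 - square 64\n")                  # write needs an explicit "\n"

with open(path) as fp:
    contents = fp.read()

print(contents)
```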

