# Reading and writing data

Here we'll go over some of the most common ways of handling data, including:
1. Standard out
2. Reading data from a file
3. Writing data to a file
4. Getting user input
5. Pandas

## 1. Standard out

"Standard out" (stdout) is simply where a program prints/writes its output data by default. For example, when you open your interactive Python shell and type 
    >>> 3 + 2
the ```5``` that shows up is what was written to stdout.

In [None]:
5 + 6  # the 11 that gets shown in the output is your stdout
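
Outside the interactive shell nothing is echoed automatically, so a script has to write to stdout explicitly. Here is a minimal sketch of two equivalent ways to do that. It assumes Python 3 syntax (`print` as a function), unlike the Python 2 cells in this notebook, and the variable name `result` is just for illustration.

```python
import sys

result = 5 + 6

# print() writes its argument to stdout, followed by a newline.
print(result)

# sys.stdout is the underlying file-like object; write() takes a string
# and does not add a newline for you.
sys.stdout.write(str(result) + '\n')
```

Both lines end up in the same place, which is why `print` is usually all you need.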

Cool. But what about reading data from files? Or writing to them?

## 2. Reading data from a file

The file "sample_data_50.csv" contains the following headers:
id, first_name, country, age, gender

The first few rows look like this:
```
    id,first_name,country,age,gender
    1,Nicholas,Honduras,43,M
    2,Mildred,Indonesia,31,F
    3,Catherine,Malaysia,33,F
```
Columns are delineated by commas, so the data is said to be in comma-separated value or "csv" format. The data in this sample file is 50 lines of random data generated by http://www.mockaroo.com/.

### Basic Way to Open and Read a File

In [None]:
sample_filename = './sample_data_50.csv'

file_object = open(sample_filename, 'rb')  # create a file object using the open() function and set the
                                           # access mode to "read in binary mode"
    
for line in file_object:  # loop over the file object to read each line:
    print line
    
file_object.close()  # IMPORTANT!!!

### Note!!!
This is NOT the recommended way to read a file because the user has to remember to close the file when they're finished with it. All files are *supposed* to be closed upon termination of a script, but sometimes this doesn't happen. An unclosed file keeps holding on to an operating-system file handle, and if you were writing to it, data still sitting in the buffer may never reach the disk. This can cause some gnarly problems.


### The Pythonic Way

Use the **with** statement. There's a lot going on behind the scenes of **with**, but all you really need to know to use it is that it handles file opening and closing for you. 

No matter what happens after you open the file (say, your program crashes because of an error), **with** makes sure that the file is closed properly. That is, 99.999% of the time. ;)

In [None]:
sample_filename = './sample_data_50.csv'

with open(sample_filename, 'rb') as file_object:
    for line in file_object:
        print line
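
To see the "no matter what happens" claim in action, here is a small sketch that raises an error inside a **with** block and then checks whether the file was closed. It assumes Python 3 syntax, and the temporary file and the simulated error are made up for the demonstration.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
tmp_path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')
with open(tmp_path, 'w') as f:
    f.write('first line\nsecond line\n')

try:
    with open(tmp_path, 'r') as file_object:
        first_line = file_object.readline()
        raise ValueError('simulated crash while reading')
except ValueError:
    pass  # the error is caught here only so the demo can continue

# The with block closed the file even though an error was raised inside it.
print(file_object.closed)
```

The `closed` attribute is `True` after the block exits, whether it exits normally or through an exception.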

### Using the csv Module
The csv module provides a way to read in distinct values for each row and column of a csv file. You can define things like delimiters and quote characters, and handle different csv "dialects", including the one produced by Excel. In the next cell we will use the csv module to read in values from a csv file.

In [None]:
import csv
sample_filename = './sample_data_50.csv'

with open(sample_filename, 'rb') as csvfile:
    # Return an object that will iterate over each line in the file.
    reader = csv.reader(csvfile, delimiter=',')   
    
    # next() reads one line. Once a line has been read, it cannot be read again
    headers = next(reader)
    print headers
    for row in reader:  # read lines sequentially
        row_id, first_name, country, age, gender = row
        print row    
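
As a rough illustration of custom delimiters and quote characters, the sketch below parses semicolon-separated text in which single quotes protect a value that contains the delimiter. It assumes Python 3. The sample text is hypothetical (it is not taken from sample_data_50.csv), and `io.StringIO` stands in for a real file.

```python
import csv
import io

# Hypothetical sample: semicolon-delimited, with single quotes around
# a value that itself contains a semicolon.
raw_text = u"id;first_name;country\n1;Nicholas;'Honduras; Central America'\n"

reader = csv.reader(io.StringIO(raw_text), delimiter=';', quotechar="'")
rows = list(reader)

print(rows[0])  # the header row
print(rows[1])  # the quoted value stays in one piece
```

Without `quotechar="'"`, the second row would be split into four values instead of three.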

### Exercise 1: add the rows to a list

In [None]:
import csv
sample_filename = './sample_data_50.csv'
data = []

with open(sample_filename, 'rb') as csvfile:
    # Return an object that will iterate over each line in the file.
    reader = csv.reader(csvfile, delimiter=',')   
    
    # next() reads one line. Once a line has been read, it cannot be read again
    headers = next(reader)                        
    for row in reader:  # read lines sequentially
        row_id, first_name, country, age, gender = row
        data.append(row)
        
    print data

### Exercise 2: Add the rows to a dictionary

In [None]:
import csv
sample_filename = './sample_data_50.csv'
data = {}

with open(sample_filename, 'rb') as csvfile:
    # Return an object that will iterate over each line in the file.
    reader = csv.reader(csvfile, delimiter=',')   
    
    # next() reads one line. Once a line has been read, it cannot be read again
    headers = next(reader)
    print headers
    for row in reader:  # read lines sequentially
        row_id, first_name, country, age, gender = row
        data[row_id] = {                 ###### this will only work if the row_id is unique
            'name' : first_name,
            'country' : country,
            'age' : age,
            'gender' : gender
        }
        
    print data.keys()
    
    for key in data.keys():
        print key, data[key]
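
An alternative worth knowing about is `csv.DictReader`, which uses the header row as dictionary keys so you don't have to unpack each row by hand. This is only a sketch, assuming Python 3. To stay runnable without the file, it uses an inline copy of the first rows shown earlier, and the conversion of `age` to an integer is an added illustration.

```python
import csv
import io

# Inline copy of the first rows shown earlier, so the sketch runs without the file.
raw_text = u"""id,first_name,country,age,gender
1,Nicholas,Honduras,43,M
2,Mildred,Indonesia,31,F
3,Catherine,Malaysia,33,F
"""

data = {}
for row in csv.DictReader(io.StringIO(raw_text)):
    row_id = row.pop('id')         # use the id column as the key (must be unique)
    row['age'] = int(row['age'])   # csv values are strings; convert when needed
    data[row_id] = dict(row)

print(data['2']['first_name'])
```

Note that the keys here come straight from the header, so the name is stored under `first_name` rather than `name`.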
    


## 3. Writing data to a file

Next, let's try writing some data to a file.

### Basic way to open and write to a file

In [None]:
output_filename = './test_output.csv'
sample = ('the_answer', 42)  # a tuple
s = str(sample)  # convert the tuple to a string

# The 'w' indicates we are opening a file that we will write stuff to
output_file = open(output_filename, 'w')   
output_file.write(s)
output_file.close()

Again, this is **not** the recommended way of writing to a file. Reasons for this include:
+ the user must remember to convert all data to be written into a string
+ the user must remember to close the file afterwards
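
Both pitfalls can be seen in a short sketch. It assumes Python 3 syntax and writes to a temporary location (a made-up path, not the notebook's test_output.csv). Passing the tuple straight to `write()` fails, and **with** takes care of closing the file.

```python
import os
import tempfile

output_filename = os.path.join(tempfile.mkdtemp(), 'test_output.txt')
sample = ('the_answer', 42)

with open(output_filename, 'w') as output_file:
    try:
        output_file.write(sample)  # not a string, so this raises TypeError
    except TypeError:
        print('write() rejected the tuple')
    # Build the string explicitly instead.
    output_file.write('{},{}\n'.format(*sample))

# The file is already closed here; reopen it to check the contents.
with open(output_filename) as check_file:
    contents = check_file.read()
print(contents)
```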

### The Pythonic Way Using the csv Module

In [21]:
output_filename = './test_output.csv'

header = ['col0', 'col1']  # one column name per value written in each row

start_num = 1000
count = 0
with open(output_filename, 'wb') as outcsvfile:  # Python 2's csv module expects binary mode
    mywriter = csv.writer(outcsvfile, delimiter=',')
    mywriter.writerow(header)
    
    while start_num >= 0:
        mywriter.writerow([start_num, count])
        start_num -= 1
        count += 1

In [None]:
input_file = './test_output.csv'

with open(input_file, 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',') 
    for row in reader:
        print row

What does this little piece of code do?


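*Note to Windows users*: binary mode matters on Windows because text mode translates line endings. To open a file for both writing and reading in binary mode, set the access mode to 'wb+'. Be aware that this empties the file if it already exists.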
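
Here is a minimal sketch of what 'wb+' allows: writing bytes and then reading them back through the same file object. It assumes Python 3, where binary mode works with `bytes`, and the file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'binary_demo.bin')

with open(path, 'wb+') as f:
    f.write(b'col0,col1\r\n1000,0\r\n')  # binary mode takes bytes, written as-is
    f.seek(0)                            # rewind before reading back
    raw = f.read()

print(raw)
```

The `\r\n` line endings come back exactly as written because binary mode performs no newline translation.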

### Trying to read and write in the same loop? No problem.

Below is a template of what this process might look like:
```
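with open(newfile, 'wb') as outfile, open(oldfile, 'rb') as infile:
    for name in infile:
        name = name.rstrip('\r\n')  # drop the trailing newline before appending text
        if name.startswith('C') or name.startswith('G'):
            line = name + ' - Truly a great person!\n'
        else:
            line = name + '\n'  # copy all other names unchanged
        outfile.write(line)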
```
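
To try the idea end to end, here is one way the template could be filled in. It is a sketch under a few assumptions: Python 3 with text mode, temporary files with made-up names, a short made-up list of names, and the guess that names not starting with C or G should be copied through unchanged.

```python
import os
import tempfile

work_dir = tempfile.mkdtemp()
oldfile = os.path.join(work_dir, 'names.txt')            # hypothetical input file
newfile = os.path.join(work_dir, 'names_annotated.txt')  # hypothetical output file

# Create some input to work with.
with open(oldfile, 'w') as f:
    f.write('Catherine\nMildred\nGeorge\n')

with open(newfile, 'w') as outfile, open(oldfile, 'r') as infile:
    for name in infile:
        name = name.rstrip('\n')          # drop the trailing newline
        if name.startswith(('C', 'G')):   # startswith accepts a tuple of prefixes
            line = name + ' - Truly a great person!\n'
        else:
            line = name + '\n'            # pass other names through unchanged
        outfile.write(line)

with open(newfile) as f:
    result = f.read()
print(result)
```

Both files are closed as soon as the **with** block ends, even though they were opened in a single statement.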

## 4. User input

Let's say your Python script needs some user input to proceed. How can we capture keyboard input in Python?

Note: There's a difference between how this works in Python 2.7 and 3.0 (https://docs.python.org/3/whatsnew/3.0.html). Be sure to use the method that corresponds to your version of Python :)

In [None]:
s = raw_input('--> ')
print s

In [None]:
print 'You entered: ' + s  # anything you enter will be considered a string

#### A note about Python 3

To capture user input in Python 3.0, this is what you'd do:

    s = input()

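In Python 2.7, `input()` also exists, but it evaluates whatever you type as a Python expression, so entering plain text will usually throw an error.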
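
If a script has to run under both versions, one possible approach is to pick the right built-in once and use it everywhere. In the sketch below, `read_input`, `ask_age` and the `reader` parameter are hypothetical names chosen for illustration. The stand-in reader at the end lets the example run without anyone typing.

```python
import sys

# Pick the string-returning input function for the running interpreter.
if sys.version_info[0] < 3:
    read_input = raw_input  # Python 2: raw_input() returns a string
else:
    read_input = input      # Python 3: input() returns a string


def ask_age(prompt='Age --> ', reader=read_input):
    """Ask for an age and convert the typed string to an int."""
    answer = reader(prompt)
    return int(answer.strip())


# Interactive use would simply be: age = ask_age()
# Non-interactive demo with a stand-in reader that "types" 42:
print(ask_age(reader=lambda prompt: ' 42 '))
```

Because everything typed comes back as a string, the explicit `int()` conversion is needed before doing arithmetic with the value.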

## 5. Pandas

Pandas is a library that's useful for analyzing medium-sized (hundreds of thousands of rows) datasets. Its primary data abstraction is the `DataFrame`, which is very similar to R's `data.frame`. Check out the tutorials here: http://pandas.pydata.org/pandas-docs/version/0.15.2/tutorials.html. If you are interested in learning more about pandas, also check out the notebook python_6_final_project_with_pandas.ipynb.