In this lecture we will examine different ways to write to and read from files. This can be useful when you have thousands or millions of numbers that you need to read from a file or store somewhere for later analysis. 

# Open (write)

In [None]:
# The open function is built into Python.
#
# Usage:
# file = open('FILENAME','MODE')
# 
# where MODE can be 'r' for read, 'w' for write (erases any existing file with that name), or 'a' for append.
#
# You can then write strings to a file. 
# 

# Open the file
outfile = open('simpledata.dat','w')

# Write some data. Make sure you have the end-of-line symbol '\n' at the end of each line.
output = "This is data you might be interested in\n"
outfile.write(output)

# Write some more data
output = "1 2 3 4 5 6 7 8 9\n"
outfile.write(output)

# Close the file
outfile.close()

# Now go look at this file from the main Jupyter file listing.
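
The same pattern works for numbers, as long as you convert them to strings first. The sketch below also uses a `with` block, which closes the file for you, and shows the 'a' (append) mode mentioned above. The file name `numbers.dat`, the temporary directory, and the values are made up for illustration.

```python
import os
import tempfile

# Hypothetical file name, placed in a temporary directory so nothing is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'numbers.dat')

values = [1.5, 2.25, 3.0]

# 'w' mode creates the file (or erases an existing one).
# The with block closes the file automatically when it ends.
with open(path, 'w') as outfile:
    for i, v in enumerate(values):
        # write() only accepts strings, so format the numbers first.
        outfile.write(f"{i} {v}\n")

# 'a' mode adds to the end of the file instead of erasing it.
with open(path, 'a') as outfile:
    outfile.write("3 4.75\n")

# Read everything back as one string to check the result.
with open(path, 'r') as infile:
    contents = infile.read()

print(contents)
```

Using `with` is optional, but it means you cannot forget to call `close()`.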

# Open (read)
## Reading each line yourself

We can then read the file and print each line.


In [None]:
infile = open('simpledata.dat','r') # Open it in read mode

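# Loop over each line. Each line still ends with '\n' and print adds its own
# newline, so the output shows a blank line between the lines.

for line in infile:
    print(line)

infile.close() # Close the file when you are done with it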

In [None]:
# If you want to pull out information from each line, you can use the split function. 
# For example, pull out the second column of numbers from the file, data0.dat.
# Take a look at it from the file listing first. 

infile = open('data0.dat','r')

# Use the numbers list to hold the data you want. 
numbers = []
for line in infile:
    
    values = line.split() # Returns a list of strings
    print(values)
    
    numbers.append(int(values[1]))
    
infile.close() # Close the file when you are done with it

print(numbers)

# Note that the numbers list holds integers, not strings.
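
Real data files often contain blank lines or uneven spacing. As a minimal sketch, the lines below stand in for the contents of a file (they are invented, not taken from data0.dat) and show how `split()` copes with both.

```python
# Stand-in for the lines of a data file (hypothetical contents).
lines = [
    "10 200 3.5\n",
    "\n",                # a blank line
    "11 210 4.0\n",
    "12  220   4.5\n",   # extra spaces are fine: split() handles any whitespace
]

second_column = []
for line in lines:
    values = line.split()   # list of strings, with the '\n' already removed
    if not values:          # an empty list means the line was blank
        continue
    second_column.append(int(values[1]))

print(second_column)
```

Without the blank-line check, `values[1]` would raise an IndexError on the empty line.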

# Using the csv module

A fairly common format for data files is .csv (Comma-Separated Values).

Fortunately, Python's built-in csv module helps pull out the information.

In [None]:
import csv

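# Note the explicit use of the delimiter argument.
# The csv documentation recommends opening the file with newline=''.
infile = open('data1.csv','r',newline='')
rows = csv.reader(infile,delimiter=',')

# This takes care of the 'split'. Each row is a list of strings.
for row in rows:
    print(row)

infile.close() # Close the file when you are done with it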
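
The reader hands back every value as a string, so numeric data still needs converting. Here is a small sketch of that step. It uses `io.StringIO` as a stand-in for an open file, and the contents are made up rather than copied from data1.csv.

```python
import csv
import io

# io.StringIO behaves like an open text file; the contents are invented.
infile = io.StringIO("1,2.5,10\n2,3.5,20\n3,4.5,30\n")

table = []
for row in csv.reader(infile, delimiter=','):
    # Each row is a list of strings such as ['1', '2.5', '10'].
    table.append([float(value) for value in row])

print(table)

# Pull out one column (here the second) from the list of rows.
second = [row[1] for row in table]
print(second)
```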

# Using numpy.loadtxt

numpy has some very nice functions for reading and writing uniformly formatted data.

That is, data files where the columns are well-defined and each holds the same type of quantity (float, str, etc.). This is the case for many of the files you will work with.

* [numpy.loadtxt](http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt)
* [numpy.savetxt](http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html#numpy.savetxt)

These are generally my go-to for data files. 

In [None]:
import numpy as np

# skiprows. Use this if there are "header" rows that explain the data.
# unpack=True. Use this if you want to read the data out as columns, not rows. Usually this is what we want. 

columns = np.loadtxt('data1.csv',delimiter=',',skiprows=0,unpack=True)

print(columns[1])
print(type(columns[1]))

# Note that these are all read in as floats, the default option.
#
# Also, the columns are numpy arrays, not lists! This is very helpful!
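
`numpy.savetxt` is the counterpart for writing. The sketch below writes two arrays as columns and reads them back with `loadtxt`. The file name `roundtrip.csv`, the temporary directory, and the `time` and `height` values are assumptions made for the example.

```python
import os
import tempfile
import numpy as np

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'roundtrip.csv')

time = np.array([0.0, 1.0, 2.0, 3.0])
height = np.array([10.0, 8.5, 4.0, 0.5])

# column_stack puts the 1D arrays side by side as columns of a 2D array.
data = np.column_stack((time, height))

# The header is written as a first line starting with '# '.
np.savetxt(path, data, delimiter=',', header='time,height')

# loadtxt treats lines starting with '#' as comments by default,
# so skiprows is not needed for this header.
t, h = np.loadtxt(path, delimiter=',', unpack=True)

print(t)
print(h)
```

With `unpack=True`, each column comes back as its own array, so it can be assigned directly to a separate variable.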

In [None]:
# Let's try it with a different file. 
# Take a look at it first from the file listing. 

# Using demographics.csv file from 
# http://evc-cit.info/psych018/r_intro/r_intro5.html

# Uncomment the following and run it, just to see the error.
#columns = np.loadtxt('demographics.csv',delimiter=',',skiprows=1,unpack=True)

# Because some columns have letters (M or F), it cannot interpret them all as numbers. 
# So we need to tell loadtxt to first read in *everything* as a string and then we'll convert it afterwards.

columns = np.loadtxt('demographics.csv',delimiter=',',skiprows=1,unpack=True,dtype='str')

print(columns[1])

print()

# Make use of the astype() method of arrays. This doesn't change the array itself,
# but returns a copy of the array converted to a different type.

x = columns[1].astype('float')

print(x)
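
To see that `astype()` leaves the original array alone, here is a short sketch using a made-up array of strings.

```python
import numpy as np

# A small array of strings, like a column read with dtype='str'.
strings = np.array(['1.5', '2', '3.25'])

# astype returns a new array; the original is left untouched.
numbers = strings.astype(float)

print(strings, strings.dtype)
print(numbers, numbers.dtype)

# Arithmetic now works element by element on the converted copy.
doubled = numbers * 2
print(doubled)
```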

In [None]:
# You can even "slice up arrays" using conditionals on other arrays,
# so long as they are the same length.

gender = columns[0]
weight = columns[4].astype(float)

print(gender) 
print(weight)

In [None]:
# Make an array of booleans (ismale): True if the person is male, False otherwise.

ismale = gender=="M"
print(ismale)


In [None]:
# Now use it!

print("All weights")
print(weight)
print() 
print("Male weights")
print(weight[ismale])
print()
print("Female weights")
print(weight[~ismale]) # Female: ~ flips every True/False in the array
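
Boolean arrays can also be counted, inverted, and combined. The following sketch uses small invented `gender` and `weight` arrays (not the values from demographics.csv) so that it runs on its own.

```python
import numpy as np

# Invented data so the example is self-contained.
gender = np.array(['M', 'F', 'F', 'M', 'F'])
weight = np.array([180.0, 130.0, 125.0, 200.0, 145.0])

ismale = gender == 'M'

# True counts as 1, so sum() gives the number of males.
n_male = ismale.sum()

# ~ inverts the mask; mean() then averages only the selected entries.
male_mean = weight[ismale].mean()
female_mean = weight[~ismale].mean()

# Combine conditions with & (and) or | (or). The parentheses are required.
heavy_males = weight[ismale & (weight > 190)]

print(n_male, male_mean, female_mean)
print(heavy_males)
```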