# File Handling

Today we will learn how to read from and write to files on your computer using a Python script. Credit to Rochelle Terman.

In [None]:
## Import required libraries
import tweepy
import json

### Reading from a file

Reading a file requires three steps:

1. Opening the file
2. Reading the file
3. Closing the file

- In a Jupyter notebook, a leading exclamation point `!` runs the rest of the line as a shell (bash) command.
- The `touch` command creates an empty file if it does not already exist. Its argument is the name of the file to create.

In [None]:
!touch sample.txt

In [None]:
my_file = open("sample.txt", "r")
text = my_file.read()
my_file.close()

print("--" + text + "--")
print(len(text))

We see that when we create a new file using bash, it's empty. Let's try reading from a file with text in it -- for example, `example.txt`.

In [None]:
my_file = open("example.txt", "r")
text = my_file.read()
my_file.close()

print("--" + text + "--")
print(len(text))

- Calling `close()` by hand is easy to forget. The `with open` syntax closes the file for you automatically.
- The `'r'` indicates that you are reading the file, as opposed to, say, writing to it.

In [None]:
# better code
with open('example.txt', 'r') as my_file:
    text = my_file.read()
    
print("--" + text + "--")
print(len(text))

`with` keeps the file open for as long as the program is inside the indented block. Once the block ends, the file is closed and you can no longer read from it; you can only use what you saved to a variable.
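To see this behavior directly, here is a minimal sketch that checks the file's `closed` attribute inside and outside the `with` block. It writes its own scratch file (the name `scratch.txt` and the temporary folder are assumptions made for illustration) so it does not depend on `example.txt`.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not depend on example.txt
path = os.path.join(tempfile.mkdtemp(), "scratch.txt")
with open(path, "w") as f:
    f.write("hello\n")

with open(path, "r") as my_file:
    text = my_file.read()
    open_inside = not my_file.closed  # the file is still open inside the block

closed_after = my_file.closed  # the file is closed once the block ends

try:
    my_file.read()  # reading from a closed file raises ValueError
    read_failed = False
except ValueError:
    read_failed = True

print(repr(text), open_inside, closed_after, read_failed)
```

The variable `text` is still usable after the block because it holds a copy of the contents, not the file itself.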

### Reading a file as a list

- Very often we want to read in a file line by line, storing those lines as a list.
- To do that, we can use the `for line in my_file` syntax:

In [None]:
stored = []
with open('example.txt', 'r') as my_file:
    for line in my_file:
        stored.append(line)

In [None]:
stored

- We can use the `strip` [method](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#method) to get rid of the line breaks at the end of each line.

In [None]:
stored = []
with open('example.txt', 'r') as my_file:
    for line in my_file:
        stored.append(line.strip())

In [None]:
stored
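As a possible alternative to the loop above, the sketch below builds the same kind of list in one line each, using a list comprehension and `read().splitlines()`. The scratch file and its three lines are made up for illustration. Note that `strip()` removes all leading and trailing whitespace, while `rstrip("\n")` removes only the trailing line break.

```python
import os
import tempfile

# Hypothetical scratch file with three made-up lines
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

# Option 1: list comprehension, dropping only the trailing line break
with open(path, "r") as my_file:
    stripped = [line.rstrip("\n") for line in my_file]

# Option 2: read everything at once and split on line breaks
with open(path, "r") as my_file:
    split = my_file.read().splitlines()

print(stripped)
print(split)
```

Both options read the whole file into memory, which is fine for small files like the ones in this lesson.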

### Writing to a file

We can use the `with open` syntax for writing files as well. The `'w'` mode creates the file if it does not exist and overwrites it if it does.

In [None]:
# this is okay...
new_file = open("example2.txt", "w")
bees = ['bears', 'beets', 'Battlestar Galactica']
for i in bees:
    new_file.write(i + '\n')
new_file.close()

In [None]:
# but this is better...
bees = ['bears', 'beets', 'Battlestar Galactica']
with open('example2.txt', 'w') as new_file:
    for i in bees:
        new_file.write(i + '\n')

In [None]:
!cat example2.txt

In [None]:
# 'a' opens the file in append mode: new lines are added to the end instead of overwriting the file
bees = ['bears', 'beets', 'Battlestar Galactica']
with open('example2.txt', 'a') as new_file:
    for i in bees:
        new_file.write(i + '\n')

In [None]:
!cat example2.txt
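The difference between the two modes used above can be checked with a small sketch: `'w'` replaces whatever the file contained, while `'a'` adds to the end. The scratch file name is an assumption for illustration, so `example2.txt` is left untouched.

```python
import os
import tempfile

# Hypothetical scratch file so example2.txt is not modified
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:
    f.write("bears\n")

# Opening with 'w' again discards the previous contents
with open(path, "w") as f:
    f.write("beets\n")

with open(path, "r") as f:
    after_write = f.read()

# Opening with 'a' keeps the contents and writes at the end
with open(path, "a") as f:
    f.write("Battlestar Galactica\n")

with open(path, "r") as f:
    after_append = f.read()

print(repr(after_write))
print(repr(after_append))
```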

### Using the CSV Module

A common task in programming is reading a CSV file.
- In Python, a common way to do that is to read a CSV as a list of dictionaries.
- For this, we use the `csv` module.

In [None]:
import csv

In [None]:
# read csv and read into a list of dictionaries
capitals = [] # make empty list
with open('capitals.csv', 'r', newline='') as csvfile: # open file (the csv docs recommend newline='')
    reader = csv.DictReader(csvfile) # create a reader
    for row in reader: # loop through rows
        capitals.append(row) # append each row to the list

In [None]:
capitals[:5]
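One detail worth knowing is that `csv.DictReader` returns every value as a string, including numbers. The sketch below illustrates this with a tiny in-memory stand-in for a CSV file; the two rows are sample data made up for this example, not the contents of `capitals.csv`.

```python
import csv
import io

# Made-up sample data standing in for a CSV file on disk
sample = io.StringIO(
    "Country,Capital,Latitude,Longitude\n"
    "France,Paris,48.86,2.35\n"
    "Japan,Tokyo,35.68,139.69\n"
)

rows = list(csv.DictReader(sample))  # one dictionary per data row
first = rows[0]

print(first["Capital"])         # look up a value by its column name
print(type(first["Latitude"]))  # values come back as strings

latitude = float(first["Latitude"])  # convert explicitly when a number is needed
print(latitude)
```

The header row supplies the dictionary keys, so it does not appear in `rows`.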

- Writing a list of dictionaries as a CSV is similar:

In [None]:
print(len(capitals))

# get the keys in each dictionary
keys = capitals[0].keys()  # every row has the same keys, so the first row is enough
print(keys)
keys = list(keys)
print(keys)

In [None]:
# write rows
with open('capitals2.csv', 'w', newline='') as output_file: # newline='' avoids blank rows on Windows
    dict_writer = csv.DictWriter(output_file, ['Country', 'Capital', 'Latitude', 'Longitude'])
    dict_writer.writeheader()
    dict_writer.writerows(capitals)

### Challenge 1: Read in a list

The file `counties.txt` has a column of counties in California. Read the data into a list called `counties`.

### Challenge 2: Writing a CSV file

Below is a list of dictionaries representing US states. Write this [object](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#object) as a CSV file called `states.csv`.

In [None]:
states = [{'state': 'Ohio', 'population': 11.6, 'year in union': 1803, 'state bird': 'Northern cardinal', 'capital': 'Columbus'},
          {'state': 'Michigan', 'population': 9.9, 'year in union': 1837, 'capital': 'Lansing'},
          {'state': 'California', 'population': 39.1, 'year in union': 1850, 'state bird': 'California quail', 'capital': 'Sacramento'},
          {'state': 'Florida', 'population': 20.2, 'year in union': 1845, 'capital': 'Tallahassee'},
          {'state': 'Alabama', 'population': 4.9, 'year in union': 1819, 'capital': 'Montgomery'}]

In [None]:
keys = []

for state in states:
    for key in state.keys():
        if key not in keys:
            keys.append(key)

print(keys)
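The nested loop above can also be written more compactly. The following is a sketch of one possible shortcut, relying on the fact that dictionaries keep insertion order (Python 3.7 and later) and cannot hold duplicate keys. The two short rows are a trimmed-down stand-in for the `states` list.

```python
# Trimmed-down stand-in for the states list above
rows = [
    {"state": "Ohio", "state bird": "Northern cardinal", "capital": "Columbus"},
    {"state": "Michigan", "capital": "Lansing"},
]

# dict.fromkeys keeps the first occurrence of each key, in order
keys = list(dict.fromkeys(key for row in rows for key in row))

print(keys)
```

Unlike a `set`, this keeps the columns in the order they were first seen, which makes the resulting CSV header predictable.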

### Challenge 3: Writing Twitter API data to a CSV

In [None]:
## NOTE: Better to use your own keys and tokens!!
## Our access key, mentioned above
consumer_key = 'Q8kC59z8t8T7CCtIErEGFzAce'
## Our signature, also given upon app creation
consumer_secret = '24bbPpWfjjDKpp0DpIhsBj4q8tUhPQ3DoAf2UWFoN4NxIJ19Ja'
## Our access token, generated upon request
access_token = '719722984693448704-lGVe8IEmjzpd8RZrCBoYSMug5uoqUkP'
## Our secret access token, also generated upon request
access_token_secret = 'LrdtfdFSKc3gbRFiFNJ1wZXQNYEVlOobsEGffRECWpLNG'

## Set of Tweepy authorization commands
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

In [None]:
# Search for tweets containing 'hillary' or 'clinton' together with a
# positive emoticon. The query is URL-encoded: %20 is a space and
# %3A%29 is ":)".
query1 = "hillary%20OR%20clinton%20%3A%29"

# Search for tweets containing 'donald' or 'trump' together with a
# positive emoticon
query2 = "donald%20OR%20trump%20%3A%29"

# Note: Tweepy 4.0 renamed api.search to api.search_tweets
results1 = api.search(q=query1)
results2 = api.search(q=query2)

*Remember*: in order to write a set of dictionaries to a CSV file, we will need a list of **all** keys found in any of the dictionaries, and a list of the dictionaries.
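Here is a minimal sketch of why the full list of keys matters. `csv.DictWriter` fills in any key a row is missing with its `restval` value (an empty string by default), but by default it raises a `ValueError` if a row contains a key that is not in the field names. The rows and the `"NA"` placeholder are made up for illustration, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io

# Made-up rows: each one is missing a different key
rows = [
    {"name": "a", "size": 1},
    {"name": "b", "color": "red"},
]
fieldnames = ["name", "size", "color"]  # every key found in any row

buffer = io.StringIO()  # in-memory stand-in for an output file
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="NA")
writer.writeheader()
writer.writerows(rows)

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

Choosing a visible placeholder such as `"NA"` makes missing values easier to spot later than the default empty string, though either works.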

In [None]:
'''
Things to remember:
- results1 and results2 are lists.
- Each item in lists results1 and results2 is a Twitter status object, which
  has a _json attribute.
- This _json attribute can be accessed from the status using "dot notation"
- This _json attribute can be used as a dictionary
- We also need a list of keys *without duplicates* in order to write to a
  CSV file.
'''

# Your variables here are:
## "keys1": a list of keys for the first set of statuses
## "lst_1": a list of _json dictionary objects
keys1 = []
lst_1 = []
for i in range(len(results1)):
    status = results1[i]
    dictionary = status._json # access this using dot notation!
    lst_1.append(dictionary) # function for adding to a list
    for key in dictionary.keys():
        if key not in keys1:
            keys1.append(key)
            
print(keys1)
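# Alternative version of the same loop, iterating over the statuses directly.
# Reset both variables so lst_1 does not end up holding every status twice.
keys1 = []
lst_1 = []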

for status in results1:
    dictionary = status._json
    lst_1.append(dictionary)
    for key in dictionary.keys():
        if key not in keys1:
            keys1.append(key)

print(keys1)

# Your variables here are:
## "keys2": a list of keys for the second set of statuses
## "lst_2": a list of _json dictionary objects
keys2 = []
lst_2 = []
for i in range(len(results2)):
    status = results2[i]
    dictionary = status._json
    lst_2.append(dictionary)
    for key in dictionary.keys():
        if key not in keys2:
            keys2.append(key)

In [None]:
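# write rows for the first query
# newline='' avoids blank rows on Windows; utf-8 handles non-ASCII tweet text
with open('query_results1.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, keys1)
    dict_writer.writeheader()
    dict_writer.writerows(lst_1)

# write rows for the second query
with open('query_results2.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, keys2)
    dict_writer.writeheader()
    dict_writer.writerows(lst_2)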