# Reading Data from Files
#### Introduction to Programming with Python

## Why use files in Python?

Often, your program needs to work with data stored in files; you don't want to make the user type in a whole bunch of values every time they run it.

Examples:
* analyzing weather data
* generating payroll
* saving progress and returning to it later (like in a word processor)

## Opening files in Python

Python provides a built-in `open()` function, which returns a file object with a scary-looking type name like `_io.TextIOWrapper`.

In [1]:
# A relative name like "gettysburg.txt" is looked up in the current
# working directory, which is often (but not always) the folder of your .py file
gettysburg_file = open("gettysburg.txt")
type(gettysburg_file)

_io.TextIOWrapper

<center>
<div>
<img src="images/gettysburg.png" width="400"/>
</div>
</center>


For our purposes, just think of `gettysburg_file` as a variable that refers to a _file object_.

## Making sure your files are in the right place

<center>
<p>
<div>
<img src="images/file_setup.png" width="700"/>
</div>
</p>
</center>
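If `open()` raises a `FileNotFoundError`, it helps to check where Python is actually looking. A relative file name is resolved against the current working directory, so the minimal sketch below (using the standard-library `pathlib` module) reports that full path and whether the file exists there. The helper name `check_file_location` is hypothetical, made up for illustration.

```python
from pathlib import Path


def check_file_location(filename):
    """Hypothetical helper: return (full path Python will try, whether it exists)."""
    # Relative names are joined onto the current working directory,
    # not necessarily the folder that contains your .py file.
    full_path = Path.cwd() / filename
    return full_path, full_path.exists()


location, found = check_file_location("gettysburg.txt")
print("Python will look for the file at:", location)
if not found:
    print("Not found: move the file there or run your program from its folder.")
```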

## Reading data from files

File objects in Python provide several methods for reading data:
* `read()` reads the whole file into one big string
* `readline()` reads the next line of the file into a string (including its trailing newline `\n`)
* `readlines()` reads all the lines of the file into a list of strings
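Reading advances a file's current position, so each of these methods picks up where the previous read stopped; opening the file again starts over from the beginning. The sketch below assumes `io.StringIO` as a stand-in for an opened text file (it supports the same reading methods), holding just the first two lines of the speech so it runs without `gettysburg.txt`.

```python
import io

# io.StringIO behaves like an already-opened text file.
sample_file = io.StringIO(
    "Fourscore and seven years ago our fathers brought forth on\n"
    "this continent a new nation, conceived in liberty and\n"
)

first = sample_file.readline()  # reads up to and including the first '\n'
rest = sample_file.read()       # reads everything after the current position
leftover = sample_file.read()   # nothing is left, so this returns ''

print(repr(first))
print(repr(rest))
print(repr(leftover))
```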

In [2]:
gettysburg_file = open("gettysburg.txt")
file_contents = gettysburg_file.read()
gettysburg_file.close()  # we're done reading, so close the file
print(file_contents)

Fourscore and seven years ago our fathers brought forth on
this continent a new nation, conceived in liberty and
dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether 
that nation or any nation so conceived and so dedicated can 
long endure. We are met on a great battlefield of that war. 
We have come to dedicate a portion of it as a final resting 
place for those who died here that the nation might live. 
This we may, in all propriety do. But in a larger sense, we 
cannot dedicate, we cannot consecrate, we cannot hallow this 
ground. The brave men, living and dead who struggled here 
have hallowed it far above our poor power to add or detract. 
The world will little note nor long remember what we say 
here, but it can never forget what they did here.

It is rather for us the living, we here be dedicated to the 
great task remaining before us--that from these honored dead 
we take increased devotion to that cause for whic

In [3]:
gettysburg_file = open("gettysburg.txt")
firstline = gettysburg_file.readline()   # reads the first line
secondline = gettysburg_file.readline()  # continues where the last read stopped
gettysburg_file.close()
print(secondline)

this continent a new nation, conceived in liberty and
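The extra blank line in that output appears because the string returned by `readline()` keeps its trailing `\n`, and `print()` adds another newline of its own by default. This sketch hard-codes that line so it runs on its own, uses `repr()` to make the hidden newline visible, and passes `end=""` to `print()` to suppress the second newline.

```python
# A line read with readline() keeps its newline character at the end.
secondline = "this continent a new nation, conceived in liberty and\n"

print(repr(secondline))   # repr() shows the hidden '\n' explicitly

# print() normally adds its own '\n' after the text; end="" turns that off,
# so this prints the line without an extra blank line after it.
print(secondline, end="")
```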



In [4]:
gettysburg_file = open("gettysburg.txt")
contents_as_list = gettysburg_file.readlines()
gettysburg_file.close()
print(contents_as_list[8])  # list positions start at 0, so this is the ninth line

place for those who died here that the nation might live. 
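List positions start at 0, so `contents_as_list[8]` is the ninth line of the file. Blank lines between paragraphs count as lines too: each one is stored as the string `'\n'`. The sketch below copies the first five elements that `readlines()` returns for `gettysburg.txt` and uses `enumerate()` to print each with its index.

```python
# The first five lines of gettysburg.txt, as readlines() returns them
first_lines = [
    'Fourscore and seven years ago our fathers brought forth on\n',
    'this continent a new nation, conceived in liberty and\n',
    'dedicated to the proposition that all men are created equal.\n',
    '\n',
    'Now we are engaged in a great civil war, testing whether \n',
]

# enumerate() pairs each element with its index, counting from 0
for index, line in enumerate(first_lines):
    print(index, repr(line))
```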



## Opening files using `with` statements

Opening a file with a `with` statement gives you the same file object, but Python also closes the file automatically as soon as the indented block finishes, even if an error occurs inside it.

In [5]:
with open("gettysburg.txt") as gettysburg_file:
    
    gettysburg_text = gettysburg_file.readlines()
    
    print(gettysburg_text)
    
print("All done with the file.")

['Fourscore and seven years ago our fathers brought forth on\n', 'this continent a new nation, conceived in liberty and\n', 'dedicated to the proposition that all men are created equal.\n', '\n', 'Now we are engaged in a great civil war, testing whether \n', 'that nation or any nation so conceived and so dedicated can \n', 'long endure. We are met on a great battlefield of that war. \n', 'We have come to dedicate a portion of it as a final resting \n', 'place for those who died here that the nation might live. \n', 'This we may, in all propriety do. But in a larger sense, we \n', 'cannot dedicate, we cannot consecrate, we cannot hallow this \n', 'ground. The brave men, living and dead who struggled here \n', 'have hallowed it far above our poor power to add or detract. \n', 'The world will little note nor long remember what we say \n', 'here, but it can never forget what they did here.\n', '\n', 'It is rather for us the living, we here be dedicated to the \n', 'great task remaining bef
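You can watch the automatic closing happen by checking a file object's `closed` attribute inside and after the `with` block. This sketch again assumes `io.StringIO` as a stand-in for a real file so it runs without `gettysburg.txt`; a file opened with `open()` behaves the same way here.

```python
import io

# io.StringIO stands in for a real text file in this sketch.
with io.StringIO("line one\nline two\n") as sample_file:
    print("inside the block, closed is", sample_file.closed)   # False
    sample_lines = sample_file.readlines()

# Leaving the indented block closed the file for us.
print("after the block, closed is", sample_file.closed)        # True
print(sample_lines)
```

The data read inside the block (`sample_lines`) is still available afterwards; only the file itself is closed.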

## Analyzing Baby Names Example

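Suppose we want to analyze baby name popularity, and we have a file listing the most popular male baby names of the 2010s, one name per line from most to least popular, using data from https://www.ssa.gov/oact/babynames/decades/names2010s.html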

<center>
<div>
<img src="images/babynamefile.png" width="400"/>
</div>
</center>

In [7]:
with open("top_male_baby_names_2010s.txt") as male_names_file:
    male_names = male_names_file.readlines()
    
    print(male_names)

['Noah\n', 'Liam\n', 'Jacob\n', 'William\n', 'Mason\n', 'Ethan\n', 'Michael\n', 'Alexander\n', 'James\n', 'Elijah\n', 'Benjamin\n', 'Daniel\n', 'Aiden\n', 'Logan\n', 'Jayden\n', 'Matthew\n', 'Lucas\n', 'David\n', 'Jackson\n', 'Joseph\n', 'Anthony\n', 'Samuel\n', 'Joshua\n', 'Gabriel\n', 'Andrew\n', 'John\n', 'Christopher\n', 'Oliver\n', 'Dylan\n', 'Carter\n', 'Isaac\n', 'Luke\n', 'Henry\n', 'Owen\n', 'Ryan\n', 'Nathan\n', 'Wyatt\n', 'Caleb\n', 'Sebastian\n', 'Jack\n', 'Christian\n', 'Jonathan\n', 'Julian\n', 'Landon\n', 'Levi\n', 'Isaiah\n', 'Hunter\n', 'Aaron\n', 'Charles\n', 'Thomas\n', 'Eli\n', 'Jaxon\n', 'Connor\n', 'Nicholas\n', 'Jeremiah\n', 'Grayson\n', 'Cameron\n', 'Brayden\n', 'Adrian\n', 'Evan\n', 'Jordan\n', 'Josiah\n', 'Angel\n', 'Robert\n', 'Gavin\n', 'Tyler\n', 'Austin\n', 'Colton\n', 'Jose\n', 'Dominic\n', 'Brandon\n', 'Ian\n', 'Lincoln\n', 'Hudson\n', 'Kevin\n', 'Zachary\n', 'Adam\n', 'Mateo\n', 'Jason\n', 'Chase\n', 'Nolan\n', 'Ayden\n', 'Cooper\n', 'Parker\n', 'Xavier

Notice that every string ends with the newline character `\n`. To remove it, you can use the `rstrip()` string method, which returns a copy of the string with any trailing whitespace (spaces, tabs, and newlines) removed.

In [8]:
name = "Eric\n"
name

'Eric\n'

In [9]:
name.rstrip()

'Eric'
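Because `rstrip()` with no argument removes all trailing whitespace, it also drops trailing spaces, such as the ones at the end of several lines of `gettysburg.txt`. If you want to remove only the newline, pass the characters to strip as an argument. This sketch uses one line of the speech to compare the two; note that neither call changes the original string.

```python
line = "Now we are engaged in a great civil war, testing whether \n"

print(repr(line.rstrip()))      # removes the '\n' and the trailing space
print(repr(line.rstrip("\n")))  # removes only the newline
print(repr(line))               # the original string is unchanged
```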

## Looping through the list and removing all the newlines

In [10]:
with open("top_male_baby_names_2010s.txt") as male_names_file:
    
    male_names = male_names_file.readlines()
    
    name_counter = 0
    
    while name_counter < len(male_names):
        male_names[name_counter] = male_names[name_counter].rstrip()
        
        name_counter += 1
     
 
    print(male_names)

['Noah', 'Liam', 'Jacob', 'William', 'Mason', 'Ethan', 'Michael', 'Alexander', 'James', 'Elijah', 'Benjamin', 'Daniel', 'Aiden', 'Logan', 'Jayden', 'Matthew', 'Lucas', 'David', 'Jackson', 'Joseph', 'Anthony', 'Samuel', 'Joshua', 'Gabriel', 'Andrew', 'John', 'Christopher', 'Oliver', 'Dylan', 'Carter', 'Isaac', 'Luke', 'Henry', 'Owen', 'Ryan', 'Nathan', 'Wyatt', 'Caleb', 'Sebastian', 'Jack', 'Christian', 'Jonathan', 'Julian', 'Landon', 'Levi', 'Isaiah', 'Hunter', 'Aaron', 'Charles', 'Thomas', 'Eli', 'Jaxon', 'Connor', 'Nicholas', 'Jeremiah', 'Grayson', 'Cameron', 'Brayden', 'Adrian', 'Evan', 'Jordan', 'Josiah', 'Angel', 'Robert', 'Gavin', 'Tyler', 'Austin', 'Colton', 'Jose', 'Dominic', 'Brandon', 'Ian', 'Lincoln', 'Hudson', 'Kevin', 'Zachary', 'Adam', 'Mateo', 'Jason', 'Chase', 'Nolan', 'Ayden', 'Cooper', 'Parker', 'Xavier', 'Asher', 'Carson', 'Jace', 'Easton', 'Justin', 'Leo', 'Bentley', 'Jaxson', 'Nathaniel', 'Blake', 'Elias', 'Theodore', 'Kayden', 'Luis', 'Tristan', 'Ezra', 'Bryson', 
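The `while` loop above changes each element of `male_names` in place. An alternative is a `for` loop that builds a new, cleaned-up list, which avoids managing a counter by hand. `strip_newlines` is a hypothetical helper name used only for this sketch; the list comprehension `[line.rstrip() for line in lines]` would produce the same result in one line.

```python
def strip_newlines(lines):
    """Hypothetical helper: return a new list with each string's trailing whitespace removed."""
    cleaned = []
    for line in lines:
        cleaned.append(line.rstrip())
    return cleaned


raw_names = ['Noah\n', 'Liam\n', 'Jacob\n']
print(strip_newlines(raw_names))  # ['Noah', 'Liam', 'Jacob']
```

Unlike the `while` version, this leaves the original list untouched, which can be handy if you still need the raw lines.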

## Now we can use Python to ask interesting questions of our data

Here's a program that lets the user check how popular a name was.

In [11]:
name_to_search = input("Enter a name: ")
if name_to_search in male_names:
    position = male_names.index(name_to_search)
    print(name_to_search,"was the number",(position+1),"most popular male name in the 2010s.")
else:
    print(name_to_search,"was not a popular name in the 2010s.")

Enter a name: Ryker
Ryker was the number 175 most popular male name in the 2010s.
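The program above compares strings exactly, so typing `ryker` or ` Ryker ` would report the name as not popular, since `in` and `index()` are case-sensitive. The sketch below is one possible way to make the lookup more forgiving; `name_rank` is a hypothetical helper, and it uses only the first five names from the list so it runs on its own.

```python
def name_rank(name, names):
    """Hypothetical helper: return the 1-based rank of name in names, or None.

    Ignores capitalization and surrounding spaces, so "ryker", " Ryker "
    and "RYKER" all match "Ryker".
    """
    target = name.strip().lower()
    for position, candidate in enumerate(names):
        if candidate.lower() == target:
            return position + 1  # list positions start at 0, ranks at 1
    return None


top_names = ['Noah', 'Liam', 'Jacob', 'William', 'Mason']
print(name_rank("jacob", top_names))   # 3
print(name_rank("Ryker", top_names))   # None (not in this short list)
```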
