<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Reading and writing to files
---




### Lesson objectives

By the end of this lesson, you should be able to:

1. Use input from users in your code
2. Create Python scripts to read and write to files
3. Use import statements to access Python libraries
4. Use documentation to utilize module functions


## Intro

---

From what we've covered so far this week, remember that functions:
- start with `def`, followed by the name of the function.
- take inputs (or arguments).
- return outputs.
- use `return` to send a value back to the code that called them.
- are used frequently to make coding more efficient.

The general **function syntax** is:
![](./imgs/python-functions.png)  
Taken from [Learn by Example - Python Functions](https://www.learnbyexample.org/python-functions/). 
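As a quick refresher, the points above can be sketched with one small function. The name `rectangle_area` and its parameters are made up for illustration only.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    area = width * height
    return area  # hand the result back to the caller


# Call the function and store the returned value
result = rectangle_area(3, 4)
print(result)  # 12
```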


## Putting it all together
---

Armed with the foundational building blocks of Python - 

- data types: floats, integers, strings, and booleans
- data structures: lists, dictionaries, sets, and tuples
- control flow: if statements and conditionals, for loops, and while loops
- functions: passing arguments through code to return some value

we can start creating genuinely useful scripts to automate specific tasks, like:

- Reading and writing txt and csv files
- Analyzing data stored in files
- Accepting user input to control scripts

## What's a Script?

---

One of the strongest use cases for Python is writing helpful scripts to automate work tasks or create helpful tools for yourself. Scripting will open up our ability to interact with files and provide meaningful input to our Python code; in other words, to start building useful things!


When people refer to scripts, they usually mean code that:
- Takes input that is subject to change
- Gives output based on the process that's been coded
- Reads existing files and/or writes to a file
- Performs a task or a series of tasks in succession

We have “performs a task” down! Let’s look at how we can build useful tools in Python by combining these other layers in our programs.

## User input and file input

---

Scripts can accept multiple different kinds of inputs. In some cases, you may want to pass a file to a script in order to analyze the contents. In other cases, you might want to build a script that takes instructions directly from a user. Python allows users to provide custom input to control what a script may be doing under the hood.

**Example:** Imagine you're building a script to report the weather in major cities across the globe. The scripts you write could make use of both user input and file input:

**User input:** your script might prompt the user with a question such as, "What city would you like to see the weather for?" The user could then enter the name of a specific city they're interested in. This allows the user to update their input dynamically as needed. 

**File input:** your script will need to get the weather data from somewhere! You could write it to pull a data file directly from a weather database and use that file as input in the functions you've built to analyze the data and build a report.

### Accepting input from the user

The function `input()` is built into Python. It prompts the user to type a response and returns that response as a string, which we can save in a variable. This function accepts a string that will be displayed to the user, which makes it a great place to tell the user what type of information you want them to input!

In [6]:
# Use input() to prompt the user for their name
# Include a descriptive message when you prompt them! 
# Save the input as a variable called name

name = input('What is your name? ')
# The script pauses until the user enters some input

# We can use this variable freely in the rest of our script 
print(f'Hello, {name}!')

What is your name?  Jonas


Hello, Jonas!
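One detail worth keeping in mind: `input()` always returns a string, even when the user types a number. The sketch below shows one possible way to turn typed text into an integer. The helper name `parse_age` is hypothetical, and the call to `input()` is left as a comment so the example runs without waiting for a user.

```python
def parse_age(raw_text):
    """Convert text typed by a user into an int, or None if it is not a whole number."""
    cleaned = raw_text.strip()
    try:
        return int(cleaned)
    except ValueError:
        return None


# In a notebook or script you might call it like this:
# age = parse_age(input('How old are you? '))

print(parse_age(' 42 '))   # 42
print(parse_age('forty'))  # None
```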


### Opening files in Python

Opening files can be done with another built-in function called `open()`. This will access the contents of a specific file from within a Python script. 

There are a few different "modes" that can be used when opening files with this function. We can control which mode we're in by passing an abbreviation to the `open()` function: 

|Abbreviation| Mode                                                                       |
|------------|----------------------------------------------------------------------------|
| 'r'        | Read-only mode (the default)                                               |
| 'w'        | Write mode (creates the file, or erases its contents if it already exists) |
| 'a'        | Append new contents to the end of the file                                 |
| '+'        | Add to any other mode to include read and write                            |

When accessing files, you should always **specify the mode** in which the file should be opened. Here's an example of how this might look:

```python
log = open("file.txt", "w")
```

You must also be sure to **close the file** after working with it. Use the following pattern to let Python do that for you automatically:

```python
with open("file.txt", "a") as log:
    log.write("All clear.")
```

When we want to open a file in Python, we have to tell it what the file is named. In the examples above, our file was named `file.txt`. Note that we had to **include the `.txt` extension**: the extension is part of the file's name, and Python looks for a file with exactly the name you give it!

Another thing to note is that we passed the file name as a string. Without quote marks, Python would try to interpret the name as code rather than text, so make sure you're using your quote marks! 

Let's try an example. Open the file named `to_do_list.txt`. In order to read the lines stored in our `.txt` file, we can use a method called `readlines()`:

In [2]:
# Open the file 
with open('to_do_list.txt', 'r') as file:
    # Save the lines in the file as a variable
    lines = file.readlines()
    # Loop through the lines and print them out 
    for line in lines:
        print(line)

Go to the store

Buy milk, eggs, bread 

Go to the plant shop 

Buy a new houseplant

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee

Grab a coffee





Once the cell above has executed, Python automatically closed our file. That's the job of the `with` statement: it closes the file as soon as the indented block finishes. In order to keep using the information stored in the file once the file is closed, we had to save the lines in the file to a variable (in this case, we called it `lines`). 

See what this looks like when we print it out:

In [3]:
lines

['Go to the store\n',
 'Buy milk, eggs, bread \n',
 'Go to the plant shop \n',
 'Buy a new houseplant\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n',
 'Grab a coffee\n']

Notice the '\n' included at the end of each item! These are *newline characters*. If you open the `.txt` file (you can do this by double-clicking the file name in the left-hand file browser in JupyterLab), you won't see any '\n' in the file. That's because a text editor displays this character as a line break rather than as visible text. 

These will show up again, so it's best to be familiar with ways to strip them from our data once we've loaded it into Python! Strings have a built-in method called `.strip()` which removes leading and trailing whitespace, including '\n'. If you pass it a string of characters as an argument, it removes those characters from both ends instead.

In [4]:
for item in lines:
    print(item.strip())

Go to the store
Buy milk, eggs, bread
Go to the plant shop
Buy a new houseplant
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
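The different ways of calling `.strip()` can be compared side by side. This is a small sketch on a made-up string; `.rstrip()` is the related string method that only trims the right-hand end.

```python
raw = '  Buy milk, eggs, bread \n'

both_ends = raw.strip()                 # removes whitespace (spaces, tabs, newlines) on both sides
newline_only = raw.rstrip('\n')         # removes only trailing newline characters
dashes_removed = '--done--'.strip('-')  # removes the given characters instead of whitespace

print(repr(both_ends))       # 'Buy milk, eggs, bread'
print(repr(newline_only))    # '  Buy milk, eggs, bread '
print(repr(dashes_removed))  # 'done'
```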


Clean our text data by stripping the new line characters and saving over each item with the clean version:

In [5]:
# Loop over each index and replace the item at that position with its stripped version

for i in range(len(lines)):
    lines[i] = lines[i].strip()
    
lines 

['Go to the store',
 'Buy milk, eggs, bread',
 'Go to the plant shop',
 'Buy a new houseplant',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee',
 'Grab a coffee']
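The same cleanup can also be written as a list comprehension, which builds a new list instead of changing the old one in place. The list `sample_lines` below is a shortened stand-in for the real file contents.

```python
sample_lines = ['Go to the store\n', 'Buy milk, eggs, bread \n', 'Grab a coffee\n']

# Build a new list holding the stripped version of every item
clean_lines = [line.strip() for line in sample_lines]

print(clean_lines)  # ['Go to the store', 'Buy milk, eggs, bread', 'Grab a coffee']
```

Which style to use is mostly a matter of taste; the loop over indexes makes the "save over each item" step more explicit.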

In order to write a new line in our file, you have to open it in modes "w" (write) or "a" (append): 

In [7]:
# Open the file 
with open('to_do_list.txt', 'a+') as file:
    # Write a new line with a new to-do item:
    file.write('Grab a coffee\n')
    
# Open the file again in read mode
with open('to_do_list.txt', 'r') as file:
    # Print each line to see your addition show up! 
    lines = file.readlines()
    for line in lines:
        print(line.strip())

Go to the store
Buy milk, eggs, bread
Go to the plant shop
Buy a new houseplant
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee
Grab a coffee


## Activity 3.3 So Much To Do

---

Reading code can be just as important as writing code! Now that we know how to use the `open()` and `input()` functions, let's try and decipher the following code:

Add a comment to each line of this Python script that explains what it does.

In [None]:
with open("to_do_list.txt", "a+") as file: # opens to_do_list.txt in append-and-read mode ("a+") and names the open file object 'file'
    print("Welcome to ToDoVille!") # prints a welcome message
    lines = file.readlines() # reads from the current position to the end; in "a+" mode the position starts at the end of the file, so this list is empty
        
    current_command = "" # creates a variable holding an empty string
    while(current_command != "q"): # keeps looping until the user enters 'q'
        current_command = input("Awaiting further commands. [q] to quit, [a] to add new item, [p] to print list: ") # prompts the user for a command and stores what they type
        
        if(current_command == "a"): # runs the indented block below if the user typed 'a'
            new_item = input("Type your todo below: ") # prompts the user for the new to-do item
            file.write(new_item + "\n") # writes the item to the end of the file, followed by a newline character
            
        elif(current_command == "p"): # runs the indented block below if the user typed 'p'
            file.seek(0) # moves the read position back to the start of the file
            lines = file.readlines() # reads every line into a list, including items added since the file was opened
            
            print("Here are your items:") # prints a heading above the list
            for line in lines: # loops over the list so each item prints on its own line instead of as one list
                print(line) # prints the line (its trailing '\n' plus print's own newline leaves a blank line between items)
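The trickiest part of the script is how `"a+"` mode and `file.seek(0)` interact. Here is a minimal sketch of that behavior on a throwaway file; the file name `todo_demo.txt` and the temporary folder are assumptions made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'todo_demo.txt')

# Start with a file that already holds one item
with open(path, 'w') as file:
    file.write('Go to the store\n')

with open(path, 'a+') as file:
    before_seek = file.readlines()   # position starts at the end, so nothing is read
    file.write('Buy milk\n')         # in append mode, writes always go to the end
    file.seek(0)                     # jump back to the beginning of the file
    after_seek = file.readlines()    # now every line is returned

print(before_seek)  # []
print(after_seek)   # ['Go to the store\n', 'Buy milk\n']
```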

## Introduction to modules in Python

---

Modules are collections of helpful Python code and functions we can use. Instead of reinventing the wheel, modules provide us with immediate benefits:

- Reliable, heavily tested code.
- Well-known patterns for easy collaboration.
- The ability to focus on your application’s higher-level needs. 

Throughout your career in data analysis and programming, you'll frequently use modules written by other developers just like you. They are free to use, and you can think of them as extensions of Python's functionality.

The syntax for importing a module looks like this: 

```python
import random
```

### Importing From the Standard Library

Some modules are so commonly used that they are bundled with Python itself.

To get a feel for modules, let’s explore the random module. In order to use a module, we first have to import it by including an import statement at the top of our code:

In [None]:
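# Import the random module from the standard library
import random

# Third-party libraries are imported the same way, often with a short alias.
# These three must be installed separately; they are not part of the standard library.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Importing a module gives us access to all of its functions. If you aren't sure what a function in a module does, you can usually find a description and examples of use cases in the module's documentation. 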

Let's familiarize ourselves with documentation by checking the [documentation for the random library](https://docs.python.org/3/library/random.html). See if you can find out what the function `random.randint()` does.

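To use a function in a module, call it by typing the module's name, followed by `.function_name()`. If you imported the module under an alias (for example, `import random as r`), use the alias in place of the module's name.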

In [16]:
# Use the randint function to get a random die roll:
import random as r

roll = r.randint(1,6)

roll

4
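As a small follow-up, the sketch below rolls the die several times. It relies on two documented functions of the random module, `random.seed()` and `random.randint()`; the seed value 42 is an arbitrary choice for illustration.

```python
import random

random.seed(42)  # optional: fixing the seed makes the "random" rolls repeatable

# Roll a six-sided die five times; randint includes both endpoints (1 and 6)
rolls = [random.randint(1, 6) for _ in range(5)]

print(rolls)
```

Leaving out the `random.seed()` line gives different rolls on each run, which is usually what you want outside of testing.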

The downside of borrowing someone else's code is having no idea how to use it — at first, that is!


[Python’s Standard Documentation](https://docs.python.org/3/contents.html) provides a good place to begin reading up on the random module. Many other modules bundled into base Python are also documented there, but some of the modules you'll use in your work will have different documentation sources. Luckily, they're usually just one Google search away!

We'll play around with the random module more next week. For now, let's get back to our main lesson - using files as input in scripts.

## Combining this to read `.csv` files

---

Managing files manually can get a bit awkward. Fortunately, there are plenty of modules out there to make things easier — especially for common file and data formats like `.csv`!


Let’s explore a basic module called **csv** for just this purpose.

First, take a few minutes to familiarize yourself with the [documentation for the csv library](https://docs.python.org/3/library/csv.html).

Next, let's import the csv module:

In [17]:
import csv 

To practice using the csv module, we've provided you with a **list of dictionaries** below. Remember that a list of dictionaries is a great way to store table-like information or datasets in simple structures in Python.

Let's learn how to translate this list of dictionaries into a `.csv` file using the csv module.

In [18]:
# 3.4 Writing to CSV
employees = [
  {
    "first_name": "Hennah", 
    "last_name": "Chadwick",
    "job_title": "Vice President",
    "hire_date": 1985,
    "performance_review": "excellent"
  }, {
    "first_name": "Michael", 
    "last_name": "Bolton",
    "job_title": "Programmer",
    "hire_date": 1995,
    "performance_review": "poor"
  }, {
    "first_name": "Ellesse", 
    "last_name": "Jaramillo",
    "job_title": "Programmer",
    "hire_date": 1989,
    "performance_review": "poor"
  }, {
    "first_name": "Samir", 
    "last_name": "Nagheenanajar",
    "job_title": "Programmer",
    "hire_date": 1974,
    "performance_review": "fair"
  }, {
    "first_name": "Milton", 
    "last_name": "Waddams",
    "job_title": "Collator",
    "hire_date": 1974,
    "performance_review": "does he even work here?"
  }, {
    "first_name": "Bob", 
    "last_name": "Porter",
    "job_title": "Consultant",
    "hire_date": 1999,
    "performance_review": "excellent"
  }, {
    "first_name": "Bob", 
    "last_name": "Slydell",
    "job_title": "Consultant",
    "hire_date": 1999,
    "performance_review": "excellent"
  }
]


There are a few steps here, so let's break this down. 

First, we're going to want to use the keys of these dictionaries as the column names. Every dictionary in the list has the same keys, so we can take them from the first one. Let's start with getting the list of what those column names will be: 

In [19]:
# Get the dictionary:
employees[0]

{'first_name': 'Hennah',
 'last_name': 'Chadwick',
 'job_title': 'Vice President',
 'hire_date': 1985,
 'performance_review': 'excellent'}

In [22]:
# Get the dictionary keys:
employees[0].keys()

dict_keys(['first_name', 'last_name', 'job_title', 'hire_date', 'performance_review'])

In [23]:
# Convert the dictionary keys into a list:
list(employees[0].keys())

['first_name', 'last_name', 'job_title', 'hire_date', 'performance_review']

In [24]:
for element in list(employees[0].items()):
    print(element[0])

first_name
last_name
job_title
hire_date
performance_review


Great! We can use the above as our header names.

Next, open the [documentation for csv](https://docs.python.org/3/library/csv.html) and read the description for the function `csv.DictWriter()`. A short example is included there that we can model our code after. You can copy the example into your notebook if it helps to have it visible while you're working! 

```python
import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
```
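When the rows already live in a list of dictionaries, `DictWriter` also offers `writerows()`, which writes the whole list in one call. The sketch below writes to an in-memory `io.StringIO` buffer instead of a real file so the result can be printed directly; the `people` list is sample data borrowed from the documentation example.

```python
import csv
import io

people = [
    {'first_name': 'Baked', 'last_name': 'Beans'},
    {'first_name': 'Lovely', 'last_name': 'Spam'},
]

buffer = io.StringIO()  # behaves like an open text file, but lives in memory
writer = csv.DictWriter(buffer, fieldnames=['first_name', 'last_name'])
writer.writeheader()
writer.writerows(people)  # one call writes every dictionary in the list

csv_text = buffer.getvalue()
print(csv_text)
```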

Finally, let's write the list of employee dictionaries to a new `.csv` file titled `all_evaluations.csv`. To create the new file, we can pass this file name into the `open()` function directly. Remember to use the "write" mode!

In addition to writing this dictionary to a file, we also want to add a column named "action_item" with the following logic:
- If the performance_review is "poor" the action_item should be "terminate"
- If the performance_review is "excellent" the action_item should be "bonus"
- Otherwise, the action_item should be "attend GA workshop"

In [27]:
with open("all_evaluations.csv", "w", newline="") as file:
    # Create a fieldnames list that will act as column headers using our dictionary keys 
    fieldnames = list(employees[0].keys())
    # Append the "action_item" column to the end of our fieldnames
    fieldnames.append('action_item')
    # Pass the open file object and the fieldnames list to DictWriter
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    # Write the header row (the column names)
    writer.writeheader()
    # Loop through each employee and determine what to put in their action_item column
    for employee in employees:
        # row is another name for the same dictionary, so adding a key here
        # also adds 'action_item' to the entry in employees
        row = employee
        if employee['performance_review'] == 'poor':
            row['action_item'] = 'terminate'
        elif employee['performance_review'] == 'excellent':
            row['action_item'] = 'bonus'
        else:
            row['action_item'] = 'attend GA workshop'
        # Write this employee's row to the file
        writer.writerow(row)


Great work! You should see your file appear in the file browser to the left. You can double click to open it and see our work saved!
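The action_item rules can also be pulled out into a small function, which makes them easy to test on their own. This is one possible sketch; the function name `choose_action` and the `sample` dictionary are made up for illustration. It uses `dict(sample)` to work on a copy, so the original dictionary does not gain a new key.

```python
def choose_action(review):
    """Map a performance review to an action item, following the rules listed earlier."""
    if review == 'poor':
        return 'terminate'
    elif review == 'excellent':
        return 'bonus'
    return 'attend GA workshop'


sample = {'first_name': 'Samir', 'performance_review': 'fair'}

# dict(sample) makes a copy, so the original dictionary is left unchanged
sample_row = dict(sample)
sample_row['action_item'] = choose_action(sample_row['performance_review'])

print(sample_row['action_item'])  # attend GA workshop
print('action_item' in sample)    # False
```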

Next, we want to reward the employees who had excellent performance reviews. Let's create a new file named `bonus_list.csv` that contains the names of employees with "excellent" reviews that we want to give bonuses to this quarter.

In [29]:
with open('all_evaluations.csv', 'r', newline='') as file:    
    # DictReader yields one dictionary per row, keyed by the header names
    rows = csv.DictReader(file)
    
    for row in rows: 
        print(row['first_name'], row['last_name'], row['job_title'])

Hennah Chadwick Vice President
Michael Bolton Programmer
Ellesse Jaramillo Programmer
Samir Nagheenanajar Programmer
Milton Waddams Collator
Bob Porter Consultant
Bob Slydell Consultant
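One thing to watch for when reading a `.csv` back in: `csv.DictReader` returns every value as a string, including numbers such as `hire_date`. The sketch below demonstrates this on a short made-up snippet of CSV text held in memory, rather than on the real file.

```python
import csv
import io

csv_text = 'first_name,hire_date\nHennah,1985\nMichael,1995\n'

reader = csv.DictReader(io.StringIO(csv_text))
sample_rows = list(reader)

print(type(sample_rows[0]['hire_date']))  # <class 'str'>

# Convert the column back to integers before doing arithmetic with it
hire_years = [int(row['hire_date']) for row in sample_rows]
print(hire_years)  # [1985, 1995]
```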


```python
# Open a new file named bonus_list.csv 
# Be sure to use the "write" mode!

    # Remember, to use the DictWriter function we need a list of column headers
    # Use the keys in the employees dictionaries 
    
    # Call the DictWriter function
    
    # Loop through each employee and check whether their performance review is "excellent"
            
        # If it is excellent, write their name to the bonus_list file
```

In [30]:
employees[0].keys()

dict_keys(['first_name', 'last_name', 'job_title', 'hire_date', 'performance_review', 'action_item'])

In [32]:
with open('bonus_list.csv', 'w', newline='') as file:
    # Use the dictionary keys as column headers
    fieldnames = list(employees[0].keys())
    writer = csv.DictWriter(file, fieldnames)
    writer.writeheader()
    for employee in employees:
        # Only write employees whose review is "excellent"
        if employee['performance_review'] == 'excellent':
            writer.writerow(employee)


If you did this right, you should see a new file called `bonus_list.csv` in your file browser. Open it up to see who it includes. 
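If you would rather have the bonus list contain only the name columns, `DictWriter` accepts an `extrasaction='ignore'` argument that skips any dictionary keys not listed in `fieldnames`. The sketch below is one way to do that, using an in-memory buffer and a short made-up `staff` list instead of the real file and data.

```python
import csv
import io

staff = [
    {'first_name': 'Hennah', 'last_name': 'Chadwick', 'performance_review': 'excellent'},
    {'first_name': 'Michael', 'last_name': 'Bolton', 'performance_review': 'poor'},
]

buffer = io.StringIO()
# extrasaction='ignore' tells DictWriter to skip keys that are not in fieldnames
writer = csv.DictWriter(buffer, fieldnames=['first_name', 'last_name'], extrasaction='ignore')
writer.writeheader()
for person in staff:
    if person['performance_review'] == 'excellent':
        writer.writerow(person)

names_only = buffer.getvalue()
print(names_only)
```

Without `extrasaction='ignore'`, passing a dictionary with extra keys raises a `ValueError`, which is the default behavior.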

Next week, we'll get started using a powerful data analysis library called pandas. It's one of the most commonly used libraries for working with `.csv` files and will allow you to work with and visualize datasets in Python with ease. 