# Assignment 7 - File IO, Functions and Logic

## Learning Objectives <a id="section-objectives"></a>
This lesson meets the following learning objectives:

1. Program linear scripts in Python.
2. Program scripts that include functions, logic, and control statements.


You will meet these objectives by learning to read and write to files, write functions and use logic in programs.

## Instructions <a id="section-instructions"></a>
Read through all of the text in this page. This assignment provides step-by-step training divided into numbered sections. The sections often contain embedded executable code for demonstration.  Section headers with icons have special meanings:

- <i class="fas fa-puzzle-piece"></i> The puzzle icon indicates that the section provides a practice exercise that must be completed.  Follow the instructions for the exercise and do what it asks.  Exercises must be turned in for credit.
- <i class="fa fa-cogs"></i> The cogs icon indicates that the section provides a task to perform.  Follow the instructions to complete the task.  Tasks are not turned in for credit but must be completed to continue progress.

Review the list of items in the **Expected Outcomes** section to check that you feel comfortable with the material you just learned. If you do not, take some time to review that material again. If after reviewing you are still not comfortable, do not feel confident, or do not understand the material, please ask questions on Slack.

Follow the instructions in the **What to turn in** section to turn in the exercises of the assignment for course credit.

## Background <a id="section-background"></a>
In this lesson we will learn how to read data from files, write data to files, write new functions and use logic.  One of the most important tasks in data analytics is the ability to read data files into your Python scripts and to write files with results or reformatted data.  The process of reading and writing files is also referred to as File Input/Output or File IO.  Next, functions are a critical component of all modern programming languages. They provide several benefits.  They help you to simplify your programs and encapsulate commonly used code into reusable bits of code. Lastly, programs must be able to make decisions and logic is key to programming decision making.

## Tips for Success <a id="section-tips"></a>
As a reminder, here is a quick list of the tips for success from [Assignment 6 - Python Basics](A06-Python-Basics.ipynb):

* <i class="fas fa-clone"></i> **Do not cut-and-paste**.
* <i class="fas fa-wrench"></i> **Do each practice exercise**.
* <i class="fas fa-clock"></i> **Take time to understand**.
* <i class="fas fa-question-circle"></i> **Ask for help**.
* <i class="fas fa-users"></i> **Do not compare yourself**.

## 1. File Input/Output (IO) <a id="section-1"></a>
Python provides a built-in function named `open()` that allows you to open a file either for reading or writing. We will learn to use it to do both.  There are two types of files that Python can open: plain **text** files and **binary** files.  A text file contains human-readable text.  It may have some special encodings, which we will learn about, but in general you can open it with any text editor and read it.  A binary file is one where the contents are machine-readable. The language of the computer is 1's and 0's (binary). These files store information in patterns of 1's and 0's that are not legible by a human. If you open one with a text editor you will often see a series of illegible characters. Text and binary files can store the same information but binary files can be organized to access data more quickly and store data more compactly.  In data analytics, many data files are plain text files, often in tab-delimited, or comma-separated formats.  In this lesson, we will learn to read and write text files only. Binary files are more complicated and if you see a binary file that needs reading there is often a Python library with functions that you can use to read the file for you.

### 1.1. Opening a File <a id="section-1.1"></a>
In Python, opening a file asks Python to give you **access** to a file.  As part of the open process, Python will also reserve a **file handle** which is like a cursor in a word processor. The file handle indicates precisely where you are in the file.  When the file is first opened, the file handle is placed at the beginning of the file.

The `open()` function can take several arguments but only the file name is required if we want to open a file for reading.  The `open()` function will then return a file **object** that will allow us to perform operations on the file--such as reading its contents.  

Suppose we had a file named `my_data.csv`.  We can open that file by simply passing the filename to the `open()` function:

```python
data = open("my_data.csv")
```

The above line of code will work only if the `my_data.csv` file is in the same directory where the script is being run.  If the file is located elsewhere you can provide the path to the file. This can be the absolute path or a relative path. Remember, we learned about the path in [Assignment 3: The Command-line](A03-Command-line.ipynb).  Suppose the file were located in our home directory. The `open()` function does not expand the `~` shortcut on its own, so we pass the path through `os.path.expanduser()` first:

```python
import os

data = open(os.path.expanduser("~/my_data.csv"))
```

Or, using an absolute path, if our home directory is located at `/Users/bob`:
```python
data = open("/Users/bob/my_data.csv")
```

Or, if on a Windows machine (the `r` prefix creates a raw string so that the backslashes are not treated as escape codes):

```python
data = open(r"C:\Users\bob\my_data.csv")
```

In each of these examples, the `open()` function returns a **file object** that is stored in a variable named `data`.
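As a minimal sketch of building a path before opening a file, the example below uses the standard `os.path` functions `expanduser()`, `join()` and `exists()`. The file name `my_data.csv` is only the hypothetical file from the examples above, so on most computers the last line will print `False`.

```python
import os

# "~" is a shortcut for the home directory; expanduser() converts it
# into a real folder path that open() can understand.
home_folder = os.path.expanduser("~")

# Join the folder and the (hypothetical) file name using the correct
# separator for the current operating system.
data_path = os.path.join(home_folder, "my_data.csv")

# Show the full path and check whether a file is really there.
print(data_path)
print(os.path.exists(data_path))
```

Checking with `os.path.exists()` first is one way to avoid an error from `open()` when the file is missing.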

### 1.2. The File Object <a id="section-1.2"></a>
In the previous section, the code examples used the `open()` function and created a new `data` variable that stores the file object created by opening a file.  We have seen objects before. You may recall in the [Assignment 6 - Python Basics](A06-Python-Basics.ipynb) that strings are objects too. Objects have functions. Remember, we used the `format()` function of a string object to format the string for printing?  Similarly, the file object has functions that allow you to perform operations on the newly opened file.  The table below lists some commonly used functions of the file object.

| Function Name | Purpose | 
| ------------- | ------- |
| `read()` | Reads *all* of the contents from a file and returns it.|
| `readline()` | Reads one line at a time from the file. Repeated calls to this function will read successive lines in the file.|
| `write()` | Writes a string to the file. Line endings are not added automatically.|
| `seek()` | Moves the file handle to a new location in the file.|
| `close()` | Closes the file--no more reads from or writes to the file can occur. |


Just like with string objects, you can **call** an object's function by using the period `.`.  For example, to close the file you would use the following:

```python
data.close()
```
The above line of code calls the `close()` function that belongs to the file object stored in the `data` variable.
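The `seek()` function from the table can be illustrated with a small sketch. It assumes a hypothetical scratch file in the system's temporary folder (created with the `"w"` mode, which is covered later in this lesson) so that it does not depend on any course data.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.gettempdir(), "a07_seek_demo.txt")

# Create a small two-line file to read from.
demo_file = open(demo_path, "w")
demo_file.write("first line\nsecond line\n")
demo_file.close()

demo_file = open(demo_path)
first_read = demo_file.readline()   # the file handle moves past line 1
demo_file.seek(0)                   # move the file handle back to the start
second_read = demo_file.readline()  # line 1 is read a second time
demo_file.close()

print(first_read == second_read)
```

Because `seek(0)` returns the file handle to the beginning, both calls to `readline()` return the same line.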

### 1.3. Reading a file <a id="section-1.3"></a>
Once the file is opened and you have a file object, you can read the file by using either the `read()` function or the `readline()` function. 

#### 1.3.1. The `read()` Function <a id="section-1.3.1"></a>

As indicated in the table of section 1.2, the `read()` function will read the entire contents of a file and return it.  You must store those contents in some variable for further action:

```python
import os

# Open the file. os.path.expanduser() converts "~" to the home directory.
my_file = open(os.path.expanduser("~/my_data.csv"))

# Store the file contents
contents = my_file.read()

# Do something with the file contents.
print(contents)
```

In the code above, the `open()` function creates a file object and gives your program access to the file. It also sets the file handle at the beginning of the file.  Next, the `my_file.read()` function reads all of the contents of the file and returns them as one large string object. That string object gets stored in the variable named `contents`. 

Let's read a real data file.  In the [Day 3 Exercises: The Command-line & Git](D03-Command_Line-git.ipynb) notebook we learned about the Iris dataset.  You can review the background in that notebook to learn about the dataset, but let's use the Iris dataset to practice.   A copy of the data file is found in this repository in the `data` folder named `iris_data.csv`. The following code will open that file, retrieve its contents, print the number of characters in the file and then close the file:

```python
# Open the iris_data.csv file. It is in the data folder.
iris_file = open("./data/iris_data.csv")

# Read all the contents of the iris data.
iris_data = iris_file.read()

# Print the number of characters in the file.
print(len(iris_data))

# Close the file.
iris_file.close()
```

To count the number of characters we can use the built-in `len()` function. It will return a number specifying the number of characters in the `iris_data` string object.

<div class="alert alert-warning">
    <b>Critical</b>: The <i>read()</i> function is simple to use, but is not practical for really large files. This is because variables are stored in your computer's main memory (its RAM).  For small files storing the entire contents in a variable (or main memory) will not be problematic. But consider a file of 6GB in size.  Does your computer have enough RAM to load such a file in main memory?
</div>
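One possible way to limit memory use, sketched below, is to pass a number to `read()`. The built-in `read()` function accepts an optional size argument and then returns at most that many characters instead of the whole file. The scratch file and its contents are hypothetical and exist only for this demonstration.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.gettempdir(), "a07_read_demo.txt")

demo_file = open(demo_path, "w")
demo_file.write("sepal_length,sepal_width\n5.1,3.5\n")
demo_file.close()

demo_file = open(demo_path)
first_chunk = demo_file.read(12)  # read only the first 12 characters
rest = demo_file.read()           # read everything after the file handle
demo_file.close()

print(first_chunk)
print(rest, end="")
```

The second call continues from where the first one stopped, because the file handle remembers the current position.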

#### 1.3.2. The `readline()` Function <a id="section-1.3.2"></a>
The `readline()` function is similar to the `read()` function but it does not read the entire contents of the file. Instead, it reads only one line at a time.  This is really useful for large files. Unlike the `read()` function, you will not be storing the entire contents of a file in your computer's main memory. Instead, you only store one line at a time.  The following code example will retrieve the first two lines of a file:

```python
import os

# Open the file. os.path.expanduser() converts "~" to the home directory.
my_file = open(os.path.expanduser("~/my_data.csv"))

# Do something with the file contents
line1 = my_file.readline()
line2 = my_file.readline()
print(line1)
print(line2)
```
In the code sample above, the two calls to `my_file.readline()` each return a string object that gets stored in the variables `line1` and `line2` respectively.

Let's try the same approach on the Iris dataset:

```python
# Open the iris_data.csv file. It is in the data folder.
iris_file = open("./data/iris_data.csv")

# Read the first two lines of the Iris dataset and print them.
line1 = iris_file.readline()
line2 = iris_file.readline()
print(line1)
print(line2)

# Close the file.
iris_file.close()
```

We can learn a bit about the Iris dataset by printing the first two lines. We learn that the file is comma-separated (although that was obvious from the file extension `.csv`). We also learn that the file has a header line that contains the names: `sepal_length,sepal_width,petal_length,petal_width,species`.  Lastly, we see that the first four columns of data are numeric continuous (floats) and the last column is text (a string).  

Take a moment to click on the `iris_data.csv` file in the `data` folder.  You will notice that JupyterLab gives you a nice tabular view of the data where the rows and columns are clearly visible.

<img src="media/A07-jupyter_data.png">


From this view of the file in JupyterLab you see the first two lines of the file match the lines we printed in the cell above. However, when we printed the first two lines there is an extra line between the header and the first line of data: 

```
sepal_length,sepal_width,petal_length,petal_width,species

5.1,3.5,1.4,0.2,setosa
```

This extra line is not present in the data file! How did that extra line get there?  The answer is that when we read each line we are also reading in the line ending characters. Each line in a text file must tell the computer where it ends. This is accomplished by special hidden characters at the end of each line.  In Windows these hidden characters are the escape codes `\r\n`; in Linux and OS X the code is `\n`; and older Mac systems (before OS X) used `\r`.  When Python reads a text file it converts these endings to `\n` by default. If you need a reminder about escape codes, please see the section on Strings in the [Assignment 6: Python Basics](A06-Python-Basics.ipynb) notebook. So, each line read by the `readline()` function ends with this invisible end-of-line character. The extra line is added by the `print()` function! By default, the `print()` function adds a new line to the end of anything it prints.  There are two ways to fix, or exclude, the extra line.

**Method 1**: The first way to avoid an extra line in the output is to tell the `print()` function not to add the line endings. We know the string already has them, so the print function does not need to add them.  This is done by giving the named argument `end` to the function and setting it to the empty string. For example:

```python
print(line1, end="")
```
The following code fixes the extra line in the output using this method:

```python
# Open the iris_data.csv file. It is in the data folder.
iris_file = open("./data/iris_data.csv")

# Read the first two lines of the Iris dataset and print them.
line1 = iris_file.readline()
line2 = iris_file.readline()
print(line1, end="")
print(line2, end="")

# Close the file.
iris_file.close()
```

**Method 2**: The second way to avoid an extra line in the output is to remove the line ending from the string.  This is done by calling the `strip()` function that belongs to the string object. The `strip()` function removes all white space (including line endings) in front of and at the end of the string. For example:

```python
print(line1.strip())
```
The following code fixes the extra line in the output using this method:

```python
# Open the iris_data.csv file. It is in the data folder.
iris_file = open("./data/iris_data.csv")

# Read the first two lines of the Iris dataset and print them.
line1 = iris_file.readline()
line2 = iris_file.readline()
print(line1.strip())
print(line2.strip())

# Close the file.
iris_file.close()
```

Notice in method #2 the print function is allowed to add line endings because we stripped them out with the `strip()` function.  
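The hidden line ending can be made visible with the built-in `repr()` function, which shows escape codes instead of acting on them. The sketch below uses a made-up line (with two extra leading spaces added on purpose) to compare `strip()` with the string function `rstrip("\n")`, which removes only line endings from the right side.

```python
# A made-up line with leading spaces and a trailing line ending.
line = "  5.1,3.5,1.4,0.2,setosa\n"

# repr() reveals the hidden "\n" at the end of the string.
print(repr(line))

# strip() removes white space from both ends of the string.
stripped = line.strip()

# rstrip("\n") removes only the line ending and keeps the leading spaces.
right_only = line.rstrip("\n")

print(repr(stripped))
print(repr(right_only))
```

If leading white space is meaningful in your data, `rstrip("\n")` may be the safer choice.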

###  <i class="fas fa-puzzle-piece"></i> 1.4. Practice 
What would happen if you used both methods to exclude the end of line characters?  Write in the box below what you think would happen. Then, in the next cell write the code that would use both methods and see if it matches what you thought would happen.

### 1.5. Closing a file <a id="section-1.5"></a>
As we learned in the table of section 1.2, the `close()` function is used for closing a file. Once closed, no more read or write operations can be performed on the file. You would need to re-open the file if you want to perform additional operations.  The following code provides a complete example for opening, reading from a file and then properly closing it. 

```python
import os

# Open the file. os.path.expanduser() converts "~" to the home directory.
my_file = open(os.path.expanduser("~/my_data.csv"))

# Do something with the file contents
contents = my_file.read()
print(contents)

# Close the file
my_file.close()
```

You should always close any file you open before the program terminates.  If you forget, Python will normally close the file for you when the file object is discarded or the program exits, and it can emit a `ResourceWarning` (hidden by default) to flag the oversight. For a short program like the code above, leaving a file open is mostly harmless. But for scripts that open lots of files it can be problematic if you forget to close a file.  This is because operating systems (Windows, OS X, Linux) limit the number of file handles that can be open at once.  If you open thousands of files and do not close them, your program can use up its available file handles and will be unable to open more files until some are closed.

<div class="alert alert-warning">
    <b>Critical</b>: Always close open files in the code prior to the end of the program!
</div>
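Python also offers the `with` statement, which closes a file automatically when its indented block ends. The sketch below is an optional alternative to calling `close()` yourself. It assumes a hypothetical scratch file in the system's temporary folder and uses the `"w"` mode that is introduced in the next section.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.gettempdir(), "a07_with_demo.txt")

# The file is closed automatically at the end of the indented block.
with open(demo_path, "w") as out_file:
    out_file.write("How now brown cow.\n")

with open(demo_path) as in_file:
    contents = in_file.read()

# The "closed" attribute of a file object reports whether it is closed.
was_closed = in_file.closed
print(contents, end="")
print(was_closed)
```

With this form there is no `close()` call to forget, even if an error occurs inside the block.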

### 1.6 Writing to a File <a id="section-1.6"></a>

We learned that to get access to a file we first need to use the `open()` function.  By default, the `open()` function will *only* open the file for reading. It will not allow us to write to the file.  We need to provide a second argument to tell Python to give us access to write to the file.  This second argument is called **mode**. To open a file for writing we need to provide the `"w"` string. For example:

```python
import os

# Open the file for writing. expanduser() converts "~" to the home directory.
my_file = open(os.path.expanduser("~/my_data.csv"), "w")
```

Notice, in the instruction above, the second argument to the `open()` function is the string `"w"`. This tells Python to open the file for writing.   The `"w"`, or "write", mode will also create the file if it does not exist.  This is convenient because we do not have to create the file beforehand.  But, what if the file already exists? If we use the `"w"` mode and the file exists then the file will be opened for writing but all existing contents will be removed. The file will become empty and the file handle will be set at the beginning of the file, ready for us to add new content.  If the file exists and we do not want to erase the contents we can use the `"a"` mode (for "append").  The `"a"` mode will open the file for writing, it will not empty the file contents, and it will set the file handle at the end of the file. Once the file is opened and ready for writing, there are two ways we can write to a file: using the `write()` function of the file object or using the `print()` function.

#### 1.5.1. The `write()` Function <a id="section-1.5.1"></a>
Writing to a file is relatively easy: call the file object's `write()` function and provide the string to write as an argument. For example:

```python
# Open the file for writing.
my_file = open("new_data.txt", "w")

# Write text to the file.
my_file.write("How now brown cow.\n")
my_file.write("The quick brown fox jumped over the lazy dog.\n")

# Close the file.
my_file.close()
```

In the code above, a file named `new_data.txt` will be created and two new lines will be added. Notice that at the end of each string provided to the `write()` function that we have a newline character `\n`. Unlike the print function, the `write()` function does not automatically add an end-of-line character to  the string. You have to specify where you want new lines.  For that reason, the following `write()` call would accomplish the same results as the two lines shown in the code above:

```python
my_file.write("How now brown cow.\nThe quick brown fox jumped over the lazy dog.\n")
```

The `write()` function doesn't expect the new line character to be at the end of the line. You control where those appear in the string.

Let's run the code and see what happens: 

```python
my_file = open("new_data.txt", "w")

# Write text to the file.
my_file.write("How now brown cow.\n")
my_file.write("The quick brown fox jumped over the lazy dog.\n")

# Close the file.
my_file.close()
```

The above code does not print any output when the cell is run, but it does create a new file named `new_data.txt`. It should appear in the file browser to the right in JupyterLab. If you double click the file you can see the contents. Do they match what we expect?
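The difference between the `"w"` and `"a"` modes described at the start of this section can be checked with a short sketch. It assumes a hypothetical scratch file in the system's temporary folder so that none of your own files are overwritten.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.gettempdir(), "a07_mode_demo.txt")

# "w" creates the file (or empties it) and writes from the beginning.
demo_file = open(demo_path, "w")
demo_file.write("one\n")
demo_file.close()

# "a" keeps the existing contents and writes at the end.
demo_file = open(demo_path, "a")
demo_file.write("two\n")
demo_file.close()

demo_file = open(demo_path)
after_append = demo_file.read()
demo_file.close()

# Opening with "w" again erases everything written so far.
demo_file = open(demo_path, "w")
demo_file.write("three\n")
demo_file.close()

demo_file = open(demo_path)
after_overwrite = demo_file.read()
demo_file.close()

print(repr(after_append))
print(repr(after_overwrite))
```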

#### 1.5.2. Write With `print()` <a id="section-1.5.2"></a>
The `write()` function provides an easy way to write contents to a file. But you can also use the `print()` function.  To do this, you provide a new named argument called `file` where you specify the file object.  For example:

```python
# Open the file for writing.
my_file = open("new_data2.txt", "w")

# Write text to the file.
print("How now brown cow.", file=my_file)
print("The quick brown fox jumped over the lazy dog.", file=my_file)

# Close the file.
my_file.close()

```
Notice in the code above that the `print()` function receives a new named argument called `file` set to the file object.  Also notice, we did not include the line endings in the string like we did for the `write()` function. This is because the `print()` function will add the line endings (unless we tell it not to).  Let's run this code and see for ourselves if the output is identical to the output from Section 1.5.1.

```python
my_file = open("new_data2.txt", "w")

# Write text to the file.
print("How now brown cow.", file=my_file)
print("The quick brown fox jumped over the lazy dog.", file=my_file)

# Close the file.
my_file.close()
```

Open the file named `new_data2.txt`. Are the contents the same as `new_data.txt`?  

**Note**: You will not need to commit these two files at the end of this lesson, so feel free to delete the `new_data.txt` and `new_data2.txt` files if you like.
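Because `print()` accepts several values at once, it can also write a comma-separated row directly. The sketch below combines the `file` argument with the `sep` named argument of `print()`, which sets the text placed between values. The scratch file and the two rows are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.gettempdir(), "a07_print_demo.csv")

demo_file = open(demo_path, "w")

# sep="," places a comma (instead of the default space) between values.
print("sepal_length", "species", sep=",", file=demo_file)
print(5.1, "setosa", sep=",", file=demo_file)
demo_file.close()

# Read the file back to see what was written.
demo_file = open(demo_path)
written = demo_file.read()
demo_file.close()

print(written, end="")
```

Notice that `print()` converts the float `5.1` to text for us, whereas `write()` only accepts strings.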

### 1.6. Character Encoding <a id="section-1.6"></a>
In the early days of computing, English text was represented in computers using the [ASCII character encoding](https://www.asciitable.com/) method. The language of computers is 1's and 0's and these 1's and 0's can be combined to form numbers.  Thus, the ASCII encoding used numbers to represent characters in the English language. For example, the character "A" was encoded using the number 65. A "B" was encoded using the number 66, etc. A lowercase "a" was encoded using the number 97, "b" as 98 and so forth.  The ASCII encoding is still used today, but it cannot be used for most languages other than English because it can only represent 128 characters, and those 128 numbers are already accounted for. 
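The numbers behind characters can be inspected with the built-in `ord()` and `chr()` functions. This small sketch simply confirms the ASCII values mentioned above.

```python
# ord() returns the number that represents a character.
code_for_upper_a = ord("A")
code_for_lower_a = ord("a")

# chr() does the reverse: it returns the character for a number.
letter = chr(98)

print(code_for_upper_a, code_for_lower_a, letter)
```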

The [UTF-8](https://en.wikipedia.org/wiki/UTF-8) character encoding, however, can represent every character in the Unicode standard, which has room for 1,112,064 characters.  There are so many characters that [emojis are included](https://emojipedia.org/unicode-8.0/) in the standard. There are other encodings besides UTF-8 as well. 

To explore encoding, click on the file named `languages.txt` in the `data` folder of this repository. This file was obtained from [Learn Python the Hard Way](https://learnpythonthehardway.org/python3/languages.txt).  You will see a list of languages, written in their own alphabets. This file is UTF-8 encoded.

Let's try opening this file in the way we just learned:

```python
lang_file = open("data/languages.txt")
print(lang_file.read())
lang_file.close()
```

Uh oh! Depending on your system, you may get an error message stating that the "codec can't decode byte...".  When no encoding is given, the `open()` function uses a default encoding that depends on your operating system and settings, and that default is not always UTF-8.  If the text file uses a different encoding than the default, we must specify it using the argument named `encoding`.  The following example opens the file using the `UTF-8` encoding, which causes the contents of the file to be properly read.

```python
lang_file = open(file="data/languages.txt", encoding="UTF-8")
print(lang_file.read())
lang_file.close()
```

Often, plain text data files are provided in ASCII format. This is because they typically contain only integers, floats and English-style strings.  So you may not encounter too many data files that need UTF-8 encoding. But it is possible!  You should be aware that other encodings exist so you can read these different files.
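To see what an encoding actually does, the sketch below converts strings to bytes with the string function `encode()` and back with the bytes function `decode()`. The two Chinese characters are borrowed from the practice text later in this lesson; the variable names are only illustrative.

```python
# Two Chinese characters ("river" and "water").
text = "江水"

# encode() converts the string into the bytes stored in a UTF-8 file.
utf8_bytes = text.encode("utf-8")

# The string has 2 characters, but UTF-8 needs 3 bytes for each of them.
print(len(text), len(utf8_bytes))

# decode() converts the bytes back into the original string.
round_trip = utf8_bytes.decode("utf-8")

# Characters that exist in ASCII keep the same single byte in UTF-8.
ascii_letter = "A".encode("utf-8")
print(round_trip, ascii_letter)
```

This is why a plain ASCII file can also be read as UTF-8, while a UTF-8 file with other characters cannot be read as ASCII.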

### 1.7. Splitting a Line <a id="section-1.7"></a>
Let's look again at the Iris dataset.  We saw that the first line of data looks like the following:

```
5.1,3.5,1.4,0.2,setosa
```

The first four columns are continuous numeric (or float) values. The last is a string value.   In the examples above, we used the `readline()` function to read the line as a single string.  How do we separate the values into their own variables?  The answer is the `split()` function of the string object.  The `split()` function allows us to provide a character that serves as the **delimiter** and will "chop" the string into smaller individual strings by cutting on the delimiter character.  To split the string on the comma, run the following code.  You will notice we have run similar code above, but this time we use the `split()` function to chop the line.


```python
# Open the iris_data.csv file. It is in the data folder.
iris_file = open("./data/iris_data.csv")

# Read the header line and the first row of data.
header = iris_file.readline()
row1 = iris_file.readline()

# Chop the line using the comma.
print(row1.strip().split(","))

# Close the file.
iris_file.close()
```

The `split()` function returns a list of strings!  You can easily identify lists in Python because they are surrounded by square brackets `[]` and the elements are separated using commas (the commas are not part of the strings in the list).  We will learn more about Python lists in a future lesson.  
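The same approach works for other delimiters. As a small sketch with made-up rows, the code below splits a tab-delimited line using the `\t` escape code and shows what `split()` returns when a value is missing between two commas.

```python
# A made-up tab-delimited row (the "\t" escape code is a tab).
tab_row = "5.1\t3.5\t1.4\t0.2\tsetosa\n"
tab_fields = tab_row.strip().split("\t")
print(tab_fields)

# A made-up comma-separated row with a missing second value.
gap_row = "5.1,,1.4"
gap_fields = gap_row.split(",")

# The missing value appears in the list as an empty string.
print(gap_fields)
```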


### 1.8. Function Chains <a id="section-1.8"></a>
Notice in the code above this construct:  `row1.strip().split(",")`. What is that?  We know that the string object has a `strip()` function and the string object has a `split()` function but how are we able to connect them together like that?  Consider the following code. It behaves equivalently:

```python
row1_stripped = row1.strip()
row1_list = row1_stripped.split(",")
```
In the code above we strip the line endings from the string using the `strip()` function and store the resulting string object in a variable named `row1_stripped`. It is a string object.  Then in the second line we split the string (with the line endings removed) and store the result in a list object. 

We can avoid saving return values in variables if we never need to use them again because Python can "substitute" return values in place of the function call.  In this line of code, `row1.strip().split(",")`,  the `row1.strip()` function returns a string object. So, when Python executes the code, it will replace the `row1.strip()` portion with an "invisible" variable containing the returned string object. It can then run the  `split(",")` function on the invisible variable.

If you remember the return value of each function call, you can create these **chains** of function calls.
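Chains can be longer than two calls as long as each function returns an object that has the next function. The sketch below uses a made-up header line and the string function `lower()`, and then repeats the same work one step at a time for comparison.

```python
# A made-up header line with extra spaces and mixed capitalization.
raw_header = "  Sepal_Length,Sepal_Width  \n"

# strip() returns a string, lower() returns a string, split() returns a list.
chained = raw_header.strip().lower().split(",")

# The same work written one step at a time.
step1 = raw_header.strip()
step2 = step1.lower()
stepwise = step2.split(",")

print(chained)
print(chained == stepwise)
```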

### 1.9. Unpacking <a id="section-1.9"></a>

In the code above, we have the variables in a list. But, what if we want to put these values directly into distinct variables? To do this we can use the **unpacking** method. Remember we used unpacking when we learned about program arguments in [Assignment 6 - Python Basics](A06-Python-Basics.ipynb)?  Unpacking simply copies the values of a list into a comma separated list of variables.  

Review and then run the following cell.  It unpacks the list after splitting into distinct variables and then uses those variables in a formatted string.

```python
# Open the iris_data.csv file. It is in the data folder.
iris_file = open("./data/iris_data.csv")

# Read the header line and the first row of data.
header = iris_file.readline()
row1 = iris_file.readline()

# Split the line using the comma and use unpacking to 
# put the values into separate variables.
sepal_length, sepal_width, petal_length, petal_width, species =  row1.strip().split(",")

# Now we can use the variables to print a formatted string.
print("Species: {}. Sepal length x width = {}cm x {}cm. Petal length x width = {}cm x {}cm.".format(species, sepal_length, sepal_width, petal_length, petal_width))

# Close the file.
iris_file.close()
```

The unpacking occurs in this line of code:

```python
sepal_length, sepal_width, petal_length, petal_width, species =  row1.strip().split(",")
```
Here, instead of keeping a list of 5 elements, the 5 elements resulting from the `split(",")` are stored in the five variables. 
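Unpacking only works when the number of variables matches the number of values in the list. The sketch below uses a made-up row and checks the count with `len()` first. It also uses `try` and `except`, which have not been covered in this lesson, only to show that a mismatch raises a `ValueError` instead of stopping the example.

```python
# A made-up row with three values.
values = "5.1,3.5,1.4".split(",")
value_count = len(values)

# Three variables for three values: unpacking succeeds.
length, width, petal = values

# Two variables for three values: Python raises a ValueError.
try:
    first, second = values
    outcome = "unpacked"
except ValueError as error:
    outcome = "ValueError: " + str(error)

print(value_count)
print(outcome)
```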

### 1.10. Type Casting <a id="section-1.10"></a>
The last item we want to learn regarding import of data from files is setting the correct types for data we import. Remember in [Assignment 6 - Python Basics](A06-Python-Basics.ipynb) we learned of several basic Python data types:  **int**, **float**, **string** and **boolean**.  If we are reading data files then we want to be sure that the data we read gets stored in variables in the correct type.  By default, the `read()` and `readline()` functions return data from a file as a string object. However, the first data line shows that we have data that is not a string:

```
5.1,3.5,1.4,0.2,setosa
```
Even after unpacking, the data are separate strings.  In the Iris dataset, we want to convert the first 4 columns of data to floats and we can leave the last as a string.  Changing a variable from one data type to another is called **type casting**.  To convert string values to an integer or float we can use the `int()` and `float()` functions respectively.  Let's add type casting to the code we just ran:

```python
# Open the iris_data.csv file. It is in the data folder.
iris_file = open("./data/iris_data.csv")

# Read the header line and the first row of data.
header = iris_file.readline()
row1 = iris_file.readline()

# Chop the line using the comma and use unpacking to 
# put the values into separate variables.
sepal_length, sepal_width, petal_length, petal_width, species =  row1.strip().split(",")

# Perform type casting to convert the string values to floats.
print(type(sepal_length))
sepal_length = float(sepal_length)
print(type(sepal_length))
sepal_width = float(sepal_width)
petal_length = float(petal_length)
petal_width = float(petal_width)

# Now we can use the variables to print a formatted string.
print("Species: {}. Sepal length x width = {}cm x {}cm. Petal length x width = {}cm x {}cm.".format(species, sepal_length, sepal_width, petal_length, petal_width))

# Close the file.
iris_file.close()
```

In the code above you will find the following lines:

```python
print(type(sepal_length))
sepal_length = float(sepal_length)
print(type(sepal_length))
```
Here we have two `print()` statements before and after the `float()` type casting function. The `type()` function used in the `print()` function can tell us what the variable type is. Notice in the output from the cell above that prior to the type cast the variable is a `str` (string) and after is now a `float`!
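A short sketch with made-up values shows why the type matters. The `+` operator joins strings end to end but adds numbers, so the same two values give different results before and after type casting.

```python
# Two made-up measurements read from a file as strings.
width_text = "3.5"
length_text = "1.5"

# With strings, + joins the text end to end.
joined = width_text + length_text

# After type casting, + performs arithmetic.
total = float(width_text) + float(length_text)

# float() ignores surrounding white space, including a line ending.
padded = float(" 3.5\n")

# int() converts strings that contain whole numbers.
row_count = int("150")

print(joined, total, padded, row_count)
```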

### <i class="fas fa-puzzle-piece"></i> 1.11 Practice
We just learned a lot about opening, reading and writing files!  Let's try to bring it all together through a practice exercise.   In the data folder of this repository you will find a file named `ecoli_data.txt`.  This data was downloaded from [UCI Machine Learning Repository Ecoli Data set page](https://archive.ics.uci.edu/ml/datasets/Ecoli). For this practice section we do not need to understand what the data is. If you are interested please read more on the data webpage.  Here, we only need to know what the data columns are.  Using this file, perform the tasks listed below.

#### Task 1
In the cell below, write Python code to open the file and print the first three lines of the file. 

In the cell below, enter the data types of each column of data.

#### Task 2
In the cell below, adjust your Python code to split the first three lines of data into lists and then print the lists instead of the line.  **Hint**: If you do not provide a delimiter to the `split()` function it will split on empty white space. You can read more about the `str.split` function [here](https://docs.python.org/3.3/library/stdtypes.html#str.split).

#### Task 3

Use unpacking to store the contents of the third line into 9 different variables (one for each column) and make sure the variables are of the correct data type for the value (i.e., a **float** for numeric values). Print out the types of each variable to prove to yourself that you have the data types correct. Write your code in the following cell.

#### Task 4
Write Python code in the following cell that will perform the following:
1. Open a file for writing, it should use UTF-8 encoding.
2. Write the text with Chinese characters shown below in the file. It should save the new file in UTF-8 encoding. 
3. Be sure to close your files!

```
滚滚长江东逝水，浪花淘尽英雄。是非成败转头空，青山依旧在，几度夕阳红。白发渔樵江渚上，惯看秋月春风。一壶浊酒喜相逢，古今多少事，都付笑谈中。是非成败转头空，青山依旧在，惯看秋月春风。一壶浊酒喜相逢，古今多少事，滚滚长江东逝水，浪花淘尽英雄。 几度夕阳红。白发渔樵江渚上，都付笑谈中.......
```

Now, read in the file again without UTF-8 encoding. Print the contents. It should not look correct. This will prove to yourself that you successfully wrote a file in the UTF-8 encoding format. 

Lastly, read in the file again but with UTF-8 encoding enabled.  Print the contents, they should look correct.

## 2. Python Documentation <a id="section-2"></a>
Before we continue, let's pause to explore the Python documentation.  We just learned about the `open()` function and how it can be used to provide access to a file for either reading or writing.  We learned that it can accept several input arguments including the file name, a named argument `mode` (for specifying writing or appending) and another named `encoding` (to set UTF-8 encoding).  But that function has more arguments. The online Python documentation can help us better learn how functions and objects work.  Below are links to Python documentation pages:

- [Python 3 Documentation Home](https://docs.python.org/3/index.html). The home page for core Python documentation.
- [Python 3 Language Reference](https://docs.python.org/3/reference/index.html). A walk through of the language syntax.
- [Python 3 Tutorial](https://docs.python.org/3/tutorial/index.html). Quick lessons for learning Python.
- [Python 3 Built-in Function Reference](https://docs.python.org/3/library/functions.html). Use this for instructions for all built-in Python functions.

<div class="alert alert-warning">
    <b>Important</b>: As you develop your programming abilities you will often need to refer to the language documentation. Always make sure you know where to find it. It is okay to refer to documentation if you cannot remember syntax.
</div>

### <i class="fa fa-cogs"></i> Explore 

Take a moment to peruse the [Python 3 Built-in Function Reference](https://docs.python.org/3/library/functions.html).


## 3. Functions <a id="section-3"></a>
Functions are one of the most important developments in programming languages.  As we will learn, they allow you to encapsulate commonly used functionality into reusable chunks of code.  

### 3.1. More on function arguments <a id="section-3.1"></a>
Before we learn to write our own functions, let's take another look at function arguments. If you look at the built-in function reference for the [open() function](https://docs.python.org/3/library/functions.html#open), you will see a formal definition of the function that looks like this:

```python
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
```
Notice there are several arguments that you can provide to the `open()` function. We already used **file**, **mode** and **encoding** in the previous sections. We will not learn about those other arguments here, but you can read the online documentation to learn more. 

One thing we can immediately see from this definition, though, is that some of the arguments look like variables with assigned values (e.g. `buffering=-1`). When an argument has an assignment in the function definition, it means that the argument gets that default value if the caller does not provide one.   Notice that the `mode` argument is set to the string `'r'` which sets the mode for "reading". That's why the `open()` function opens files for reading by default. You have to specify `mode='w'` to open a file for writing.
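
To see the default value in action, here is a minimal sketch. It assumes a throwaway file named `example.txt` (a made-up name) created in a temporary folder, and it inspects the `mode` attribute of the file object returned by `open()`.

```python
import os
import tempfile

# Create a small file in a temporary folder so we have something to open.
# The file name 'example.txt' is made up for this demonstration.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.txt')

out_file = open(path, mode='w', encoding='UTF-8')
out_file.write('hello\n')
out_file.close()

# No mode is given here, so the default value 'r' is used.
in_file = open(path, encoding='UTF-8')
default_mode = in_file.mode
print(default_mode)
in_file.close()
```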

One other item to point out:  if you specify the argument names when you call the function, then they can appear in any order. Otherwise, arguments must appear in the order that the function definition describes. The following are all valid calls to the `open()` function:

```python
# Open a file for reading.
data_file1 = open('my_data1.csv', encoding='UTF-8')

# Open a second file for writing.
data_file2 = open('my_data2.csv', encoding='UTF-8', mode='w')

# Open a third file for appending.
data_file3 = open(encoding='UTF-8', mode='a', file='my_data3.csv')
```
In the first two `open()` calls, the file argument is not named, so it must appear first because it is first in the function definition.  The named arguments that follow can appear in any order. In the first call, notice there is no mode argument, so it gets the default value of `'r'`.
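
The sketch below exercises the same calling styles on a real (temporary) file so the results can be checked. The file name `my_data.csv` and the temporary folder are assumptions made for this example.

```python
import os
import tempfile

# A throwaway location for the demonstration file (made-up name).
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'my_data.csv')

# The file is given by position; the other arguments are named.
data_file = open(path, encoding='UTF-8', mode='w')
data_file.write('a,b,c\n')
data_file.close()

# Every argument is named, so the order does not matter.
data_file = open(mode='a', file=path, encoding='UTF-8')
data_file.write('1,2,3\n')
data_file.close()

# The file and mode are both given by position, in definition order.
data_file = open(path, 'r', encoding='UTF-8')
contents = data_file.read()
data_file.close()
print(contents)
```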

### 3.2. Creating your own function <a id="section-3.2"></a>
Functions consist of three parts:
1. A **definition** line that specifies the name of the function and its supported arguments.
2. Internal code that performs some task.
3. An optional **return value**. Functions do not have to return a value, but they can.

To define a function in python you begin the line with the keyword `def`, the function name follows and then the list of arguments.  For example:

```python
# Function definition line:
def my_function():

    # Task to perform:
    print("Hello World!")    
    
    # Return value
    return 1
```
The code above defines a function named `my_function()`. There are no arguments defined between the parentheses, so this function has no arguments that the caller needs to provide. The function performs the task of printing "Hello World!" and then returns an integer value of 1.  
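
The return value is optional. As a small sketch (the function name `greet` is made up for this example), a function with no `return` statement still performs its task, and Python gives the special value `None` back to the caller:

```python
def greet():
    # This function prints a message but has no return statement.
    print("Hello World!")

# Calling the function runs its task. Because nothing is returned,
# the variable receives the special value None.
value = greet()
print(value)
```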

When writing functions, there are a few small rules:

- Function names can only contain letters, numbers and underscores.
- Function names cannot start with a number.
- Function arguments required by the function are always surrounded by parentheses and separated by commas.
- You must indent the internal lines of code that belong to the function.  
- You end your function by no longer indenting.  

Let's look at another example:

```python
def addition(val1, val2):
    result = val1 + val2
    return result
```

This function is named `addition()`, it accepts two arguments named `val1` and `val2`. Inside the function it performs the task of adding the numbers and then it returns the result.  
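
Default values, which we saw in the definition of `open()`, can be used in our own functions too. The following is a minimal sketch; the function name `add_with_default` and the default value of 10 are assumptions chosen only for illustration.

```python
def add_with_default(val1, val2=10):
    # val2 gets the value 10 whenever the caller does not provide it.
    result = val1 + val2
    return result

# Only val1 is provided, so val2 uses its default value.
sum1 = add_with_default(5)

# Both values are provided by position.
sum2 = add_with_default(5, 20)

# Both values are provided by name, so the order does not matter.
sum3 = add_with_default(val2=1, val1=2)

print(sum1, sum2, sum3)
```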

### 3.3. Calling your function <a id="section-3.3"></a>

Now that we have a function, we can use it anytime to add two numbers. This may seem a bit unnecessary since we can already do that with the `+` operator, but we will proceed in order to demonstrate functions.  Now that we have the function defined, we want to use it (or **call** it). The following code demonstrates how to call a function:

```python

# The function is defined:
def addition(val1, val2):
    result = val1 + val2
    return result

# Call our function.
sum = addition(100, 200)

# Print the return value.
print(sum)

```

Remember, the function definition ends when the indenting of lines of code stops.  This means the line of `sum = addition(100, 200)`, which is not indented, is not in the function. As the code shows, we can call our new `addition()` function just like we called `open()` or `print()`. We can **pass** arguments to it, and if our function returns a value we can store it in a variable. The code above passes the numbers 100 and 200 and then stores the return value in a new variable named `sum`.  

<div class="alert alert-warning">
    <b>Critical</b>: Python is very strict about indenting code.  Always pay close attention to the indentation of your code. If it is incorrect, your program will not run.
</div>

Let's run a new variation of the code above to see how we can call the `addition()` function multiple times in the same program:

In [None]:
def addition(val1, val2):
    result = val1 + val2
    return result

sum1 = addition(100, 200)
sum2 = addition(500, -700)
total = addition(sum1, sum2)
print(total)

Notice in the code above we called the function `addition()` three times. This demonstrates the power of functions. We can reuse them anytime we want. This reduces the size of our code (we don't have to repeatedly write code for the same thing), and it helps make our code more readable.


### 3.4. Variable Scope <a id="section-3.4"></a>
The next thing we need to learn about functions is the **scope** of variables.  Functions can use three types of variables:

1. Arguments.
2. Internal variables.
3. External **global variables**.

As we have already learned, the arguments of a function are specified in the function definition. The arguments that we provide to a function are actually variables.  Notice in the following code that the arguments `val1` and `val2` are used in the function.   

```python
def addition(val1, val2):
    result = val1 + val2
    return result
```
The code above also creates a second type of variable: an internal variable. The internal variable in the code above is named `result` and it stores the sum of the values of our two arguments.  Both the arguments and the internal variable are only valid within the function.  You cannot use them anywhere else except inside the function. This is their **scope**, or the portion of the program where the variables are valid.  To demonstrate that you cannot use a variable that was declared inside of a function anywhere else, run the following code:


In [None]:
def addition(val1, val2):
    result = val1 + val2
    return result

sum = addition(100, 200)
print(result)

You should get an error message from the cell above. Does the message make sense to you?  Python is complaining that it does not recognize the variable named `result`. This is because we are trying to use the variable in the main code of the program but it has no scope in that part of the program. It only has scope inside of the function.  We can only use the value of `result` if we return it from the function and store it in a new variable (e.g., the `sum` variable). That is why the following code works.  The `sum` variable does have scope in the main code so we can print it. 

In [None]:
def addition(val1, val2):
    result = val1 + val2
    return result

sum = addition(100, 200)
print(sum)
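
The scope error described above can also be observed without stopping the program. The sketch below uses a `try`/`except` block, which has not been covered in this lesson and is used here only as an assumption-laden illustration, to catch the `NameError` that Python raises.

```python
def addition(val1, val2):
    result = val1 + val2
    return result

total = addition(100, 200)

# 'result' only exists inside addition(), so using it in the main
# code raises a NameError. We catch the error to show its message.
try:
    print(result)
    error_seen = False
except NameError as err:
    error_seen = True
    print("Python reported:", err)

# The returned value was stored in 'total', which is in scope here.
print(total)
```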

The last type of variable is the global variable. A global variable has scope everywhere! It can be used in the main code of the program and in functions.  To demonstrate a global variable, run the following code:

In [None]:
val1 = 100

def addition(val2):
    result = val1 + val2
    return result

sum = addition(200)
print(sum)

We get the same output but the `addition()` function has changed. It now only accepts one argument. However, the internal code has not changed.  It still adds `val1` and `val2`.  The `val1` variable was declared outside of the function and was not passed in as an argument.  Also notice that the `val1` variable is not indented. That means it is part of the main code.  Any variable that is declared outside of a function, in the main code, is a global variable. It can be used anywhere because its scope is global to the program.

While this code behaves properly, hopefully you recognized that having an addition function that only receives one argument and uses a global variable is a bit confusing!  For this reason, it is considered best practice to use global variables sparingly.  
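
Reading a global variable inside a function works as shown above, but assigning a new value to one requires the `global` statement. Without it, Python treats the name as an internal variable of the function and raises an error when the function tries to read it before assigning it. Here is a minimal sketch; the names `running_total` and `add_to_total` are made up for this example.

```python
# A global variable declared in the main code.
running_total = 0

def add_to_total(value):
    # The global statement tells Python that running_total refers to the
    # variable in the main code, not a new internal variable.
    global running_total
    running_total = running_total + value

add_to_total(100)
add_to_total(200)
print(running_total)
```

Because the function changes a variable that lives outside of it, this style can be harder to follow than returning a value, which is one more reason to use global variables sparingly.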

### <i class="fas fa-puzzle-piece"></i> 3.5. Practice
#### Task 1
Now that you know how to write a function, let's create one!  Using either the iris dataset or the ecoli dataset (you choose), write Python code in the cell below that will read in the first 10 lines of the file.  The program should have one function named `get_next_line()` that receives a single argument: the file object.  Inside the function it should read the next line, split the elements into a list and return the list.  Your program should call the function 10 times and print the results returned.

#### Task 2
In the cell below, adjust your code from the previous task to add up all of the values in the second column of data. In other words, you will have 10 sets of values (one from each line read). Calculate the sum of the 10 values in the second column only and print that total. 

**Hint**: Your `get_next_line()` function will return a list.  You can access the second column of data in a list using brackets `[]` and the coordinate of the column. We will learn more about accessing values in lists later, but for now, here's an example:

```python
rowvals = get_next_line(data_file)
col2 = rowvals[1]
```
In the code above, the list object returned by your function is stored in the variable `rowvals`. We can get the second element in the list by using the value `1`. We use `1` because the position of the first column is `0`, the second is `1`, the third is `2`, and so on. We start counting with `0`.
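
The same idea can be tried on a list written out by hand. In the sketch below the list contents are made up (they do not come from either dataset). It also shows that values read from a text file are strings until they are type cast.

```python
# A made-up list standing in for one split line of a data file.
rowvals = ["sample_a", "0.5", "0.25", "1.0"]

# Position 0 is the first column and position 1 is the second.
col1 = rowvals[0]
col2 = rowvals[1]
print(col1, col2)

# The value is still a string, so cast it before doing math with it.
col2_number = float(col2)
print(col2_number + 1)
```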

#### Task 3
Suppose instead of just calculating the sum of the second column, we wanted to get sums for all numeric columns (columns 2 through 8) and print each of those column sums separately. Let's create a second function named `sum_data()` to do this. This function will receive one argument: the list object returned by the `get_next_line()` function and it will calculate the sums! 

**Hint**: use a global variable to store the totals and just have the function add to those totals.

## 4. Logic <a id="section-4"></a>
Logic in programming is centered around the truth of statements. Logic is critical for making decisions in your code. We will learn in a future lesson how to make decisions in the program using logic in control statements.  For now, let's learn some logic. We already learned about some mathematical operators that can be used in logic statements. They are:

| Symbol | Name | Purpose |
| ------ | ---- |------- |
| < | less-than | compares if one number is less than the other |
| > | greater-than | compares if one number is greater than the other  |
| <= | less-than-equal | compares if one number is less than or equal to another |
| >= | greater-than-equal | compares if one number is greater than or equal to another |
| == | equals | compares if two numbers are equal |
| != | not equals | compares if two numbers are not equal |

For example, the following is a logic statement:

```python
2 != 1
```

The truth value of that statement is `True` because 2 does not equal 1. It is a boolean value.  Here is another example: 

```python
1 != 1
```
The truth value of that statement is `False` because 1 does equal 1 (it does not, not equal 1).
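
To try each comparison operator from the table, here is a short sketch. The variables `a` and `b` and their values are made up for this example.

```python
a = 5
b = 7

# Each comparison produces a boolean value: True or False.
less = a < b
greater = a > b
less_equal = a <= 5
greater_equal = a >= b
equal = a == b
not_equal = a != b

print(less, greater, less_equal, greater_equal, equal, not_equal)
```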

Aside from these mathematical operations, there are a few other operators we can use for comparing booleans (or sets of logic statements):

| Operator |  Purpose |
| ------ | ------- |
| `and` | Results in `True` if two values are both `True` |
| `or` |  Results in `True` if at least one of two values is `True`.  |
| `not` | Flips the truth of a value. If it is `True` it will become `False` and vice-versa. |

This table shows the value returned by **and** when comparing two values. The margins of the table are the possible states of the two values, and the cells indicate the result that **and** returns. Notice it only results in `True` if both values are `True`.

|            | True    |  False |
| ---------  | ------- | ------ | 
| **True**   | True    | False  |
| **False**  | False   | False  |

This table shows the value returned by **or** when comparing two values. The margins of the table are the possible states of the two values, and the cells indicate the result that **or** returns. Notice it results in `True` if either of the values is `True`.

|            | True    |  False |
| ---------  | ------- | ------ | 
| **True**   | True    | True   |
| **False**  | True    | False  |
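
Both tables can be checked directly in Python. The sketch below evaluates every combination of `True` and `False` and stores the results in lists; the list names are made up for this example.

```python
# Every combination for 'and', in the order the table is read:
# True/True, True/False, False/True, False/False.
and_results = [True and True, True and False, False and True, False and False]
print(and_results)

# The same combinations for 'or'.
or_results = [True or True, True or False, False or True, False or False]
print(or_results)
```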

So, how do we perform a logic statement using **and**, **or** and **not**?  Consider the following code:

```python
comp1 = 2 != 1
comp2 = 1 != 1
result = comp1 and comp2
```
You will notice that the first two lines of code above are the logic statements we looked at earlier. This time, though, the value of each comparison is stored in the variables `comp1` and `comp2`.  Those will be boolean values (i.e., `True` or `False`).  The third line of the code uses the **and** operator. If both `comp1` and `comp2` are `True` then it will return `True` and set the variable `result` to that value.  But, the `comp2` comparison is not `True`, so the **and** operator will return `False`. Let's observe this by running the following cell: 

In [None]:
comp1 = 2 != 1
comp2 = 1 != 1
result = comp1 and comp2
print(comp1)
print(comp2)
print(result)

Let's use an **or** operator instead of **and**:

In [None]:
comp1 = 2 != 1
comp2 = 1 != 1
result = comp1 or comp2
print(comp1)
print(comp2)
print(result)

Notice the result is `True`.  Now, let's use a **not** and switch back to the **and**:

In [None]:
comp1 = 2 != 1
comp2 = 1 != 1
result = comp1 and (not comp2)
print(comp1)
print(comp2)
print(result)

Remember, the **not** operator flips the truth value. So, in the third line of code, the `comp2` value gets changed from `False` to `True` and that allows the **and** operator to return `True`.
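
Comparisons do not have to be stored in variables first. They can be combined in a single statement, as in the sketch below. The variable `temperature` and the threshold values are made up for this example.

```python
temperature = 72  # a made-up value

# Both comparisons must be True for 'and' to return True.
is_comfortable = temperature >= 65 and temperature <= 80

# Only one comparison needs to be True for 'or' to return True.
is_extreme = temperature < 32 or temperature > 100

# 'not' flips the truth value of the result.
is_not_extreme = not is_extreme

print(is_comfortable, is_extreme, is_not_extreme)
```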

## Expected Outcomes <a id="section-outcomes"></a>
At this point, you should feel comfortable with the following:


- Opening a file for read, write or appending data.
- Knowing what a file object is.
- Reading from a file using both `read()` and the `readline()` functions.
- Closing a file and why you should always close a file.
- Writing to a file using either `write()` or `print()`.
- Understanding character encoding of files.
- Splitting and unpacking a line of text from a file.
- Type casting values read from a file.
- Defining and calling a function.
- Understanding variable scope.
- Understanding logic in programming.


## What to Turn in? <a id="section-turn_in"></a>

Perform the following with Git:

1. Commit your changes to this notebook. 
2. Push your code to your remote repository.

Go to your online repository on GitHub and check that the changes you just pushed are present. If so, then send a message to the instructor indicating the assignment is turned in.