# Input and output

So far in this course, you've seen:

* Basic data types such as ints, floats, booleans and strings.
* Operators (such as >, <, ==) for comparing two values.
* If, elif and else statements to control which sections of code are run.
* Loops (such as for, while) to perform repeated instructions.
* Functions for organising code into coherent blocks, potentially with inputs and outputs.
* Lists as data structures to hold multiple values and work with sequences of data.

Today, we will learn about three ways of getting data into a program.

We will also study how to save (write) processed data to new output files.

By the end of this week, you will understand how to:
+ read data into your program from text and CSV files
+ import and use built-in Python modules
+ save data from a program
+ use command-line arguments to specify input and output data (e.g. file names)
+ use modularity to improve flexibility and reusability of your code.

### Three types of data input

1. Data (.txt file) input (using `open()` and file methods)

2. Module imports (.py file) (using `import`)

3. Command-line arguments (using `sys.argv`)

## 1. File input

Reading data from a file, rather than writing it into the code, can make a program more flexible and reusable.

### Example

```python
'''
The numbers 10, 20, 30 are 'hardcoded' into the program
The program can only ever work with those exact values.
'''
numbers = [10, 20, 30]
print(sum(numbers))
```

```python
'''
If the program instead reads data from a file, you can change the data by editing data.txt
No need to touch the code!
'''
with open("data.txt") as f:
    numbers = [int(line) for line in f]
print(sum(numbers))
```

### What is a file? 

A file is a sequence of bytes (one byte is 8 bits) used to store data.

What the data represents depends on the file type, which is indicated by the file extension.

Examples of file types and file extensions:
- unformatted text (.txt, .dat)
- formatted text (.docx)
- spreadsheet/tabulated data (.xlsx, .csv)
- image (.png, .img)

(there are hundreds more)

### Opening and closing a file using a computer program

Consider the file system below

We want to import the data from `README.txt` to the program `program_1.py`

```python
CPA/
|
|--- Example_1/
        |
        |--- program_1.py
        |--- README.txt 
```




In `program_1.py`, we create a *file object* (with the name `file`) using:

```python
with open('README.txt') as file:
    ...  # indented code that uses the file goes here
```

Just like other objects, you can give the file object a variable name of your choosing:

```python
with open('README.txt') as my_data:
    ...  # indented code that uses the file goes here
```

We can use the `read` method to read the contents of the file.

Notice that the line that follows the `with` statement is indented.  

Once the indented block ends, the file is closed.


```python
with open('README.txt') as file:
    print(file.read())
```


Another way to open and close a file is:

```python
file = open('README.txt')
print(file.read())
file.close()
```

However, this requires us to remember to close the file, whereas `with open()` closes the file automatically, so we will use `with open()` throughout the rest of the lab.
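
As a rough sketch of what the `with` statement saves us from writing, the code below closes one file by hand using `try`/`finally` (not covered elsewhere in these notes) and lets `with` close another. The file name `demo_readme.txt` and the temporary directory are assumptions, used only so that the example runs on its own.

```python
import os
import tempfile

# Create a small file in a temporary directory (hypothetical file name)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_readme.txt')
with open(path, 'w') as file:
    file.write('Hello\n')

# Manual version: we must close the file ourselves, even if an error occurs
manual = open(path)
try:
    manual_contents = manual.read()
finally:
    manual.close()

# 'with' version: the file is closed automatically when the block ends
with open(path) as automatic:
    with_contents = automatic.read()

# The 'closed' attribute of a file object is True once the file is closed
print(manual.closed, automatic.closed)
```

Both versions read the same contents; the `with` version is shorter and cannot forget the call to `close()`.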

### Using an imported file within a program

File objects are *iterable*: each item is a new line of the file

```python
with open('README.txt') as file:
    for value in file:
        print('Line:', value)
```
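
Each line read from a file normally ends with a newline character (`'\n'`), which is why printing the lines one by one leaves blank lines in the output. The sketch below, which assumes a made-up two-line file called `demo_lines.txt`, uses `repr` to make the newline visible and `rstrip` to remove it.

```python
import os
import tempfile

# Create a small two-line file in a temporary directory (hypothetical file name)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_lines.txt')
with open(path, 'w') as file:
    file.write('first line\nsecond line\n')

stripped_lines = []
with open(path) as file:
    for value in file:
        print(repr(value))                          # repr shows the hidden '\n'
        stripped_lines.append(value.rstrip('\n'))   # remove the trailing newline

print(stripped_lines)
```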

File objects are not *subscriptable* (we can't access an individual element using an index)

```python
with open('README.txt') as file:
    print(file[0])   # Raises TypeError: file objects are not subscriptable
```
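
To see the error without stopping the program, we can catch it. This is only a sketch: `try`/`except` is not covered elsewhere in these notes, and the file name `demo_index.txt` is made up for the example.

```python
import os
import tempfile

# Create a small file in a temporary directory (hypothetical file name)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_index.txt')
with open(path, 'w') as file:
    file.write('first line\nsecond line\n')

with open(path) as file:
    try:
        first = file[0]              # indexing a file object is not allowed
    except TypeError as error:
        message = str(error)
        print('TypeError:', message)
```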

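Once the indented block after the `with` statement ends, the file is closed.

The variable `file` still exists after the block, but it now refers to a closed file object, so its contents can no longer be read.


```python
with open('README.txt') as file:
    print(file.read())

# print(file.read())   # Uncommenting this line raises ValueError: I/O operation on closed file
```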
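
The sketch below shows what happens if we try to read after the `with` block has ended. It assumes a made-up file called `demo_closed.txt` and uses `try`/`except` (not covered elsewhere in these notes) so that the program keeps running after the error.

```python
import os
import tempfile

# Create a small file in a temporary directory (hypothetical file name)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_closed.txt')
with open(path, 'w') as file:
    file.write('some text\n')

with open(path) as file:
    contents = file.read()      # works: the file is open inside the block

print(file.closed)              # the name 'file' still exists, but the file is closed

try:
    file.read()                 # reading a closed file is an error
except ValueError as error:
    message = str(error)
    print('ValueError:', message)
```

Anything we want to keep, such as `contents` here, must be read into another variable before the block ends.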

We can *cast* the file object as a different object type, which makes it easier to manipulate the data within the program.

By casting the file object as a list, the data:
- is iterable
- is subscriptable
- can still be accessed once the file is closed

Each element of the list is a new line of the file

```python
with open('README.txt') as file:
    data = list(file)

    # Iterable
    for value in data:
        print(value)

    # Subscriptable
    print(data[0])

# Still accessible after the file is closed
print(data[1])
```
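
There are other ways to get a list of lines from a file object. As a sketch (the file name `demo_list.txt` is made up), the file method `readlines()` gives the same result as `list(file)`, while `read()` followed by the string method `splitlines()` gives the lines without their trailing newline characters.

```python
import os
import tempfile

# Create a small file in a temporary directory (hypothetical file name)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_list.txt')
with open(path, 'w') as file:
    file.write('a\nb\n')

with open(path) as file:
    as_list = list(file)                         # ['a\n', 'b\n']

with open(path) as file:
    from_readlines = file.readlines()            # same result as list(file)

with open(path) as file:
    without_newlines = file.read().splitlines()  # ['a', 'b']

print(as_list, from_readlines, without_newlines)
```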

**Comprehension check:** Write some code below to open and read the contents of the file 'temperature.csv'

```python
with open('temperature.csv') as file:
    pass # Delete this line and replace with your code
```

### Reading different types of file


Every file is a sequence of bytes (one byte is eight bits) used to store data.

The file type determines what these bytes represent. 

__Text files__: Human-readable data. <br>Bytes represent plain text characters <br>e.g. .py, .csv, .json, .txt

__Binary files__: Data that is not intended to be human-readable. <br>Bytes do not represent plain text characters, but other information about the file. <br>e.g. executable programs (.exe, .bin), images (.jpg, .png, .gif), audio (.mp3, .wav), video (.mp4, .avi), compressed files (.zip). 

**For readability, we will be working with data stored in *text file* formats for the rest of the unit.**
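
To make the idea of "bytes that represent characters" concrete, the sketch below writes three characters to a text file and then reads the file back as raw bytes. The binary read mode `'rb'`, the `encoding` argument and the file name `demo_bytes.txt` are not used elsewhere in these notes; they appear here for illustration only.

```python
import os
import tempfile

# Write three plain text characters to a file (hypothetical file name)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_bytes.txt')
with open(path, 'w', encoding='utf-8') as file:
    file.write('Hi!')

# Mode 'rb' reads the raw bytes instead of decoding them into text
with open(path, 'rb') as file:
    raw = file.read()

print(raw)         # b'Hi!'
print(list(raw))   # [72, 105, 33]: one byte value per character
```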

### File Path

The file path is a string that gives the location of a file in the file system.

For example, in `with open('README.txt') as file:`, the file path is `'README.txt'`.

#### How to construct a file path

The file path can be either:
- __Global (Absolute):__ The path to a file from the **root directory** of the file system. The root directory is the top-most directory of the file system.
- __Local (Relative):__ The path to a file relative to the current *working directory* (the directory where the program is being run) 
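
As a small sketch of the difference, the built-in `os` module (not otherwise used in these notes) can report the current working directory and convert a local path to a global one. The output depends on where the code is run, so none is shown here.

```python
import os

working_directory = os.getcwd()                  # the current working directory
relative_path = 'README.txt'                     # local (relative) path
absolute_path = os.path.abspath(relative_path)   # the same file as a global (absolute) path

print(working_directory)
print(absolute_path)

# isabs reports whether a path is absolute
print(os.path.isabs(relative_path), os.path.isabs(absolute_path))
```

`os.path.abspath` only builds the path string; it does not check that `README.txt` exists.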



#### File paths on different operating systems 

The syntax for the file path is different on Windows and Mac/Linux

##### Windows

Each drive has its own root directory:
- `C:\` is the root directory of the C: drive.
- `D:\` is the root directory of the D: drive.

Directories are separated by a backslash character `\`

Example global path: 
'<span style="color:blue">C:\Users\YourUsername\Documents\ </span><span style="color:red">myfile</span><span style="color:green">.txt</span>'

##### Linux/Mac

There is a single root directory for the entire file system, denoted by a forward slash `/`

Directories are separated by a forward slash character `/`

Example global path: '<span style="color:blue">/home/YourUsername/Documents/</span><span style="color:red">myfile</span><span style="color:green">.txt</span>'

#### Example

All examples so far have imported a file from the same directory as the Python program.

Import 'README.txt' to 'program_1.py'

```python
CPA/
|
|--- Example_1/
        |
        |--- program_1.py
        |--- README.txt 
```
##### Local path
```python
with open('README.txt') as file:
```

##### Global path

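(Assume that `CPA` is in the directory `YourUsername`, which is inside a directory in the root directory called `Users` on Windows or `home` on Linux. On macOS this directory is `/Users` rather than `/home`.)

Windows (the `r` prefix makes a *raw string*, so that backslashes are not treated as escape characters):
```python
with open(r'C:\Users\YourUsername\CPA\Example_1\README.txt') as file:
    ...  # indented code that uses the file goes here
```

Mac/Linux:
```python
with open('/home/YourUsername/CPA/Example_1/README.txt') as file:
    ...  # indented code that uses the file goes here
```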

***

Both are correct, but the local path is:
- shorter
- unchanged if the files are moved, provided their location relative to each other doesn't change

For these two reasons, in this lab all examples will use:
- the __local path__, not the __global path__
- the notation for Mac/Linux systems (forward slash `/`). Python also accepts forward slashes in file paths on Windows. If you use backslashes `\` instead, write the path as a raw string (e.g. `r'my_directory\rainfall.csv'`)
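
One way to avoid typing separators by hand is to build the path from its parts. The sketch below uses the built-in `pathlib` module, which is not covered in these notes, to show the same two-part path written in each notation; the directory and file names are for illustration only.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ['my_directory', 'rainfall.csv']

posix_path = PurePosixPath(*parts)       # Mac/Linux notation
windows_path = PureWindowsPath(*parts)   # Windows notation

print(posix_path)      # my_directory/rainfall.csv
print(windows_path)    # my_directory\rainfall.csv
```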


#### Downstream and Upstream files 

__Downstream files__: <br>Files that exist in the current working directory (the directory containing the program), or in any of its subdirectories

__Upstream files__: <br>Files that exist in a *higher level* directory than the current working directory (containing the program)

#### Example: Downstream file


The file `rainfall.csv` is *downstream* of the file `program_3.py`


```python
Week_8/
|
|--- Example_3/
        |
        |--- program_3.py
        |--- my_directory/ 
               |
               |--- rainfall.csv
```

Import 'rainfall.csv' to 'program_3.py': 
```python
with open('my_directory/rainfall.csv') as file:
```

#### Example: Upstream file

The file `rainfall.csv` is *upstream* of the file `program_3.py`


```python
Week_8/
|
|--- Example_3/
        |
        |--- rainfall.csv
        |--- my_directory/ 
               |
               |--- program_3.py
```


Import 'rainfall.csv' to 'program_3.py': 
```python
with open('../rainfall.csv') as file:
```

`../` at the start of the file path means one directory upstream.

Two directories upstream is denoted by `../../`

#### Example: A more complex file path


The same process can be used to access directories that are:
- downstream of an upstream directory
- but not downstream of the current working directory, containing the program

```python
Week_8/
|
|--- Example_4/
        |
        |--- my_program/ 
               |
               |--- program_4.py
        |--- my_data/ 
               |
               |--- wind_speed.csv
```

Import 'wind_speed.csv' to 'program_4.py': 
```python
with open('../my_data/wind_speed.csv') as file:
```
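
To check where a path containing `../` ends up, we can join it to the directory containing the program and simplify the result. The sketch below uses `posixpath`, the built-in module for Mac/Linux style paths (not covered elsewhere in these notes), on the directory names from the examples above. It only manipulates strings, so the directories do not need to exist.

```python
import posixpath

def resolve(program_directory, relative_path):
    """Join a relative path to a directory and simplify any '..' parts."""
    return posixpath.normpath(posixpath.join(program_directory, relative_path))

# Upstream file: one directory up from my_directory
upstream = resolve('Week_8/Example_3/my_directory', '../rainfall.csv')

# More complex path: up one directory, then down into my_data
complex_path = resolve('Week_8/Example_4/my_program', '../my_data/wind_speed.csv')

print(upstream)       # Week_8/Example_3/rainfall.csv
print(complex_path)   # Week_8/Example_4/my_data/wind_speed.csv
```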


## 2. Module imports 

A module is a source code file that contains variables and/or functions.

The `import` keyword is used to make the names defined in a different Python file (module) available in your program.
 

### Example

There are different ways to use the `import` keyword

```python
import math
from math import pi

print(math.sqrt(16))   # sqrt is used with prefix math.

print(pi)              # pi can be used without the prefix
```
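
Two further variations are sketched below: giving a module a shorter name with `as`, and importing several names in one statement. The alias `m` is an arbitrary choice.

```python
import math as m            # the module is now used with the prefix m.
from math import sqrt, pi   # several names can be imported at once

print(m.sqrt(16))    # 4.0
print(sqrt(16))      # 4.0

# Area of a circle with radius 2
area = pi * 2 ** 2
print(area)
```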

### Reading a CSV file without importing a module

A CSV file:
- is a text file that stores tabular data.
- uses each line to represent a row in a table.
- separates the values within each row with a specific delimiter, usually a comma (,).



```python
with open('temperature.csv') as file:
    print(file.read())
```

Look closely at the data type of the second line of the file: the individual numerical values, separated by commas, are stored as a single string.

```python
with open('temperature.csv') as file:
    lines = list(file)
    print(lines[1])
    print(type(lines[1]))
```

We can transform this string to numerical data:
1. Convert file object to list
2. Access line of file containing numerical values
3. Convert string to list of values with commas removed
4. Remove the new line character from any strings in the list
5. Cast all strings as integer values

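```python
with open('temperature.csv') as file:
    data = list(file)                               # 1. Convert file object to list
    line = data[1]                                  # 2. Access the line containing the numerical values
    values = line.split(',')                        # 3. Split the string at each comma
    values = [v.replace('\n', '') for v in values]  # 4. Remove the new line character
    values = [int(v) for v in values]               # 5. Cast each string as an integer
    print(values)

mean = sum(values) / len(values)
print(mean)
```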
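
Steps 3 to 5 can be collected into a function so that they can be reused for any line. The sketch below is one possible version; the function name `parse_line` and the example data are made up. It uses `strip()` to remove the newline before splitting, rather than `replace()` afterwards.

```python
def parse_line(line):
    """Convert one line of comma-separated integers to a list of ints."""
    values = line.strip().split(',')    # remove the trailing newline, then split at commas
    return [int(v) for v in values]     # cast each string as an integer

example_line = '12,15,11,9\n'           # made-up data in the same format as a file line
values = parse_line(example_line)
mean = sum(values) / len(values)

print(values)   # [12, 15, 11, 9]
print(mean)     # 11.75
```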

That's a lot of work to get the data in a useable form!

### The `csv` module

The `csv` module contains functions for reading and writing CSV (comma-separated values) files.

Documentation: https://docs.python.org/3/library/csv.html

#### A function for reading a CSV file

```
csv.reader(csvfile)
```

The function takes:
- Positional argument: `csvfile`
- Optional arguments: `dialect`, `**fmtparams`

The function returns:
- A reader object which, like a file object, is *iterable* but not *subscriptable*.
- We can convert the reader object to a list if we want to access/manipulate the values

We are also told:  
- *If csvfile is a file object, it should be opened with newline=''.*

- e.g. `with open('my_file.csv', newline='') as file:`

- This stops Python from altering newline characters as the file is read, which is particularly important when the file was created on a different operating system.




```python
import csv

with open('temperature.csv', newline='') as file:

    data = csv.reader(file)

    for value in data:
        print(value)
```

By importing a function from a module, we have avoided the need to separate the values manually.

   **Comprehension check:** 
   
   Edit the code above to:
   
   - print the second line only
   - convert the numerical values in this line to a numerical data type 
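
The optional `**fmtparams` arguments change how the reader interprets the file. As a sketch, the `delimiter` parameter below reads values separated by semicolons instead of commas. To keep the example self-contained, the reader is given a list of made-up strings rather than a file object; `csv.reader` accepts any iterable that produces one string per line.

```python
import csv

# Made-up data, one string per line, with ';' as the delimiter
lines = ['day;temperature', '1;12', '2;15']

rows = list(csv.reader(lines, delimiter=';'))

print(rows)   # [['day', 'temperature'], ['1', '12'], ['2', '15']]
```

Each row is a list of strings, so numerical values still need to be cast before doing arithmetic with them.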

#### Writing/saving a file

To write (save) data to a file, rather than read it, a second argument (the mode) must be given when opening the file:

- `'a'` - Append - Creates a new file if it *does not* already exist, adds (appends) to the file if it *does* already exist
- `'w'` - Write - Creates a new file if it *does not* already exist, overwrites the file if it *does* already exist

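```python
with open('my_file.csv', 'a') as file:
    ...  # indented code that appends to the file goes here

with open('my_file.csv', 'w') as file:
    ...  # indented code that writes to the file goes here
```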
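
The difference between the two modes can be seen by writing to the same file several times. The sketch below assumes a made-up file called `demo_modes.txt` in a temporary directory and uses the file method `write`, which writes a string to the file.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_modes.txt')   # hypothetical file name

with open(path, 'w') as file:    # the file does not exist yet, so it is created
    file.write('first\n')

with open(path, 'a') as file:    # append: the new text is added to the end
    file.write('second\n')

with open(path) as file:
    after_append = file.read()

with open(path, 'w') as file:    # write: the existing contents are overwritten
    file.write('third\n')

with open(path) as file:
    after_write = file.read()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_write))    # 'third\n'
```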

#### A function for writing a CSV file

```
csv.writer(csvfile)
```

The function takes:
- Positional argument: `csvfile`
- Optional arguments: `dialect`, `**fmtparams`

The function returns:
- A writer object

We are also told:  
- *If csvfile is a file object, it should be opened with newline=''.*

- i.e. `with open('output_data.csv', 'w', newline='') as file:`

```python
import csv

with open('output_data.csv', 'w', newline='') as file:

    writer = csv.writer(file)

    # Writes one row to the file we opened.
    # The input argument must be an iterable (e.g. a list)
    writer.writerow([1, 2, 3])

    # Writes several rows to the file we opened.
    # The input argument must be an iterable of iterables (e.g. a nested list)
    writer.writerows([[1, 2, 3], [4, 5, 6]])
```

**Comprehension check:** Edit the code above to write the matrix shown below to the file

$
\begin{bmatrix}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{bmatrix}
$
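
A useful check after writing a CSV file is to read it back. The sketch below writes a header row and two rows of made-up data to a file in a temporary directory (the file name `demo_output.csv` is an assumption), then reads the file with `csv.reader`.

```python
import csv
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_output.csv')   # hypothetical file name

rows = [['day', 'temperature'], [1, 12], [2, 15]]

with open(path, 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(rows)

with open(path, newline='') as file:
    read_back = list(csv.reader(file))

print(read_back)   # [['day', 'temperature'], ['1', '12'], ['2', '15']]
```

The numbers come back as strings: a CSV file stores text only, so the data type is not saved with the values.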

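There are many other modules for easy handling of different file types (including binary file types). Some, such as `json` and `zipfile`, are included with Python; others, such as `PyPDF2`, `pyxlsb`, `openpyxl` and `Pillow`, must be installed separately. For example: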
- `json`: JSON files
- `zipfile`: ZIP files
- `PyPDF2`: PDF files
- `pyxlsb`, `openpyxl`: Excel files
- `Pillow`: image files

## 3. Command line arguments



We can run a Python program from the terminal, e.g.

```python
python3 add_numbers.py
```

- `python3` runs the Python interpreter

- `add_numbers.py` is the file path


When you run a program from a terminal, you can add extra arguments after the filepath to pass information to the program.

**Example in a terminal**

```python
python3 add_numbers.py 1 2
```

- `python3` runs the Python interpreter

- `add_numbers.py` is the file path

- `1` and `2` are command-line arguments passed to the program

### The sys module


To access the command line arguments within a program, we can use the built-in `sys` module.

The command-line arguments are stored as **strings** in a list called `sys.argv`:

- `sys.argv[0]` → the script name

- `sys.argv[1]`, `sys.argv[2]`, etc. → the arguments you pass


**add_numbers.py**
***
```Python
import sys

# Access the command line argument and convert string data to numerical data
num1 = float(sys.argv[1])
num2 = float(sys.argv[2])

result = num1 + num2
print(result)
```
***
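
The conversion to `float` matters because adding two strings joins them instead of adding their values. The sketch below shows this using a made-up list, `example_argv`, that stands in for the contents of `sys.argv` when the program is run as `python3 add_numbers.py 1 2`. The function name `add_arguments` is also hypothetical.

```python
def add_arguments(argv):
    """Add the two numbers given after the script name."""
    return float(argv[1]) + float(argv[2])

# Stand-in for sys.argv: the script name followed by the arguments, all strings
example_argv = ['add_numbers.py', '1', '2']

print(example_argv[1] + example_argv[2])   # prints 12: the strings '1' and '2' are joined
print(add_arguments(example_argv))         # prints 3.0
```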


### Example: file names 


In this example, the filenames to read and write are passed as command line arguments

**In a terminal**
```python 
python3 file_copy.py data.txt backup.txt
```

**file_copy.py**
***
```python
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, "r") as f_in:
    contents = f_in.read()

with open(output_file, "w") as f_out:
    f_out.write(contents)
```
***

### Making code robust

It's important to write code that is robust to errors made when using it, such as missing or extra command line arguments.

Consider the new version of file_copy.py below. 

It includes a couple of checks for:
- the use of command line arguments (uses default values if none exist)
- the correct number of command line arguments

**In a terminal**
```python 
python3 file_copy.py data.txt backup.txt
```

**file_copy.py**
***
```python
import sys

"""
Check if the file was run without entering 
command line arguments
"""
if len(sys.argv) == 1:
    input_file = 'data.txt'
    output_file = 'results.txt'
    
"""
Check if an incorrect number of command line 
arguments was entered 
"""
elif len(sys.argv) != 3:
    print("Incorrect number of command line arguments")
    print("Usage: python file_copy.py <input_file> <output_file>")
    sys.exit(1)

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, "r") as f_in:
    contents = f_in.read()

with open(output_file, "w") as f_out:
    f_out.write(contents)
```
***
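
A further check that could be added is whether the input file exists before trying to open it. The sketch below is one possible way to do this, using `os.path.isfile` from the built-in `os` module (not otherwise used in these notes). The function name `input_file_exists` and the file names are made up for the example.

```python
import os
import tempfile

def input_file_exists(path):
    """Return True if path is an existing file; otherwise print a message."""
    if os.path.isfile(path):
        return True
    print('Input file not found:', path)
    return False

# Try the check on one file that exists and one that does not
folder = tempfile.mkdtemp()
existing = os.path.join(folder, 'data.txt')
with open(existing, 'w') as f_out:
    f_out.write('1\n2\n3\n')
missing = os.path.join(folder, 'no_such_file.txt')

found_existing = input_file_exists(existing)
found_missing = input_file_exists(missing)
print(found_existing, found_missing)
```

In `file_copy.py`, a check like this could be followed by `sys.exit(1)` when the input file is missing.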

# Summary

We have studied three types of data input:

1. Data
- Data files can be imported using `open()` and file methods
- The filepath needs to be specified to import a file

2. Module 
- `import` can be used to import .py files
- The csv module can be used for reading and writing csv files

3. Command-line arguments
- When running a Python program from the terminal, extra arguments after the filepath can be used to pass information to the program
- `sys.argv` is a list object that stores any command line arguments so they can be accessed within the program


Now that you've finished reading the notes, please do the comprehension checks for this week on Blackboard.