# Input and output

So far in this course, you've seen:

* Basic data types such as ints, floats, booleans and strings.
* Operators (such as >, <, ==) for comparing two values.
* If, elif and else statements to control which sections of code are run.
* Loops (such as for, while) to perform repeated instructions.
* Functions for organising code into coherent blocks, potentially with inputs and outputs.
* Lists as data structures to hold multiple values and work with sequences of data.

Today, we will learn about three ways of getting data into a program.

We will also study how to save (write) processed data to new output files.

By the end of this week, you will understand how to:
+ import and use standard Python modules
+ read data into your program from text and CSV files
+ use command-line arguments to specify input and output data (e.g. file names)
+ save data from a program in multiple formats
+ use modularity to improve flexibility and reusability of your code.

## Three types of data input

1. File input (using `open()` and file methods)

2. Command-line arguments (using `sys.argv`)

3. Module imports (using `import`)
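As a rough preview, the three kinds of input can appear together in one short program. This is only a sketch: the function names `choose_filename` and `read_numbers` and the default file name `data.txt` are made up for illustration.

```python
import sys  # 3. module import: gives access to sys.argv


def choose_filename(argv, default='data.txt'):
    """Return the first command-line argument, or a default file name."""
    # argv[0] is the name of the program itself; arguments start at argv[1]
    if len(argv) > 1:
        return argv[1]
    return default


def read_numbers(path):
    """Read one integer per line from a text file."""
    with open(path) as f:  # 1. file input
        return [int(line) for line in f]


# 2. command-line arguments: in a real program you would pass sys.argv
print(choose_filename(['program.py']))                  # data.txt
print(choose_filename(['program.py', 'rainfall.txt']))  # rainfall.txt
```

Running `python program.py rainfall.txt` would place `'rainfall.txt'` in `sys.argv[1]`, so the same code could read a different file without being edited.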

<img src="https://github.com/engmaths/SEMT10002_2024/blob/main/img/modularity_overview.png?raw=true" width="80%">

## File input

Importing data from a file into a program can make the program more flexible and reusable.

## Example

```python
'''
The numbers 10, 20, 30 are 'hardcoded' into the program
The program can only ever work with those exact values.
'''
numbers = [10, 20, 30]
print(sum(numbers))
```

```python
'''
If the program instead reads data from a file, you can change the data by editing data.txt
No need to touch the code!
'''
with open("data.txt") as f:
    numbers = [int(line) for line in f]
print(sum(numbers))
```

## What is a file? 

A file is a sequence of bytes (1 byte = 8 bits) used to store data.

What the data represents depends on the file type, which is indicated by the file extension.

Examples of file types and file extensions:
- unformatted text (.txt, .dat)
- formatted text (.docx)
- spreadsheet/tabulated data (.xlsx, .csv)
- image (.png, .img)

(there are hundreds more)

## Opening and closing a file using a computer program

Consider the file system below

```python
CPA/
|
|--- Example_1/
        |
        |--- program_1.py
        |--- README.txt 
```


We can import the data from `README.txt` to the program `program_1.py`.

In `program_1.py`, we create a *file object* (with name `file`) using:

```python
with open('README.txt') as file:
    pass  # code that uses the file goes here
```

Just like other objects, you can give the file object a variable name of your choosing.

```python
with open('README.txt') as file:
    pass  # code that uses the file goes here
```

```python
with open('README.txt') as my_data:
    pass  # code that uses the file goes here
```

We can use the `read` method to read the contents of the file.

Notice that the line that follows the `with` statement is indented.  

Once the indented block ends, the file is closed.
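One way to check this behaviour is the `closed` attribute of the file object. The sketch below first creates a small throwaway file so that it runs anywhere; the file name `example.txt` and its contents are placeholders.

```python
import os
import tempfile

# Create a small throwaway file (name and contents are placeholders)
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as file:
    file.write('hello\n')

with open(path) as file:
    contents = file.read()
    closed_inside = file.closed  # False: the file is open inside the block

closed_after = file.closed       # True: the file was closed when the block ended

print(closed_inside, closed_after)  # False True
```

The variable `file` still exists after the block, but it now refers to a closed file.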


```python
with open('README.txt') as file:
    print(file.read())
```



## Using an imported file within a program

Like all objects, the file object type has a set of specific properties and behaviours.

File objects are *iterable*: each item is a new line of the file

```python
with open('README.txt') as file:
    for value in file:
        print('Line:', value)
```

Line: Computer programming and algorithms

Line: SEMT10002
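The blank lines between the printed lines appear because each line read from a file still ends with a newline character (`'\n'`), and `print` then adds a newline of its own. A minimal sketch of one way to remove it, using the string method `rstrip`; the temporary file is a stand-in holding the same two lines as `README.txt`:

```python
import os
import tempfile

# Stand-in for README.txt, created in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'README.txt')
with open(path, 'w') as f:
    f.write('Computer programming and algorithms\nSEMT10002\n')

cleaned = []
with open(path) as file:
    for value in file:
        stripped = value.rstrip('\n')  # remove the trailing newline character
        cleaned.append(stripped)
        print('Line:', stripped)

print(cleaned)
```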



File objects are not *subscriptable* (we can't access an individual element using an index)

```python
with open('README.txt') as file:
    print(file[0])
```

TypeError: '_io.TextIOWrapper' object is not subscriptable

Once the indented block after the `with` statement ends, the file is closed.

The variable still refers to the file object, but because the file is closed, trying to read from it raises an error.


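```python
with open('README.txt') as file:
    print(file.read())

# The file is now closed, so uncommenting the next line raises
# ValueError: I/O operation on closed file.
# print(file.read())
```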


We can *cast* the file object to a different object type that makes it easier to manipulate the data within the computer program.

By casting the file object as a list, the data:
- is iterable
- is subscriptable
- can still be accessed once the file is closed

Each element of the list is a new line of the file

```python
with open('README.txt') as file:
    lines = list(file)  # cast the file object to a list of lines

    # Iterable
    for value in lines:
        print('Line:', value)

    # Subscriptable
    print(lines[0])

# The list can still be used after the file is closed
print(lines[1])
```



## File Path

The file path is a string that represents the location of a file in a file system.

## How to construct a file path

A file path has three parts:
1. <span style="color:blue">__Directory Path__</span>: the location of the directory containing the file, on the file system. Nested directories are separated by a:
    - forward slash `/` (Mac/Linux)
    - backslash `\` (Windows)
2. <span style="color:red">__File Name__</span>: the name of the file
3. <span style="color:green">__File Extension__</span>: used to indicate the file type
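The three parts can be pulled out of a path string with the standard `pathlib` module. This is a minimal sketch; the `PureWindowsPath` and `PurePosixPath` classes are used here because they only manipulate the text of a path, so the example gives the same result on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The r prefix keeps the backslashes exactly as typed
windows_path = PureWindowsPath(r'C:\Users\YourUsername\Documents\myfile.txt')
print(windows_path.parent)  # directory path: C:\Users\YourUsername\Documents
print(windows_path.stem)    # file name:      myfile
print(windows_path.suffix)  # file extension: .txt

posix_path = PurePosixPath('/home/YourUsername/Documents/myfile.txt')
print(posix_path.parent)    # directory path: /home/YourUsername/Documents
print(posix_path.stem)      # file name:      myfile
print(posix_path.suffix)    # file extension: .txt
```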

Example: 
'<span style="color:blue">C:\Users\YourUsername\Documents\ </span><span style="color:red">myfile</span><span style="color:green">.txt</span>'



The file path can be either:
- __Global (Absolute):__ The path to a file from the **root directory** of the file system. The root directory is the top-most directory of the file system.
- __Local (Relative):__ The path to a file relative to the current *working directory* (the directory where the program is being run) 
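The working directory, and the absolute path that a relative path corresponds to, can be inspected with the standard `os` module. A small sketch (the file `README.txt` does not need to exist for this to run, because only the path text is examined):

```python
import os

working_directory = os.getcwd()  # the current working directory
relative_path = 'README.txt'
absolute_path = os.path.abspath(relative_path)  # working directory + relative path

print(working_directory)
print(absolute_path)
print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
```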



## File paths on different operating systems 

The syntax for the file path is different on Windows and Mac/Linux

#### Windows

Each drive has its own root directory:
- `C:\` is the root directory of the C: drive.
- `D:\` is the root directory of the D: drive.

Directories are separated by a backslash character `\`

Example global path: 
'<span style="color:blue">C:\Users\YourUsername\Documents\ </span><span style="color:red">myfile</span><span style="color:green">.txt</span>'

#### Linux/Mac

There is a single root directory for the entire file system, denoted by a forward slash `/`

Directories are separated by a forward slash character `/`

Example global path: '<span style="color:blue">/home/YourUsername/Documents/</span><span style="color:red">myfile</span><span style="color:green">.txt</span>'

## Example

All examples so far have imported a file within the same directory as the Python program.

The input argument to the `open` function is the **file path**

```python
CPA/
|
|--- Example_1/
        |
        |--- program_1.py
        |--- README.txt 
```
#### Local path

Import 'README.txt' to 'program_1.py' using the local path
```python
with open('README.txt') as file:
    print(file.read())
```

#### Global path

Import 'README.txt' to 'program_1.py' using the global path

(Assume that `CPA` is in the directory `YourUsername`, which is in a directory called `Users` (Windows) or `home` (Mac/Linux) in the root directory.)

Windows: 
```python
with open(r'C:\Users\YourUsername\CPA\Example_1\README.txt') as file:  # r'...' is a raw string: backslashes are kept as typed
    print(file.read())
```

Mac/Linux:
```python
with open('/home/YourUsername/CPA/Example_1/README.txt') as file:
    print(file.read())
```
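Windows paths need some care in Python because a backslash inside an ordinary string starts an *escape sequence* (for example `'\n'` is a newline). A path written as `'C:\Users\...'` is rejected with a `SyntaxError`, because `\U` is read as the start of a Unicode escape. The sketch below shows three ways of writing the same path safely; no file is opened, so it runs on any system.

```python
# 1. Raw string: the r prefix switches off escape sequences
raw_path = r'C:\Users\YourUsername\CPA\Example_1\README.txt'

# 2. Doubled backslashes: each \\ stands for a single backslash
escaped_path = 'C:\\Users\\YourUsername\\CPA\\Example_1\\README.txt'

# 3. Forward slashes: Python also accepts these in file paths on Windows
forward_path = 'C:/Users/YourUsername/CPA/Example_1/README.txt'

print(raw_path == escaped_path)  # True
print(raw_path)

# '\n' is one newline character; r'\n' is a backslash followed by the letter n
print(len('\n'), len(r'\n'))     # 1 2
```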




***

Both are correct, but the local path is:
- shorter
- unaffected by moving the files, provided their locations relative to each other don't change

For these two reasons, in this lab all examples will use:
- the __local path__, not the __global path__
- the notation for Mac/Linux systems (forward slash `/`). Python also accepts forward slashes in file paths on Windows, so there is no need to change them


## Downstream and Upstream files 

__Downstream files__: <br>Files that exist in the same directory as the current working directory (containing the program), or any of its subdirectories

__Upstream files__: <br>Files that exist in a *higher level* directory than the current working directory (containing the program)

## Example: Downstream file

The file `rainfall.csv` is *downstream* of the file `program_3.py`


```python
Week_8/
|
|--- Example_3/
        |
        |--- program_3.py
        |--- my_directory/ 
               |
               |--- rainfall.csv
```


Import 'rainfall.csv' to 'program_3.py': 
```python
with open('my_directory/rainfall.csv') as file:
    print(file.read())
```

## Example: Upstream file

The file `rainfall.csv` is *upstream* of the file `program_3.py`


```python
Week_8/
|
|--- Example_3/
        |
        |--- rainfall.csv
        |--- my_directory/ 
               |
               |--- program_3.py
```


Import 'rainfall.csv' to 'program_3.py': 
```python
with open('../rainfall.csv') as file:
    print(file.read())
```

`../` before `rainfall.csv` in the file path indicates one directory upstream.

Two directories upstream is denoted by `../../`
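To see where `../` leads, a path can be simplified with `normpath`. This sketch uses the standard `posixpath` module, which always applies the Mac/Linux notation, so it prints the same result on any operating system. The starting directory is taken from the file tree in this example.

```python
import posixpath  # path functions that always use forward slashes

program_directory = 'Week_8/Example_3/my_directory'

# Join the program's directory with the relative path, then simplify
one_up = posixpath.normpath(posixpath.join(program_directory, '../rainfall.csv'))
two_up = posixpath.normpath(posixpath.join(program_directory, '../../rainfall.csv'))

print(one_up)  # Week_8/Example_3/rainfall.csv
print(two_up)  # Week_8/rainfall.csv
```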

## Example: A more complex file path

The same process can be used to access directories that are:
- downstream of an upstream directory
- but not downstream of the current working directory, containing the program

```python
Week_8/
|
|--- Example_4/
        |
        |--- my_program/ 
               |
               |--- program_4.py
        |--- my_data/ 
               |
               |--- wind_speed.csv
```

Import 'wind_speed.csv' to 'program_4.py': 
```python
with open('../my_data/wind_speed.csv') as file:
    print(file.read())
```
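The sketch below rebuilds this directory layout inside a temporary directory and then opens the file using the same relative path, to show that the path is resolved from the working directory. The wind speed values are invented placeholders, and `os.chdir` is used only to imitate running the program from inside `my_program/`.

```python
import os
import tempfile

# Build Example_4/my_program and Example_4/my_data in a temporary location
root = tempfile.mkdtemp()
program_dir = os.path.join(root, 'Example_4', 'my_program')
data_dir = os.path.join(root, 'Example_4', 'my_data')
os.makedirs(program_dir)
os.makedirs(data_dir)

# Placeholder data (invented values)
with open(os.path.join(data_dir, 'wind_speed.csv'), 'w') as f:
    f.write('3.2,4.1,5.0\n')

original_dir = os.getcwd()
os.chdir(program_dir)  # pretend the program is run from my_program/
try:
    # Up one level to Example_4/, then down into my_data/
    with open('../my_data/wind_speed.csv') as file:
        contents = file.read()
finally:
    os.chdir(original_dir)  # always restore the original working directory

print(contents)
```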


## Reading different types of file

Every file is a sequence of bytes (1 byte = 8 bits) used to store data.

The file type determines what these bytes represent. 

__Text files__: Human-readable data. <br>Bytes represent plain text characters <br>e.g. .py, .csv, .json, .txt

__Binary files__: Data that is not intended to be human-readable. <br>Bytes do not represent plain text characters; their meaning is defined by the file format. <br>e.g. executable programs (.exe, .bin), images (.jpg, .png, .gif), audio (.mp3, .wav), video (.mp4, .avi), compressed files (.zip). 
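The difference between characters and bytes can be seen by reading the same small file in two ways. In this sketch the file name `example.txt` and its contents are placeholders; passing `'rb'` as a second argument to `open` asks for the raw bytes instead of text.

```python
import os
import tempfile

# Create a small throwaway text file (name and contents are placeholders)
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as f:
    f.write('Hi!')

with open(path) as f:        # text mode (the default): bytes are decoded to characters
    as_text = f.read()

with open(path, 'rb') as f:  # binary mode: the bytes are returned unchanged
    as_bytes = f.read()

print(as_text)         # Hi!
print(list(as_bytes))  # [72, 105, 33], the numeric value of each byte
```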

Text file: `.txt`

```python
with open('README.txt') as file:
    print(file.read())
```

```
Computer programming and algorithms
SEMT10002
```



Text file: `.csv`

```python
with open('temperature.csv') as file:
    print(file.read())
```

```
Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
6,5,8,10,13,16,18,18,15,13,8,7
```
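Reading a CSV file with `read` gives one long string. One possible way to split it into separate values is the standard `csv` module, sketched below. To keep the example self-contained, the two lines of `temperature.csv` are stored in a string and wrapped in `io.StringIO`, which behaves like an open text file; with the real file you would pass the file object from `open` to `csv.reader` instead.

```python
import csv
import io

# Stand-in for the contents of temperature.csv
csv_text = ('Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec\n'
            '6,5,8,10,13,16,18,18,15,13,8,7\n')

with io.StringIO(csv_text) as file:
    reader = csv.reader(file)        # splits each line at the commas
    months = next(reader)            # first row: column headings
    values = next(reader)            # second row: values, still strings
    temperatures = [int(v) for v in values]

temperature_by_month = dict(zip(months, temperatures))

print(temperature_by_month['Jul'])  # 18
print(max(temperatures))            # 18
```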



Binary file: `.png`

To open a binary file, we must give a second argument (the mode) within the parentheses of the `open` function (try running the code without this)

`rb` represents `r`ead and `b`inary

```python
with open('snake.png', 'rb') as file:
    print(file.read())
```


The data printed by this code may look confusing, but it is simply a series of bytes (8-bit binary numbers).

`\x` indicates a hexadecimal (base 16) number which is another way to represent a byte 
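When Python prints a `bytes` object, bytes that correspond to printable ASCII characters are shown as those characters, and most other bytes are shown as `\x` followed by two hexadecimal digits. A short sketch using the values 137, 80, 78 and 71, which are the first four bytes of every PNG file:

```python
signature = bytes([137, 80, 78, 71])  # build a bytes object from four numbers

print(signature)       # b'\x89PNG'
print(hex(137))        # 0x89 (137 written in base 16)
print(0x89)            # 137  (and back to base 10)
print(list(signature)) # [137, 80, 78, 71]
```

Here 137 has no printable character, so it appears as `\x89`, while 80, 78 and 71 are the ASCII codes for `P`, `N` and `G`.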

As in a text file, each byte of data in a binary file encodes a meaning.

However, unlike text data, the encoding of binary files is not a simple mapping from bytes to human readable characters. 

For example, a binary `.png` file is structured into a series of chunks, composed of bytes, that are used to reconstruct the image when the file is opened in an image viewer or editor.

**For ease of use, we will be working with data stored in *text file* formats for the rest of the unit.**