# Introduction to Computer Programming

## Week 8: Reading and writing data

* * *

<img src="https://github.com/engmaths/EMAT10007_2023/blob/main/weekly_content/img/full-colour-logo-UoB.png?raw=true" width="20%">

### Open the Google Colab notebook for this class

Either:

- Click [this link](https://colab.research.google.com/github/engmaths/SEMT10002_2024/blob/main/weekly_labs/Week_08_Reading_Data/week_08_lab.ipynb) to open this notebook in Google Colab.  You'll need to sign in with a Google account before you can run it.  When you do, hit `Ctrl+F9` to check it all runs.

or

- Download it to your local computer using `git clone https://github.com/engmaths/SEMT10002_2024` or just use `git pull` to refresh if you've done this already.
- Navigate to the subfolder `weekly_labs/Week_08_Reading_Data/` and open the notebook `week_08_lab.ipynb`.  For example, in Visual Studio Code, use `Ctrl+K Ctrl+O` to open a folder and select the folder just mentioned.  Then you can open the notebook file by clicking on it in the left hand explorer sidebar.

# Modularity

<img src="https://github.com/engmaths/SEMT10002_2024/blob/main/img/modularity_overview.png?raw=true" width="80%">

# Recap of the videos

- We can open a file containing data using the `open` function and the file path 
```
open('data/temperature.csv')
```
- The data file may be located upstream or downstream of the current working directory
- Different types of data file:
    - __Text files__: Human-readable data. Bytes represent plain text characters 
    - __Binary files__: Data that is not intended to be human-readable. Bytes do not represent plain text characters, but other information about the file. 
- By importing a module, we can access additional functions, to handle tasks that we are likely to encounter.
- This can avoid writing longer and more complicated code from scratch.
- Details of the functions associated with a module can be found in the module documentation
- The `csv` module provides functions for handling data using CSV files
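
As a minimal sketch of the text/binary distinction above: the same file can be opened in text mode (the default), where Python decodes the bytes into a `str`, or in binary mode (`'rb'`), where the raw `bytes` are returned. The file name `example.txt` and its contents are invented for illustration, and the file is written to a temporary folder so that the snippet runs anywhere.

```python
import tempfile
from pathlib import Path

# Hypothetical file, created in a temporary folder purely for illustration
folder = Path(tempfile.mkdtemp())
path = folder / 'example.txt'
path.write_text('temp,21.5\n', encoding='utf-8')

# Text mode (the default): the bytes are decoded into a string
with open(path, encoding='utf-8') as file:
    text = file.read()

# Binary mode: the raw bytes are returned without decoding
with open(path, 'rb') as file:
    raw = file.read()

print(type(text), text)
print(type(raw), raw)
```

Binary mode is the one to reach for when a file (an image, for example) does not contain plain text.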

# Aim 

To become comfortable with:
- Opening files from upstream and downstream locations using the file path
- Understanding the importance of *where* we run our program
- Opening text files using a computer program and reading the data contained in the file
- Importing a module you have seen before (`csv`) and using the functions from this module to read input data and save output data
- Looking at documentation for modules for other file types (e.g. JSON files) and finding equivalent functions for reading and saving data

There is a set of files for you to download that we will be working with/editing today. 

You can download these files from the command line/ terminal application. 

1. Navigate to the directory where you want to download the files using `cd`. 

2. Download the .zip file by running: `curl -O https://raw.githubusercontent.com/engmaths/SEMT10002_2024/main/weekly_labs/Week_08_Reading_Data/week_8.zip`

3. Then unzip the file by running: `unzip week_8.zip`

(Fallback option: the files are also available to download as a zip file (Week_8.zip) on Blackboard, Week 8.)

Until now, we have seen code examples run in a Google Colab `.ipynb` file.

We will complete today's exercises in an IDE (VS Code) and the terminal

By doing this, we will learn about the importance of the __current working directory__ (the location *where* we run a program)

If you are using the VS Code IDE, you must open the folder containing both the program and the data. 

File >> Open >> (Folder to open)<br>
(To exit: File >> Close Folder)<br><br>
OR<br><br>
File >> Open >> (File to open)<br>
File >> Add Folder to Workspace >> (Folder to open)
<br>
(To exit: File >> Close Workspace)

# Example: Importing a downstream file

Consider the file system below. 

Open the file `student_marks.csv` using `program_A.py` and read the contents

```
Week_8/
|
|--- Example A/
        |
        |--- program_A.py
        |--- student_marks.csv
```
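
A minimal sketch of this example, assuming the program is run from inside the example folder, so that folder is the current working directory. To keep the snippet runnable anywhere, it first creates a stand-in `student_marks.csv` in a temporary folder. The two rows of data are invented and are not the real file contents.

```python
import csv
import os
import tempfile

# Stand-in for the example folder, with invented data (not the real file)
folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'student_marks.csv'), 'w', newline='') as file:
    csv.writer(file).writerows([['ab154', '56'], ['fr877', '72']])

start = os.getcwd()
os.chdir(folder)          # pretend the program was run from the example folder

# The file is in the current working directory, so its name is the whole path
with open('student_marks.csv', newline='') as file:
    rows = list(csv.reader(file))

os.chdir(start)           # return to where we started
print(rows)
```

In `program_A.py` itself, only the `open` and `csv.reader` lines would be needed, because the real file already exists.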



# Example: Importing an upstream file

Consider the file system below. 

Open the file `student_marks.csv` using `program_B.py` and read the contents

```
Week_8/
|
|--- Example_B/
        |
        |--- student_marks.csv
        |--- my_program/ 
               |
               |--- program_B.py
```
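
The sketch below illustrates why *where* the program is run matters for an upstream file. It rebuilds the folder layout above in a temporary location (with invented data), then tries the same relative path `../student_marks.csv` from two different working directories. Using `os.chdir` to imitate running the program from each folder is an assumption made only so the snippet is self-contained.

```python
import csv
import os
import tempfile

# Rebuild the folder layout above in a temporary location (invented data)
example_b = os.path.join(tempfile.mkdtemp(), 'Example_B')
program_dir = os.path.join(example_b, 'my_program')
os.makedirs(program_dir)
with open(os.path.join(example_b, 'student_marks.csv'), 'w', newline='') as file:
    csv.writer(file).writerows([['ab154', '56'], ['fr877', '72']])

start = os.getcwd()

# Run from my_program/: '..' steps up one level to Example_B/
os.chdir(program_dir)
with open('../student_marks.csv', newline='') as file:
    rows = list(csv.reader(file))
print(rows)

# Run from Example_B/ instead: the same relative path now points one level too high
os.chdir(example_b)
try:
    with open('../student_marks.csv') as file:
        found = True
except FileNotFoundError:
    found = False
print(found)

os.chdir(start)           # return to where we started
```

The path is resolved relative to the current working directory, not relative to the location of the `.py` file.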

# Example: Using imported numerical data and saving a CSV file

Consider the file system below. 

a) Open the file `student_marks.csv` using `program_C.py` and read the contents<br>
b) Find the mean of the student marks 

```
Week_8/
|
|--- Example_C/
        |
        |--- my_program/ 
               |
               |--- program_C.py
        |--- my_data/ 
               |
               |--- student_marks.csv
```
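
A possible sketch of parts a) and b), assuming the program is run from `my_program/`, that the file has no header row, and that each row holds a student ID followed by a mark. The three rows of data are invented so that the snippet can run without the real file.

```python
import csv
import os
import tempfile

# Rebuild the layout above in a temporary location; the marks are invented
example_c = os.path.join(tempfile.mkdtemp(), 'Example_C')
os.makedirs(os.path.join(example_c, 'my_program'))
os.makedirs(os.path.join(example_c, 'my_data'))
data_path = os.path.join(example_c, 'my_data', 'student_marks.csv')
with open(data_path, 'w', newline='') as file:
    csv.writer(file).writerows([['ab154', '56'], ['fr877', '72'], ['hg897', '74']])

start = os.getcwd()
os.chdir(os.path.join(example_c, 'my_program'))

# Up one level to Example_C/, then down into my_data/
with open('../my_data/student_marks.csv', newline='') as file:
    rows = list(csv.reader(file))

os.chdir(start)

# csv.reader returns every value as a string, so convert before doing maths
marks = [int(row[1]) for row in rows]
mean = sum(marks) / len(marks)
print(mean)
```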

c) Using the table below, find the grade associated with each student mark.<br>Save the student ID number, marks and associated grades as a new csv file called `student_grades.csv`, in the same directory as `student_marks.csv`

| Grade | Range    | 
| :------: | :---------------:   | 
| A       | mark $\geq 70$             | 
| B       | $70 >$ mark $\geq 60$             | 
| C       | $60 >$ mark $\geq 50$            | 
| D       | $50 >$ mark $\geq 40$            | 
| F       | mark $< 40$             | 

In [296]:
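    grades = []

    # Check the boundaries from highest to lowest, so that each mark
    # is given the first (highest) grade it qualifies for
    for m in marks:
        if m >= 70:
            grades.append('A')
        elif m >= 60:
            grades.append('B')
        elif m >= 50:
            grades.append('C')
        elif m >= 40:
            grades.append('D')
        else:
            grades.append('F')    # mark < 40

    print(grades)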

['C', 'A', 'A', 'B', 'D', 'D', 'C', 'B', 'B', 'A', 'B', 'C']
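
To finish part c), the IDs, marks and grades can be written out row by row. The sketch below is one way to do this with `csv.writer`. It uses shortened lists and writes to a temporary folder (a hypothetical location); in `program_C.py` the path would instead point into `my_data/`.

```python
import csv
import os
import tempfile

# A few IDs, marks and grades in the style of the example (shortened lists)
student_ids = ['ab154', 'fr877', 'hg897']
marks = [56, 72, 74]
grades = ['C', 'A', 'A']

# Hypothetical output location, used so the snippet runs anywhere
out_path = os.path.join(tempfile.mkdtemp(), 'student_grades.csv')

# newline='' stops the csv module adding blank lines between rows on Windows
with open(out_path, 'w', newline='') as file:
    writer = csv.writer(file)
    # zip groups the i-th ID, mark and grade into one row
    writer.writerows(zip(student_ids, marks, grades))

# Read the file back to check what was saved
with open(out_path, newline='') as file:
    saved = list(csv.reader(file))
print(saved)
```

Note that the marks come back as strings when the file is read again.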


d) Edit `program_C.py` so that instead of saving the student grades as a new file, it edits `student_marks.csv` to add the grades as a new column
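
One way to approach part d) is to read the whole file into memory, add the new column, then reopen the same file for writing. The sketch below assumes two invented rows and a ready-made `grades` list, and works on a temporary copy rather than the real `student_marks.csv`.

```python
import csv
import os
import tempfile

# Temporary stand-in for student_marks.csv, with invented data
path = os.path.join(tempfile.mkdtemp(), 'student_marks.csv')
with open(path, 'w', newline='') as file:
    csv.writer(file).writerows([['ab154', '56'], ['fr877', '72']])

grades = ['C', 'A']       # assumed to have been worked out already

# Step 1: read everything into memory; the file closes when the block ends
with open(path, newline='') as file:
    rows = list(csv.reader(file))

# Step 2: add one grade to the end of each row
new_rows = [row + [grade] for row, grade in zip(rows, grades)]

# Step 3: reopen in 'w' mode, which empties the file, and write the rows back
with open(path, 'w', newline='') as file:
    csv.writer(file).writerows(new_rows)

with open(path, newline='') as file:
    updated = list(csv.reader(file))
print(updated)
```

Reading must finish before the file is opened with `'w'`, because opening for writing discards the existing contents.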

# Lab exercises

Complete the exercises by working in an IDE and/or the terminal

# Exercise 1 - Importing data from upstream and downstream locations

In the directory, 'Week_8', sub-subdirectory 'Lab_Exercise_1', find and modify the program `program_1.py` to:
1. Import the file rainfall.csv and display the contents of the file
1. Import the file unit_description.txt and display the contents of the file
1. Import the file cat.png and display the contents of the file

# Exercise 2 - Reading and using data from text files

In the directory, 'Week_8', sub-subdirectory 'Lab_Exercise_2', find and modify the program `program_2.py` to:
1. Import the file rainfall.csv and print the mean and standard deviation of the monthly rainfall values. Standard deviation $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$, where $N$ is the total number of values, $x_i$ is each individual value in the population, $\mu$ is the mean of all values. 
2. Import the file xyz_data.csv and print a list containing the sum of x and z for each row in xyz_data.csv
3. Import the file unit_description.txt and print the third line


# Exercise 3 - Saving data to a csv file

In the directory, 'Week_8', sub-subdirectory 'Lab_Exercise_3', find and modify the program `program_3.py` to:
1. Save the list, `cities`, to a csv file
2. Save `cities` and `populations` to a csv file, where `cities` is the first *row* and `populations` is the second
3. Save `cities` and `populations` to a csv file, where `cities` is the first *column* and `populations` is the second
4. Import the file xyz_data.csv. <br>Create a list containing the sum of x and z for each row in xyz_data.csv. <br> Save columns x and z to a new file, along with a third column containing the values of the list you just created. 

# Exercise 4 - Real world programming example 

### Sensor data from a robot 

In the directory, 'Week_8', sub-subdirectory 'Lab_Exercise_4':
1. Create a Python file called program_4.py
2. Import sense.csv, the data from 3 sensors mounted on the front of a robot that sense distance to the nearest object. If the first item imported begins with `\ufeff`, include an additional argument when calling the `open` function. The additional argument is `encoding='utf-8-sig'`. 
3. The units of the distance measurements are given in cm. <br>Round the values to 1 d.p. then convert the values so they are represented in mm. <br>Save the new values as file sense_mm.csv 
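
The `\ufeff` character mentioned in step 2 is a byte order mark (BOM) that some programs write at the start of a file. The sketch below shows its effect using a hypothetical stand-in for `sense.csv` containing three invented readings; it is not the real sensor data.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for sense.csv: invented readings preceded by a BOM
path = os.path.join(tempfile.mkdtemp(), 'sense.csv')
with open(path, 'wb') as file:
    file.write(b'\xef\xbb\xbf' + b'10.25,11.5,9.75\n')

# Decoding as plain UTF-8 keeps the BOM as an invisible first character
with open(path, newline='', encoding='utf-8') as file:
    first_plain = next(csv.reader(file))[0]

# 'utf-8-sig' removes the BOM if one is present
with open(path, newline='', encoding='utf-8-sig') as file:
    first_sig = next(csv.reader(file))[0]

print(repr(first_plain))
print(repr(first_sig))
```

With the BOM still attached, converting the first item with `float` would fail, which is why the extra argument matters.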

# Exercise 5 - Exploring other modules for importing data

Read the information about using the `json` module to import JSON files. 

Exercise 5 is given at the end of this information 

## JSON files

* JSON stands for JavaScript Object Notation
* Widely used for data transfer as:
     * both human and computer readable
     * useable across different programming languages
* Used for storing representations of programming objects (e.g. dictionaries, lists, tuples)

The typical JSON file format looks very similar to a Python dictionary.

- Enclosed in curly braces
- Key: Represents a variable name
- Value: Represents a value

```
{
  "Unit Name": "Computer Programming and Algorithms",
  "Unit Code": "SEMT10002",
  "Student ID": ["ab154", "fr877", "hg897", "ke890", "fe285", "lo730", "tr834", "ml965", "po892", "gw002", "wz303", "yu648"],
  "Mark": [56, 72, 74, 66, 43, 47, 57, 65, 62, 71, 62, 55]
}
```

Objects can be stored and recalled using a key-value pair.

This makes the JSON file a convenient way to store programming objects: 
- primitives (e.g. ints, floats, Booleans)
- data structures (e.g. lists, tuples, dictionaries)
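
Although JSON looks like a Python dictionary, some values are spelled differently. The short sketch below uses `dumps` and `loads` from the `json` module (introduced later in this section), which work on strings rather than files. The keys `Passed` and `Resit` and the mark `72.5` are invented for illustration.

```python
import json

# A dictionary holding primitives and a list (values partly invented)
record = {'Unit Code': 'SEMT10002', 'Mark': [56, 72.5], 'Passed': True, 'Resit': None}

text = json.dumps(record)     # Python object -> JSON-formatted string
print(text)

restored = json.loads(text)   # JSON-formatted string -> Python object
print(restored == record)
```

In the printed JSON text, Python's `True` appears as `true` and `None` as `null`, and strings always use double quotes.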

# Reading a JSON file

Consider the file system below.

```
Week_8/
|
|--- Example 9/
        |
        |--- program_9.py
        |--- unit_data.json
```

We can open the file `unit_data.json` using `program_9.py` and read the contents using 

In [430]:
with open('unit_data.json') as file:
    print(file.read())

{
  "Unit Name": "Computer Programming and Algorithms",
  "Unit Code": "SEMT10002",
  "Student ID": ["ab154", "fr877", "hg897", "ke890", "fe285", "lo730", "tr834", "ml965", "po892", "gw002", "wz303", "yu648"],
  "Mark": [56, 72, 74, 66, 43, 47, 57, 65, 62, 71, 62, 55]
}



As we saw with CSV files, each line of the JSON file is imported as a string.

Several steps would be required to access each value stored in the file

In [275]:
with open('unit_data.json') as file:
    lines = list(file)    # one string per line of the file
    for line in lines:
        print(type(line))

<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>


# The `json` module

Certain modules were installed on your computer automatically when Python was installed

These include a module called `json` for handling JSON files 

All functions from the `json` module are listed here: https://docs.python.org/3/library/json.html

# Reading a file using the `json` module

The `load` function (prefixed with module name `json`) reads the file object and converts its contents to a Python object. Here, the result is a dictionary. 

The key-value pairs can now be accessed as a dictionary object

In [279]:
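import json

with open('unit_data.json') as file:
    
    # Convert the JSON content of the file to a Python dictionary
    dictionary = json.load(file)

    print(type(dictionary))

    # Values are accessed by key, as for any dictionary
    print(dictionary["Unit Code"])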

<class 'dict'>
SEMT10002


Let's look at the documentation for `json.load`

```
json.load(fp, *, cls=None, object_hook=None,...)
```

The function takes:
- A positional argument `fp`
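- Several optional arguments. The bare `*` means that every argument after it must be passed by name (as a keyword argument), `=` shows an argument's default value, and `**` collects any further keyword arguments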
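
To see what this means in practice, the sketch below passes `fp` by position and one optional argument, `parse_float`, by name. `parse_float` is one of the arguments hidden behind the `...` in the signature above. An `io.StringIO` object with invented contents stands in for an open file so that no file is needed on disk.

```python
import io
import json
from decimal import Decimal

# io.StringIO stands in for an open file (contents invented)
fake_file = io.StringIO('{"Mark": [56.5, 72.25]}')

# fp is passed by position; parse_float must be passed by name
data = json.load(fake_file, parse_float=Decimal)
print(data['Mark'])

# Passing an optional argument by position is rejected
try:
    json.load(io.StringIO('{}'), None)
    accepted = True
except TypeError:
    accepted = False
print(accepted)
```

Here `parse_float=Decimal` asks for each decimal number to be converted with `Decimal` instead of `float`; leaving the argument out gives the default behaviour.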

# Example: Using imported numerical data

Import the data stored in 'unit_data.json' and find the mean of the student marks

In [283]:
with open('unit_data.json') as file:
    
    dictionary = json.load(file)

    marks = dictionary["Mark"]

    print(sum(marks)/len(marks))

60.833333333333336


# Saving data to a JSON file using the `json` module

To save data to a JSON file, first format the data as a dictionary.

To create a new file and open it for writing, we use the argument `'w'` when opening the file

We can use the `dump` function (prefixed with module name `json`) to save the dictionary 

In [412]:
import json

# Each colour is stored as a list of [red, green, blue] components
colours = {'red': [1,0,0], 
           'blue': [0,0,1], 
           'green': [0,1,0]
          }

with open('colours.json', 'w') as file:
    json.dump(colours, file)

The `load` function can be used to read the JSON file again

In [410]:
with open('colours.json') as file:
    colours2 = json.load(file)

print(colours2)

{'red': [1, 0, 0], 'blue': [0, 0, 1], 'green': [0, 1, 0]}


# Example: Using imported numerical data and saving a JSON file

Save the student ID number, marks and mean mark as a new JSON file called `student_grades.json`, in the same directory as `unit_data.json`

In [414]:
with open('unit_data.json') as file:
    
    dictionary = json.load(file)

    marks = dictionary["Mark"]

    new_dictionary = {
                      'Student ID': dictionary['Student ID'],
                      'Mark': dictionary['Mark'],
                      'Mean mark': sum(marks)/len(marks)
                     }
    
with open('student_grades.json', 'w') as file:

    json.dump(new_dictionary, file)

    

# Limitations of `json` module

* Saved tuples will be converted into lists
* Only a single top-level value can be saved per file (a list or dictionary can be used to group several values)
* The names of the variables are not saved
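
The first limitation can be checked with a short round trip. This is a minimal sketch using `dumps` and `loads`, the string-based counterparts of `dump` and `load`, with invented values.

```python
import json

point = (1.5, 2.5)                        # a tuple
restored = json.loads(json.dumps(point))
print(type(restored), restored)           # the tuple comes back as a list

# Grouping values under keys in a dictionary keeps a name with each value
values = {'x': 1.5, 'y': 2.5}
print(json.loads(json.dumps(values)))
```

If a tuple is needed after loading, the list can be converted back with `tuple(restored)`.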

# Practise importing JSON files
In the directory, 'Week_8', sub-subdirectory 'Lab_Exercise_5', the file `sensor_data.json` contains data from multiple sensors installed in a manufacturing plant. <br>Each sensor records its ID, timestamp, and the temperature it measures. Modify `program_5.py` to:
1. Import the data from `sensor_data.json` and print the values for each sensor as 3 individual lists 
2. Calculate the average temperature for each sensor and assign each value to a variable 
3. Save these values to a new JSON file `mean_sensor_data`, so that they can be accessed using the keys `mean_sensor_1`, `mean_sensor_2` and `mean_sensor_3` when the file is imported into a program. 