# Introduction to Computer Programming

## Week 8.1: Reading data

* * *

<img src="img/full-colour-logo-UoB.png" alt="Bristol" style="width: 300px;"/>

# Aims

In this video we will:

* Introduce CSV files
* Introduce the concept of downstream files
* Show how data in a CSV file can be loaded into Python

# CSV files

* CSV stands for comma-separated values
* CSV files are text files that contain data which is separated by commas
* CSV files (.csv) are commonly used to store data
* This data could be from Python code, Excel, or databases

# Example

Consider the following temperature measurements taken at different times of day:

| Time (hours) | Temperature (C) |
| :-: | :-: |
| 0600 | 10.3 |
| 0700 | 10.6 |
| 0800 | 12.1 |
| 0900 | 12.7 |
| 1000 | 13.5 |
| 1100 | 14.3 |
| 1200 | 15.1 |

This data is stored in a CSV file called `temperature.csv`, which can be found on Blackboard
<img src="img/csv_file.png" style="width: 250px;"/>

# Importing data
* Imagine that we have created a folder on our computer called ICP
* This folder contains our main Python program, called `main.py`, and the .csv file, called `temperature.csv`
```text
ICP/
|---main.py
|---temperature.csv
```

In this case, we could open the file using the command
```python
file = open('temperature.csv', 'r')
```
The string `'r'` means that we are opening the file in read mode: we can read the data in it but cannot change it

* However, you might have a folder within the ICP folder that contains all of your data (.csv) files
* Imagine that this folder is called `sample_data`
```text
ICP/
|---main.py
|---sample_data/
        |---temperature.csv
```
Files that exist in the same folder as your Python (.py) file, or any of its subfolders, are called **downstream files**

To load a file in a subfolder, we just add the name of the subfolder followed by a slash (/) before the file name

```python
file = open('sample_data/temperature.csv', 'r')
```
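As an optional alternative to typing the slash by hand, the path can be built with the standard `pathlib` module. This is only a sketch: the variable name `data_path` is hypothetical, and the example builds the path without opening the file.

```python
from pathlib import Path

# Join the subfolder name and the file name; Path inserts the separator
data_path = Path('sample_data') / 'temperature.csv'

print(data_path.as_posix())  # sample_data/temperature.csv
print(data_path.name)        # temperature.csv
```

A `Path` object such as `data_path` can be passed to `open` in place of a string.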

Once we have opened a file using the `open` function, we can use a `for` loop to access its contents

```python
for line in file:
    print(line)
```

```
Time (hour),Temperature (C)

600,10.3

700,10.6

800,12.1

900,12.7

1000,13.5

1100,14.3

1200,15.1
```



# Closing files

When you are finished reading in the contents of a file, the file needs to be closed using the `close` method

```python
file.close()
```
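One way to check whether a file is open is the `closed` attribute of the file object. The sketch below first writes a small stand-in file to a temporary folder so that it runs without the real `temperature.csv`; the temporary folder and the names `status_before` and `status_after` are assumptions made for illustration.

```python
import os
import tempfile

# Set-up only: write a small stand-in for temperature.csv
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'temperature.csv')
setup = open(path, 'w')
setup.write('Time (hour),Temperature (C)\n600,10.3\n')
setup.close()

file = open(path, 'r')
status_before = file.closed  # False: the file is still open
file.close()
status_after = file.closed   # True: the file has been closed

print(status_before, status_after)  # False True
```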

# A better method for reading files

* It is better to use the `with` keyword with the `open` function to read files
* This automatically closes the file when reading operations are finished

```python
with open('sample_data/temperature.csv', 'r') as file:
    for line in file:
        print(line)
```

```
Time (hour),Temperature (C)

600,10.3

700,10.6

800,12.1

900,12.7

1000,13.5

1100,14.3

1200,15.1
```
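The blank lines in the output appear because each line read from the file ends with a newline character (`\n`) and `print` adds another one. A minimal sketch of one way to avoid this is to strip the trailing newline with the string method `rstrip`. The stand-in file written to a temporary folder and the names `lines` and `clean` are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Set-up only: write a small stand-in for sample_data/temperature.csv
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'temperature.csv')
with open(path, 'w') as file:
    file.write('Time (hour),Temperature (C)\n600,10.3\n700,10.6\n')

lines = []
with open(path, 'r') as file:
    for line in file:
        clean = line.rstrip('\n')  # remove the trailing newline character
        lines.append(clean)
        print(clean)               # printed without blank lines in between
```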



# Some points

* The variable `file` is iterable; we can use a `for` loop to go through the file line by line
* The variable `file` does **not** support indexing; we cannot access a line using an index such as `file[0]`
* Each line read from the file is a **string**, even when it contains numbers

```python
with open('sample_data/temperature.csv', 'r') as file:
    for line in file:
        print(type(line))
```

```
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
```
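To illustrate the point about indexing, the sketch below tries to access `file[0]` and catches the resulting `TypeError`. The stand-in file in a temporary folder and the variable `message` are assumptions made so that the example runs on its own.

```python
import os
import tempfile

# Set-up only: write a small stand-in for sample_data/temperature.csv
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'temperature.csv')
with open(path, 'w') as file:
    file.write('Time (hour),Temperature (C)\n600,10.3\n')

with open(path, 'r') as file:
    try:
        first = file[0]  # file objects cannot be indexed
    except TypeError as error:
        message = str(error)
        print('Cannot index a file object:', message)
```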


# Converting read data

We can convert the `file` object into a list using the `list` function so that individual lines can be accessed by index

```python
with open('sample_data/temperature.csv', 'r') as file:
    L = list(file)
print(L)
```

```
['Time (hour),Temperature (C)\n', '600,10.3\n', '700,10.6\n', '800,12.1\n', '900,12.7\n', '1000,13.5\n', '1100,14.3\n', '1200,15.1\n']
```


But each row of data has been combined into a single string that needs to be manually separated
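As a rough sketch of what this manual separation could look like, each string can be stripped of its newline character and then split at the commas using the string methods `rstrip` and `split`. The list `L` below copies the first three entries of the output above, and the name `rows` is hypothetical.

```python
# First three entries of the list obtained from the file
L = ['Time (hour),Temperature (C)\n', '600,10.3\n', '700,10.6\n']

rows = []
for line in L:
    # Remove the newline, then split the string wherever there is a comma
    rows.append(line.rstrip('\n').split(','))

print(rows)
# [['Time (hour)', 'Temperature (C)'], ['600', '10.3'], ['700', '10.6']]
```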

# The `csv` package

* The `csv` package, which is part of Python's standard library, makes it a bit easier to load .csv files
* To use the `csv` package, we add the following command to the top of our .py file

```python
import csv
```

# Using the `csv` package

Once the package has been imported, we can read in data from a .csv file as follows

```python
with open('sample_data/temperature.csv', 'r') as file:
    read = csv.reader(file)
    L = list(read)
print(L)
```

```
[['Time (hour)', 'Temperature (C)'], ['600', '10.3'], ['700', '10.6'], ['800', '12.1'], ['900', '12.7'], ['1000', '13.5'], ['1100', '14.3'], ['1200', '15.1']]
```


Now the file has been read and automatically separated into a nested list!
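A possible variation, sketched below, separates the header row from the data rows by calling the built-in `next` function on the reader before converting the rest to a list. The Python documentation for the `csv` package also recommends passing `newline=''` to `open`. The stand-in file in a temporary folder and the names `header` and `rows` are assumptions made for illustration.

```python
import csv
import os
import tempfile

# Set-up only: write a small stand-in for sample_data/temperature.csv
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'temperature.csv')
with open(path, 'w') as file:
    file.write('Time (hour),Temperature (C)\n600,10.3\n700,10.6\n')

with open(path, 'r', newline='') as file:
    read = csv.reader(file)
    header = next(read)  # the first row holds the column names
    rows = list(read)    # the remaining rows hold the data

print(header)  # ['Time (hour)', 'Temperature (C)']
print(rows)    # [['600', '10.3'], ['700', '10.6']]
```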

# Post-processing

The data is still imported as strings, so some further post-processing is needed to convert strings into floats when necessary.

This can be done using loops or list comprehensions along with the `float` function
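A minimal sketch of this post-processing step, using list comprehensions, is given below. The nested list `L` copies the first few rows produced by the `csv` package above; the names `header`, `data` and `temperatures` are hypothetical.

```python
# First few rows of the nested list produced by csv.reader
L = [['Time (hour)', 'Temperature (C)'],
     ['600', '10.3'], ['700', '10.6'], ['800', '12.1']]

header = L[0]  # column names stay as strings

# Convert every entry of every data row from a string to a float
data = [[float(value) for value in row] for row in L[1:]]

# Extract the second column (the temperatures) as a list of floats
temperatures = [row[1] for row in data]

print(temperatures)  # [10.3, 10.6, 12.1]
```

The header row is skipped using the slice `L[1:]` because its entries cannot be converted into floats.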

# Summary

* CSV (comma-separated values) files are commonly used to store data
* Downstream files are those which are contained in the same folder as your .py file or one of its sub-folders
* The `open` function can be used to open .csv files
* It is better to open files using the `with` keyword, and .csv files are easier to read with the `csv` package