![SI-Notebooks-Hor-Break-Col.svg](attachment:SI-Notebooks-Hor-Break-Col.svg)

# <span style="color:#484848;"> File Handling in Python </span>

### <span style="color:#00aba1;"> Keywords </span>
`Python`, `Files`, `Directory`

### <span style="color:#00aba1;"> Notebook Info </span>

**Author(s):** Rafael Silva, Hugo Silva, Ana Fred

**Date of creation:** dd/mm/aaaa (*)

**Last update:** dd/mm/aaaa (*)

**Last revision:** dd/mm/aaaa (*)

# <span style="color:#00aba1;"> 1. Overview </span>

## <span style="color:#484848;"> 1.1. Introduction </span>

While it is possible to use a spreadsheet environment for loading and visualizing signals, it becomes limiting when dealing with multiple recordings, especially long ones (i.e., large files). It is also more difficult and time-consuming to develop methods and algorithms to process and analyze these signals in a spreadsheet. Robust and efficient computational environments based on programming languages such as Python are therefore useful tools for these tasks. In addition, specialized libraries for visualizing and analyzing data, solving equations, applying mathematical functions, processing signals and using Artificial Intelligence algorithms can be extremely useful for scientific projects.

In this lesson, we will learn how to create a standard Python environment suitable for scientific computing, which includes file manipulation, plotting, and other features provided by Python libraries.

> 📋 **NOTE:** Python libraries such as [Matplotlib](https://matplotlib.org/index.html) and [NumPy](https://numpy.org/) are generally well documented and provide example applications. There are also educational websites for learners, like [W3Schools](https://www.w3schools.com/) and [Programiz](https://www.programiz.com/), as well as online forums and communities that answer public requests and questions.

## <span style="color:#484848;"> 1.2. Objectives </span>

By the end of this class you should be able to:

* Load formatted text files in Python ('.txt' and '.csv')

* Use basic functions for signal visualization and plot customization

# <span style="color:#00aba1;"> 2. File Handling </span>

To be safe, please run the following piece of code to save the current directory in a variable.

In [1]:
import os
cwd = os.getcwd()

## <span style="color:#484848;"> 2.1. Directory </span>

An operating system organizes its files and folders in a hierarchical manner (i.e., the directory tree). To be able to handle files (open, create, update, delete), it is important to know the main navigation commands in a Python environment. Unless we specify the absolute or relative path to a file, we only have access to scripts and files that are in the current directory. The commands `pwd` and `cd` shown below are provided by IPython (and therefore by Jupyter) and are most useful in interactive mode (i.e., the console), since in Python IDEs we can use the navigation buttons.

* To show the present location (or path) of the directory, use the command `pwd`:

In [2]:
pwd

'C:\\Users\\rafa9\\Universidade de Lisboa\\Ana Sofia Cacais do Carmo - ScientISST\\ScientISST Notebooks\\Demo\\T. Teaching\\T001_PythonFilesandPlots'

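This is the path to the folder where this notebook is located. Note that each level appears separated by a double backslash `\\`. This is only how Python displays the string: the backslash is an escape character, so it is shown doubled, while the actual Windows path uses a single backslash `\` between levels (Linux and macOS use `/`).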
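Outside IPython or Jupyter the `pwd` command is not available. As a minimal sketch, the same information can be obtained in plain Python with the standard `os` module (the variable names `current_dir` and `levels` are only illustrative):

```python
import os

# os.getcwd() returns the current working directory as a string
current_dir = os.getcwd()

# print() shows the real path; repr() shows the escaped form, in which
# each backslash of a Windows path appears doubled
print(current_dir)
print(repr(current_dir))

# os.sep is the separator used by the operating system
# (a backslash on Windows, a forward slash on Linux and macOS)
levels = current_dir.split(os.sep)
print(levels)
```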

* To list all files and folders within the current directory we can use the `os.listdir()` function:

In [3]:
os.listdir()

['.ipynb_checkpoints',
 'resources',
 'S3_PythonFilesandPlots.ipynb',
 'T001_PythonFilesandPlots.ipynb']
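`os.listdir()` returns names only and does not say which entries are folders. A small sketch, assuming we want to separate the two, can use the `os.path.isdir()` and `os.path.isfile()` helpers (the list names are illustrative):

```python
import os

entries = os.listdir()  # names only, relative to the current directory

# Separate folders from files using the os.path helpers
folders = [name for name in entries if os.path.isdir(name)]
files = [name for name in entries if os.path.isfile(name)]

print("Folders:", folders)
print("Files:", files)
```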

To navigate to other folders, we use the command `cd`.

* To move to a folder within the current directory, we type the folder's name with double quotation marks ``"<folder_name>"``:

In [4]:
cd "resources"

C:\Users\rafa9\Universidade de Lisboa\Ana Sofia Cacais do Carmo - ScientISST\ScientISST Notebooks\Demo\T. Teaching\T001_PythonFilesandPlots\resources


* To move up one directory level, we type `cd ..`:

In [5]:
cd ..

C:\Users\rafa9\Universidade de Lisboa\Ana Sofia Cacais do Carmo - ScientISST\ScientISST Notebooks\Demo\T. Teaching\T001_PythonFilesandPlots


We can also specify the whole path to the desired folder:

```
cd "<the path goes here>"
```

To make sure we are using the correct directory, run the following piece of code:

In [6]:
os.chdir(cwd)
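The navigation steps of this section also have plain-Python counterparts in the `os` module, which work in scripts as well as in notebooks. The sketch below runs inside a temporary folder created only for this example, so the folder layout is hypothetical:

```python
import os
import tempfile

start_dir = os.getcwd()  # remember where we started

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical layout: a temporary folder containing "resources"
    os.mkdir(os.path.join(tmp, "resources"))

    os.chdir(tmp)          # same effect as: cd "<the path goes here>"
    os.chdir("resources")  # same effect as: cd "resources"
    inside = os.getcwd()

    os.chdir("..")         # same effect as: cd ..
    parent = os.getcwd()

    os.chdir(start_dir)    # go back before the temporary folder is deleted

print(os.path.basename(inside))  # resources
```

Building paths with `os.path.join()` instead of typing separators by hand keeps the code portable between Windows, Linux and macOS.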

## <span style="color:#484848;"> 2.2. Read files using Python built-in functions </span>

For creating, reading and updating files, Python provides several built-in functions, most notably `open()`. This function takes two main parameters:
* **filename**: the name of the file in the current directory, or the whole path to it

* **mode**: `"r"` to read a file, `"a"` to append to the end of a file, and `"w"` to write a file (creating it, or overwriting it if it already exists)
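To make the three modes concrete, here is a minimal sketch that writes, appends to and then reads a small file. The file `example.txt` and its contents are hypothetical and live in a temporary folder, so nothing in `resources` is touched:

```python
import os
import tempfile

# hypothetical file, created in a temporary folder just for this example
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt")

f = open(path, "w")  # "w" creates the file (or overwrites an existing one)
f.write("1780\n")
f.close()

f = open(path, "a")  # "a" adds new content to the end of the file
f.write("1774\n")
f.close()

f = open(path, "r")  # "r" opens the file for reading only
content = f.read()
f.close()

print(repr(content))  # '1780\n1774\n'
```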

Let's open the `ecg_op3.csv` file by creating a file object called `f`:

In [7]:
f = open("resources/ecg_op3.csv", "r")

Since the file is now accessible to the Python environment, we can interact with it. 

Let's use the `readline()` method associated with the file object, which reads one line at a time:

In [8]:
line = f.readline() # first line of the file
line

'1780\n'

Note that `\n` is the character for a new line. When we no longer need information from the file, we need to close it:

In [9]:
f.close()
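A common alternative to calling `close()` by hand is the `with` statement, which closes the file automatically when its block ends. The sketch below uses a hypothetical three-line file written to a temporary folder:

```python
import os
import tempfile

# hypothetical file with three samples, written only for this example
path = os.path.join(tempfile.mkdtemp(), "samples.csv")
with open(path, "w") as f:
    f.write("1780\n1774\n1749\n")

# the file is closed automatically when the with block ends,
# even if an error occurs inside it
with open(path, "r") as f:
    first_line = f.readline()

print(repr(first_line))  # '1780\n'
print(f.closed)          # True
```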

To read the whole file, we can loop over the file object, which yields one line per iteration, and save each line in a list:

In [10]:
f = open("resources/ecg_op3.csv", "r")
ecg_data = [] # empty list to save the signal

for row in f: # iterating over the file yields one line at a time
    row = row.strip('\n') # removes the \n character
    ecg_data.append(row) # adds line to the list
    
f.close()

> 📋 **NOTE:** Do not call `f.readline()` inside a `for ... in f` loop. Both advance the same position in the file, so every other line would be skipped.

We can now access the values inside the `ecg_data` list (output truncated here):

In [11]:
ecg_data

['1780',
 '1774',
 '1749',
 '1727',
 ...]


Note that each value in the list is a string, not a number (e.g., integer, float). To obtain integers, we can convert each row with `int()` inside the *for* loop:
```python
f = open("resources/ecg_op3.csv", "r")
ecg_data = [] 

for row in f: # iterating over the file yields one line at a time
    row = row.rstrip('\n') 
    ecg_data.append(int(row)) # <----- convert each row with int()
    
f.close()
```
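The same conversion can be written more compactly with a `with` block and a list comprehension. This is only a sketch under the assumption that every non-empty line holds a single integer. It uses a hypothetical sample file instead of `ecg_op3.csv`:

```python
import os
import tempfile

# hypothetical sample file with one integer per line
path = os.path.join(tempfile.mkdtemp(), "samples.csv")
with open(path, "w") as f:
    f.write("1780\n1774\n1749\n")

with open(path, "r") as f:
    # int() ignores surrounding whitespace, including the trailing \n;
    # the condition skips lines that are empty
    ecg_values = [int(line) for line in f if line.strip()]

print(ecg_values)  # [1780, 1774, 1749]
```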

## <span style="color:#484848;"> 2.3. Read files using Python libraries </span>

Python libraries can make some tasks easier and more intuitive. For example, the NumPy library provides a single function, `loadtxt`, that reads a text file and performs the steps described previously. Let's see how it works:

In [12]:
import numpy as np
np.loadtxt?

The first parameter, **fname**, is the name of our file or the path to it. Let's use this function to read the `ecg_op3.csv` file:

In [13]:
ecg_data = np.loadtxt('resources/ecg_op3.csv')
ecg_data

array([1780., 1774., 1749., ..., 1511., 1513., 1515.])

All it took was one line of code! Note that the function outputs a *NumPy array* object, a NumPy data structure that is similar to a list and optimized for scientific computation.

Depending on the format of our data we can specify more parameters in this function, namely:
* **dtype**: the data-type of the resulting array (optional)
* **comments**: the character that indicates a comment (optional)
* **delimiter**: the character used to separate values (optional)
* **skiprows**: number of rows to skip (optional)
* **usecols**: which columns to read (optional)
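As a small illustration of some of these parameters, the sketch below parses a hypothetical comma-separated text held in memory. `loadtxt` accepts file-like objects as well as file names, so `io.StringIO` stands in for a real file here. The column names and values are made up for the example:

```python
import io
import numpy as np

# hypothetical content: one header row followed by three data rows
text = "time,raw,mv\n0,1780,0.5\n1,1774,0.4\n2,1749,0.1\n"

data = np.loadtxt(
    io.StringIO(text),  # file-like object used in place of a file name
    delimiter=",",      # values are separated by commas
    skiprows=1,         # skip the header row
    usecols=(0, 2),     # keep only the first and third columns
)

print(data.shape)  # (3, 2)
print(data[:, 1])  # [0.5 0.4 0.1]
```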

Now let's try to load the `ecg_op2.csv`, knowing that this file uses the `#` character for comments and uses `\t` (tab) as its value delimiter:

In [14]:
ecg_data = np.loadtxt('resources/ecg_op2.csv', comments='#', delimiter='\t')
ecg_data

array([[  0.,   0.,   0., ..., 254.,   0., 254.],
       [  1.,   0.,   0., ..., 254.,   0., 254.],
       [  2.,   0.,   0., ..., 254.,   0., 254.],
       ...,
       [ 13.,   0.,   0., ..., 254.,   0., 254.],
       [ 14.,   0.,   0., ..., 254.,   0., 254.],
       [ 15.,   0.,   0., ..., 254.,   0., 254.]])

Note that the first two rows of the file, which are comment lines, have been ignored. The output is a two-dimensional NumPy array (i.e., a matrix), in which each row is itself an array holding the column values of one line of the file:

In [15]:
ecg_data[0]

array([   0.,    0.,    0.,    0.,    0., 1780., 1706.,    0.,  254.,
          0.,  254.,    0.,  254.,    0.,  254.,    0.,  254.])

NumPy arrays extend the indexing and slicing notation of lists to several dimensions, with one index per dimension separated by a comma:
```
array[line, column]
```
Thus, to extract an entire column from an array and save it to a variable we use `:`:
```
col = array[:, column]
```

In [16]:
first_column = ecg_data[:,0] # all lines from the first column
first_column

array([ 0.,  1.,  2., ..., 13., 14., 15.])
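The indexing notation is easier to follow on a small array. The values below are hypothetical and chosen so that each element shows its own row and column:

```python
import numpy as np

# hypothetical 3x3 array: the tens digit is the column, the units digit the row
array = np.array([[0, 10, 20],
                  [1, 11, 21],
                  [2, 12, 22]])

element = array[1, 2]  # second line, third column -> 21
row = array[0]         # entire first line -> [ 0 10 20]
column = array[:, 1]   # all lines of the second column -> [10 11 12]
block = array[:2, 1:]  # first two lines, last two columns

# a nested list needs one pair of brackets per level instead
nested = array.tolist()
same_element = nested[1][2]  # 21

print(element, row, column)
print(block)
```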

✏️ **EXERCISE:** 

1. Using the capabilities of *NumPy*, load the `ecg_op2.csv` into the variable `ecg_data`. Take into account that:

* The *NumPy* array should only contain values of type integer ('int')

* The function must ignore lines that begin with '#'

* The tab character is used as a value delimiter

* You should only read the 'NSeq', 'AI1_raw' and 'AI1_mv' columns

> ⚡ **TIP:** check the first line of the .csv file for the indexes of the desired columns and add them in a list.

2. Using the `ecg_data` variable, extract the 'AI1_mv' column into a numpy array called `ecg_mv`.

3. Read the first line of the `ecg_op2.csv` file (i.e. metadata) and save it as a dictionary called `ecg_data_meta`. You should import the **ast** library and use the *literal_eval* function. Print the dictionary to check your output.

> ⚡ **TIP:** Use the `readline()` example from Section 2.2 and remove the `#` character from the line before passing it to the function.


### Further Reading:

* https://www.w3schools.com/python/python_file_handling.asp

* https://www.w3schools.com/python/python_file_open.asp

* https://www.w3schools.com/python/python_file_write.asp

<img style="height:1.2cm" align="left" src="../../_Templates/template_resources/logo.png">