# Introduction to structured input and output data formats 📁
## Online Book
As a reminder, this is the follow-along notebook for the [Introduction to structured input and output data formats](https://computer-science-tutorials.readthedocs.io/en/latest/Introduction/index_p2.html#introduction-to-structured-input-and-output-data-formats) lecture of the course.

{{ badges }}

## Introduction
👋 Hello, fellow Data Adventurers! 🚂 All aboard the Data Express, our main stops are:

- 📄 **Comma-separated Values (CSV):** A file format where data items are separated by commas, offering us neat and organized columns and rows!
- 📄 **JavaScript Object Notation (JSON):** A file format where data is organized into **key-value pairs**, providing an intuitive *map* 🗺️ to find our data!

As we follow this journey through data, we will also dive deeper into **iterable objects**, meeting new iterables like dictionaries 📚 and tuples 📃, exploring nested dictionaries, and learning how to write for loops using the ```range``` function and how to read and write JSON files using the ```json``` module! 💡🧙‍♂️📂

## Motivation
Our main motivation is to learn how to 🛠️ **Mine the Black Gold of the 21st Century!** 🛠️ In our hyper-connected era, data has become a key 🌟 **digital asset** 🌟 – a resource as pivotal and transformative as the oil booms of the 20th century!

Endless streams of data flow through the Internet, encapsulating priceless information. From our simplest preferences 🍕🎮 to complex societal trends 📈🌎, data fuels the engines of the digital economy, which is now the backbone of our global economy! 💰💰💰

## Basic Concepts
### File Manipulation in Python
To work with files in Python, we need to open them first. We can do this using the ```open()``` function, which here takes two arguments: the file name and the mode. The mode can be ```r``` for reading, ```w``` for writing (which overwrites any existing content), or ```a``` for appending. If we don't specify the mode, the default is ```r```.

```python
file = open("file.txt", "r")
```

Once we are done working with the file, we need to close it using its ```close()``` method.

```python
file.close()
```

Normally, we use the ```with``` statement to open files. This way, we don't need to worry about closing them, as the ```with``` statement will automatically close the file once we are done working with it.

```python
with open("file.txt", "r") as file:
    # Do something with the file, for example read its whole content
    content = file.read()
```
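
To see the three modes in action, here is a minimal sketch. It assumes we are allowed to create a temporary folder, and the file name ```notes.txt``` is just a made-up example:

```python
import os
import tempfile

# Work in a temporary folder so that no real files are touched
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")  # hypothetical file name

# "w" creates the file (or overwrites it if it already exists)
with open(path, "w") as file:
    file.write("first line\n")

# "a" adds text at the end without deleting what is already there
with open(path, "a") as file:
    file.write("second line\n")

# "r" reads the file; read() returns the whole content as one string
with open(path, "r") as file:
    content = file.read()

print(content)
print(file.closed)  # The with statement has already closed the file for us
```

Note that opening the file again with ```w``` would erase both lines, so ```a``` is usually the safer choice when we want to keep old data.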

### Tabular Data 🔢
Tabular data is data that is organized in a table. Each row represents a single data item, and each column represents a feature of the data item. There are many file formats that can be used to store tabular data, such as CSV, TSV, Google Sheets, or Excel.

### Comma Separated Values (CSV) 📃
Comma Separated Values (CSV) is a file format that is used to store tabular data. Each row is represented by a line, and the columns within a row are separated by commas. The **first line** of the file usually contains the **column names**. For instance, let us consider that we want to store data about the students in a class. We can store the data in a CSV file as follows:

```csv
Name, Surname, Age, Grade
Peter, Parker, 15, 9
Mark, Grayson, 17, 11
Mary, Jane, 16, 10
Eve, Wilkins, 17, 11
Rex, Sloan, 18, 11
```
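
If you are curious about how such a table can be read back in Python, here is a minimal sketch. It uses the standard ```csv``` module, which is an extra that this notebook does not otherwise rely on, and it keeps the table in a string instead of a real file to stay self-contained:

```python
import csv
import io

# Part of the student table from above, stored in a string instead of a file
csv_text = """Name, Surname, Age, Grade
Peter, Parker, 15, 9
Mark, Grayson, 17, 11
"""

# skipinitialspace=True drops the space that follows each comma
reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
rows = list(reader)

header = rows[0]     # The first line holds the column names
students = rows[1:]  # The remaining lines hold the data

print(header)
for student in students:
    # Every value is read as text, so we convert the age to a number
    print(student[0], int(student[2]))
```

Each row comes back as a list of strings, which is why the age needs ```int()``` before we can do arithmetic with it.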

### Exercise 1: Sensor readings to CSV File
Let's start with an example that could be really handy for your IoT projects. In this example, we will use the ```random``` module to generate random readings from a biometric sensor and store them in a CSV file. The CSV file will have the following format:

```csv
Time, Heart Rate, Blood Pressure, Body Temperature
2023-10-12 00:00:00, 80, 120, 36.5
2023-10-12 00:00:01, 81, 121, 36.6
2023-10-12 00:00:02, 82, 122, 36.7
```

We will use the following functions:

- ```range(n)``` returns an iterable object that contains the numbers from 0 to ```n - 1```. [Check the documentation](https://docs.python.org/3/library/stdtypes.html#range) for more information.
- ```random.uniform(a, b)``` returns a random floating point number between ```a``` and ```b```. [Check the documentation](https://docs.python.org/3/library/random.html#random.uniform) for more information.
- ```time.strftime(format)``` returns a string representing the current time, formatted according to the given format. [Check the documentation](https://docs.python.org/3/library/time.html#time.strftime) for more information.
- ```time.sleep(seconds)``` suspends the execution of the current thread for the given number of seconds. [Check the documentation](https://docs.python.org/3/library/time.html#time.sleep) for more information.

We will also use formatted strings to write the sensor readings to the file. Check the [tutorial](../Introduction/tutorials/Variables.ipynb) on string variables for more information.
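
Before tackling the exercise, it may help to try these functions one by one. The following is only a small warm-up sketch: the temperature limits are illustrative values, not real sensor specifications.

```python
import random
import time

# range(3) produces the numbers 0, 1 and 2
numbers = list(range(3))

# A random floating point number between 36.0 and 37.5 (illustrative limits)
temperature = random.uniform(36.0, 37.5)

# The current time as text, in the same format as the CSV example above
timestamp = time.strftime('%Y-%m-%d %H:%M:%S')

# A formatted string: {temperature:.1f} keeps only one decimal
line = f'{timestamp}, {temperature:.1f}\n'

print(numbers)
print(line)
```

The ```\n``` at the end of the formatted string is what moves the next reading to a new line when we write it to a file.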


In [None]:
import random  # We will use the random module to fake sensor readings
import time    # We will use the time module to get the current time

patient_id = 'WA1001' # The ID of the patient

with open('sensor_readings_WA1001.csv', 'w') as file:
    # Write the header line
    file.write('Time, Heart Rate, Blood Pressure, Body Temperature\n')

    # Write the sensor readings every second for 10 seconds
    for i in range(10):
        # Generate random sensor readings
        heart_rate = random.uniform(60.0, 100.0)
        #TODO: Generate random blood pressure and body temperature readings

        # Get the current time
        current_time = time.strftime('%Y-%m-%d %H:%M:%S')

        # Write the sensor readings to the file
        #TODO: Write the random sensor readings to the file

        # Wait for 1 second
        time.sleep(1)

### JavaScript Object Notation (JSON)
CSV files are really handy to represent tabular data, but depending on your application, or your programming taste, you might prefer a different syntax that better represents the structure of the variables in your program. This is where JSON comes into play!

Let us dissect the name JSON:
- JS is for JavaScript, which is another very popular programming language.
- O is for Object. In JavaScript, an object is a collection of named values, much like the variables in our programs.
- N is for Notation, which is another word for syntax.

JSON is often used for *serializing* data. Serializing is the process of converting a variable into text so that it can be stored in a file.

Ok, so, putting the pieces together, JSON is a notation or syntax defined to store variables in files in an organized way. The JSON syntax is very simple:

- **Curly brackets**: Used to specify the beginning ```{``` and end ```}``` of an object.
- **Comma-separated list of key-value pairs**: Within the curly brackets, we need to specify the attributes or properties of the object. We will use what are called key-value pairs. The key is the identifier (normally the name) of each individual attribute of the object, and the value is the value it takes.
- **Colon-separated key-value pairs**: We use ```:``` to map each attribute to its corresponding value.

This is an example of a JSON object:

```json
{
    "name":"Wilson",
    "surname":"Fisk",
    "alias":"Kingpin",
    "age": 49
}
```

Note that this object has 4 attributes, *keyed* name, surname, alias, and age. Values can be numbers or text, among other types. In JSON, keys must always be text written in double quotes.

### Dictionaries
Dictionaries are a type of iterable object whose syntax closely resembles JSON notation:

```python
kingpin_dict = {
    "name":"Wilson",
    "surname":"Fisk",
    "alias":"Kingpin",
    "age": 49
}
```

You can access the values of the properties using the keys:




In [None]:
kingpin_dict = {
    "name":"Wilson",
    "surname":"Fisk",
    "alias":"Kingpin",
    "age": 49
}
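
As a minimal sketch of accessing values by key, the snippet below uses its own small dictionary named ```villain``` (a name chosen just for this illustration):

```python
villain = {
    "name": "Wilson",
    "surname": "Fisk",
    "alias": "Kingpin",
    "age": 49
}

# Square brackets with the key give us the value
alias = villain["alias"]
age = villain["age"]

# get() returns a default value instead of failing when the key is missing
city = villain.get("city", "unknown")

print(alias, age, city)
```

Using square brackets with a key that does not exist raises a ```KeyError```, so ```get()``` is handy when we are not sure a key is there.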


Dictionaries are mutable, meaning that we can add new key-value pairs to a dictionary:
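
Here is a short sketch of what mutability looks like in practice, again with an illustrative dictionary named ```villain```:

```python
villain = {"name": "Wilson", "surname": "Fisk"}

# Assigning to a new key adds a key-value pair
villain["alias"] = "Kingpin"
villain["age"] = 49

# Assigning to an existing key replaces its value
villain["age"] = 50

# pop() removes a key and gives back its value
removed_alias = villain.pop("alias")

print(villain)
print(removed_alias)
```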

We can iterate over dicts in a for loop:


In [None]:
# use items() to access key value pairs

# use keys() to access just keys

# Or values() to access just the values
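
One possible way to fill in the three cases above is sketched here, using an illustrative dictionary named ```villain```:

```python
villain = {"name": "Wilson", "surname": "Fisk", "alias": "Kingpin", "age": 49}

# items() gives us each key together with its value
for key, value in villain.items():
    print(key, "->", value)

# keys() gives us just the keys
for key in villain.keys():
    print(key)

# values() gives us just the values
for value in villain.values():
    print(value)

# We can also turn them into lists to look at them all at once
all_keys = list(villain.keys())
all_values = list(villain.values())
```

Looping over the dictionary directly (```for key in villain:```) behaves like looping over ```keys()```.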

### Tuples
Ok, we are going to introduce yet another type of iterable which is very handy. They are called tuples, and the main difference with respect to lists is that they cannot be modified, so they are useful when we are dealing with data that should not change in our programs. Tuples use parentheses ```(``` and ```)``` instead of square brackets, and we cannot ```pop()``` or ```append()``` members of a tuple. Once they are defined, they cannot change:


In [None]:
a_tuple = (1, 2, 3, 4) # Tuples are static, once created, they cannot be modified

a_list = [1, 2, 3, 4] # Lists can change

# Indexing works just the same in both types of iterables

# We can use the append() and pop() methods to modify lists (dicts also have pop(), but not append())

# We can also modify an individual member of a list

# But not tuples, tuples are immutable
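
A possible way to try out each of these comments is sketched below. The ```try``` and ```except``` part is only there so that the error does not stop the program:

```python
a_tuple = (1, 2, 3, 4)
a_list = [1, 2, 3, 4]

# Indexing works just the same in both types of iterables
first_of_tuple = a_tuple[0]
first_of_list = a_list[0]

# Lists can grow and shrink
a_list.append(5)     # a_list is now [1, 2, 3, 4, 5]
last = a_list.pop()  # Removes and returns the last member

# We can also replace an individual member of a list
a_list[0] = 100

# Trying the same with a tuple raises a TypeError
try:
    a_tuple[0] = 100
except TypeError as error:
    message = str(error)
    print("Cannot modify a tuple:", message)

print(a_list)
print(a_tuple)
```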

### Example 2: Patients form
Ok, let us build another simple program. We are going to build an interactive form, asking patients to fill in some basic information like name, surname, age, and gender. Our program will auto-generate a patient ID.


In [5]:
patient_id = 'WA0001'
patients_data = {"patient_id": patient_id}
patient_keys = ("name", "surname", "age", "gender")
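
One way the form could be completed is sketched below. The helper names ```fill_form``` and ```fake_input``` are made up for this illustration, and the answers are invented sample data so that the sketch runs without anyone typing. In a real form we would simply leave ```ask``` as the built-in ```input``` function.

```python
def fill_form(patient_id, keys, ask=input):
    """Ask one question per key and return the answers as a dictionary."""
    data = {"patient_id": patient_id}
    for key in keys:
        data[key] = ask(f"Please enter your {key}: ")
    return data


# Invented sample answers, given in the same order as the keys
answers = ["Peter", "Parker", "15", "male"]


def fake_input(question):
    """Stand-in for input(): returns the next sample answer."""
    return answers.pop(0)


patient_keys = ("name", "surname", "age", "gender")
patients_data = fill_form("WA0001", patient_keys, ask=fake_input)
print(patients_data)
```

Keep in mind that ```input()``` always returns text, so the age is stored as a string unless we convert it with ```int()```.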


### JSON module
The ```json``` module is quite handy because it allows us to store dictionaries in files so that we can use them in other programs:



In [6]:
import json
# Use dump() to write a dictionary to a file (dumps() returns a string instead)


In [None]:
import json
# Use load() to read a dictionary back from a file (loads() parses a string instead)
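
The following sketch shows both pairs of functions side by side. It assumes a made-up patient dictionary and writes to a temporary folder, and the file name ```patient_WA0001.json``` is only an example:

```python
import json
import os
import tempfile

# Made-up sample data for the illustration
patients_data = {"patient_id": "WA0001", "name": "Peter", "age": 15}

# dumps() and loads() work with strings (the final "s" stands for string)
json_text = json.dumps(patients_data)
restored_from_text = json.loads(json_text)

# dump() and load() work with files
path = os.path.join(tempfile.mkdtemp(), "patient_WA0001.json")
with open(path, "w") as file:
    json.dump(patients_data, file, indent=4)  # indent makes the file easier to read

with open(path, "r") as file:
    restored_from_file = json.load(file)

print(json_text)
print(restored_from_file)
```

In both cases we get back a regular Python dictionary with the same keys and values we started with.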


### Nested iterables
We can nest iterables inside other iterables and use indexing to access their members:


In [None]:
a_matrix = [[1, 2, 3, 4],
            [5, 6, 7, 8]]

patients_bio = [["2023-10-09 13:54:56", 90, 70, 36],
                ["2023-10-09 13:54:57", 91, 70, 36],
                ["2023-10-09 13:54:58", 92, 71, 36]]
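
A short sketch of how indexing works on nested iterables follows. The nested dictionary at the end, including its key names, is an invented example to show that the same idea applies to dictionaries:

```python
a_matrix = [[1, 2, 3, 4],
            [5, 6, 7, 8]]

# The first index picks the row, the second index picks the column
second_row = a_matrix[1]
value = a_matrix[1][2]

patients_bio = [["2023-10-09 13:54:56", 90, 70, 36],
                ["2023-10-09 13:54:57", 91, 70, 36],
                ["2023-10-09 13:54:58", 92, 71, 36]]

# Each member of the outer list is itself a list we can index
first_time = patients_bio[0][0]
for reading in patients_bio:
    print(reading[0], reading[1])

# Nested dictionaries work the same way, using keys instead of positions
patient = {
    "patient_id": "WA0001",
    "readings": {"heart_rate": 90, "temperature": 36}  # hypothetical keys
}
heart_rate = patient["readings"]["heart_rate"]
print(second_row, value, first_time, heart_rate)
```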



### Extra: IN Operator
So far, we have used the ```in``` operator in for loops, but we can also use it in ```if``` clauses.
What do you think it will do?
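
Once you have made your guess, you can check it with this small sketch, which uses invented sample values:

```python
grades = [9, 11, 10]
villain = {"name": "Wilson", "alias": "Kingpin"}

# With lists and tuples, in checks whether a value is one of the members
has_eleven = 11 in grades

# With dictionaries, in checks the keys, not the values
has_alias_key = "alias" in villain
has_kingpin_key = "Kingpin" in villain

# With strings, in checks whether a piece of text appears inside another
has_pin = "pin" in "Kingpin"

if "alias" in villain:
    print("This villain is also known as", villain["alias"])

print(has_eleven, has_alias_key, has_kingpin_key, has_pin)
```

This makes ```in``` a convenient way to check that a key exists before reading it from a dictionary.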
