# Working Notebook

Welcome to the _Programming with Python_ course! We will be using this notebook to go through the lecture materials, as well as to work _together_ on practical examples and exercises.

## First thing: let's familiarise ourselves with the environment

Let's talk about **Jupyter Notebooks** for a second.

In [None]:
# code cell

Text Cell

---

$\rightarrow$ _Adapted from_ : [**Software Carpentries: Programming with Python**]()

## Arthritis Inflammation
We are studying **inflammation in patients** who have been given a new treatment for arthritis.

There are `60` patients, who had their inflammation levels recorded for `40` days.
We want to analyze these recordings to study the effect of the new arthritis treatment.

To see how the treatment is affecting the patients in general, we would like to:

1. Process the file to extract data for each patient;
2. Calculate some statistics on each patient;
    - e.g. average inflammation over the `40` days (or `min`, `max`, and so on)
    - e.g. average statistics per week (we will assume `40` days account for `5` weeks)
    - `...` (open to ideas)
3. Calculate some statistics on the dataset.
    - e.g. min and max inflammation registered overall in the clinical study;
    - e.g. the average inflammation per day across all patients.
    - `...` (open to ideas)


![3-step flowchart shows inflammation data records for patients moving to the Analysis step
where a heat map of provided data is generated moving to the Conclusion step that asks the
question, How does the medication affect patients?](
https://raw.githubusercontent.com/swcarpentry/python-novice-inflammation/gh-pages/fig/lesson-overview.svg "Lesson Overview")


### Data Format

The data sets are stored in comma-separated values (CSV) format:

- each row holds information for a single patient,
- columns represent successive days.

The first three rows of our first file look like this:
~~~
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
~~~

Each number represents the number of inflammation bouts that a particular patient experienced on a
given day.

For example, value "6" at row 3 column 7 of the data set above means that the third
patient was experiencing inflammation six times on the seventh day of the clinical study.
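
Note that the row and column numbers above count from 1, while Python counts from 0. Here is a minimal sketch of the same lookup, assuming the three rows shown above have been typed in by hand as a list of lists (the name `first_rows` is only illustrative):

```python
# The first three rows of the file, copied from the excerpt above.
first_rows = [
    [0, 0, 1, 3, 1, 2, 4, 7, 8, 3, 3, 3, 10, 5, 7, 4, 7, 7, 12, 18,
     6, 13, 11, 11, 7, 7, 4, 6, 8, 8, 4, 4, 5, 7, 3, 4, 2, 3, 0, 0],
    [0, 1, 2, 1, 2, 1, 3, 2, 2, 6, 10, 11, 5, 9, 4, 4, 7, 16, 8, 6,
     18, 4, 12, 5, 12, 7, 11, 5, 11, 3, 3, 5, 4, 4, 5, 5, 1, 1, 0, 1],
    [0, 1, 1, 3, 3, 2, 6, 2, 5, 9, 5, 7, 4, 5, 4, 15, 5, 11, 9, 10,
     19, 14, 12, 17, 7, 12, 11, 7, 4, 2, 10, 5, 4, 2, 2, 3, 2, 2, 1, 1],
]

# "Row 3, column 7" for a human is index 2, index 6 for Python.
third_patient_day_seven = first_rows[2][6]
print(third_patient_day_seven)  # 6
```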

Our **task** is to gather as much information as possible from the dataset, and to report back to colleagues to foster future discussions.

### Let's make a plan

- Problem description (step by step) in NATURAL LANGUAGE (**strict rule**) - imagine you're explaining this to someone who doesn't know **anything** about programming.
- What do we need to start?
- Where do we start?

I'll go first - let's create a dummy file to practise with, named `dummy.csv`, containing three rows of 7 values each.

1. read the file
    - read the file one line at a time
2. store the data from the file into a data structure or format
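
The cells that follow expect a file called `dummy.csv` in the working directory. If you don't have one yet, this is one possible way to create it from Python, using the three rows that appear in the outputs of this notebook (the names `dummy_rows` and `dummy_file` are illustrative only):

```python
# Three patients, seven days each: same values used throughout this notebook.
dummy_rows = [
    "0,0,1,3,1,2,4",
    "0,1,2,1,2,1,3",
    "0,1,1,3,3,2,6",
]

# Opening in "w" mode creates the file, or overwrites it if it already exists.
with open("dummy.csv", "w") as dummy_file:
    # Join the rows with newlines; the last row has no trailing newline.
    dummy_file.write("\n".join(dummy_rows))
```

You could of course create the same file with any text editor instead.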

In [1]:
dummy_datafile = open("dummy.csv")

In [2]:
patients = [] # list()

for line in dummy_datafile:
    patients.append(line)

In [3]:
print(patients)

['0,0,1,3,1,2,4\n', '0,1,2,1,2,1,3\n', '0,1,1,3,3,2,6']


### Small Diversion about **Python Typing Mechanism**

1. Python typing is _dynamic_ (as opposed to _static_): each variable takes its type from the value assigned to it, so there is no need to declare a type for a variable.

2. Python typing is _strong_ (as opposed to _weak_): values are never silently converted to an unrelated type, so a variable keeps its type unless it is re-assigned or explicitly cast to another compatible type!

In [4]:
name = "valerio"

In [5]:
type(name)

str

In [6]:
name / 2

TypeError: unsupported operand type(s) for /: 'str' and 'int'

In [7]:
name * "maggio"

TypeError: can't multiply sequence by non-int of type 'str'

In [8]:
name = 2
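
To see both properties side by side, here is a small sketch (not part of the original lesson material) that re-assigns a variable and then asks for an explicit conversion. The variable names `type_before`, `type_after` and `mixing_failed` are only there for illustration:

```python
name = "valerio"
type_before = type(name)   # str: the type comes from the assigned value

name = 2                   # dynamic typing: the same name now refers to an int
type_after = type(name)

# Strong typing: Python refuses to guess how to mix a str and an int.
try:
    "2" + 2
    mixing_failed = False
except TypeError:
    mixing_failed = True

# An explicit cast makes our intention clear, and then it works.
explicit_sum = int("2") + 2

print(type_before, type_after, mixing_failed, explicit_sum)
```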

#### Going back to our Data case

In [9]:
patients

['0,0,1,3,1,2,4\n', '0,1,2,1,2,1,3\n', '0,1,1,3,3,2,6']

In [10]:
for patient in patients:
    print("Patient info: " + patient)

Patient info: 0,0,1,3,1,2,4

Patient info: 0,1,2,1,2,1,3

Patient info: 0,1,1,3,3,2,6
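
The blank lines in the output above come from the newline character (`\n`) that each line read from the file still carries: `print` then adds a second one. A small sketch of how to drop it, reusing the raw lines printed earlier (the names `raw_lines` and `cleaned_lines` are just for illustration):

```python
# The lines exactly as they were read from the file.
raw_lines = ['0,0,1,3,1,2,4\n', '0,1,2,1,2,1,3\n', '0,1,1,3,3,2,6']

# rstrip("\n") removes trailing newline characters, if there are any.
cleaned_lines = [raw_line.rstrip("\n") for raw_line in raw_lines]

for cleaned_line in cleaned_lines:
    print("Patient info: " + cleaned_line)
```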


### Storing the data from file in a better format:

In [11]:
dummy_datafile.close()  # close the file handle opened earlier: it has already been read to the end, so there is nothing left to read from it

In [12]:
patients = []

with open("dummy.csv") as dummy_datafile:
    for line in dummy_datafile:
        line = line.strip()
        if (len(line) == 0):
            continue
        inflammation_data = line.split(",")
        patients.append(inflammation_data)
        

In [13]:
patients

[['0', '0', '1', '3', '1', '2', '4'],
 ['0', '1', '2', '1', '2', '1', '3'],
 ['0', '1', '1', '3', '3', '2', '6']]

Play with what we have so far: iteration

In [14]:
for patient in patients:
    print(type(patient))

<class 'list'>
<class 'list'>
<class 'list'>


In [15]:
patients[0][0]

'0'

In [20]:
patients[2][:3]

['0', '1', '1']

(_fancy word_) **Slicing**

![slicing example](https://swcarpentry.github.io/python-novice-inflammation/fig/python-zero-index.svg)

Source: [Software Carpentries](https://swcarpentry.github.io/python-novice-inflammation/02-numpy/index.html)
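
A few more slices to try, as a quick sketch on a single patient's values converted to integers (the variable names are illustrative). Remember that the start index is included and the stop index is excluded:

```python
patient = [0, 1, 1, 3, 3, 2, 6]

first_three = patient[:3]    # from the start up to (not including) index 3
middle_days = patient[2:5]   # indexes 2, 3 and 4
last_two = patient[-2:]      # negative indexes count from the end
every_other = patient[::2]   # a step of 2 skips every second value

print(first_three)   # [0, 1, 1]
print(middle_days)   # [1, 3, 3]
print(last_two)      # [2, 6]
print(every_other)   # [0, 1, 3, 6]
```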

Now let's move to the _real_ data file: **how can we re-use the same algorithm?**

In [21]:
def process_inflammation_data(datafile_path: str) -> list:
    
    patients = []
    with open(datafile_path) as datafile:
        for line in datafile:
            line = line.strip()
            if not line:  # skip empty lines
                continue
            patient_data = []
            for value in line.split(","):
                patient_data.append(int(value))
            patients.append(patient_data)
            
    return patients

In [22]:
process_inflammation_data("dummy.csv")

[[0, 0, 1, 3, 1, 2, 4], [0, 1, 2, 1, 2, 1, 3], [0, 1, 1, 3, 3, 2, 6]]
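
As a side note, Python's standard library also ships a `csv` module that takes care of the splitting for us. The sketch below is an alternative to the function above, not a replacement for it; the function name `read_inflammation_csv` and the temporary sample file are assumptions made only so that the example can run on its own:

```python
import csv
import os
import tempfile


def read_inflammation_csv(datafile_path: str) -> list:
    """Read a CSV file of integers into a list of lists, one per patient."""
    patients = []
    # newline="" is what the csv documentation recommends when opening files.
    with open(datafile_path, newline="") as datafile:
        for row in csv.reader(datafile):
            if not row:  # csv.reader yields an empty list for a blank line
                continue
            patients.append([int(value) for value in row])
    return patients


# Write a small sample file (with one blank line) in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.csv")
    with open(sample_path, "w") as sample_file:
        sample_file.write("0,0,1,3,1,2,4\n0,1,2,1,2,1,3\n\n0,1,1,3,3,2,6")
    sample_patients = read_inflammation_csv(sample_path)

print(sample_patients)
```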

_Now we have 60 patients to deal with_ - how can we do that?

What if we also add in a reference ID for each patient? (see `data/inflammation02.csv`)
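
One idea to get the discussion going: a dictionary that maps each patient ID to that patient's values. The sketch below **assumes** that the ID is stored in the first column of each row; the actual layout of `data/inflammation02.csv` may differ, and the IDs `P01`, `P02`, `P03` are made up for the example:

```python
# Hypothetical rows: an invented ID first, then the inflammation values.
sample_lines = [
    "P01,0,0,1,3,1,2,4",
    "P02,0,1,2,1,2,1,3",
    "P03,0,1,1,3,3,2,6",
]

patients_by_id = {}
for line in sample_lines:
    # The first item is the ID; the starred name collects everything else.
    patient_id, *values = line.split(",")
    patients_by_id[patient_id] = [int(value) for value in values]

print(patients_by_id["P03"])  # [0, 1, 1, 3, 3, 2, 6]
```

With a dictionary we can look a patient up by ID directly, instead of having to remember their position in a list.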

Let's practice with our new data structure

Let's get on with the _real deal_ : let's gather some statistics!
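
As a starting point, here is a rough sketch of a few of the statistics from our plan, computed on the dummy data with plain Python only. The helper `mean` and all variable names are illustrative choices, not a fixed design:

```python
patients = [
    [0, 0, 1, 3, 1, 2, 4],
    [0, 1, 2, 1, 2, 1, 3],
    [0, 1, 1, 3, 3, 2, 6],
]


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Statistics on each patient (one result per row).
patient_means = [mean(patient) for patient in patients]
patient_maxima = [max(patient) for patient in patients]

# Statistics per day: zip(*patients) regroups the rows into columns.
daily_means = [mean(day) for day in zip(*patients)]

# Statistics on the whole dataset.
all_values = [value for patient in patients for value in patient]
overall_min = min(all_values)
overall_max = max(all_values)

print(patient_maxima)            # [4, 3, 6]
print(overall_min, overall_max)  # 0 6
```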

--- 

Well done for reaching this point! 🎉

**GREAT TIME FOR A BREAK NOW!** ☕️🧁🍪

---


Dealing with more _realistic cases_ ❌

Putting our helmets on (_with some testing_) ⛑
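
A first taste of what testing can look like, using nothing but `assert`. This is only a sketch: `parse_patient_line` is a hypothetical helper introduced for the example, and the checks are far from exhaustive:

```python
def parse_patient_line(line):
    """Turn one CSV line into a list of integers (hypothetical helper)."""
    return [int(value) for value in line.strip().split(",")]


def test_parse_patient_line():
    # The happy path: a well-formed line, with or without extra whitespace.
    assert parse_patient_line("0,1,2\n") == [0, 1, 2]
    assert parse_patient_line(" 3,4 ") == [3, 4]

    # A more realistic case: non-numeric data should be rejected loudly.
    try:
        parse_patient_line("1,abc,3")
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for non-numeric data")


test_parse_patient_line()
print("all checks passed")
```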

Now it's time to rethink our Data (Abstractions): let's define our own **new type**!
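
One possible shape for such a type, sketched with a `dataclass` from the standard library. The class name `Patient`, its fields and its methods are all assumptions made for the sake of discussion, and the ID `P03` is invented:

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """A patient with an ID and one inflammation value per day."""

    patient_id: str
    inflammation: list = field(default_factory=list)

    def average(self) -> float:
        """Average inflammation over all the recorded days."""
        return sum(self.inflammation) / len(self.inflammation)

    def peak_day(self) -> int:
        """Zero-based index of the first day with the highest value."""
        return self.inflammation.index(max(self.inflammation))


third = Patient("P03", [0, 1, 1, 3, 3, 2, 6])
print(third.patient_id, third.peak_day())  # P03 6
```

Grouping the data and the operations on it in one place means the rest of our code can talk about patients rather than about lists of lists.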