## Introductory Programming with Python (Part 1)

### 1. Your Learning Journey 
 - Solving Real Problems 

### What You'll Gain

By the end of our workshop, you'll be able to:  
- Extract key insights.
- Make data visual.
- Solve problems.
- Build solutions.
- Develop skills that transfer to many different fields.
 


### 2. Python installation and usage.

- There are different ways to install Python on a laptop.
- The official Python installer.  

#### Python virtual environments

- Useful for working on two or more projects with different dependencies

#### Tools for creating virtual environments: virtualenv and conda.

- conda works reasonably well on personal computers, but not on HPC clusters


#### Setup

- Python installed locally or
- Access our training cluster via JupyterHub

Any setup questions before we proceed?

#### Jupyter Notebooks 
- Great for sharing
- Store executable code, textual documentation, and visualizations.

In [None]:
# Plot of a sinc function from -10 to 10

## 3. A brief introduction to Python
### Why is Python so popular?
- allows you to focus on solving big problems. 
- flexible, practical, and powerful.
- open source code for solving virtually any problem

### How does Python stand out from other programming languages?

#### Simple, and human-friendly syntax. 

Some languages have challenging syntax (e.g. Perl, C++, regular expressions).   
An example of challenging syntax - a regular expression designed to extract email addresses:   

`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b`




Compare it to Python code to calculate average speed -  you'll write:
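
A minimal sketch of what that Python code might look like. The distance and time values are made up for illustration.

```python
# Hypothetical trip: values chosen only for illustration
distance_km = 150.0
time_hours = 2.5

# Average speed is distance divided by time
average_speed = distance_km / time_hours
print("Average speed:", average_speed, "km/h")
```

The names read almost like the sentence describing the calculation, which is the point of the comparison.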


#### Python is Object-Oriented by Design: Blueprint for Clean Code

- Everything in Python is an object (even integers and strings)
- Example - every integer is an instance of the int class

In [None]:
# what is the type of int? 

In [None]:

#  what are the attributes of int? 

In [None]:
# what methods does int have? 
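
One possible way to explore the three questions above, as a sketch. The names `number` and `public_names` are illustrative and not part of the workshop material.

```python
number = 42

# Every integer is an instance of the int class
print(type(number))

# int itself is also an object; its type is the class called type
print(type(int))

# Public attributes and methods of int (names not starting with an underscore)
public_names = [name for name in dir(int) if not name.startswith('_')]
print(public_names)

# Calling one of those methods on an integer object
print(number.bit_length())   # number of bits needed to represent 42
```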

### Why should you care about OOP? 
#### Benefits of object-oriented programming 

- Code is organized into self-contained objects, 
- Data and methods are bundled together 
- You can create new classes by inheriting from existing ones.  
- Dynamic memory management.  
- Python works anywhere.


#### Python’s Trade-Offs

- The Global Interpreter Lock (GIL) restricts true parallel threading
- Pure Python performance is slow for heavy computation
- Startup Latency  
-  Limited native support for iOS/Android ecosystems

### Key points
####  Beginners Love Python because  
- Python has a gentle learning curve: You'll write useful programs on day one  
- Python code is easily readable: You can understand your own code weeks later  
- Python gives you instant feedback: You can see results immediately, without waiting for the code to be compiled before running it.
- Python's got one of the best support communities out there. Huge open-source ecosystem, docs for everything, and tons of users happy to help beginners 

#### Why Scientists and Engineers Love It  
- Python offers clean data workflows: Tools like Pandas and NumPy transform raw data into actionable insights
- Python visualizes clearly: You can build charts and graphs with Matplotlib/Seaborn in just a few lines of code
- Simplify repetitive work: Quickly create scripts to handle routine tasks while you focus on discovery
- Python is proven in practice: It is used by NASA, Google, and researchers worldwide.

#### How Does Python Compare to Other Languages?  
| Language | Best For | Learning Experience | Real-World Analogy |
|---|---|---|---|
| Python | Data analysis, beginners | Like learning with training wheels | Comfortable SUV - easy to drive anywhere |
| C++ | Game engines, performance | Like building a car engine | Race car - powerful but complex |
| Java | Large business systems | Like assembling furniture | Minivan - practical but lots of parts |
| R | Statistics, academic research | Like using a scientific calculator | Lab equipment - specialized |


## 4. Getting to Know the Dataset

To base our learning on a realistic context, we’ll use a synthetic dataset from a fictional clinical trial evaluating the therapeutic effect of a novel drug.

## 5. Basic concepts: variables, data types and built-in functions

### 5.1 Variables

 Python as a command-line calculator:

In [None]:
# quick math with numbers 

- Math with variables

In [None]:
# math with variables a, b, c


- a variable is simply a name that refers to a value.
- variables let us apply the same logic to different data.  

Another example - storing weight in a variable called weight_kg:

In [None]:
# storing weight

#### How to name your variables
- you can use letters, digits, and underscores
- the name can’t start with a digit
- names are case-sensitive

In [None]:

 # try different variable names

Examples:
- `weight0` is valid, but `0weight` isn’t
- `weight_kg` is different from `Weight_kg`
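
These rules can be checked with the string method `str.isidentifier()`, which reports whether a piece of text would be accepted as a name. A small sketch, with made-up weights:

```python
# Digits are allowed anywhere except the first position
print('weight0'.isidentifier())   # True
print('0weight'.isidentifier())   # False

# Names are case-sensitive, so these are two separate variables
weight_kg = 60.0
Weight_kg = 65.0
print(weight_kg, Weight_kg)
```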

### 5.2 Types of Data in Python

Some of the most common types:

- int **Integer** 
- float **Floating point number** 
- str **String**  
- complex **Complex number** 
- bool **Boolean**  

Python automatically determines a value’s data type.

#### Integers and Floating point numbers

- Use an integer to store a patient's weight:
  

In [None]:
# create an integer variable

- We can use a floating point number if we need more precision:

In [None]:
# create a floating point variable

- We can convert between types if needed.  

Define weight as int:

In [None]:
# Define weight as int

Convert it to float:

In [None]:
# Convert it to float
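
A short sketch of the conversion, assuming a weight of 60 kg. The name `weight_kg_float` is illustrative.

```python
weight_kg = 60                        # an int
weight_kg_float = float(weight_kg)    # the same value as a float
print(weight_kg_float, type(weight_kg_float))

# Converting back with int() drops the fractional part (it does not round)
print(int(60.9))
```

Note that `float()` and `int()` return new values; the original variable keeps its type unless you reassign it.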


#### Strings

- Strings are surrounded by single or double quotes:

In [None]:
# store patient id as text

#### When to choose one over the other:

- readability 
- avoiding escape characters

If your string contains a single quote, use double quotes to avoid escaping:


In [None]:
# examples of strings with single and double quotes

NOTE: Typing `sentence` on its own as the last line of a cell displays the representation of the string (with quotes and escape characters), while `print(sentence)` shows the actual string!
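
A small sketch of both points, using a made-up sentence. The names `sentence` and `escaped` are illustrative.

```python
# Double quotes outside, so the apostrophe needs no escaping
sentence = "The patient's weight is stable"

# Single quotes outside: the apostrophe must be escaped with a backslash
escaped = 'The patient\'s weight is stable'

print(sentence == escaped)   # both spellings create the same string
print(repr(sentence))        # the representation, including the surrounding quotes
print(sentence)              # the actual text
```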

### 5.3 Using Variables
- use variables to do calculations 

Convert weight to pounds:

In [None]:
# convert weight to pounds

Change a string:

In [None]:
# concatenate strings

Now the patient ID would look like this: inflam_001.
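
The two steps above could be sketched as follows. The starting values (60.0 kg and the ID `'001'`) and the approximate factor of 2.2 pounds per kilogram are assumptions made for illustration.

```python
weight_kg = 60.0
weight_lb = 2.2 * weight_kg          # approximate conversion factor
print('weight in pounds:', weight_lb)

patient_id = '001'
patient_id = 'inflam_' + patient_id  # + joins two strings
print(patient_id)
```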

### 5.4 Built-in Functions
- built-in functions for common tasks.
- print() for example, shows information:

In [None]:
# print()

- parentheses tell Python: "Execute this function now!"  
- Arguments go inside the parentheses   
- Multiple arguments are separated with commas

 Display both the patient ID and the weight in kilograms:

In [None]:
# using print() with multiple arguments

#### Key Rules:

- Always use ( ) after the function name
- Arguments are the data you want to process
- Commas separate multiple arguments
- No limit on arguments (if syntax is correct)

You can see the full list of built-in functions in Python’s official docs: https://docs.python.org/3/library/functions.html

### 5.5 Checking Data Types
- type() - check what kind of data we’re working with. 

In [None]:
# type()

### 5.6 Arithmetic with Variables
- You can put math expressions directly inside print()

In [None]:
# math inside print()

This doesn’t change the value of weight_kg:

In [None]:
# value of weight_kg stays the same

- variables are like labels attached to values.
- to change the value of a variable, we have to assign it a new value 

In [None]:
# reassign weight

#### Calculating new value from old:

In [None]:
# calculate new value from old

#### Shortcut reassignment operators

In [None]:

x = 10
x += 5      # x is now 15
x *= 2      # x is now 30
x //= 7     # x is now 4 (floor division)
x

Assign several variables at once:

In [None]:
# assign several variables in one statement

Python lets variables change types dynamically:

In [None]:
# change types at runtime

#### Exercise 1.  (5 min)

A. What values do the variables *mass* and *age* have after each of the following statements? Test your answer by executing the lines.
  - mass = 47.5
  - age = 122
  - mass = mass * 2.0
  - age -= 20 



B. What does the following code print out?

In [None]:
first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)

C. What are the data types of the following variables?   
- planet = 'Earth'
- apples = 5
- distance = 10.5

### 6. Introduction to Python Libraries.

- Libraries are collections of reusable code that solve common problems
- It’s best to load libraries only when you need them to keep your code clean and efficient.

We’ll use NumPy, which stands for Numerical Python. To tell Python that we’d like to start using NumPy, we need to import it:


In [None]:
# import a library

### 7. Loading Data

- Load data using a simple CSV reader loadtxt()
- For more complex or messy files use genfromtxt() - it is more flexible.

In [None]:
# simple .csv reader

This command only reads the file and prints its contents (we have not assigned the returned array to any variable). 

Let’s re-run np.loadtxt() and save the returned data:

In [None]:
 # save loaded data

Check that the data have been loaded:

In [None]:
# display data
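
The workshop data file is not reproduced here, so the sketch below loads a tiny made-up CSV from an in-memory text buffer instead of a file on disk; `np.loadtxt` accepts either. With a real file you would pass its path as the first argument.

```python
import io
import numpy as np

# Hypothetical CSV content: 3 patients (rows) x 4 days (columns)
csv_text = "0,1,3,1\n0,2,4,2\n0,1,2,0\n"

# delimiter=',' tells loadtxt that values are separated by commas
data = np.loadtxt(io.StringIO(csv_text), delimiter=',')
print(data)
```

Because the result is assigned to `data`, it stays available for later cells instead of only being displayed.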

### 7.1 Understanding the Data

Check what type of object our variable data refers to:

In [None]:
# type of the data object

What is the data type of the elements inside the array? 

In [None]:
# type of the data elements stored in the object


- Standard NumPy arrays are homogeneous containers.

In [None]:
# check the dimensions of the array

- 200 samples (rows) x 60 daily measurements (columns). 

- Array attributes like .shape or .dtype describe essential properties

### 7.2 Accessing Data in NumPy Arrays
#### Basic Indexing (0-Based): Accessing individual elements. 

In [None]:
# print first element

In [None]:
# print middle element

- In Python, arrays are indexed starting from 0.
- With "zero-based" indexing if an array has dimensions M×N, the valid indices range from 0 to M-1 for rows and 0 to N-1 for columns. 

#### Slicing: Accessing subarrays by specifying slices.
- Extract subarrays with `start:stop:step` syntax. 

In [None]:
# select the first ten days of values for the first four patients

We don’t have to start slices at 0:

In [None]:
# extract middle sections

Start and stop indices are optional. 

In [None]:
# default first and last elements

- default start index is 0
- default stop index is the end of the array, so the last element is included
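
A sketch of indexing and slicing on a small stand-in array. The 4 x 6 array of consecutive numbers is hypothetical and only makes the selected positions easy to see.

```python
import numpy as np

# Hypothetical 4 x 6 array holding the numbers 0..23
data = np.arange(24).reshape(4, 6)

print(data[0, 0])        # first element (row 0, column 0)
print(data[0:2, 0:3])    # first two rows, first three columns
print(data[1:3, 2:5])    # a middle section
print(data[:2, 3:])      # omitted start means 0, omitted stop means "to the end"
print(data[0, ::2])      # every second element of the first row
```

In every slice the stop index itself is excluded, so `0:3` selects positions 0, 1 and 2.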

____
#### **Note: You can also slice Strings**
We can take slices of character strings just as we do with NumPy arrays:

In [None]:
# slice a string 

#### Exercise 2. String Slicing (10 min)
Given the string word = 'algorithm'  

```
Indices:    [0] [1] [2] [3] [4] [5] [6] [7] [8] 
Characters:  a   l   g   o   r   i   t   h   m  
```

A. Slice the string:
  - What is word[:4] ?
  - What is word[5:] ?
  - What is word[3:6] ?   
  - What is word[2::3] ?

B. Negative Indexing
  - What is word[-1] ?
  - What is word[-3] ?
  - What is word[-4:-1] ?

C. Explain these operations:

1. word[2:-2]
   - What substring does it return?
   - How do positive and negative indices interact?
2. word[::-1]
   - What does this operation do?
   - What does -1 mean in the step position?

In [None]:
# Verification template
word = "algorithm"
print("A1:", word[:4])     # ?
print("A2:", word[5:])     # ?
print("A3:", word[3:6])    # ?
print("A4:", word[2::3])   # ?
print("B1:", word[-1])     # ?
print("B2:", word[-3])     # ?
print("B3:", word[-4:-1])  # ?
print("C1:", word[2:-2])   # ?
print("C2:", word[::-1])   # ?

###  8. Analyzing the data

- NumPy provides high performance functions optimized for operations on entire arrays.

###  8.1. Case study - assessing the therapeutic effect of the treatment.

- 60 patients.
- The trial lasted for 40 days.
- Each row in the data represents a different patient.
- Each column represents a day of the trial.
- The numbers in the data show how many times each patient had inflammation each day.

The PI says the drug takes a few weeks to work, so we want to check if the inflammation severity really goes down after patients take it.

To figure this out, we will:
1. Find the average inflammation severity index per day across all the patients. This will help us see if the drug is working.
2. Create a graph to show this information clearly, so we can easily share it with others.

We'll first apply our code to a test dataset where we know the expected outcome.

### 8.2. NumPy functions.  

#### Key Functions

| Category	| Functions |
| --------- | --------- |
| Math	| np.sqrt, np.exp, np.sin |
| Aggregation |	np.sum, np.mean, np.max |
| Manipulation |	reshape, flatten, transpose |
| Logic	| np.where, np.logical_and |
| Linear Algebra |	np.dot, np.linalg.inv |
| Sets |	np.unique, np.intersect1d |


Let's start using NumPy to analyse our clinical trial data. 

First we find the average inflammation severity index:

In [None]:
# use np.mean() 

- here, we are calling the mean() function from the NumPy module.

Also we can use mean() as a method:

In [None]:
# use data.mean()

- data.mean() is a method that belongs to the *data* object itself.
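
A sketch comparing the two spellings on a small made-up array. The names `mean_from_function` and `mean_from_method` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for the trial data
data = np.array([[0.0, 1.0, 3.0],
                 [2.0, 4.0, 2.0]])

mean_from_function = np.mean(data)   # function from the NumPy module
mean_from_method = data.mean()       # method of the array object

print(mean_from_function, mean_from_method)
```

Both calls compute the same number; which one to use is largely a matter of style.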

_______________________________________________________________________________________________________
#### **Note: Not all functions require input**

Some functions can return a result without any input at all:

In [None]:
# functions without arguments

For functions that don’t take in any arguments, we still need parentheses ( ).
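
One example, chosen here as an illustration, is `time.ctime()` from the standard library, which returns the current date and time as text when called with no arguments:

```python
import time

# No arguments, but the parentheses are still required to call the function
current_time = time.ctime()
print(current_time)
```

Writing `time.ctime` without parentheses would refer to the function object itself rather than call it.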

_______________________________________________________________________________________________________

Find the maximum value, the minimum value, and the standard deviation:

In [None]:
# amax(), amin(), std()

### 8.3 Extracting and Analyzing One Patient’s Data

Let's select all the inflammation data for the first patient and assign it to the variable patient_0:

In [None]:
# extract data for the first patient then compute mean

Instead of storing the row in a separate variable, we can perform operations directly on a slice of the data.

In [None]:
# compute mean directly on the 'view' of the data

 - fast and memory-efficient
 - no new data is created
 - more concise code

### 8.4 Summarizing the data (aggregation) 

- We often want to summarize the data by computing things like totals, means, minimums, or maximums. 
- In our case we may want to calculate these things for each patient or for each day.
- To do this, we perform calculations across rows or columns of our data.

The parameter 'axis' tells the function which direction to operate in:

- axis=1 → operate across columns (i.e. calculate a value for each row, such as per patient)
- axis=0 → operate down rows (i.e. calculate a value for each column, such as per day)
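
A small sketch of the two directions, using a made-up array with 3 patients (rows) and 4 days (columns) rather than the workshop data:

```python
import numpy as np

# Hypothetical data: 3 patients (rows) x 4 days (columns)
data = np.array([[0.0, 1.0, 3.0, 1.0],
                 [0.0, 2.0, 4.0, 2.0],
                 [0.0, 1.0, 2.0, 0.0]])

max_per_patient = np.max(data, axis=1)   # one value per row
mean_per_day = np.mean(data, axis=0)     # one value per column

print(max_per_patient)        # 3 values, one per patient
print(mean_per_day)           # 4 values, one per day
print(mean_per_day.shape)
```

A quick check is the shape of the result: the axis you pass is the one that disappears.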

#### Example 1: Maximum Inflammation Per Patient

To calculate the maximum inflammation for each patient (looking across all the days), we do this:

In [None]:
# collapse columns, aggregate max

#### Example 2: Average Inflammation Per Day

To calculate the average inflammation per day (looking at all patients), we do this:

In [None]:
# collapse rows, aggregate mean

This array contains the average inflammation per day for all patients. To confirm, we check the shape of this data:

In [None]:
# check the shape

#### Example 3: Average Inflammation Per Patient

To calculate the average inflammation per patient (looking across all days), we do this:

In [None]:
# collapse columns, aggregate mean

### 8.5. Change in Inflammation 
- To assess how inflammation changes from one day to the next, we can use the `np.diff()` function.

The `np.diff()` function takes an array and returns the differences between each pair of successive values.

#### Example 4: Change in Inflammation for One Patient

- patient 3 during the first week:

In [None]:
# extract data for patient 3, week 1

Now, let’s calculate the difference between each day’s inflammation:

In [None]:
# calculate diff 

The function calculates the following:
- 83 - 96
- 97 - 83
...
- 98 - 110
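
The same pattern on a short made-up array (these readings are hypothetical, not the patient's values): each output element is a value minus the one before it, so the result is one element shorter than the input.

```python
import numpy as np

# Hypothetical week of readings for one patient
week = np.array([0, 1, 3, 1, 2, 4, 7])

changes = np.diff(week)
print(changes)                    # [ 1  2 -2  1  2  3]
print(len(week), len(changes))    # 7 6
```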

___
### Check your Understanding
#### Applying np.diff() to the whole dataset

If you want to apply `np.diff()` to a multi-dimensional array (like our dataset), you can specify which direction (axis) to calculate the differences.

- axis=1 will compute the differences across days for each patient (horizontally).
- axis=0 will compute the differences across patients for each day (vertically).

*Try it out!*


In [None]:
# diff, data, axis

___  

## 9. Visualizing data
- The best way to get insight is often to visualize data. 

- We use a library called **matplotlib** for creating plots and graphs. 
- Specifically, we use the **pyplot** module to make various kinds of charts.

In [None]:
# import matplotlib

#### 9.1 Heatmaps
- Plot a heatmap:

In [None]:
# create a heat map using the data

- `imshow()` displays 2D array as a grid where each cell's color corresponds to the value in that cell.
- the color gradient helps us quickly see patterns in the data. 

In [None]:
# add colorbar and customize colormap

As we can see, the number of inflammation flare-ups decreases during the 60-day period, which aligns with the trial design.

#### 9.2 Plotting data averaged across patients
To understand the trend in inflammation over time, we can calculate the average inflammation per day across all patients and visualize it:

In [None]:
# Plot mean 

- calculated and plotted the average inflammation per day
- the result supports our earlier visual observations

Let's explore two additional statistics for a more complete picture.

In [None]:
# Plot max

In [None]:
# Plot min

- Both the maximum and minimum values follow a similar pattern to the average
- This is consistent with patients responding to the treatment, and suggests that our code is working correctly

### 10. Grouping plots
- goal - visualize multiple related aspects side by side for comparison.
- use Figures and Subplots.
- create a figure with three subplots: average, maximum, and minimum inflammation

In [None]:
# 1. Create a figure and specify the overall size

# 2. Create three subplots in a 1 row x 3 columns grid

# 3. Label the y-axis of each subplot

# 4. Plot average inflammation on the first subplot

# 5. Plot maximum inflammation on the second subplot

# 6. Plot minimum inflammation on the third subplot

# 7. Adjust the layout so plots don't overlap

# 8. Save the figure as a PNG file

# 9. Display the figure

- **fig = plt.figure(figsize=(10.0, 3.0))**: Creates a new figure with dimensions 10x3 inches.

- **axes1 = fig.add_subplot(1, 3, 1), axes2 = fig.add_subplot(1, 3, 2), axes3 = fig.add_subplot(1, 3, 3)**: Creates three subplots in a 1-row, 3-column grid.

- **axes1.plot(np.mean(data, axis=0))**: Plots the average inflammation per day on the first subplot.

- **axes2.plot(np.amax(data, axis=0))**: Plots the maximum inflammation per day on the second subplot.

- **axes3.plot(np.amin(data, axis=0))**: Plots the minimum inflammation per day on the third subplot.

- **axes1.set_ylabel('average'), axes2.set_ylabel('max'), axes3.set_ylabel('min')**: Labels the y-axes for each subplot.

- **fig.tight_layout()**: Adjusts the spacing between the plots.

- **plt.savefig('inflammation.png'), plt.show()**: Saves the plot as a PNG and displays it.
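
Put together, the calls listed above might look like the sketch below. It uses randomly generated stand-in data (5 patients x 8 days, an assumption made so the example is self-contained) rather than the workshop file.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the workshop data: 5 patients x 8 days
rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(5, 8)).astype(float)

# One figure, 10 x 3 inches, holding three side-by-side subplots
fig = plt.figure(figsize=(10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

# Per-day statistics: axis=0 collapses the patient (row) dimension
axes1.set_ylabel('average')
axes1.plot(np.mean(data, axis=0))

axes2.set_ylabel('max')
axes2.plot(np.amax(data, axis=0))

axes3.set_ylabel('min')
axes3.plot(np.amin(data, axis=0))

# Prevent labels from overlapping, then write the figure to disk
fig.tight_layout()
plt.savefig('inflammation.png')
```

In a notebook the figure is normally displayed when the cell finishes; in a script, call `plt.show()` after saving.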

## Exercise: Plotting Average Inflammation with Error Bars (10 min)
In this exercise, you’ll visualize the average inflammation across all patients for each day of the clinical trial. To better understand the variability in patient responses, you'll also include standard deviation as error bars.

### Instructions:

1. Compute the average inflammation per day across all patients using np.mean().
2. Compute the standard deviation per day using np.std().
3. Create a time axis using np.arange(), where each value represents a day.
4. Use plt.plot() to draw the line graph of average inflammation.
5. Use plt.errorbar() to add error bars representing standard deviation at each point.
6. Label the plot and axes appropriately.
7. Add a legend to distinguish the line and error bars.

### Hint:

- Make sure the x axis (days) matches the length of your average and standard deviation arrays.
- Use ecolor and capsize to style your error bars.

### Goal:

Your plot should show the average daily inflammation as a smooth curve, with error bars showing the variation across patients on each day.

Here is the code to start with:

In [None]:
average = np.mean(?)
std = np.std(?)
days = np.arange(?)
plt.plot(?) # Line plot
plt.errorbar(days, ?, ?, color='blue', ecolor='?', capsize=?, label='Average ± SD')
plt.title('Daily Average Inflammation with Standard Deviation')
plt.xlabel('Day')
plt.ylabel('Inflammation')
plt.legend()
plt.show()

## 11. Python Lists
- lists are used to store multiple values together in one place.
- to make a list, just put your values inside square brackets and separate them with commas:

In [None]:
# list of odds 

- access list elements using their index
- list elements are numbered starting from 0 for the first element 

In [None]:
# print the first and the last elements 

#### Negative Indexing (accessing from the end):  

index -1 -> last element.     
index -2 -> second last element. 

#### Lists are mutable

- we can change an element by assigning a new value to an index:

In [None]:
names = ['Curie', 'Darwing', 'Turing']  # typo in Darwin's name
# correct Darwin's name

- We can't change an individual character in a string:

In [None]:
name = 'Darwin'
# try changing one of the characters using a string index

### Mutable and Immutable Objects
- Mutable objects can be changed after they are created, while immutable objects cannot. 

- Immutable objects: int, float, str, tuple, bool
- Mutable objects: lists, arrays, dictionaries, classes 

You can change individual elements, append new elements, or reorder the list. For example:

In [None]:
mild_salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
# assign hot_salsa to mild_salsa 
# change hot_salsa
# check values

In the example above, both mild_salsa and hot_salsa point to the same list in memory, so when you change hot_salsa, it also affects mild_salsa.

If you want hot_salsa to be independent of mild_salsa, you need to make a copy of the list:

In [None]:
mild_salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
# create a copy of mild_salsa
# change the copy
# check values

In this case, modifying hot_salsa doesn’t affect mild_salsa because they now refer to different lists.
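
Both situations side by side, as a sketch. The replacement value `'hot peppers'` and the helper name `same_object` are illustrative.

```python
# Case 1: plain assignment gives the SAME list a second name
mild_salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
hot_salsa = mild_salsa
hot_salsa[0] = 'hot peppers'
same_object = hot_salsa is mild_salsa
print(same_object)     # True
print(mild_salsa)      # the change is visible through both names

# Case 2: list() builds an independent copy
mild_salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
hot_salsa = list(mild_salsa)
hot_salsa[0] = 'hot peppers'
print(mild_salsa)      # unchanged
print(hot_salsa)       # only the copy was modified
```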

While modifying data in place is efficient (since it avoids copying large structures), it can make your code harder to follow, especially when multiple variables refer to the same data.

- The list() function can convert other iterable objects into lists. For example, you can make a list from a string:

In [None]:
# use list() to construct a list from a string

### 12. Nested Lists

Since a list can contain any Python objects, it can contain other lists.

For example, you could represent the products on the shelves of a small grocery shop as a nested list called veg:

![Image of veggies](https://swcarpentry.github.io/python-novice-inflammation/fig/04_groceries_veg.png)
To store the contents of the shelf in a nested list, you write it this way:

In [None]:
veg = [
    ['lettuce', 'lettuce', 'peppers', 'zucchini'],
      ]
# add two lists to the parent list veg

Here are some visual examples of how indexing a list of lists veg works. First, you can reference each row on the shelf as a separate list. For example, veg[2] represents the bottom row, which is a list of the baskets in that row.

![Image of veggies indexes](https://swcarpentry.github.io/python-novice-inflammation/fig/04_groceries_veg0.png)
Index operations using the image would work like this:

In [None]:
# Show the first shelf

In [None]:
# Show the last shelf

To reference a specific basket on a specific shelf, you use two indexes. The first index represents the row (from top to bottom) and the second index represents the specific basket (from left to right).

![Image of veggies indexes2](https://swcarpentry.github.io/python-novice-inflammation/fig/04_groceries_veg00.png)

In [None]:
# Show the first basket

In [None]:
# Show another basket
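
A sketch of the indexing described above. Only the first shelf appears in the earlier cell, so the contents of the second and third shelves here are assumed for illustration.

```python
# Assumed shelf contents: rows two and three are illustrative
veg = [['lettuce', 'lettuce', 'peppers', 'zucchini'],
       ['lettuce', 'lettuce', 'peppers', 'zucchini'],
       ['lettuce', 'cilantro', 'peppers', 'zucchini']]

print(veg[0])      # first (top) shelf: a whole list
print(veg[2])      # last (bottom) shelf
print(veg[0][0])   # first basket on the first shelf
print(veg[2][1])   # second basket on the bottom shelf
```

The first index picks a shelf and returns a list; the second index then picks a basket from that list.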

### 13. Heterogeneous Lists
Lists in Python can contain elements of different types. For example:

In [None]:
# create a heterogeneous list sample_ages[]

There are many ways to change the contents of lists besides assigning new values to individual elements:

In [None]:
# Append an element to the list odds[]

In [None]:
# Remove the first element

In [None]:
# Reverse the list
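
The three operations could look like this, assuming `odds` starts as `[1, 3, 5, 7]`. The name `removed` is illustrative.

```python
odds = [1, 3, 5, 7]

odds.append(11)          # add an element at the end
print(odds)              # [1, 3, 5, 7, 11]

removed = odds.pop(0)    # remove and return the element at index 0
print(removed, odds)     # 1 [3, 5, 7, 11]

odds.reverse()           # reverse the list in place
print(odds)              # [11, 7, 5, 3]
```

All three methods modify the existing list rather than creating a new one.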

Be careful when you assign one list variable to another (e.g., new_list = old_list)
- you're not creating a new list 
- you're just pointing both variables to the same list. 

As a result, if you modify the list using one of the variables, the changes will be reflected in both, which can be unexpected if you're not familiar with how Python handles this.

In [None]:
odds = [3, 5, 7]
primes = odds
# change primes
# observe primes and odds

To avoid unintended modifications, you can create a copy of the list instead of assigning it directly. This way, changes made to the copied list won't affect the original.

In [None]:
odds = [3, 5, 7]
primes = list(odds)
# change primes
# observe primes and odds

Subsets of lists and strings can be accessed by specifying ranges of values in brackets, similar to how we accessed ranges of positions in a NumPy array. This is commonly referred to as “slicing” the list/string.

In [None]:
date = 'Monday 4 January 2016'
# select day using start:stop

If you want to take a slice from the beginning of a sequence, you can omit the first index in the range:

In [None]:
# select day without specifying the first index

And similarly, you can omit the ending index in the range to take a slice to the very end of the sequence:

In [None]:
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
sond = months[8:12]
# replace the fixed last index with an expression that will work for lists of any length
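
A few ways to avoid the hard-coded 12, as a sketch. The names `sond_open`, `sond_len` and `last_four` are illustrative.

```python
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

sond_open = months[8:]             # omit the stop index: slice to the end
sond_len = months[8:len(months)]   # compute the stop index from the list
last_four = months[-4:]            # count from the end instead

print(sond_open)
print(sond_len)
print(last_four)
```

The first two keep working if the list grows; the third always returns the last four items, wherever they start.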

## 14. Loops in Python
- Loops allow you to repeat a block of code multiple times without writing it out again.

Example - accessing all numbers in a list

In [None]:
# create a list odds with 4 elements

- We can access list elements by index:

In [None]:
# print each of the elements

This is a bad approach for three reasons:

1. Not scalable

2. Difficult to maintain

3. Fragile

In [None]:
# try printing element 4 of the list with only 3 elements

- `for` loop iterates over a sequence and executes a block of code for each item in that sequence. 
- best suited to cases where the number of iterations is known in advance: the loop runs once per item in the sequence.

In [None]:
odds = [1, 3, 5, 7]
# print elements using a for loop

The improved version uses a for loop to repeat an operation — in this case, printing — once for each thing in a sequence. The general form of a loop is:

In [None]:
for variable in collection:
    # do things using variable

Using the odds example above, the loop might look like this:  

![odd num](https://swcarpentry.github.io/python-novice-inflammation/fig/05-loops_image_num.png)

- Loop variable `num` takes the value of each element in the sequence.
- You can name it anything
- The collection you're looping through (`odds` in this case) is called an iterable
- There must be a colon at the end of the `for` statement. It signals the start of the loop body
- Everything indented after : runs in each iteration
- We must indent anything we want to run inside the loop. Python uses indentation instead of braces or a command to signify the end of the loop body (e.g. end for). Everything indented after the for statement belongs to the loop body.
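
Written out as code, the loop pictured above would be something like this sketch:

```python
odds = [1, 3, 5, 7]

# The colon ends the for statement; the indented line is the loop body
for num in odds:
    print(num)

# This line is not indented, so it runs once, after the loop has finished
print('done, last value was', num)
```

If another element is added to `odds`, the same loop prints it too, with no other change to the code.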

Here’s another loop that repeatedly updates a variable:

In [None]:
length = 0
names = ['Curie', 'Darwin', 'Turing']
for value in names:
   # increment length
# print length

It’s worth tracing the execution of this little program step by step.
1. Since there are three names in names, the statement on incrementing length will be executed three times. 
2. The first time around, length is zero and value of the loop variable is Curie. The statement on line 4 adds 1 to the old value of length and updates length to refer to that new value. 
3. The next time around, value is Darwin and length is 1, so length is updated to be 2.
4. After one more update, length is 3.
5. Since there is nothing left in names to process, the loop finishes and the print function on line 5 tells us our final answer.

This pattern demonstrates a fundamental programming concept: using loops to aggregate information about a collection. The counter variable (length) accumulates state across iterations, giving us meaningful information about the entire collection.
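
For reference, one way the completed program could look, laid out so that lines 4 and 5 match the trace above. The wording of the printed message is an assumption.

```python
length = 0                                        # counter starts at zero
names = ['Curie', 'Darwin', 'Turing']
for value in names:                               # runs once per name
    length = length + 1                           # line 4: add 1 to the counter
print('There are', length, 'names in the list.')  # line 5: runs after the loop
```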

Note also that finding the length of an object is such a common operation that Python actually has a built-in function to do it called len:

In [None]:
# find length using len()


- len() is much faster than any function we could write ourselves 
- it will also give us the length of many other things 

### Generate a range of numbers from 1 to N
Python has a built-in function called range that generates a sequence of numbers. Range can accept 1, 2, or 3 parameters: [start, stop, step]

- If one parameter is given, range generates a sequence of that length, starting at zero and incrementing by 1. For example, range(3) produces the numbers 0, 1, 2.
- If two parameters are given, range starts at the first and ends just before the second, incrementing by one. For example, range(2, 5) produces 2, 3, 4.
- If range is given 3 parameters, it starts at the first one, ends just before the second one, and increments by the third one. For example, range(3, 10, 2) produces 3, 5, 7, 9.
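
The three forms can be checked directly. `range` produces its numbers lazily, so the sketch wraps each call in `list()` to display them.

```python
print(list(range(3)))           # [0, 1, 2]
print(list(range(2, 5)))        # [2, 3, 4]
print(list(range(3, 10, 2)))    # [3, 5, 7, 9]

# range is most often used directly in a for loop
for i in range(1, 5):
    print(i)                    # prints 1, 2, 3, 4 on separate lines
```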

In [None]:
# use range to print 4 numbers

#### Check your Understanding
Given the following loop:

In [None]:
word = 'oxygen'
for letter in word:
    print(letter)

How many times is the body of the loop executed?
- 3 times
- 4 times
- 5 times
- 6 times



## 15. Analyzing Data from Multiple Files
So far, we have evaluated our data analysis program using a single test file of clinical trial data. Now we are ready to analyze the whole set of 12 clinical trials provided by the PI.

As a final piece to processing our inflammation data, we need a way to get a list of all the files in our data directory whose names start with inflammation- and end with .csv. The following library will help us to achieve this:

In [None]:
import glob

The glob library contains a function, also called glob, that finds files and directories whose names match a pattern. We provide those patterns as strings: the character * matches zero or more characters, while ? matches any one character. We can use this to get the names of all the CSV files in the current directory:

In [None]:
# use glob.glob() to get all inflammation files in the directory data

As these examples show, glob.glob’s result is a list of file and directory paths in arbitrary order. This means we can loop over it to do something with each filename in turn. In our case, the “something” we want to do is generate a set of plots for each file in our inflammation dataset.

If we want to start by analyzing just the first three files in alphabetical order, we can use the sorted() built-in function to generate a new sorted list from the glob.glob output:

In [None]:
# get the list of filenames 
# loop over all the inflammation files  
   # make a figure showing mean, max and min for each of the files (reuse our plotting code)
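
The file-selection part can be tried without the workshop data. The sketch below creates a throwaway directory containing empty files with hypothetical names, then finds and sorts them; the plotting step is left out.

```python
import glob
import os
import tempfile

# Create a temporary directory with hypothetical, empty data files
data_dir = tempfile.mkdtemp()
for name in ['inflammation-03.csv', 'inflammation-01.csv',
             'inflammation-02.csv', 'inflammation-04.csv', 'notes.txt']:
    open(os.path.join(data_dir, name), 'w').close()

# * matches any run of characters, so notes.txt is not selected
pattern = os.path.join(data_dir, 'inflammation*.csv')
filenames = sorted(glob.glob(pattern))   # sorted() returns a new, ordered list

# Keep only the first three files and loop over them
first_three = filenames[:3]
for filename in first_three:
    print(os.path.basename(filename))
```

Inside the loop, each `filename` could be passed to `np.loadtxt()` and then to the plotting code from the previous section.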

The plots have some suspicious features not normally found in valid trials:  the maxima plots show unnatural noiseless linear rise and fall; and their minima plots show stepwise features.

The third dataset shows much noisier average and maxima plots that are far less suspicious than those of the first two datasets; however, the minima plot shows that the third dataset's minimum is consistently zero across every day of the trial.

If we produce a heat map for the third data file we see the following:

In [None]:
# plot inflammation-03.csv

We can see that there are zero values sporadically distributed across all patients and days of the clinical trial, suggesting that there were potential issues with data collection throughout the trial. In addition, we can see that the last patient in the study didn’t have any inflammation flare-ups at all throughout the trial, suggesting that they may not even suffer from arthritis!

After spending some time investigating the heat map and statistical plots we gain some insight into the twelve clinical trial datasets.

The datasets appear to fall into two categories:

- ***seemingly “ideal” datasets*** that agree excellently with Dr. Maverick’s claims, but display suspicious maxima and minima (such as inflammation-01.csv and inflammation-02.csv)
- ***“noisy” datasets*** that somewhat agree with Dr. Maverick’s claims, but show concerning data collection issues such as sporadic missing values and even an unsuitable candidate making it into the clinical trial.

After reviewing these findings, we can conclude that the clinical data has been fabricated by Dr. Maverick based on the identified inconsistencies in the datasets.

Oh well! Let's continue using the data to learn how to program in Python! 