# Table of Contents
* [Setup](#setup) 5 min
* [Fundamentals](#fundamentals) 10 min, [problem solving](#problem1)
* [Analyzing Patient Data](#data) 25 min, [problem solving](#problem2) + [practice](#practice1)

### Setup <a class="anchor" id="setup"></a>
https://swcarpentry.github.io/python-novice-inflammation/setup.html

### Fundamentals <a class="anchor" id="fundamentals"></a>
https://swcarpentry.github.io/python-novice-inflammation/01-intro/index.html
* Basic data types (3): Integers, strings, floating-point numbers
* Use variable = value to assign a value to a variable 
* Variables are created on demand whenever a value is assigned 
* Use print(something) to display the value of something
* Built-in functions are always available to use

In [1]:
# Assign a value to a variable
## A variable is analogous to a sticky note with a name written on it
## Assigning a value to a variable is like putting that sticky note on a particular value
## Assigning a new value to one variable doesn't change the values of other seemingly related variables

weight_kg = 60
weight_kg

60

In [2]:
# Types of Data

## floating point numbers
print(type(60.3))

## integer numbers
print(weight_kg, type(weight_kg))

## strings
patient_id = '001'
print(patient_id, type(patient_id))

<class 'float'>
60 <class 'int'>
001 <class 'str'>


In [3]:
# Calculations
weight_lb = 2.2 * weight_kg
print("Weight in lbs:", weight_lb)

patient_id = 'inflam_' + patient_id
print(patient_id)

Weight in lbs: 132.0
inflam_001


#### Problem solving  <a class="anchor" id="problem1"></a>

In [5]:
## What values do the variables mass and age have after each of the following statements?
mass = 47.5
age = 122

mass = mass * 2.0
print(mass)

age = age - 20
print(age)

95.0
102


In [6]:
## Python allows you to assign multiple values to multiple variables in one line by separating the variables and values with commas
## What does the following program print out?

first, second = 'Grace', 'Hopper'
third, fourth = second, first
print(third, fourth)

Hopper Grace


In [7]:
## What are the data types of the following variables?

planet = 'Earth'
print(type(planet))

apples = 5
print(type(apples))

distance = 10.5
print(type(distance))

<class 'str'>
<class 'int'>
<class 'float'>


### Analyzing Patient Data <a class="anchor" id="data"></a>
https://swcarpentry.github.io/python-novice-inflammation/02-numpy/index.html

In [10]:
import numpy # Numerical Python (library) https://numpy.org/doc/stable/

In [3]:
# Ask library to read file (array)
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

array([[0., 0., 1., ..., 3., 0., 0.],
       [0., 1., 2., ..., 1., 0., 1.],
       [0., 1., 1., ..., 2., 1., 1.],
       ...,
       [0., 1., 1., ..., 1., 1., 1.],
       [0., 0., 0., ..., 0., 2., 0.],
       [0., 0., 1., ..., 1., 1., 0.]])

In [4]:
# Assign data output to 'data' variable
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

In [6]:
print(data)

[[0. 0. 1. ... 3. 0. 0.]
 [0. 1. 2. ... 1. 0. 1.]
 [0. 1. 1. ... 2. 1. 1.]
 ...
 [0. 1. 1. ... 1. 1. 1.]
 [0. 0. 0. ... 0. 2. 0.]
 [0. 0. 1. ... 1. 1. 0.]]


In [7]:
print(type(data))

<class 'numpy.ndarray'>


In [8]:
# array contains one or more elements of the same type
## tells us that the NumPy array’s elements are floating-point numbers

print(data.dtype)

float64


In [9]:
# tells us that the data array variable contains 60 rows and 40 columns

print(data.shape)

(60, 40)


In [10]:
print('first value in data:', data[0, 0])

first value in data: 0.0


In [11]:
print('middle value in data:', data[30, 20])

# Programming languages like Fortran, MATLAB and R start counting at 1 because that’s how people conventionally count
# Languages in the C family (including C++, Java, Perl, and Python) count from 0 because the index is an offset from the first value in the array

middle value in data: 13.0


In [12]:
print(data[0:4, 0:10])

[[0. 0. 1. 3. 1. 2. 4. 7. 8. 3.]
 [0. 1. 2. 1. 2. 1. 3. 2. 2. 6.]
 [0. 1. 1. 3. 3. 2. 6. 2. 5. 9.]
 [0. 0. 2. 0. 4. 2. 2. 1. 6. 7.]]


In [13]:
print(data[5:10, 0:10])

[[0. 0. 1. 2. 2. 4. 2. 1. 6. 4.]
 [0. 0. 2. 2. 4. 2. 2. 5. 5. 8.]
 [0. 0. 1. 2. 3. 1. 2. 3. 5. 3.]
 [0. 0. 0. 3. 1. 5. 6. 5. 5. 8.]
 [0. 1. 1. 2. 1. 3. 5. 3. 5. 8.]]


In [14]:
small = data[:3, 36:]
print('small is:')
print(small)

small is:
[[2. 3. 0. 0.]
 [1. 1. 0. 1.]
 [2. 2. 1. 1.]]
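
The slices above all follow the same rule: the start index is included and the end index is excluded, and a missing bound means "from the start" or "to the end". A minimal sketch of that rule, using a small hypothetical array named `demo` (not part of the inflammation data) so that it runs without the CSV file:

```python
import numpy

# Hypothetical 4x6 stand-in for the inflammation array (values 0..23)
demo = numpy.arange(24).reshape(4, 6)

# Rows 0-1 and columns 1-3: the end index of each slice is excluded
block = demo[0:2, 1:4]
print(block.shape)  # (2, 3)

# Omitting a bound means "from the start" (rows) or "to the end" (columns)
tail = demo[:3, 4:]
print(tail)
```

The length of a slice is therefore `end - start`, which is why `data[0:4, 0:10]` yields 4 rows and 10 columns.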


In [15]:
print(numpy.mean(data)) # mean value

6.14875


In [16]:
import time
print(time.ctime())

Mon Jun 13 22:19:58 2022


In [17]:
maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)

print('maximum inflammation:', maxval)
print('minimum inflammation:', minval)
print('standard deviation:', stdval)

maximum inflammation: 20.0
minimum inflammation: 0.0
standard deviation: 4.613833197118566


In [18]:
patient_0 = data[0, :] # 0 on the first axis (rows), everything on the second (columns)
print('maximum inflammation for patient 0:', numpy.max(patient_0))

maximum inflammation for patient 0: 18.0


In [19]:
print('maximum inflammation for patient 2:', numpy.max(data[2, :]))

maximum inflammation for patient 2: 19.0


In [20]:
# ask for the average across axis 0 (rows in our 2D example)
print(numpy.mean(data, axis=0))

[ 0.          0.45        1.11666667  1.75        2.43333333  3.15
  3.8         3.88333333  5.23333333  5.51666667  5.95        5.9
  8.35        7.73333333  8.36666667  9.5         9.58333333 10.63333333
 11.56666667 12.35       13.25       11.96666667 11.03333333 10.16666667
 10.          8.66666667  9.15        7.25        7.33333333  6.58333333
  6.06666667  5.95        5.11666667  3.6         3.3         3.56666667
  2.48333333  1.5         1.13333333  0.56666667]


In [21]:
print(numpy.mean(data, axis=0).shape) # this is the average inflammation per day for all patients

(40,)


In [22]:
# averaging across axis 1 (columns in our 2D example) gives the average inflammation per patient across all days
print(numpy.mean(data, axis=1))

[5.45  5.425 6.1   5.9   5.55  6.225 5.975 6.65  6.625 6.525 6.775 5.8
 6.225 5.75  5.225 6.3   6.55  5.7   5.85  6.55  5.775 5.825 6.175 6.1
 5.8   6.425 6.05  6.025 6.175 6.55  6.175 6.35  6.725 6.125 7.075 5.725
 5.925 6.15  6.075 5.75  5.975 5.725 6.3   5.9   6.75  5.925 7.225 6.15
 5.95  6.275 5.7   6.1   6.825 5.975 6.725 5.7   6.25  6.4   7.05  5.9  ]
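
It can be easy to mix up which axis yields which result. The sketch below uses a small hypothetical array `toy` (3 patients by 4 days, values made up for illustration) to show that `axis=0` collapses the rows and `axis=1` collapses the columns:

```python
import numpy

# Hypothetical data: 3 patients (rows) x 4 days (columns)
toy = numpy.array([[0., 1., 2., 1.],
                   [0., 2., 4., 2.],
                   [0., 3., 6., 3.]])

# axis=0 collapses the rows: one average per day (per column)
per_day = numpy.mean(toy, axis=0)

# axis=1 collapses the columns: one average per patient (per row)
per_patient = numpy.mean(toy, axis=1)

print(per_day.shape, per_day)          # 4 values, one per day
print(per_patient.shape, per_patient)  # 3 values, one per patient
```

The same reasoning explains the shapes seen above: averaging the (60, 40) array over axis 0 leaves 40 values, and over axis 1 leaves 60.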


In [23]:
# A section of an array is called a slice
## We can take slices of character strings as well

element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])

first three characters: oxy
last three characters: gen


#### Problem solving  <a class="anchor" id="problem2"></a>

In [24]:
## What is the value of element[:4]? What about element[4:]? Or element[:]?
print('element[:4]', element[:4])
print('element[4:]', element[4:])
print('element[:]', element[:])

element[:4] oxyg
element[4:] en
element[:] oxygen


In [25]:
## What is element[-1]? What is element[-2]?
print('element[-1]', element[-1])
print('element[-2]', element[-2])

element[-1] n
element[-2] e


In [26]:
## Given those answers, explain what element[1:-1] does
print('element[1:-1]', element[1:-1])

element[1:-1] xyge


In [34]:
## How can we rewrite the slice for getting the last three characters of element,
## so that it works even if we assign a different string to element? 
## Test your solution with the following strings: carpentry, clone, hi

element = 'carpentry'
print('element[-3:]', element[-3:])

element = 'clone'
print('element[-3:]', element[-3:])

element = 'hi'
print('element[-3:]', element[-3:])

element[-3:] try
element[-3:] one
element[-3:] hi


In [19]:
## The expression element[3:3] produces an empty string, i.e., a string that contains no characters
## If data holds our array of patient data, what does data[3:3, 4:4] produce? 
## What about data[3:3, :]?

print(data[3:3, 4:4])
print(data[3:3, :])

print("shape", data[3:3, 4:4].shape)
print("shape", data[3:3, :].shape)

print("type", data[3:3, 4:4].dtype)
print("type", data[3:3, :].dtype)

[]
[]
shape (0, 0)
shape (0, 40)
type float64
type float64


In [20]:
## Arrays can be concatenated and stacked on top of one another
## using NumPy’s vstack and hstack functions for vertical and horizontal stacking

A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
print('A = ')
print(A)

B = numpy.hstack([A, A])
print('B = ')
print(B)

C = numpy.vstack([A, A])
print('C = ')
print(C)

A = 
[[1 2 3]
 [4 5 6]
 [7 8 9]]
B = 
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]
 [7 8 9 7 8 9]]
C = 
[[1 2 3]
 [4 5 6]
 [7 8 9]
 [1 2 3]
 [4 5 6]
 [7 8 9]]


#### Practice <a class="anchor" id="practice1"></a>

Write some additional code that slices the first and last columns of A
and stacks them into a 3x2 array

In [33]:
## Practice here

##
##
##
##
##

In [34]:
## Solution

A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
print('A = ')
print(A)

D1 = numpy.hstack((A[:, :1], A[:, -1:]))
print('D1 = ')
print(D1)

# Alternative: remove the column at index 1 (second argument) along axis 1 (third argument)
D2 = numpy.delete(A, 1, 1)
print('D2 = ')
print(D2)

A = 
[[1 2 3]
 [4 5 6]
 [7 8 9]]
D1 = 
[[1 3]
 [4 6]
 [7 9]]
D2 = 
[[1 3]
 [4 6]
 [7 9]]
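
A third possible approach, shown here only as a sketch, selects both columns at once by passing a list of column indices. The name `D3` is introduced just for this illustration:

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Indexing with a list of column positions picks those columns in order;
# 0 is the first column and -1 is the last
D3 = A[:, [0, -1]]
print('D3 = ')
print(D3)
```

Unlike `numpy.delete`, this does not depend on A having exactly three columns, so it should still pick the first and last columns of a wider array.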


In [35]:
## The patient data is longitudinal in that each row represents 
## a series of observations relating to one individual
## This means that the change in inflammation over time is meaningful 

## Calculate changes in the data contained in an array with NumPy
## The numpy.diff() function takes an array and 
## returns the differences between two successive values

## use it to examine the changes each day across 
## the first week of patient 3 from our inflammation dataset

patient3_week1 = data[3, :7]
print(patient3_week1)

[0. 0. 2. 0. 4. 2. 2.]


In [36]:
## returns the 6 difference values in a new array
## the array of differences is shorter by one element (length 6)

numpy.diff(patient3_week1)

array([ 0.,  2., -2.,  4., -2.,  0.])

In [38]:
## When calling numpy.diff with a multi-dimensional array
## an axis argument may be passed to the function to 
## specify which axis to process

## When applying numpy.diff to our 2D inflammation array data
## which axis would we specify?
numpy.diff(data, axis=1)

array([[ 0.,  1.,  2., ...,  1., -3.,  0.],
       [ 1.,  1., -1., ...,  0., -1.,  1.],
       [ 1.,  0.,  2., ...,  0., -1.,  0.],
       ...,
       [ 1.,  0.,  0., ..., -1.,  0.,  0.],
       [ 0.,  0.,  1., ..., -2.,  2., -2.],
       [ 0.,  1., -1., ..., -2.,  0., -1.]])

In [None]:
## If the shape of an individual data file is (60, 40) (60 rows, 40 cols)
## what would the shape of the array be after you run numpy.diff() along axis 1?

## (60, 39)
## because there is one fewer difference between columns than 
## there are columns in the data

In [None]:
## How would you find the largest inflammation change per patient? 
## Does it matter if the change in inflammation is an increase or decrease?
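
One possible way to approach this question is sketched below. It uses a small hypothetical array `toy` (2 patients by 4 days, values made up) in place of `data` so that it runs without the CSV file; the variable names are illustrative only:

```python
import numpy

# Hypothetical data: 2 patients (rows) x 4 days (columns)
toy = numpy.array([[0., 2., 1., 5.],
                   [3., 3., 0., 1.]])

# Day-to-day changes for each patient; one fewer column than the input
changes = numpy.diff(toy, axis=1)

# Largest increase per patient: decreases are negative, so they never win
largest_increase = numpy.max(changes, axis=1)

# Largest change in either direction: take absolute values first
largest_magnitude = numpy.max(numpy.absolute(changes), axis=1)

print(largest_increase)   # patient 0: 4.0, patient 1: 1.0
print(largest_magnitude)  # patient 0: 4.0, patient 1: 3.0
```

For the second patient the two answers differ, because the biggest change is a drop of 3. So the direction does matter: without `numpy.absolute`, a large decrease is ignored in favour of a smaller increase.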

