# Analyzing Patient Data

## Overview

### Questions

- How can I process tabular data files in Python?

### Objectives

- Explain what a library is and what libraries are used for.
- Import a Python library and use the functions it contains.
- Read tabular data from a file into a program.
- Select individual values and subsections from data.
- Perform operations on arrays of data.

## Content

### Loading data into Python

In [1]:
import numpy

In [5]:
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

In [9]:
print(data)

[[0. 0. 1. ... 3. 0. 0.]
 [0. 1. 2. ... 1. 0. 1.]
 [0. 1. 1. ... 2. 1. 1.]
 ...
 [0. 1. 1. ... 1. 1. 1.]
 [0. 0. 0. ... 0. 2. 0.]
 [0. 0. 1. ... 1. 1. 0.]]


In [7]:
print(type(data))

<class 'numpy.ndarray'>


In [10]:
print(data.dtype)
print(data.shape)

float64
(60, 40)
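
The difference between `type(data)`, `data.dtype`, and `data.shape` can be tried out without the data file. The following is a minimal sketch that assumes a made-up three-row sample; `csv_text` and `sample` are illustrative names and the values are not taken from `inflammation-01.csv`.

```python
import io

import numpy

# Hypothetical three-row sample standing in for a CSV file on disk.
csv_text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"

# loadtxt accepts a file-like object, so StringIO lets us skip the real file.
sample = numpy.loadtxt(io.StringIO(csv_text), delimiter=',')

print(type(sample))   # the container: a NumPy array
print(sample.dtype)   # the type of the values stored inside the array
print(sample.shape)   # (number of rows, number of columns)
```

Here `type` describes the array object itself, while `dtype` and `shape` are attributes that describe its contents.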


!["data" is a 3 by 3 numpy array containing row 0: ['A', 'B', 'C'], row 1: ['D', 'E', 'F'], and
row 2: ['G', 'H', 'I']. Starting in the upper left hand corner, data[0, 0] = 'A', data[0, 1] = 'B',
data[0, 2] = 'C', data[1, 0] = 'D', data[1, 1] = 'E', data[1, 2] = 'F', data[2, 0] = 'G',
data[2, 1] = 'H', and data[2, 2] = 'I',
in the bottom right hand corner.](../fig/python-zero-index.svg)

If we want to get a single number from the array, we must provide an index in square brackets after the variable name, just as we do in maths when referring to an element of a matrix. Our inflammation data has two dimensions, so we need two indices to refer to one specific value. In Python, indexing starts at `0`, and the row index comes before the column index, so `data[1, 1]` is the value in the second row and second column.

In [12]:
print(data[1,1])

1.0
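
The row-then-column order can be checked on the 3 by 3 array of letters described in the figure. This is a small sketch; the name `grid` is illustrative and not part of the lesson data.

```python
import numpy

# Hypothetical 3 by 3 array of letters, mirroring the figure.
grid = numpy.array([['A', 'B', 'C'],
                    ['D', 'E', 'F'],
                    ['G', 'H', 'I']])

print(grid[0, 0])  # row 0, column 0
print(grid[1, 2])  # row 1, column 2
print(grid[2, 0])  # row 2, column 0
```

Swapping the two indices selects a different element unless it lies on the diagonal, which is why the order matters.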


### Slicing data

In [14]:
print(data[0:4, 0:10])
print(data[4, 0:10])

[[0. 0. 1. 3. 1. 2. 4. 7. 8. 3.]
 [0. 1. 2. 1. 2. 1. 3. 2. 2. 6.]
 [0. 1. 1. 3. 3. 2. 6. 2. 5. 9.]
 [0. 0. 2. 0. 4. 2. 2. 1. 6. 7.]]
[0. 1. 1. 3. 3. 1. 3. 5. 2. 4.]
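
The two outputs above differ in shape: a slice on both axes gives a 2D block, while a single row index combined with a column slice gives a 1D array. A minimal sketch on a made-up array (`small`, `block`, and `row` are illustrative names):

```python
import numpy

# Hypothetical 3 by 4 array holding the numbers 0 to 11.
small = numpy.arange(12).reshape(3, 4)

# Rows 0 and 1, columns 1 and 2: the end of each range is excluded.
block = small[0:2, 1:3]
print(block)

# A single row index combined with a column slice gives a 1D result.
row = small[2, 0:3]
print(row)
```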


In [17]:
print(data[5:, :10])

[[0. 0. 1. 2. 2. 4. 2. 1. 6. 4.]
 [0. 0. 2. 2. 4. 2. 2. 5. 5. 8.]
 [0. 0. 1. 2. 3. 1. 2. 3. 5. 3.]
 [0. 0. 0. 3. 1. 5. 6. 5. 5. 8.]
 [0. 1. 1. 2. 1. 3. 5. 3. 5. 8.]
 [0. 1. 0. 0. 4. 3. 3. 5. 5. 4.]
 [0. 1. 0. 0. 3. 4. 2. 7. 8. 5.]
 [0. 0. 2. 1. 4. 3. 6. 4. 6. 7.]
 [0. 0. 0. 0. 1. 3. 1. 6. 6. 5.]
 [0. 1. 2. 1. 1. 1. 4. 1. 5. 2.]
 [0. 1. 1. 0. 1. 2. 4. 3. 6. 4.]
 [0. 0. 0. 0. 2. 3. 6. 5. 7. 4.]
 [0. 0. 0. 1. 2. 1. 4. 3. 6. 7.]
 [0. 0. 2. 1. 2. 5. 4. 2. 7. 8.]
 [0. 1. 2. 0. 1. 4. 3. 2. 2. 7.]
 [0. 1. 1. 3. 1. 4. 4. 1. 8. 2.]
 [0. 0. 2. 3. 2. 3. 2. 6. 3. 8.]
 [0. 0. 0. 3. 4. 5. 1. 7. 7. 8.]
 [0. 1. 1. 1. 1. 3. 3. 2. 6. 3.]
 [0. 1. 1. 1. 2. 3. 5. 3. 6. 3.]
 [0. 0. 2. 1. 3. 3. 2. 7. 4. 4.]
 [0. 0. 1. 2. 4. 2. 2. 3. 5. 7.]
 [0. 0. 1. 1. 1. 5. 1. 5. 2. 2.]
 [0. 0. 2. 2. 3. 4. 6. 3. 7. 6.]
 [0. 0. 0. 1. 4. 4. 6. 3. 8. 6.]
 [0. 1. 1. 0. 3. 2. 4. 6. 8. 6.]
 [0. 0. 2. 3. 3. 4. 5. 3. 6. 7.]
 [0. 1. 2. 2. 2. 3. 6. 6. 6. 7.]
 [0. 0. 2. 1. 3. 5. 6. 7. 5. 8.]
 [0. 0. 1. 2. 4. 1. 5. 5. 2. 3.]
 [0. 0. 0.

#### Check your understanding

We can take slices of character strings as well.

In [18]:
element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])

first three characters: oxy
last three characters: gen


What is the value of `element[:4]`? What about `element[4:]`? Or `element[:]`?

##### Solution

In [19]:
print(element[:4])
print(element[4:])
print(element[:])

oxyg
en
oxygen


What is the value of `element[-1]`? What about `element[-2]`? Or `element[1:-1]`?

##### Solution

In [20]:
print(element[-1])
print(element[-2])
print(element[1:-1])

n
e
xyge
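
One way to read negative indices is as positions counted back from the end of the string: index `-k` refers to position `len(element) - k`. The sketch below spells out that equivalence; the variable names other than `element` are illustrative.

```python
element = 'oxygen'
n = len(element)

last = element[n - 1]         # same character as element[-1]
second_last = element[n - 2]  # same character as element[-2]
middle = element[1:n - 1]     # same substring as element[1:-1]

# Each comparison checks the positive-index form against the negative one.
print(last == element[-1], second_last == element[-2], middle == element[1:-1])
```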


### Analyzing data

In [21]:
print(numpy.mean(data))

6.14875


In [22]:
maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)

print("max:", maxval)
print("min:", minval)
print("std:", stdval)

max: 20.0
min: 0.0
std: 4.613833197118566


In [23]:
patient_0 = data[0, :]
print("max for patient 0 is:", numpy.max(patient_0))

max for patient 0 is: 18.0


In [25]:
numpy.max([1, 2])

2

In [26]:
print("max for patient 2 is:", numpy.max(data[2, :]))

max for patient 2 is: 19.0


In [28]:
print(data.shape)
print(numpy.mean(data, axis=0))

(60, 40)
[ 0.          0.45        1.11666667  1.75        2.43333333  3.15
  3.8         3.88333333  5.23333333  5.51666667  5.95        5.9
  8.35        7.73333333  8.36666667  9.5         9.58333333 10.63333333
 11.56666667 12.35       13.25       11.96666667 11.03333333 10.16666667
 10.          8.66666667  9.15        7.25        7.33333333  6.58333333
  6.06666667  5.95        5.11666667  3.6         3.3         3.56666667
  2.48333333  1.5         1.13333333  0.56666667]


In [29]:
print(numpy.max(data, axis=1))

[18. 18. 19. 17. 17. 18. 17. 20. 17. 18. 18. 18. 17. 16. 17. 18. 19. 19.
 17. 19. 19. 16. 17. 15. 17. 17. 18. 17. 20. 17. 16. 19. 15. 15. 19. 17.
 16. 17. 19. 16. 18. 19. 16. 19. 18. 16. 19. 15. 16. 18. 14. 20. 17. 15.
 17. 16. 17. 19. 18. 18.]
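
The `axis` argument names the dimension that is collapsed. A small sketch, assuming as in `patient_0 = data[0, :]` that each row is one patient; treating the columns as days is an assumption made here for illustration, and `tiny` is made-up data.

```python
import numpy

# Hypothetical data: 2 patients (rows) by 3 days (columns).
tiny = numpy.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

per_day = numpy.mean(tiny, axis=0)      # collapses the rows: one value per column
per_patient = numpy.mean(tiny, axis=1)  # collapses the columns: one value per row

print(per_day.shape, per_day)
print(per_patient.shape, per_patient)
```

This matches the outputs above: `axis=0` on the `(60, 40)` array yields 40 values, and `axis=1` yields 60.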


### Exercise: Stacking Arrays


In [None]:
# Arrays can be concatenated and stacked on top of one another, 
# using NumPy’s vstack and hstack functions for vertical and horizontal stacking, respectively.

# Write some additional code that slices the first and last columns of A, 
# and stacks them into a 3x2 array. Make sure to print the results to verify your solution.

In [37]:
A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('A = ')
print(A)

B = numpy.hstack([A, A])
print('B = ')
print(B)

C = numpy.vstack([A, A])
print('C = ')
print(C)

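# Target: D = numpy.array([[1, 3], [4, 6], [7, 9]])
# Slicing with :1 and -1: (rather than indexing with 0 and -1) keeps each
# column as a 3x1 array, so hstack places the two columns side by side.
print(A[:, :1])
print(A[:, -1:])
D = numpy.hstack((A[:, :1], A[:, -1:]))
print('D = ')
print(D)

A = 
[[1 2 3]
 [4 5 6]
 [7 8 9]]
B = 
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]
 [7 8 9 7 8 9]]
C = 
[[1 2 3]
 [4 5 6]
 [7 8 9]
 [1 2 3]
 [4 5 6]
 [7 8 9]]
[[1]
 [4]
 [7]]
[[3]
 [6]
 [9]]
D = 
[[1 3]
 [4 6]
 [7 9]]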
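
The shape of each slice decides what `hstack` returns: 1D columns are joined end to end, whereas 3 by 1 column slices are placed side by side. The sketch below compares three variants; `flat`, `side_by_side`, and `stacked` are illustrative names, and `numpy.column_stack` is shown as one possible alternative rather than the lesson's intended solution.

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Indexing with an integer drops a dimension: each column becomes a 1D array,
# and hstack joins 1D arrays end to end.
flat = numpy.hstack((A[:, 0], A[:, -1]))

# Slicing keeps each column as a 3 by 1 array, so hstack joins them side by side.
side_by_side = numpy.hstack((A[:, :1], A[:, -1:]))

# column_stack turns 1D arrays into columns before joining them.
stacked = numpy.column_stack((A[:, 0], A[:, -1]))

print(flat.shape, side_by_side.shape, stacked.shape)
```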


In [39]:
o = 'oxygen'
# -0 is the same integer as 0, so this selects the first character, not the last
o[-0]

'o'

## Key Points

- Import a library into a program using `import libraryname`.
- Use the `numpy` library to work with arrays in Python.
- The expression `array.shape` gives the shape of an array.
- Use `array[x, y]` to select a single element from a 2D array.
- Array indices start at 0, not 1.
- Use `low:high` to specify a `slice` that includes the indices from `low` to `high-1`.
- Use `numpy.mean(array)`, `numpy.max(array)`, and `numpy.min(array)` to calculate simple statistics.
- Use `numpy.mean(array, axis=0)` or `numpy.mean(array, axis=1)` to calculate statistics across the specified axis.
