Using NumPy, SciPy, pandas, and other common libraries.

# NumPy
[numpy](https://numpy.org) (**Num**erical **Py**thon) is a library for working with multidimensional arrays and doing fast numerical math on them.
For more information on what it can do, check out its [reference manual](https://docs.scipy.org/doc/numpy/reference/index.html).

In [1]:
import numpy as np

In [4]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

[[1 2 3]
 [4 5 6]]
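As a brief sketch of the "doing math" part: arithmetic on NumPy arrays is applied element by element, and reductions such as `sum` and `mean` can run along a chosen axis. The variable names below are illustrative and not part of the original notebook.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# The shape is (rows, columns).
shape = arr.shape

# Arithmetic is applied element-wise.
doubled = arr * 2

# Reductions can collapse one axis at a time.
col_sums = arr.sum(axis=0)    # sum down each column
row_means = arr.mean(axis=1)  # mean across each row

print(shape, doubled, col_sums, row_means, sep="\n")
```

Here `axis=0` runs down the rows (one result per column) and `axis=1` runs across the columns (one result per row).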


# SciPy
[scipy](https://scipy.org/scipylib/index.html) (**Sci**entific **Py**thon) is a library for scientific computing. For more information on what it can do, check out its [reference manual](https://docs.scipy.org/doc/scipy/reference/).

In [36]:
from scipy import stats

In [37]:
arr = np.random.randint(0, 10, size=10)
print(arr)

arr_z = stats.zscore(arr)
print(arr_z)

[3 9 8 8 2 5 9 8 8 2]
[-1.16382875  1.01835015  0.65465367  0.65465367 -1.52752523 -0.43643578
  1.01835015  0.65465367  0.65465367 -1.52752523]
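To make explicit what `stats.zscore` computes, the sketch below reproduces it by hand on the values printed above: subtract the mean, then divide by the standard deviation. By default `zscore` uses the population standard deviation (`ddof=0`), which is also the default for NumPy's `std`. The name `manual_z` is illustrative.

```python
import numpy as np
from scipy import stats

# The same values as the random draw shown above, hard-coded for reproducibility.
arr = np.array([3, 9, 8, 8, 2, 5, 9, 8, 8, 2])

# z = (x - mean) / std, using the population standard deviation (ddof=0).
manual_z = (arr - arr.mean()) / arr.std()

arr_z = stats.zscore(arr)

# Check that the manual calculation agrees with SciPy.
print(np.allclose(manual_z, arr_z))
```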


# Pandas
[pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) is a library for working with tabular data (e.g., CSV files).

In [17]:
import pandas as pd

You can load a spreadsheet into a `DataFrame` and operate on it.

In [27]:
df = pd.read_csv('../data/spreadsheet.tsv', sep='\t', index_col='subject')
df.head()

| subject | age | height | weight |
|---|---|---|---|
| sub-01 | 26.468044 | 70.899705 | 185.094866 |
| sub-02 | 34.814886 | 69.466607 | 156.702993 |
| sub-03 | 22.101511 | 67.156297 | 169.517613 |
| sub-04 | 58.034693 | 65.289444 | 175.863245 |
| sub-05 | 51.040495 | 65.591512 | 186.114354 |
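Since the notebook's data file may not be at hand, here is a minimal sketch of the same `read_csv` call applied to a small in-memory TSV. The text in `tsv_text` and the name `df_small` are made up for illustration; only the `sep` and `index_col` arguments mirror the cell above.

```python
import io

import pandas as pd

# A tiny in-memory TSV standing in for a file on disk (values are illustrative).
tsv_text = (
    "subject\tage\theight\tweight\n"
    "sub-01\t26.5\t70.9\t185.1\n"
    "sub-02\t34.8\t69.5\t156.7\n"
)

# sep="\t" parses tab-separated values; index_col makes 'subject' the row labels.
df_small = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col="subject")

print(df_small)

# With 'subject' as the index, rows can be looked up by label.
print(df_small.loc["sub-02", "age"])
```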


You can filter rows by a condition or sort them by a column.

In [31]:
df_distinguished = df[df['age'] > 50]
df_distinguished.head()

| subject | age | height | weight |
|---|---|---|---|
| sub-04 | 58.034693 | 65.289444 | 175.863245 |
| sub-05 | 51.040495 | 65.591512 | 186.114354 |
| sub-12 | 50.980788 | 69.104519 | 188.953467 |
| sub-14 | 53.805803 | 60.265333 | 169.292895 |
| sub-17 | 64.132667 | 59.882165 | 183.114347 |


In [34]:
df_agesorted = df.sort_values(by='age')
df_agesorted.head()

| subject | age | height | weight |
|---|---|---|---|
| sub-06 | 21.02479 | 67.406173 | 197.256212 |
| sub-03 | 22.101511 | 67.156297 | 169.517613 |
| sub-08 | 24.92806 | 68.526091 | 137.744372 |
| sub-01 | 26.468044 | 70.899705 | 185.094866 |
| sub-18 | 30.149441 | 63.471716 | 136.122029 |
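Filters and sorts can also be combined or reversed. The sketch below uses a small made-up `DataFrame` (`df_demo`, not the notebook's spreadsheet) to show two common variations: joining conditions with `&`, and sorting in descending order with `ascending=False`.

```python
import pandas as pd

# Illustrative data, not the notebook's spreadsheet.
df_demo = pd.DataFrame(
    {
        "age": [26.5, 58.0, 51.0, 64.1],
        "height": [70.9, 65.3, 65.6, 59.9],
        "weight": [185.1, 175.9, 186.1, 183.1],
    },
    index=pd.Index(["sub-01", "sub-04", "sub-05", "sub-17"], name="subject"),
)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
older_and_taller = df_demo[(df_demo["age"] > 50) & (df_demo["height"] > 65)]

# Sort from oldest to youngest.
oldest_first = df_demo.sort_values(by="age", ascending=False)

print(older_and_taller)
print(oldest_first)
```

Both operations return new `DataFrame` objects and leave `df_demo` unchanged.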


pandas also provides many methods for summarizing the data, such as column-wise and row-wise means and `describe()`.

In [28]:
df.mean(axis=0)

age        41.478407
height     64.748765
weight    167.989704
dtype: float64

In [35]:
df.mean(axis=1).head()

subject
sub-01     94.154205
sub-02     86.994829
sub-03     86.258474
sub-04     99.729127
sub-05    100.915453
dtype: float64
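The `axis` argument controls which dimension is collapsed. As a small sketch on made-up numbers (`df_demo` is illustrative), `axis=0` gives one value per column and `axis=1` gives one value per row:

```python
import pandas as pd

# Illustrative data chosen so the means are easy to check by hand.
df_demo = pd.DataFrame(
    {"age": [20.0, 40.0], "height": [60.0, 70.0], "weight": [150.0, 190.0]},
    index=pd.Index(["sub-01", "sub-02"], name="subject"),
)

# axis=0 collapses the rows: one mean per column.
col_means = df_demo.mean(axis=0)

# axis=1 collapses the columns: one mean per row (subject).
row_means = df_demo.mean(axis=1)

print(col_means)
print(row_means)
```

Note that a row-wise mean here mixes different units (years, height, weight), so it mainly serves to show how `axis=1` behaves.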

In [22]:
df.describe()

| | age | height | weight |
|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 |
| mean | 45.680612 | 65.630476 | 176.48913 |
| std | 13.466405 | 4.872738 | 20.006082 |
| min | 22.7059 | 55.643615 | 142.222583 |
| 25% | 34.928344 | 62.748147 | 166.648853 |
| 50% | 46.219516 | 66.438489 | 176.499587 |
| 75% | 58.354779 | 68.553542 | 183.749243 |
| max | 64.020537 | 76.184899 | 221.132238 |
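The result of `describe()` is itself a `DataFrame`, so individual statistics can be pulled out by label. The sketch below uses made-up values (`df_demo` and `summary` are illustrative names) and also shows that the `std` row is the sample standard deviation (`ddof=1`), unlike NumPy's default.

```python
import pandas as pd

# Illustrative data chosen so the statistics are easy to check by hand.
df_demo = pd.DataFrame({"age": [20.0, 30.0, 40.0], "height": [60.0, 65.0, 70.0]})

summary = df_demo.describe()

# Rows are statistics, columns are the original columns.
mean_age = summary.loc["mean", "age"]
median_age = summary.loc["50%", "age"]

# pandas uses the sample standard deviation (ddof=1) by default.
std_age = summary.loc["std", "age"]

print(mean_age, median_age, std_age)
```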
