# Analysis of Subject Data

By [Serena Bonaretti](https://sbonaretti.github.io/), 2022  
Content under Creative Commons license CC-BY-NC-SA 4.0   
Code under GNU-GPL v3 License  

---

- The aim of this notebook is to calculate descriptive statistics for a group of subjects from tabular data  
- This notebook can be attached to the *Material* paragraph of your paper

- *Structure of the subject dataframe (i.e. table)*  
Each *row* corresponds to a *subject*, and each *column* corresponds to a *characteristic*. E.g.:

| subject_id |      age      |  gender | anatomy | laterality
|:----------:|:-------------:|:-------:| :------:| :---------:
| ID_01      | 66            | f       | hip     | r
| ID_02      | 53            | m       | ankle   | l
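A table with the structure shown above can also be built directly in pandas, which can be handy for trying out the notebook without a file. The sketch below is a minimal example that assumes the column names of the example table; `REQUIRED_COLUMNS` and `check_columns` are hypothetical helpers, not part of pandas:

```python
import pandas as pd

# Example subjects, one row per subject (values copied from the table above)
df = pd.DataFrame({
    "subject_id": ["ID_01", "ID_02"],
    "age": [66, 53],
    "gender": ["f", "m"],
    "anatomy": ["hip", "ankle"],
    "laterality": ["r", "l"],
})

# Hypothetical list of the columns the rest of the notebook expects
REQUIRED_COLUMNS = ["subject_id", "age", "gender", "anatomy", "laterality"]

def check_columns(table: pd.DataFrame) -> list:
    """Return the required columns that are missing from the table."""
    return [col for col in REQUIRED_COLUMNS if col not in table.columns]

missing = check_columns(df)
print(missing)  # an empty list means that all expected columns are present
```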

**What the notebook does:**
- Reads a table containing data about subjects, such as gender, age, anatomy, and laterality of the anatomy. Data are organized in a tabular file, which can be in `.csv` (open file format) or `.xlsx` (proprietary file format)
- Gets:
  - Number of subjects
  - Average, standard deviation, max, and min age 
  - Number of subjects per gender
  - Number of subjects per anatomy
  - Number of subjects per laterality (left/right)  
- Prints out dependencies for reproducibility

To read and query the data, it uses the Python package `pandas`  

---

Imports:

In [1]:
import pandas as pd

---
## 1. Load the data

- Load the tabular data in the file *subjects_template.csv* using the pandas function `read_csv()`:

In [2]:
df = pd.read_csv("./material/subjects_template.csv")
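This cell reads a `.csv` file, while the introduction also mentions `.xlsx`. A small loader that picks the pandas reader from the file extension could look like the sketch below. `load_subjects` is a hypothetical helper, and reading `.xlsx` files assumes that an Excel engine such as `openpyxl` is installed:

```python
from pathlib import Path

import pandas as pd

def load_subjects(path):
    """Read a subject table from a .csv or .xlsx file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".xlsx":
        # pandas needs an Excel engine (e.g. openpyxl) to read .xlsx files
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file format: {suffix}")
```

Usage, with the same file as above: `df = load_subjects("./material/subjects_template.csv")`.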

- Print out the table. Please note that the index starts from 0:

In [3]:
df

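|    | subject_id | age | gender | anatomy | laterality |
|:--:|:----------:|:---:|:------:|:-------:|:----------:|
| 0  | ID_01      | 66  | f      | hip     | r          |
| 1  | ID_02      | 53  | m      | ankle   | l          |
| 2  | ID_03      | 50  | f      | hip     | r          |
| 3  | ID_04      | 52  | f      | hip     | l          |
| 4  | ID_05      | 73  | m      | hip     | l          |
| 5  | ID_06      | 67  | m      | ankle   | r          |
| 6  | ID_07      | 71  | m      | hip     | r          |
| 7  | ID_08      | 56  | f      | ankle   | l          |
| 8  | ID_09      | 60  | m      | hip     | l          |
| 9  | ID_10      | 78  | f      | hip     | r          |
| ...| ...        | ... | ...    | ...     | ...        |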


- If the table is too long, you can:
    - Make the table scrollable by right-clicking on the table and then selecting `Enable Scrolling for Outputs`, or
    - Show only the first five rows with the command: `df.head()`


In [4]:
df.head()

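|    | subject_id | age | gender | anatomy | laterality |
|:--:|:----------:|:---:|:------:|:-------:|:----------:|
| 0  | ID_01      | 66  | f      | hip     | r          |
| 1  | ID_02      | 53  | m      | ankle   | l          |
| 2  | ID_03      | 50  | f      | hip     | r          |
| 3  | ID_04      | 52  | f      | hip     | l          |
| 4  | ID_05      | 73  | m      | hip     | l          |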
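`head()` also accepts the number of rows to show, and `tail()` shows the last rows instead. To limit how many rows pandas prints without changing the global settings, `pd.option_context()` can be used. A sketch on a synthetic table of 30 made-up subjects:

```python
import pandas as pd

# Synthetic table with 30 made-up subjects
df = pd.DataFrame({"subject_id": [f"ID_{i:02d}" for i in range(1, 31)],
                   "age": range(50, 80)})

first = df.head(3)   # first three rows
last = df.tail(2)    # last two rows

# Temporarily allow at most 10 printed rows; longer tables are shown truncated
with pd.option_context("display.max_rows", 10):
    print(df)
```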


## 2. Get number of subjects
- The number of subjects coincides with the number of rows:

In [5]:
df.shape[0]

30
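The number of rows equals the number of subjects only if each subject appears in one row. If a subject could contribute more than one row (e.g. both ankles), counting the distinct values of `subject_id` is a useful check. A sketch on a made-up table:

```python
import pandas as pd

# Made-up table in which ID_02 appears twice (left and right side)
df = pd.DataFrame({
    "subject_id": ["ID_01", "ID_02", "ID_02"],
    "laterality": ["r", "l", "r"],
})

n_rows = df.shape[0]                     # number of rows
n_subjects = df["subject_id"].nunique()  # number of distinct subjects

# IDs that occur in more than one row (keep=False flags every occurrence)
duplicated_ids = df.loc[df["subject_id"].duplicated(keep=False), "subject_id"].unique()

print(n_rows, n_subjects)  # 3 2
```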

## 3. Age: Get average, standard deviation, max, and min 
- Calculate min, max, average, and standard deviation of the values in the column `age` (printed in this order):

In [6]:
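print(df["age"].min())             # minimum age
print(df["age"].max())             # maximum age
print(round(df["age"].mean(), 2))  # average age, rounded to 2 decimals
print(round(df["age"].std(), 2))   # sample standard deviation, rounded to 2 decimals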

50
79
64.57
8.85
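In a paper, age is often reported as mean ± standard deviation. pandas' `std()` computes the sample standard deviation (`ddof=1`) by default; `ddof=0` gives the population value. The sketch below computes the four statistics in one call with `agg()` and formats them as a sentence fragment; the ages are the first five values of the example table:

```python
import pandas as pd

ages = pd.Series([66, 53, 50, 52, 73], name="age")  # first five ages of the example table

# One Series indexed by the statistic names
stats = ages.agg(["mean", "std", "min", "max"])

summary = (f"{stats['mean']:.2f} ± {stats['std']:.2f} years "
           f"(range {stats['min']:.0f}-{stats['max']:.0f})")
print(summary)  # 58.80 ± 10.13 years (range 50-73)
```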


## 4. Gender: Get number of females and males
- Count the unique values in the column `gender` using `value_counts()`:

In [7]:
df["gender"].value_counts()

f    15
m    15
Name: gender, dtype: int64
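`value_counts(normalize=True)` returns proportions instead of counts, so absolute numbers and percentages can be reported side by side. A sketch on made-up values:

```python
import pandas as pd

gender = pd.Series(["f", "m", "f", "f"], name="gender")  # made-up values

counts = gender.value_counts()                                   # absolute numbers
percent = gender.value_counts(normalize=True).mul(100).round(1)  # percentages

# Both Series share the same index (the gender labels), so they align by label
table = pd.DataFrame({"n": counts, "%": percent})
print(table)
```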

## 5. Anatomy: Get number of subjects per anatomy
- Count the unique values in the column `anatomy`:

In [8]:
df["anatomy"].value_counts()

hip      22
ankle     8
Name: anatomy, dtype: int64
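If the age statistics are needed separately for each anatomy, `groupby()` can combine the two columns. A sketch using the first six rows of the example table:

```python
import pandas as pd

# First six rows of the example table (age and anatomy only)
df = pd.DataFrame({
    "age": [66, 53, 50, 52, 73, 67],
    "anatomy": ["hip", "ankle", "hip", "hip", "hip", "ankle"],
})

# Number of subjects, average age, and sample standard deviation per anatomy
age_by_anatomy = df.groupby("anatomy")["age"].agg(["count", "mean", "std"]).round(2)
print(age_by_anatomy)
```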

## 6. Laterality: Get number of right and left
- Count the unique values in the column `laterality`:

In [9]:
df["laterality"].value_counts()

r    15
l    15
Name: laterality, dtype: int64
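To count laterality separately for each anatomy (e.g. how many left and right hips), `pd.crosstab()` builds a contingency table of two columns. A sketch using the first five rows of the example table:

```python
import pandas as pd

# First five rows of the example table (anatomy and laterality only)
df = pd.DataFrame({
    "anatomy":    ["hip", "ankle", "hip", "hip", "hip"],
    "laterality": ["r",   "l",     "r",   "l",   "l"],
})

# Rows: anatomy, columns: laterality; margins=True adds totals labelled "All"
table = pd.crosstab(df["anatomy"], df["laterality"], margins=True)
print(table)
```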

--- 
## Dependencies

In [10]:
%load_ext watermark
%watermark
%watermark --iversions

Last updated: 2022-06-23T17:56:57.338154+02:00

Python implementation: CPython
Python version       : 3.9.7
IPython version      : 7.29.0

Compiler    : Clang 10.0.0 
OS          : Darwin
Release     : 21.1.0
Machine     : x86_64
Processor   : i386
CPU cores   : 8
Architecture: 64bit

pandas: 1.3.4
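If the `watermark` extension is not installed, a similar (though less complete) record can be printed with the standard library `platform` module and `pd.__version__`. A minimal sketch, not equivalent to the full `%watermark` output:

```python
import platform
from datetime import datetime

import pandas as pd

# Collect a few reproducibility details
info = {
    "timestamp": datetime.now().astimezone().isoformat(),
    "python": platform.python_version(),
    "implementation": platform.python_implementation(),
    "os": platform.system(),
    "release": platform.release(),
    "machine": platform.machine(),
    "pandas": pd.__version__,
}

for key, value in info.items():
    print(f"{key:15}: {value}")
```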

