# Data handling assessment

In [9]:
from DataGen import *   # generates your data
import matplotlib.pyplot as plt ## this has imported matplotlib's pyplot as plt
plt.rcParams.update({'font.size': 14, 'figure.figsize': [16, 7]}) # sets better defaults for matplotlib graphs

import numpy as np  ## this has imported numpy which you can refer to as np
import scipy.stats as stats ## this has imported SciPy's statistics package as stats, you may find this useful.
import scipy.signal as signal ## imports the signal processing module as signal
import scipy.optimize as fit  ## imports the optimization tool box as fit

import glob  # imports the glob module
import warnings  # imports the warnings module
warnings.filterwarnings('ignore')  # prevents printing of some warnings

### Each student has a unique data set to process. 
> 1) To initialise your unique data set, insert your 9-digit student number below as a string (i.e. inside the inverted commas) and run the cell.

The printed length of `ID` should be 9.


In [None]:
ID = 'replace this text with your student number, leave the inverted commas but do not leave any spaces' # replace the text inside the inverted commas with your student ID
print(len(ID))

### 1) Loading files
- It is useful to be able to load multiple data sets from a folder.

> 1) Run the cell below to initialise your data sets; they will appear in the 'Data sets' folder, in the directory from which you launched this notebook.

In [11]:
dataFolder(ID) ## this creates your data files to process

- The data that you want to load is contained in .csv files
- Each file contains a single number

> 2) Make a sorted list of these csv file names.    
>3) Then loop through this list and load each csv file.   
>4) On each iteration of the loop print the data you have loaded.    
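
The loading steps above can be sketched as follows. This is a minimal illustration rather than a model answer: it writes three stand-in csv files to a temporary folder (the file names `a.csv`, `b.csv` and `c.csv` are assumptions, not the names produced by `dataFolder`), then sorts and loads them. In the notebook you would point the `glob` pattern at the 'Data sets' folder instead.

```python
import glob
import os
import tempfile

import numpy as np

# Build a throwaway folder with three single-number csv files (stand-in data).
demo_dir = tempfile.mkdtemp()
for name, value in [("b.csv", 2.0), ("a.csv", 1.0), ("c.csv", 3.0)]:
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write(f"{value}\n")

# glob returns paths in arbitrary order, so sort them explicitly.
file_names = sorted(glob.glob(os.path.join(demo_dir, "*.csv")))

loaded = []
for file_name in file_names:
    # A file holding one number loads as a 0-d array; float() unwraps it.
    value = float(np.loadtxt(file_name, delimiter=","))
    loaded.append(value)
    print(os.path.basename(file_name), value)
```

Note that `sorted` orders names as text, so a name ending in `10.csv` would sort before one ending in `2.csv`; check the printed order against what you expect.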



### 2) An experimenter has measured the size of muscle cells and wants to summarise their findings for the whole population of cells
- Each data point corresponds to the area of a cell in µm$^2$  

>1) Run the cell below to initialize your data set on cell size

In [None]:
CellSize = getCellSize(ID) # this loads your data for this task into the numpy array called 'CellSize'

>2)  Plot a histogram of CellSize with 60 bins.  
>3) Then print appropriate summary statistics e.g. mean and SD or median and 25th & 75th percentiles.    
>4) Print some text to justify your choice.  
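
One possible way to choose between the two sets of summary statistics is to look at how skewed the distribution is. The sketch below uses synthetic, right-skewed data (`cell_size_demo` is a stand-in, not the output of `getCellSize`), and the skewness cut-off of 0.5 is an arbitrary rule of thumb assumed for illustration; inspecting the histogram remains the primary check.

```python
import numpy as np
import scipy.stats as stats

# Synthetic, right-skewed stand-in for the cell areas (µm^2).
rng = np.random.default_rng(0)
cell_size_demo = rng.lognormal(mean=6.0, sigma=0.5, size=2000)

# Bin counts for a 60-bin histogram (the same bins plt.hist would draw).
counts, bin_edges = np.histogram(cell_size_demo, bins=60)

skewness = stats.skew(cell_size_demo)
mean = np.mean(cell_size_demo)
sd = np.std(cell_size_demo, ddof=1)  # sample standard deviation
q25, median, q75 = np.percentile(cell_size_demo, [25, 50, 75])

# Assumed rule of thumb: roughly symmetric data -> mean and SD,
# clearly skewed data -> median and interquartile range.
if abs(skewness) < 0.5:
    print(f"Roughly symmetric (skew = {skewness:.2f}): "
          f"mean = {mean:.1f}, SD = {sd:.1f}")
else:
    print(f"Skewed (skew = {skewness:.2f}): "
          f"median = {median:.1f}, 25th-75th percentiles = {q25:.1f}-{q75:.1f}")
```

To draw the histogram itself, `plt.hist(CellSize, bins=60)` with labelled axes is the usual call.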

### 3) An experimenter has made an extracellular recording from a motor nerve and wants to determine the change in the mean firing rate of this nerve during a voluntary contraction.


- The sample rate of the recording was 10 kHz
- The data is recorded in µV
- The recording is 10 seconds in length
- The voluntary contraction began at 5 seconds and lasted 5 seconds.


> 1) Run the cell below to generate your nerve data as an array


In [None]:
nerve = getSpike(ID) # this loads your data for this task into the numpy array called 'nerve'

>2) In the cell below plot the nerve recording with the correct time scale.    
>3) Detect spikes in the 5 seconds before contraction and calculate the mean spike rate in Hz. Print a description of this information.       
>4) Then determine by how much the mean firing rate of the nerve increased during the voluntary contraction. Print a description of this information. 
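
The time axis and spike counting described above could be approached as in the following sketch. It runs on a synthetic trace (Gaussian noise plus single-sample spikes at known positions), not on the output of `getSpike`, and the detection threshold of 8 times a robust noise estimate and the 1 ms minimum spacing are assumptions chosen for this toy signal; real recordings may need different values.

```python
import numpy as np
import scipy.signal as signal

fs = 10_000   # sample rate in Hz (10 kHz)
duration = 10  # recording length in seconds
t = np.arange(fs * duration) / fs  # time of each sample in seconds

# Synthetic trace in µV: noise plus spikes at 10 Hz before 5 s and 40 Hz after.
rng = np.random.default_rng(1)
trace = rng.normal(0, 5, t.size)
before_idx = np.arange(500, 50_000, 1000)     # 50 spikes in 0-5 s
during_idx = np.arange(50_125, 100_000, 250)  # 200 spikes in 5-10 s
trace[before_idx] += 100
trace[during_idx] += 100

# Robust noise estimate (median absolute value), then an assumed threshold.
noise_sd = np.median(np.abs(trace)) / 0.6745
threshold = 8 * noise_sd
peaks, _ = signal.find_peaks(trace, height=threshold, distance=int(0.001 * fs))

peak_times = t[peaks]
rate_before = np.sum(peak_times < 5) / 5.0   # spikes per second, 0-5 s
rate_during = np.sum(peak_times >= 5) / 5.0  # spikes per second, 5-10 s
increase = rate_during - rate_before

print(f"Mean rate before contraction: {rate_before:.1f} Hz")
print(f"Mean rate during contraction: {rate_during:.1f} Hz")
print(f"Increase in mean firing rate: {increase:.1f} Hz")
```

Plotting `t` against the trace (for example `plt.plot(t, nerve)`) gives an x-axis in seconds rather than sample number.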

### 4) An experimenter wants to determine whether a new drug is effective in increasing platelet count in patients with thrombocytopenia (low platelet count). 


- A platelet count below 150,000 platelets µl<sup>-1</sup>  indicates a clinical risk.

- The experimenter has administered the new drug to 900 thrombocytopenia patients and measured platelet counts before and after treatment with the drug.

>1) Run the cell below to load your platelet count data


In [None]:
platelets = getPlatelets(ID) # this loads your data for this task into the numpy array called 'platelets'

- The 1st column of "platelets" is the pre-treatment counts and the 2nd contains the data for the same patients post-treatment.
- Each row corresponds to the platelet count per µl for each patient.

>2) Plot a graph showing the histograms of the platelet counts. Use 40 bins for each histogram.    
>3) Determine whether the drug has a significant effect. Print some text justifying your conclusion.
>4) Print some text describing whether the new drug is effective at increasing platelet counts to healthy levels.
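
Because the same patients are measured before and after treatment, a paired test is the natural choice. The sketch below is one possible approach on synthetic data (`platelets_demo` is a stand-in with the same two-column layout, not the output of `getPlatelets`): it checks the paired differences for normality and then picks a paired t-test or a Wilcoxon signed-rank test. The 0.05 cut-offs are conventional assumptions.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in: column 0 = pre-treatment, column 1 = post-treatment.
rng = np.random.default_rng(2)
pre = rng.normal(100_000, 20_000, 900)
post = pre + rng.normal(15_000, 10_000, 900)
platelets_demo = np.column_stack([pre, post])

pre_counts = platelets_demo[:, 0]
post_counts = platelets_demo[:, 1]
differences = post_counts - pre_counts

# Normality of the paired differences decides which paired test to use.
_, shapiro_p = stats.shapiro(differences)
if shapiro_p > 0.05:
    test_name = "paired t-test"
    _, p_value = stats.ttest_rel(post_counts, pre_counts)
else:
    test_name = "Wilcoxon signed-rank test"
    _, p_value = stats.wilcoxon(post_counts, pre_counts)

# Statistical significance is not the same as clinical benefit:
# also check how many patients reach the 150,000 per µl threshold.
frac_healthy_post = np.mean(post_counts >= 150_000)

print(f"{test_name}: p = {p_value:.3g}")
print(f"Median change: {np.median(differences):.0f} platelets per µl")
print(f"Patients at or above 150,000 after treatment: {100 * frac_healthy_post:.1f}%")
```

The last two printed lines address the separate question of whether a significant change is large enough to bring patients into the healthy range.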

### 5)  An experimenter is measuring the drug concentration in the blood to estimate its elimination rate.

- Samples were taken every 5 minutes
- Each data point is the concentration of drug in µg l<sup>-1</sup>

>1) Run the cell below to generate your drug measurements

In [None]:
drug = getDrug(ID) # this loads your data for this task into the numpy array called 'drug'

>2) Plot the data as a scatter plot i.e. points for markers rather than lines between points.  
>3) Define the appropriate equation for an [exponential decay](https://en.wikipedia.org/wiki/Exponential_decay) which describes the clearance of a drug from the body.  
>4) Fit this equation to your data and plot the best fit.   
> 5) Determine the rate constant of drug clearance and print this with some descriptive text.
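
A single-exponential decay has the form $C(t) = C_0 e^{-kt}$, where $C_0$ is the starting concentration and $k$ is the rate constant. The sketch below fits this model with `curve_fit` to synthetic measurements taken every 5 minutes. The function name `exp_decay`, the true parameter values and the noise level are all assumptions made for illustration, not properties of the data from `getDrug`.

```python
import numpy as np
import scipy.optimize as fit


def exp_decay(t, c0, k):
    """Single-exponential decay: concentration at time t."""
    return c0 * np.exp(-k * t)


# Synthetic stand-in: one sample every 5 minutes for 300 minutes.
rng = np.random.default_rng(3)
t_min = np.arange(0, 300, 5)
conc = exp_decay(t_min, 80.0, 0.02) + rng.normal(0, 1.0, t_min.size)

# Initial guesses: first measurement for c0, a small positive rate for k.
params, covariance = fit.curve_fit(exp_decay, t_min, conc, p0=[conc[0], 0.01])
c0_fit, k_fit = params
half_life = np.log(2) / k_fit

print(f"Fitted rate constant k = {k_fit:.4f} per minute")
print(f"Corresponding half-life = {half_life:.1f} minutes")
```

Plotting `exp_decay(t_min, c0_fit, k_fit)` as a line over the scattered data points shows the best fit. The units of $k$ follow from the time axis, so here it is per minute.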


### 6) An experimenter has recorded the time of wheel running events and wants to generate a circadian actogram of this data.

An actogram is a graphical representation showing when activity occurred over time.

![Example of an Actogram](Example_Actogram.png)


- The experiment ran for 35 days
- The time stamps of wheel running were saved to a .csv file with 1 file for each day
> 1) Run the cell below to generate your data, which will appear in the 'Circadian data' folder

In [None]:
getCircadian(ID) # this will generate lots of files in the 'Circadian data' folder.

> 2) Generate a graph similar to the example shown above using the data files in the "Circadian data" folder. Hint: use '|' as the markers.  
> 3) Judging from your actogram, on what day was the light cycle altered? Print your answer with some descriptive text under your graph.
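
An actogram can be drawn by plotting each day's event times on its own row with `'|'` markers. The sketch below uses synthetic events instead of the files in the 'Circadian data' folder: it assumes time stamps are in hours within each day and invents a shift in activity onset on day 20, both purely for illustration. With the real data, each day's array would come from loading that day's csv file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
n_days = 35
shift_day = 20  # hypothetical day on which activity onset moves

# One array of event times (hours, 0-24) per day; synthetic stand-in data.
days_events = []
for day in range(1, n_days + 1):
    onset = 18.0 if day < shift_day else 12.0
    days_events.append(np.sort(rng.uniform(onset, onset + 6.0, 40)))

fig, ax = plt.subplots()
for day, events in enumerate(days_events, start=1):
    # Each day is one row: x = time of day, y = day number.
    ax.plot(events, np.full(events.size, day), "|", color="k")
ax.set_xlim(0, 24)
ax.set_ylim(n_days + 0.5, 0.5)  # day 1 at the top, as in a typical actogram
ax.set_xlabel("Time of day (hours)")
ax.set_ylabel("Day")

# Crude numerical check: the largest day-to-day jump in first-event time.
onsets = np.array([events.min() for events in days_events])
estimated_shift_day = int(np.argmax(np.abs(np.diff(onsets)))) + 2
print(f"Largest change in activity onset occurs on day {estimated_shift_day}")
```

The numerical check is only a rough aid; the task asks for a judgement made from the graph itself.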

