# Judi's Project
This project uses publicly available data from the Human Connectome Project (HCP).

Data available here: https://db.humanconnectome.org
Data dictionary available here: https://wiki.humanconnectome.org/display/PublicData/HCP+Data+Dictionary+Public-+500+Subject+Release?src=contextnavpagetreemode

Test whether performance differences in a line orientation task are explained by:  
*) gender
*) characteristics of the hippocampus (area, volume, surface)

For this: 
*) extract the relevant data and save it in a pandas DataFrame
*) plot the distribution of spatial orientation performance
*) create a score for left and right hippocampal area for each person
*) normalise hippocampal area for each person
*) plot all pairwise combinations of variables
*) check the test assumptions
*) fit a linear regression
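The final regression step in the plan can be sketched as a least-squares solve on a design matrix, which also counts as a matrix operation on a 2D array. This is a minimal sketch, assuming the outcome is `VSPLOT_TC` and the predictors are a 0/1 gender indicator and a normalised hippocampal volume; the helper name `fit_ols` and the toy numbers are hypothetical and only illustrate the mechanics.

```python
import numpy as np


def fit_ols(X, y):
    """Fit an ordinary least-squares regression y ~ 1 + X.

    X : 2D array, shape (n_subjects, n_predictors)
    y : 1D array, shape (n_subjects,)
    Returns (coefs, r2): intercept first, then one slope per predictor,
    and the coefficient of determination.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(y)), X])
    # Least-squares solution of design @ coefs ~= y (matrix operation on a 2D array)
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coefs
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coefs, 1 - ss_res / ss_tot


# Toy example (made-up values, not HCP data): column 0 = gender (0/1),
# column 1 = normalised volume; y is built as exactly 2 + 3*x0 - 1*x1
X_toy = np.array([[0, 0.5], [1, 0.4], [0, 0.6], [1, 0.7], [0, 0.3]])
y_toy = 2 + 3 * X_toy[:, 0] - 1 * X_toy[:, 1]
coefs, r2 = fit_ols(X_toy, y_toy)
print("intercept = {:.2f}, slopes = {}, R^2 = {:.3f}".format(coefs[0], np.round(coefs[1:], 2), r2))
```

With the real data, `X` would be assembled from columns such as `Gender` (recoded to 0/1) and `FS_HPCvol_sum_norm`, and `y` from `VSPLOT_TC`. A library such as statsmodels would additionally report standard errors and p-values.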

NECESSARY:

    load source data from a file
    plot at least one histogram of the data, with title and labelled axes
    create at least one plot of analysis results, with title and labelled axes
    use at least one numpy array
    use short but descriptive variable names in your code
    document your code: use markdown in your .ipynb and/or directly comment your python code with # or ''' or """

MINIMUM OF SIX:

    use an if-elif-else clause
    use a for loop
    use a while loop
    write at least one function, include a docstring
    print out some results in at least one nicely formatted string, using string operator % or .format() method
    use at least one vectorized math operation on an array
    use at least one matrix operation on a 2D array
    create a figure with multiple axes (i.e., use plt.subplots(nrows, ncols))
    do a statistical test - show that the test assumptions hold for your data
    manipulate and analyze data in a pandas series or dataframe
    use an image processing algorithm
    use a clustering algorithm
    use some other non-trivial algorithm: e.g. regression, curve fitting, signal analysis…
    version control your code using git: create a local repository and make at least 5 commits while developing your code


In [None]:
# Load required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


In [None]:
#Load data from source file
HCP_data = pd.read_csv('C:/Users/Judi/Documents/Project_HCP/unrestricted_jhuber_6_23_2017_7_25_10.csv')
#HCP_data = pd.read_csv('C:/Users/Judita/HCP_Data/unrestricted_jhuber_7_24_2017_4_28_55.csv')

HCP_data.shape

In [None]:
gender_grouping = HCP_data.groupby('Gender')
gender_grouping.head()

In [None]:
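# Plot at least one histogram of the data, with title and labelled axes:
# one overlaid histogram of IWRD_TOT per gender group, labelled for the legend
fig, ax = plt.subplots()
for gender, group in gender_grouping:
    group.IWRD_TOT.hist(bins=10, alpha=0.5, label=gender, ax=ax)
ax.set_xlabel("IWRD_TOT score")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of IWRD_TOT by gender")
ax.legend(title="Gender")

# TODO: plot several tests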
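The gender groups compared in the histogram can also be summarised numerically, which covers the formatted-string requirement. This is a minimal sketch on a made-up toy frame; it uses `VSPLOT_TC`, the line orientation column selected later in the notebook, and the values are not HCP data.

```python
import pandas as pd

# Hypothetical toy data standing in for HCP_data (column names follow the notebook)
toy = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "VSPLOT_TC": [14, 16, 12, 18, 15, 17],
})

# Mean, standard deviation and count of the score within each gender group
summary = toy.groupby("Gender")["VSPLOT_TC"].agg(["mean", "std", "count"])
for gender, row in summary.iterrows():
    print("{}: mean = {:.2f}, SD = {:.2f}, n = {:d}".format(
        gender, row["mean"], row["std"], int(row["count"])))
```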

In [None]:
HCP_data.columns

In [None]:
# Select hippocampal and parahippocampal (temporal lobe) measures, plus age, gender and test scores

# Column-name prefixes to keep (str.startswith accepts a tuple of prefixes)
prefixes = ('FS_L_Parahippocampal', 'FS_R_Parahippocampal',
            'FS_L_Hipp', 'FS_R_Hipp',
            'Age', 'IWRD_TOT', 'VSPLOT_TC', 'Gender')
filter_col = [col for col in HCP_data.columns if col.startswith(prefixes)]
filter_col



In [None]:
# Keep only the selected columns and drop subjects missing hippocampal volume or IWRD_TOT;
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added later
HCP_data_f = HCP_data[filter_col].dropna(subset=['FS_L_Hippo_Vol', 'IWRD_TOT']).copy()
HCP_data_f.shape  # check that the shape is correct
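Before dropping subjects, it can help to report how many values are missing per column and how many subjects survive the filter. A minimal sketch on a hypothetical toy frame (made-up values, NaN marks a missing measure):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: NaN marks a subject without FreeSurfer or test data
toy = pd.DataFrame({
    "FS_L_Hippo_Vol": [4100.0, np.nan, 3900.0, 4200.0],
    "IWRD_TOT": [38.0, 35.0, np.nan, 40.0],
})

n_missing = toy.isnull().sum()  # number of missing values in each column
complete = toy.dropna(subset=["FS_L_Hippo_Vol", "IWRD_TOT"])  # rows with both measures
print("Kept {} of {} subjects".format(len(complete), len(toy)))
print(n_missing)
```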


In [None]:
HCP_data_f

In [None]:
# Create composite scores with vectorized (element-wise) operations

# Normalise hippocampal volume by supratentorial volume, as a percentage;
# FS_SupraTentorial_Vol was not selected into HCP_data_f, so take it from HCP_data
supra_vol = HCP_data.loc[HCP_data_f.index, 'FS_SupraTentorial_Vol']
HCP_data_f['FS_L_Hippo_Vol_norm'] = HCP_data_f.FS_L_Hippo_Vol / supra_vol * 100
HCP_data_f['FS_R_Hippo_Vol_norm'] = HCP_data_f.FS_R_Hippo_Vol / supra_vol * 100

# Total (left + right) hippocampal volume; the percentage scaling is applied once, below
HCP_data_f['FS_HPCvol_sum'] = HCP_data_f.FS_R_Hippo_Vol + HCP_data_f.FS_L_Hippo_Vol
HCP_data_f['FS_HPCvol_sum_norm'] = HCP_data_f.FS_HPCvol_sum / supra_vol * 100
HCP_data_f.head()
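For subject i the normalised total is 100 · (L_i + R_i) / S_i, with L and R the left and right hippocampal volumes and S the supratentorial volume. Parentheses matter: `L + R * 100` would scale only R. The sketch below uses made-up volumes and shows the same computation twice over, once as a matrix-vector product (summing left and right) and once as a vectorized element-wise division.

```python
import numpy as np

# Made-up volumes (mm^3) for three subjects; not HCP values
left = np.array([4000.0, 3800.0, 4200.0])
right = np.array([4100.0, 3900.0, 4300.0])
supra = np.array([1.0e6, 0.9e6, 1.1e6])

# Matrix operation: (n_subjects x 2) array times [1, 1] gives left + right per subject
vols = np.column_stack([left, right])
total = vols @ np.array([1.0, 1.0])

# Vectorized, element-wise normalisation: total as a percentage of supratentorial volume
total_norm = total / supra * 100
print(np.round(total_norm, 3))
```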

In [None]:
HCP_data_f.plot.scatter(x='VSPLOT_TC', y='FS_R_Hippo_Vol',
                        title='Line orientation score vs right hippocampal volume')
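The relationship shown in the scatter plot can be quantified with a Pearson correlation and labelled with an if-elif-else clause. This is a sketch on made-up numbers; the thresholds 0.1, 0.3 and 0.5 are a common rule of thumb assumed here, not something taken from the HCP data or the notebook, and `correlation_strength` is a hypothetical helper name.

```python
import numpy as np


def correlation_strength(r):
    """Return a verbal label for a Pearson correlation coefficient.

    Thresholds (0.1, 0.3, 0.5) are an assumed rule of thumb.
    """
    size = abs(r)
    if size >= 0.5:
        return "strong"
    elif size >= 0.3:
        return "moderate"
    elif size >= 0.1:
        return "weak"
    else:
        return "negligible"


# Made-up scores and volumes for five subjects
score = np.array([12, 15, 14, 18, 16])
volume = np.array([3900, 4100, 4000, 4300, 4150])
r = np.corrcoef(score, volume)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print("r = {:.2f} ({})".format(r, correlation_strength(r)))
```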


In [None]:
#create at least one plot of analysis results, with title and labelled axes
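One option for a plot of analysis results is a figure with two axes: the fitted regression line on the left and the residual distribution on the right, which also gives a visual check of the normal-residuals assumption. A minimal sketch, assuming a straight-line fit with `np.polyfit`; the data are simulated, not HCP values, and the axis labels only mirror the notebook's variables.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated data standing in for normalised hippocampal volume (x) and VSPLOT_TC (y)
rng = np.random.default_rng(0)
x = rng.normal(0.55, 0.05, size=50)
y = 10 + 8 * x + rng.normal(0, 1, size=50)

# Straight-line fit: for deg=1, polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

fig, (ax_fit, ax_res) = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
ax_fit.scatter(x, y, s=15)
x_line = np.linspace(x.min(), x.max(), 100)
ax_fit.plot(x_line, slope * x_line + intercept, color="red")
ax_fit.set_xlabel("Normalised hippocampal volume (%)")
ax_fit.set_ylabel("VSPLOT_TC")
ax_fit.set_title("Linear fit")
ax_res.hist(residuals, bins=10)
ax_res.set_xlabel("Residual")
ax_res.set_ylabel("Frequency")
ax_res.set_title("Residual distribution")
fig.tight_layout()
```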

In [None]:
sns.set(style="ticks")

sns.pairplot(HCP_data_f, dropna=True, hue="Age")

In [None]:
# write own function
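One candidate for an own function is iterative outlier trimming ("sigma clipping"), which also uses a while loop and a vectorized z-score. This is a sketch only: whether to trim outliers at all is an analysis decision the notebook has not made, and the 3-SD cutoff and the name `sigma_clip` are assumptions.

```python
import numpy as np


def sigma_clip(values, n_sd=3.0, max_iter=10):
    """Iteratively drop values more than n_sd standard deviations from the mean.

    Repeats until nothing more is removed, the spread is zero, fewer than two
    values remain, or max_iter passes have run. Returns the kept values.
    """
    kept = np.asarray(values, dtype=float)
    n_iter = 0
    while n_iter < max_iter and kept.size >= 2:
        sd = kept.std()
        if sd == 0:
            break  # all values identical: nothing to clip
        z = (kept - kept.mean()) / sd  # vectorized z-scores
        keep = np.abs(z) <= n_sd
        if keep.all():
            break
        kept = kept[keep]
        n_iter += 1
    return kept


# Made-up scores with one extreme value
kept = sigma_clip([9, 10, 11] * 7 + [60])
print("Kept {} values, max = {:.0f}".format(kept.size, kept.max()))
```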

In [None]:
# Use at least one numpy array

# Progress checklist (X = done):
#   use an if-elif-else clause
#   use a for loop X
#   use a while loop
#   write at least one function, include a docstring
#   print out some results in at least one nicely formatted string, using string operator % or .format() method
#   use at least one vectorized math operation on an array
#   use at least one matrix operation on a 2D array X
#   create a figure with multiple axes (i.e., use plt.subplots(nrows, ncols)) X
#   do a statistical test - show that the test assumptions hold for your data TO DO
#   manipulate and analyze data in a pandas series or dataframe X
#   use an image processing algorithm
#   use a clustering algorithm
#   use some other non-trivial algorithm: e.g. regression, curve fitting, signal analysis… X
#   version control your code using git: create a local repository and make at least 5 commits while developing your code X
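The outstanding statistical test, with its assumption checks, could be sketched with `scipy.stats`, an extra dependency the notebook does not import yet. In this sketch, Shapiro-Wilk checks normality in each group, Levene's test checks equal variances, and an if-elif-else picks Mann-Whitney U, Student's t or Welch's t. Choosing the test from preliminary tests is one common approach, not the only one; `alpha = 0.05`, the helper name `compare_groups` and the simulated scores are assumptions.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Compare two independent groups, choosing the test from assumption checks.

    Shapiro-Wilk checks normality within each group and Levene's test checks
    equality of variances. Returns (test_name, statistic, p_value).
    """
    _, p_norm_a = stats.shapiro(a)
    _, p_norm_b = stats.shapiro(b)
    _, p_var = stats.levene(a, b)
    if p_norm_a <= alpha or p_norm_b <= alpha:
        # Normality rejected in at least one group: use a rank-based test
        stat, p = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "Mann-Whitney U"
    elif p_var > alpha:
        stat, p = stats.ttest_ind(a, b, equal_var=True)
        name = "Student's t"
    else:
        stat, p = stats.ttest_ind(a, b, equal_var=False)
        name = "Welch's t"
    return name, stat, p


# Simulated scores for two groups (e.g. F and M); not HCP data
rng = np.random.default_rng(42)
scores_f = rng.normal(15, 2, size=40)
scores_m = rng.normal(16, 2, size=40)
name, stat, p = compare_groups(scores_f, scores_m)
print("{}: statistic = {:.2f}, p = {:.3f}".format(name, stat, p))
```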


In [None]:
# select which variable 
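One way to decide which hippocampal variable to carry forward is to compare how much variance in the score each one explains on its own; for a single-predictor linear model, R² equals the squared Pearson correlation. A sketch on simulated data: the column names mirror the notebook, but the values and the built-in dependence on the right hemisphere are made up.

```python
import numpy as np
import pandas as pd

# Simulated data; column names mirror the notebook, values are not HCP data
rng = np.random.default_rng(1)
n = 60
toy = pd.DataFrame({
    "FS_L_Hippo_Vol_norm": rng.normal(0.27, 0.02, n),
    "FS_R_Hippo_Vol_norm": rng.normal(0.28, 0.02, n),
})
toy["VSPLOT_TC"] = 5 + 30 * toy["FS_R_Hippo_Vol_norm"] + rng.normal(0, 0.2, n)

# R^2 of a one-predictor linear model equals the squared Pearson correlation
r2_by_predictor = {}
for col in ["FS_L_Hippo_Vol_norm", "FS_R_Hippo_Vol_norm"]:
    r = toy[col].corr(toy["VSPLOT_TC"])
    r2_by_predictor[col] = r ** 2
    print("{:<22s} R^2 = {:.3f}".format(col, r ** 2))

best = max(r2_by_predictor, key=r2_by_predictor.get)
print("Best single predictor: {}".format(best))
```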