# Jupyter Notebook Tutorial
Coding Outreach Group Summer Workshop 6/5/2020

## Setting your working directory
This step isn't necessary today if this Notebook is in the same folder/directory as data.csv. It becomes useful when you need to work with data from multiple directories and don't want a separate Notebook for each one.

In [None]:
# Mac users can run command-line commands inside a notebook; just make sure the command is in its own cell
# Otherwise, the notebook will treat the cell as Python code
pwd

In [None]:
ls

In [None]:
# Windows users can import os to change the working directory
import os
os.chdir(r"C:\Users\youruser\folder\folder\etc")
# Mac users can also use os.chdir in this format: os.chdir('/Users/KimNguyen/Desktop/Jupyter_Workshop')

# get current working directory
os.getcwd()

In [None]:
# list everything in current directory
os.listdir()
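
As an alternative to changing the working directory, a file can be addressed by its full path. The sketch below is illustrative only: the folder is a temporary directory created for the example, and the tiny `data.csv` written into it is made up.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical data folder, created in a temporary location so the example runs anywhere
data_dir = Path(tempfile.mkdtemp())
csv_path = data_dir / "data.csv"          # build the full path instead of calling os.chdir
csv_path.write_text("ID,Age\n1,20\n")     # made-up contents

cwd_before = os.getcwd()
contents = csv_path.read_text()           # read through the full path
cwd_after = os.getcwd()

# The working directory is untouched, so other relative paths keep working
print(csv_path.name, cwd_before == cwd_after)
```

The same `csv_path` object can be passed directly to `pd.read_csv`, which accepts path objects as well as strings.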

## Introduction to Pandas

In [None]:
# Import the pandas and numpy modules
import pandas as pd
import numpy as np


### Importing and viewing your data

In [None]:
#reading in your data file
data = "data.csv"

#convert the data file from a csv to a pandas dataframe
DF = pd.read_csv(data, header = 0, index_col = 0)


In [None]:
#Some ways you can view your data to check that everything was imported and converted correctly

#prints the first 5 rows of the dataframe; pass a number to show that many rows instead, e.g. DF.head(2)
DF.head()


In [None]:
# How will the dataframe change if we ran this line instead: DF = pd.read_csv(data, header = 1, index_col = 1)?
# Test it out here!
DF = pd.read_csv(data, header = 1, index_col = 1)
DF.head()


In [None]:
# But remember to reset to the correct dataframe format
DF = pd.read_csv(data, header = 0, index_col = 0)
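
To see what `header` and `index_col` do without touching the workshop file, here is a minimal sketch on a made-up three-row CSV held in memory. The column names `ID`, `Age`, and `Sex` are assumptions for illustration, not necessarily the layout of data.csv.

```python
import io
import pandas as pd

# Made-up CSV text standing in for data.csv
csv_text = "ID,Age,Sex\n1,20,F\n2,31,M\n3,45,F\n"

# header=0: the first line supplies the column names
# index_col=0: the first column (ID) becomes the row labels
good = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)

# header=1: the second line ("1,20,F") is used as column names, so the real header and one data row are lost
# index_col=1: the second column becomes the row labels
shifted = pd.read_csv(io.StringIO(csv_text), header=1, index_col=1)

print(good.shape, list(good.index))
print(shifted.shape, list(shifted.index))
```

Both arguments count from zero, which is why `header = 0, index_col = 0` is the setting that matches a file whose first row holds names and whose first column holds participant numbers.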


In [None]:
#gives the dimensions of your dataframe (#rows, #columns)
DF.shape


In [None]:
#gives the types of data of the columns
DF.dtypes


In [None]:
#if you put all three previous lines together, you have to use print() for each, otherwise, the notebook will only print the last line.
print(DF.head())
print(DF.shape)
print(DF.dtypes)


### Selecting specific data

In [None]:
# We can cut out a single column like this
# What's really returned here is a pandas series
#DF['Age']

#Or multiple columns by inserting a list of columns ['Age', 'Sex']
# What's returned in this case is a DataFrame
DF[['Age', 'Sex']]

#If you want to save any selection as its own series or dataframe
#DF2 = DF[['Age', 'Sex']]
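
The difference between single and double brackets can be checked directly. This is a small sketch on a made-up two-row dataframe (`toy` is a hypothetical name, not part of the workshop data).

```python
import pandas as pd

# Made-up dataframe for illustration
toy = pd.DataFrame({"Age": [20, 31], "Sex": ["F", "M"]}, index=[1, 2])

one_col = toy["Age"]             # single brackets with a name -> Series
two_cols = toy[["Age", "Sex"]]   # a list of names -> DataFrame
also_frame = toy[["Age"]]        # a one-item list still gives a DataFrame

print(type(one_col).__name__, type(two_cols).__name__, type(also_frame).__name__)
```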


In [None]:
# We can also select specific rows by their index label using .loc.
# For this dataframe, the index holds participant numbers, so this returns the row for participant 1
DF.loc[1]


In [None]:
# We can specify a row label and a column label by passing in two arguments
#DF.loc[1, 'Age']

# Or multiple row and columns!
DF.loc[[1,2], ['Age', 'Sex']]
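
Because `.loc` looks rows up by index label rather than by position, it can help to compare it with `.iloc`, which is purely positional. The sketch below uses a made-up dataframe whose labels (101 to 103) deliberately differ from the positions (0 to 2).

```python
import pandas as pd

# Made-up dataframe; the index labels are hypothetical participant numbers
toy = pd.DataFrame(
    {"Age": [20, 31, 45], "Sex": ["F", "M", "F"]},
    index=[101, 102, 103],
)

by_label = toy.loc[102, "Age"]   # row labelled 102, column named "Age"
by_position = toy.iloc[1, 0]     # second row, first column (counting from zero)

print(by_label, by_position)     # both refer to the same cell
```

With these labels, `toy.loc[1]` raises a `KeyError`, since no row is labelled 1.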


In [None]:
# Getting stats of specific columns
DF['Age'].describe()

# Getting stats of all columns, this will help later on when you graph the data
DF.describe()


In [None]:
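# If you make changes to your dataframe or create a new dataframe, you can save it as a csv/excel/text file
# Note: index=False leaves out the index column (here, the participant numbers); drop the argument to keep it
DF.to_csv('new.csv', index=False)
# DF.to_excel('new.xlsx', index=False)  # recent pandas versions no longer write the old .xls format
# DF.to_csv('new.txt', index=False)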
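
The effect of the `index` argument is easiest to see on a small example. This sketch uses a made-up dataframe and writes to strings rather than files; the index name `ID` is an assumption.

```python
import io
import pandas as pd

# Made-up dataframe with a named index standing in for participant numbers
toy = pd.DataFrame({"Age": [20, 31]}, index=pd.Index([101, 102], name="ID"))

without_index = toy.to_csv(index=False)   # the ID column is not written
with_index = toy.to_csv()                 # the ID column is written first

# Reading the second version back with index_col=0 restores the row labels
restored = pd.read_csv(io.StringIO(with_index), header=0, index_col=0)

print(without_index)
print(with_index)
print(list(restored.index))
```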

More on Pandas: https://pandas.pydata.org/pandas-docs/version/0.15/tutorials.html

## Data Visualization

In [None]:
# Import seaborn and matplotlib modules for data visualization
# seaborn is built on top of matplotlib, so functions from both can be used on the same plot
import seaborn as sns
import matplotlib.pyplot as plt 


More on matplotlib: https://matplotlib.org/gallery/index.html

More on seaborn: https://seaborn.pydata.org/

Colormap values: Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, icefire, icefire_r, inferno, inferno_r, jet, jet_r, magma, magma_r, mako, mako_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, rocket, rocket_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, viridis, viridis_r, vlag, vlag_r, winter, winter_r

In [None]:
# Set seaborn context. 
# This affects things like the size of the labels, lines, and other elements of the plot, but not the overall style. 
# The base context is “notebook”, and the other contexts are “paper”, “talk”, and “poster”
sns.set_context("paper", font_scale = 1.5)


More about set context: https://seaborn.pydata.org/generated/seaborn.set_context.html

## Scatterplots

In [None]:
# A good ole vanilla scatterplot
plt.gcf().subplots_adjust(bottom=0.15) #adds room below the plot so the x-axis label isn't cut off
plot1 = sns.scatterplot(x="Age", y="Word Count", data=DF) #palette only takes effect together with hue, so it is left out here
#saves the graph to your current directory; the higher the dpi value, the longer the cell takes to run
plt.savefig("plot1.png",dpi=100)


In [None]:
# You can also plot data by groups using hue = ""
plot2 = sns.scatterplot(x="Word Count",y="Composite (z)", palette = 'Set2', data=DF, hue = "Age Bins")
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.) #moves the legend outside the plot
# By default the saved png cuts off anything outside the plot (like this legend); bbox_inches="tight" expands the saved area to include it
plt.savefig("plot2.png", dpi=100, bbox_inches="tight")
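
One commonly used option for keeping an outside legend in the saved image is the `bbox_inches="tight"` argument of `savefig`. The sketch below uses matplotlib only, with made-up points and a temporary output folder, so the file name and data are assumptions.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [2, 4, 3], label="group A")               # made-up points
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)    # legend placed outside the axes

# bbox_inches="tight" grows the saved area to fit everything drawn, including the legend
out_path = os.path.join(tempfile.mkdtemp(), "legend_demo.png")
fig.savefig(out_path, dpi=100, bbox_inches="tight")
plt.close(fig)

print(os.path.exists(out_path))
```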


In [None]:
# Scatterplots by two grouping dimensions (Age Bins and Sex)
plot3 = sns.FacetGrid(DF, col="Age Bins", hue="Sex", palette = "Set2")
plot3.map(plt.scatter, "Word Count", "Composite (z)", alpha=1)
plot3.add_legend();
plot3.savefig("plot3.png",dpi=50)


In [None]:
# Scatterplots with bivariate distributions
plot4 = sns.jointplot(x="Word Count", y="Composite (z)", data=DF, kind="reg")
plt.savefig("plot4.png", dpi=100)

# Hexbin plot: A really cool plot that also includes distributions, but it works best with large datasets
with sns.axes_style("white"):
    plot5= sns.jointplot(x= "Word Count", y= "Composite (z)", kind="hex", color="Pink", data=DF)
plt.savefig("plot5.png", dpi=100)


More about scatterplot: https://seaborn.pydata.org/generated/seaborn.scatterplot.html

## Regression Plots

In [None]:
# Bivariate regression plot
# x and y are data variables, data is your dataset (usually a dataframe), and aspect is the width-to-height ratio of each panel (width = aspect * height)
plot6 = sns.lmplot(x="Age", y="Word Count", data=DF, aspect=1.5) #palette is omitted because it only applies when hue is set

# Optional: setting specific axes limits. 
# You can check your .describe() output from earlier to see the min and max of your variables
axes = plot6.axes 
axes[0,0].set_ylim(0,200) #set graph y-axis limits
axes[0,0].set_xlim(5,25) #set graph x-axis limits
plt.savefig("plot6.png",dpi=100) 


In [None]:
# Regression plot by two groups (Age Bins and Sex)
plot7 = sns.lmplot(x="Word Count", y="Composite (z)", palette = 'Set2', data=DF, aspect=1.5, hue = "Age Bins", col= "Sex") 
axes = plot7.axes 
axes[0,0].set_ylim(-2,2)
axes[0,0].set_xlim(0,200)
plt.savefig("plot7.png",dpi=100)


More about lm plot: https://seaborn.pydata.org/generated/seaborn.lmplot.html

## Bar Plots

In [None]:
# Regular bar plot with capped 95% confidence interval bars
# Note: seaborn 0.12 and later deprecate ci=95 in favor of errorbar=("ci", 95)
plot8 = sns.barplot(x="Age Bins", y="Word Count", data=DF, palette= "Set2",ci=95, capsize= .15)
plt.savefig("plot8.png",dpi=100)


In [None]:
#Bar plot with individual data points
plot9 = sns.catplot(x="Age Bins", y="Word Count", data=DF, palette="Pastel2")
plot9.map(sns.barplot,x="Age Bins", y="Word Count", data=DF, palette= "Set2",ci=95, capsize=.15)
plot9.set_axis_labels("Age Bins", "Word Count")
plt.savefig("plot9.png",dpi=100)


In [None]:
#Bar plots across two grouping dimensions
with sns.color_palette("Set2"):
    plot10 = sns.FacetGrid(DF, col="Sex", height=10, aspect=1.5)
    plot10.map(sns.barplot, "Age Bins", "Word Count", ci=95);
plt.savefig("plot10.png",dpi=50)


More on bar plots: https://seaborn.pydata.org/generated/seaborn.barplot.html

More on multi-plot grids: https://seaborn.pydata.org/tutorial/axis_grids.html

## Other Cool Plots

In [None]:
#Violin plots
plot11 = sns.violinplot(x="Age Bins", y="Word Count", data=DF, palette="Set2", hue="Sex")
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.savefig("plot11.png", dpi=100, bbox_inches="tight") #bbox_inches="tight" keeps the legend outside the plot in the saved file


More about violin plots: https://seaborn.pydata.org/generated/seaborn.violinplot.html

In [None]:
# Heatmaps, very useful for RSA or correlation comparison visualization
fake_data = np.random.rand(51, 51)
plot12 = sns.heatmap(fake_data, cmap="Blues")
plt.savefig("plot12.png", dpi=100)
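
For the correlation use case, the matrix passed to the heatmap would typically come from the data rather than from `np.random.rand`. A minimal sketch, assuming a made-up dataframe with four numeric columns (the names `A` to `D` are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up numeric data: 50 rows, 4 columns
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(50, 4)), columns=["A", "B", "C", "D"])

# Pairwise Pearson correlations between columns: a symmetric 4x4 table with ones on the diagonal
corr = toy.corr()
print(corr.round(2))

# The result can be passed to the heatmap call used above, for example:
# sns.heatmap(corr, cmap="Blues", vmin=-1, vmax=1)
```

Fixing `vmin` and `vmax` to the correlation range keeps the color scale comparable across plots.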


More on heatmaps: https://seaborn.pydata.org/generated/seaborn.heatmap.html

More on clustermaps: https://seaborn.pydata.org/generated/seaborn.clustermap.html#seaborn.clustermap