<a href="https://colab.research.google.com/github/hallockh/neur_265/blob/main/notebooks/Plotting_02_26_28.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Plotting

This notebook will build on our previous Cell Types notebook to help us plot the data that we've pulled from the Allen Cell Types database.

### By the end of this notebook, you'll be able to:

* Create plots using `matplotlib.pyplot`
* Manipulate aspects of plots
* Create bar, box, and scatter plots from the Allen Cell Types metrics


## Step One: Get comfortable with our plotting tools

First, let's get set up for plotting by importing the necessary toolboxes.

In [None]:
# Tell Jupyter to plot our plots inline
%matplotlib inline

# Import matplotlib and "pyplot" module
# plt is the common abbreviation for matplotlib's pyplot module
import matplotlib as mpl
import matplotlib.pyplot as plt

Next, let's create a random line using NumPy, our favorite scientific computing toolbox, and show how we can use the `matplotlib.pyplot` module to plot it.

Useful functions:

* `plt.plot()`: create a plot from a list, array, pandas series, etc.
* `plt.show()`: show the plot (not strictly necessary in Jupyter, necessary in other IDEs)
* `plt.xlabel()` and `plt.ylabel()`: change the x and y labels
* `plt.title()`: add a title

In [None]:
import numpy as np

# Generate 100 random integers from 1 to 99 (the upper bound, 100, is exclusive)
random_line = np.random.randint(1,100,100)
random_line

random_line_2 = random_line + 100

plt.plot(random_line)
plt.plot(random_line_2)
plt.show()
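
The labeling functions listed above can be added to a plot like this. This is a minimal sketch: the names `rng` and `example_line`, the seed, and the label text are our own choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so the example is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)
example_line = rng.integers(1, 100, size=100)  # 100 integers from 1 to 99

# The label argument is what plt.legend() displays for each line
plt.plot(example_line, label='example_line')
plt.plot(example_line + 100, label='example_line + 100')

# Label the axes and add a title
plt.xlabel('index')
plt.ylabel('value')
plt.title('Two random lines')
plt.legend()
plt.show()
```

All of these calls act on the current plot, so they need to come before `plt.show()`.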

In [None]:
# Create a scatter plot to show the relationship between the two lines

plt.scatter(random_line,random_line_2)
plt.show()


The `plt.hist()` function works in much the same way: pass it your data, and it draws a histogram.
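
Here is a small sketch of `plt.hist()` using uniformly distributed data (not the normally distributed data the task asks for). The variable names, the seed, and the choice of 20 bins are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# 500 values drawn uniformly between 0 and 10 (seed chosen arbitrarily)
rng = np.random.default_rng(1)
uniform_data = rng.uniform(0, 10, size=500)

# plt.hist returns the bin counts, the bin edges, and the drawn bars
counts, bin_edges, patches = plt.hist(uniform_data, bins=20)
plt.xlabel('value')
plt.ylabel('count')
plt.show()
```

The `bins` argument controls how many bars the data are split into; try a few values to see how the shape of the histogram changes.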

><b>Task:</b> In the cell below:
1. Generate a random list of 100 data points from a standard normal distribution (Hint: Use <code>np.random.standard_normal()</code>)
2. Plot a histogram of the data.

In [None]:
# Your code here


We can also set up multiple subplots on the same figure using `plt.subplots()`. This returns the figure along with an array of separate **axes** (really, separate plots), which we can access and manipulate individually. This is particularly useful when you are plotting multiple lines. It's common to use `plt.subplots()` even for a single plot, because it gives easier access to axis attributes.

In [None]:
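# Create a 2x2 grid of subplots; ax is a 2D array of axes indexed as ax[row, column]
fig, ax = plt.subplots(2,2,figsize=(15,5))

# Plot on the upper-left subplot and label its y-axis
ax[0,0].plot(random_line)
ax[0,0].set_ylabel('random values')

# plt.ylabel() would label the most recently created axes (the lower-right subplot),
# so we use set_ylabel() on the axes we want instead
plt.show()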
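
As a further sketch, here is a single row of two subplots. Note that with only one row, `plt.subplots()` returns a one-dimensional array, so each axes is accessed with a single index. The names `axes` and `values` and the label text are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
values = rng.integers(1, 100, size=100)

# One row, two columns: axes is a 1D array, so we index it as axes[0] and axes[1]
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].plot(values)
axes[0].set_title('line')
axes[0].set_ylabel('value')

axes[1].hist(values, bins=10)
axes[1].set_title('histogram')
axes[1].set_ylabel('count')

# Adjust spacing so labels and titles don't overlap
fig.tight_layout()
plt.show()
```

Each axes has its own `set_title()`, `set_xlabel()`, and `set_ylabel()` methods, which mirror `plt.title()`, `plt.xlabel()`, and `plt.ylabel()`.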

><b>Task:</b> Plot your second <code>random_line_2</code> in the upper-right hand subplot, put a scatter plot with <code>random_line</code> on the x-axis and <code>random_line_2</code> on the y-axis in the lower-left hand subplot, and put a histogram of your normally-distributed random data in the lower-right hand subplot.

In [None]:
# Your code here

There are *many, many* different aspects of a figure that you could manipulate (and spend a lot of time manipulating).

Style sheets help with this a bit by setting a few good defaults. Below, we set figure parameters and choose a figure style (see all styles [here](https://matplotlib.org/gallery/style_sheets/style_sheets_reference.html), or learn how to create your own style [here](https://matplotlib.org/tutorials/introductory/customizing.html)).

You can test how these parameters change our plots by going back and re-plotting the plots above.


><b>Task:</b> Use different figure styles to change the appearance of your scatter plot

In [None]:
# Set the figure "dots per inch" to be higher than the default (optional, based on your personal preference)
mpl.rcParams['figure.dpi'] = 100

# (Optional) Choose a figure style
print(plt.style.available)
plt.style.use('bmh')
plt.scatter(random_line,random_line_2)
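
`plt.style.use()` changes the style for every plot that follows. If you only want to try a style on one figure, `plt.style.context()` applies it temporarily. The sketch below assumes the built-in `'ggplot'` style; the data and variable names are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x_values = rng.integers(1, 100, size=100)
y_values = x_values + 100

# Remember one setting so we can confirm it is unchanged afterwards
facecolor_before = plt.rcParams['axes.facecolor']

# The style applies only to figures created inside the "with" block
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.scatter(x_values, y_values)
    ax.set_title('ggplot style (temporary)')
    plt.show()

# Outside the block, the previous settings are restored
facecolor_after = plt.rcParams['axes.facecolor']
print(facecolor_before == facecolor_after)
```

This makes it easy to compare several styles side by side without changing the look of the rest of the notebook.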

## Step Two: Get metadata & electrophysiology data

Here, we'll condense the steps from the previous notebook into one cell.

In [None]:
try:
    import allensdk
    print('allensdk imported')
except ImportError as e:
    !pip install allensdk

from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.api.queries.cell_types_api import CellTypesApi

import pandas as pd

# We'll then initialize the cache as 'ctc' (cell types cache)
ctc = CellTypesCache(manifest_file='cell_types/manifest.json')

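# Get the metadata for human cells, indexed by cell id
human_df = pd.DataFrame(ctc.get_cells(species=[CellTypesApi.HUMAN])).set_index('id')

# Get the electrophysiology features, indexed by specimen id
ephys_features = pd.DataFrame(ctc.get_ephys_features()).set_index('specimen_id')

# Join the two dataframes on their indices, then preview the result
human_ephys_df = human_df.join(ephys_features)
human_ephys_df.head()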
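
To see what `join` is doing, here is a toy sketch with two tiny dataframes. The dataframes `toy_cells` and `toy_features` and all of their values are invented for illustration; they are not Allen data.

```python
import pandas as pd

# Toy metadata, indexed by 'id'
toy_cells = pd.DataFrame({
    'id': [1, 2, 3],
    'dendrite_type': ['spiny', 'aspiny', 'spiny'],
}).set_index('id')

# Toy features, indexed by 'specimen_id' (note: no entry for 2, extra entry for 4)
toy_features = pd.DataFrame({
    'specimen_id': [1, 3, 4],
    'tau': [20.5, 31.0, 12.2],
}).set_index('specimen_id')

# join matches rows by index and, by default, keeps every row of the left dataframe
joined = toy_cells.join(toy_features)
print(joined)
```

Rows in the left dataframe with no match get `NaN` in the joined columns, and rows that exist only in the right dataframe are dropped. This is worth keeping in mind if some cells in `human_ephys_df` turn out to have missing feature values.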

## Step Three: Plot our ephys metrics

Our plotting goal for today is to compare spiny and aspiny cells in humans. We have two options: we could split the dataframe into aspiny and spiny (as you did in our CellTypes notebook), or use our plotting tools to plot the data separately.

Usefully, Pandas has some [built-in plotting tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) to interact with Matplotlib, so we can actually just tell it to plot based on spiny and aspiny.

First, let's plot the **number of cells** that we have for each of the dendrite types. Recall that we can access a column using bracket notation, with the column name in quotes inside the brackets (for example, `human_ephys_df['dendrite_type']`).

`value_counts()` is a method that will count up the number of instances of each value.

`plot()` is a Pandas method that will plot, depending on the `kind` argument you give it.

In [None]:
# For the different values in dendrite_type column, get the value_counts, and plot as a bar plot.
human_ephys_df['dendrite_type'].value_counts().plot(kind='bar')

# Add y label
plt.ylabel('Number of cells')

# Show the plot!
plt.show()
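
If it helps to see `value_counts()` on its own, here is a toy sketch with a handful of made-up labels. The series `toy_types` is hypothetical, and `kind='barh'` is used just to show a different `kind` option.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A small, made-up series of dendrite type labels
toy_types = pd.Series(['spiny', 'aspiny', 'spiny', 'sparsely spiny', 'spiny', 'aspiny'])

# value_counts returns a series of counts, sorted from most to least common
type_counts = toy_types.value_counts()
print(type_counts)

# The pandas plot method returns a matplotlib axes that we can keep customizing
ax = type_counts.plot(kind='barh')
ax.set_xlabel('Number of cells')
plt.show()
```

Because the pandas `plot()` method returns a Matplotlib axes, everything from Step One (labels, titles, styles) still applies.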

><b>Task:</b> Try plotting different features of the dataset (hint: look at the column headers). Try plotting these features using different styles (hint: check out the Pandas plotting link in the *Markdown Cell* above to see the different <code>kind</code> options).

Our dataframe contains a *lot* of different metrics on these cells. Let's remind ourselves what we have available by accessing the `columns` attribute.

In [None]:
human_ephys_df.columns

Let's choose one of these columns and plot a boxplot. We'll do this with a call to pyplot ([examples here](https://matplotlib.org/gallery/pyplots/boxplot_demo_pyplot.html#sphx-glr-gallery-pyplots-boxplot-demo-pyplot-py)).

**Note**: This is actually *slightly* easier using the methods of our [dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.boxplot.html), but it's a little more difficult to work with the separate objects of the plot afterwards. Knowing how to create plots with a call to `plt.boxplot()` is a more universal way to work with various types of data structures, including dataframes, arrays, lists, etc.
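
Before working with the real data, here is a sketch of a boxplot built from three made-up groups of random numbers. The group names, means, and labels are all invented; the point is only to show the shape of the input (a list with one entry per box).

```python
import numpy as np
import matplotlib.pyplot as plt

# Three made-up groups of normally distributed values (seed chosen arbitrarily)
rng = np.random.default_rng(4)
group_a = rng.normal(-55, 5, size=50)
group_b = rng.normal(-60, 5, size=50)
group_c = rng.normal(-50, 5, size=50)

# boxplot takes a list of datasets and draws one box per entry
data = [group_a, group_b, group_c]

fig, ax = plt.subplots()
box = ax.boxplot(data)

# Boxes are drawn at x positions 1, 2, 3 by default, so we relabel those ticks
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(['group A', 'group B', 'group C'])
ax.set_ylabel('value (arbitrary units)')
plt.show()
```

The call returns a dictionary of the drawn objects (for example, `box['boxes']` and `box['medians']`), which is what makes it possible to restyle individual parts of the plot afterwards.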

><b>Task:</b> Create a boxplot that compares spiny, aspiny, and sparsely spiny with the following steps:
1. Save three different dataframes from your <code>human_ephys_df</code> by filtering for spiny, aspiny, and sparsely spiny. You can filter dataframes by using the syntax <code>dataframe[dataframe['column_name'] == 'variable_name']</code> (note the double equals sign, which tests for equality)
2. Assign the 'fast_trough_v_long_square' of each of your spiny, aspiny, and sparsely spiny dataframes to three different pandas series objects (like a dataframe, but only one dimension). For example, <code>spiny_ft = spiny_data['fast_trough_v_long_square']</code>
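3. Create a list containing your three pandas series, and assign it to <code>data</code>. You can convert each series to a plain Python list with the <code>series.tolist()</code> method. If a series contains missing values (NaN), remove them first with <code>series.dropna()</code>, because <code>plt.boxplot()</code> does not skip them.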
4. Create a boxplot by using <code>plt.boxplot()</code> and don't forget to show it!
5. Once you're sure the boxplot is working, add a few lines of code to change the xticks, as well as add x and y labels ([see documentation here](https://matplotlib.org/api/axes_api.html#axis-limits)).
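
If the filtering syntax in step 1 is unfamiliar, here is a toy sketch of how it works. The dataframe `toy_df`, its `measurement` column, and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A tiny made-up dataframe with one missing measurement
toy_df = pd.DataFrame({
    'dendrite_type': ['spiny', 'aspiny', 'spiny', 'aspiny'],
    'measurement': [-50.0, -60.0, np.nan, -58.0],
})

# Comparing a column with == gives a series of True/False values, one per row
is_spiny = toy_df['dendrite_type'] == 'spiny'

# Using that series inside brackets keeps only the rows where it is True
spiny_rows = toy_df[is_spiny]

# Select one column (a series), drop missing values, and convert to a list
spiny_values = spiny_rows['measurement'].dropna()
print(spiny_values.tolist())
```

The same pattern works for any column and any value, so you can reuse it for each of the three dendrite types.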

In [None]:
# Your code here!

### Plot this data as a scatterplot

The built-in scatterplot methods in Pandas are a bit clunky, so we'll use `plt.scatter()` instead ([documentation here](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html#matplotlib.pyplot.scatter)).

The syntax for a scatter plot is slightly longer if we want to label multiple groups with different colors. We'll actually loop through groups in order to create our plot:

In [None]:
# Get possible dendrite types
dendrite_types = human_ephys_df['dendrite_type'].unique()

fig = plt.figure()

for d_type in dendrite_types:

    df = human_ephys_df[human_ephys_df['dendrite_type'] == d_type]

    plt.scatter(df['fast_trough_v_long_square'],
                df['upstroke_downstroke_ratio_long_square'],
                label=d_type)

plt.ylabel("upstroke-downstroke ratio")
plt.xlabel("fast trough depth (mV)")
plt.legend(loc='best')

plt.show()
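
An alternative to filtering inside the loop is to let Pandas do the splitting with `groupby`. The sketch below uses a small made-up dataframe (`toy_df`, with hypothetical columns `x_metric` and `y_metric`) rather than the Allen data, but the same loop should work on `human_ephys_df` with the real column names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A small made-up dataframe standing in for the real data
toy_df = pd.DataFrame({
    'dendrite_type': ['spiny', 'aspiny', 'spiny', 'aspiny', 'spiny'],
    'x_metric': [-50.0, -60.0, -52.0, -58.0, -49.0],
    'y_metric': [3.1, 1.5, 3.4, 1.7, 2.9],
})

fig, ax = plt.subplots()

# groupby yields one (group name, sub-dataframe) pair per unique value
for d_type, group in toy_df.groupby('dendrite_type'):
    ax.scatter(group['x_metric'], group['y_metric'], label=d_type)

ax.set_xlabel('x metric')
ax.set_ylabel('y metric')
legend = ax.legend(loc='best')
plt.show()
```

Each call to `scatter` inside the loop gets the next color in the style's color cycle, which is why the groups end up with different colors without us choosing them.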