# **The genetic basis of parental care evolution in monogamous mice.** 
### Andres Bendesky, Young-Mi Kwon,  Jean-Marc Lassance,  Caitlin L. Lewarch, Shenqin Yao, Brant K. Peterson, Meng Xiao He,  Catherine Dulac,  & Hopi E. Hoekstra
> *Nature* **2017**

> **ABSTRACT:** Parental care is essential for the survival of mammals, yet the mechanisms underlying its evolution remain largely unknown. Here we show that two sister species of mice, Peromyscus polionotus and Peromyscus maniculatus, have large and heritable differences in parental behaviour. Using quantitative genetics, we identify 12 genomic regions that affect parental care, 8 of which have sex-specific effects, suggesting that parental care can evolve independently in males and females. Furthermore, some regions affect parental care broadly, whereas others affect specific behaviours, such as nest building. Of the genes linked to differences in nest-building behaviour, vasopressin is differentially expressed in the hypothalamus of the two species, with increased levels associated with less nest building. Using pharmacology in Peromyscus and chemogenetics in Mus, we show that vasopressin inhibits nest building but not other parental behaviours. Together, our results indicate that variation in an ancient neuropeptide contributes to interspecific differences in parental care.

In [None]:
# @title Monogamous and Promiscuous voles
%%html
<div align='center'>
  <img src='https://drive.google.com/uc?export=view&id=15MZd0zThC4KbFyPUau8CoHZNtb6XHPTS' style="width: 85vw"></div>



### In preparation for this classroom session you read the paper "*The genetic basis of parental care evolution in monogamous mice*" as well as the "News and Views" commentary "*How to build a better dad*" written by Steven Phelps.<br>
---
# **Today** you will work directly with data from this paper.


In [15]:
#@title __execute__ this cell to prepare your notebook environment with the necessary python packages 

# and mount your google drive

import numpy as np
import urllib.request
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import scipy
from IPython.display import display

plt.rcParams.update({'font.size': 18})

# from google.colab import drive
# drive.mount('/content/gdrive')

def create_figure():
  hfig = plt.figure()
  ax = hfig.add_axes([0.1,0.1,0.8,0.8])
  return ax

---
# Before working with the data, let's reflect on some of the important aspects of the experimental design.

### Diversity of parental care behavior is widespread in the animal kingdom. What was special about these two mouse species in particular that enabled the researchers to interrogate the genetic basis of parental care behaviors?

__Type your answer__ in this markdown cell

### Below are 4 videos that demonstrate the parental behavior of mothers and fathers of each mouse species. Each video is long (~20 min), so you won't have time to watch them now, but you can quickly skip through a few different points in each video to get a general sense of the differences. You can revisit these videos later on your own if you like. 
> ***note*** Do not spend more than 5 minutes looking at these videos here during class.


In [None]:
#@title Parental Behavior of a Peromyscus polionotus father:
%%html
<div align="center">
<video width=600 controls>
      <source src="https://static-content.springer.com/esm/art%3A10.1038%2Fnature22074/MediaObjects/41586_2017_BFnature22074_MOESM129_ESM.mp4" type="video/mp4">
</video>
</div>

In [None]:
#@title Parental Behavior of a Peromyscus maniculatus father:
%%html
<div align="center">
<video width=600 controls>
      <source src="https://static-content.springer.com/esm/art%3A10.1038%2Fnature22074/MediaObjects/41586_2017_BFnature22074_MOESM130_ESM.mp4" type="video/mp4">
</video>
</div>

In [None]:
#@title Parental Behavior of a Peromyscus polionotus mother:
%%html
<div align="center">
<video width=600 controls>
      <source src="https://static-content.springer.com/esm/art%3A10.1038%2Fnature22074/MediaObjects/41586_2017_BFnature22074_MOESM131_ESM.mp4" type="video/mp4">
</video>
</div>

In [None]:
#@title Parental Behavior of a Peromyscus maniculatus mother:
%%html
<div align="center">
<video width=600 controls>
      <source src="https://static-content.springer.com/esm/art%3A10.1038%2Fnature22074/MediaObjects/41586_2017_BFnature22074_MOESM132_ESM.mp4" type="video/mp4">
</video>
</div>

-----
# Let's work with the data!
* For each Figure, code has been written for you that imports the raw data that the authors provided with the online version of [the research article](https://www.nature.com/articles/nature22074#Sec28).
* The data will be loaded into a "*pandas*" [dataframe](https://realpython.com/pandas-dataframe/) format.
 *  Essentially, dataframes are like 'Microsoft Excel' sheets that we can access with Python code and that have specific functions associated with them that make data analysis easier. They have column headers (labels) and row indices to organise the data.
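
As a minimal sketch of what a dataframe looks like, the toy example below builds one by hand. The column names and values are made up for illustration and do not come from the paper.

```python
import pandas as pd

# Toy data (made up for illustration, not from the paper)
toy = pd.DataFrame({
    'group_A': [1.0, 2.0, 3.0, 4.0],
    'group_B': [2.5, 3.5, 4.5, 5.5],
})

print(toy.columns.tolist())  # column headers (labels)
print(toy.index.tolist())    # row indices
print(toy.head(2))           # first two rows only
```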
----




# <font color=green> Figure 1</font> :


In [None]:
#@title **execute** this cell to IMPORT & FORMAT the RAW DATA <br> (if you are interested in seeing the code that makes that happen, you can double click this cell)

# URL of data to download
data_url = 'https://static-content.springer.com/esm/art%3A10.1038%2Fnature22074/MediaObjects/41586_2017_BFnature22074_MOESM110_ESM.xlsx'

# Get the data and save it locally as "data.xls"
data, headers = urllib.request.urlretrieve(data_url, './data.xls')

xls = pd.ExcelFile(data)
df_NestQuality = pd.read_excel(xls, sheet_name = 'panel C', skiprows = 1)
df_Licking = pd.read_excel(xls, sheet_name = 'panel D', skiprows = 1)
df_Huddling = pd.read_excel(xls, sheet_name = 'panel E', skiprows = 1)
df_Retrieving = pd.read_excel(xls, sheet_name = 'panel F', skiprows = 1)

### You now have a dataframe for the data underlying each of Panels C-F of Figure 1, with the following names: 
- df_Licking
- df_NestQuality
- df_Huddling
- df_Retrieving

### Like each Figure panel, the data in the dataframe is grouped into 4 categories corresponding to the mothers and fathers of each species... <br>
### Pick a behavior and take a look!


In [28]:
# Type the name of one of the four dataframes here and execute the cell to view
# (how is the appearance of the result different when you use the print() function to view?)


#### You can also just display the first few lines of a dataframe by using the head( ) method : 
`dataframe_name.head()` <br>
Try this in the cell below

In [None]:
# display only the first few lines of the dataframe you viewed above



### Did you notice 'NaN' at the end of some of the columns? <br>
### What does this mean and why are there NaN in the dataframe? <br>
 
> __hint__ look at the 'n' noted at the bottom of Figure 1. 

__Type your answer__ in this markdown cell

### Let's plot the data to visualize it.
* [Seaborn](https://seaborn.pydata.org/) is a statistical data visualization package that has a lot of great plotting functions for dataframes.
* run the cell below to import the seaborn package as "`sns`" 

In [None]:
import seaborn as sns

* Since the authors used boxplots to display the data, let's do the same for now. <br> Seaborn has a [boxplot( )](https://seaborn.pydata.org/generated/seaborn.boxplot.html) function. <br> To use it, the most basic format would be `sns.boxplot(data = dataframe_name)`

 * **hint** : put a '`;`' at the end of the line of code to suppress text output associated with the plot

 * This function automatically finds the names of the columns in the dataframe to use as axis labels.

In [None]:
# Make a boxplot of the dataframe for the behavior you chose to work with



### Great! Now that you understand better how your dataframe is organized and what data it contains,  let's visualize a comparison that was not explicitly made in the paper : 
### **The difference in parental care behavior between sexes (ignoring species identity).**

### First we need to collect the data in a new way: we need a list of the data for all females (i.e., "mothers" of both species combined) and a list of the data for all males ("fathers").

* To get the data from a whole column of a dataframe, "*index*" it using the column name (which is a string).
* Remember that indices are placed inside brackets ( `[ ]` ).
* The contents of a column can be assigned to a new variable name.

> ***note*** for the following sections, decide on one behavior to look at (licking, nest quality, etc) and apply the instructions to the corresponding dataframe of your choice. (in other words, be consistent about which dataframe you are indexing) 
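
The indexing pattern described above can be sketched on a toy dataframe. The column names here are hypothetical; use the names you saw when you displayed your own dataframe.

```python
import pandas as pd

# Toy dataframe; the column names are hypothetical
toy = pd.DataFrame({
    'group_A': [1.0, 2.0, 3.0],
    'group_B': [4.0, 5.0, 6.0],
})

# Index with the column name (a string) inside square brackets
values_a = toy['group_A']

print(type(values_a).__name__)  # a pandas Series
print(list(values_a))           # [1.0, 2.0, 3.0]
```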


In [None]:
# Assign the data for P.maniculatus fathers from your dataframe of choice to the variable 'fathers_man'
fathers_man = 


In [None]:
# And now assign the data for P. polionotus fathers to the variable 'fathers_pol'
fathers_pol = 


In [None]:
# Assign the data for P. maniculatus mothers to the variable 'mothers_man'
mothers_man = 


In [None]:
# Assign the data for P. polionotus mothers to the variable 'mothers_pol'
mothers_pol = 


### **Execute** the following code cell to make a new dataframe from the variables you just created.

In [None]:
df_by_sex = pd.DataFrame({
    'female' : np.asarray(list(mothers_man) + list(mothers_pol)),
    'male' : np.asarray(list(fathers_man) + list(fathers_pol))
  })

In [None]:
# use the seaborn boxplot( ) function to plot the data in this new dataframe 'df_by_sex' 



### The authors reported a significant effect of sex for all of the behaviors (mothers and fathers differed significantly in their performance of each behavior). If this was such a significant effect, why do you think the authors did not put the plot you just made in the figure? 

__Type your answer__ in this markdown cell


-----
<br>

# <font color=green> Figure 2 </font>

In [None]:
#@title __execute__ this cell to import and format data for Figure 2

# URL of data to download
data_url = 'https://static-content.springer.com/esm/art%3A10.1038%2Fnature22074/MediaObjects/41586_2017_BFnature22074_MOESM112_ESM.xlsx'

# Get the data and save it locally as "data.xls"
data, headers = urllib.request.urlretrieve(data_url, './data.xls')

xls = pd.ExcelFile(data)

df = pd.read_excel(xls, sheet_name = 'panel C', header = None, skiprows = 3,names=['', '', '','Not_Cross_Fostered','Cross_Fostered','',''])
df_maniculatus_father_licking = df.reset_index()[['Not_Cross_Fostered','Cross_Fostered']]

### The authors said they found that there was a significant difference in the licking of *P. maniculatus* fathers that were cross-fostered versus those that were not. ***But*** they did not report the mean licking value in each condition. Let's calculate that now.

### By executing the last cell, you imported a dataframe called `df_maniculatus_father_licking` that contains data for the licking behavior of *P. maniculatus* fathers. There are two columns: "*Not_Cross_Fostered*" and "*Cross_Fostered*"

> You *could* get the mean of each column using the following method: <br>
`np.mean(dataframe_name['column_name'])`. <br> *However*, dataframes have a bunch of nifty methods associated with them that make this kind of calculation really easy. The [`mean( )`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html) method is one of them. Note that you don't even have to give this method any arguments in order to use it. Apply it to the dataframe `df_maniculatus_father_licking` below: 
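
As a small illustration on made-up numbers (not the paper's data), the sketch below shows that a single `mean()` call returns one value per column and that missing values (`NaN`) are skipped by default. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data with a missing value (made up for illustration)
toy = pd.DataFrame({
    'condition_1': [1.0, 2.0, 3.0],
    'condition_2': [4.0, 6.0, np.nan],
})

# One call returns the mean of every column; NaN entries are skipped by default
col_means = toy.mean()
print(col_means)
```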

In [None]:
# apply the mean() method to the df_maniculatus_father_licking dataframe



### Also, the data for this comparison is very visually 'smushed' in the Figure. 

### Use seaborn's boxplot() function that you just learned to plot the data in df_maniculatus_father_licking.





In [None]:
# use seaborn to plot the data from the df_maniculatus_father_licking dataframe here



### To test whether there is a significant difference between the behavior of two groups, one method researchers often use is the statistical '[t-test](https://en.wikipedia.org/wiki/Student%27s_t-test)'.
* python has many packages for statistical testing. [Scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html) is one of them.
* [ttest_ind](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html#scipy.stats.ttest_ind) performs Student's t-test for two independent samples
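
To see how `ttest_ind` behaves before applying it to the real data, here is a minimal sketch on two made-up samples. The values are invented for illustration; `nan_policy='omit'` is the same option used in the cell below.

```python
import numpy as np
from scipy import stats

# Toy samples (made up); NaN marks a missing observation
group_1 = np.array([5.1, 4.8, 6.0, 5.5, np.nan])
group_2 = np.array([4.0, 4.2, 3.9, 4.5, 4.1])

# nan_policy='omit' drops NaN values before the test is computed
result = stats.ttest_ind(group_1, group_2, nan_policy='omit')
print(result.statistic, result.pvalue)
```

By default `ttest_ind` assumes the two groups have equal variances; passing `equal_var=False` would run Welch's t-test instead.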

In [None]:
# Run this cell to test the null hypothesis 
# that the results under the two test conditions are the same.
scipy.stats.ttest_ind(df_maniculatus_father_licking['Not_Cross_Fostered'],
                      df_maniculatus_father_licking['Cross_Fostered'],
                      nan_policy = 'omit')

Ttest_indResult(statistic=2.3038228486155976, pvalue=0.027669007620192)

### How would you state the results of this t-test in terms of the p-value result? <br>(Don't forget to note the 'significance level') 

***type*** your answer in this markdown

---

<br>

# <font color=green> Figure 3 : Section I</font>

In [16]:
#@title __execute__ this cell to import and format data for Figure 3

# URL of data to download
data_url = 'https://static-content.springer.com/esm/art%3A10.1038%2Fnature22074/MediaObjects/41586_2017_BFnature22074_MOESM114_ESM.xlsx'

# Get the data and save it locally as "data.xls"
data, headers = urllib.request.urlretrieve(data_url, './data.xls')

xls = pd.ExcelFile(data)

df = pd.read_excel(xls, sheet_name = 'panels B-E', header = None, skiprows = 1,names=['ID', 'Parent', 'Species','Retrieval','Huddling','Licking','NestQuality'])

### Take a moment to look at the dataframe for Figure 3. 
### Notice that it is organized a bit differently than the dataframes from the other figures. 

In [None]:
# display the contents of the dataframe for Figure 3 'df'
df

### Each row corresponds to a different individual. Each column contains information about each individual. 

### We can get a list of criteria for each column by getting the "*unique*" values in that column. 
  * The [`unique( )`](https://numpy.org/doc/stable/reference/generated/numpy.unique.html) function is part of the numpy package (which was imported as "`np`")
  * Remember that dataframe columns are indexed by their column name 
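
A minimal sketch of `np.unique()` applied to a dataframe column, using a toy column named `group` (hypothetical, not one of the Figure 3 columns):

```python
import numpy as np
import pandas as pd

# Toy column with repeated labels (made up for illustration)
toy = pd.DataFrame({'group': ['A', 'B', 'A', 'C', 'B']})

# np.unique returns each distinct value once, in sorted order
labels = np.unique(toy['group'])
print(labels)
```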

In [None]:
# use the np.unique() function to print the list of criteria in the 'Species' column



### We can find individuals fulfilling a certain criterion by asking which rows are equal to (```==```) that criterion in a given column. For example, to find which individuals are mothers we would type ```df['Parent']=='mother'```. In other words, we are stating "**the parent is a mother**." In return, we get an assessment of whether this statement was true or false for each row. 
### Try this in the cell below (note capitalization in column headers):

### For each dataframe row in which the statement "the parent is a mother" is true, there is a ' ```True``` ', and for each row in which the statement is false, there is a ' ```False``` '. The result is called a *boolean* array and is very useful for sorting through the dataframe to isolate information we are interested in.  <br>

### In the following cell, 
* 1) assign the expression above to a variable called '`bool_mother`' <br>
* 2) use that variable (a boolean array) to index the original dataframe 
> this isolates the data for just mothers from the original dataframe
* 3) assign the indexed original dataframe to a new one called **df_mothers**.
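
The three steps above can be sketched on a toy dataframe. The column names `group` and `score` and all values are hypothetical; adapt the pattern to `df` and its 'Parent' column.

```python
import pandas as pd

# Toy dataframe; column names and values are hypothetical
toy = pd.DataFrame({
    'group': ['A', 'B', 'A', 'B', 'A'],
    'score': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# The comparison is evaluated row by row, giving one True/False per row
is_a = toy['group'] == 'A'
print(is_a.tolist())         # [True, False, True, False, True]

# Indexing with the boolean array keeps only the rows marked True
toy_a = toy[is_a]
print(len(toy), len(toy_a))  # 5 3
```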

In [6]:
# Place your code here



#### Great! If you compared the size (```len()```) of *df_mothers* to *df*, you would see that *df_mothers* contains data from a smaller number of animals (because it leaves out all data for fathers).<br>
> ##### Use the code cell below to check this for yourself by calling `len()` on each dataframe. <br>

In [None]:
# compare the length of df_mothers to the length of df



# The authors use Figure 3 to make the following statement: <br>
> "*The distribution of each component of parental care among the F2 mice encompassed the distributions of both species (Fig. 3b–e). On the basis of the largely unimodal distributions of parental behaviours among the F2 hybrids, which resembled P. maniculatus more closely than P. polionotus, the more extensive parental care of P. polionotus probably involves more than one genetic locus.*" <br>
### We can plot the data from Figure 3 in a different way that may help us visualize this better. We will visually compare the behavior data for all mothers in the F2 generation to the behavior data for mothers of each of the two species in the Parental generation individually. To do so, we will make a smoothed histogram for each group and overlay them on the same plot.

* Create a boolean array for each Parent species and assign these booleans to the variables: <br>
 ' ```bool_polionotus``` ' and ' ```bool_maniculatus``` '
 > Take care to use the *df_mothers* dataframe you had created rather than the original *df* dataframe <br>
* Create a boolean array for the 'F2' species and assign this boolean to the variable: <br>
 ' ```bool_F2``` '
 
* Pick a behavior to plot (Retrieval, Huddling, Licking, or NestQuality).
* Index the df_mothers dataframe you just created to extract the data column corresponding to the behavior you chose. 
* Seaborn has a [kdeplot( )](https://seaborn.pydata.org/generated/seaborn.kdeplot.html#seaborn.kdeplot) function that you will use to plot the *distribution* of this data. This function creates a ***smoothed histogram plot***. 
> *KDE* stands for 'kernel density estimate'

In [18]:
# Create the boolean array for P. polionotus
bool_polionotus = 

# Create the boolean array for P. maniculatus
bool_maniculatus = 

# Create the boolean array for F2
bool_F2 = 



### Use the seaborn kdeplot( ) function to plot the behavior data for :
1. Individuals for which bool_polionotus is True
2. Individuals for which bool_maniculatus is True
3. Individuals for which bool_F2 is True
> ***hint*** index your dataframe using the booleans you created <br>
* Make sure to include the code for all three plots in the ***same code cell*** so that the distributions will be overlaid in the same plot.
* Include a 'color' argument in the function and specify a different color for each group. This way you can keep track of which data belongs to which group.

### **NOTE** To include the data for only a single behavior, the dataframe needs to also be indexed by the behavior name (column name)
> `dataframe[boolean index][behavior_name]`



In [None]:
# Create a kdeplot( ) for each of the three groups you are comparing
# for each group, plot data from only one behavior of interest.
sns.kdeplot()
sns.kdeplot()
sns.kdeplot()


### Compare your plot to the plots in Figure 3 of the paper. Where is the data for each of the lines you just plotted located in that figure? In the plot you just made, you can more explicitly examine overlap between the distributions of the different groups.

---

<br>

# <font color=green> Figure 3 : Section II</font>

# The authors noted that some of the parental behaviors were correlated with each other.
### What does that mean in other words? Describe using an example from the data in this paper. 

**Type** your answer in this markdown 

### The authors tested for correlations among behaviours in the F2 progeny. Let's compare that to correlations among behaviors in the Parental animals using the data from Figure 3.
* Look at either mothers or fathers by indexing the Figure 3 dataframe (`df`) with a boolean array such as `df['Parent']=='mother'` (or the equivalent for fathers).
* Assign the output to a new dataframe variable called 'df_mothers' or 'df_fathers'
* The [`corr( )`](https://www.geeksforgeeks.org/python-pandas-dataframe-corr/) method of a dataframe calculates the correlations among all of its columns containing numerical values. In pandas 2.0 and later, pass `numeric_only=True` so that text columns such as 'Parent' and 'Species' are skipped rather than raising an error. <br> Use the `corr( )` method to calculate behavior correlations for:
 * The parental generation of one species
 * The F2 generation 
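
A minimal sketch of `corr()` on a toy dataframe (all names and values are made up). It assumes pandas 1.5 or later, where the `numeric_only` argument is available for `corr()`.

```python
import pandas as pd

# Toy dataframe: one text column and three numeric columns (made up)
toy = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B'],
    'x': [1.0, 2.0, 3.0, 4.0, 5.0],
    'y': [2.0, 4.0, 6.0, 8.0, 10.0],   # y = 2x, rises with x
    'z': [5.0, 4.0, 3.0, 2.0, 1.0],    # z falls as x rises
})

# numeric_only=True skips the text column 'group'
corr_table = toy.corr(numeric_only=True)
print(corr_table)
```

Each cell of the resulting table is the Pearson correlation between the row variable and the column variable, so the diagonal is always 1.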

In [27]:
# create a dataframe containing only mothers or only fathers


# create a boolean to index all individuals in the F2 generation


# create a boolean to index all individuals 
# in the Parent generation of one species (P. polionotus or P. maniculatus)



In [None]:
# calculate the correlation among behaviors in the F2 generation



In [None]:
# calculate the correlation among behaviors in the Parent generation



### How do the correlations compare between these two groups?

### We can visualize correlations using a scatter plot with one variable on the x-axis and the other variable on the y-axis. You have already learned how to use the [`plt.scatter( )`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) function.
### Which behaviors are the most and least correlated in the F2 generation? 
* Make a scatter plot of the two most correlated behaviors against each other 
* Make a scatter plot of the two least correlated behaviors against each other.

In [None]:
# make a scatter plot of the two most correlated behaviors against each other



### Does it look like you expected? Why or why not?

In [None]:
# make a scatter plot of the two least correlated behaviors against each other



### From this analysis the authors concluded that: <br>
> " *Some genetic loci affect multiple parental  behaviours, whereas nest building is more genetically independent from the other behaviours measured.* "
### __Discuss__ this conclusion and how the correlations you just calculated and visualized support their statement. What would be different about the correlations you just calculated if the alternative (that all behaviors are genetically independent from each other) were true?



***your answer here***

---

<br>

# <font color=green> Figure 4 </font>

### Figure 4 provides a visualization of the correlation coefficient table for fathers as well as mothers. The authors note that there is a stronger effect in the fathers. 
* Does your result match the one reported in the paper? We will have to load their source data for the figure to compare since they color-coded the results rather than providing the explicit quantifications.
 > (Note that in Figure 4, they also analyzed two other behaviors (Handling and Approach) that we do not have data for in Figure 3.)

In [None]:
#@title **execute** to load and display the raw correlation coefficient values plotted in Figure 4

# URL of data to download
data_url = 'https://static-content.springer.com/esm/art%3A10.1038%2Fnature22074/MediaObjects/41586_2017_BFnature22074_MOESM116_ESM.xlsx'

# Get the data and save it locally as "data.xls"
data, headers = urllib.request.urlretrieve(data_url, './data.xls')

xls = pd.ExcelFile(data)

df_Fig4_fathers = pd.read_excel(xls,nrows=6,skiprows=3,header=None,names=['Nest quality', 'Time licking pup', 'Time huddling pup','Fraction of pups retrieved','Promptness to handle pup','Promptness to approach pup'])
df_Fig4_mothers = pd.read_excel(xls,nrows=6,skiprows=12,header=None,names=['Nest quality', 'Time licking pup', 'Time huddling pup','Fraction of pups retrieved','Promptness to handle pup','Promptness to approach pup'])
print('')
print("\033[31m \033[1m" + 'Figure 4 -- F2 Mothers:' + "\033[0m")
display(df_Fig4_mothers)
print('')
print('')
print('')
print("\033[31m \033[1m" + 'Figure 4 -- F2 Fathers:' + "\033[0m")
display(df_Fig4_fathers)

---


## Resources
For additional Jupyter Notebook information and practice, see [this tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/) from DataQuest. 

## About this Notebook
This notebook was created by [Krista Perks](https://github.com/neurologic) for the Animal Behavior class at Wesleyan University.