# Introduction to using stdpopsim

## Welcome to the first stdpopsim workshop!

**Instructor:** Ariella Gladstein, postdoc at University of North Carolina, Chapel Hill  
**Helpers:**   
Andrew Kern   
Peter Ralph  
Murillo Rodrigues   
**Participants:** If you want, introduce yourselves in the Slack chat. What's your name? Where are you located? What do you work on?

**Banter/Question**
Please use the Slack #intro-workshop1 channel for participant-to-participant chat and for asking for help. To ask a general question, write it in the #intro-workshop1 channel. For one-on-one help, DM @Peter Ralph, @Andy Kern, or @Murillo Fernando Rodrigues.

## Workshop Outline
1. Basics of using Jupyter Notebooks and Binder
2. Overview of stdpopsim
3. How to Navigate the stdpopsim library catalog
4. How to use the command line interface
5. How to use the Python API
6. Example analysis
7. How to ask for help
8. Some examples of what stdpopsim cannot currently do
9. Teaser of how to contribute
10. Using stdpopsim after the workshop

----------
## 1. Basics of using Jupyter Notebooks and Binder

### How to use Jupyter Notebook
Jupyter Notebooks have cells where you can write in Markdown and run code.  
To execute a cell, click the play button or press Shift+Enter.

In [None]:
print('Try writing some Python here')

In [None]:
%%bash
echo 'We can also use Bash magic. Try writing some Bash here'

### MyBinder Specifics
- In the cloud
- If you are not active, it can disconnect
- If it seems slow, try to restart the kernel
- To save your work you must "save" and "Download" the notebook

---------
## 2. Overview of stdpopsim

### What is stdpopsim?
- Library of previously published population genetic models that can be used to simulate data
- Includes simple & complex demographic models
- Models have undergone rigorous quality control to ensure what we implement matches original publication

### Why is stdpopsim useful?
- Increase reproducibility in population genetics modeling
- Less work for simulating data to test new inference methods
- Facilitate comparisons among inference methods

### Phase 1
Adrion et al. (2020). _A community-maintained standard library of population genetic models_. eLife. https://doi.org/10.7554/eLife.54967

- Focused just on demographic modeling
- Uses msprime as simulation engine
- Realistic genetic maps for each species

---------
## 3. How to Navigate the [stdpopsim library catalog](https://stdpopsim.readthedocs.io/en/latest/catalog.html)

### The Catalog is organized first by species.

_How many species are there?_

![](images/catalog.png)


### Each species has a set of defining attributes.

_What are the attributes?_

![](images/species_attributes.png)

### Each species has defined genome parameters.

_What are the genome parameters?_

![](images/genome_params.png)

### Some species have a genetic map.

Genetic maps are stored on AWS and downloaded to cache when used.

![](images/genetic_maps.png)

### Some species have demographic models.

All models are taken from published studies.

_What models are available? Are there any you recognize from the literature?_

![](images/models.png)

### Each model has a description and set of defining attributes. 

_What are the attributes?_

![](images/model_attr.png)

### Each model has a table of defined model parameters from the publication. 

_Can you find where in the original publication the model parameters are given?_

![](images/model_params.png)

### _Pick a species and demographic model you want to simulate_

---------------
## 4. How to use the command line interface
Resources:
- [Documentation](https://stdpopsim.readthedocs.io/en/latest/cli_arguments.html)
- [Tutorials](https://stdpopsim.readthedocs.io/en/latest/tutorial.html#running-stdpopsim-with-the-command-line-interface-cli)

### Run stdpopsim with the help option

In [None]:
%%bash
stdpopsim --help

`stdpopsim` uses a combination of [_positional arguments_](https://stdpopsim.readthedocs.io/en/stable/cli_arguments.html#Positional%20Arguments), which are required, and [_named arguments_](https://stdpopsim.readthedocs.io/en/stable/cli_arguments.html#Named%20Arguments), which are optional.

### Find your species and run stdpopsim with that species with the help option

In [None]:
%%bash
stdpopsim HomSap --help

### Find your model and run stdpopsim with that model with the help option

In [None]:
%%bash
stdpopsim HomSap --help-models OutOfAfrica_3G09

### Run a simulation
- pick the number of samples
- pick a chromosome
- decide on an output file
- do you want a recombination map?
- do you want a fraction of the chromosome?

In [None]:
%%bash
stdpopsim HomSap 10 10 10 -c chr22 -l 0.1 -d OutOfAfrica_3G09 -o OutOfAfrica_3G09.ts

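The choices in the checklist above map directly onto the flags used in the command above. As a rough sketch, the same command can be assembled in Python as a list of arguments. The helper name `build_stdpopsim_command` is hypothetical (it is not part of stdpopsim), and the flags are simply the ones used in this notebook, so check `stdpopsim --help` for your installed version. The sketch only builds and prints the command; passing the list to `subprocess.run(command, check=True)` would execute it if stdpopsim is installed.

```python
import shlex


def build_stdpopsim_command(species, sample_counts, chromosome, model,
                            output, length_multiplier=None, dry_run=False):
    """Assemble a stdpopsim command line as a list of arguments.

    Hypothetical helper for illustration only. The flags (-c, -l, -d, -o, -D)
    are the ones used in this notebook and may differ between versions.
    """
    command = ["stdpopsim", species]
    # Number of samples to draw from each population, in order
    command += [str(n) for n in sample_counts]
    # Chromosome to simulate
    command += ["-c", chromosome]
    # Optional: simulate only a fraction of the chromosome
    if length_multiplier is not None:
        command += ["-l", str(length_multiplier)]
    # Demographic model and output file
    command += ["-d", model, "-o", output]
    # Optional: dry run, as used later in this notebook
    if dry_run:
        command.append("-D")
    return command


command = build_stdpopsim_command(
    species="HomSap",
    sample_counts=[10, 10, 10],
    chromosome="chr22",
    model="OutOfAfrica_3G09",
    output="OutOfAfrica_3G09.ts",
    length_multiplier=0.1,
)
# Print the command as a single shell-quoted string
print(shlex.join(command))
```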
---------------
## 5. How to use the Python API
Resources:
- [Documentation](https://stdpopsim.readthedocs.io/en/stable/api.html)
- [Tutorials](https://stdpopsim.readthedocs.io/en/latest/tutorial.html#running-stdpopsim-with-the-python-interface-api)

### Import stdpopsim

In [None]:
import stdpopsim

### Set your species
https://stdpopsim.readthedocs.io/en/stable/api.html#stdpopsim.get_species

In [None]:
species = stdpopsim.get_species("HomSap")

### Find your demographic model
https://stdpopsim.readthedocs.io/en/stable/api.html#stdpopsim.DemographicModel

In [None]:
for model in species.demographic_models:
    print(model.id)

### Set your demographic model
https://stdpopsim.readthedocs.io/en/stable/api.html#stdpopsim.Species.get_demographic_model

In [None]:
model = species.get_demographic_model('OutOfAfrica_3G09')

### Verify the simulated populations

In [None]:
print([pop.id for pop in model.populations])

### Set the number of samples

In [None]:
samples = model.get_samples(10, 10, 10)

### Specify the simulator
https://stdpopsim.readthedocs.io/en/stable/api.html#simulation-engines

In [None]:
engine = stdpopsim.get_engine('msprime')

### Set your chromosome
https://stdpopsim.readthedocs.io/en/stable/api.html#stdpopsim.Species.get_contig

In [None]:
contig = species.get_contig("chr22", length_multiplier=0.1)

### Ready to simulate!
https://stdpopsim.readthedocs.io/en/latest/api.html#stdpopsim.Engine.simulate

In [None]:
ts = engine.simulate(model, contig, samples)

### All together now!

In [None]:
import stdpopsim
species = stdpopsim.get_species("HomSap")
model = species.get_demographic_model('OutOfAfrica_3G09')
samples = model.get_samples(10, 10, 10)
engine = stdpopsim.get_engine('msprime')
contig = species.get_contig("chr22", length_multiplier=0.1)
ts = engine.simulate(model, contig, samples)

-------------
## 6. Example analysis
Let's suppose we wanted to check if a published model is a good approximation of our real data. To do this, we'll calculate a few population genetics statistics and see if the real data overlap our simulated data. (For this exercise, we will simulate some data and pretend it is real data.)


### Simulate our "real" data using the CLI
We will simulate our "real" data using the human Out-of-Africa model with archaic admixture into Papuans.  
_Find this model in the catalog._  

Let's run the CLI help command for this model

In [None]:
%%bash
stdpopsim HomSap --help-models PapuansOutOfAfrica_10J19

_How many populations are there?_  
_How many samples do we want to simulate?_  
_What chromosome do we want to simulate?_  
_Do we want to reduce the chromosome size?_

Let's do a dry run first.

In [None]:
%%bash
stdpopsim HomSap 10 10 10 -c chr22 -l 0.1 -d PapuansOutOfAfrica_10J19 -o PapuansOutOfAfrica_10J19.ts -D

_Does that look right?_

Let's simulate for real!

In [None]:
%%bash
stdpopsim HomSap 10 10 10 -c chr22 -l 0.1 -d PapuansOutOfAfrica_10J19 -o PapuansOutOfAfrica_10J19.ts

### Convert simulated "real" data to vcf format

Since we are pretending this is real data, let's convert the tree sequence output to VCF format. We can do this using [tskit](https://tskit.readthedocs.io/).  
https://tskit.readthedocs.io/en/latest/cli.html?highlight=vcf#vcf

In [None]:
%%bash
tskit vcf --ploidy 2 PapuansOutOfAfrica_10J19.ts > PapuansOutOfAfrica_10J19.vcf

Let's take a look at our vcf

In [None]:
%%bash
ls

In [None]:
%%bash
head PapuansOutOfAfrica_10J19.vcf

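`head` only shows the first few lines of the file. To get a quick summary instead, here is a minimal sketch that counts the sample columns and variant records in a VCF using only the Python standard library. The function name `summarize_vcf` and the tiny inline VCF (including its sample names) are made up for illustration; to inspect the real file, you would pass `open("PapuansOutOfAfrica_10J19.vcf")` instead of the `StringIO` object.

```python
import io


def summarize_vcf(handle):
    """Return (sample_names, number_of_variant_records) for a VCF file handle."""
    sample_names = []
    num_records = 0
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines carry no genotype data
            continue
        if line.startswith("#CHROM"):
            # The first nine columns are fixed; the rest are sample names
            sample_names = line.split("\t")[9:]
        elif line:
            # Every remaining non-empty line is one variant record
            num_records += 1
    return sample_names, num_records


# A tiny made-up VCF, used here only so that the example is self-contained
toy_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind_0\tind_1",
    "1\t100\t.\tA\tT\t.\tPASS\t.\tGT\t0|1\t0|0",
    "1\t250\t.\tA\tT\t.\tPASS\t.\tGT\t1|1\t0|1",
]) + "\n"

names, n_records = summarize_vcf(io.StringIO(toy_vcf))
print(f"{len(names)} samples, {n_records} variant records")
```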
### Simulate a model to compare to "real" data with Python API and calculate stats

We'll simulate a simpler model with the same samples, the human three-population Out-of-Africa model, _n_ times.  
_Find this model in the catalog._  

_How many populations are there?_  
_How many samples do we want to simulate?_  
_What chromosome do we want to simulate?_  
_Do we want to reduce the chromosome size?_

#### Set up the simulation

In [None]:
import stdpopsim
species = stdpopsim.get_species("HomSap")
model = species.get_demographic_model('OutOfAfrica_3G09')
samples = model.get_samples(10, 10, 10)
engine = stdpopsim.get_engine('msprime')
contig = species.get_contig("chr22", length_multiplier=0.1)

The summary statistics we'll calculate are:
- mean genetic diversity (Tajima's pi) for each population

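To make the statistic concrete, here is a toy calculation of genetic diversity done by hand: the average number of differences between all pairs of sampled sequences, optionally divided by the sequence length. The haplotypes below are made up for illustration, and the function name is hypothetical. As far as we understand the tskit defaults, `ts.diversity` reports the per-base version of this quantity, but check the tskit documentation to be sure.

```python
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = 0
    for hap_a, hap_b in pairs:
        # Count the sites at which this pair carries different alleles
        total += sum(a != b for a, b in zip(hap_a, hap_b))
    return total / len(pairs)


# Three made-up haplotypes over four sites (0 = ancestral, 1 = derived)
toy_haplotypes = ["0010", "0110", "0011"]

pi_total = mean_pairwise_differences(toy_haplotypes)
# Dividing by the number of sites gives a per-site value
pi_per_site = pi_total / len(toy_haplotypes[0])
print(pi_total, pi_per_site)
```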
#### We will run the simulation and calculate the stats once to make sure everything looks correct

Make a list of sample chromosomes (nodes) from each population

In [None]:
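# Simulate once so we have a tree sequence to inspect
ts = engine.simulate(model, contig, samples)

# Collect the sample node IDs belonging to each population
sample_list = []
for pop in range(ts.num_populations):
    sample_list.append(ts.samples(pop).tolist())

print(sample_list)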

Run the simulations and calculate the summary statistics

In [None]:
n = 10
pi_list = []
for i in range(n):
    ts = engine.simulate(model, contig, samples)
    pi_list.append(ts.diversity(sample_sets=sample_list))

### Plot stats from simulated and "real" data

Convert simulated stats to a dataframe (only because I like dataframes)

In [None]:
import pandas as pd
pi_df = pd.DataFrame(data=pi_list, columns=[pop.id for pop in model.populations])
pi_df

In [None]:
pi_df_melted = pd.melt(pi_df, var_name='Pops', value_name='Pi')

Plot data

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
ax = sns.histplot(data=pi_df_melted, x="Pi", hue="Pops", bins=30)

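The histogram shows only the simulated values. To ask whether the "real" data overlap them, one simple option is to check where an observed value falls within the simulated distribution. The sketch below does this for a single population. The function name is hypothetical and the numbers are placeholders rather than results; in practice the simulated values would come from one column of `pi_df`, and the observed value from the same statistic computed on the "real" data.

```python
def summarize_overlap(observed, simulated):
    """Describe where an observed statistic falls among simulated values."""
    lowest, highest = min(simulated), max(simulated)
    # Fraction of simulated replicates at or below the observed value
    fraction = sum(value <= observed for value in simulated) / len(simulated)
    return {
        "within_range": lowest <= observed <= highest,
        "fraction_at_or_below": fraction,
    }


# Placeholder values for illustration only (not real results)
simulated_pi = [0.0008, 0.0009, 0.0010, 0.0011, 0.0012]
observed_pi = 0.00105

result = summarize_overlap(observed_pi, simulated_pi)
print(result)
```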
---------
## 7. How to ask for help
- Have you read the documentation?
- Search open and closed [GitHub issues](https://github.com/popsim-consortium/stdpopsim/issues?q=is%3Aissue)
- Write a new GitHub issue 
- Join the PopSim Slack workspace and post in the #newbie-help channel (an invitation has been sent to all participants; if you have not received one, email Ariella, Andy, or Peter)

---------
## 8. Some examples of what stdpopsim cannot currently do
- simulate species or demographic models that are not in the catalog 
    - if the model is published, submit it to stdpopsim; if it is not, use a simulator directly (e.g. msprime, SLiM)
- simulate parameter values not from the published model (including priors)
- simulate selection (in the works!)
- simulate missing data and errors (on the horizon!)


----------
## 9. Teaser of how to contribute

-----------
## 10. Using stdpopsim on your own after the workshop
- Can still play in a Jupyter Notebook Binder
  - Using this one
  - In the Binder associated with the [stdpopsim GitHub repository](https://github.com/popsim-consortium/stdpopsim) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/popsim-consortium/stdpopsim/master?filepath=stdpopsim_example.ipynb)
- Install stdpopsim locally following the [instructions in the documentation](https://stdpopsim.readthedocs.io/en/latest/installation.html)
- Consult the [stdpopsim documentation](https://stdpopsim.readthedocs.io/en/latest/)