<img src="../images/MalariaGEN.png" alt="MalariaGEN logo" width="375px" align="left">

**We would like to thank all MalariaGEN Plasmodium falciparum Community Project partners for their contribution. If you use this resource, please remember to also cite the following studies:**
[Pf6 partner studies](http://ngs.sanger.ac.uk/production/malaria/pfcommunityproject/Pf6/Pf_6_partner_studies.pdf) and [GenRe partner studies](http://ngs.sanger.ac.uk/production/malaria/Resource/29/20200705-GenRe-07-PartnerStudyInformation-0.39.pdf).

# Drug-resistant variants visualisation

This notebook allows you to use the Phenotyper tool to infer phenotypes from your own data. Once you have done that, you can use our simple visualisation tools to view your results and compare them with our Pf6+ dataset, which holds over 13,500 samples with inferred phenotypes, collected across the world.

## Setup

### Running on Colab

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!cp -r drive/MyDrive/data_analysis .

### Running Locally

There are some steps you need to follow before opening the notebooks to run them locally. If you haven't already, please follow these [instructions](https://gitlab.com/malariagen/gsp/pf6plus/-/tree/add_jupyter_notebooks/notebooks#running-locally).

### Import the Python scripts

In [None]:
import pandas as pd
from data_analysis.plot_dr_prevalence import *
from data_analysis.plot_haplotype_frequency import *
from data_analysis.tabulate_drug_resistance import *
from data_analysis.interactive_widgets import *

In [None]:
%matplotlib inline 

In [None]:
from ipywidgets import interact, fixed
import bokeh.io

In [None]:
bokeh.io.output_notebook()

### Import data

In [None]:
# Load the Pf6+ metadata (tab-separated, one row per sample)
pf6plus_metadata = 'https://pf6plus.cog.sanger.ac.uk/pf6plus_metadata.tsv'
pf6plus = pd.read_csv(pf6plus_metadata, sep='\t', index_col=0, low_memory=False)

Here we keep only the samples that have IncludeInAnalysis set to True and discard the rest, so that only high-quality samples remain. (These are the samples that passed QC for WGS, together with the samples “included” in the GenRe analysis for AmpSeq.)

In [None]:
pf6plus = pf6plus.loc[pf6plus.IncludeInAnalysis == True]
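
The filter above is ordinary pandas boolean indexing. As a minimal sketch, here is the same idea on a toy table; the sample IDs and values are invented for illustration, and only the `IncludeInAnalysis` column name comes from this notebook.

```python
import pandas as pd

# Toy stand-in for the Pf6+ metadata; sample IDs and flags are made up.
toy = pd.DataFrame(
    {"IncludeInAnalysis": [True, False, True]},
    index=["sample_A", "sample_B", "sample_C"],
)

# Build a boolean mask and keep only the rows flagged for analysis.
mask = toy["IncludeInAnalysis"].eq(True)
kept = toy.loc[mask]

print(f"kept {len(kept)} of {len(toy)} samples")
```

Comparing the resulting row count with the original is a quick way to check how many samples the filter removed.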

## Prevalence of Resistant Variants

Note: If your GRC doesn't include drug resistance information, you can use the Phenotyper tool (`3_phenotyper.ipynb`) to infer it.



### Tabulate drug resistant variants


In [None]:
# use help(name_of_function) to access the documentation notes
help(tabulate_drug_resistant)

In [None]:
tabulate_drug_resistant(pf6plus, 'S-P')

In [None]:
tabulate_drug_resistant(pf6plus, 'S-P', population='WAF')

In [None]:
tabulate_drug_resistant(pf6plus,'S-P', year = [2007, 2010], bin=False)

In [None]:
tabulate_drug_resistant(pf6plus,'S-P', country='Mali', year = [2007,2010], bin=True)

In [None]:
tabulate_drug_resistant(pf6plus,'S-P', country='Gambia', year = [2007,2010], bin=True)

**The number of samples collected in different locations and years varies widely in the Pf6+ data resource. To increase confidence in the plots shown below, a threshold is applied so that only country/year or population/year combinations with n_samples>25 are included. You can change this default value with the `threshold` flag, but please be cautious: estimates based on few samples are less reliable.**
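
To illustrate what such a sample-size threshold does, here is a minimal sketch on toy data. It is not the implementation used by `plot_dr_prevalence`; the column names `Country` and `Year` and all counts are assumptions made for the example.

```python
import pandas as pd

# Toy data: column names and sample counts are assumed for illustration only.
toy = pd.DataFrame({
    "Country": ["Mali"] * 30 + ["Gambia"] * 10,
    "Year": [2007] * 40,
})
threshold = 25

# Count the samples in each country/year combination.
counts = toy.groupby(["Country", "Year"]).size()

# Keep only the combinations with more than `threshold` samples.
well_sampled = counts[counts > threshold]
print(well_sampled)
```

In this toy example only the Mali/2007 combination (30 samples) passes the filter; Gambia/2007 (10 samples) is dropped.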

### Plot Drug Resistant Prevalence

In [None]:
help(plot_dr_prevalence)

In [None]:
plot_dr_prevalence(pf6plus, drugs=['S-P','Sulfadoxine','Chloroquine','Artemisinin','DHA-PPQ','Piperaquine'], country = 'Gambia', population = 'WAF')

In [None]:
plot_dr_prevalence(pf6plus, drugs=['S-P', 'Sulfadoxine', 'Chloroquine'], country = 'Gambia', population = 'WAF')

In [None]:
plot_dr_prevalence(pf6plus, drugs=['S-P'], country = 'Gambia')

In [None]:
plot_dr_prevalence(pf6plus, drugs=['S-P'], country = 'Mali', population = 'WAF')

### Plot most common haplotypes per population/country

In [None]:
help(plot_haplotype_frequency)

In [None]:
plot_haplotype_frequency(pf6plus, 'PfDHFR', num_top_haplotypes=5, populations = ['CAF'], years = None, bin=False)

In [None]:
plot_haplotype_frequency(pf6plus, 'PfDHFR', num_top_haplotypes=5, populations = ['CAF', 'EAF', 'ESEA', 'OCE', 'SAS', 'WAF', 'WSEA'], years = None, bin=False)

In [None]:
plot_haplotype_frequency(pf6plus, 'PfDHFR', num_top_haplotypes=5, countries = ['Mali'], bin=False)
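
For intuition about what "most common haplotypes" means, the sketch below ranks haplotypes by relative frequency on toy data. It is an assumption-laden illustration, not the code behind `plot_haplotype_frequency`; the haplotype strings are invented and do not come from Pf6+.

```python
import pandas as pd

# Toy haplotype calls, one per sample; the strings are made up.
haplotypes = pd.Series(["AAA", "AAA", "AAB", "AAA", "ABB", "AAB"])
num_top_haplotypes = 2

# Relative frequency of each haplotype, sorted from most to least common.
freq = haplotypes.value_counts(normalize=True)

# Keep the most common ones.
top = freq.head(num_top_haplotypes)
print(top)
```

Here `AAA` accounts for half of the toy samples and `AAB` for a third, so those two are retained when `num_top_haplotypes` is 2.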

## What else can I do? 

We are interested in the evolution of Kelch haplotypes in ESEA. We would like to know how much the countries within this population differ from each other, and whether we can detect country-specific mutations.


In [None]:
plot_haplotype_frequency(pf6plus, 'Kelch', populations =  ['ESEA'])

In [None]:
plot_haplotype_frequency(pf6plus, 'Kelch', countries = ['Cambodia'])

In [None]:
plot_haplotype_frequency(pf6plus, 'Kelch', countries = ['Vietnam','Cambodia'])

In [None]:
plot_haplotype_frequency(pf6plus, 'Kelch', countries = ['Laos'])

### An extra use case
Explore how PfDHFR haplotypes differ between ESEA and WSEA, with WAF and EAF plotted alongside for comparison.


In [None]:
plot_haplotype_frequency(pf6plus, 'PfDHFR', populations = ['WSEA','ESEA','WAF','EAF'])