## Environment Setup

We use the standalone version of the IEDB population coverage tool. Install it first by running the configure script below:

In [1]:
# install the IEDB population coverage tool
! python ./env/iedb_PopTool/configure.py

* You must have 'numpy' and 'matplotlib' packages installed.
Run this command to install them:
$ pip install numpy matplotlib==2.0.0

That's it. You're all set!


In [2]:
# display population tool usage
! python ./env/iedb_PopTool/calculate_population_coverage.py --help

usage: python calculate_population_coverage.py [-h] -p [POPULATION] -c [MHC_CLASS] -f [FILE]

Created on: 03/06/2017 @author: Dorjee Gyaltsen @brief: calculates population
coverage - standalone version

optional arguments:
  -h, --help            show this help message and exit
  --list                list all population and ethnicity
  --plot PATH           generate a plot.
  --version             show program's version number and exit

required arguments:
  -p POPULATION [POPULATION ...], --population POPULATION [POPULATION ...]
                        select comma-separated area(s) or population(s)
  -c MHC_CLASS [MHC_CLASS ...], --mhc_class MHC_CLASS [MHC_CLASS ...]
                        select one or more comma-separated mhc class option -
                        I, II, combined
  -f FILE, --file FILE  a file containing a list of epitopes and associated
                        alleles (comma-separated)


In [3]:
# import dependencies
import pandas as pd

## MHC Class-I

In [4]:
df_1 = pd.read_csv('./data/Binding_Prediction/mhc1_test3/mhc1_test3_api_result.csv', sep=',')
df_1.to_csv('./data/PopulationCoverage/mhc_1/input.txt', index=False, header=False, sep='\t', columns=['peptide', 'allele'])
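The cell above writes one tab-separated line per prediction, with the peptide first and the allele second. A minimal sketch of that layout follows, using a hypothetical toy table (the peptides, alleles, and the `score` column are illustrative and not taken from the real prediction results) and an in-memory buffer instead of a file:

```python
import io

import pandas as pd

# Hypothetical toy predictions; values are illustrative only.
toy = pd.DataFrame({
    "peptide": ["SIINFEKL", "GILGFVFTL"],
    "allele": ["HLA-A*02:01", "HLA-A*02:01"],
    "score": [0.9, 0.8],
})

# Same to_csv options as the notebook cell: no index, no header,
# tab separator, and only the peptide and allele columns.
buf = io.StringIO()
toy.to_csv(buf, index=False, header=False, sep="\t", columns=["peptide", "allele"])
input_text = buf.getvalue()
print(input_text)
```

Any extra columns in the prediction table (here `score`) are dropped by the `columns` argument, so only the two fields the tool reads end up in the file.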

In [5]:
! python ./env/iedb_PopTool/calculate_population_coverage.py -p World -c I -f ./data/PopulationCoverage/mhc_1/input.txt > ./data/PopulationCoverage/mhc_1/output.txt

## MHC Class-II

In [6]:
df_2 = pd.read_csv('./data/Binding_Prediction/mhc2_test3/mhc2_test3_api_result.csv', sep=',')
# split paired class II alleles (written with '/') into comma-separated alleles
df_2['allele'] = df_2['allele'].str.replace('/', ',', regex=False)
df_2.to_csv('./data/PopulationCoverage/mhc_2/input.txt', index=False, header=False, sep='\t', columns=['peptide', 'allele'])
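For class II, the cell above replaces `/` with `,` in the allele column before writing the input file, which matches the comma-separated allele list described in the tool's help text. A small sketch of that conversion on a hypothetical series (the allele names are illustrative, not taken from the real results):

```python
import pandas as pd

# Hypothetical allele strings; one paired entry and one single allele.
alleles = pd.Series(["HLA-DPA1*01:03/DPB1*02:01", "HLA-DRB1*01:01"])

# regex=False makes the slash a literal character on every pandas version.
converted = alleles.str.replace("/", ",", regex=False)
print(converted.tolist())
```

Entries without a slash pass through unchanged, so the same line is safe to apply to the whole column.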

In [7]:
! python ./env/iedb_PopTool/calculate_population_coverage.py -p World -c II -f ./data/PopulationCoverage/mhc_2/input.txt > ./data/PopulationCoverage/mhc_2/output.txt
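The two shell invocations differ only in the MHC class and the file paths. As an alternative sketch, the same call could be wrapped in Python with `subprocess`. The helper names `build_coverage_command` and `run_coverage` are hypothetical and not part of the IEDB tool; only the `-p`, `-c`, and `-f` flags come from the help output above.

```python
import subprocess
import sys

TOOL = "./env/iedb_PopTool/calculate_population_coverage.py"


def build_coverage_command(population, mhc_class, input_path):
    """Assemble the argument list for one run (hypothetical helper)."""
    return [sys.executable, TOOL, "-p", population, "-c", mhc_class, "-f", input_path]


def run_coverage(population, mhc_class, input_path, output_path):
    """Run the tool and write its stdout to output_path (hypothetical helper)."""
    cmd = build_coverage_command(population, mhc_class, input_path)
    with open(output_path, "w") as out:
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(cmd, stdout=out, check=True)


cmd = build_coverage_command("World", "I", "./data/PopulationCoverage/mhc_1/input.txt")
print(" ".join(cmd[1:]))
```

Using `sys.executable` runs the tool with the same interpreter as the notebook kernel, which may differ from the `python` found on the shell `PATH`.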

## Result Processing

In [8]:
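def GetSeq(re_path, df):
    # read the coverage tool output, skipping its first 6 lines
    re_df = pd.read_csv(re_path, skiprows=6, sep='\t')
    # keep the rows of df at the positions where the reported coverage is non-zero
    cov_seqs = df.iloc[re_df.index[re_df['percent_individuals'] != 0].tolist(), :]
    return cov_seqs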
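The selection inside `GetSeq` can be illustrated on toy data. The sketch below assumes, as the function itself does, that row i of the parsed tool output corresponds to row i of the prediction table; the values are hypothetical.

```python
import pandas as pd

# Hypothetical parsed tool output: one coverage value per input row.
re_df = pd.DataFrame({"percent_individuals": [0.0, 12.5, 0.0, 3.1]})

# Hypothetical prediction table in the same row order.
df = pd.DataFrame({"peptide": ["AAA", "BBB", "CCC", "DDD"]})

# Positions of rows with non-zero coverage, then positional lookup in df.
rows = re_df.index[re_df["percent_individuals"] != 0].tolist()
covered = df.iloc[rows, :]
print(covered)
```

If the tool output is ordered differently from the input, or contains extra summary rows, the positional lookup would select the wrong peptides, so the row alignment is worth checking on real output.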

In [9]:
GetSeq('./data/PopulationCoverage/mhc_1/output.txt', df_1).to_csv('./data/PopulationCoverage/mhc_1/output_seqs.txt', sep='\t')
GetSeq('./data/PopulationCoverage/mhc_2/output.txt', df_2).to_csv('./data/PopulationCoverage/mhc_2/output_seqs.txt', sep='\t')