##### **Workflow example**

polyloxpgen contains two main methods:
- polylox_merge: merge samples from multiple raw barcode files ([RPBPBR](https://github.com/hoefer-lab/RPBPBR) output)
- polylox_pgen: purge barcodes and compute pgen for single or multiple samples


##### **Installation**

to use polyloxpgen and the above methods, install it via

```
pip install polyloxpgen
```
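In a Jupyter notebook, `import polyloxpgen` can still raise `ModuleNotFoundError` if `pip` installed the package into a different Python environment than the one running the kernel. IPython's `%pip install polyloxpgen` magic installs into the kernel's own environment. The minimal sketch below uses only the standard library (it is not part of polyloxpgen) to show which interpreter is running and whether the package is importable from it.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# interpreter behind this kernel; running `python -m pip install polyloxpgen`
# with exactly this executable installs into the same environment
print("Python executable:", sys.executable)
print("polyloxpgen importable:", is_importable("polyloxpgen"))
```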

In [4]:
# after installation, import the methods via
import polyloxpgen

##### **Notes on file format**

- all input and output files here are tab-separated values (TSV) files (saved as .txt)
- the required file formats (headers, columns, rows) can be seen in the examples folder
- the TSV files *can* (but maybe should not :) ) be opened with Excel
    - Excel sometimes wrongly converts barcodes to dates or to numbers in scientific notation
    - to use Excel in a safe manner, see "Additional options for polylox_pgen" below
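Reading the TSV files programmatically avoids these conversions, as long as every field is kept as text. A minimal sketch using only Python's `csv` module; the column names and barcode strings below are hypothetical placeholders (see the examples folder for the actual headers), chosen because Excel may reinterpret values like `1E3` as a number in scientific notation and `3-4` as a date.

```python
import csv
import io
from typing import Dict, List


def read_tsv(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text into row dicts; no type conversion is applied."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# hypothetical file content: placeholder header and barcodes
example = "Barcode\tSample1\n1E3\t5\n3-4\t2\n"

for row in read_tsv(example):
    # barcodes come back exactly as written; counts are converted explicitly
    print(repr(row["Barcode"]), int(row["Sample1"]))
```

For a file on disk, pass `open(path, newline="")` to `csv.DictReader` instead of the `io.StringIO` wrapper.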

##### **polylox_merge**

below we merge two data sets (from the examples folder) into a combined dataframe


In [5]:
### USER INPUT: the user has to specify the following information
# the (absolute or relative) paths of the input data sets
location_files_in = ['./sample1.barcode.count.txt', './sample2.barcode.count.txt']

# how the individual data sets are referred to ("samples")
sample_names =  ['Sample1', 'Sample2']

# folder/directory for the merged output dataframe
merge_location_dir_out = './'

# name of the output dataframe
merge_file_name_out = 'sample1_sample2_merged'
###

In [7]:
# run polylox_merge via
df_merged = polyloxpgen.polyloxmerge.polylox_merge(location_files_in, sample_names, merge_location_dir_out, merge_file_name_out)

In [4]:
# the merged dataframe is also saved to merge_location_dir_out (here the examples folder)
# and can be displayed here (remove the semicolon to show the output)
df_merged;
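Conceptually, merging collects the barcode counts of each sample into one table with one row per barcode and one column per sample. The sketch below illustrates that idea with plain dictionaries; it is not polyloxpgen's implementation, the barcode labels are hypothetical, and it assumes that a barcode not observed in a sample gets a count of 0.

```python
from typing import Dict


def merge_counts(samples: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Outer-join per-sample barcode counts into one table.

    Returns {barcode: {sample_name: count}}; a barcode missing from a
    sample is filled with 0 (an assumption of this sketch).
    """
    sample_names = list(samples)
    barcodes = sorted({bc for counts in samples.values() for bc in counts})
    return {
        bc: {name: samples[name].get(bc, 0) for name in sample_names}
        for bc in barcodes
    }


# hypothetical barcode labels and counts
merged = merge_counts({
    "Sample1": {"bc_A": 10, "bc_B": 3},
    "Sample2": {"bc_B": 7, "bc_C": 1},
})
for bc, row in merged.items():
    print(bc, row)
# bc_A {'Sample1': 10, 'Sample2': 0}
# bc_B {'Sample1': 3, 'Sample2': 7}
# bc_C {'Sample1': 0, 'Sample2': 1}
```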

##### **polylox_pgen**

the merged dataframe from before (or any dataframe in this format) can then be used
- to purge barcodes and reads (eliminate impossible/false barcodes and reads)
- to compute the generation probability (pgen) for the purged barcodes

In [5]:
### USER INPUT
location_file_in = './sample1_sample2_merged.txt'

pgen_location_dir_out = './'
pgen_file_name_out = 'sample1_sample2_pgen'
###

In [6]:
# import polylox_pgen script
# NOTE: has to be on the same level as this notebook
import polylox_pgen

In [7]:
# run polylox_pgen via
df_pgen = polylox_pgen.main(location_file_in, pgen_location_dir_out, pgen_file_name_out)

Loading input data ... Done
Loading Polylox libraries ... Done
Purging barcodes and reads ... Done
Finding minimal recombinations ... Done
Computing generation probabilities ... Done
Creating output ... Done


In [8]:
# remove semicolon to display output dataframe
df_pgen;

***Additional options for polylox_pgen***

1) **quote marks**: use the options below to surround barcodes and/or sample names with "protecting" quote marks

this is recommended when opening files with Excel afterwards; it protects barcodes and/or sample names from being altered

```python
df_pgen = polylox_pgen.main(location_file_in, pgen_location_dir_out, pgen_file_name_out, quote_marks_barcodes=True, quote_marks_samples=True)
```
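To illustrate what surrounding a field with quote marks means for a single output line, here is a minimal sketch; the helper `format_tsv_row` is hypothetical and only mimics the described effect of `quote_marks_barcodes=True`, not polyloxpgen's actual code.

```python
from typing import List, Union


def format_tsv_row(barcode: str, values: List[Union[int, float]],
                   quote_barcode: bool = False) -> str:
    """Join a barcode and its values into one tab-separated line.

    With quote_barcode=True the barcode is wrapped in double quotes,
    as the quote-marks option is described to do.
    """
    first = f'"{barcode}"' if quote_barcode else barcode
    return "\t".join([first] + [str(v) for v in values])


print(format_tsv_row("3-4", [5, 2]))
print(format_tsv_row("3-4", [5, 2], quote_barcode=True))
```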
2) **float decimal delimiter**: to open files with a European/German Excel locale, you may want to change the decimal separator for floats from '.' (default) to ','

use the option below to set any desired decimal separator (here ',')

```python
df_pgen = polylox_pgen.main(location_file_in, pgen_location_dir_out, pgen_file_name_out, decimal_float=',')
```
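The decimal option only changes how floats are written as text. A minimal sketch of that formatting step (the helper `format_float` is hypothetical, not part of polyloxpgen); if you write dataframes yourself, pandas' `DataFrame.to_csv` offers the same through its `sep` and `decimal` arguments, e.g. `df.to_csv(path, sep='\t', decimal=',')`.

```python
def format_float(value: float, decimal: str = ".") -> str:
    """Write a float using Python's shortest repr, with a custom decimal separator."""
    return repr(float(value)).replace(".", decimal)


print(format_float(0.0123))               # 0.0123
print(format_float(0.0123, decimal=","))  # 0,0123
```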

3) **transition matrix**: the current default pgen computation is based on a uniform transition matrix

to reproduce pgens as in the original publication ([Pei et al., Nature, 2017](https://www.nature.com/articles/nature23653)), a length-dependent transition matrix can be selected via

```python
df_pgen = polylox_pgen.main(location_file_in, pgen_location_dir_out, pgen_file_name_out, path_matrix_type='ld_2017')
```

(this is more a note for reproducibility; the option only marginally affects the computed pgens)

##### **References / Final notes**

- these scripts are based on the original [polylox (MATLAB)](https://github.com/hoefer-lab/polylox) implementation; see there also for more information
- original publication: [Pei et al., Nature, 2017](https://www.nature.com/articles/nature23653)