Kevlar is a Python library that can be installed with pip from the command line. The kevlar package requires Python 3 and has several dependencies that are not part of the Python standard library.

# Installation:

Some of Kevlar's dependencies rely on commands that only run on Linux, so we started by installing WSL (Windows Subsystem for Linux), since our computer runs Windows. We then created a virtual environment, as recommended by the developers, so that the installation would not affect anything outside it. The environment can be activated with "source ~/kevlar-3.8/bin/activate". Everything was installed in this environment.

It is important to note that Kevlar was last updated in 2019, so it relies on old versions of its dependencies. To avoid version issues, we first installed Python 3.8. We then installed all package dependencies (pysam, networkx, pandas, scipy, intervaltree) manually. BWA was also needed, so we followed the instructions in its repository to install it. Once these were installed successfully, we installed biokevlar. The dependencies and Kevlar can be installed with these 2 commands on the command line:


In [None]:
% pip3 install pysam networkx pandas scipy intervaltree git+https://github.com/dib-lab/khmer.git
% pip3 install biokevlar
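
Before moving on, it can be useful to confirm that the packages are visible from the active virtual environment. The following is a minimal Python sketch of such a check. `find_missing` is a hypothetical helper (not part of Kevlar), and the import names, in particular `khmer` and `kevlar`, are assumed to match the packages installed above.

```python
from importlib.util import find_spec

# Import names assumed for the packages installed with pip above.
REQUIRED_MODULES = [
    "pysam", "networkx", "pandas", "scipy", "intervaltree", "khmer", "kevlar",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it does not detect version incompatibilities.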

# Running Kevlar 
After installing dependencies and Kevlar itself, we ran Kevlar on an example data set that is provided by the developers of Kevlar. It is mentioned that the output should include 5 variant calls: a 300 bp insertion and 4 single-nucleotide variants.

The sequencing data can be obtained through: 
Mother: https://osf.io/db82p
Father: https://osf.io/6vrnz
Proband: https://osf.io/wt5h8
Reference genome: https://osf.io/35wgn

It is recommended by the creators to run Kevlar using a Snakemake workflow, so we had to also install Snakemake.

Kevlar can be run from the command line interface with a few simple commands; we used the terminal in VS Code. We created a new folder called "testing_modified_config" and copied 2 files into it, "Snakefile" and "config.json" (which are used in the command that runs Kevlar through Snakemake), for easier access and retrieval of files.

1st step: Downloading the data. 
We created a subfolder called "data" in "testing_modified_config" and downloaded all the data (reads from mother, father, proband and reference genome) into the subfolder. 

2nd step: Downloading and formatting the configuration file.

3rd step: Invoking the Snakemake workflow.
The workflow runs Kevlar by invoking multiple subcommands, such as "partition" and "assemble", which correspond to the individual steps of Kevlar.



In [None]:
# Downloading the data:
% cd data/
% curl -L https://osf.io/db82p/download -o mother.fq.gz
% curl -L https://osf.io/6vrnz/download -o father.fq.gz
% curl -L https://osf.io/wt5h8/download -o proband.fq.gz
% curl -L https://osf.io/35wgn/download -o refr.fa.gz
% bwa index refr.fa.gz

# Downloading and formatting the configuration file 
% curl -L https://osf.io/86adm/download | sed "s:/home/user/Desktop:$(pwd):g" > helium-config.json

# Leaving the "data" folder
% cd ..

# Invoking the workflow
% snakemake --snakefile Snakefile --configfile data/helium-config.json --cores 4 --directory workdir -p calls
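
The `sed` call in the configuration step is a plain text substitution: every occurrence of the placeholder prefix `/home/user/Desktop` in the downloaded file is replaced with the current directory. A minimal Python sketch of the same idea is given here. `localize_config` is a hypothetical helper, and the sample JSON key is invented for illustration, since the layout of the real configuration file is not shown in this report.

```python
import os


def localize_config(config_text, placeholder="/home/user/Desktop", data_dir=None):
    """Replace every occurrence of the placeholder prefix with `data_dir`.

    When `data_dir` is omitted, the current working directory is used,
    mirroring `$(pwd)` in the shell command.
    """
    if data_dir is None:
        data_dir = os.getcwd()
    return config_text.replace(placeholder, data_dir)


if __name__ == "__main__":
    # Invented example content; the keys in the actual config may differ.
    sample = '{"reference": "/home/user/Desktop/refr.fa.gz"}'
    print(localize_config(sample, data_dir="/tmp/data"))
```

Because the substitution uses the directory the command is run from, it has to be executed inside the folder that holds the downloaded files.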








The first time we ran the workflow command (3rd step), we encountered the following error in the "partition" step:

```
AttributeError: 'ReadGraph' object has no attribute 'node'
Error in rule partition:
    jobid: 13
    input: NovelReads/filtered.augfastq.gz
    output: NovelReads/partitioned.augfastq.gz, Logs/partition.log
    shell:
        kevlar --tee --logfile Logs/partition.log partition --min-abund 5 --out NovelReads/partitioned.augfastq.gz NovelReads/filtered.augfastq.gz
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
```

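We fixed it by installing an older version of NetworkX (version 2.0). The graph attribute `node` was removed in NetworkX 2.4 in favor of `nodes`, which is consistent with this error.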




We then ran the workflow again and encountered another error:

```
AttributeError: module 'numpy' has no attribute 'int'.
Error in rule partition:
    jobid: 13
    input: NovelReads/filtered.augfastq.gz
    output: NovelReads/partitioned.augfastq.gz, Logs/partition.log
    shell:
        kevlar --tee --logfile Logs/partition.log partition --min-abund 5 --out NovelReads/partitioned.augfastq.gz NovelReads/filtered.augfastq.gz
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

```

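We fixed this error by installing an older version of NumPy (version 1.20.3). The `numpy.int` alias was deprecated in NumPy 1.20 and removed in 1.24, so releases before 1.24 still provide it.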
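
Both errors come from dependencies that are newer than the code expects, so a quick version check before launching the workflow can save a failed run. The sketch below is one way to do that, not part of Kevlar. The thresholds (NetworkX 2.4 and NumPy 1.24) are assumptions based on the releases that removed the attributes named in the two errors, and `release_tuple` and `is_compatible` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed thresholds: first releases that lack the attributes named in the errors.
FIRST_INCOMPATIBLE = {"networkx": (2, 4), "numpy": (1, 24)}


def release_tuple(version_string, width=2):
    """Turn '1.20.3' into (1, 20); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version_string.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_compatible(package, version_string):
    """True if the version is older than the assumed first incompatible release."""
    return release_tuple(version_string) < FIRST_INCOMPATIBLE[package]


if __name__ == "__main__":
    for package in FIRST_INCOMPATIBLE:
        try:
            installed = version(package)
        except PackageNotFoundError:
            print(f"{package}: not installed")
            continue
        status = "ok" if is_compatible(package, installed) else "newer than the assumed threshold"
        print(f"{package} {installed}: {status}")
```

The check covers only the two errors described above; other dependencies may impose their own version limits.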


This is the successful run of Kevlar:

```
Using shell: /usr/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Select jobs to execute...
kevlar --tee --logfile Logs/simlike.log simlike --mu 30.0 --sigma 10.0 --epsilon 0.001 --case-min 5 --refr Reference/refr-counts.smallcounttable --sample-labels Proband Mother Father --out calls.scored.sorted.vcf.gz --controls Sketches/ctrl0-counts.counttable Sketches/ctrl1-counts.counttable --case Sketches/case-counts.counttable calls.0.prelim.vcf.gz calls.1.prelim.vcf.gz
[kevlar] running version 0.7
[kevlar::simlike] Loading k-mer counts for each sample
[kevlar::simlike] Computing likelihood scores for preliminary variant calls
[Wed Apr 24 13:46:54 2024]
Finished job 1.
8 of 9 steps (89%) done
Select jobs to execute...

[Wed Apr 24 13:46:54 2024]
localrule calls:
    input: calls.scored.sorted.vcf.gz
    output: complete
    jobid: 0
    reason: Missing output files: complete; Input files updated by another job: calls.scored.sorted.vcf.gz
    resources: tmpdir=/tmp

Touching output file complete.
[Wed Apr 24 13:46:54 2024]
Finished job 0.
9 of 9 steps (100%) done

```

This is the output of the run: a VCF (Variant Call Format) file, a standardized text file format for representing SNP, indel, and structural variation calls. After unzipping it, we obtained the results in table form. The results are in line with the expected output (5 variant calls), so the run was successful.

A screenshot of a section of the table which shows the 5 variant calls is attached below for visualization purposes.



As can be seen, there are 5 variant calls: the first is an insertion and the other 4 are single-nucleotide variants. For an insertion, the ALT allele contains the inserted sequence, so it is longer than the REF allele.
(REF = reference allele, ALT = alternate allele)
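
The same reading of the REF and ALT columns can be automated. The following is a minimal Python sketch, assuming a tab-separated VCF with the standard column order (CHROM, POS, ID, REF, ALT, ...) and a single ALT allele per line. The function names are hypothetical and are not part of Kevlar.

```python
import gzip


def classify_variant(ref, alt):
    """Label a call by comparing the lengths of the REF and ALT alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"


def classify_vcf_lines(lines):
    """Yield (chrom, pos, label) for each data line of a VCF."""
    for line in lines:
        if not line.strip() or line.startswith("#"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        yield chrom, pos, classify_variant(ref, alt)


def classify_vcf_file(path):
    """Read a gzip-compressed VCF and return the classified calls as a list."""
    with gzip.open(path, "rt") as handle:
        return list(classify_vcf_lines(handle))
```

Calling `classify_vcf_file` on the compressed output of the run (assumed here to be `workdir/calls.scored.sorted.vcf.gz`, based on the workflow command and log) should give one labelled entry per variant call.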




![image](./output2.png)


