## Assembly of K-mers with Kassemble and Kmerkit

### Kmerkit

Kmerkit is a general toolkit for performing reference-free genome-wide association analyses using k-mers. The preferred way to run analyses in Kmerkit is to use its Application Programming Interface (API) interactively in a Jupyter notebook. This gives access to statistics and plotting summaries, and encourages users to create reproducible documentation of their workflow.

In [5]:
import kmerkit

The program currently contains several modules, including Kcount, Kfilter, Kextract, Kassemble, and Kgwas. Each module can be used independently or consecutively in a pipeline. For example, the first step of a GWAS pipeline using Kmerkit is to run the Kcount module, which uses the Python `subprocess` package to call the program KMC to count k-mers in FASTQ/FASTA files. Once k-mers are counted, the output files are passed to the following modules, Kfilter and Kextract, which filter and extract unique k-mers. These are then assembled into contigs and scaffolds using Kassemble.
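
The way Kcount hands work to KMC through `subprocess` can be sketched as follows. This is a minimal illustration and not Kmerkit's actual implementation: the function names `build_kmc_command` and `run_kmc`, the default values, and the output paths are assumptions made for this example. The flags (`-k`, `-ci`, `-t`, `-fq`) and the positional order (input, output prefix, working directory) follow KMC's command-line usage.

```python
import shutil
import subprocess
from pathlib import Path


def build_kmc_command(fastq, out_prefix, tmpdir, k=31, min_count=2, threads=4):
    """Return the argument list for one KMC counting run (hypothetical helper)."""
    return [
        "kmc",
        f"-k{k}",           # k-mer length
        f"-ci{min_count}",  # drop k-mers seen fewer than min_count times
        f"-t{threads}",     # number of threads
        "-fq",              # input is FASTQ (gzipped files are accepted)
        str(fastq),
        str(out_prefix),    # KMC writes <out_prefix>.kmc_pre and .kmc_suf
        str(tmpdir),        # scratch directory for KMC's temporary files
    ]


def run_kmc(fastq, out_prefix, tmpdir, **kwargs):
    """Run KMC and raise if it is missing or exits with an error."""
    if shutil.which("kmc") is None:
        raise FileNotFoundError("kmc executable not found on PATH")
    Path(tmpdir).mkdir(parents=True, exist_ok=True)
    cmd = build_kmc_command(fastq, out_prefix, tmpdir, **kwargs)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Build (but do not run) the command for one sample.
cmd = build_kmc_command("ecoli_1K_1.fq.gz", "/tmp/test_A", "/tmp/kmc_tmp", k=31)
print(" ".join(cmd))
```

Keeping command construction separate from execution makes the command easy to inspect or log before KMC is actually invoked.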

### Kassemble

Kassemble is an independent module that can be used with Kmerkit. It is a Python package that performs de novo assembly of contigs and scaffolds using a reference-free, k-mer-based approach. Kassemble wraps SPAdes and/or SOAPdenovo2 in Python to build contigs from unique k-mers extracted from reads in a FASTQ file. SPAdes (the St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines, and SOAPdenovo2 is a short-read de novo assembler. Kassemble builds on these programs by offering visualization of k-mer statistics and assemblies.

In [None]:
import kassemble

When used in the Kmerkit pipeline, first run Kcount to count k-mers, then Kfilter to filter unique k-mers, and then Kextract to extract them. The k-mers from Kextract are passed to Kassemble as a FASTQ file and assembled into contigs and scaffolds, after which De Bruijn assembly graphs are produced.
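
To make the idea of a De Bruijn assembly graph concrete, here is a toy sketch in plain Python. It is not how Kassemble, SPAdes, or SOAPdenovo2 work internally (those tools add error correction, coverage handling, and graph simplification). The function names and the example reads are made up for illustration. Nodes are (k-1)-mers, each k-mer contributes one edge, and a contig is read off by following the graph while each node has exactly one successor. The walk only checks out-degree, which is enough for this small example.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(seqs, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes following it."""
    graph = defaultdict(set)
    for seq in seqs:
        for kmer in kmers(seq, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_contig(graph, start):
    """Extend a contig from start while the next step is unambiguous."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads (hypothetical data).
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_contig(graph, "ATG")
print(contig)  # ATGGCGTGCAAT
```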

### Pipeline for contigs and scaffolds using E. coli test data

In [None]:
# import packages 

import kmerkit
import kassemble 

In [None]:
# Count k-mers using Kcount 

!kmerkit kcount --name test --workdir /tmp --sample A ecoli_1K_1.fq.gz --sample B ecoli_1K_2.fq.gz

In [None]:
# Filter k-mers using Kfilter

!kmerkit kfilter --name test --workdir /tmp --mincov A 0.0 B 1.0 --maxcov A 0.0 B 1.0

In [None]:
# Extract k-mers using Kextract

!kmerkit kextract --name test --workdir /tmp --samples A ecoli_1K_1.fq.gz

In [None]:
# Assemble k-mers using Kassemble

!kassemble --name test --workdir /tmp-assembled/ --sample ecoli_1K_1.fq.gz
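
The four steps above can also be driven from a single Python cell, stopping at the first step that fails. The sketch below is only an illustration: `pipeline_commands` and `run_pipeline` are hypothetical helpers, the command lines are taken from the cells above with the run name and working directory made into parameters, and it assumes the `kmerkit` and `kassemble` executables are on the PATH. By default it performs a dry run and only prints the commands.

```python
import shlex
import subprocess


def pipeline_commands(name="test", workdir="/tmp"):
    """Return the four pipeline steps as argument lists, in run order."""
    steps = [
        f"kmerkit kcount --name {name} --workdir {workdir} "
        "--sample A ecoli_1K_1.fq.gz --sample B ecoli_1K_2.fq.gz",
        f"kmerkit kfilter --name {name} --workdir {workdir} "
        "--mincov A 0.0 B 1.0 --maxcov A 0.0 B 1.0",
        f"kmerkit kextract --name {name} --workdir {workdir} "
        "--samples A ecoli_1K_1.fq.gz",
        f"kassemble --name {name} --workdir {workdir}-assembled/ "
        "--sample ecoli_1K_1.fq.gz",
    ]
    return [shlex.split(step) for step in steps]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it unless dry_run is set."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and halts the pipeline
            subprocess.run(cmd, check=True)


commands = pipeline_commands()
run_pipeline(commands, dry_run=True)
```

Passing `dry_run=False` executes each step in order. Because `check=True` is set, a non-zero exit code from any step raises an exception before the next step starts.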