New simulated dataset, new "kevlar localize" command #83

standage · 2017-06-12T22:02:49Z

This PR addresses two issues that have captured my focus recently.

First, we're in dire need of a data set for testing where 1) we know the "correct" answers (variant calls) and 2) we can run and re-run the kevlar workflow quickly. I've simulated various data sets for testing and continuous integration purposes in the past, but these are on the side of "too trivial". I've also simulated some larger data sets recently, but these are on the side of "take too long to run".
Second, up until recently I've been doing all variant calling by examining alignments manually. It's past time to automate this! But this underscored the need for the first point, so there's been a bit of yak shaving going on.

I'm happy to present notebook/human-sim-pico and kevlar localize.

The directory notebook/human-sim-pico has the complete record for how I produced a 2.5 Mb random human-like genome from scratch. This includes a Jupyter notebook, some data files, several commands, and a bit of commentary.
The command kevlar localize takes a Fasta file generated by kevlar assemble, invokes BWA to localize the k-mers in the reference genome, and (assuming they map to a single region) extracts the genomic interval associated with the assembled contig(s) plus a bit of flanking sequence. The assembled contig(s) will then be aligned to this genomic region with a dynamic programming solution to be implemented soon by Fereydoun.
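
To make the interval extraction step concrete, here is a minimal sketch of how mapped k-mer positions could be turned into a padded genomic interval. This is not the code in kevlar/localize.py: the function name `localize_interval`, the `delta` padding parameter, and the `(seqid, pos)` input format are all assumptions made for illustration, and "a single region" is approximated here as "all hits on one sequence".

```python
def localize_interval(hits, ksize, delta=25, seqlen=None):
    """Compute a padded interval spanning all k-mer hits.

    hits   -- iterable of (seqid, pos) pairs; pos is the 0-based start
              of a k-mer match in the reference (hypothetical format)
    ksize  -- k-mer length
    delta  -- number of flanking bases to add on each side (assumed)
    seqlen -- length of the reference sequence, if known, used to clamp
    Returns (seqid, start, end) as a 0-based half-open interval.
    """
    hits = list(hits)
    if not hits:
        raise ValueError("no k-mer hits to localize")

    # Simplifying assumption: one sequence ID means one region.
    seqids = {seqid for seqid, _ in hits}
    if len(seqids) != 1:
        raise ValueError("k-mers map to multiple sequences: "
                         + ", ".join(sorted(seqids)))
    seqid = seqids.pop()

    positions = [pos for _, pos in hits]
    start = max(min(positions) - delta, 0)      # do not run off the left end
    end = max(positions) + ksize + delta        # end of last k-mer plus padding
    if seqlen is not None:
        end = min(end, seqlen)                  # do not run off the right end
    return seqid, start, end


def extract_interval(sequence, start, end):
    """Slice the reference sequence for a 0-based half-open interval."""
    return sequence[start:end]
```

For example, `localize_interval([("chr1", 100), ("chr1", 120)], ksize=25, delta=10)` spans from 10 bp upstream of the first k-mer to 10 bp downstream of the end of the last one. A real implementation would also need to decide how far apart hits on the same sequence can be before they count as separate regions.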

This PR still needs a bit of cleanup (mostly documentation and tests) before it is merged.

codecov-io · 2017-06-12T22:10:20Z

Codecov Report

Merging #83 into master will increase coverage by 0.33%.
The diff coverage is 87.27%.

@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
+ Coverage   82.72%   83.05%   +0.33%     
==========================================
  Files          27       29       +2     
  Lines        1430     1523      +93     
  Branches      220      239      +19     
==========================================
+ Hits         1183     1265      +82     
- Misses        205      211       +6     
- Partials       42       47       +5

Impacted Files	Coverage Δ
kevlar/overlap.py	`70.5% <0%> (ø)`	⬆️
kevlar/novel.py	`75.96% <100%> (ø)`	⬆️
kevlar/cli/localize.py	`100% <100%> (ø)`
kevlar/reaugment.py	`91.66% <100%> (ø)`	⬆️
kevlar/seqio.py	`93.06% <100%> (+0.08%)`	⬆️
kevlar/__init__.py	`90.27% <100%> (+0.13%)`	⬆️
kevlar/cli/__init__.py	`100% <100%> (ø)`	⬆️
kevlar/assemble.py	`84.82% <50%> (ø)`	⬆️
kevlar/filter.py	`63.11% <50%> (ø)`	⬆️
kevlar/localize.py	`86.41% <86.41%> (ø)`
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 586c55e...df991ed.

standage added 5 commits June 9, 2017 14:42

New simulation

d7440e9

Update to notebook

20019d9

Sandboxing BWA wrapper

ecf65a7

New localize command

5d2b8bd

Clean up localize, add sequence extraction

7019bfe

standage added 10 commits June 12, 2017 16:02

Docstrings

236919e

Generalize augmented parser/reader, use for kevlar localize

6667f8b

Begin testing kevlar localize

83f4b98

More tests

deda6c1

More tests

dcef254

Appease pep8

54455e0

Tests for extract_region

6febadc

Python2 compatibility

68b2970

Test the kevlar localize main method

f3fe6a9

Improved docstrings, BWA install for Travis

df991ed

standage merged commit 7e52431 into master Jun 13, 2017

standage deleted the sim/pico branch June 13, 2017 16:50

standage mentioned this pull request Jun 13, 2017

Profiling results #84

Open
