A Python script that locates read alignments in nucleotide sequences using a naive matching approach. I know, it is slow, but the project started as a way to learn more about the read alignment problem. A long-term goal is to implement Boyer-Moore. The underlying idea was to identify binding sites that comprise only a few nucleotides in target sequences, such as miRNA or primer binding sites. Of course, it can also be used as a general tool to find partial regions in nucleotide sequences. Because of its limited performance, it is not intended for large datasets, but rather as a downstream tool for detailed analysis.
python3 finder.py -t template.fa -q query.fa -o /your/output/path
Parameter | Description | Default |
---|---|---|
-t (--target) | path to template file | |
-q (--query) | path to query file | |
-o (--output) | path to output folder | |
-m (--mismatch) | number of mismatches allowed | 0 |
-s (--save) | save output to file | False |
-r (--rev) | also search the reverse complement of the target sequence | False |
If no output path (-o) is specified, the current working directory is used.
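To illustrate what the `-r` option refers to, here is a minimal sketch of computing a reverse complement. The helper name `reverse_complement` is hypothetical and only assumes plain DNA letters (A, C, G, T, N); finder.py may do this differently.

```python
# Hypothetical helper for illustration; not taken from finder.py.
# Maps each base to its complement (upper and lower case, N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```

Searching a query against both the target and its reverse complement covers binding sites on either strand.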
The 'data' folder contains files with arbitrarily generated nucleotide sequences for testing purposes. Try them out using:
python3 finder.py -t ./data/template.fa -q ./data/query.fa --mismatch 2
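For readers new to the topic, naive matching with a mismatch limit can be sketched roughly as follows. This is an illustration of the general technique, not the actual code in finder.py; the function name `naive_find` and its return format are assumptions.

```python
# Illustrative sketch of naive matching with mismatches (Python 3.9+).
# The name and return format are hypothetical, not taken from finder.py.
def naive_find(target: str, query: str, max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Return (offset, mismatches) for every position where the query
    aligns to the target with at most max_mismatches mismatches."""
    hits = []
    if not query:
        return hits
    # Slide the query over every possible start position in the target.
    for offset in range(len(target) - len(query) + 1):
        mismatches = 0
        for t, q in zip(target[offset:offset + len(query)], query):
            if t != q:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, give up on this offset
        else:
            # Loop finished without break: the alignment is accepted.
            hits.append((offset, mismatches))
    return hits


print(naive_find("ACGTACGTTT", "ACGA", max_mismatches=1))  # [(0, 1), (4, 1)]
```

Every offset is compared character by character, so the worst-case running time grows with the product of the target and query lengths. That is the reason this approach is slow on large inputs.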
The output is structured into a mapping file (in progress):
File | Description |
---|---|
mapping.txt | contains a simple text-based visualization of template sequences with at least one hit |
If you have any feedback or comments, please send me an email or open an issue on GitHub.