rename to GMGC-mapper
psj1997 committed Jun 17, 2020
1 parent e70c508 commit b352e73
Showing 13 changed files with 684 additions and 29 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -1,6 +1,6 @@
# GMGC-Finder
# GMGC-mapper

![gmgc_finder_test](https://github.com/BigDataBiology/GMGC-Finder/workflows/gmgc_finder_test/badge.svg)
![gmgc_mapper_test](https://github.com/BigDataBiology/GMGC-Finder/workflows/gmgc_mapper_test/badge.svg)


Command line tool to query the Global Microbial Gene Catalog (GMGC).
@@ -13,7 +13,7 @@ Install from source
python setup.py install
```

GMGC-Finder requires [prodigal](https://github.com/hyattpd/Prodigal) to be
GMGC-mapper requires [prodigal](https://github.com/hyattpd/Prodigal) to be
available for genome mode.


@@ -32,26 +32,26 @@ available for genome mode.
1. Input is a genome sequence.

```bash
gmgc-finder -i input.fasta -o output
gmgc-mapper -i input.fasta -o output
```

2. Input is DNA/protein gene sequences

```bash
gmgc-finder --nt-genes genes.fna --aa-genes genes.faa -o output
gmgc-mapper --nt-genes genes.fna --aa-genes genes.faa -o output
```

The nucleotide input is optional (but should be used if available so that the
quality of the hits can be refined):

```bash
gmgc-finder --aa-genes genes.faa -o output
gmgc-mapper --aa-genes genes.faa -o output
```

If your input is a metagenome, you can use
[NGLess](https://github.com/ngless-toolkit/ngless) for assembly and gene
prediction. For more details, [read the
docs](https://gmgc-finder.readthedocs.io/en/latest/usage/).
docs](https://gmgc-mapper.readthedocs.io/en/latest/usage/).

## Output

@@ -63,6 +63,6 @@ The output folder will contain
4. Human readable summary.

For more details, [read the
docs](https://gmgc-finder.readthedocs.io/en/latest/output/). A description of
docs](https://gmgc-mapper.readthedocs.io/en/latest/output/). A description of
the outputs is also written to the output folder for convenience.

4 changes: 2 additions & 2 deletions docs/index.md
@@ -1,6 +1,6 @@
# GMGC-Finder
# GMGC-mapper

GMGC-Finder is a command line tool to query an input genome against the Global
GMGC-mapper is a command line tool to query an input genome against the Global
Microbial Gene Catalog (GMGC). It will return the summary of alignment
categories and genome bins.

2 changes: 1 addition & 1 deletion docs/install.md
@@ -1,6 +1,6 @@
# Install

GMGC-Finder requires [prodigal](https://github.com/hyattpd/Prodigal). You need
GMGC-mapper requires [prodigal](https://github.com/hyattpd/Prodigal). You need
to install prodigal first and add it into your system path.

Install from source
6 changes: 3 additions & 3 deletions docs/output.md
@@ -1,10 +1,10 @@
# Output of GMGC-finder
# Output of GMGC-mapper

Explanation of the files in the output directory

## Prodigal output

These three files are the output of prodigal (if GMGC-finder was called in
These three files are the output of prodigal (if GMGC-mapper was called in
genome mode)

- `prodigal_out.faa` protein sequence
@@ -55,7 +55,7 @@ The file `summary.txt` provides a human-readable summary of the results, while
`runlog.yaml` is a summary of run metadata (as a YAML file, it is both machine
and human-readable).

The file `summary.txt` should be reproducible and running GMGC-finder twice on
The file `summary.txt` should be reproducible and running GMGC-mapper twice on
the same input should produce the same results. By design, though,
`runlog.yaml` includes information such as the time when the analysis was run
which is not reproducible.
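
One way to check this reproducibility claim is to hash `summary.txt` from two runs on the same input. The sketch below is an illustration only: the helper names `file_digest` and `same_summary` are hypothetical and are not part of GMGC-mapper.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_summary(run_a, run_b, name="summary.txt"):
    """Compare one file from two output folders byte for byte.

    run_a and run_b are the output folders of two separate runs
    (hypothetical paths chosen by the caller).
    """
    return file_digest(Path(run_a) / name) == file_digest(Path(run_b) / name)
```

Calling `same_summary("output_run1", "output_run2")` should return `True` if the statement above holds. The same comparison with `name="runlog.yaml"` is expected to return `False`, because that file records when the analysis was run.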
10 changes: 5 additions & 5 deletions docs/usage.md
@@ -17,25 +17,25 @@ The input must contain a genome file or DNA and Protein gene file or just Protei
1. Input is a genome sequence (`input.fasta`).

```bash
gmgc-finder -i input.fasta -o output
gmgc-mapper -i input.fasta -o output
```

GMGC-finder will call `prodigal` to predict genes and then process each gene.
GMGC-mapper will call `prodigal` to predict genes and then process each gene.

2. Input is DNA/protein gene sequences (`genes.fna` and `genes.faa`,
respectively).

```bash
gmgc-finder --nt-genes genes.fna --aa-genes genes.faa -o output
gmgc-mapper --nt-genes genes.fna --aa-genes genes.faa -o output
```
```bash
gmgc-finder --aa-genes genes.faa -o output
gmgc-mapper --aa-genes genes.faa -o output
```
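
For genome mode (item 1 above), the gene prediction step could look roughly like the sketch below. This is not the tool's actual code: the function names are hypothetical, and of the output file names only `prodigal_out.faa` appears in the docs. The others are assumptions. The flags `-i`, `-o`, `-f`, `-a` and `-d` are standard Prodigal options.

```python
import subprocess
from pathlib import Path


def build_prodigal_command(genome_fasta, out_dir):
    """Build a Prodigal command line for a single genome (illustrative only)."""
    out = Path(out_dir)
    return [
        "prodigal",
        "-i", str(genome_fasta),               # input genome
        "-f", "gff",                           # format of the coordinates file
        "-o", str(out / "prodigal_out.gff"),   # assumed file name
        "-a", str(out / "prodigal_out.faa"),   # predicted proteins
        "-d", str(out / "prodigal_out.fna"),   # assumed file name (gene DNA)
    ]


def predict_genes(genome_fasta, out_dir):
    """Run Prodigal; raises CalledProcessError if it exits with an error."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_prodigal_command(genome_fasta, out_dir), check=True)
```

The resulting `.fna` and `.faa` files play the same role as the `--nt-genes` and `--aa-genes` inputs in item 2.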
# Processing metagenomes using NGLess

If your input is a metagenome, you can use
[NGLess](https://github.com/ngless-toolkit/ngless) for assembly and gene
prediction and, then, pass the results to GMGC-finder.
prediction and, then, pass the results to GMGC-mapper.


## Install
173 changes: 173 additions & 0 deletions gmgc_mapper/BLOSUM.py
@@ -0,0 +1,173 @@
blosum50 = \
{
'*': {'*': 1, 'A': -5, 'C': -5, 'B': -5, 'E': -5, 'D': -5, 'G': -5,
'F': -5, 'I': -5, 'H': -5, 'K': -5, 'M': -5, 'L': -5,
'N': -5, 'Q': -5, 'P': -5, 'S': -5, 'R': -5, 'T': -5,
'W': -5, 'V': -5, 'Y': -5, 'X': -5, 'Z': -5},
'A': {'*': -5, 'A': 5, 'C': -1, 'B': -2, 'E': -1, 'D': -2, 'G': 0,
'F': -3, 'I': -1, 'H': -2, 'K': -1, 'M': -1, 'L': -2,
'N': -1, 'Q': -1, 'P': -1, 'S': 1, 'R': -2, 'T': 0, 'W': -3,
'V': 0, 'Y': -2, 'X': -1, 'Z': -1},
'C': {'*': -5, 'A': -1, 'C': 13, 'B': -3, 'E': -3, 'D': -4,
'G': -3, 'F': -2, 'I': -2, 'H': -3, 'K': -3, 'M': -2,
'L': -2, 'N': -2, 'Q': -3, 'P': -4, 'S': -1, 'R': -4,
'T': -1, 'W': -5, 'V': -1, 'Y': -3, 'X': -1, 'Z': -3},
'B': {'*': -5, 'A': -2, 'C': -3, 'B': 6, 'E': 1, 'D': 6, 'G': -1,
'F': -4, 'I': -4, 'H': 0, 'K': 0, 'M': -3, 'L': -4, 'N': 5,
'Q': 0, 'P': -2, 'S': 0, 'R': -1, 'T': 0, 'W': -5, 'V': -3,
'Y': -3, 'X': -1, 'Z': 1},
'E': {'*': -5, 'A': -1, 'C': -3, 'B': 1, 'E': 6, 'D': 2, 'G': -3,
'F': -3, 'I': -4, 'H': 0, 'K': 1, 'M': -2, 'L': -3, 'N': 0,
'Q': 2, 'P': -1, 'S': -1, 'R': 0, 'T': -1, 'W': -3, 'V': -3,
'Y': -2, 'X': -1, 'Z': 5},
'D': {'*': -5, 'A': -2, 'C': -4, 'B': 6, 'E': 2, 'D': 8, 'G': -1,
'F': -5, 'I': -4, 'H': -1, 'K': -1, 'M': -4, 'L': -4, 'N': 2,
'Q': 0, 'P': -1, 'S': 0, 'R': -2, 'T': -1, 'W': -5, 'V': -4,
'Y': -3, 'X': -1, 'Z': 1},
'G': {'*': -5, 'A': 0, 'C': -3, 'B': -1, 'E': -3, 'D': -1, 'G': 8,
'F': -4, 'I': -4, 'H': -2, 'K': -2, 'M': -3, 'L': -4, 'N': 0,
'Q': -2, 'P': -2, 'S': 0, 'R': -3, 'T': -2, 'W': -3, 'V': -4,
'Y': -3, 'X': -1, 'Z': -2},
'F': {'*': -5, 'A': -3, 'C': -2, 'B': -4, 'E': -3, 'D': -5,
'G': -4, 'F': 8, 'I': 0, 'H': -1, 'K': -4, 'M': 0, 'L': 1,
'N': -4, 'Q': -4, 'P': -4, 'S': -3, 'R': -3, 'T': -2, 'W': 1,
'V': -1, 'Y': 4, 'X': -1, 'Z': -4},
'I': {'*': -5, 'A': -1, 'C': -2, 'B': -4, 'E': -4, 'D': -4,
'G': -4, 'F': 0, 'I': 5, 'H': -4, 'K': -3, 'M': 2, 'L': 2,
'N': -3, 'Q': -3, 'P': -3, 'S': -3, 'R': -4, 'T': -1,
'W': -3, 'V': 4, 'Y': -1, 'X': -1, 'Z': -3},
'H': {'*': -5, 'A': -2, 'C': -3, 'B': 0, 'E': 0, 'D': -1, 'G': -2,
'F': -1, 'I': -4, 'H': 10, 'K': 0, 'M': -1, 'L': -3, 'N': 1,
'Q': 1, 'P': -2, 'S': -1, 'R': 0, 'T': -2, 'W': -3, 'V': -4,
'Y': 2, 'X': -1, 'Z': 0},
'K': {'*': -5, 'A': -1, 'C': -3, 'B': 0, 'E': 1, 'D': -1, 'G': -2,
'F': -4, 'I': -3, 'H': 0, 'K': 6, 'M': -2, 'L': -3, 'N': 0,
'Q': 2, 'P': -1, 'S': 0, 'R': 3, 'T': -1, 'W': -3, 'V': -3,
'Y': -2, 'X': -1, 'Z': 1},
'M': {'*': -5, 'A': -1, 'C': -2, 'B': -3, 'E': -2, 'D': -4,
'G': -3, 'F': 0, 'I': 2, 'H': -1, 'K': -2, 'M': 7, 'L': 3,
'N': -2, 'Q': 0, 'P': -3, 'S': -2, 'R': -2, 'T': -1, 'W': -1,
'V': 1, 'Y': 0, 'X': -1, 'Z': -1},
'L': {'*': -5, 'A': -2, 'C': -2, 'B': -4, 'E': -3, 'D': -4,
'G': -4, 'F': 1, 'I': 2, 'H': -3, 'K': -3, 'M': 3, 'L': 5,
'N': -4, 'Q': -2, 'P': -4, 'S': -3, 'R': -3, 'T': -1,
'W': -2, 'V': 1, 'Y': -1, 'X': -1, 'Z': -3},
'N': {'*': -5, 'A': -1, 'C': -2, 'B': 5, 'E': 0, 'D': 2, 'G': 0,
'F': -4, 'I': -3, 'H': 1, 'K': 0, 'M': -2, 'L': -4, 'N': 7,
'Q': 0, 'P': -2, 'S': 1, 'R': -1, 'T': 0, 'W': -4, 'V': -3,
'Y': -2, 'X': -1, 'Z': 0},
'Q': {'*': -5, 'A': -1, 'C': -3, 'B': 0, 'E': 2, 'D': 0, 'G': -2,
'F': -4, 'I': -3, 'H': 1, 'K': 2, 'M': 0, 'L': -2, 'N': 0,
'Q': 7, 'P': -1, 'S': 0, 'R': 1, 'T': -1, 'W': -1, 'V': -3,
'Y': -1, 'X': -1, 'Z': 4},
'P': {'*': -5, 'A': -1, 'C': -4, 'B': -2, 'E': -1, 'D': -1,
'G': -2, 'F': -4, 'I': -3, 'H': -2, 'K': -1, 'M': -3,
'L': -4, 'N': -2, 'Q': -1, 'P': 10, 'S': -1, 'R': -3,
'T': -1, 'W': -4, 'V': -3, 'Y': -3, 'X': -1, 'Z': -1},
'S': {'*': -5, 'A': 1, 'C': -1, 'B': 0, 'E': -1, 'D': 0, 'G': 0,
'F': -3, 'I': -3, 'H': -1, 'K': 0, 'M': -2, 'L': -3, 'N': 1,
'Q': 0, 'P': -1, 'S': 5, 'R': -1, 'T': 2, 'W': -4, 'V': -2,
'Y': -2, 'X': -1, 'Z': 0},
'R': {'*': -5, 'A': -2, 'C': -4, 'B': -1, 'E': 0, 'D': -2, 'G': -3,
'F': -3, 'I': -4, 'H': 0, 'K': 3, 'M': -2, 'L': -3, 'N': -1,
'Q': 1, 'P': -3, 'S': -1, 'R': 7, 'T': -1, 'W': -3, 'V': -3,
'Y': -1, 'X': -1, 'Z': 0},
'T': {'*': -5, 'A': 0, 'C': -1, 'B': 0, 'E': -1, 'D': -1, 'G': -2,
'F': -2, 'I': -1, 'H': -2, 'K': -1, 'M': -1, 'L': -1, 'N': 0,
'Q': -1, 'P': -1, 'S': 2, 'R': -1, 'T': 5, 'W': -3, 'V': 0,
'Y': -2, 'X': -1, 'Z': -1},
'W': {'*': -5, 'A': -3, 'C': -5, 'B': -5, 'E': -3, 'D': -5,
'G': -3, 'F': 1, 'I': -3, 'H': -3, 'K': -3, 'M': -1, 'L': -2,
'N': -4, 'Q': -1, 'P': -4, 'S': -4, 'R': -3, 'T': -3,
'W': 15, 'V': -3, 'Y': 2, 'X': -1, 'Z': -2},
'V': {'*': -5, 'A': 0, 'C': -1, 'B': -3, 'E': -3, 'D': -4, 'G': -4,
'F': -1, 'I': 4, 'H': -4, 'K': -3, 'M': 1, 'L': 1, 'N': -3,
'Q': -3, 'P': -3, 'S': -2, 'R': -3, 'T': 0, 'W': -3, 'V': 5,
'Y': -1, 'X': -1, 'Z': -3},
'Y': {'*': -5, 'A': -2, 'C': -3, 'B': -3, 'E': -2, 'D': -3,
'G': -3, 'F': 4, 'I': -1, 'H': 2, 'K': -2, 'M': 0, 'L': -1,
'N': -2, 'Q': -1, 'P': -3, 'S': -2, 'R': -1, 'T': -2, 'W': 2,
'V': -1, 'Y': 8, 'X': -1, 'Z': -2},
'X': {'*': -5, 'A': -1, 'C': -1, 'B': -1, 'E': -1, 'D': -1,
'G': -1, 'F': -1, 'I': -1, 'H': -1, 'K': -1, 'M': -1,
'L': -1, 'N': -1, 'Q': -1, 'P': -1, 'S': -1, 'R': -1,
'T': -1, 'W': -1, 'V': -1, 'Y': -1, 'X': -1, 'Z': -1},
'Z': {'*': -5, 'A': -1, 'C': -3, 'B': 1, 'E': 5, 'D': 1, 'G': -2,
'F': -4, 'I': -3, 'H': 0, 'K': 1, 'M': -1, 'L': -3, 'N': 0,
'Q': 4, 'P': -1, 'S': 0, 'R': 0, 'T': -1, 'W': -2, 'V': -3,
'Y': -2, 'X': -1, 'Z': 5}}


blosum62 = \
{'A': {'A': 4, 'R': -1, 'N': -2, 'D': -2, 'C': 0, 'Q': -1, 'E': -1, 'G': 0,
'H': -2, 'I': -1, 'L': -1, 'K': -1, 'M': -1, 'F': -2, 'P': -1, 'S': 1,
'T': 0, 'W': -3, 'Y': -2, 'V': 0, 'B': -2, 'Z': -1, 'X': 0, '*': -4},
'R': {'A': -1, 'R': 5, 'N': 0, 'D': -2, 'C': -3, 'Q': 1, 'E': 0, 'G': -2,
'H': 0, 'I': -3, 'L': -2, 'K': 2, 'M': -1, 'F': -3, 'P': -2, 'S': -1,
'T': -1, 'W': -3, 'Y': -2, 'V': -3, 'B': -1, 'Z': 0, 'X': -1, '*': -4},
'N': {'A': -2, 'R': 0, 'N': 6, 'D': 1, 'C': -3, 'Q': 0, 'E': 0, 'G': 0,
'H': 1, 'I': -3, 'L': -3, 'K': 0, 'M': -2, 'F': -3, 'P': -2, 'S': 1,
'T': 0, 'W': -4, 'Y': -2, 'V': -3, 'B': 3, 'Z': 0, 'X': -1, '*': -4},
'D': {'A': -2, 'R': -2, 'N': 1, 'D': 6, 'C': -3, 'Q': 0, 'E': 2, 'G': -1,
'H': -1, 'I': -3, 'L': -4, 'K': -1, 'M': -3, 'F': -3, 'P': -1, 'S': 0,
'T': -1, 'W': -4, 'Y': -3, 'V': -3, 'B': 4, 'Z': 1, 'X': -1, '*': -4},
'C': {'A': 0, 'R': -3, 'N': -3, 'D': -3, 'C': 9, 'Q': -3, 'E': -4, 'G': -3,
'H': -3, 'I': -1, 'L': -1, 'K': -3, 'M': -1, 'F': -2, 'P': -3, 'S': -1,
'T': -1, 'W': -2, 'Y': -2, 'V': -1, 'B': -3, 'Z': -3, 'X': -2, '*': -4},
'Q': {'A': -1, 'R': 1, 'N': 0, 'D': 0, 'C': -3, 'Q': 5, 'E': 2, 'G': -2,
'H': 0, 'I': -3, 'L': -2, 'K': 1, 'M': 0, 'F': -3, 'P': -1, 'S': 0,
'T': -1, 'W': -2, 'Y': -1, 'V': -2, 'B': 0, 'Z': 3, 'X': -1, '*': -4},
'E': {'A': -1, 'R': 0, 'N': 0, 'D': 2, 'C': -4, 'Q': 2, 'E': 5, 'G': -2,
'H': 0, 'I': -3, 'L': -3, 'K': 1, 'M': -2, 'F': -3, 'P': -1, 'S': 0,
'T': -1, 'W': -3, 'Y': -2, 'V': -2, 'B': 1, 'Z': 4, 'X': -1, '*': -4},
'G': {'A': 0, 'R': -2, 'N': 0, 'D': -1, 'C': -3, 'Q': -2, 'E': -2, 'G': 6,
'H': -2, 'I': -4, 'L': -4, 'K': -2, 'M': -3, 'F': -3, 'P': -2, 'S': 0,
'T': -2, 'W': -2, 'Y': -3, 'V': -3, 'B': -1, 'Z': -2, 'X': -1, '*': -4},
'H': {'A': -2, 'R': 0, 'N': 1, 'D': -1, 'C': -3, 'Q': 0, 'E': 0, 'G': -2,
'H': 8, 'I': -3, 'L': -3, 'K': -1, 'M': -2, 'F': -1, 'P': -2, 'S': -1,
'T': -2, 'W': -2, 'Y': 2, 'V': -3, 'B': 0, 'Z': 0, 'X': -1, '*': -4},
'I': {'A': -1, 'R': -3, 'N': -3, 'D': -3, 'C': -1, 'Q': -3, 'E': -3, 'G': -4,
'H': -3, 'I': 4, 'L': 2, 'K': -3, 'M': 1, 'F': 0, 'P': -3, 'S': -2,
'T': -1, 'W': -3, 'Y': -1, 'V': 3, 'B': -3, 'Z': -3, 'X': -1, '*': -4},
'L': {'A': -1, 'R': -2, 'N': -3, 'D': -4, 'C': -1, 'Q': -2, 'E': -3, 'G': -4,
'H': -3, 'I': 2, 'L': 4, 'K': -2, 'M': 2, 'F': 0, 'P': -3, 'S': -2,
'T': -1, 'W': -2, 'Y': -1, 'V': 1, 'B': -4, 'Z': -3, 'X': -1, '*': -4},
'K': {'A': -1, 'R': 2, 'N': 0, 'D': -1, 'C': -3, 'Q': 1, 'E': 1, 'G': -2,
'H': -1, 'I': -3, 'L': -2, 'K': 5, 'M': -1, 'F': -3, 'P': -1, 'S': 0,
'T': -1, 'W': -3, 'Y': -2, 'V': -2, 'B': 0, 'Z': 1, 'X': -1, '*': -4},
'M': {'A': -1, 'R': -1, 'N': -2, 'D': -3, 'C': -1, 'Q': 0, 'E': -2, 'G': -3,
'H': -2, 'I': 1, 'L': 2, 'K': -1, 'M': 5, 'F': 0, 'P': -2, 'S': -1,
'T': -1, 'W': -1, 'Y': -1, 'V': 1, 'B': -3, 'Z': -1, 'X': -1, '*': -4},
'F': {'A': -2, 'R': -3, 'N': -3, 'D': -3, 'C': -2, 'Q': -3, 'E': -3, 'G': -3,
'H': -1, 'I': 0, 'L': 0, 'K': -3, 'M': 0, 'F': 6, 'P': -4, 'S': -2,
'T': -2, 'W': 1, 'Y': 3, 'V': -1, 'B': -3, 'Z': -3, 'X': -1, '*': -4},
'P': {'A': -1, 'R': -2, 'N': -2, 'D': -1, 'C': -3, 'Q': -1, 'E': -1,
'G': -2, 'H': -2, 'I': -3, 'L': -3, 'K': -1, 'M': -2, 'F': -4, 'P': 7, 'S': -1,
'T': -1, 'W': -4, 'Y': -3, 'V': -2, 'B': -2, 'Z': -1, 'X': -2, '*': -4},
'S': {'A': 1, 'R': -1, 'N': 1, 'D': 0, 'C': -1, 'Q': 0, 'E': 0, 'G': 0,
'H': -1, 'I': -2, 'L': -2, 'K': 0, 'M': -1, 'F': -2, 'P': -1, 'S': 4,
'T': 1, 'W': -3, 'Y': -2, 'V': -2, 'B': 0, 'Z': 0, 'X': 0, '*': -4},
'T': {'A': 0, 'R': -1, 'N': 0, 'D': -1, 'C': -1, 'Q': -1, 'E': -1, 'G': -2,
'H': -2, 'I': -1, 'L': -1, 'K': -1, 'M': -1, 'F': -2, 'P': -1, 'S': 1,
'T': 5, 'W': -2, 'Y': -2, 'V': 0, 'B': -1, 'Z': -1, 'X': 0, '*': -4},
'W': {'A': -3, 'R': -3, 'N': -4, 'D': -4, 'C': -2, 'Q': -2, 'E': -3, 'G': -2,
'H': -2, 'I': -3, 'L': -2, 'K': -3, 'M': -1, 'F': 1, 'P': -4, 'S': -3,
'T': -2, 'W': 11, 'Y': 2, 'V': -3, 'B': -4, 'Z': -3, 'X': -2, '*': -4},
'Y': {'A': -2, 'R': -2, 'N': -2, 'D': -3, 'C': -2, 'Q': -1, 'E': -2, 'G': -3,
'H': 2, 'I': -1, 'L': -1, 'K': -2, 'M': -1, 'F': 3, 'P': -3, 'S': -2,
'T': -2, 'W': 2, 'Y': 7, 'V': -1, 'B': -3, 'Z': -2, 'X': -1, '*': -4},
'V': {'A': 0, 'R': -3, 'N': -3, 'D': -3, 'C': -1, 'Q': -2, 'E': -2, 'G': -3,
'H': -3, 'I': 3, 'L': 1, 'K': -2, 'M': 1, 'F': -1, 'P': -2, 'S': -2,
'T': 0, 'W': -3, 'Y': -1, 'V': 4, 'B': -3, 'Z': -2, 'X': -1, '*': -4},
'B': {'A': -2, 'R': -1, 'N': 3, 'D': 4, 'C': -3, 'Q': 0, 'E': 1, 'G': -1,
'H': 0, 'I': -3, 'L': -4, 'K': 0, 'M': -3, 'F': -3, 'P': -2, 'S': 0,
'T': -1, 'W': -4, 'Y': -3, 'V': -3, 'B': 4, 'Z': 1, 'X': -1, '*': -4},
'Z': {'A': -1, 'R': 0, 'N': 0, 'D': 1, 'C': -3, 'Q': 3, 'E': 4, 'G': -2,
'H': 0, 'I': -3, 'L': -3, 'K': 1, 'M': -1, 'F': -3, 'P': -1, 'S': 0,
'T': -1, 'W': -3, 'Y': -2, 'V': -2, 'B': 1, 'Z': 4, 'X': -1, '*': -4},
'X': {'A': 0, 'R': -1, 'N': -1, 'D': -1, 'C': -2, 'Q': -1, 'E': -1, 'G': -1,
'H': -1, 'I': -1, 'L': -1, 'K': -1, 'M': -1, 'F': -1, 'P': -2, 'S': 0,
'T': 0, 'W': -2, 'Y': -1, 'V': -1, 'B': -1, 'Z': -1, 'X': -1, '*': -4},
'*': {'A': -4, 'R': -4, 'N': -4, 'D': -4, 'C': -4, 'Q': -4, 'E': -4, 'G': -4,
'H': -4, 'I': -4, 'L': -4, 'K': -4, 'M': -4, 'F': -4, 'P': -4, 'S': -4,
'T': -4, 'W': -4, 'Y': -4, 'V': -4, 'B': -4, 'Z': -4, 'X': -4, '*': 1}}
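
Both matrices are nested dictionaries, so a substitution score is read as `matrix[residue_a][residue_b]`. As a minimal sketch of that lookup, the example below scores two equal-length, ungapped protein strings. The three-residue excerpt is copied from the `blosum62` values above, and `ungapped_score` is a hypothetical helper, not a function from this commit.

```python
# Three-residue excerpt of blosum62, values copied from the full table.
BLOSUM62_EXCERPT = {
    'A': {'A': 4, 'R': -1, 'W': -3},
    'R': {'A': -1, 'R': 5, 'W': -3},
    'W': {'A': -3, 'R': -3, 'W': 11},
}


def ungapped_score(seq_a, seq_b, matrix):
    """Sum the substitution scores of two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[a][b] for a, b in zip(seq_a, seq_b))


# A/A = 4, R/W = -3, W/W = 11
print(ungapped_score("ARW", "AWW", BLOSUM62_EXCERPT))  # prints 12
```

The matrix is symmetric, so swapping the two sequences gives the same score.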
75 changes: 75 additions & 0 deletions gmgc_mapper/alignment.py
@@ -0,0 +1,75 @@
import skbio.alignment
from skbio.sequence import DNA,Protein
from .BLOSUM import blosum62,blosum50

def num_alignment(query,target):
    num = 0
    for nucl_q , nucl_t in zip(query ,target):
        if nucl_q == nucl_t:
            num += 1
    return num

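def extract_sw(sw):
    # sw is the (alignment, score, start_end_positions) tuple returned by skbio
    query_align = str(sw[0][0])
    target_align = str(sw[0][1])
    align = num_alignment(query_align, target_align)
    identity = align / len(target_align)
    # start/end of the aligned region on the target sequence (inclusive)
    target_start , target_end = sw[2][1]
    align_length = (target_end-target_start+1)
    return identity,align_length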

def identity_coverage(dna_query,protein_query,dna_target,protein_target):
    """
    def category(query, dna_seq, protein_seq):
        if identity_coverage(query, dna_seq) >= (0.95, 0.95): return "EXACT"
        if identity_coverage(query, protein_seq) >= (0.8, 0.8): return "SIMILAR"
        if identity_coverage(query, protein_seq) >= (0.5, 0.5): return "MATCH"
        return "NO MATCH"
    """
    if dna_query != '':
        try:
            sw_dna = skbio.alignment.local_pairwise_align_ssw(DNA(dna_query),DNA(dna_target))
        except Exception:
            # fall back to the generic (slower) aligner if SSW fails
            sw_dna = skbio.alignment.local_pairwise_align_nucleotide(DNA(dna_query), DNA(dna_target))
        dna_identity,align_length = extract_sw(sw_dna)
        dna_coverage = align_length / min(len(dna_query),len(dna_target))
        if dna_identity >= 0.95 and dna_coverage >= 0.95:
            return 'EXACT'

        else:
            try:
                sw_protein = skbio.alignment.local_pairwise_align_ssw(Protein(protein_query), Protein(protein_target),
                                                                      substitution_matrix=blosum62, gap_open_penalty=11,
                                                                      gap_extend_penalty=1)
            except Exception:
                sw_protein = skbio.alignment.local_pairwise_align_protein(Protein(protein_query),
                                                                          Protein(protein_target),
                                                                          substitution_matrix=blosum62,
                                                                          gap_open_penalty=11, gap_extend_penalty=1)
            protein_identity, align_length = extract_sw(sw_protein)
            protein_coverage = align_length / min(len(protein_query), len(protein_target))
            if protein_identity >= 0.8 and protein_coverage >= 0.8:
                return 'SIMILAR'

            if protein_identity >= 0.5 and protein_coverage >= 0.5:
                return 'MATCH'

            return 'NO MATCH'

    else:
        # no nucleotide query available: classify on the protein alignment only
        try:
            sw_protein = skbio.alignment.local_pairwise_align_ssw(Protein(protein_query),Protein(protein_target),substitution_matrix = blosum62,gap_open_penalty=11,gap_extend_penalty=1)
        except Exception:
            sw_protein = skbio.alignment.local_pairwise_align_protein(Protein(protein_query),Protein(protein_target),substitution_matrix = blosum62,gap_open_penalty=11,gap_extend_penalty=1)
        protein_identity,align_length = extract_sw(sw_protein)
        protein_coverage = align_length / min(len(protein_query),len(protein_target))
        if protein_identity >= 0.8 and protein_coverage >= 0.8:
            return 'SIMILAR'

        if protein_identity >= 0.5 and protein_coverage >= 0.5:
            return 'MATCH'

        return 'NO MATCH'
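
The threshold logic in `identity_coverage` can be separated from the alignment step, which makes it easier to follow and to test without scikit-bio. The sketch below restates the 0.95 / 0.8 / 0.5 cut-offs from the function above. The names `identity_of` and `classify` are hypothetical and are not part of this commit.

```python
def identity_of(aligned_query, aligned_target):
    """Fraction of aligned columns whose two characters are identical."""
    matches = sum(q == t for q, t in zip(aligned_query, aligned_target))
    return matches / len(aligned_target)


def classify(protein_identity, protein_coverage,
             dna_identity=None, dna_coverage=None):
    """Map identity/coverage values to a hit category.

    The DNA values are optional, mirroring the case where only protein
    sequences were supplied.
    """
    if dna_identity is not None and dna_coverage is not None:
        if dna_identity >= 0.95 and dna_coverage >= 0.95:
            return "EXACT"
    if protein_identity >= 0.8 and protein_coverage >= 0.8:
        return "SIMILAR"
    if protein_identity >= 0.5 and protein_coverage >= 0.5:
        return "MATCH"
    return "NO MATCH"
```

For example, `classify(0.9, 0.9, dna_identity=0.99, dna_coverage=0.97)` gives `"EXACT"`, while the same protein values without DNA input give `"SIMILAR"`. Identity and coverage must both pass a threshold, so a high-identity hit covering only 40% of the shorter sequence is still `"NO MATCH"`.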


1 change: 1 addition & 0 deletions gmgc_mapper/gmgc_mapper_version.py
@@ -0,0 +1 @@
__version__ = '0.0.1'
