A Python package for accessing and parsing the AAindex, a collection of amino acid properties and substitution matrices. You might want this package so that you can characterise certain amino acid residues, for example to compare the change in chemical properties caused by a missense mutation. Its notable features are:
- Python API and CLI with JSON output
- Detailed test suite
- Parsing Expression Grammar for parsing AAindex entries that can be used independently
- Automatic handling of some idiosyncrasies of the AAindex format
- AAindex itself is not included, avoiding any breaches of the license
pip install aaindexer
Or, if you’re a super cool poetry user:
poetry add aaindexer
Scrapes a single aaindex database, and prints the result to stdout. A progress bar is shown via stderr.
aaindexer [OPTIONS] DATABASE_NUMBER
-[ Options ]-
--pretty, --no-pretty
If pretty (the default), pretty print the JSON, with newlines and indentation.
-[ Arguments ]-
DATABASE_NUMBER
Required argument
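For scripting, the command above can also be driven from Python. The following is a minimal sketch rather than part of aaindexer: build_command and run_cli are hypothetical helper names, the aaindexer executable is assumed to be on the PATH, and nothing is assumed about the shape of the output beyond it being valid JSON.

```python
import json
import subprocess
from typing import Any, List


def build_command(database_number: int, pretty: bool = True) -> List[str]:
    """Assemble the argument list for the aaindexer CLI (hypothetical helper)."""
    flag = "--pretty" if pretty else "--no-pretty"
    return ["aaindexer", flag, str(database_number)]


def run_cli(database_number: int) -> Any:
    """Run the CLI and decode the JSON it prints to stdout.

    Only stdout is captured, so the progress bar written to stderr
    remains visible in the terminal.
    """
    result = subprocess.run(
        build_command(database_number, pretty=False),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
        text=True,
    )
    return json.loads(result.stdout)
```

Calling run_cli(1) would scrape database 1 and needs network access. Passing --no-pretty keeps the output compact, which makes no difference to json.loads.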
class aaindexer.AaindexRecord
A single record from a single aaindex database
accession: str
Record accession, e.g. "ANDN920101"
authors: Optional[str]
Authors for the source publication, as a single string, e.g. "Andersen, N.H., Cao, B. and Chen, C."
comment: Optional[str]
Additional comments
correlation: Optional[dict[str, Optional[float]]]
A dictionary of correlations between this record and others in the same database, keyed by record accession, e.g. { "ROBB760101": 0.874, "QIAN880106": 0.846 }
description: str
Record description, as a string, e.g. "alpha-CH chemical shifts (Andersen et al., 1992)"
index: Optional[dict[str, Optional[float]]]
A dictionary indexed by amino acid 1-letter codes, where the values are the amino acid properties described in this record, e.g. { "A": 0.68, "R": -0.22 }
journal: Optional[str]
Journal for the source publication, e.g. "Biochem. and Biophys. Res. Comm. 184, 1008-1014 (1992)"
matrix: Optional[dict[str, dict[str, Optional[float]]]]
A dictionary of dictionaries. The first and second index are both amino acid 1-letter codes, together defining a substitution matrix between the two amino acids, e.g. { "A": { "A": 3.0 }, "R": { "A": -3.0, "R": 6.0 } }. Note that if matrix[X][Y] is not defined, then matrix[Y][X] (the reverse) will be.
pmid: Optional[str]
PubMed identifier, e.g. "PMID:1575719"
title: Optional[str]
Title of the source publication, e.g. "Peptide/protein structure analysis using the chemical shift index method: upfield alpha-CH values reveal dynamic helices and aL sites"
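The index and matrix fields lend themselves to the missense-mutation comparison mentioned in the introduction. The sketch below is illustrative only: property_change and substitution_score are hypothetical helpers, not part of aaindexer, and the sample dictionaries reuse the example values from the field descriptions rather than real database contents.

```python
from typing import Dict, Optional

Index = Dict[str, Optional[float]]
Matrix = Dict[str, Dict[str, Optional[float]]]


def property_change(index: Index, ref: str, alt: str) -> Optional[float]:
    """Return index[alt] - index[ref], or None if either value is missing."""
    before, after = index.get(ref), index.get(alt)
    if before is None or after is None:
        return None
    return after - before


def substitution_score(matrix: Matrix, x: str, y: str) -> Optional[float]:
    """Look up matrix[x][y], falling back to the reverse entry matrix[y][x]."""
    row = matrix.get(x, {})
    if y in row:
        return row[y]
    return matrix.get(y, {}).get(x)


# Sample values copied from the field descriptions (not real records).
index = {"A": 0.68, "R": -0.22}
matrix = {"A": {"A": 3.0}, "R": {"A": -3.0, "R": 6.0}}

# An alanine-to-arginine substitution: the property value drops by about 0.9.
print(property_change(index, "A", "R"))
# matrix["A"]["R"] is absent, so the reverse entry matrix["R"]["A"] is used.
print(substitution_score(matrix, "A", "R"))  # -3.0
```

The fallback in substitution_score relies on the note above that whenever matrix[X][Y] is missing, matrix[Y][X] is present.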
aaindexer.scrape_database(index)
Scrapes an aaindex database, and returns it as plain text
Parameters:
- index (int) – The number of the database to return (1-3)
Returns: The aaindex database contents
Return type: str
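Because scrape_database returns the raw text, it can be inspected before any parsing. The following is a rough sketch, assuming the usual AAindex flat-file layout in which every entry ends with a line containing only "//". split_entries is a hypothetical helper, not part of aaindexer, and the sample text is made up for illustration.

```python
from typing import List


def split_entries(text: str) -> List[str]:
    """Split flat-file text into entries, each terminated by a '//' line.

    Any trailing lines that are not followed by a terminator are dropped.
    """
    entries: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return entries


# A made-up two-entry fragment, not real AAindex data.
sample = "H EXAMPLE001\nD first entry\n//\nH EXAMPLE002\nD second entry\n//\n"
entries = split_entries(sample)
print(len(entries))  # 2
```

With the package installed and network access available, the same helper could be applied to the real text, e.g. split_entries(aaindexer.scrape_database(1)).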
aaindexer.scrape_parse(index, progress=False)
Scrapes an aaindex database and parses the result
Parameters:
- index (int) – The number of the database to return (1-3)
- progress – If true, show progress
Return type:
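A typical use of scrape_parse is to fetch a database and then pick out one record by accession. The sketch below is hedged accordingly: it assumes that scrape_parse returns an iterable of AaindexRecord objects, which the documentation above does not state explicitly. find_record and load_records are hypothetical helpers, and the demo uses stand-in objects so that it runs without the package or a network connection.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_record(records: Iterable[Any], accession: str) -> Optional[Any]:
    """Return the first record whose accession matches, or None."""
    for record in records:
        if record.accession == accession:
            return record
    return None


def load_records(index: int) -> Iterable[Any]:
    """Fetch and parse one database (needs aaindexer and network access).

    Assumption: scrape_parse returns an iterable of AaindexRecord objects.
    """
    import aaindexer  # imported lazily so find_record works without it

    return aaindexer.scrape_parse(index, progress=True)


# Stand-in objects carrying only an accession attribute, for illustration.
demo = [
    SimpleNamespace(accession="ANDN920101"),
    SimpleNamespace(accession="ROBB760101"),
]
print(find_record(demo, "ROBB760101").accession)  # ROBB760101
```

In real use, find_record(load_records(1), "ANDN920101") would search database 1, provided the assumption about the return value holds.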
Please cite the original AAindex paper if you are using this package or the raw data: Kawashima, S. & Kanehisa, M. AAindex: Amino Acid index database. Nucleic Acids Research 28, 374 (2000).
Clone the repo, and then:
poetry install
to install development dependencies
poetry run pytest test.py
to run tests
poetry run sphinx-build . _build -b rst
to build the readme, then
_build/index.rst README.rst
to replace the old readme