InBase provides a convenient pandas DataFrame of the 585 inteins in the unmaintained InBase database. The protein sequences are available as Biopython SeqRecord objects; otherwise, the metadata is unchanged.
InBase was collected using scrapy and can be updated as detailed in the "update database" section below.
pip install --user git+
from inbase import INBASE
# See first few lines of all inteins.
INBASE.head()
# See first intein.
INBASE.iloc[0]
# Access Biopython SeqRecord information of first intein (.ix is removed in modern pandas; use positional .iloc).
INBASE['Intein aa Sequence'].iloc[0]
# List the domains of life present.
INBASE['Domain of Life'].unique()
# Count archaea inteins.
(INBASE['Domain of Life'] == 'Archaea').sum()
# Count all inteins.
len(INBASE)
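The per-domain check above can be generalized with pandas' `value_counts`, which tallies every domain in one call. A minimal sketch on a small, made-up DataFrame that mimics only the 'Domain of Life' column of INBASE (the row values are illustrative, not taken from InBase):

```python
import pandas as pd

# Toy stand-in for INBASE: only the 'Domain of Life' column is mimicked,
# and these rows are made up for illustration, not real InBase data.
toy = pd.DataFrame({
    'Domain of Life': ['Archaea', 'Bacteria', 'Archaea', 'Eukaryota'],
})

# Number of inteins per domain, most frequent first.
counts = toy['Domain of Life'].value_counts()
print(counts)

# Total number of rows, i.e. the number of inteins.
print(len(toy))
```

On the real data, `INBASE['Domain of Life'].value_counts()` would give the same kind of table for all inteins.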
Virtual environments and tests are orchestrated using tox. Install tox using pip:
pip install --user tox
Make sure that ~/.local/bin or similar is in your PATH per PEP 370.
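To check whether the PEP 370 user scripts directory is actually on your PATH, a small standard-library sketch can ask sysconfig where user-installed scripts go and compare that against PATH. The helper names user_scripts_dir and on_path are hypothetical, not part of inbase:

```python
import os
import sysconfig


def user_scripts_dir():
    """Return the per-user scripts directory (e.g. ~/.local/bin on Linux)."""
    try:
        # Python 3.10+: the user scheme this interpreter prefers.
        scheme = sysconfig.get_preferred_scheme('user')
    except AttributeError:
        # Older Pythons: fall back to the standard PEP 370 scheme names.
        scheme = 'nt_user' if os.name == 'nt' else 'posix_user'
    return sysconfig.get_path('scripts', scheme)


def on_path(directory):
    """Check whether directory appears in the PATH environment variable."""
    entries = os.environ.get('PATH', '').split(os.pathsep)
    norm = os.path.normpath
    return norm(directory) in {norm(p) for p in entries if p}


scripts = user_scripts_dir()
print(scripts, 'is on PATH' if on_path(scripts) else 'is NOT on PATH')
```

If the check reports NOT on PATH, add that directory to PATH in your shell profile before running tox.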
Install without tests:
tox --notest -e py27
Unfortunately, scrapy does not provide an update function to check
against the existing JSON data, so one has to redownload the database;
this only takes a few seconds. First, clone this repository and create
a "development environment" as described in the section above. Then
initialize the data environment with the extras package:
tox --notest -e data
Check the current number of inbase records:
cat data/inbase.json | wc -l | xargs expr -2 +
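The pipeline above estimates the record count from the number of lines, which relies on the exporter writing one record per line plus the surrounding brackets. Parsing the JSON instead does not depend on that layout. A minimal sketch, assuming data/inbase.json is a single top-level JSON array as the pipeline implies (count_records is a hypothetical helper):

```python
import json


def count_records(path):
    """Count records in a scrapy JSON export (a single top-level JSON array)."""
    with open(path) as handle:
        records = json.load(handle)
    return len(records)
```

Usage, from the repository root: `count_records('data/inbase.json')`.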
Redownload the data:
rm data/inbase.json
.tox/data/bin/scrapy runspider -o data/inbase.json inbase/
Check the new number of records:
cat data/inbase.json | wc -l | xargs expr -2 +
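Because scrapy cannot check against the existing data, comparing counts is the only check in this workflow. If you copy the old file aside before deleting it (for example to a hypothetical data/inbase.old.json), a short sketch can list which records are actually new. new_records is a hypothetical helper, and records are compared by their full content:

```python
import json


def _key(record):
    # Canonical JSON text of a record, so dicts compare independent of key order.
    return json.dumps(record, sort_keys=True)


def new_records(old_path, new_path):
    """Return records present in new_path but not in old_path."""
    with open(old_path) as handle:
        old_keys = {_key(r) for r in json.load(handle)}
    with open(new_path) as handle:
        return [r for r in json.load(handle) if _key(r) not in old_keys]
```

Usage: `new_records('data/inbase.old.json', 'data/inbase.json')`. A record whose fields changed between downloads also shows up as new, since whole records are compared rather than an ID field.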
If there are indeed more records, bump the version, update the Manifest checksums, re-run the data tests, commit the changes to your git repository, and submit a pull request:
version=$(date +%Y%m%d.1)
sed -i -E "s#(version=').*('.+)#\1${version}\2#"
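The date-based version string and the sed substitution can be reproduced in Python if you want to preview the change before editing a file in place. A minimal sketch; bump_version is a hypothetical helper, and the sample input is an assumed setup.py-style `version='...',` line:

```python
import datetime
import re


def bump_version(text, today=None, serial=1):
    """Replace the value inside version='...' with YYYYMMDD.<serial>.

    Mirrors the sed expression s#(version=').*('.+)#\\1${version}\\2#.
    """
    today = today or datetime.date.today()
    version = '{:%Y%m%d}.{}'.format(today, serial)
    # Same greedy pattern as the sed command: keep "version='" and the closing
    # quote plus at least one following character, and swap what lies between.
    # \g<1> is used instead of \1 so the digits of the version are not read
    # as part of the group number.
    return re.sub(r"(version=').*('.+)", r'\g<1>' + version + r'\g<2>', text)


print(bump_version("    version='0.1.0',\n", today=datetime.date(2017, 3, 5)))
```

Like the sed command, a line whose closing quote is the last character (no trailing comma) is left unchanged, because the pattern requires at least one character after that quote.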
.tox/data/bin/gemato create --hashes "MD5 SHA1 SHA256" data/
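gemato records MD5, SHA1 and SHA256 digests for the files under data/ in the Manifest. To spot-check one entry by hand, the standard-library sketch below computes the same three digest types for a file; file_digests is a hypothetical helper and does not read or write Manifest files, which remains gemato's job:

```python
import hashlib


def file_digests(path, algorithms=('md5', 'sha1', 'sha256'), chunk_size=65536):
    """Compute hex digests of one file for each named hashlib algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, 'rb') as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

`file_digests('data/inbase.json')` returns a dict keyed by 'md5', 'sha1' and 'sha256', whose hex strings can be compared with the corresponding Manifest entry.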
tox -e data
git commit data/* -m "MAINT: Update inbase database on $(date -I)"
git push
Run all non-data tests using:
tox
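Debug failing tests by forwarding --pdb to the test runner (tox itself has no --pdb option; this assumes tox.ini passes {posargs} through to the runner):
tox -- --pdb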
If you add dependencies and get import errors, you need to recreate the tox environment:
tox --recreate
If your text editor lacks interactive error reporting, your edits are likely to introduce linter errors that the tox tests will then catch. If you use Emacs, you can configure it for Python development by installing elpy.