Digital Database of Microbial Phenotypes: like an online Bergey's Manual.
README
colhdrfix.pl
ddmp.php
footnotefix.pl
import.compounds.csv
import.rec.csv
import.rec.log
import.rec.txt
importcsv.php
schema_ddmp.sql
tabit.pl
table-index.csv
table_standard.txt
tables-done.txt
unicode2not.pl
utilization.pl
volume2-tables.csv
volume2-tables.txt
volume3-tables.csv
volume3-tables.txt
volume4-tables.csv
volume4-tables.txt

README

This database contains phenotypic information on Bacteria and Archaea, as reported in the primary literature and in Bergey's Manual of Systematic Bacteriology.
Contributors to date: R. Eric Collins

FILES:
volume2-tables.txt: Bergey's Manual Volume 2, The Proteobacteria
volume3-tables.txt: Bergey's Manual Volume 3, The Firmicutes
volume4-tables.txt: Bergey's Manual Volume 4, The Bacteroidetes, Spirochaetes, Tenericutes (Mollicutes), Acidobacteria, Fibrobacteres, Fusobacteria, Dictyoglomi, Gemmatimonadetes, Lentisphaerae, Verrucomicrobia, Chlamydiae, and Planctomycetes

NOTES:
Volume 1, the Archaea and Deeply Branching Bacteria, is not available in digital format.
Volume 5, the Actinobacteria, will be released in early 2012.

PIPELINE:
 - extract text from the PDFs
 -- pdftotext -layout -nopgbrk -enc UTF-8 <file>

 - extract tables from the text (each line containing TABLE, plus 1 line before and 100 lines after it)
 -- grep -h -A 100 -B 1 -e 'TABLE' <file>

 - repair mis-encoded Unicode characters in Volume 2
 -- unicode2not.pl <file>

 - fix whitespace
 -- tabit.pl <file>

 - fix formatting
 -- done manually in the Kate editor, using Block Selection mode
 -- examples include cleaning up 1-column tables and multi-page tables
 -- multi-line captions were joined onto a single line using this regex:
 --- Find: (TABLE.*)\n([a-z]{2,}.*)\s*
 --- Replace: \1 \2
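
The pdftotext step in the PIPELINE list above can be scripted. Below is a minimal Python sketch. It assumes `pdftotext` (Poppler or Xpdf) is on the PATH. The function names `pdftotext_command` and `pdf_to_text` are hypothetical, and the sketch passes only the flags from the command above. When no output file is given, it names the output `<name>.txt`, which mirrors pdftotext's own default.

```python
import subprocess
from pathlib import Path
from typing import Optional


def pdftotext_command(pdf: str, out: Optional[str] = None) -> list[str]:
    """Build the pdftotext argument list used in the pipeline.

    -layout keeps the physical column layout (needed for tables),
    -nopgbrk suppresses form feeds between pages, and -enc UTF-8
    sets the output encoding.
    """
    if out is None:
        # Same default as pdftotext itself: file.pdf -> file.txt
        out = str(Path(pdf).with_suffix(".txt"))
    return ["pdftotext", "-layout", "-nopgbrk", "-enc", "UTF-8", pdf, out]


def pdf_to_text(pdf: str, out: Optional[str] = None) -> str:
    """Run pdftotext and return the path of the text file it wrote."""
    cmd = pdftotext_command(pdf, out)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[-1]
```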
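
The grep step in the PIPELINE list above keeps each line that contains TABLE, plus 1 line before it and 100 lines after it. The Python sketch below performs the same selection, which can help clarify what those flags do. `grep_context` is a hypothetical name. The `--` separator between non-adjacent groups follows GNU grep's default output when context lines are requested.

```python
import re


def grep_context(lines: list[str], pattern: str = "TABLE",
                 before: int = 1, after: int = 100) -> list[str]:
    """Mimic `grep -h -A <after> -B <before> -e <pattern>` on a list of lines.

    Each matching line is kept with `before` lines of leading context and
    `after` lines of trailing context. Overlapping or touching windows are
    merged, and separate groups are divided by a "--" line.
    """
    regex = re.compile(pattern)
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))

    out: list[str] = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("--")  # group separator between non-contiguous hits
        out.append(lines[i])
        prev = i
    return out
```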
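
The README does not say which characters unicode2not.pl converts in the Volume 2 step. The Python sketch below is a hypothetical illustration only. It maps a few typographic characters that often appear in PDF-extracted tables (minus sign, dashes, curly quotes, no-break space) to plain ASCII. The mapping table and the name `unicode_to_ascii` are assumptions and do not describe the script's actual behavior.

```python
# Hypothetical replacement table; the real unicode2not.pl may differ.
REPLACEMENTS = {
    "\u2212": "-",   # MINUS SIGN, e.g. negative results in phenotype tables
    "\u2013": "-",   # EN DASH
    "\u2014": "-",   # EM DASH
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u00a0": " ",   # NO-BREAK SPACE
}
_TABLE = str.maketrans(REPLACEMENTS)


def unicode_to_ascii(text: str) -> str:
    """Replace the characters listed in REPLACEMENTS; leave all others alone."""
    return text.translate(_TABLE)
```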
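
The README describes tabit.pl only as fixing whitespace. The script name suggests that it turns the space-aligned columns produced by `pdftotext -layout` into tab-separated fields, but that is a guess. A Python sketch under that assumption is shown below. The two-space threshold and the names `tabify_line` and `tabify` are also assumptions.

```python
import re

# Assumption: column gaps in `pdftotext -layout` output are runs of two or
# more spaces, while words inside a cell are separated by a single space.
COLUMN_GAP = re.compile(r" {2,}")


def tabify_line(line: str) -> str:
    """Strip the line and collapse each multi-space column gap into one tab."""
    return COLUMN_GAP.sub("\t", line.strip())


def tabify(text: str) -> str:
    """Apply tabify_line to every line of a text block."""
    return "\n".join(tabify_line(line) for line in text.splitlines())
```

A single-space threshold would split multi-word cells such as "Gram stain", which is why this sketch requires at least two spaces between columns.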
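
The caption Find/Replace in the PIPELINE list above can also be run outside Kate. The Python sketch below applies the same pattern, with two deliberate changes that reflect assumptions about intent. First, the trailing `\s*` is dropped. In Python's `re` module, `\s` also matches a newline, so keeping it would glue the line after the caption onto it, and the greedy `.*` already absorbs any trailing spaces. Second, the substitution repeats until nothing changes, so captions wrapped over three or more lines are fully joined. `join_captions` is a hypothetical name.

```python
import re

# Same pattern as the Kate regex, minus the trailing \s* (see above).
CAPTION_CONTINUATION = re.compile(r"(TABLE.*)\n([a-z]{2,}.*)")


def join_captions(text: str) -> str:
    """Join lowercase continuation lines onto the preceding TABLE caption line."""
    while True:
        text, n = CAPTION_CONTINUATION.subn(r"\1 \2", text)
        if n == 0:
            return text
```

A continuation line must start with at least two lowercase letters. Column headers that start with a capital letter, and single-letter results such as "d" or "w", are therefore left on their own lines.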