acidoseq

Studying Acidobacteria reads from a Nanopore metagenomic data-set | Python v3.5 | PyPI (see version)

Author Samantha C Pendleton, Data Science MSc Aberystwyth University, Twitter | GitHub

Follow the Twitter bot I created, acido_bot, that dispenses daily facts about Acidobacteria!

The GC content of Acidobacteria genomes is consistent with their placement: species in the same subdivision have similar GC content (above 60% for group V fragments and roughly 10% lower for group III fragments), which reflects the diversity within the phylum [1]. How abundance correlates with pH depends on the subdivision: subdivisions 1, 2, 3, 12, and 13 become less abundant as pH increases, whilst subdivisions 4, 6, 7, 10, 11, 16, 17, 18, 22, and 25 are sparse at low pH and become more abundant as pH increases [2].

This package takes a collection of reads and gathers the ones that a Kaiju output assigns to Acidobacteria. It reports summary statistics and GC plots. Furthermore, the unclassified Acidobacteria reads are grouped into subdivisions based on the pH level of the soil sample and visualised.
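As a rough illustration of the pH relationship described above, the sketch below returns the subdivisions expected to be more abundant at a given pH. The subdivision lists come from the paragraph above [2]; the function name `expected_subdivisions` and the `threshold` value are hypothetical and are not taken from acidoseq or from the cited studies.

```python
# Subdivision groups as listed in the text above [2].
DECREASE_WITH_PH = (1, 2, 3, 12, 13)
INCREASE_WITH_PH = (4, 6, 7, 10, 11, 16, 17, 18, 22, 25)


def expected_subdivisions(ph, threshold=5.5):
    """Return the subdivisions expected to be more abundant at this pH.

    `threshold` is an illustrative assumption: the text only says
    "low pH", so the cut-off used here is a placeholder.
    """
    if not 0 <= ph <= 14:
        raise ValueError("pH must be between 0 and 14")
    return DECREASE_WITH_PH if ph < threshold else INCREASE_WITH_PH


print(expected_subdivisions(4.92))
print(expected_subdivisions(7.14))
```

A real cut-off would have to come from the data or the literature; the hard split here only makes the direction of the relationship concrete.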

Introduction

The Kaiju output provides a taxon ID for each sequence. This package outputs the Acidobacteria species alongside annotation, plots, and information on the unclassified reads.

Prerequisite

All of the reads in FASTA format.
Kaiju output reduced to two columns: sequence ID and NCBI taxon ID.
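To make the expected shape of the second input concrete, here is a minimal sketch that loads a two-column CSV (sequence ID, NCBI taxon ID) into a dictionary and counts reads per taxon. The helper name `load_taxon_table` and the example rows are made up for illustration; this is not how acidoseq itself reads the file.

```python
import csv
import io
from collections import Counter


def load_taxon_table(handle):
    """Map each sequence ID to its taxon ID from a two-column CSV."""
    table = {}
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        seq_id, tax_id = row[0].strip(), row[1].strip()
        table[seq_id] = tax_id
    return table


# Made-up rows standing in for result_seqid_taxon.csv.
example = io.StringIO("read_1,111\nread_2,222\nread_3,111\n")
table = load_taxon_table(example)
counts = Counter(table.values())
print(table)
print(counts.most_common(1))
```

With a real file, the same function can be called as `load_taxon_table(open("result_seqid_taxon.csv", newline=""))`.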

Dependencies

import os
import csv
import pysam
import collections
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import random
from termcolor import colored
from colorama import init
import click

$ pip3 install matplotlib

Installation

Git clone

$ git clone https://github.com/sap218/acidoseq.git

pip

$ pip install acidoseq

Kaiju

I used columns 2 and 3 of the Kaiju output, which contain the sequence IDs and the NCBI taxon IDs.

Keep only the classified reads: $ awk '$1 == "C"' kaiju.out > kaijuC.out
Cut the columns: $ cut -f2,3 kaijuC.out > results.txt
Convert the txt to csv (comma-delimited): $ sed 's/\s\+/,/g' results.txt > result_seqid_taxon.csv
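The three shell steps above can also be sketched in Python, which may help on systems without awk, cut, and sed. This assumes the standard tab-separated Kaiju output, with the classification flag in column 1, the read name in column 2, and the taxon ID in column 3. The function name `kaiju_to_csv` and the sample lines are hypothetical, and unlike the awk step the sketch splits on tabs only.

```python
import csv
import io


def kaiju_to_csv(kaiju_lines, out_handle):
    """Write 'sequence ID,taxon ID' rows for classified reads only.

    Returns the number of rows written.
    """
    writer = csv.writer(out_handle, lineterminator="\n")
    kept = 0
    for line in kaiju_lines:
        fields = line.rstrip("\n").split("\t")
        # Column 1 is "C" (classified) or "U" (unclassified).
        if len(fields) >= 3 and fields[0] == "C":
            writer.writerow(fields[1:3])
            kept += 1
    return kept


# Made-up lines standing in for kaiju.out.
sample = ["C\tread_1\t111\n", "U\tread_2\t0\n", "C\tread_3\t333\n"]
out = io.StringIO()
n_kept = kaiju_to_csv(sample, out)
print(n_kept)
print(out.getvalue())
```

For real files, pass an open `kaiju.out` as `kaiju_lines` and a file opened with `newline=""` as `out_handle`.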

Map

If you are unsure of the pH of your soil samples, you may want to use the map script first - default city is Aberystwyth.

Please note: because the Earth is spherical and maps are two-dimensional, there will be some distortion when plotting locations.

$ acidomap --city Birmingham

Usage

The CLI requires the Kaiju and FASTA files; all other options have defaults, e.g. pH = 5.

If no plot style is provided, or it is entered incorrectly, a random one is chosen.
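The fallback behaviour described above could look roughly like the following. This is a sketch of the idea rather than the package's actual code: `pick_style` is a hypothetical helper, and the short `styles` list stands in for the full list of style names, which in practice would typically come from `matplotlib.pyplot.style.available`.

```python
import random


def pick_style(requested, available, rng=random):
    """Return `requested` if it is a known style, else a random known one."""
    if requested in available:
        return requested
    # Unknown or missing style: fall back to a random valid choice.
    return rng.choice(sorted(available))


styles = ["ggplot", "seaborn", "bmh", "classic"]
print(pick_style("ggplot", styles))
print(pick_style("not-a-style", styles) in styles)
print(pick_style(None, styles) in styles)
```

Passing the list of valid styles in as an argument keeps the helper independent of matplotlib and easy to test.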

Run as follows on Linux (find how to run on other operating systems here):

$ acidoseq --help
Usage: acidoseq [OPTIONS]

Options:
  --taxdumptype TEXT  Study "ALL" or only unclassified "U"?
  --kaijufile TEXT    Place edited Kaiju (csv) in directory for ease.
  --fastapath TEXT    Place FASTA in directory for ease.
  --style TEXT        ['seaborn-bright', 'seaborn-poster', 'seaborn-white',
                      'bmh', 'seaborn-darkgrid', 'seaborn-pastel',
                      'grayscale', '_classic_test', 'ggplot', 'seaborn-
                      whitegrid', 'seaborn-dark', 'seaborn-muted', 'seaborn-
                      colorblind', 'seaborn-ticks', 'Solarize_Light2',
                      'seaborn-notebook', 'dark_background', 'fast',
                      'seaborn', 'fivethirtyeight', 'seaborn-paper', 'seaborn-
                      dark-palette', 'seaborn-talk', 'classic', 'seaborn-
                      deep']
  --plottype TEXT     "span" range of GC means OR "line" average mean GC
  --ph TEXT           pH of soil, use map script for assistance.
  --help              Show this message and exit.

Examples

$ acidoseq --kaijufile result_seqid_taxon.csv --fastapath all.fa

$ acidoseq --taxdumptype ALL --kaijufile result_seqid_taxon.csv --fastapath all.fa --style ggplot --plottype span --ph 4.92

$ acidoseq --taxdumptype U --kaijufile result_seqid_taxon.csv --fastapath all.fa --style seaborn --plottype line --ph 7.14

Output

FASTA file: a collection of reads which were identified as Acidobacteria
Plot of AT and GC ratio comparison with means
In-depth plot of GC ratio with subdivisions labelled (regions with 'span' and means with 'line')
Separate FASTA files of the unclassified reads assigned to subdivisions based on the pH, e.g. a file of sequences that fall within the subdivision 1 GC span if the pH is low
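The GC figures behind these plots can be illustrated with a small, dependency-free sketch that parses FASTA text and computes the GC percentage of each read and the mean across reads. The package lists pysam among its dependencies, so the plain-Python parser here is a stand-in; the names `read_fasta` and `gc_percent` and the two example reads are hypothetical.

```python
def read_fasta(lines):
    """Yield (read ID, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first word of the header as the read ID.
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(sequence):
    """Percentage of G and C among the A, C, G, T bases of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


# Made-up reads for illustration.
fasta_text = ">r1 example\nGGCC\n>r2\nAT\nGC\n"
gc_by_read = {rid: gc_percent(seq) for rid, seq in read_fasta(fasta_text.splitlines())}
mean_gc = sum(gc_by_read.values()) / len(gc_by_read)
print(gc_by_read)
print(mean_gc)
```

Ambiguous bases such as N are ignored in the denominator here, which is one possible convention and not necessarily the one acidoseq uses.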

Acknowledgements

Amanda Clare, senior lecturer, MSc supervisor at Aberystwyth University, Twitter | GitHub | Staff Profile
Sam Nicholls, postdoc at University of Birmingham, Twitter | GitHub
Arwyn Edwards, senior lecturer at Aberystwyth University, provided the data-set, Twitter | Staff Profile

Thank you! 🌱

Don't hesitate to create an issue or make a suggestion!

Todo List

Make available
Improve descriptions and comments
Look into command line interface
Fix code to output unclassified subdivisions based on pH
Alter code so the input file can be the original Kaiju output
Make available on Conda

References

[1] Quaiser, A., Ochsenreiter, T., Lanz, C., Schuster, S. C., Treusch, A. H., Eck, J., & Schleper, C. (2003). Acidobacteria form a coherent but highly diverse group within the bacterial domain: evidence from environmental genomics. Molecular microbiology, 50(2), 563-575.

[2] Eichorst, S. A., Breznak, J. A., & Schmidt, T. M. (2007). Isolation and characterization of soil bacteria that define Terriglobus gen. nov., in the phylum Acidobacteria. Applied and environmental microbiology, 73(8), 2708-2717.
