SeqDB

This program parses a list of .ABI files provided by the user, organizes the files into a hierarchical folder structure, and inserts the corresponding entries into a simple SQL database.

Features:

organizing

Using the database

Copy or move the sequence files into the Input folder.
Click the "Check Quality" button to separate the good-quality files from the bad-quality ones. Then click "View Uncertains" to see the files whose quality falls between good and bad. For each of these, judge whether the sequence is good enough to give a meaningful BLASTn result. If it is, move it to the Good folder; if not, move it to the Bad folder.
Enter the information about the sequence files in the fields under Step 2, and check that it is correct.
Click the "Import Files" button. This imports the sequence files into the SQL database, creates a FASTA version of each sequence, and organizes all the files into folders by plate number and clone letter. It also produces a log file, which is equivalent to the transaction log shown in the interface.
If there are any exceptions, click the "View Exceptions" button to view those files. If an exception file is a valid trace file, use the manual input interface to add it to the database. Otherwise, remove the file.
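The folder layout produced by "Import Files" can be sketched in Python, the language suggested by the repository's main.py and setup.py. This is a minimal sketch, assuming the naming scheme can be read off the single example row in the Sequence table (plate 2, clone H5, primer F). The helper names `sequence_path` and `to_fasta`, the FASTA header, and the 70-character line width are hypothetical and are not taken from SeqDB's code.

```python
from pathlib import PurePosixPath

# Top-level folder name taken from the example paths in the Sequence table.
ROOT = "Sequence Files"


def sequence_path(kind: str, plate: int, clone: str, primer: str, run_date: str) -> str:
    """Hypothetical helper: build a storage path like the example row's FASTA/ABI paths.

    `kind` is "FASTA" or "ABI"; the file extension is the lowercased kind.
    The naming scheme is inferred from one example and may differ from SeqDB's.
    """
    folder = f"PF{plate:03d}"          # plate 2 -> "PF002"
    letter = clone[0].upper()          # clone "H5" -> sub-folder "H"
    stem = f"Pf{plate:03d}_{clone}{primer}_{run_date}"
    return str(PurePosixPath(ROOT, kind, folder, letter, f"{stem}.{kind.lower()}"))


def to_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Hypothetical helper: format one record as FASTA, wrapping at `width` characters."""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

For example, `sequence_path("FASTA", 2, "H5", "F", "2006-20-01")` returns `Sequence Files/FASTA/PF002/H/Pf002_H5F_2006-20-01.fasta`, matching the FASTA column of the example row. `PurePosixPath` keeps forward slashes so the stored paths look the same on every platform.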

Note: If there is any error during the import process, the error message is usually displayed in the command line window that accompanies the software window.

The database

The database is a simple SQLite database containing two tables and three views.

The first table is the Sequence table, which contains most of the information about the sequence files. The other is the Blast table, which contains only the top result of the BLASTn search for each sequence.

The three views are: LowQuality, Rerun, and Pursue.
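A sketch of how this schema could be declared with Python's built-in sqlite3 module. The column names come from the two tables described below; the column types and the view bodies are assumptions, since this README names the views but does not define them. The guesses are that LowQuality selects Quality = 0 and Pursue selects Pursue = 1; Rerun is left out because its selection criteria are not described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sequence (
    ID INTEGER PRIMARY KEY,
    Plate INTEGER, Clone TEXT, Primer TEXT, Run_date TEXT,
    Student TEXT, Instructor TEXT, Institution TEXT,
    Quality INTEGER,  -- 1 = good enough for BLAST, 0 = not
    FASTA TEXT, ABI TEXT,
    Pursue INTEGER,   -- 1 = selected for further investigation
    Comment TEXT
);
CREATE TABLE Blast (
    ID INTEGER REFERENCES Sequence(ID),
    Genome TEXT, Organism TEXT, E_value REAL,
    Query_from INTEGER, Query_to INTEGER,
    Subject_from INTEGER, Subject_to INTEGER,
    Identity TEXT, Similarity TEXT
);
-- Hypothetical view bodies: only the view names come from the README.
CREATE VIEW LowQuality AS SELECT * FROM Sequence WHERE Quality = 0;
CREATE VIEW Pursue AS SELECT * FROM Sequence WHERE Pursue = 1;
-- Rerun is omitted because its selection criteria are not documented.
""")

names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name")]
print(names)  # ['LowQuality', 'Pursue']
```

Defining the views in SQL rather than in application code means any SQLite client opening the database file sees the same filtered lists.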

Sequence table

ID	Plate	Clone	Primer	Run_date	Student	Instructor	Institution	Quality	FASTA	ABI	Pursue	Comment
2384	2	H5	F	2006-20-01	NLA	Bangera	Bellevue College	1	Sequence Files/FASTA/PF002/H/Pf002_H5F_2006-20-01.fasta	Sequence Files/ABI/PF002/H/Pf002_H5F_2006-20-01.abi	0

ID: Unique identifier of the sequence in the database. This information is used to correlate the sequence with its BLASTn search result in the Blast table.
Plate: The plate number of the bacterial clone.
Clone: The clone identifier of the bacterial clone (for example, H5).
Primer: The primer used for the sequencing reaction. F stands for SL1, R stands for SR2.
Run_date: The date on which the sequence was read by the sequencing machine. When the machine is left to run overnight, two samples in the same batch can have different run dates, because the run date extracted from each sequence file is the date of *that* individual file. For instance, if samples A and B are loaded into the machine at the same time, but sample A is sequenced at 11:30 PM on 2010-10-23 and sample B at 1:05 AM on 2010-10-24, they will have different run dates.
Student: The student who performed the procedures to sequence the clone.
Instructor: The instructor who oversaw the student.
Institution: The place where the sequencing was done.
Quality: How good the sequence is. There are two possible values: "1" means that the sequence is good enough for a BLAST search, and "0" means it is not. A quality of "1" does not mean the sequence is perfect, only that it can still give a meaningful BLAST result.
FASTA: Path to FASTA file of the sequence.
ABI: Path to ABI trace file of the sequence.
Pursue: Whether the sequence is selected for further investigation. "1" means yes, and "0" means no.
Comment: Any additional comment.
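The Run_date rule above amounts to taking the calendar date of each file's own timestamp rather than the date the batch was loaded. A minimal sketch with Python's datetime module, using hypothetical timestamps that mirror the sample A / sample B example; how SeqDB actually reads the timestamp out of the ABI file is not shown here.

```python
from datetime import datetime

# Hypothetical per-file timestamps for two samples loaded in the same
# overnight batch (the sample A / sample B example above).
batch = {
    "A": datetime(2010, 10, 23, 23, 30),  # 11:30 PM
    "B": datetime(2010, 10, 24, 1, 5),    # 1:05 AM the next day
}

# Each file's run date is just the calendar date of its own timestamp,
# so samples from one batch can end up with different run dates.
run_dates = {sample: ts.date() for sample, ts in batch.items()}

for sample, day in sorted(run_dates.items()):
    print(sample, day.isoformat())
# A 2010-10-23
# B 2010-10-24
```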

Blast table

ID	Genome	Organism	E_value	Query_from	Query_to	Subject_from	Subject_to	Identity	Similarity
2384	CP002585.1	Pseudomonas brassicacearum subsp. brassicacearum NFM421	1e-145	100	745	40555456	40556101	99%	F113

ID: Unique identifier for the sequence in this database. This value is used to link the BLASTn result with a sequence in the Sequence table.
Genome: Genome ID of the matching organism. The value used here is the GenBank ID, not the accession ID.
Organism: Name of the organism in which the BLASTn search found the best match for this sequence.
E_value: The number of matches at least this good that would be expected in the database purely by chance. Lower values indicate a more significant match.
Query_from and Query_to: Start and end positions of the matching region on the query sequence.
Subject_from and Subject_to: Start and end positions of the matching region on the subject sequence.
Identity: The percentage of identical positions between the query and the match over the aligned region.
Similarity: Other organisms in which this sequence is also found.
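The shared ID column is what links a BLASTn result to its sequence. The sketch below, assuming Python's sqlite3 module, joins the two tables on ID and filters by E_value and Identity. It loads only the example rows shown above, cut down to the columns the query needs. Storing E_value as a number and Identity as text such as "99%" are assumptions, and the cutoffs in the hypothetical `confident_hits` function are illustrative, not SeqDB defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both tables are reduced to the columns this query uses; column types are assumptions.
conn.executescript("""
CREATE TABLE Sequence (ID INTEGER PRIMARY KEY, Plate INTEGER, Clone TEXT,
                       Primer TEXT, Quality INTEGER);
CREATE TABLE Blast (ID INTEGER REFERENCES Sequence(ID), Organism TEXT,
                    E_value REAL, Identity TEXT);
""")
conn.execute("INSERT INTO Sequence VALUES (2384, 2, 'H5', 'F', 1)")
conn.execute(
    "INSERT INTO Blast VALUES (?, ?, ?, ?)",
    (2384, "Pseudomonas brassicacearum subsp. brassicacearum NFM421", 1e-145, "99%"),
)


def confident_hits(conn, max_evalue=1e-50, min_identity=97.0):
    """Hypothetical helper: return (ID, Clone, Primer, Organism, identity %) for
    BLASTn hits whose E_value and Identity both pass the illustrative cutoffs."""
    rows = conn.execute(
        """
        SELECT s.ID, s.Clone, s.Primer, b.Organism, b.Identity
        FROM Sequence AS s JOIN Blast AS b ON b.ID = s.ID
        WHERE b.E_value <= ?
        """,
        (max_evalue,),
    ).fetchall()
    hits = []
    for id_, clone, primer, organism, identity in rows:
        pct = float(identity.rstrip("%"))  # "99%" -> 99.0
        if pct >= min_identity:
            hits.append((id_, clone, primer, organism, pct))
    return hits


print(confident_hits(conn))
```

Filtering E_value in SQL and Identity in Python is a compromise forced by the assumed text format of Identity; if it were stored as a number, both conditions could go in the WHERE clause.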
