kietjohn/SeqDB

SeqDB

This program parses a list of .ABI files provided by the user, organizes the files into a hierarchical folder structure, and inserts the corresponding entries into a simple SQLite database.

Features:

  • organizing

Using the database

  1. Copy or move the sequence files into the Input folder.
  2. Click the "Check Quality" button to separate the good-quality files from the bad-quality ones. Then click "View Uncertains" to see the files whose quality falls in between good and bad. Judge whether each of these sequences is good enough to give a meaningful BLASTn result. If it is, move it to the Good folder; if not, move it to the Bad folder.
  3. Enter the information about the sequence files in the fields in Step 2, and check that the information is correct.
  4. Click the "Import Files" button. This imports the sequence files into the SQL database, creates a FASTA version of each sequence, and organizes all the files into folders according to their Plate/Clone Alphabet. It also produces a log file, which is equivalent to the transaction log shown on the interface.
  5. If there are any exceptions, click the "View Exceptions" button to view the affected files. If an exception file is a valid trace file, use the manual input interface to add it to the database. Otherwise, remove the file.

Note: If an error occurs during the import process, the error message is usually displayed in the command-line window that accompanies the software window.
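
The folder layout produced in step 4 can be illustrated with a short sketch. It is inferred only from the two example paths in the Sequence table below, so the function name `archive_path`, its parameters, and the zero-padding of the plate number are assumptions, not the program's actual code.

```python
from pathlib import PurePosixPath


def archive_path(kind, plate, clone, primer, run_date, root="Sequence Files"):
    """Build a destination path for one sequence file (hypothetical helper).

    kind is "ABI" or "FASTA"; plate is an integer; clone is e.g. "H5".
    The layout root/kind/PF<plate>/<clone letter>/<file> is an assumption
    based on the example row in the Sequence table.
    """
    extension = {"ABI": ".abi", "FASTA": ".fasta"}[kind]
    plate_tag = f"{plate:03d}"       # 2 -> "002" (assumed padding)
    clone_letter = clone[0].upper()  # "H5" -> "H"
    filename = f"Pf{plate_tag}_{clone}{primer}_{run_date}{extension}"
    return PurePosixPath(root, kind, f"PF{plate_tag}", clone_letter, filename)


print(archive_path("ABI", 2, "H5", "F", "2006-20-01"))
print(archive_path("FASTA", 2, "H5", "F", "2006-20-01"))
```

With the values from the example row, the two printed paths match the ABI and FASTA entries of record 2384.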

The database

The database is an elementary SQLite database containing two tables, along with three views.

The first table is the Sequence table, which contains most of the information about the sequence files. The other table is the Blast table, which contains only the first result of the BLASTn search on the sequence.

The three views are: LowQuality, Rerun, and Pursue.

Sequence table

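| ID | Plate | Clone | Primer | Run_date | Student | Instructor | Institution | Quality | FASTA | ABI | Pursue | Comment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2384 | 2 | H5 | F | 2006-20-01 | NLA | Bangera | Bellevue College | 1 | Sequence Files/FASTA/PF002/H/Pf002_H5F_2006-20-01.fasta | Sequence Files/ABI/PF002/H/Pf002_H5F_2006-20-01.abi | 0 | |
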
ID
Unique identifier of the sequence in the database. This information is used to correlate the sequence with its BLASTn search result in the Blast table.
Plate
The plate number of the bacterial clone.
Clone
The clone number of the bacterial clone.
Primer
The primer used for the sequencing reaction. F stands for SL1 and R stands for SR2.
Run_date
The date on which the sequence was read on the sequencing machine. When the machine is left to run overnight, two samples in the same batch can have different run dates, because the run date extracted from a sequence file is the date of that individual file. For instance, if samples A and B are loaded into the sequencing machine at the same time, but sample A is sequenced at 11:30 PM on 2010-10-23 and sample B at 1:05 AM on 2010-10-24, they will have different run dates.
Student
The student who performed the procedures to sequence the clone.
Instructor
The instructor who oversees the student.
Institution
The place where the sequencing was done.
Quality
How good the sequence is. There are two possible values: "1" means that the sequence is good enough for a BLAST search, and "0" means that it is not. A quality of "1" does not mean that the sequence is perfect, only that it may still give meaningful results in BLAST searches.
FASTA
Path to FASTA file of the sequence.
ABI
Path to ABI trace file of the sequence.
Pursue
Whether the sequence is selected for further investigation. "1" means yes, and "0" means no.
Comment
Any additional comment.
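
As an illustration of the columns described above, here is a minimal sketch that creates a Sequence table in an in-memory SQLite database and selects the records that are good enough for a BLAST search. The column types and the `CHECK` constraints are assumptions; the README lists only the column names and their meanings.

```python
import sqlite3

# In-memory database for illustration only; the real database file is not named here.
conn = sqlite3.connect(":memory:")

# Column names follow the Sequence table above; the types are assumed.
conn.execute("""
    CREATE TABLE Sequence (
        ID          INTEGER PRIMARY KEY,
        Plate       INTEGER,
        Clone       TEXT,
        Primer      TEXT,
        Run_date    TEXT,
        Student     TEXT,
        Instructor  TEXT,
        Institution TEXT,
        Quality     INTEGER CHECK (Quality IN (0, 1)),
        FASTA       TEXT,
        ABI         TEXT,
        Pursue      INTEGER CHECK (Pursue IN (0, 1)),
        Comment     TEXT
    )
""")

# The example record from the table above.
conn.execute(
    "INSERT INTO Sequence VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (2384, 2, "H5", "F", "2006-20-01", "NLA", "Bangera", "Bellevue College", 1,
     "Sequence Files/FASTA/PF002/H/Pf002_H5F_2006-20-01.fasta",
     "Sequence Files/ABI/PF002/H/Pf002_H5F_2006-20-01.abi", 0, None),
)

# Records with Quality = 1 are considered usable for BLAST searches.
good = conn.execute(
    "SELECT ID, Plate, Clone, Primer FROM Sequence WHERE Quality = 1"
).fetchall()
print(good)
```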

Blast table

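| ID | Genome | Organism | E_value | Query_from | Query_to | Subject_from | Subject_to | Identity | Similarity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2384 | CP002585.1 | Pseudomonas brassicacearum subsp. brassicacearum NFM421 | 1e-145 | 100 | 745 | 40555456 | 40556101 | 99% | F113 |
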
ID
Unique identifier for the sequence in this database. This value is used to link the BLASTn result with a sequence in the Sequence table.
Genome
Genome ID of the matching organism. The value used here is the GenBank ID, not the accession ID.
Organism
Name of the organism in which the BLASTn search found this gene with the highest certainty.
E_value
The number of matches of this quality expected to occur in the database purely by chance.
Query_from and Query_to
Location of the matching hit on the query.
Subject_from and Subject_to
Location of the hit on the subject.
Identity
The percentage of the aligned region in which the query is identical to the match.
Similarity
Other organisms in which this sequence is also found.
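
Because both tables share the ID column, a sequence can be looked up together with its BLASTn hit through a join. The sketch below assumes column types and a foreign-key relationship that the README does not state, and it trims the Sequence table to a few columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced Sequence table plus a Blast table; all types are assumptions.
conn.executescript("""
    CREATE TABLE Sequence (
        ID     INTEGER PRIMARY KEY,
        Plate  INTEGER,
        Clone  TEXT,
        Primer TEXT
    );
    CREATE TABLE Blast (
        ID           INTEGER PRIMARY KEY REFERENCES Sequence(ID),
        Genome       TEXT,
        Organism     TEXT,
        E_value      REAL,
        Query_from   INTEGER,
        Query_to     INTEGER,
        Subject_from INTEGER,
        Subject_to   INTEGER,
        Identity     TEXT,
        Similarity   TEXT
    );
""")

# The example records from the two tables above.
conn.execute("INSERT INTO Sequence VALUES (2384, 2, 'H5', 'F')")
conn.execute(
    "INSERT INTO Blast VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (2384, "CP002585.1",
     "Pseudomonas brassicacearum subsp. brassicacearum NFM421",
     1e-145, 100, 745, 40555456, 40556101, "99%", "F113"),
)

# Link each sequence to its first BLASTn result through the shared ID.
rows = conn.execute("""
    SELECT s.Plate, s.Clone, s.Primer, b.Organism, b.E_value
    FROM Sequence AS s
    JOIN Blast AS b ON b.ID = s.ID
""").fetchall()

for row in rows:
    print(row)
```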
