Run Diamond blastx and Megan on crab rna-seq reads #26
Comments
@kubu4 what is the status of this? |
Technically, the blasting/"MEGANizing" is finished. However, I'm having extreme difficulty getting files opened in the MEGAN GUI (required when using the free version) to extract the taxonomic breakdown: the files are extremely large and MEGAN keeps crashing... At this point, I'm mostly resigned to just looking at the smallest file, which only consists of a single set of "MEGANized" reads (R1 only, not paired), and hoping we can consider this one file a "representative" data set? I've gotten it to open, but wanted to have the corresponding R2 reads. It takes a very long time (an hour or two) to open a file. So, when I attempt to open a file and then MEGAN crashes, it's a massive loss of time. Anyway, I'll get something posted here sometime on Thursday. |
What is the current file output (prior to the GUI)? E.g., does it have some taxa info?
|
It's not a text file; it's some sort of compressed/binary format. Here's a link to the smallest file (68GB), if you want to poke around with it: |
Okay, I've looked into this a bit more. It turns out that I can convert the DAA files to RMA6 files via command line. I bring this up because it turns out that when importing the MEGANized DAA files into the MEGAN GUI, the GUI is actually converting them to RMA6 format! Gah! The RMA6 file format, unsurprisingly (since it just contains taxonomic assignment counts/data), is significantly smaller in size (like 2GB vs. 68GB). I'll get something set up to run on Mox to get these all converted. Then, I should be able to easily, and quickly(!), get them imported into the MEGAN GUI to extract taxonomic counts for all samples. |
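For anyone wanting to script the batch conversion described above, here is a minimal sketch that only builds one command line per DAA file and prints it. It assumes MEGAN's `daa2rma` tool is the converter (the comment above doesn't name the tool), and the flag names `-i`, `-o` and `-mdb`, the directory names and the mapping database filename are all assumptions to check against `daa2rma -h` on your install.

```python
from pathlib import Path


def build_daa2rma_cmd(daa_file, out_dir, map_db, tool="daa2rma"):
    """Return the argument list for converting one MEGANized DAA file to RMA6.

    The flag names (-i, -o, -mdb) are assumptions; confirm with `daa2rma -h`.
    """
    daa_file = Path(daa_file)
    # Name the output after the input: sample.daa -> sample.rma6
    rma_file = Path(out_dir) / (daa_file.stem + ".rma6")
    return [tool, "-i", str(daa_file), "-o", str(rma_file), "-mdb", str(map_db)]


def build_all(daa_dir, out_dir, map_db):
    """Build one command per *.daa file, sorted for a reproducible order."""
    return [
        build_daa2rma_cmd(path, out_dir, map_db)
        for path in sorted(Path(daa_dir).glob("*.daa"))
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in build_all("daa", "rma6", "megan-map.db"):
        # Printing keeps this a dry run; pass cmd to subprocess.run to execute.
        print(" ".join(cmd))
```

Printing the commands first makes it easy to paste them into a job script for a cluster scheduler, or to eyeball the output names before committing to a multi-day run.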
Do circles indicate prevalence? E.g., more Proteobacteria vs. Arthropoda? |
Correct. Circles represent the number of reads assigned within each taxon (legend in top left of that screenshot). |
Very interesting! Lots of microbes in the data eating our reads? Thanks for chasing after this.
|
Wow, lots of bacteria. I'm thinking about these samples and the fact that we put the whole gill filament into the tube... multiple filaments, even, because they were so small. There is a lot of surface area on those feathery gills and filaments for bacteria to hang out on. I'm wondering: 1) Can MEGAN break down species any further? Maybe not get to a species level, but something to compare distributions of "types" of bacteria between OA and treated crabs; 2) What's up with all the not-assigned stuff? That's a big portion of the reads too. |
1) Can MEGAN break down species any further? Definitely. I just posted a quick screencap at the Phylum level for people to glance at. It goes down to the species level.
2) What's up with all the not-assigned stuff? Well, my guess is that's due to limitations of the BLAST results (which is what MEGAN is relying on). Since the data is only as good as the BLAST database (and there probably isn't a massive amount of crab sequencing data in NCBI), I'm guessing there are a LOT of matches to undefined proteins (or something similar). So, there's likely not much MEGAN can do with those sequencing reads to try to perform/predict taxonomic assignments. We could possibly try to redo the analysis with some relaxed BLAST parameters and/or MEGAN parameters to see if it helps reduce the number of unassigned reads, but it will take a while (BLASTing/MEGANizing took ~45 days and conversion to the RMA6 file for loading into the MEGAN GUI takes ~10 days)... Another option is to extract the reads assigned to Arthropoda (and below), assemble a transcriptome, and then try to align the unassigned reads to that transcriptome to identify those that are likely crab sequences. Then, we'd extract those reads and use them to create an updated transcriptome. |
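To make the "Arthropoda (and below)" extraction step a bit more concrete, here's a rough sketch of how reads could be split into three bins once each read has a taxon assignment. Everything in it is hypothetical: the `parent_of` lineage map, the read IDs and the idea that assignments are available as a simple read-to-taxon mapping are assumptions for illustration, not how MEGAN exports its data.

```python
def is_within(taxon, ancestor, parent_of):
    """True if `taxon` is `ancestor` or one of its descendants."""
    seen = set()  # guards against accidental cycles in the parent map
    while taxon is not None and taxon not in seen:
        if taxon == ancestor:
            return True
        seen.add(taxon)
        taxon = parent_of.get(taxon)
    return False


def split_reads(assignments, parent_of, ancestor="Arthropoda"):
    """Partition read IDs into (in_clade, unassigned, other)."""
    in_clade, unassigned, other = [], [], []
    for read_id, taxon in assignments.items():
        if taxon is None:
            unassigned.append(read_id)
        elif is_within(taxon, ancestor, parent_of):
            in_clade.append(read_id)
        else:
            other.append(read_id)
    return in_clade, unassigned, other


# Toy data, made up for illustration; not from the crab samples.
parent_of = {
    "Decapoda": "Malacostraca",
    "Malacostraca": "Arthropoda",
    "Arthropoda": "Metazoa",
    "Proteobacteria": "Bacteria",
}
assignments = {
    "read1": "Decapoda",
    "read2": None,  # no taxonomic assignment
    "read3": "Proteobacteria",
    "read4": "Arthropoda",
}

crab_like, unassigned, other = split_reads(assignments, parent_of)
print(crab_like, unassigned, other)
```

The first bin would feed the initial transcriptome assembly, and the `unassigned` bin is what would then get aligned back to that assembly.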
Great idea to explore differences in microbes in treated and untreated!
|
Alrighty, I've uploaded output tables and screencaps: https://github.com/laurahspencer/DuMOAR/tree/main/results/MEGAN There is one table per FastQ. The format is:
|
Added a data summary: https://rpubs.com/sr320/1065657 |
Red and blue colors on sample names indicate pH treatment (red = OA, blue = ambient).
Samples that we tossed during methylation analysis:
Neither treatment nor weird methylation data seem to line up with % RNA-seq reads identified as Arthropoda. |
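For reference, the "% reads identified as Arthropoda" figure can be computed per sample along these lines. This is only a sketch: it assumes each sample's table has already been read into a taxon-to-read-count mapping, and the sample names and counts below are made up, not taken from the uploaded tables.

```python
def percent_assigned(counts, taxon="Arthropoda"):
    """Percent of a sample's reads assigned to `taxon` (0.0 if the sample is empty)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts.get(taxon, 0) / total


# Made-up counts for two hypothetical samples.
samples = {
    "sampleA_R1": {"Arthropoda": 250, "Proteobacteria": 500, "Not assigned": 250},
    "sampleB_R1": {"Arthropoda": 100, "Proteobacteria": 300, "Not assigned": 600},
}

for name, counts in sorted(samples.items()):
    print(f"{name}\t{percent_assigned(counts):.1f}")
```

Note the denominator here includes unassigned reads; dropping them would give a different (higher) percentage, so it's worth stating which one a plot uses.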
References RobertsLab/resources#1597