Run Diamond blastx and Megan on crab rna-seq reads #26

kubu4 · 2023-05-30T21:21:24Z

sr320 · 2023-06-27T21:10:58Z

@kubu4 what is the status of this?

kubu4 · 2023-06-27T21:26:12Z

Technically, the blasting/"MEGANizing" is finished. However, having extreme difficulties getting files opened in MEGAN GUI (required when using the free version) to extract taxonomic breakdown - the files are extremely large and MEGAN keeps crashing...

At this point, I'm mostly resigned to just looking at the smallest file, which only consists of a single set of "MEGANized" reads (R1 only, not paired), and hoping we can consider this one file a "representative" data set? I've gotten it to open, but wanted to have the corresponding R2 reads.

It takes a very long time (an hour or two) to open a file. So, when I attempt to open a file and then MEGAN crashes, it's a massive loss of time.

Anyway, I'll get something posted here sometime on Thursday.

sr320 · 2023-06-27T23:00:51Z

What is current file output (prior to GUI) - eg does it have some taxa info

On Tue, Jun 27, 2023 at 2:26 PM kubu4 ***@***.***> wrote: Technically, the blasting/"MEGANizing" is finished. However, having extreme difficulties getting files opened in MEGAN GUI (required when using the free version) to extract taxonomic breakdown - the files are extremely large and MEGAN keeps crashing... At this point, I'm mostly resigned to just looking at the smallest file, which only consists of a single set of "MEGANized" reads (R1 only, not paired), and hoping we can consider this one file a "representative" data set? I've gotten it to open, but wanted to have the corresponding R2 reads. It takes a very long time (an hour or two) to open a file. So, when I attempt to open a file and then MEGAN crashes, it's a massive loss of time. Anyway, I'll get something posted here sometime on Thursday. — Reply to this email directly, view it on GitHub <https://urldefense.com/v3/__https://github.com/laurahspencer/DuMOAR/issues/26*issuecomment-1610242009__;Iw!!K-Hz7m0Vt54!hHNWMw7gSwsslrvdkUN1yB4_nx8HKyfzCFm7wF97HR87lbuMKOhUosdlwR8Q9Xd8A9S_A1e76iBGSbT2Hu35B7A$>, or unsubscribe <https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABB4PN2SRMSDQC3W4JHACC3XNNFX5ANCNFSM6AAAAAAYUSKKDA__;!!K-Hz7m0Vt54!hHNWMw7gSwsslrvdkUN1yB4_nx8HKyfzCFm7wF97HR87lbuMKOhUosdlwR8Q9Xd8A9S_A1e76iBGSbT2VuGigsg$> . You are receiving this because you commented.Message ID: ***@***.***>

-- Steven B. Roberts, Professor Associate Director - Graduate Program Coordinator School of Aquatic and Fishery Sciences University of Washington Fisheries Teaching and Research (FTR) Building - Office 232 1140 NE Boat Street - Seattle, WA 98105 robertslab.info - ***@***.*** - @sr320 vm:206.866.5141 - cell:360.362.3626 schedule a zoom call: https://d.pr/gsgxVJ

kubu4 · 2023-06-27T23:05:58Z

It's not a text file - it's some sort of compressed/binary format. Heres a link to the smallest file (68GB), if you want to poke around with it:

https://gannet.fish.washington.edu/Atumefaciens/20230316-mmag-diamond-meganizer-RNAseq/CH09-13.trimmed.R1.blastx.meganized.daa

kubu4 · 2023-06-29T15:34:21Z

Okay, I've looked into this a bit more. It turns out that I can convert the DAA files to RMA6 files via command line. I bring this up because it turns out that when importing the MEGANized DAA files into the MEGAN GUI, the GUI is actually converting them to RMA6 format! Gah!

The RMA6 file format, unsurprisingly (since it just contains taxonomic assignment counts/data), is significantly smaller in size (like 2GB vs. 68GB).

I'll get something set up to run on Mox to get these all converted. Then, I should be able to easily, and quickly(!), get them imported into the MEGAN GUI to extract taxonomic counts for all samples.

kubu4 · 2023-06-29T15:38:10Z

But, to tide us over until then, here's a quick visual breakdown of taxonomic assignment for the R1 reads for

sr320 · 2023-06-29T16:31:48Z

do circles indicate prevalence? eg more Proteobacter v Arthropoda?

kubu4 · 2023-06-29T16:32:52Z

Correct. Circles represent number of reads assigned within each taxa (legend in top left of that screenshot).

kristamnichols · 2023-06-29T16:36:28Z

Very interesting! Lots of microbes in the data eating our reads? Thanks for chasing after this.On Jun 29, 2023, at 10:33 AM, kubu4 ***@***.***> wrote: Correct. Circles represent number of reads assigned within each taxa (legend in top left of that screenshot). —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

mgavery · 2023-07-10T22:14:32Z

Wow, lots of bacteria. I'm thinking about these samples and the fact that we put the whole gill filament into the tube. ..multiple filaments even because they were so small. There is a lot of surface area on those feathery gills and filaments for bacteria to hang out on. I'm wondering: 1) can MEGAN break down species any further? maybe not get to a species level but something to compare distributions of "types" of bacteria between OA and treated crabs; 2) what's up with all the not-assigned stuff? that's a big portion of the reads too.

kubu4 · 2023-07-11T16:56:38Z

can MEGAN break down species any further?

Definitely. I just posted a quick screencap at the Phylum level for people to glance at. It goes down to the species level.

what's up with all the not-assigned stuff?

Well, my guess is that's due to limitations of the BLAST results (which is what MEGAN is relying on). Since the data is only as good as the BLAST database (and there probably isn't a massive amount of crab sequencing data in NCBI), I'm guessing there are a LOT of matches to undefined proteins (or something similar). So, there's likely not much MEGAN can do with those sequencing reads to try to perform/predict taxonomic assignments.

We could possibly try to redo the analysis with some relaxed BLAST parameters and/or MEGAN parameters to see if it helps reduced the number of unassigned reads, but it will take awhile (BLASTing/MEGANizing took ~45 days and conversion to the RMA6 file for loading into the MEGAN GUI takes ~10 days)...

Another option is to extract the reads assigned to Arthropoda (and below), assemble transcriptome and then try to align the unassigned reads to the transcriptome to identify those that are likely crab sequences. Then, we'd extract those reads and use them to create an updated transcriptome.

kristamnichols · 2023-07-11T17:03:30Z

Great idea to explore differences in microbes in treated and untreated!

…

On Tue, Jul 11, 2023 at 9:56 AM kubu4 ***@***.***> wrote: 1. can MEGAN break down species any further? Definitely. I just posted a quick screencap at the Phylum level for people to glance at. It goes down to the species level. 1. what's up with all the not-assigned stuff? Well, my guess is that's due to limitations of the BLAST results (which is what MEGAN is relying on). Since the data is only as good as the BLAST database (and there probably isn't a massive amount of crab sequencing data in NCBI), I'm guessing there are a LOT of matches to undefined proteins (or something similar). So, there's likely not much MEGAN can do with those sequencing reads to try to perform/predict taxonomic assignments. We could possibly try to redo the analysis with some relaxed BLAST parameters and/or MEGAN parameters to see if it helps reduced the number of unassigned reads, but it will take awhile (BLASTing/MEGANizing took ~45 days and conversion to the RMA6 file for loading into the MEGAN GUI takes ~10 days)... Another option is to extract the reads assigned to Arthropoda (and below), assemble transcriptome and then try to align the unassigned reads to the transcriptome to identify those that are likely crab sequences. Then, we'd extract *those* reads and use them to create an updated transcriptome. — Reply to this email directly, view it on GitHub <#26 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A3RZJOGIS6NBJI6TGJHJUITXPWAVDANCNFSM6AAAAAAYUSKKDA> . You are receiving this because you commented.Message ID: ***@***.***>

-- ................................................................... Krista M. Nichols, PhD Program Manager, Genetics and Evolution Conservation Biology Division Northwest Fisheries Science Center NOAA, National Marine Fisheries Service 2725 Montlake Blvd E Seattle, WA 98112 206.302.2470 (Google Voice)

kubu4 · 2023-07-18T19:30:41Z

Alrighty, I've uploaded output tables and screencaps:

https://github.com/laurahspencer/DuMOAR/tree/main/results/MEGAN

There is one table per FastQ. The format is:

Phylum	Read Counts
Acidobacteria	1045
Aquificae	1253
Bacteroidetes	6709
Proteobacteria	99622
Chlamydiae	643
Planctomycetes	4057
Actinobacteria	4766
Firmicutes	1334
Microsporidia	32041
Chordata	211114
Nematoda	1718
Arthropoda	1187325
Mollusca	10101
Platyhelminthes	17161
Streptophyta	4859
Uroviricota	773

sr320 · 2023-07-25T03:12:14Z

added some data summary https://rpubs.com/sr320/1065657

laurahspencer · 2023-07-25T20:11:03Z

Red and blue colors on sample names indicate pH treatment (red=OA, blue=ambient)

Samples that we tossed during methylation analysis:
CH05-06, CH07-24, CH07-11, CH03-04, CH03-15, CH10-08, CH09-13, CH05-01.

Neither treatment nor weird methylation data seem to line up with % RNASeq reads identified as arthropoda.

laurahspencer · 2023-07-25T22:06:33Z

No clear correlation between % reads assigned to Arthropoda (y-axis) and genome alignment rate (x-axis).
NOTE: the % assigned to Arthropoda doesn't consider how many reads were not assigned.

kubu4 self-assigned this May 30, 2023

This was referenced Jun 29, 2023

Remove bacteria sequences from RNA-seq data #35

Closed

Assess bacteria (non-target) sequences in MBD data #36

Open

kubu4 closed this as completed Jul 18, 2023

laurahspencer mentioned this issue Jul 25, 2023

RNASeq STAR alignment rates #37

Closed

laurahspencer reopened this Jul 25, 2023

laurahspencer closed this as completed Oct 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Run Diamond blastx and Megan on crab rna-seq reads #26

Run Diamond blastx and Megan on crab rna-seq reads #26

kubu4 commented May 30, 2023

sr320 commented Jun 27, 2023

kubu4 commented Jun 27, 2023

sr320 commented Jun 27, 2023 via email

kubu4 commented Jun 27, 2023

kubu4 commented Jun 29, 2023

kubu4 commented Jun 29, 2023

sr320 commented Jun 29, 2023

kubu4 commented Jun 29, 2023

kristamnichols commented Jun 29, 2023 via email

mgavery commented Jul 10, 2023

kubu4 commented Jul 11, 2023

kristamnichols commented Jul 11, 2023 via email

kubu4 commented Jul 18, 2023

sr320 commented Jul 25, 2023

laurahspencer commented Jul 25, 2023 •

edited

Loading

laurahspencer commented Jul 25, 2023

Run Diamond blastx and Megan on crab rna-seq reads #26

Run Diamond blastx and Megan on crab rna-seq reads #26

Comments

kubu4 commented May 30, 2023

sr320 commented Jun 27, 2023

kubu4 commented Jun 27, 2023

sr320 commented Jun 27, 2023 via email

kubu4 commented Jun 27, 2023

kubu4 commented Jun 29, 2023

kubu4 commented Jun 29, 2023

sr320 commented Jun 29, 2023

kubu4 commented Jun 29, 2023

kristamnichols commented Jun 29, 2023 via email

mgavery commented Jul 10, 2023

kubu4 commented Jul 11, 2023

kristamnichols commented Jul 11, 2023 via email

kubu4 commented Jul 18, 2023

sr320 commented Jul 25, 2023

laurahspencer commented Jul 25, 2023 • edited Loading

laurahspencer commented Jul 25, 2023

laurahspencer commented Jul 25, 2023 •

edited

Loading