Member

@bebatut bebatut commented Dec 8, 2025

To complement the MAGs learning pathway, this tutorial takes raw reads from a publication and uses workflows from IWC to:

  • Preprocess the reads (QC, host & contamination removal)
  • Build, refine, and annotate Metagenome-Assembled Genomes (MAGs)

Each step is explained (with a link to the corresponding dedicated tutorial for a more advanced explanation), and the results are commented on.

I prepared histories with preprocessed data on UseGalaxy.eu, UseGalaxy.org, UseGalaxy.org.au, and UseGalaxy.fr.

@bebatut force-pushed the mags-building branch 2 times, most recently from f1be644 to cc01163 on December 8, 2025 15:48
Collaborator

@paulzierep paulzierep left a comment

Need to continue with the main tutorial later.


7. **Recommended**: click **Import** (left of Run) to make your own local copy under *Workflows / My Workflows*.

You may have to refresh your history to see the queued jobs
Collaborator

Not needed anymore?

@@ -0,0 +1,44 @@
# Generate input datasets for the training
Collaborator

This needs an update: now it's just two samples with no preprocessing, right?

Member Author

This Markdown file is not for learners. It is meant as documentation of how the data were generated.

Collaborator

@paulzierep paulzierep left a comment

Amazing work! If you could still address the comments, I think we should merge it for now.

> > Forward (Read 1) - Before | 34.5 | 34.6
> > Forward (Read 1) - After | 34.5 | 34.6
> > Reverse (Read 2) - Before | 33.4 | 32.7
> > Reverse (Read 2) - After | 33.4 | 32.7
Collaborator

Maybe add a comment on why this does not change?

Collaborator

I guess they deposited trimmed reads

> > <solution-title></solution-title>
> >
> > 1. 2.8% for SRR24759598 and 1.1% for SRR24759616
> > 2. 2.8% for SRR24759598 and 1.1% for SRR24759616
Collaborator

This is not the number of reads... but if we report percentages, the question is a bit useless, no?

> > <comment-title></comment-title>
> > metaSPAdes is an alternative assembler.
> >
> > MEGAHIT is less computationally intensive and generate higher quality single and shorter contigs but shorter. metaSPAdes is very computationally intensive, but generates longer/more complete assemblies.
Collaborator

Strange sentence. Maybe also point to https://doi.org/10.1093/bib/bbad087 for a benchmark.


Beyond simply comparing the total number of bins, we can also examine the **contigs per bin** for each binning tool, which provides deeper insight into the **quality and granularity** of the reconstructed microbial genomes.

For that, we will use the `collection X, collection Y, and others (as list)` collection of collection. This structure contains two sub-collections—one for each sample. Within each sub-collection, there are four tables, each corresponding to a different binning tool (MetaBAT2, MaxBin2, SemiBin, and CONCOCT). Each table consists of two columns: the contig identifier and its assigned bin ID.
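
As a rough illustration of the structure described above, the following sketch counts contigs per bin from two-column tables nested by sample and binning tool. It assumes tab-separated tables without a header row; the sample name, contig IDs, bin IDs, and the `contigs_per_bin` helper are hypothetical and are not taken from the tutorial or the workflow outputs.

```python
from collections import Counter
import csv
import io


def contigs_per_bin(table_text: str) -> Counter:
    """Count contigs assigned to each bin in a two-column (contig, bin) table.

    Assumes tab-separated columns and no header row (an assumption, not
    something confirmed by the tutorial).
    """
    counts = Counter()
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        bin_id = row[1]  # second column holds the assigned bin ID
        counts[bin_id] += 1
    return counts


# Hypothetical nested collection: sample -> binning tool -> table contents.
example = {
    "sample_A": {
        "MetaBAT2": "contig_1\tbin_1\ncontig_2\tbin_1\ncontig_3\tbin_2\n",
        "MaxBin2": "contig_1\tbin_1\ncontig_2\tbin_2\ncontig_3\tbin_2\n",
    },
}

# One Counter per (sample, tool) pair, mirroring the collection of collections.
summary = {
    sample: {tool: contigs_per_bin(text) for tool, text in tools.items()}
    for sample, tools in example.items()
}

for sample, tools in summary.items():
    for tool, counts in tools.items():
        print(sample, tool, dict(counts))
```

Keeping the result nested by sample and then by tool makes it straightforward to compare how finely each binner splits the same set of contigs.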
Collaborator

"For that, we will use the collection X, collection Y, and others (as list) collection of collection." This sentence is not clear.

@paulzierep paulzierep merged commit d1574c1 into galaxyproject:main Dec 18, 2025
3 checks passed
@paulzierep
Collaborator

thanks @bebatut

@bebatut bebatut deleted the mags-building branch December 18, 2025 12:46