Project 2: A reproducible template workflow for single-cell DNA methylation data #2

Open
abaghela opened this Issue Aug 2, 2017 · 18 comments

abaghela commented Aug 2, 2017

A reproducible template workflow for single-cell DNA methylation data

DNA methylation is a heritable epigenetic mark that shows a strong correlation with transcriptional activity, and may be detected by whole genome bisulfite sequencing (WGBS). Recently, WGBS has been performed successfully on single cells (SC-WGBS). The resulting data represent a fundamental shift in the capacity to measure and interpret DNA methylation, especially in rare cell types and in contexts where subtle cell-to-cell heterogeneity is crucial, such as stem cells or cancer. However, although some software tools have been published, and several existing studies have used similar methods, no standardized pipeline for the analysis of SC-WGBS data yet exists.

Simultaneously, there has been a drive within bioinformatics towards improved reproducibility. Recreating the exact results of a study requires not only the exact code, but also the exact software. Common Workflow Language (CWL) provides a framework for specifying complete workflows, while Docker allows for bundling of the exact software and auxiliary data used in an analysis within a container that can be executed anywhere. Together, these have the potential to enable completely reproducible bioinformatics research.

At a previous Hackathon, the first steps were taken towards developing Screw (https://github.com/Epigenomics-Screw/Screw), a collection of standard tools and workflows for analysing SC-WGBS data, wrapped in CWL and Docker. Screw will include quality control visualization, clustering and visualisation of cells by pairwise dissimilarity measures, construction of recapitulated-bulk methylomes from single cells of the same lineage, generation of bigWig methylation tracks for downstream visualization, and wrappers around published tools such as DeepCpG and LOLA.

This project will focus on completing Screw, while also building standardised workflows to analyse a series of public SC-WGBS data sets. This will provide both a complete resource for reproducible SC-WGBS analysis and a first meta-analysis of SC-WGBS data.

Team Lead: Kieran O'Neill | koneill@bcgsc.ca | @oneillkza | Postdoctoral Fellow | BC Genome Sciences Centre

oneillkza commented Sep 12, 2017

So ... software: we need Docker. As far as I can see, ORCA already works by loading a Docker container. It sounds like running Docker inside Docker is possible, but not recommended. Could we get some comment from the ORCA admins on the best way to navigate this? E.g., whether we could deploy our own containers directly, or whether ORCA supports Common Workflow Language.

The hacky, roundabout solution, which would defeat the whole purpose of the project, would be to run without Docker and ensure that the ORCA container has everything from our existing container. However, it's also likely that we'll be updating what software we need as we go during the hackathon.

Besides that, we'd need:

  • cwltool
  • Arvados -- less crucial, but would be good to have for testing cross-compatibility

lchong commented Sep 19, 2017

@sjackman Can you comment on this? Would it be possible to load a different Docker image for Kieran's team when they log onto the ORCA machines?

sjackman commented Sep 19, 2017

Hi, Kieran. cc @tmozgach

Yes, ORCA supports Common Workflow Language (CWL). It has cwltool installed. It'd be good to test it out to ensure that it works for your purpose. It does not have Arvados installed.

> and ensure that the ORCA container has everything from our existing container

Here's the list of software installed on ORCA: https://github.com/bcgsc/orca/blob/master/versions.tsv
Can you check whether any software is missing?
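One way to make that check less manual is a small shell helper. This is only a sketch: missing_from_tsv is a hypothetical name, and it assumes that the first tab-separated column of versions.tsv holds the package name, which should be confirmed against the real file before relying on it.

```shell
# Print every required tool that has no row in a tab-separated versions file.
# ASSUMPTION: the first column of the file is the package name.
# Usage: missing_from_tsv FILE TOOL [TOOL...]
missing_from_tsv() {
  tsv=$1
  shift
  for tool in "$@"; do
    # awk exits 0 only if some row's first field equals the tool name exactly.
    if ! awk -F '\t' -v t="$tool" '$1 == t { found = 1 } END { exit !found }' "$tsv"; then
      printf '%s\n' "$tool"
    fi
  done
}
```

For example, running missing_from_tsv versions.tsv methpipe cwltool against a downloaded copy of the list would print whichever of those names has no exact match in the first column. A name can still be reported as missing if the list spells it differently.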

> It sounds like running Docker inside Docker is possible

We'll have to discuss this and get back to you.

sjackman commented Sep 19, 2017

@oneillkza Do you run the CWL pipeline inside a Docker container, or does your CWL pipeline launch Docker containers?

oneillkza commented Sep 19, 2017

@sjackman it launches containers. (This is basically the default cwltool behaviour.)

In our case, it's actually one container for all of the CWL tools, hence my saying we could bundle things up in the standard ORCA container. One tricky issue is that we also bundle up the Screw codebase inside the container, so as we hack on it, we'd need to constantly update the container.

sjackman commented Sep 19, 2017

As a first pass, would you try running your pipeline using cwltool inside the bcgsc/orca container, and configure cwltool not to launch any containers?

sjackman commented Sep 19, 2017

We haven't created the ORCA accounts yet for Hackseq, but we can create yours first if you'd like to give that a go.

oneillkza commented Sep 20, 2017

Yeah, that'd be a reasonable solution -- it's easy enough to use the --no-container flag in cwltool. We can test the Docker functionality on our local machines on toy examples, and run the pipeline in anger on ORCA but using --no-container.
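The split between local Docker testing and container-free runs on ORCA could be wrapped in a small helper along these lines. It is a sketch only: run_workflow and the USE_DOCKER variable are hypothetical names, and the workflow and job file paths are placeholders supplied by the caller.

```shell
# Run a CWL workflow with cwltool, launching containers only when asked to.
# Usage: [USE_DOCKER=yes] run_workflow WORKFLOW.cwl JOB.yml
run_workflow() {
  workflow=$1
  job=$2
  if [ "${USE_DOCKER:-no}" = "yes" ]; then
    # Default cwltool behaviour: launch the containers named in the tool descriptions.
    cwltool "$workflow" "$job"
  else
    # --no-container runs each tool directly on the host (here, inside ORCA),
    # so every tool must already be installed and on the PATH.
    cwltool --no-container "$workflow" "$job"
  fi
}
```

On a local machine with Docker, USE_DOCKER=yes run_workflow workflow.cwl job.yml would exercise the containers. On ORCA the same call without the variable falls back to --no-container.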

Re: list of software, most of this is described in the following Dockerfiles. If you could add these to the ORCA Dockerfile, that should do it!

https://github.com/Epigenomics-Screw/Screw/blob/master/docker/base/Dockerfile
https://github.com/Epigenomics-Screw/Screw/blob/master/docker/screw/Dockerfile

Thanks!

(And yes please to getting an ORCA account for pre-testing.)

sjackman commented Sep 20, 2017

Great. I've asked Brendan to create an ORCA account for you. In the meantime, you can test out the ORCA Docker image on your own hardware if you like: https://hub.docker.com/r/bcgsc/orca/

$ docker run -it bcgsc/orca

Note that it's a very large image, many gigabytes.

sjackman commented Sep 20, 2017

R is installed, but the R packages are not pre-installed. You'll have to do that yourself.
@tmozgach Please add methpipe to the ORCA image.

tmozgach commented Sep 25, 2017

@sjackman
Should the following software be in the ORCA image for Hackseq?

Install nano, vim, and emacs, man-db, methpipe 

sjackman commented Sep 25, 2017

Yes, please. Thanks, Tanya.
Please also brew install less if the command less is not already in the PATH.
And bzip2 and xz if they're not already in the PATH.

tmozgach commented Sep 25, 2017

@sjackman I will add these and start building a new image on the 16th of September. Before then, would it be possible to ask the team leaders exactly what software they need, or to think about what else we should add?

sjackman commented Sep 25, 2017

The above are all installed.

$ which less bzip2 gzip xz
/usr/bin/less
/home/linuxbrew/.linuxbrew/bin/bzip2
/bin/gzip
/home/linuxbrew/.linuxbrew/bin/xz
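
The same check can be scripted so that only the missing commands are reported. A minimal sketch, where missing_commands is a hypothetical helper name and the command list is whatever a team needs:

```shell
# Print each named command that cannot be found on the PATH.
# Usage: missing_commands CMD [CMD...]
missing_commands() {
  for cmd in "$@"; do
    # command -v is the POSIX way to test whether a command resolves.
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

Calling missing_commands less bzip2 gzip xz prints nothing when all four resolve, as in the output above. Any name it does print still needs to be installed.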

sjackman commented Sep 25, 2017

This issue is for Project 2. Could you please post in each of the other project issues pointing each team leader to the list of installed software, and asking if they need any software missing from that list?

lchong commented Sep 25, 2017

Hi @tmozgach @sjackman

I've already asked all the team leaders to post a list of required software in their respective project issues. But I'll also start a new issue summarizing people's requests so that it's all centralized, and I'll also remind them to give feedback (not everyone has done so yet).

lchong referenced this issue Sep 25, 2017: ORCA setup #33 (Closed, 31 of 31 tasks complete)

sjackman commented Sep 25, 2017

Thanks, Lauren!

jakelever commented Oct 10, 2017

Hey team lead (@oneillkza), we've been gathering GitHub IDs for your team members. From your description, it sounds like you plan to use the existing Screw repo for this project. If that's the case, could you please add the people below as collaborators to that project? Or if you'd prefer, we can make a repo in the hackseq organisation and sort out membership for you.

cmorganl
klimstef
sibylgisela
jesszha
jjonphl
adammendoza

Once the people are added, it'd be a great idea to start a discussion on that repo with information to get your team members started (e.g. some small suggested reading, things to look up, etc). We will also be adding everyone to Slack and creating a specific channel for each project. This may be an easier way to communicate.

We'll forward on any remaining GitHub IDs through this issue.

Thanks, Jake
on behalf of the Hackseq organising committee