This is a nextflow pipeline to map reads from a stagger sequencing run using the sgcount mapping tool.
You will need to install sgcount and fxtools.
These can be installed with the rust package manager cargo, which can itself be installed with the following one-liner:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
cargo install sgcount fxtools
You will then need to install nextflow.
First, make sure you have Java 11 or later installed:
java -version
Then download nextflow:
curl -s https://get.nextflow.io | bash
This will download nextflow into your current working directory. You can validate that it works with:
./nextflow run hello
You should put nextflow into your $PATH. I won't describe how to do that here, but there are plenty of tutorials online on how to do so. I am assuming that nextflow is in your $PATH for the remainder of this tutorial.
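A quick way to confirm that everything installed so far can be found is sketched below. It only relies on the standard `command -v` shell builtin; `missing_tools` is a hypothetical helper name used for illustration and is not part of this repo.

```shell
# Print each of the given command names that cannot be found on $PATH.
# `missing_tools` is a hypothetical helper, not part of the pipeline.
missing_tools() {
    local tool
    for tool in "$@"; do
        # `command -v` succeeds only if the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Check the tools this README asks you to install
missing=$(missing_tools java sgcount fxtools nextflow)
if [ -n "$missing" ]; then
    printf 'Not found on $PATH:\n%s\n' "$missing"
else
    echo "All tools found"
fi
```

If any tool is listed as not found, revisit the corresponding install step before continuing.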
You can clone this git repo to get a copy for each new run.
# clone the repo
git clone \
https://github.com/noamteyssier/stagger_seq_crispr_screen_nextflow \
my_sequencing_run
# enter the directory
cd my_sequencing_run
There are only a few things to configure, and you can make adjustments by editing the bundled file nextflow.config.
You will need to specify your CRISPR library path and the gene to sgRNA (g2s) file path.
You can edit the file nextflow.config to update these.
The two variables to change are library_path and g2s_path.
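To see what these variables are currently set to before editing, something like the following sketch may help. It makes no assumption about how nextflow.config is laid out and simply prints the lines that mention each name; `show_settings` is a hypothetical helper, not something shipped with this repo.

```shell
# Print the lines of a config file that mention the given variable names.
# `show_settings` is a hypothetical helper, not part of the pipeline.
show_settings() {
    local config="$1"
    shift
    local name
    for name in "$@"; do
        # -n prints line numbers, -w matches the name as a whole word
        grep -n -w -- "$name" "$config" || echo "$name: not found in $config"
    done
}

# Only run the check when the config file is present in this directory
if [ -f nextflow.config ]; then
    show_settings nextflow.config library_path g2s_path
fi
```

The same helper works for any other variable name in the config.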
The stagger sequencing has a constant adapter region before the variable region of the library.
The adapter's position is not fixed: a variable number of nucleotides may precede it in each read.
In the data I've seen, the adapter was ACCTTGTTGG.
However, if you have a different adapter, you can update the variable adapter in the config to reflect that.
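To make the idea of a dynamically placed adapter concrete, here is a minimal shell sketch that finds the adapter in a read and reports how many nucleotides precede it. It illustrates the concept only and is not how sgcount performs the search; the function names and the example reads are made up for illustration.

```shell
adapter="ACCTTGTTGG"   # the adapter mentioned above

# Print the number of nucleotides before the first occurrence of the adapter,
# or -1 if the read does not contain it. (Hypothetical helper.)
adapter_offset() {
    local read_seq="$1" adapter_seq="$2" prefix
    case $read_seq in
        *"$adapter_seq"*)
            prefix=${read_seq%%"$adapter_seq"*}   # everything before the adapter
            echo "${#prefix}"
            ;;
        *)
            echo -1
            ;;
    esac
}

# Print the sequence that follows the adapter, i.e. where the variable
# region of the library begins. (Hypothetical helper.)
after_adapter() {
    local read_seq="$1" adapter_seq="$2"
    case $read_seq in
        *"$adapter_seq"*) echo "${read_seq#*"$adapter_seq"}" ;;
        *) return 1 ;;
    esac
}

# Three made-up reads with a stagger of 0, 1 and 2 nucleotides
for r in ACCTTGTTGGGATTACA TACCTTGTTGGGATTACA TGACCTTGTTGGGATTACA; do
    printf '%s\toffset=%s\n' "$r" "$(adapter_offset "$r" "$adapter")"
done
```

All three example reads carry the same sequence after the adapter (GATTACA); only the number of leading nucleotides differs.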
We then need to place our sequencing reads into the data/ directory bundled with this repo.
These are expected to be fastqs of the form data/<sample_name>_R1*.fastq.gz.
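Before running, it can help to list which files in data/ match the expected pattern. The sketch below assumes the sample name is everything before the first `_R1` in the file name; that is an assumption made for illustration and may not match exactly how the pipeline derives names. `sample_name` is a hypothetical helper.

```shell
# Derive a sample name from a fastq path, assuming the name is everything
# before the first "_R1" in the file name (an assumption, see above).
sample_name() {
    local file
    file=$(basename "$1")
    echo "${file%%_R1*}"
}

# List matching files in data/ alongside their inferred sample names
for fq in data/*_R1*.fastq.gz; do
    [ -e "$fq" ] || continue   # the glob stays literal when nothing matches
    printf '%s\t%s\n' "$(sample_name "$fq")" "$fq"
done
```

If the loop prints nothing, no file in data/ matches the pattern and the file names likely need adjusting.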
To run the pipeline we can use the following command:
nextflow run -resume Pipeline.nf
All outputs of the pipeline will be available in the results/ directory, which is created during the run.