Merged
4 changes: 2 additions & 2 deletions .github/workflows/main.yml
@@ -17,7 +17,7 @@ jobs:
    strategy:
      max-parallel: 5
      matrix:
        python: [3.8, 3.9, '3.10']
        python: ['3.10', '3.11', '3.12']
      fail-fast: false


@@ -29,7 +29,7 @@ jobs:
        sudo apt-get install libopenblas-dev # for scipy

    - name: checkout git repo
      uses: actions/checkout@v2
      uses: actions/checkout@v4

    - name: conda/mamba
      uses: mamba-org/setup-micromamba@v1
218 changes: 179 additions & 39 deletions README.rst
@@ -2,94 +2,234 @@
.. image:: https://badge.fury.io/py/sequana-chipseq.svg
    :target: https://pypi.python.org/pypi/sequana_chipseq

.. image:: https://github.com/sequana/chipseq/actions/workflows/main.yml/badge.svg
    :target: https://github.com/sequana/chipseq/actions/workflows/main.yml

.. image:: https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.10 | 3.11 | 3.12

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/chipseq/actions/workflows/main.yml/badge.svg
    :target: https://github.com/sequana/chipseq/actions/

This is the **chipseq** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project.

.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10
:Overview: ChIP-seq pipeline from raw reads to peaks, IDR statistics, and functional annotation
:Input: Paired or single-end FastQ files and a CSV experimental design file
:Output: HTML summary report, narrow/broad peak files, IDR statistics, bigwig tracks, and annotation tables
:Status: Production
:Citation: Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi.org/10.21105/joss.00352


This is is the **chipseq** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project
.. image:: sequana_pipelines/chipseq/dag.png
    :width: 100%

:Overview: ChIP-seq pipeline to detect peaks using IDR statistics
:Input: Set of fastq files and a design file
:Output: HTML reports and various plots and annotation files
:Status: production
:Citation: Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352
.. image:: sequana_pipelines/chipseq/dag_complete.png
    :width: 100%


Installation
~~~~~~~~~~~~

Just install this package using Python **pip** software::
::

    pip install sequana_chipseq --upgrade

You will also need the third-party tools listed under Requirements below.


Quick Start
~~~~~~~~~~~

**1. Prepare a design file** ``design.csv``::

    type,condition,replicat,sample_name
    IP,EXP1,1,IP_EXP1_rep1
    IP,EXP1,2,IP_EXP1_rep2
    Input,EXP1,1,Input_EXP1

- ``type`` must be ``IP`` (immunoprecipitated) or ``Input`` (control).
- ``sample_name`` must match the prefix of the corresponding FastQ file
  (e.g. ``IP_EXP1_rep1`` matches ``IP_EXP1_rep1_R1_.fastq.gz``).
- At least two IP replicates per condition are required for IDR analysis.
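
The design-file rules above can be checked before launching the pipeline. The following is a minimal sketch, not part of ``sequana_chipseq``: the helper name ``validate_design`` is hypothetical, and it only assumes the four column names shown in the example (``type``, ``condition``, ``replicat``, ``sample_name``).

```python
import csv
import io
from collections import Counter


def validate_design(handle):
    """Return a list of problems found in a design CSV (empty if none).

    Hypothetical helper: checks only the rules stated in the README.
    """
    problems = []
    ip_replicates = Counter()
    for row in csv.DictReader(handle):
        if row["type"] not in ("IP", "Input"):
            problems.append(f"{row['sample_name']}: unknown type {row['type']!r}")
        elif row["type"] == "IP":
            ip_replicates[row["condition"]] += 1
    # IDR needs at least two IP replicates per condition.
    for condition, count in sorted(ip_replicates.items()):
        if count < 2:
            problems.append(
                f"{condition}: only {count} IP replicate(s), IDR needs at least 2"
            )
    return problems


EXAMPLE = """type,condition,replicat,sample_name
IP,EXP1,1,IP_EXP1_rep1
IP,EXP1,2,IP_EXP1_rep2
Input,EXP1,1,Input_EXP1
"""

print(validate_design(io.StringIO(EXAMPLE)))
```

Returning a list of messages rather than raising on the first error lets every problem in the file be reported in one pass.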

**2. Prepare a genome directory** named after the genome, containing:

- ``<name>.fa`` — reference genome FASTA
- ``<name>.gff`` or ``<name>.gff3`` — gene annotation

Example::

    ecoli_MG1655/
    ├── ecoli_MG1655.fa
    └── ecoli_MG1655.gff
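
The directory layout above can be verified with a few lines of Python. This is an illustrative sketch only: ``check_genome_directory`` is a hypothetical name, and the pipeline may perform its own, different checks.

```python
import tempfile
from pathlib import Path


def check_genome_directory(path):
    """Return (fasta, annotation) paths, or raise FileNotFoundError.

    Hypothetical helper: the files must be named after the directory.
    """
    directory = Path(path)
    name = directory.name
    fasta = directory / f"{name}.fa"
    if not fasta.is_file():
        raise FileNotFoundError(f"missing {fasta}")
    # Accept either annotation extension mentioned in the README.
    for suffix in (".gff", ".gff3"):
        annotation = directory / f"{name}{suffix}"
        if annotation.is_file():
            return fasta, annotation
    raise FileNotFoundError(f"missing {name}.gff or {name}.gff3 in {directory}")


# Build a throwaway directory that mirrors the example layout.
with tempfile.TemporaryDirectory() as tmp:
    genome = Path(tmp) / "ecoli_MG1655"
    genome.mkdir()
    (genome / "ecoli_MG1655.fa").write_text(">chr\nACGT\n")
    (genome / "ecoli_MG1655.gff").write_text("##gff-version 3\n")
    fasta, annotation = check_genome_directory(genome)
    print(fasta.name, annotation.name)
```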

**3. Set up the pipeline**::

    sequana_chipseq \
        --input-directory DATAPATH \
        --genome-directory /path/to/ecoli_MG1655 \
        --design-file design.csv

**4. Run the pipeline**::

    cd chipseq
    sh chipseq.sh


Usage
~~~~~

::

    sequana_chipseq --help
    sequana_chipseq --input-directory DATAPATH

This creates a directory with the pipeline and configuration file. You will then need
to execute the pipeline::
Key pipeline-specific options:

``--genome-directory``
    Path to the genome directory (must contain ``<name>.fa`` and ``<name>.gff``).

``--design-file``
    CSV experimental design file (see Quick Start above).

``--aligner-choice``
    Aligner to use. Currently only ``bowtie2`` is supported.

``--blacklist-file``
    BED3 file of genomic regions to exclude from analysis (tab-separated:
    chromosome, start, end).

``--genome-size``
    Effective genome size for macs3 peak calling. Automatically computed from
    the FASTA file if not provided; override with a plain integer.

``--do-fingerprints``
    Enable ``plotFingerprint`` QC to assess ChIP enrichment quality.
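
The ``--genome-size`` description says the value is computed from the FASTA file, but not how. As a rough sketch, assuming the simplest possible definition (the total number of bases across all sequences, which may well differ from what the pipeline actually computes), a candidate integer can be derived like this; ``total_sequence_length`` is a hypothetical helper name.

```python
import io


def total_sequence_length(handle):
    """Sum the lengths of all sequence lines in a FASTA stream.

    Assumption: header lines start with '>' and everything else is sequence.
    """
    total = 0
    for line in handle:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


# Two toy records: 12 bases on chr1 and 4 on a plasmid.
FASTA = ">chr1\nACGTACGT\nACGT\n>plasmid\nGGCC\n"
print(total_sequence_length(io.StringIO(FASTA)))
```

A value obtained this way is an upper bound on the mappable genome; if a smaller effective size is known for the organism, passing it as a plain integer is probably the better choice.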

Run on a SLURM cluster::

    cd chipseq
    sh chipseq.sh # for a local run
    sbatch chipseq.sh

This launch a snakemake pipeline. If you are familiar with snakemake, you can
retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::
Or drive Snakemake directly::

    snakemake -s chipseq.rules -c config.yaml --cores 4 --stats stats.txt
    snakemake -s chipseq.rules --cores 4 --stats stats.txt

Or use `sequanix <https://sequana.readthedocs.io/en/main/sequanix.html>`_ interface.

Requirements
~~~~~~~~~~~~
Usage with Apptainer
~~~~~~~~~~~~~~~~~~~~~

This pipeline requires the following executable(s):
Run every tool inside pre-built containers — no local tool installation needed::

- idr This python package is not on pypi. Manual installation is required. Instructions are here:
https://github.com/nboley/idr but we also provide a singularity in https://damona.readthedocs.io
    sequana_chipseq \
        --input-directory DATAPATH \
        --genome-directory /path/to/genome \
        --design-file design.csv \
        --use-apptainer

.. image:: https://raw.githubusercontent.com/sequana/chipseq/main/sequana_pipelines/chipseq/dag.png
.. image:: https://raw.githubusercontent.com/sequana/chipseq/main/sequana_pipelines/chipseq/dag_complete.png
Store images in a shared location to avoid re-downloading::

    sequana_chipseq ... --use-apptainer --apptainer-prefix ~/.sequana/apptainers

Details
~~~~~~~~~
Then run as usual::

    cd chipseq
    sh chipseq.sh

This pipeline runs **chipseq** in parallel on the input fastq files (paired or not).
A brief sequana summary report is also produced.

Requirements
~~~~~~~~~~~~

Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following tools must be available (install via conda/bioconda)::

    mamba env create -f environment.yml

- **bowtie2** — read alignment
- **fastp** — adapter trimming and quality filtering
- **fastqc** — per-read quality control
- **samtools** — BAM sorting, indexing, and flagstat
- **deeptools** — bigwig generation (genomeCoverageBed) and fingerprint QC
- **ucsc-bedgraphtobigwig** — bedGraph to bigWig conversion
- **macs3** — narrow and broad peak calling
- **homer** — peak annotation (``annotatePeaks.pl``)
- **idr** — Irreproducibility Discovery Rate between replicates
- **multiqc** — aggregated QC report


Pipeline overview
~~~~~~~~~~~~~~~~~

1. **Trimming** — fastp removes low-quality reads and adapters.
2. **QC** — FastQC on raw and cleaned reads.
3. **Alignment** — bowtie2 maps reads to the reference genome.
4. **[Optional] Mark duplicates** — Picard marks PCR duplicates.
5. **[Optional] Blacklist removal** — bedtools removes artefact-prone regions.
6. **bigwig** — per-sample coverage tracks for genome browsers (e.g. IGV).
7. **[Optional] Fingerprints** — plotFingerprint QC to assess ChIP enrichment.
8. **Phantom peak** — strand cross-correlation analysis (NSC, RSC, Qtag scores).
9. **Peak calling** — macs3 detects narrow and broad peaks for each IP vs Input pair.
10. **FRiP** — Fraction of Reads in Peaks per sample and comparison.
11. **IDR** — Irreproducibility Discovery Rate on true replicates, pseudo-replicates, and self-pseudo-replicates.
12. **Annotation** — homer annotates peaks relative to genomic features.
13. **MultiQC** — aggregated QC across all samples.
14. **HTML report** — summary with phantom peaks, FRiP plots, IDR tables, and annotation plots.
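
To make step 10 concrete: FRiP is the number of reads overlapping at least one peak divided by the total number of reads. The sketch below works on in-memory intervals and is only an illustration of the metric; the function name ``frip`` and the toy data are assumptions, and the pipeline's own ``plot_FRiP`` rule works on BAM and peak files rather than on Python lists.

```python
def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak.

    Both arguments are lists of (chrom, start, end) half-open intervals.
    """
    if not reads:
        return 0.0
    # Group peaks by chromosome so each read is only compared to its own.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    in_peaks = 0
    for chrom, start, end in reads:
        if any(
            start < p_end and p_start < end
            for p_start, p_end in by_chrom.get(chrom, [])
        ):
            in_peaks += 1
    return in_peaks / len(reads)


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [
    ("chr1", 90, 140),   # overlaps the first peak
    ("chr1", 300, 350),  # between peaks
    ("chr1", 590, 640),  # overlaps the second peak
    ("chr2", 100, 150),  # no peaks on chr2
]
print(frip(reads, peaks))
```

The linear scan per read is fine for a toy example; on real data an interval index or a dedicated tool would be needed.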


Configuration
~~~~~~~~~~~~~

Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/chipseq/main/sequana_pipelines/chipseq/config.yaml>`_.
Key sections:

- ``general`` — aligner choice and genome directory path
- ``fastp`` — trimming options (length, quality, adapters)
- ``fastqc`` — FastQC options and threads
- ``bowtie2_mapping`` / ``bowtie2_index`` — mapping options, threads, memory
- ``macs3`` — peak calling parameters (genome size, bandwidth, q-value, broad cutoff)
- ``idr`` — IDR thresholds, rank metric, number of pseudo-replicates
- ``fingerprints`` — enable/disable and number of bins
- ``mark_duplicates`` — enable/disable PCR duplicate marking
- ``remove_blacklist`` — enable/disable and path to BED blacklist
- ``multiqc`` — MultiQC options

Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/chipseq/main/sequana_pipelines/chipseq/config.yaml>`_
to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog
~~~~~~~~~

========= ====================================================================
Version   Description
========= ====================================================================
0.11.0    * switch to click and new sequana_pipetools
0.10.0    * Fix design in case of samples that starts with same prefix
0.12.0    * Fix ``plot_FRiP``: was iterating over all comparisons inside each
            rule invocation causing ``FileNotFoundError`` in parallel runs;
            now processes only its own wildcard
          * Fix IDR rules (``idr_NT``, ``self_pseudo_replicate_idr``,
            ``pseudo_replicate_idr``): IDR exits non-zero on sparse data;
            added ``|| true`` + conditional ``mv`` so the pipeline continues
            and downstream Python rules handle empty results gracefully
            peaks and Homer returns an empty DataFrame
          * Fix ``fastp`` rule: use ``input.fastq`` / ``output.r1`` /
            ``output.r2`` to match the sequana-wrappers fastp shell interface;
            split into paired/single-end branches
          * Add ``log:`` directives and stderr redirection to rules that were
            missing them: ``phantom_align``, ``chrom_sizes``, ``fingerprints``,
            ``bam_to_bed``, ``bed_to_bigwig``, ``pseudo_replicate_idr``
          * Update ``sequana_tools`` container to ``26.1.14``
          * Update CI: Python 3.10/3.11/3.12; ``actions/checkout@v4``
0.11.0    * Switch to click and new sequana_pipetools
0.10.0    * Fix design in case of samples that start with the same prefix
          * Include final IDR plots and tables
          * Fix containers and wrappers in the config file
          * Better HTML report
0.9.1     * Fix requirements and setup.py (remove wrong idr package)
0.9.0     * use latest wrappers and apptainer (for rulegraph)
0.9.0     * Use latest wrappers and apptainer (for rulegraph)
0.8.0     **First release.**
========= ====================================================================


Contribute & Code of Conduct
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To contribute to this project, please take a look at the
`Contributing Guidelines <https://github.com/sequana/sequana/blob/main/CONTRIBUTING.rst>`_ first. Please note that this project is released with a
`Code of Conduct <https://github.com/sequana/sequana/blob/main/CONDUCT.md>`_. By contributing to this project, you agree to abide by its terms.
12 changes: 6 additions & 6 deletions pyproject.toml
@@ -4,13 +4,13 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "sequana-chipseq"
version = "0.11.0"
version = "0.12.0"
description = "A ChIP-seq pipeline from raw reads to peaks"
authors = ["Sequana Team"]
license = "BSD-3"
repository = "https://github.com/sequana/rnaseq"
repository = "https://github.com/sequana/chipseq"
readme = "README.rst"
keywords = ["snakemake, sequana, RNAseq, RNADiff, differential analysis"]
keywords = ["snakemake, sequana, ChIP-seq, differential analysis"]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Education",
@@ -19,9 +19,9 @@ classifiers = [
"Intended Audience :: Science/Research",
"License :: OSI Approved :: BSD License",
"Operating System :: POSIX :: Linux",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Scientific/Engineering :: Bio-Informatics",
"Topic :: Scientific/Engineering :: Information Analysis",
@@ -33,7 +33,7 @@ packages = [


[tool.poetry.dependencies]
python = ">=3.8,<4.0"
python = ">=3.10,<4.0"
sequana = ">=0.16.0"
sequana_pipetools = ">=0.16.4"
click-completion = "^0.5.2"
10 changes: 5 additions & 5 deletions sequana_pipelines/chipseq/__init__.py
@@ -1,6 +1,6 @@
import pkg_resources
try:
    version = pkg_resources.require("sequana_chipseq")[0].version
except:
    version = ">=0.8.0"
from importlib.metadata import PackageNotFoundError, version

try:
    version = version("sequana-chipseq")
except PackageNotFoundError:
    version = "unknown"
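
One observation on the new ``__init__.py``: the name ``version`` is first the imported function and is then rebound to a string. This works, but the function is no longer reachable afterwards. A possible alternative, shown here only as a sketch and not as what the PR does, imports the function under an alias; ``installed_version`` and ``get_version`` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version as get_version


def installed_version(distribution, default="unknown"):
    """Return the installed version of a distribution, or a default string."""
    try:
        return get_version(distribution)
    except PackageNotFoundError:
        return default


# Same public attribute as in the diff, without shadowing the import.
version = installed_version("sequana-chipseq")
print(version)
```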