sequana · cokelaer · Apr 1, 2026 · Apr 1, 2026
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -17,7 +17,7 @@ jobs:
     strategy:
       max-parallel: 5
       matrix:
-        python: [3.8, 3.9, '3.10']
+        python: ['3.10', '3.11', '3.12']
       fail-fast: false
 
 
@@ -29,7 +29,7 @@ jobs:
         sudo apt-get install libopenblas-dev # for scipy
 
     - name: checkout git repo
-      uses: actions/checkout@v2
+      uses: actions/checkout@v4
 
     - name: conda/mamba
       uses: mamba-org/setup-micromamba@v1
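
Review note on the matrix change above: every version in the new matrix is quoted, which matters for ``3.10``. The sketch below is a rough emulation in plain Python, not a real YAML parser. It assumes an unquoted version would be typed as a float, and ``unquoted_scalar`` is a hypothetical helper written for this note.

```python
def unquoted_scalar(text):
    """Roughly mimic how an unquoted numeric scalar would be typed (floats only)."""
    try:
        return float(text)
    except ValueError:
        # Not numeric: it would stay a string
        return text


# Versions from the updated matrix, treated as if they were NOT quoted
for entry in ["3.10", "3.11", "3.12"]:
    print(entry, "->", unquoted_scalar(entry))
# 3.10 -> 3.1
# 3.11 -> 3.11
# 3.12 -> 3.12
```

Under that assumption an unquoted ``3.10`` would collapse to ``3.1``, so quoting all three entries keeps them consistent as strings.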

diff --git a/README.rst b/README.rst
@@ -2,94 +2,234 @@
 .. image:: https://badge.fury.io/py/sequana-chipseq.svg
      :target: https://pypi.python.org/pypi/sequana_chipseq
 
+.. image:: https://github.com/sequana/chipseq/actions/workflows/main.yml/badge.svg
+   :target: https://github.com/sequana/chipseq/actions/workflows/main.yml
+
+.. image:: https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12-blue.svg
+    :target: https://pypi.python.org/pypi/sequana
+    :alt: Python 3.10 | 3.11 | 3.12
+
 .. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
     :target: http://joss.theoj.org/papers/10.21105/joss.00352
     :alt: JOSS (journal of open source software) DOI
 
-.. image:: https://github.com/sequana/chipseq/actions/workflows/main.yml/badge.svg
-   :target: https://github.com/sequana/chipseq/actions/
-
+This is the **chipseq** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project.
 
-.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
-    :target: https://pypi.python.org/pypi/sequana
-    :alt: Python 3.8 | 3.9 | 3.10
+:Overview: ChIP-seq pipeline from raw reads to peaks, IDR statistics, and functional annotation
+:Input: Paired-end or single-end FastQ files and a CSV experimental design file
+:Output: HTML summary report, narrow/broad peak files, IDR statistics, bigwig tracks, and annotation tables
+:Status: Production
+:Citation: Cokelaer et al, (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI https://doi.org/10.21105/joss.00352
 
 
-This is is the **chipseq** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project
+.. image:: sequana_pipelines/chipseq/dag.png
+   :width: 100%
 
-:Overview: ChIP-seq pipeline to detect peaks using IDR statistics
-:Input: Set of fastq files and a design file
-:Output: HTML reports and various plots and annotation files
-:Status: production
-:Citation: Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352
+.. image:: sequana_pipelines/chipseq/dag_complete.png
+   :width: 100%
 
 
 Installation
 ~~~~~~~~~~~~
 
-Just install this package using Python **pip** software::
+::
 
     pip install sequana_chipseq --upgrade
 
+You will also need the third-party tools listed under Requirements below.
+
+
+Quick Start
+~~~~~~~~~~~
+
+**1. Prepare a design file** ``design.csv``::
+
+    type,condition,replicat,sample_name
+    IP,EXP1,1,IP_EXP1_rep1
+    IP,EXP1,2,IP_EXP1_rep2
+    Input,EXP1,1,Input_EXP1
+
+- ``type`` must be ``IP`` (immunoprecipitated) or ``Input`` (control).
+- ``sample_name`` must match the prefix of the corresponding FastQ file
+  (e.g. ``IP_EXP1_rep1`` matches ``IP_EXP1_rep1_R1_.fastq.gz``).
+- At least two IP replicates per condition are required for IDR analysis.
+
+**2. Prepare a genome directory** named after the genome, containing:
+
+- ``<name>.fa`` — reference genome FASTA
+- ``<name>.gff`` or ``<name>.gff3`` — gene annotation
+
+Example::
+
+    ecoli_MG1655/
+    ├── ecoli_MG1655.fa
+    └── ecoli_MG1655.gff
+
+**3. Set up the pipeline**::
+
+    sequana_chipseq \
+        --input-directory DATAPATH \
+        --genome-directory /path/to/ecoli_MG1655 \
+        --design-file design.csv
+
+**4. Run the pipeline**::
+
+    cd chipseq
+    sh chipseq.sh
+
 
 Usage
 ~~~~~
 
 ::
 
     sequana_chipseq --help
-    sequana_chipseq --input-directory DATAPATH
 
-This creates a directory with the pipeline and configuration file. You will then need
-to execute the pipeline::
+Key pipeline-specific options:
+
+``--genome-directory``
+    Path to the genome directory (must contain ``<name>.fa`` and ``<name>.gff`` or ``<name>.gff3``).
+
+``--design-file``
+    CSV experimental design file (see Quick Start above).
+
+``--aligner-choice``
+    Aligner to use. Currently only ``bowtie2`` is supported.
+
+``--blacklist-file``
+    BED3 file of genomic regions to exclude from analysis (tab-separated:
+    chromosome, start, end).
+
+``--genome-size``
+    Effective genome size for macs3 peak calling. Automatically computed from
+    the FASTA file if not provided; override with a plain integer.
+
+``--do-fingerprints``
+    Enable ``plotFingerprint`` QC to assess ChIP enrichment quality.
+
+Run on a SLURM cluster::
 
     cd chipseq
-    sh chipseq.sh  # for a local run
+    sbatch chipseq.sh
 
-This launch a snakemake pipeline. If you are familiar with snakemake, you can
-retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::
+Or drive Snakemake directly::
 
-    snakemake -s chipseq.rules -c config.yaml --cores 4 --stats stats.txt
+    snakemake -s chipseq.rules --cores 4 --stats stats.txt
 
-Or use `sequanix <https://sequana.readthedocs.io/en/main/sequanix.html>`_ interface.
 
-Requirements
-~~~~~~~~~~~~
+Usage with Apptainer
+~~~~~~~~~~~~~~~~~~~~~
 
-This pipeline requires the following executable(s):
+Run every tool inside pre-built containers — no local tool installation needed::
 
-- idr This python package is not on pypi. Manual installation is required. Instructions are here:
-https://github.com/nboley/idr but we also provide a singularity in https://damona.readthedocs.io
+    sequana_chipseq \
+        --input-directory DATAPATH \
+        --genome-directory /path/to/genome \
+        --design-file design.csv \
+        --use-apptainer
 
-.. image:: https://raw.githubusercontent.com/sequana/chipseq/main/sequana_pipelines/chipseq/dag.png
-.. image:: https://raw.githubusercontent.com/sequana/chipseq/main/sequana_pipelines/chipseq/dag_complete.png
+Store images in a shared location to avoid re-downloading::
 
+    sequana_chipseq ... --use-apptainer --apptainer-prefix ~/.sequana/apptainers
 
-Details
-~~~~~~~~~
+Then run as usual::
+
+    cd chipseq
+    sh chipseq.sh
 
-This pipeline runs **chipseq** in parallel on the input fastq files (paired or not).
-A brief sequana summary report is also produced.
 
+Requirements
+~~~~~~~~~~~~
 
-Rules and configuration details
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The tools listed below must be available; install them via conda/bioconda::
+
+    mamba env create -f environment.yml
+
+- **bowtie2** — read alignment
+- **fastp** — adapter trimming and quality filtering
+- **fastqc** — per-read quality control
+- **samtools** — BAM sorting, indexing, and flagstat
+- **deeptools** — bigwig generation (genomeCoverageBed) and fingerprint QC
+- **ucsc-bedgraphtobigwig** — bedGraph to bigWig conversion
+- **macs3** — narrow and broad peak calling
+- **homer** — peak annotation (``annotatePeaks.pl``)
+- **idr** — Irreproducibility Discovery Rate between replicates
+- **multiqc** — aggregated QC report
+
+
+Pipeline overview
+~~~~~~~~~~~~~~~~~
+
+1. **Trimming** — fastp removes low-quality reads and adapters.
+2. **QC** — FastQC on raw and cleaned reads.
+3. **Alignment** — bowtie2 maps reads to the reference genome.
+4. **[Optional] Mark duplicates** — Picard marks PCR duplicates.
+5. **[Optional] Blacklist removal** — bedtools removes artefact-prone regions.
+6. **bigwig** — per-sample coverage tracks for genome browsers (e.g. IGV).
+7. **[Optional] Fingerprints** — plotFingerprint QC to assess ChIP enrichment.
+8. **Phantom peak** — strand cross-correlation analysis (NSC, RSC, Qtag scores).
+9. **Peak calling** — macs3 detects narrow and broad peaks for each IP vs Input pair.
+10. **FRiP** — Fraction of Reads in Peaks per sample and comparison.
+11. **IDR** — Irreproducibility Discovery Rate on true replicates, pseudo-replicates, and self-pseudo-replicates.
+12. **Annotation** — homer annotates peaks relative to genomic features.
+13. **MultiQC** — aggregated QC across all samples.
+14. **HTML report** — summary with phantom peaks, FRiP plots, IDR tables, and annotation plots.
+
+
+Configuration
+~~~~~~~~~~~~~
+
+Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/chipseq/main/sequana_pipelines/chipseq/config.yaml>`_.
+Key sections:
+
+- ``general`` — aligner choice and genome directory path
+- ``fastp`` — trimming options (length, quality, adapters)
+- ``fastqc`` — FastQC options and threads
+- ``bowtie2_mapping`` / ``bowtie2_index`` — mapping options, threads, memory
+- ``macs3`` — peak calling parameters (genome size, bandwidth, q-value, broad cutoff)
+- ``idr`` — IDR thresholds, rank metric, number of pseudo-replicates
+- ``fingerprints`` — enable/disable and number of bins
+- ``mark_duplicates`` — enable/disable PCR duplicate marking
+- ``remove_blacklist`` — enable/disable and path to BED blacklist
+- ``multiqc`` — MultiQC options
 
-Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/chipseq/main/sequana_pipelines/chipseq/config.yaml>`_
-to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.
 
 Changelog
 ~~~~~~~~~
 
 ========= ====================================================================
 Version   Description
 ========= ====================================================================
-0.11.0    * switch to click and new sequana_pipetools
-0.10.0    * Fix design in case of samples that starts with same prefix
+0.12.0    * Fix ``plot_FRiP``: was iterating over all comparisons inside each
+            rule invocation causing ``FileNotFoundError`` in parallel runs;
+            now processes only its own wildcard
+          * Fix IDR rules (``idr_NT``, ``self_pseudo_replicate_idr``,
+            ``pseudo_replicate_idr``): IDR exits non-zero on sparse data;
+            added ``|| true`` + conditional ``mv`` so the pipeline continues
+            and downstream Python rules handle empty results gracefully
+            (e.g. when there are no peaks and Homer returns an empty DataFrame)
+          * Fix ``fastp`` rule: use ``input.fastq`` / ``output.r1`` /
+            ``output.r2`` to match the sequana-wrappers fastp shell interface;
+            split into paired/single-end branches
+          * Add ``log:`` directives and stderr redirection to rules that were
+            missing them: ``phantom_align``, ``chrom_sizes``, ``fingerprints``,
+            ``bam_to_bed``, ``bed_to_bigwig``, ``pseudo_replicate_idr``
+          * Update ``sequana_tools`` container to ``26.1.14``
+          * Update CI: Python 3.10/3.11/3.12; ``actions/checkout@v4``
+0.11.0    * Switch to click and new sequana_pipetools
+0.10.0    * Fix design in case of samples that start with the same prefix
           * Include final IDR plots and tables
           * Fix containers and wrappers in the config file
           * Better HTML report
 0.9.1     * Fix requirements and setup.py (remove wrong idr package)
-0.9.0     * use latest wrappers and apptainer (for rulegraph)
+0.9.0     * Use latest wrappers and apptainer (for rulegraph)
 0.8.0     **First release.**
 ========= ====================================================================
+
+
+Contribute & Code of Conduct
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To contribute to this project, please take a look at the
+`Contributing Guidelines <https://github.com/sequana/sequana/blob/main/CONTRIBUTING.rst>`_ first. Please note that this project is released with a
+`Code of Conduct <https://github.com/sequana/sequana/blob/main/CONDUCT.md>`_. By contributing to this project, you agree to abide by its terms.
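
Review note on the README's Quick Start: the design-file rules stated there (``type`` is ``IP`` or ``Input``; at least two IP replicates per condition for IDR) can be checked before launching the pipeline. The sketch below is a minimal, standalone illustration and is not the pipeline's own validation code. ``check_design`` and its messages are hypothetical, and the column names are taken from the README example.

```python
import csv
import io
from collections import Counter

ALLOWED_TYPES = {"IP", "Input"}
REQUIRED_COLUMNS = ["type", "condition", "replicat", "sample_name"]


def check_design(text, min_ip_replicates=2):
    """Return a list of problems found in a design CSV passed as a string."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    conditions = set()
    ip_per_condition = Counter()
    # Data rows start on line 2 because line 1 is the header
    for lineno, row in enumerate(reader, start=2):
        conditions.add(row["condition"])
        if row["type"] not in ALLOWED_TYPES:
            problems.append(
                f"line {lineno}: type must be IP or Input, got {row['type']!r}"
            )
        elif row["type"] == "IP":
            ip_per_condition[row["condition"]] += 1

    # IDR needs at least two IP replicates in each condition
    for condition in sorted(conditions):
        if ip_per_condition[condition] < min_ip_replicates:
            problems.append(
                f"condition {condition}: fewer than {min_ip_replicates} IP replicates"
            )
    return problems


design = """type,condition,replicat,sample_name
IP,EXP1,1,IP_EXP1_rep1
IP,EXP1,2,IP_EXP1_rep2
Input,EXP1,1,Input_EXP1
"""
print(check_design(design))  # [] for the README example
```

Returning a list of messages rather than raising on the first error lets a user fix every problem in the design file in one pass.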
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,13 +4,13 @@ build-backend = "poetry.core.masonry.api"
 
 [tool.poetry]
 name = "sequana-chipseq"
-version = "0.11.0"
+version = "0.12.0"
 description = "A ChIP-seq pipeline from raw reads to peaks"
 authors = ["Sequana Team"]
 license = "BSD-3"
-repository = "https://github.com/sequana/rnaseq"
+repository = "https://github.com/sequana/chipseq"
 readme = "README.rst"
-keywords = ["snakemake, sequana, RNAseq, RNADiff, differential analysis"]
+keywords = ["snakemake", "sequana", "ChIP-seq", "differential analysis"]
 classifiers = [
         "Development Status :: 5 - Production/Stable",
         "Intended Audience :: Education",
@@ -19,9 +19,9 @@ classifiers = [
         "Intended Audience :: Science/Research",
         "License :: OSI Approved :: BSD License",
         "Operating System :: POSIX :: Linux",
-        "Programming Language :: Python :: 3.8",
-        "Programming Language :: Python :: 3.9",
         "Programming Language :: Python :: 3.10",
+        "Programming Language :: Python :: 3.11",
+        "Programming Language :: Python :: 3.12",
         "Topic :: Software Development :: Libraries :: Python Modules",
         "Topic :: Scientific/Engineering :: Bio-Informatics",
         "Topic :: Scientific/Engineering :: Information Analysis",
@@ -33,7 +33,7 @@ packages = [
 
 
 [tool.poetry.dependencies]
-python = ">=3.8,<4.0"
+python = ">=3.10,<4.0"
 sequana = ">=0.16.0"
 sequana_pipetools = ">=0.16.4"
 click-completion = "^0.5.2"

diff --git a/sequana_pipelines/chipseq/__init__.py b/sequana_pipelines/chipseq/__init__.py
@@ -1,6 +1,6 @@
-import pkg_resources
-try:
-    version = pkg_resources.require("sequana_chipseq")[0].version
-except:
-    version = ">=0.8.0"
+from importlib.metadata import PackageNotFoundError, version
 
+try:
+    version = version("sequana-chipseq")
+except PackageNotFoundError:
+    version = "unknown"
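
Review note on the ``__init__.py`` change: the new code rebinds the imported ``version`` function to the resulting string. This works, but the function is no longer reachable under that name afterwards. A minimal alternative sketch that keeps the two apart is shown below; ``installed_version`` and the alias ``distribution_version`` are hypothetical names, while ``importlib.metadata.version`` and ``PackageNotFoundError`` are the standard-library calls already used in the diff.

```python
from importlib.metadata import PackageNotFoundError
from importlib.metadata import version as distribution_version


def installed_version(distribution, default="unknown"):
    """Return the installed version of a distribution, or a default if absent."""
    try:
        return distribution_version(distribution)
    except PackageNotFoundError:
        # Raised when the distribution is not installed in this environment
        return default


# Module-level attribute, as in the diff, without shadowing the function
version = installed_version("sequana-chipseq")
```

Catching only ``PackageNotFoundError``, as the diff already does, avoids the bare ``except:`` of the removed ``pkg_resources`` code, which would also have hidden unrelated errors.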