
New module: clustering #11337

Closed
dbaku42 wants to merge 25 commits into nf-core:master from dbaku42:add-clustering-module



dbaku42 commented Apr 28, 2026

Description

New module: clustering

Performs KMeans or DBSCAN clustering on the principal components produced by PLINK2 --pca (the .eigenvec file).
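
For orientation, here is a minimal stdlib-only Python sketch of the KMeans path: parse a PLINK2 .eigenvec table and run Lloyd's algorithm on the PC columns. It is not the module's code. The header layout (`#FID IID PC1 ...` or `#IID PC1 ...`), the helper names `read_eigenvec` and `kmeans`, and seeding the centroids from the first k samples are all assumptions made for illustration. A real module would more likely call a library implementation such as scikit-learn.

```python
import math


def read_eigenvec(text):
    """Parse PLINK2 .eigenvec text into (sample_ids, pc_rows).

    Assumes a '#'-prefixed header such as '#FID IID PC1 PC2' or
    '#IID PC1 PC2'; every column named PC<n> is read as a coordinate.
    """
    rows = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = [h.lstrip("#") for h in rows[0]]
    id_col = header.index("IID")
    pc_cols = [i for i, h in enumerate(header) if h.startswith("PC")]
    ids = [r[id_col] for r in rows[1:]]
    pcs = [[float(r[i]) for i in pc_cols] for r in rows[1:]]
    return ids, pcs


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


example = """#FID IID PC1 PC2
fam1 s1 0.10 0.20
fam1 s2 0.12 0.18
fam2 s3 -0.30 0.05
fam2 s4 -0.28 0.07
"""
ids, pcs = read_eigenvec(example)
labels = kmeans(pcs, k=2)
for sid, lab in zip(ids, labels):
    print(f"{sid},{lab}")  # sample-to-cluster rows, CSV style
```

Seeding from the first k rows keeps the sketch reproducible without a random seed. Library implementations typically use k-means++ initialisation with several restarts instead.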

Features

  • Supports both the kmeans and dbscan algorithms, selected via parameters
  • Uses conda environment (environment.yml)
  • Outputs: sample-to-cluster assignment (*_clusters.csv) and metadata (*_clustering_info.json)
  • Full nf-test coverage (normal + stub test)
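
The dbscan option groups samples by density rather than by a fixed number of clusters. Here is a minimal brute-force sketch in stdlib Python, offered as a hypothetical illustration and not as the module's implementation. It assumes scikit-learn's conventions: `min_samples` counts the point itself, and noise is labelled -1. The PC values, `eps` and `min_samples` in the usage lines are arbitrary.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: clusters are labelled 0..n-1, noise points -1.

    A point is a core point if at least min_samples points (itself
    included) lie within distance eps of it.
    """
    n = len(points)
    # Brute-force O(n^2) neighbourhood lookup; adequate for small cohorts.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joined, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # grow the cluster through core points only
        cluster += 1
    return labels


pcs = [
    [0.10, 0.20], [0.12, 0.18], [0.11, 0.19],     # dense group A
    [-0.30, 0.05], [-0.28, 0.07], [-0.29, 0.06],  # dense group B
    [0.90, -0.90],                                # isolated sample
]
labels = dbscan(pcs, eps=0.05, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike KMeans, this approach needs no cluster count up front and can leave outlying samples unassigned. The trade-off is that the result depends strongly on `eps`, which has to suit the scale of the PC coordinates.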

Author

Checklist

  • nf-core modules lint clustering → passed
  • nf-core modules test clustering → passed (module only)
  • The snpclustering subworkflow still uses a custom container (to be updated separately)

Closes # (if you have an issue)


This PR adds a new standalone module for population clustering.

jfy133 and others added 22 commits April 29, 2026 16:09
* Specify more guidelines on input channels

* Linting

* Updates based on code review

* Update README.md

* Fix broken sentence
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
* add umicollapse

* initial clippy module commit

* find working test data

* complete test

* prettier

* add a doi

* Delete main.nf

* Delete meta.yml

* Delete main.nf

* Delete nextflow.config

* Delete test.yml

* remove umicollapse from pytest

* rename gtf to include that its from gencode

* add test for clippy intergenic mode

* remove quay.io
* initial trycycler subsample commit

* Update modules/nf-core/trycycler/subsample/main.nf

to resolve issue with nf-test and empty gzipped files

Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com>

* addressing comments on initial commit, fixed version number, added stub tests

---------

Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com>
…index files (nf-core#10319)

* Update modules

* Fix meta

* Update samtools stub
- Added flashpca module using custom container ghcr.io/dbaku42/flashpca:2.0
- Fixed file handling for PLINK inputs inside Docker (cp -L + fixed basename)
- Updated nf-test to use proper input tuple
- Test now passes with --profile docker
…ate environment/meta files

- Added containerOptions '--entrypoint ""' to fix argument parsing in Docker
- Separated environment.yml (Conda config only) from meta.yml (module documentation)
- Hardcoded version to 2.0 for snapshot stability
- Fixed dependencies list in environment.yml to prevent linting TypeError
- Remove name field from environment.yml
- Fix meta.yml output structure to match schema
- Update versions output to tuple with meta for topic compatibility
…abling autoMounts

- Add tests/config/nextflow.config with Docker profile
- Update tests/main.nf.test
- Improve container declaration and version capture in main.nf
- Address mount parsing error on GitHub Actions
Performs KMeans or DBSCAN clustering on PLINK2 --pca .eigenvec files.
- Supports both KMeans and DBSCAN
- Uses conda environment
- Includes nf-test with snapshot testing
dbaku42 force-pushed the add-clustering-module branch from 0701e49 to 54bc054 on April 29, 2026 14:10
dbaku42 force-pushed the add-clustering-module branch from 54bc054 to 9e6f155 on April 29, 2026 14:15
dbaku42 force-pushed the add-clustering-module branch from fb9fc21 to ed916d5 on April 29, 2026 14:39
dbaku42 and others added 2 commits April 29, 2026 16:44
- Updated import path from local to nf-core/clustering
- Removed custom container reference for now
dbaku42 closed this May 13, 2026
auto-merge was automatically disabled May 13, 2026 20:49

8 participants