## CRISPR pipeline

### Prerequisites

Nextflow and Docker/Singularity must be installed in your compute environment before running the pipeline:

1. Nextflow (version > 24)
Workflow manager for executing the pipeline:


```bash
conda install bioconda::nextflow
```

2. Singularity
Container platform that must be available in your execution environment.
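A quick way to confirm both prerequisites from a notebook cell is sketched below. It is an illustration, not part of the pipeline: the helper names `parse_nextflow_version` and `check_prerequisites` are hypothetical, the regular expression assumes the version banner contains text of the form `version 24.04.4`, and "version > 24" is read here as "major version 24 or later".

```python
import re
import shutil
import subprocess


def parse_nextflow_version(text):
    """Return (major, minor, patch) from Nextflow's version banner, or None."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def check_prerequisites(min_major=24):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if shutil.which("nextflow") is None:
        problems.append("nextflow not found on PATH")
    else:
        result = subprocess.run(
            ["nextflow", "-version"], capture_output=True, text=True, check=False
        )
        version = parse_nextflow_version(result.stdout + result.stderr)
        if version is None:
            problems.append("could not parse the Nextflow version")
        elif version[0] < min_major:  # assumption: major version 24 is acceptable
            found = ".".join(str(part) for part in version)
            problems.append(f"Nextflow {found} is older than major version {min_major}")
    # Either container runtime satisfies the requirement stated above.
    if not any(shutil.which(tool) for tool in ("singularity", "docker")):
        problems.append("neither singularity nor docker found on PATH")
    return problems
```

Calling `check_prerequisites()` returns a list of messages, so an empty list indicates that Nextflow and at least one container runtime were found.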


### Pipeline installation

To install the pipeline:

```bash
git clone https://github.com/pinellolab/CRISPR_Pipeline.git
```

### Download Data from IGVF Portal

We've created a download pipeline to retrieve all necessary files from the IGVF portal, including FASTQ files, seqspecs, and reference metadata. 

**Pipeline Location:** https://github.com/pinellolab/CRISPR_Pipeline/tree/main/download_pipeline

**Configuration**

Configure the pipeline by editing your `nextflow.config` file with the following parameters:

```groovy 
    params {
        // Portal seqspec availability: set to 'true' if seqspecs are available on the portal
        SEQSPEC_ON_PORTAL = 'false'
        
        // Download options: 'all', 'fastq', or 'other'
        download_option = 'all'
        
        // IGVF authentication - path to your key file
        keypair_json = '/path/to/your/igvf_key.json'
        
        // Dataset identifier
        accession_id = 'IGVFDS4389OUWU'
        
        // Seqspec files (required when SEQSPEC_ON_PORTAL = 'false')
        SEQUENCE_PARSING_scRNA_seqspec_yaml = '/path/to/seqspec/rna_seq_spec.yaml'
        SEQUENCE_PARSING_sgRNA_seqspec_yaml = '/path/to/seqspec/sgrna_seq_spec.yaml'
        SEQUENCE_PARSING_hash_seqspec_yaml = '/path/to/seqspec/hash_seq_spec.yaml'
    }
```

**Required information**

**1. IGVF Authentication Key**

Create an IGVF key JSON file with your credentials:

```json
    { "key": "your_access_key_here", "secret": "your_secret_key_here"}
```
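Because this file holds credentials, it may be worth creating it with owner-only permissions and checking its contents before a long download starts. The sketch below is one way to do that. The function names are hypothetical, and the only format assumed is the two-field JSON shown above.

```python
import json
import os


def write_igvf_key(path, access_key, secret_key):
    """Write the key pair as JSON, readable and writable by the owner only."""
    # Mode 0o600 applies when the file is created; an existing file keeps its mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump({"key": access_key, "secret": secret_key}, handle)


def load_igvf_key(path):
    """Load the key file and fail early if either field is missing or empty."""
    with open(path) as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object")
    for field in ("key", "secret"):
        if not isinstance(data.get(field), str) or not data[field]:
            raise ValueError(f"{path}: missing or empty '{field}' field")
    return data
```

The path passed to `write_igvf_key` is the same one you would later set as `keypair_json` in `nextflow.config`.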

**2. Download Options**

Choose what to download by setting `download_option`:

- **`'all'`** - Downloads FASTQ files, reference files, and metadata
- **`'fastq'`** - Downloads only FASTQ files
- **`'other'`** - Downloads only reference files and metadata

**3. Seqspec Configuration**

- If seqspecs are available on the portal, set `SEQSPEC_ON_PORTAL = 'true'`
- If they are not available on the portal, set `SEQSPEC_ON_PORTAL = 'false'` and provide paths to your local seqspec YAML files
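The rules for these parameters can be written down as a small pre-flight check. This is a sketch under stated assumptions: it does not parse `nextflow.config`, and instead expects a hand-written Python dict that mirrors the `params` block. The function name `check_download_params` is hypothetical, and treating all three seqspec files as required follows the config comment above.

```python
import os

DOWNLOAD_OPTIONS = {"all", "fastq", "other"}
SEQSPEC_KEYS = (
    "SEQUENCE_PARSING_scRNA_seqspec_yaml",
    "SEQUENCE_PARSING_sgRNA_seqspec_yaml",
    "SEQUENCE_PARSING_hash_seqspec_yaml",
)


def check_download_params(params):
    """Return a list of problems found in a dict of download-pipeline params."""
    problems = []
    if params.get("download_option") not in DOWNLOAD_OPTIONS:
        problems.append("download_option must be 'all', 'fastq', or 'other'")
    on_portal = params.get("SEQSPEC_ON_PORTAL")
    if on_portal not in ("true", "false"):
        problems.append("SEQSPEC_ON_PORTAL must be the string 'true' or 'false'")
    elif on_portal == "false":
        # Local seqspec YAML files are required only in this case.
        for key in SEQSPEC_KEYS:
            path = params.get(key)
            if not path:
                problems.append(f"{key} is not set")
            elif not os.path.isfile(path):
                problems.append(f"{key} does not point to a file: {path}")
    return problems
```

An empty list means the mirrored settings are consistent with the rules described in this section.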

**Once configured, run the download pipeline:**

```bash
cd CRISPR_Pipeline/download_pipeline
nextflow run main.nf
```

After the download pipeline finishes, you will have a `samplesheet.tsv` in addition to the FASTQ files, metadata, and seqspecs. The samplesheet lists the full path to every input file.
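Since the samplesheet records full paths, a missing or moved file is easiest to catch before the main pipeline starts. The sketch below makes no assumption about column names: it scans every cell and checks those that look like absolute local paths. The function name is hypothetical, and one path per cell is assumed.

```python
import csv
import os


def missing_local_paths(samplesheet_path):
    """Return (row_number, column, path) for every absolute path that is absent."""
    missing = []
    with open(samplesheet_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row_number, row in enumerate(reader, start=2):  # row 1 is the header
            for column, value in row.items():
                # Only cells that look like absolute local paths are checked;
                # remote URIs such as gs:// and plain metadata are skipped.
                if isinstance(value, str) and value.startswith("/"):
                    if not os.path.exists(value):
                        missing.append((row_number, column, value))
    return missing
```

An empty result from `missing_local_paths("samplesheet.tsv")` means every local path in the sheet resolves on the current machine.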

**Before running the pipeline, customize the `nextflow.config` for your run.**

Run the pipeline:

```bash
cd CRISPR_Pipeline
nextflow run main.nf -profile local
```

### Compute Configuration

Select and configure your compute environment:

**1. Local**

```bash
nextflow run main.nf -profile local --input /path/to/your/samplesheet.tsv --outdir /path/to/your/outdir
```

**2. SLURM Cluster**

```groovy
    slurm {
        process.queue = 'short,normal,long'  // Update partition names
    }
```

```bash
nextflow run main.nf -profile slurm --input /path/to/your/samplesheet.tsv --outdir /path/to/your/outdir
```

**3. Google Cloud Platform**

```groovy
    google_bucket = 'gs://your-bucket-name'
    google_project = 'your-gcp-project-id'
    google_region = 'us-central1'
```

```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/pipeline-service-key.json"
nextflow run main.nf -profile google --input /path/to/your/samplesheet.tsv --outdir /path/to/your/outdir
```
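The three invocations differ only in the profile name, so a notebook can build the command from one helper. The sketch below is illustrative: `build_run_command` is a hypothetical name, it uses only the `-profile`, `--input`, and `--outdir` options shown in this section, and it assumes the `google` profile needs `GOOGLE_APPLICATION_CREDENTIALS` to be exported first.

```python
import os
import shlex

PROFILES = ("local", "slurm", "google")


def build_run_command(profile, samplesheet, outdir, env=None):
    """Return the nextflow command as an argument list for the chosen profile."""
    env = os.environ if env is None else env
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {PROFILES}")
    # The google profile relies on a service-account key exported beforehand.
    if profile == "google" and not env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    return [
        "nextflow", "run", "main.nf",
        "-profile", profile,
        "--input", str(samplesheet),
        "--outdir", str(outdir),
    ]


command = build_run_command("local", "/path/to/your/samplesheet.tsv", "/path/to/your/outdir")
print(shlex.join(command))  # shows the command line without running it
```

To launch the run, the list can be passed to `subprocess.run(command, cwd="CRISPR_Pipeline", check=True)`. Returning a list rather than a string avoids shell quoting problems with paths that contain spaces.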