Merge pull request #358 from VErconi/WipClarificationUsage
Wip clarification usage
JoseEspinosa committed Apr 12, 2024
2 parents 76ee846 + 357cf89 commit 9fbc9b0
Showing 1 changed file with 28 additions and 21 deletions.
49 changes: 28 additions & 21 deletions docs/usage.md
@@ -6,57 +6,64 @@
## Samplesheet input

You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 4 columns, and a header row as shown in the examples below.
Before running the pipeline, create a samplesheet with information about the samples you would like to analyse. This samplesheet lists the files that will be passed as inputs to the pipeline. Use the `--input` parameter to specify the samplesheet location. It has to be a comma-separated file with 4 columns, and a header row as shown in the examples below.

```bash
--input '[path to samplesheet file]'
```

### Multiple replicates

The `sample` identifier is the same when you have multiple biological replicates from the same experimental group, just increment the `replicate` identifier appropriately. The first replicate value for any given experimental group must be 1. Below is an example for a single experimental group in triplicate:
The `sample` identifier is the same when you have multiple biological replicates from the same experimental group, just increment the `replicate` identifier appropriately. The first replicate value for any given experimental group must be 1. Below is an example for the analysis of a paired-end ATAC-seq experiment performed in triplicate for the cell line "A":

```console
sample,fastq_1,fastq_2,replicate
CONTROL,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,1
CONTROL,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,2
CONTROL,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,3
SAMPLE_A,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,1
SAMPLE_A,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,2
SAMPLE_A,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,3
```

The pipeline will automatically append the `&_REP<BIOLOGICAL_REPLICATE_NUMBER>` suffix to the sample name within the pipeline e.g. `CONTROL_REP1`, `CONTROL_REP2` and `CONTROL_REP3` using the example above. If you don't have replicates you can set the `replicate` value to 1 for all of your samples.
The pipeline will automatically append the `&_REP<BIOLOGICAL_REPLICATE_NUMBER>` suffix to the sample name within the pipeline e.g. `SAMPLE_A_REP1`, `SAMPLE_A_REP2` and `SAMPLE_A_REP3` using the example above. If you don't have replicates you can set the `replicate` value to 1 for all of your samples. Below is an example for the analysis of a paired-end ATAC-seq experiment performed without replicates for the cell line "A" and for the tissue "B":

```console
sample,fastq_1,fastq_2,replicate
SAMPLE_A,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,1
SAMPLE_B,BEG599B2_S1_L003_R1_001.fastq.gz,BEG599B2_S1_L003_R2_001.fastq.gz,1
```
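The replicate rules above can be checked before launching the pipeline. The following is a minimal sketch, not part of the pipeline itself: the function name `biological_replicate_names` and the shortened file names are hypothetical, and the `<sample>_REP<n>` naming is inferred from the `SAMPLE_A_REP1` examples in this section.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ["sample", "fastq_1", "fastq_2", "replicate"]


def biological_replicate_names(samplesheet_text):
    """Return the <sample>_REP<n> names implied by a samplesheet (illustrative only)."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    # The first four columns must match the required header.
    if (reader.fieldnames or [])[:4] != REQUIRED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")

    # Collect the distinct replicate numbers seen for each sample.
    replicates = defaultdict(set)
    for row in reader:
        replicates[row["sample"]].add(int(row["replicate"]))

    names = []
    for sample, numbers in replicates.items():
        # The first replicate of every experimental group must be 1.
        if min(numbers) != 1:
            raise ValueError(f"{sample}: replicate numbering must start at 1")
        names.extend(f"{sample}_REP{n}" for n in sorted(numbers))
    return names


# Hypothetical, shortened file names for illustration.
example = """sample,fastq_1,fastq_2,replicate
SAMPLE_A,a_L002_R1.fastq.gz,a_L002_R2.fastq.gz,1
SAMPLE_A,a_L003_R1.fastq.gz,a_L003_R2.fastq.gz,2
SAMPLE_B,b_L003_R1.fastq.gz,b_L003_R2.fastq.gz,1
"""
print(biological_replicate_names(example))
# ['SAMPLE_A_REP1', 'SAMPLE_A_REP2', 'SAMPLE_B_REP1']
```

A samplesheet whose first replicate for a group is 2 would raise a `ValueError` here, which mirrors the requirement stated above rather than the pipeline's own error handling.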

### Multiple runs of the same sample

The `sample` and `replicate` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will perform the alignments in parallel, and subsequently merge them before further analysis. Below is an example a sample sequenced across 3 lanes:
The `sample` and `replicate` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will perform the alignments in parallel, and subsequently merge them before further analysis. Below is an example of what the samplesheet for `SAMPLE_A` would look like if it were sequenced across 3 lanes:

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,replicate
CONTROL,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,1
CONTROL,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,1
CONTROL,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,1
SAMPLE_A,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,1
SAMPLE_A,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,1
SAMPLE_A,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,1
```

The pipeline will automatically append the `*_T<TECHNICAL_REPLICATE_NUMBER>` suffix to the sample name within the pipeline e.g. `CONTROL_REP1_T1`, `CONTROL_REP1_T2` and `CONTROL_REP1_T3` using the example above.
The pipeline will automatically append the `*_T<TECHNICAL_REPLICATE_NUMBER>` suffix to the sample name within the pipeline e.g. `SAMPLE_A_REP1_T1`, `SAMPLE_A_REP1_T2` and `SAMPLE_A_REP1_T3` using the example above.
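To make the naming concrete, here is a small sketch of how rows sharing the same `sample` and `replicate` could be numbered as technical replicates. It is an illustration under assumptions, not the pipeline's code: the function name `technical_replicate_names` is hypothetical, and numbering rows in their order of appearance (including giving a single run the suffix `_T1`) is an assumption.

```python
from collections import defaultdict


def technical_replicate_names(rows):
    """rows: (sample, replicate) tuples in samplesheet order (illustrative only)."""
    seen = defaultdict(int)
    names = []
    for sample, replicate in rows:
        # Assumption: runs are numbered in the order they appear in the samplesheet.
        seen[(sample, replicate)] += 1
        names.append(f"{sample}_REP{replicate}_T{seen[(sample, replicate)]}")
    return names


# Three lanes of the same sample and biological replicate.
rows = [("SAMPLE_A", 1), ("SAMPLE_A", 1), ("SAMPLE_A", 1)]
print(technical_replicate_names(rows))
# ['SAMPLE_A_REP1_T1', 'SAMPLE_A_REP1_T2', 'SAMPLE_A_REP1_T3']
```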

### Control data
### INPUT control data

If controls are to be used for peak calling use the parameter `--with_control`. In this case, the samplesheet file needs the additional columns `control` and `control_replicate`. These should be the sample identifier and sample replicate for the controls.
An input control is a file that can be used during peak calling to estimate the background of the experiment. If sequencing data for input controls are available, they can be used for peak calling by setting the parameter `--with_control`. In this case, the samplesheet file needs the additional columns `control` and `control_replicate`. These should be the sample identifier and sample replicate for the input controls, as in the example below.

### Full samplesheet

The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 4 columns to match those defined in the table below.

A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 7 samples, where we have biological triplicates for both the `CONTROL` and `TREATMENT` groups, and the third replicate in the `TREATMENT` group has been a technical replicate as a result of being sequenced twice.
A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 7 samples, where we have biological triplicates for both the control condition (e.g. cell line "A" untreated, `UNTREATED_A`) and the treatment condition (e.g. cell line "A" treated with a compound, `TREATED_A`), and the third replicate in the `TREATED_A` group has a technical replicate as a result of being sequenced twice. In this example, a single INPUT control with no replicates, `INPUT_A`, is available.

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,replicate,control,control_replicate
CONTROL,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,1,,
CONTROL,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz,2,,
CONTROL,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz,3,,
TREATMENT,AEG588A4_S4_L003_R1_001.fastq.gz,,1,CONTROL,1
TREATMENT,AEG588A5_S5_L003_R1_001.fastq.gz,,2,CONTROL,2
TREATMENT,AEG588A6_S6_L003_R1_001.fastq.gz,,3,CONTROL,3
TREATMENT,AEG588A6_S6_L004_R1_001.fastq.gz,,3,CONTROL,3
INPUT_A,IEG577I1_S1_L001_R1_001.fastq.gz,IEG577I1_S1_L002_R2_001.fastq.gz,1,,
UNTREATED_A,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,1,INPUT_A,1
UNTREATED_A,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz,2,INPUT_A,1
UNTREATED_A,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz,3,INPUT_A,1
TREATED_A,AEG588A4_S4_L003_R1_001.fastq.gz,,1,INPUT_A,1
TREATED_A,AEG588A5_S5_L003_R1_001.fastq.gz,,2,INPUT_A,1
TREATED_A,AEG588A6_S6_L003_R1_001.fastq.gz,,3,INPUT_A,1
TREATED_A,AEG588A6_S6_L004_R1_001.fastq.gz,,3,INPUT_A,1
```
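Before running with `--with_control`, it can help to confirm that every `control` and `control_replicate` pair points at a row that exists in the samplesheet. The sketch below is a hypothetical pre-flight check, not the pipeline's validation: the function name `describe_rows` and the shortened file names are made up, and treating an empty `fastq_2` field as single-end is an assumption based on the samplesheet above.

```python
import csv
import io


def describe_rows(samplesheet_text):
    """Check control references and report the read layout of each row (illustrative only)."""
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    # Every (sample, replicate) pair that a control reference may point at.
    known = {(row["sample"], row["replicate"]) for row in rows}
    report = []
    for row in rows:
        # Assumption: an empty fastq_2 field marks a single-end row.
        layout = "paired-end" if row["fastq_2"] else "single-end"
        control = row["control"]
        if control and (control, row["control_replicate"]) not in known:
            raise ValueError(
                f"{row['sample']}: control {control} replicate "
                f"{row['control_replicate']} is not in the samplesheet"
            )
        report.append((row["sample"], row["replicate"], layout, control or None))
    return report


# Hypothetical, shortened file names for illustration.
example = """sample,fastq_1,fastq_2,replicate,control,control_replicate
INPUT_A,input_R1.fastq.gz,input_R2.fastq.gz,1,,
UNTREATED_A,untreated_R1.fastq.gz,untreated_R2.fastq.gz,1,INPUT_A,1
TREATED_A,treated_R1.fastq.gz,,1,INPUT_A,1
"""
for entry in describe_rows(example):
    print(entry)
```

Rows with empty `control` fields, such as the input control itself, are accepted without a lookup.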

| Column | Description |