# RNA-seq Pipeline

**Folder Structure**

- {**date**}{**project_name**}
    - **samples (needs to be added manually)**
    - analysis
    - results
    - **workflow (needs to be added manually)**
    - 📝 Metadata
    - 📝 Snakefile
    - 📝 README.md

To-Dos

- Add conditional indexing

For contributors:
- Common errors and their solutions:
    - Error: Not all output, log and benchmark files of rule alignment contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
        - This error means that the output, log, and benchmark paths of the rule do not all use the same set of wildcards, so two or more jobs could write to the same file.
        - For example, a log path of {sample}.log does not match an output path that uses {sample}_{lane}; the log would need to be {sample}_{lane}.log.
        - With {sample}.log, every lane of a sample writes to the same log file, so each lane overwrites the log of the previous one.
    - Error: dag2.dot: syntax error in line 1 near 'All'
        - The .dot file has extra lines at the beginning, before the graph definition. Removing them so that the file starts with the graph itself lets it parse.
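
Both errors above can be checked with a few lines of Python. The following is a minimal sketch, not part of the pipeline: the function names and the example paths are hypothetical, the wildcard pattern only handles simple `{name}` and `{name,constraint}` forms, and the cleanup assumes the graph definition begins at the first line that starts with `digraph`.

```python
import re


def wildcard_names(pattern):
    """Return the set of wildcard names used in a Snakemake-style path pattern."""
    # Matches {name} and {name,constraint}; only the name part is kept.
    return set(re.findall(r"\{([A-Za-z_]\w*)(?:,[^{}]*)?\}", pattern))


def mismatched_wildcards(patterns):
    """Map each pattern to the wildcards it is missing compared with the others."""
    all_names = set().union(*(wildcard_names(p) for p in patterns))
    missing = {}
    for p in patterns:
        lacking = all_names - wildcard_names(p)
        if lacking:
            missing[p] = lacking
    return missing


def strip_dot_preamble(text):
    """Drop every line that comes before the first line starting with 'digraph'."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.lstrip().startswith("digraph"):
            return "".join(lines[i:])
    raise ValueError("no line starting with 'digraph' found")


# Hypothetical paths for a rule: the log pattern lacks the {lane} wildcard.
print(mismatched_wildcards(["aligned/{sample}_{lane}.bam", "logs/{sample}.log"]))
```

An empty result from `mismatched_wildcards` means all paths of the rule share the same wildcards. `strip_dot_preamble` can be applied to the contents of the .dot file before it is passed to Graphviz.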

Choose the thread count for each rule carefully.
If you assign too many threads to each rule, you reduce the number of jobs that can run in parallel. For instance:

If each rule uses 14 threads and you have 44 cores, only 3 jobs can run in parallel (14 threads × 3 jobs = 42 threads used), and the remaining 2 cores sit idle.
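
The arithmetic above can be reproduced with a short helper. This is an illustrative sketch; the function name `parallel_jobs` is made up for this example, and it assumes every job uses the same number of threads.

```python
def parallel_jobs(cores, threads_per_job):
    """Return (jobs that can run at the same time, cores left idle)."""
    if cores < 1 or threads_per_job < 1:
        raise ValueError("cores and threads_per_job must be positive")
    # Integer division gives the job count, the remainder gives the idle cores.
    return divmod(cores, threads_per_job)


for threads in (14, 8, 4):
    jobs, idle = parallel_jobs(44, threads)
    print(f"{threads} threads per job: {jobs} jobs in parallel, {idle} cores idle")
```

For 44 cores and 14 threads per job this gives 3 jobs and 2 idle cores, matching the example above.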

However, many bioinformatics tools such as FastQC, MultiQC, Fastp, and even Kallisto often do not need that many threads. Tools like HISAT2 or StringTie benefit more from multithreading, but even they usually show diminishing returns beyond a certain number of threads (e.g., 8-12).

The goal is to strike a balance between running jobs with enough threads to maximize performance and leaving enough CPUs free to run multiple jobs simultaneously. Here's a suggestion:

- FastQC: It processes one file per thread, so extra threads only help when a job has several input files. You could set threads: 2 for FastQC.
- HISAT2: It scales well with multiple threads, but going beyond 8-12 threads often gives diminishing returns. Set threads: 8 for HISAT2.
- StringTie: It also benefits from multithreading, with similarly small gains beyond 8-12 threads. Set threads: 8 for StringTie.
- Kallisto: It’s lightweight and efficient, so 4-6 threads is reasonable (for example, threads: 4).
- Fastp: It can use multiple threads, but 4-6 should be sufficient (for example, threads: 4).
- MultiQC: It typically doesn't need many threads, so 1-2 should be fine (for example, threads: 1).
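
To see how these per-rule values affect parallelism, the sketch below fills a core budget with a mixed queue of jobs. It is a deliberately simple first-fit loop and not Snakemake's actual scheduler. The dictionary uses the lower end of each suggested range, and the queue contents are hypothetical.

```python
# Suggested thread counts from the list above (lower end of each range).
SUGGESTED_THREADS = {
    "fastqc": 2,
    "hisat2": 8,
    "stringtie": 8,
    "kallisto": 4,
    "fastp": 4,
    "multiqc": 1,
}


def jobs_started(cores, queue, threads_by_rule):
    """Start queued jobs in order while they fit; return (started jobs, free cores)."""
    free = cores
    started = []
    for rule in queue:
        need = threads_by_rule[rule]
        if need <= free:
            started.append(rule)
            free -= need
    return started, free


# Hypothetical queue: 4 alignment jobs, 2 trimming jobs, 2 QC jobs.
queue = ["hisat2"] * 4 + ["fastp"] * 2 + ["fastqc"] * 2

balanced, free_balanced = jobs_started(44, queue, SUGGESTED_THREADS)
uniform, free_uniform = jobs_started(44, queue, {rule: 14 for rule in SUGGESTED_THREADS})

print(f"suggested threads: {len(balanced)} jobs running, {free_balanced} cores free")
print(f"14 threads each:   {len(uniform)} jobs running, {free_uniform} cores free")
```

With the suggested values all 8 queued jobs fit into 44 cores at once, whereas with 14 threads per rule only 3 of them can start.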
