# DRS Mapping
Use IsoQuant to align DRS (direct RNA sequencing) data to the genome, calibrate the alignments with second-generation (short-read) sequencing data, and then obtain transcript boundary information.

In [1]:
import os
import subprocess

# Project paths and helpers from the local spider_silkome_module package
from spider_silkome_module import (
    RAW_DATA_DIR,
    INTERIM_DATA_DIR,
    PROCESSED_DATA_DIR,
    GeneralGFF,
    run_shell_command_with_check,
)

2025-10-17 13:22:41.294 | INFO     | spider_silkome_module.config:<module>:11 - PROJ_ROOT path is: /home/gyk/project/spider_silkome


## Create Genome STAR index

Note: STAR needs to be installed in advance. Download link: https://github.com/alexdobin/STAR

In [2]:
genome_file = f"{RAW_DATA_DIR}/spider_genome/Trichonephila_clavata.fa"
spider = "Trichonephila_clavata"
# genome_index_dir = f"{INTERIM_DATA_DIR}/star_index/{spider}"
# os.makedirs(genome_index_dir, exist_ok=True)
# star_index_cmd = f"STAR --runThreadN 70 --runMode genomeGenerate --genomeDir {genome_index_dir} --genomeFastaFiles {genome_file}"
# subprocess.run(star_index_cmd, shell=True)
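
The commented-out index step can also be written with an argument list instead of a shell string, which avoids quoting problems in paths and lets `check=True` surface a failed STAR run. This is only a minimal sketch: `build_star_index_cmd`, `build_star_index`, and the default thread count are hypothetical names and values, and the STAR flags are the ones already shown in the cell above.

```python
import os
import subprocess


def build_star_index_cmd(genome_fasta, index_dir, threads=8):
    """Return the STAR genomeGenerate command as an argument list."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", str(index_dir),
        "--genomeFastaFiles", str(genome_fasta),
    ]


def build_star_index(genome_fasta, index_dir, threads=8):
    """Create the index directory and run STAR.

    Raises subprocess.CalledProcessError if STAR exits with a non-zero status.
    """
    os.makedirs(index_dir, exist_ok=True)
    subprocess.run(build_star_index_cmd(genome_fasta, index_dir, threads), check=True)
```

In the notebook this would be called as `build_star_index(genome_file, genome_index_dir, threads=70)`, assuming STAR is on the `PATH`.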

## Create BGI RNA-seq BAM files

**Note:** Make sure Nextflow is installed on your system.

In this section, we use nf-core/rnaseq to create BAM files from the BGI RNA-seq reads, for use with the DRS data.

Please prepare the `nf-params.json` and `samplesheet.csv` files in the `RNA-seq_workflow` directory according to the [nf-core/rnaseq documentation](https://nf-co.re/rnaseq).

Run `nextflow run nf-core/rnaseq -r 3.19.0 -name BGI_RNA-seq -profile docker -params-file nf-params.json` in `RNA-seq_workflow` directory.
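
The two input files can be generated from Python rather than written by hand. The sketch below assumes the samplesheet columns (`sample`, `fastq_1`, `fastq_2`, `strandedness`) and the `input`, `outdir`, `fasta`, and `gff` parameters described in the nf-core/rnaseq documentation. The helper names, the sample name, and every path in the usage note are hypothetical placeholders, so check them against the documentation for release 3.19.0 before use.

```python
import csv
import json

SAMPLESHEET_FIELDS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def write_samplesheet(rows, path):
    """Write a samplesheet CSV with one dict per sequencing library."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=SAMPLESHEET_FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def write_params(params, path):
    """Write pipeline parameters as JSON for use with -params-file."""
    with open(path, "w") as handle:
        json.dump(params, handle, indent=2)
```

A possible call, with placeholder values only:

```python
write_samplesheet(
    [{"sample": "sample_1", "fastq_1": "reads_1.fq.gz",
      "fastq_2": "reads_2.fq.gz", "strandedness": "auto"}],
    "samplesheet.csv",
)
write_params(
    {"input": "samplesheet.csv", "outdir": "results",
     "fasta": "genome.fa", "gff": "annotation.gff"},
    "nf-params.json",
)
```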

## DRS Mapping

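Some errors occurred when running DRS mapping, so the GFF file must be fixed with `agat_sp_fix_features_locations_duplicated.pl` before running the next step. The cell below first keeps only `gene`, `mRNA`, `exon`, and `CDS` features with `awk`, then standardizes the filtered file with `agat_convert_sp_gxf2gxf.pl`.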

In [None]:

!awk -F'\t' '$3 ~ /^(gene|mRNA|exon|CDS)$/' /home/gyk/project/spider_silkome/data/raw/spider_genome/Trichonephila_clavata.gff > /home/gyk/project/spider_silkome/data/raw/spider_genome/Trichonephila_clavata_fixed.gff
!pixi run --environment agat agat_convert_sp_gxf2gxf.pl \
    --gff /home/gyk/project/spider_silkome/data/raw/spider_genome/Trichonephila_clavata_fixed.gff \
    --output /home/gyk/project/spider_silkome/data/raw/spider_genome/Trichonephila_clavata_fixed_agat.gff
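
The `awk` filter above can be reproduced in Python when the notebook has to run without shell escapes. This is a sketch under the same assumption as the `awk` call, namely that the feature type is in the third tab-separated column; `filter_gff_lines` and `KEEP_TYPES` are hypothetical names. Like the `awk` version, it drops header and comment lines, because those have no third column.

```python
KEEP_TYPES = {"gene", "mRNA", "exon", "CDS"}


def filter_gff_lines(lines, keep_types=KEEP_TYPES):
    """Yield GFF lines whose feature type (column 3) is in keep_types."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Lines with fewer than three columns (comments, blanks) are skipped.
        if len(fields) >= 3 and fields[2] in keep_types:
            yield line
```

It can be applied by opening the input and output files and calling `dst.writelines(filter_gff_lines(src))`.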


Note: if a `ValueError: Duplicate ID xxxxxx` is raised, fix the duplicated ID manually in the GFF file.
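
Duplicated IDs can be located before the run instead of fixing them one error at a time. The sketch below counts the `ID=` attribute in column 9 of each GFF record; `find_duplicate_ids` is a hypothetical helper, and it assumes GFF3-style `key=value` attributes separated by semicolons. GFF3 allows the segments of one CDS to share an ID, so inspect each hit before renaming it.

```python
from collections import Counter


def find_duplicate_ids(lines):
    """Return {feature_id: count} for every ID that appears more than once."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        for attribute in fields[8].split(";"):
            key, _, value = attribute.strip().partition("=")
            if key == "ID" and value:
                counts[value] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}
```

Calling it on the open annotation file, for example `find_duplicate_ids(open(gene_annotation_file))`, lists the IDs that need attention.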

In [4]:
gene_annotation_file = genome_file.replace(".fa", "_fixed_agat.gff")
fastq_file = f"{RAW_DATA_DIR}/Tclav-F1_Ar-28-Trcl-f/pass.fq.gz"
isoquant_output_dir = f"{INTERIM_DATA_DIR}/03.DRS_mapping/isoquant"
isoquant_cmd = f"isoquant.py --genedb {gene_annotation_file} --reference {genome_file} --fastq {fastq_file} --data_type nanopore -o {isoquant_output_dir}"
run_shell_command_with_check(isoquant_cmd, isoquant_output_dir, force=True)

2025-10-17 14:23:42.026 | INFO     | spider_silkome_module.features:run_shell_command_with_check:50 - Execute command: isoquant.py --genedb /home/gyk/project/spider_silkome/data/raw/spider_genome/Trichonephila_clavata_fixed_agat.gff --reference /home/gyk/project/spider_silkome/data/raw/spider_genome/Trichonephila_clavata.fa --fastq /home/gyk/project/spider_silkome/data/raw/Tclav-F1_Ar-28-Trcl-f/pass.fq.gz --data_type nanopore -o /home/gyk/project/spider_silkome/data/interim/03.DRS_mapping/isoquant
2025-10-17 14:23:42,538 - INFO - Running IsoQuant version 3.9.0
2025-10-17 14:23:51,548 - INFO - Overwriting the previous run
2025-10-17 14:23:52,553 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2025-10-17 14:23:52,554 - INFO -  === IsoQuant pipeline started === 
2025-10-17 14:23:52,554 - INFO - Python version: 3.10.18 | packaged by conda-forge | (main, Jun  4 2025, 14:45:41) [

True
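
Once the run finishes, transcript boundaries can be read from the transcript records of a GTF file. The sketch below is not part of the notebook: `transcript_boundaries` is a hypothetical helper, and it assumes that IsoQuant writes its transcript models as a standard GTF file in the output directory, with a `transcript` feature type and a `transcript_id "..."` attribute. Check the actual file name and location in your own output directory.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_boundaries(gtf_lines):
    """Map transcript_id -> (seqid, start, end, strand) from GTF transcript records."""
    boundaries = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only whole-transcript records carry the outer boundaries.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            boundaries[match.group(1)] = (
                fields[0],
                int(fields[3]),
                int(fields[4]),
                fields[6],
            )
    return boundaries
```

Coordinates are returned as they appear in the GTF, that is 1-based and inclusive.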