Can I use this tool for plants? #31

Open
hanshanmengqi opened this issue Oct 26, 2023 · 3 comments

@hanshanmengqi

Hi,

Thank you for your interesting tool.

I work on plants, so can I use it for plants?

Best,
Han

@Ziwei-Liu

Yes.

Here are my command lines:

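# Run ROSE once per sample; each sample needs <sample>.chr.bam and <sample>.cut.bed
for bam in *.chr.bam; do
    i=${bam%.chr.bam}   # strip the suffix to get the sample name
    nohup ROSE_main.py --custom *_refseq.ucsc -i "${i}.cut.bed" -r "${i}.chr.bam" -o "./${i}/" 2>"${i}.log" &
done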

We only have to ensure that:

  1. The chromosome names in your bam file and peak file start with 'chr'. If you have contigs, name them 'chrC1', 'chrC2', or 'chrContig1', 'chrContig2', or anything else beginning with 'chr'.
  2. Peak files can be the narrowPeak files produced by macs2 directly; just remember to change the suffix to .bed.
  3. Your genome annotation file must be in the same format as the UCSC table track format. If your annotation is a gff3 file, convert it with a program called gff3ToGenePred. Remember to add an index (bin) column and a header row to the converted file, because the conversion alone does not make the format exactly match the examples provided in the program's annotation folder.
    A typical gff3 file:
##gff-version 3
chrC1   EVM     gene    118111  135837  .       +       .       ID=Contig1G000001;
chrC1   EVM     mRNA    118111  135837  .       +       .       ID=Contig1G000001.mRNA1;Parent=Contig1G000001
chrC1   EVM     exon    118111  118122  .       +       .       ID=Contig1G000001.exon1;Parent=Contig1G000001.mRNA1
chrC1   EVM     CDS     118111  118122  .       +       0       ID=Contig1G000001.cds1;Parent=Contig1G000001.mRNA1
chrC1   EVM     exon    120459  120548  .       +       .       ID=Contig1G000001.exon2;Parent=Contig1G000001.mRNA1
chrC1   EVM     CDS     120459  120548  .       +       0       ID=Contig1G000001.cds2;Parent=Contig1G000001.mRNA1

A genePred file converted with gff3ToGenePred:

Contig1G000005.mRNA1    chrC1   +       185081  186707  185081  186707  2       185081,186044,  185093,186707,  0       Contig1G000005  cmpl    cmpl    0,0,
Contig1G000004.mRNA1    chrC1   +       153060  171316  153060  171316  14      153060,153415,153606,153849,155852,156537,156725,160188,161266,161580,164263,166108,166471,171012,      153075,153490,153816,153899,155865,156612,157021,160276,161473,161622,164343,166186,167116,171316,   0       Contig1G000004  cmpl    cmpl    0,0,0,0,1,0,0,1,0,0,0,1,1,1,
Contig1G000003.mRNA1    chrC1   +       148519  149466  148519  149466  4       148519,148766,149001,149196,    148607,148885,149076,149466,    0       Contig1G000003  cmpl    cmpl    0,2,0,0,
Contig1G000002.mRNA1    chrC1   +       136234  137231  136234  137231  3       136234,136564,136919,   136246,136639,137231,   0       Contig1G000002  cmpl    cmpl    0,0,0,
Contig1G000001.mRNA1    chrC1   +       118110  135837  118110  135837  7       118110,120458,121255,128550,128987,129666,135809,       118122,120548,121489,128646,129036,129703,135837,       0       Contig1G000001  cmpl    cmpl    0,0,0,0,0,2,1,

An example from the repository's annotation folder:

#bin	name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds	score	name2	cdsStartStat	cdsEndStat	exonFrames
0	NR_075077	chr1	-	67092175	67134971	67134971	67134971	10	67092175,67096251,67103237,67111576,67113613,67115351,67125751,67127165,67131141,67134929,	67093604,67096321,67103382,67111644,67113756,67115464,67125909,67127257,67131227,67134971,	0	C1orf141	unk	unk	-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
0	NM_001276352	chr1	-	67092175	67134971	67093579	67127240	9	67092175,67096251,67103237,67111576,67115351,67125751,67127165,67131141,67134929,	67093604,67096321,67103382,67111644,67115464,67125909,67127257,67131227,67134971,	0	C1orf141	cmpl	cmpl	2,1,0,1,2,0,0,-1,-1,
0	NM_001276351	chr1	-	67092175	67134971	67093004	67127240	8	67092175,67095234,67096251,67115351,67125751,67127165,67131141,67134929,	67093604,67095421,67096321,67115464,67125909,67127257,67131227,67134971,	0	C1orf141	cmpl	cmpl	0,2,1,2,0,0,-1,-1,
0	NM_000299	chr1	+	201283451	201332993	201283702	201328836	15	201283451,201293941,201313165,201316552,201317571,201318617,201319815,201320266,201321977,201323012,201324427,201324940,201325753,201328761,201330073,	201283904,201294045,201313560,201316697,201317779,201318795,201319878,201320381,201322133,201323189,201324581,201325127,201325838,201328868,201332993,	0	PKP1	cmpl	cmpl	0,1,0,2,0,1,2,2,0,0,0,1,2,0,-1,

So remember to add the bin column and the header line manually.
A custom annotation file as finally provided to ROSE:

#bin    name    chrom   strand  txStart txEnd   cdsStart        cdsEnd  exonCount       exonStarts      exonEnds        score   name2   cdsStartStat    cdsEndStat      exonFrames
0       Contig1G000010.mRNA1    chrC1   +       249212  251406  249212  251406  4       249212,249358,249619,251001,    249227,249503,249801,251406,    0       Contig1G000010  cmpl    cmpl    0,0,2,0,
0       Contig1G000009.mRNA1    chrC1   +       222452  247955  222452  247955  19      222452,225441,227650,227864,228265,235538,235788,236674,236869,239211,239470,241009,242202,242448,244280,244530,245983,246205,247562,   222461,225576,227722,228133,228343,235755,235841,236764,236951,239344,239728,241391,242308,242582,244497,244583,246078,246315,247955,        0       Contig1G000009  cmpl    cmpl    0,0,0,0,1,1,0,1,1,0,2,2,1,0,1,0,1,2,0,
0       Contig1G000008.mRNA1    chrC1   +       220918  222208  220918  222208  4       220918,221411,221849,222117,    220975,221483,222079,222208,    0       Contig1G000008  cmpl    cmpl    0,0,0,1,
0       Contig1G000007.mRNA1    chrC1   +       207537  210311  207537  210311  5       207537,208815,209732,209965,210233,     207558,208941,209804,210193,210311,     0       Contig1G000007  cmpl    cmpl    0,0,0,0,0,
0       Contig1G000006.mRNA1    chrC1   +       198072  199140  198072  199140  4       198072,198399,198570,199094,    198084,198471,198896,199140,    0       Contig1G000006  cmpl    cmpl    0,0,0,1,
0       Contig1G000005.mRNA1    chrC1   +       185081  186707  185081  186707  2       185081,186044,  185093,186707,  0       Contig1G000005  cmpl    cmpl    0,0,
0       Contig1G000004.mRNA1    chrC1   +       153060  171316  153060  171316  14      153060,153415,153606,153849,155852,156537,156725,160188,161266,161580,164263,166108,166471,171012,      153075,153490,153816,153899,155865,156612,157021,160276,161473,161622,164343,166186,167116,171316,   0       Contig1G000004  cmpl    cmpl    0,0,0,0,1,0,0,1,0,0,0,1,1,1,
0       Contig1G000003.mRNA1    chrC1   +       148519  149466  148519  149466  4       148519,148766,149001,149196,    148607,148885,149076,149466,    0       Contig1G000003  cmpl    cmpl    0,2,0,0,
0       Contig1G000002.mRNA1    chrC1   +       136234  137231  136234  137231  3       136234,136564,136919,   136246,136639,137231,   0       Contig1G000002  cmpl    cmpl    0,0,0,
0       Contig1G000001.mRNA1    chrC1   +       118110  135837  118110  135837  7       118110,120458,121255,128550,128987,129666,135809,       118122,120548,121489,128646,129036,129703,135837,       0       Contig1G000001  cmpl    cmpl    0,0,0,0,0,2,1,
  4. The annotation file must be named *_refseq.ucsc, so remember to rename your custom annotation file after converting it to the required format.
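
For anyone who would rather not add the bin column and header row by hand, the step can be sketched with awk. This is only a sketch under a few assumptions: the input is the tab-separated, 15-column output of gff3ToGenePred, a bin value of 0 is acceptable (as in the examples above), and the header is the one shown in the example annotation file. The function name add_bin_and_header and the output file name are placeholders, not part of ROSE.

```shell
# Hypothetical helper: prepend a bin column (always 0) and a UCSC-style
# header row to the genePred file given as the first argument.
add_bin_and_header() {
    awk 'BEGIN {
             FS = OFS = "\t"
             # Header copied from the example annotation file in this thread
             print "#bin", "name", "chrom", "strand", "txStart", "txEnd", \
                   "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds", \
                   "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames"
         }
         NF > 0 { print 0, $0 }   # skip empty lines, put 0 in front of each record
        ' "$1"
}

# Example usage (file names are placeholders):
# add_bin_and_header output.Gp > mygenome_refseq.ucsc
```

Writing the result straight to a file ending in _refseq.ucsc also takes care of the naming requirement in point 4.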

This tool is easy to use and powerful. I like it too.

@hanshanmengqi

Dear Liu,

Thank you so much for your detailed reply.

However, I encountered an error with gff3ToGenePred. The error message is: 'CDS feature must have phase.'
My command is: gff3ToGenePred input.gff3 output.Gp
Do you have any suggestions on how to resolve this?

Best,
Han

@Ziwei-Liu


Maybe you should check whether your gff file has the phase of each CDS annotated correctly. To do so, look at the 8th column of the lines marked as CDS in the 3rd column and make sure it is one of the three numbers 0, 1, or 2, and not any other symbol such as '.'.
For detailed information and examples about the gff format and what 'phase' means for a CDS, please see https://www.ncbi.nlm.nih.gov/datasets/docs/v1/reference-docs/file-formats/about-ncbi-gff3/
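
The check described above can be sketched as a one-line awk filter. This is a rough sketch that assumes a standard tab-separated gff3 file; the function name find_bad_cds_phase is a placeholder. It prints the line number and content of every CDS line whose 8th column is not 0, 1, or 2, and prints nothing if all CDS phases look valid.

```shell
# Hypothetical helper: list CDS lines whose phase (column 8) is not 0, 1, or 2.
find_bad_cds_phase() {
    # Skip comment/directive lines, keep only CDS features with an invalid phase
    awk -F'\t' '!/^#/ && $3 == "CDS" && $8 !~ /^[012]$/ { print NR ": " $0 }' "$1"
}

# Example usage (file name taken from the command above):
# find_bad_cds_phase input.gff3
```

Any lines it reports are the likely cause of the 'CDS feature must have phase' error and would need a proper phase in the gff3 before conversion.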
