GenesPassQC.bed not generated correctly #6

Open
pineapple216 opened this issue Jul 1, 2015 · 8 comments

Comments

@pineapple216

Hi Shilin,

I tried running the updated version of ExonDel (1.06), and unfortunately it gave an error concerning the genesPassQC.bed file:

perl ExonDel.pl -i /home/koen/cnv_tool_comparison/ExonDel/bam_input_run1_exondel.txt -c /home/koen/cnv_tool_comparison/ExonDel/ExonDel.cfg -o /home/koen/cnv_tool_comparison/ExonDel_results_run1_merged_MIPS -g /home/koen/cnv_tool_comparison/ExonDel/genelist.txt -t 7
[Wed Jul 1 10:01:50 2015] Only the genes in /home/koen/cnv_tool_comparison/ExonDel/genelist.txt will be used
[Wed Jul 1 10:01:50 2015] GC adjustment will not be performed, and the constant cutoffs in config file will be used
[Wed Jul 1 10:01:50 2015] Loading BED file
[Wed Jul 1 10:01:53 2015] Finish BED file (cover 7667 base pairs)
[Wed Jul 1 10:01:53 2015] Loading RefSeq file
[Wed Jul 1 10:01:53 2015] Finish RefSeq file
[Wed Jul 1 10:01:53 2015] #####################################################################
[Wed Jul 1 10:01:53 2015] #ERROR: genesPassQC.bed file was not generated correctly.
[Wed Jul 1 10:01:53 2015] #Decrease exon_bp_cover_threshold and overall_exon_count_threshold in configure file.
[Wed Jul 1 10:01:53 2015] #More information at README config file section.
[Wed Jul 1 10:01:53 2015] #####################################################################
ERROR: genesPassQC.bed file was not generated correctly. at ExonDel.pl line 213, line 53432.

I find this weird, because I've set both exon_bp_cover_threshold and overall_exon_count_threshold to 0 in the config file.

In case it's useful: in this run I selected only the BRCA1 gene (NM_007294) in the gene list. It can be found in the hg19 reference BED file, which I downloaded from UCSC.

Hopefully you can help me resolve this issue. Let me know if you need more information.

Kind regards,

Koen

@slzhao
Owner

slzhao commented Jul 6, 2015

Hi Koen,

Does the covered_percentage file contain the BRCA1 gene? Are covered_percentage_bp and covered_percentage_exon for that gene larger than 0 in the covered_percentage file?

Best,
Shilin
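Both questions can be checked with a short script. This is a minimal sketch, not part of ExonDel: it assumes the covered_percentage file is tab-delimited with a `#`-prefixed header line and has columns named `name`, `name2`, `covered_percentage_bp` and `covered_percentage_exon` (the last two come from the question above; the rest of the layout is a guess). `check_gene_coverage` and `passes_nonzero_check` are hypothetical helper names.

```python
import csv
from typing import Dict, Iterable, Optional


def check_gene_coverage(lines: Iterable[str], gene: str) -> Optional[Dict[str, float]]:
    """Return the two coverage values for `gene`, or None if it is absent.

    Hypothetical layout: tab-delimited, first line is a '#'-prefixed header
    with columns 'name', 'name2', 'covered_percentage_bp', 'covered_percentage_exon'.
    """
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows, None)
    if not header:
        # An empty file means no gene was written at all.
        return None
    header[0] = header[0].lstrip("#")  # '#bin' -> 'bin'
    for row in rows:
        record = dict(zip(header, row))
        # Accept either the RefSeq ID column or the gene symbol column.
        if gene in (record.get("name"), record.get("name2")):
            return {
                "bp": float(record["covered_percentage_bp"]),
                "exon": float(record["covered_percentage_exon"]),
            }
    return None


def passes_nonzero_check(values: Optional[Dict[str, float]]) -> bool:
    """True if the gene is present and both percentages are larger than 0."""
    return values is not None and values["bp"] > 0 and values["exon"] > 0
```

An empty file makes `check_gene_coverage` return `None`, which would point at the gene never being matched rather than at it being filtered out by the thresholds.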

@pineapple216
Author

Hi Shilin,

No, the covered percentage file is empty.
So is genesPassQC.bed, and genesPassQC.bed.depth isn't created.

Best,

Koen

@slzhao
Owner

slzhao commented Jul 6, 2015

Hello Koen,

I think the read_percentage file should at least have a header line: "#bin name chrom strand txStart txEnd cdsStart cdsEnd ...".

Would you please check the following details so that we can find where the problem is:

  1. The gene list file should look like:
     BRCA1
     SomeGene

  2. The BED file should look like:
     chr  PosStart  PosEnd

     OR

     chr  PosStart  PosEnd  BRCA1     OtherInformation
     chr  PosStart  PosEnd  SomeGene  OtherInformation

  3. The GTF file should include BRCA1.


Thank you!

Best,
Shilin
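Checks 1 and 2 can be automated. A minimal sketch, assuming a gene list with one name per line and a whitespace-separated BED file whose optional 4th column holds the gene name; the helper names are hypothetical and this is not ExonDel's own validation code.

```python
from typing import Iterable, List, Set


def read_gene_list(lines: Iterable[str]) -> List[str]:
    """One gene name per line; blank lines are skipped."""
    return [line.strip() for line in lines if line.strip()]


def bed_gene_names(lines: Iterable[str]) -> Set[str]:
    """Collect the optional 4th BED column (the gene name), if present."""
    names = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines and the usual BED comment/header lines.
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def genes_missing_from_bed(gene_list: Iterable[str], bed_lines: Iterable[str]) -> List[str]:
    """Genes from the list that never appear in the BED name column."""
    present = bed_gene_names(bed_lines)
    return [g for g in read_gene_list(gene_list) if g not in present]
```

With a 3-column BED file there are no names to compare against, so every gene would be reported as missing; the check is only meaningful for the 4-column form.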

@pineapple216
Author

Hi Shilin,

The covered percentage file does indeed contain the header (I should have made that clear earlier, sorry).

  1. The gene list contains NM_007294, the RefSeq identifier for the BRCA1 gene.
  2. The BED file contains RefSeq identifiers for all genes; for NM_007294 it looks like this:

chr17 41196301 41197829 NM_007294
chr17 41199649 41199730 NM_007294
chr17 41201127 41201221 NM_007294
chr17 41203069 41203144 NM_007294
chr17 41209058 41209162 NM_007294
chr17 41215339 41215400 NM_007294
chr17 41215880 41215978 NM_007294
chr17 41219614 41219722 NM_007294
chr17 41222934 41223265 NM_007294
chr17 41226337 41226548 NM_007294
chr17 41228494 41228641 NM_007294
chr17 41234410 41234602 NM_007294
chr17 41242950 41243059 NM_007294
chr17 41243441 41246887 NM_007294
chr17 41247852 41247949 NM_007294
chr17 41249250 41249316 NM_007294
chr17 41251781 41251907 NM_007294
chr17 41256128 41256288 NM_007294
chr17 41256874 41256983 NM_007294
chr17 41258462 41258560 NM_007294
chr17 41267732 41267806 NM_007294
chr17 41276023 41276142 NM_007294
chr17 41277277 41277510 NM_007294

These are the positions of all the exons in the BRCA1 gene.

  3. The aforementioned identifier is included in the GTF file; its entry looks like this:

899 NM_007294 chr17 - 41196311 41277500 41197694 41276113 23 41196311,41199659,41201137,41203079,41209068,41215349,41215890,41219624,41222944,41226347,41228504,41234420,41242960,41243451,41247862,41249260,41251791,41256138,41256884,41258472,41267742,41276033,41277287, 41197819,41199720,41201211,41203134,41209152,41215390,41215968,41219712,41223255,41226538,41228631,41234592,41243049,41246877,41247939,41249306,41251897,41256278,41256973,41258550,41267796,41276132,41277500, 0 BRCA1 cmpl cmpl 1,0,1,0,0,1,1,0,1,2,1,0,1,1,2,1,0,1,2,2,2,0,-1,
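As a sanity check on the BED side: summing end minus start over the 23 half-open intervals listed for NM_007294 gives 7667 bp, the same figure as the "cover 7667 base pairs" line in the log, so the BED file itself appears to be read correctly. A minimal sketch of that calculation; merging overlapping intervals first is an assumption about how coverage should be counted (these intervals do not overlap, so it makes no difference here), and the helper names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED half-open

# The BED lines posted above for NM_007294 (BRCA1).
BRCA1_BED = """\
chr17 41196301 41197829 NM_007294
chr17 41199649 41199730 NM_007294
chr17 41201127 41201221 NM_007294
chr17 41203069 41203144 NM_007294
chr17 41209058 41209162 NM_007294
chr17 41215339 41215400 NM_007294
chr17 41215880 41215978 NM_007294
chr17 41219614 41219722 NM_007294
chr17 41222934 41223265 NM_007294
chr17 41226337 41226548 NM_007294
chr17 41228494 41228641 NM_007294
chr17 41234410 41234602 NM_007294
chr17 41242950 41243059 NM_007294
chr17 41243441 41246887 NM_007294
chr17 41247852 41247949 NM_007294
chr17 41249250 41249316 NM_007294
chr17 41251781 41251907 NM_007294
chr17 41256128 41256288 NM_007294
chr17 41256874 41256983 NM_007294
chr17 41258462 41258560 NM_007294
chr17 41267732 41267806 NM_007294
chr17 41276023 41276142 NM_007294
chr17 41277277 41277510 NM_007294
"""


def parse_bed(lines: Iterable[str], gene: Optional[str] = None) -> List[Interval]:
    """Parse whitespace-separated BED lines, optionally keeping only one gene."""
    intervals = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        if gene is not None and (len(fields) < 4 or fields[3] != gene):
            continue
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def covered_bp(intervals: Iterable[Interval]) -> int:
    """Count bases covered by the union of the intervals."""
    total = 0
    cur_chrom, cur_start, cur_end = None, 0, 0
    for chrom, start, end in sorted(intervals):
        if chrom == cur_chrom and start <= cur_end:
            # Overlapping or touching the current run: extend it.
            cur_end = max(cur_end, end)
        else:
            if cur_chrom is not None:
                total += cur_end - cur_start
            cur_chrom, cur_start, cur_end = chrom, start, end
    if cur_chrom is not None:
        total += cur_end - cur_start
    return total
```

For example, `covered_bp(parse_bed(BRCA1_BED.splitlines(), gene="NM_007294"))` returns 7667.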

If you need more info, let me know!

Best,

Koen
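The record quoted above has the 16-column layout of the UCSC refGene table (genePred plus a leading `bin` column): `name` (NM_007294) is the 2nd field and `name2` (BRCA1) is the 13th, so the same line carries two different candidate gene names. A minimal sketch that splits it into named fields; it also shows that padding each refGene exon by 10 bp on either side reproduces the BED intervals posted above (for example 41196311-41197819 becomes 41196301-41197829). The field names follow the UCSC schema; `pad` is an illustrative parameter, not an ExonDel option.

```python
from typing import Dict, List, Tuple

# Column order of the UCSC refGene table.
REFGENE_FIELDS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]

# The NM_007294 record from this thread, re-joined with tabs.
BRCA1_REFGENE = "\t".join([
    "899", "NM_007294", "chr17", "-", "41196311", "41277500", "41197694", "41276113", "23",
    "41196311,41199659,41201137,41203079,41209068,41215349,41215890,41219624,"
    "41222944,41226347,41228504,41234420,41242960,41243451,41247862,41249260,"
    "41251791,41256138,41256884,41258472,41267742,41276033,41277287,",
    "41197819,41199720,41201211,41203134,41209152,41215390,41215968,41219712,"
    "41223255,41226538,41228631,41234592,41243049,41246877,41247939,41249306,"
    "41251897,41256278,41256973,41258550,41267796,41276132,41277500,",
    "0", "BRCA1", "cmpl", "cmpl",
    "1,0,1,0,0,1,1,0,1,2,1,0,1,1,2,1,0,1,2,2,2,0,-1,",
])


def parse_refgene(line: str) -> Dict[str, str]:
    """Split one refGene line (tabs or spaces) into a dict of named fields."""
    fields = line.split()
    if len(fields) != len(REFGENE_FIELDS):
        raise ValueError(f"expected {len(REFGENE_FIELDS)} fields, got {len(fields)}")
    return dict(zip(REFGENE_FIELDS, fields))


def exon_intervals(record: Dict[str, str], pad: int = 0) -> List[Tuple[int, int]]:
    """Exon (start, end) pairs, optionally widened by `pad` bases on each side."""
    # exonStarts/exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in record["exonStarts"].split(",") if x]
    ends = [int(x) for x in record["exonEnds"].split(",") if x]
    return [(s - pad, e + pad) for s, e in zip(starts, ends)]
```

With `rec = parse_refgene(BRCA1_REFGENE)`, `rec["name"]` is `"NM_007294"` and `rec["name2"]` is `"BRCA1"`; a tool that keys on one column will not find a gene list written in terms of the other.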

@slzhao
Owner

slzhao commented Jul 8, 2015

Hi Koen,

The gene names in the GTF file should be the same as in the BED and gene list files. So you can change "BRCA1" to "NM_007294" and try again. I think it should work.

Best,
Shilin
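This workaround amounts to copying the RefSeq ID into the gene-name column of the annotation file. A minimal sketch, assuming the "GTF" file is actually a tab-delimited UCSC refGene table like the record Koen posted (RefSeq ID in the 2nd column, gene symbol in the 13th); `use_refseq_as_gene_name` is a hypothetical name.

```python
from typing import Iterable, Iterator

NAME_COL, NAME2_COL = 1, 12  # 0-based: 'name' (RefSeq ID) and 'name2' (symbol)


def use_refseq_as_gene_name(lines: Iterable[str]) -> Iterator[str]:
    """Yield refGene lines with the symbol column overwritten by the RefSeq ID."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line  # pass header and blank lines through untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > NAME2_COL:
            fields[NAME2_COL] = fields[NAME_COL]
        yield "\t".join(fields) + "\n"
```

To apply it, stream the original file through the function into a new file (for example `dst.writelines(use_refseq_as_gene_name(src))`) and point ExonDel at the copy; writing a copy keeps the original annotation intact.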

@pineapple216
Author

Hi Shilin,

I think you misunderstood what I wrote, no problem.
As I pointed out earlier, the genes in the GTF file, BED file and gene list file are already the same (they all have the identifier NM_007294).

Or did I miss something here?

Best,

Koen

@slzhao
Owner

slzhao commented Jul 8, 2015

Hi Koen,

Currently we use "BRCA1" in "0 BRCA1 cmpl cmpl 1,0,1,0,0,1,1,0,1,2,1,0,1,1,2,1,0,1,2,2,2,0,-1" as the gene name, not "NM_007294" in "899 NM_007294 chr17 - 41196311 41277500 41197694". So you may need to change "BRCA1" to "NM_007294".
I will update the code in the next version so that it reads both "NM_007294" and "BRCA1" as gene names.

Best,
Shilin
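The planned change (accepting either "NM_007294" or "BRCA1") could look roughly like the lookup below. This is a hypothetical sketch, not ExonDel's code; it assumes each annotation record is a dict keyed by refGene column names, with `name` holding the RefSeq ID and `name2` the gene symbol.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]  # one refGene row keyed by column name


def find_transcripts(query: str, records: Iterable[Record]) -> List[Record]:
    """Return every record whose RefSeq ID ('name') or symbol ('name2') equals query."""
    return [r for r in records if query in (r.get("name"), r.get("name2"))]
```

A RefSeq ID selects a single transcript, while a symbol such as BRCA1 can select several, so the lookup returns a list rather than a single record.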

@pineapple216
Author

Hi Shilin,

I see :) Guess I missed that one.
It might be worth writing some sort of parser that maps RefSeq IDs to gene names and vice versa.

Koen
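Such a mapping can be sketched as two lookup tables built from the same refGene file. This is a hypothetical helper, assuming tab-delimited refGene lines with the RefSeq ID in the 2nd column and the gene symbol in the 13th. The symbol-to-ID direction maps to a set because one gene can have several transcripts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_id_maps(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Map RefSeq ID -> symbol and symbol -> set of RefSeq IDs from refGene lines."""
    id_to_symbol: Dict[str, str] = {}
    symbol_to_ids: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.split("\t")
        # Skip header lines and anything too short to hold the 'name2' column.
        if line.startswith("#") or len(fields) < 13:
            continue
        refseq_id, symbol = fields[1], fields[12]
        id_to_symbol[refseq_id] = symbol
        symbol_to_ids[symbol].add(refseq_id)
    return id_to_symbol, dict(symbol_to_ids)
```

With both tables, a gene list written in either form can be translated to whichever identifier the annotation file uses before the run starts.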
