Dear Dr Monlong,
Many thanks for publishing and making available such an interesting and useful package! I find splice QTLs very interesting and would very much like to identify and study some from my cancer dataset.
I have encountered an error with the index.genotype command which I hope you will be able to help me with.
As per the instructions on your sQTLseekeR Github page, https://github.com/jmonlong/sQTLseekeR, I have prepared my genotype information as described, with chromosome, snp start, snp end, snpID, then my samples with genotypes coded 0,1,2,(-1 for missing):
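However, when I try to run the index.genotype command as per the "run-example" page, https://github.com/jmonlong/sQTLseekeR/blob/master/scripts/run-example.R, I get the following error:
CHR_22.indexed <- index.genotype(output_Reference_file_Transpose_a2Version_CHR_22.traw)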
Error in file.exists(file) : invalid 'file' argument
Do you have any suggestions as to why this could be?
Admittedly the file I am running the command on contains information from chromosome 22 (I decided to run on a small subset first). Could this be confusing the programme by it not containing every chromosome?
I am sure that I have created the structure of the file correctly, because if I use a file that does not have the input columns stipulated in the instructions, I get a different error telling me that those columns are missing.
genotype.indexed.f <- index.genotype(incorrect.table)
Error in index.genotype(genotype.f) :
Missing column or in incorrect order. The first 4 columns must be 'chr', 'start', 'end' and 'snpId'.
Could it also be that my data is not in the correct format? Is the data required to be in .tsv format? Because the run-example page reads:
genotype.f="snps-012coded.tsv"
#1) Index the genotype file (if not done externally before)
genotype.indexed.f = index.genotype(genotype.f)
My data is not in .tsv, but it is already read into R. I would guess that this is unlikely to be the issue, because no matter what format the data is in before being read into R, it becomes a data frame once it is read in.
However, I cannot actually see a line in the run-example where the .tsv is actually read in. read.table is used to read in transcript expression in Step 2, and to read in the bed file in step 3. However I cannot actually see a line to specifically read in the .tsv, which is what makes me wonder if it is required to be specifically in that format.
#2) Prepare transcript expression
te.df= read.table(trans.exp.f,as.is=TRUE,header=TRUE,sep="\t")
#3) Test gene/SNP associations
gene.bed= read.table(gene.bed.f,as.is=TRUE,sep="\t")
As a potential solution to this problem, your example says "Index the genotype file (if not done externally before)", which implies that this step can be achieved another way. If I am unable to get this command to work, is there an alternative method I can use to compress and index the genotypes, as the index.genotype command is supposed to do? Would you be able to point me in the direction of a suitable package with which to do that?
I sincerely appreciate your time and would be extremely grateful for any assistance you are able to give! I look forward to referencing your package when I have found some novel splice QTLs.
I am using R version 3.2.4 on a Linux server, in case that is important.
Many thanks!
As you said, the data seems to be formatted correctly. The problem is actually what you suspected: index.genotype expects the name of a file as input, not an R object. The reason is to avoid loading the entire file into R, as genotype files can be quite large. Under the hood, the file is not loaded into a data.frame; it is compressed and indexed directly using Rsamtools functions.
I'll try to clarify the documentation and error messages. Thanks for the feedback.
(As you mentioned, the other solution would be to compress and index the file outside of R, using the tabix program. In any case, it should now work within R when you pass the file name instead of the R object.)
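For anyone whose genotypes are already loaded in R, the fix described above might look roughly like the sketch below: write the table back to a tab-separated file and hand index.genotype the path. The toy data frame `geno.df`, its values, the sample column names and the output file name are all made up for illustration; only the first four column names and the `index.genotype(genotype.f)` call come from this thread. The call is guarded so the snippet still runs where sQTLseekeR is not installed.

```r
# Toy stand-in for a genotype table already loaded in R (hypothetical values).
# The first four columns follow the names quoted in the error message above.
geno.df <- data.frame(
  chr = c("22", "22"),
  start = c(16050074L, 16050114L),
  end = c(16050075L, 16050115L),
  snpId = c("snp1", "snp2"),
  sampleA = c(0L, 1L),   # genotypes coded 0, 1, 2
  sampleB = c(2L, -1L),  # -1 marks a missing genotype
  stringsAsFactors = FALSE
)

# Write the table to a tab-separated text file with a header row.
# The file name is an arbitrary choice for this example.
genotype.f <- file.path(tempdir(), "chr22-snps-012coded.tsv")
write.table(geno.df, file = genotype.f, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

# Pass the file name (a character string), not the data frame itself.
if (requireNamespace("sQTLseekeR", quietly = TRUE)) {
  genotype.indexed.f <- sQTLseekeR::index.genotype(genotype.f)
}
```

Passing `geno.df` itself is what produces the `invalid 'file' argument` error, since `file.exists()` only accepts character vectors.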