Error with index.genotype command. Error in file.exists(file) : invalid 'file' argument #1

TobyGurran · 2016-06-05T11:30:20Z

Dear Dr Monlong,

Many thanks for publishing and making available such an interesting and useful package! I find splice QTLs very interesting and would very much like to identify and study some from my cancer dataset.

I have encountered an error with the index.genotype command which I hope you will be able to help me with.

As per the instructions on your sQTLseekeR GitHub page, https://github.com/jmonlong/sQTLseekeR, I have prepared my genotype information as described: chromosome, SNP start, SNP end and SNP ID, followed by one column per sample with genotypes coded 0, 1, 2 (-1 for missing):

```r
output_Reference_file_Transpose_a2Version_CHR_22.traw[1:3,1:7]
  chr    start      end    snpId sample1 sample2 sample3
1  22 16054311 16054311 rs102459       2       2       2
2  22 16054713 16054713 rs230493       2       2       2
3  22 16066757 16066757 rs356385       2       2       2
```
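Before handing a table like this to sQTLseekeR, it can help to check its layout in R. The sketch below is a hypothetical helper (`check_genotype_df` is not part of sQTLseekeR) that assumes the layout described above: first four columns `chr`, `start`, `end`, `snpId`, then genotypes coded 0/1/2 with -1 for missing. The sorting check reflects tabix's requirement for coordinate-sorted input.

```r
# Hypothetical helper, NOT part of sQTLseekeR: checks a genotype data.frame
# against the layout described above before it is written to disk.
check_genotype_df <- function(geno) {
  expected <- c("chr", "start", "end", "snpId")
  if (!identical(colnames(geno)[1:4], expected)) {
    stop("The first 4 columns must be 'chr', 'start', 'end' and 'snpId'.")
  }
  # Genotypes must be coded 0, 1, 2, or -1 for missing
  codes <- unlist(geno[, -(1:4), drop = FALSE], use.names = FALSE)
  bad <- setdiff(unique(codes), c(-1, 0, 1, 2))
  if (length(bad) > 0) {
    stop("Unexpected genotype codes: ", paste(bad, collapse = ", "))
  }
  # tabix indexing expects positions sorted within each chromosome
  if (any(tapply(geno$start, geno$chr, is.unsorted))) {
    stop("SNPs are not sorted by start position within each chromosome.")
  }
  invisible(TRUE)
}

# Toy copy of the three rows shown above
geno <- data.frame(
  chr = c(22, 22, 22),
  start = c(16054311, 16054713, 16066757),
  end = c(16054311, 16054713, 16066757),
  snpId = c("rs102459", "rs230493", "rs356385"),
  sample1 = c(2, 2, 2), sample2 = c(2, 2, 2), sample3 = c(2, 2, 2),
  stringsAsFactors = FALSE
)
check_genotype_df(geno)
```

A passing check only confirms the column layout and coding; it does not say anything about whether the object can be passed to `index.genotype` directly.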

However, when I try to run the index.genotype command as shown on the "run-example" page, https://github.com/jmonlong/sQTLseekeR/blob/master/scripts/run-example.R, I get the following error:

```r
CHR_22.indexed <- index.genotype(output_Reference_file_Transpose_a2Version_CHR_22.traw)
Error in file.exists(file) : invalid 'file' argument
```
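The traceback points at `file.exists()`, which only accepts character file paths. A minimal base-R reproduction, using a one-row toy data.frame assumed purely for illustration, shows the same message when a data.frame is passed where a path is expected:

```r
# One-row toy genotype table standing in for the real data.frame
geno <- data.frame(chr = 22, start = 16054311, end = 16054311,
                   snpId = "rs102459", sample1 = 2)

# file.exists() expects a character vector of paths, so a data.frame errors
msg <- tryCatch(file.exists(geno), error = function(e) conditionMessage(e))
print(msg)

# A character path is accepted and simply returns TRUE or FALSE
exists_flag <- file.exists(file.path(tempdir(), "not-there.tsv"))
print(exists_flag)
```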

Do you have any suggestions as to why this could be?

Admittedly, the file I am running the command on contains only chromosome 22 (I decided to try a small subset first). Could the fact that it does not contain every chromosome be confusing the program?

I am sure that I have structured the file correctly, because if I use a file that does not have the input columns stipulated in the instructions, I get a different error telling me that those columns are missing:

```r
genotype.indexed.f <- index.genotype(incorrect.table)
Error in index.genotype(genotype.f) :
  Missing column or in incorrect order. The first 4 columns must be 'chr', 'start', 'end' and 'snpId'.
```

Could it also be that my data is not in the correct format? Does the data need to be in .tsv format? I ask because the run-example page reads:

```r
genotype.f="snps-012coded.tsv"
#1) Index the genotype file (if not done externally before)
genotype.indexed.f = index.genotype(genotype.f)
```

My data is not in .tsv format, but it is already read into R. I would guess that this is unlikely to be the issue, because whatever format the data is in before being read into R, it becomes a data.frame once it is read in.
However, I cannot see a line in the run-example where the .tsv file is read in: read.table is used to read the transcript expression in step 2 and the BED file in step 3, but nothing reads the .tsv, which makes me wonder whether it specifically needs to be in that format.
```r
#2) Prepare transcript expression
te.df= read.table(trans.exp.f,as.is=TRUE,header=TRUE,sep="\t")
#3) Test gene/SNP associations
gene.bed= read.table(gene.bed.f,as.is=TRUE,sep="\t")
```

As a potential workaround, your example says "Index the genotype file (if not done externally before)", which implies that this step can be done another way. If I cannot get this command to work, is there an alternative method I can use to compress and index the genotypes, as index.genotype is supposed to do? Could you point me towards a suitable package for that?

I sincerely appreciate your time and would be extremely grateful for any assistance you can give! I look forward to citing your package when I have found some novel splice QTLs.

I am using R version 3.2.4 on a Linux server, in case that is important.

Many thanks!

jmonlong · 2016-06-06T00:24:22Z

Dear Toby,

Thanks for your enthusiasm and detailed message!

As you said, the data seems to be formatted correctly. The problem is actually the one you mentioned: index.genotype expects the name of a file as input, not an R object. The reason is to avoid loading the entire file into R, as genotype files can be quite large. Under the hood, the file is never loaded into a data.frame; it is compressed and indexed directly using Rsamtools functions.
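For a table that is already in R, one way to follow this advice is to write it back out as a tab-separated file and pass the path. The sketch below assumes a toy copy of the rows shown earlier and an illustrative path under `tempdir()`; in practice `geno` would be your own data.frame. The `index.genotype()` call is guarded so the snippet still runs where sQTLseekeR is not installed.

```r
# Toy stand-in for the genotype data.frame already loaded in R
geno <- data.frame(
  chr = c(22, 22, 22),
  start = c(16054311, 16054713, 16066757),
  end = c(16054311, 16054713, 16066757),
  snpId = c("rs102459", "rs230493", "rs356385"),
  sample1 = c(2, 2, 2), sample2 = c(2, 2, 2), sample3 = c(2, 2, 2),
  stringsAsFactors = FALSE
)

# Illustrative path. row.names = FALSE keeps 'chr' as the first column;
# quote = FALSE avoids quoting the snpId values.
genotype.f <- file.path(tempdir(), "snps-012coded.tsv")
write.table(geno, file = genotype.f, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# index.genotype() takes a file name (a character string), not the data.frame
stopifnot(is.character(genotype.f), file.exists(genotype.f))
if (requireNamespace("sQTLseekeR", quietly = TRUE)) {
  genotype.indexed.f <- sQTLseekeR::index.genotype(genotype.f)
}
```

Leaving `row.names` at its default would add an unnamed first column of row numbers, which would push `chr` out of first position.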

I'll try to clarify the documentation and error messages; thanks for the feedback.

(As you mentioned, the other solution would be to compress and index the file outside of R using the tabix program. Either way, it should now work within R if you pass the file name instead of the R object.)
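Indexing outside of sQTLseekeR could look like the sketch below, which calls the htslib `bgzip` and `tabix` command-line tools from R via `system2()`. It assumes both tools are on the `PATH` (the calls are skipped otherwise), uses an illustrative toy file under `tempdir()`, and assumes the input is already sorted by chromosome and start. Whether the resulting `.gz` file can be used in place of `index.genotype()`'s output in later sQTLseekeR steps is an assumption worth checking against the package documentation.

```r
# Toy genotype file (illustrative path), already sorted by chr and start
genotype.f <- file.path(tempdir(), "snps-012coded.tsv")
writeLines(c(
  "chr\tstart\tend\tsnpId\tsample1",
  "22\t16054311\t16054311\trs102459\t2",
  "22\t16054713\t16054713\trs230493\t2"
), genotype.f)

gz <- paste0(genotype.f, ".gz")
have_tools <- nzchar(Sys.which("bgzip")) && nzchar(Sys.which("tabix"))
if (have_tools) {
  # bgzip -c writes the block-gzipped file to stdout, redirected to 'gz'
  system2("bgzip", c("-c", shQuote(genotype.f)), stdout = gz)
  # -s/-b/-e: sequence, begin and end columns; -S 1: skip the header line;
  # -f: overwrite an existing index
  system2("tabix", c("-f", "-s", "1", "-b", "2", "-e", "3", "-S", "1", shQuote(gz)))
}
```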

Don't hesitate if you have any other problems.

jmonlong closed this as completed Aug 18, 2017
