End-of-file during read error, discrepancy between PLINK .bim file and simulated .bim file? #9

tshalev · 2018-06-11T17:38:27Z

Hi,

I am eager to use BayesR in my genomic prediction as I have heard good things; however, I seem to be having an issue. When I try to run the program with my data I get this error:

bayesR -bfile plink -out test_bayesR -numit 10000 -burnin 2000

forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
bayesR 00000000004A9A2E Unknown Unknown Unknown
bayesR 00000000004A84C6 Unknown Unknown Unknown
bayesR 0000000000461022 Unknown Unknown Unknown
bayesR 000000000041D5CB Unknown Unknown Unknown
bayesR 000000000041CB32 Unknown Unknown Unknown
bayesR 00000000004370D3 Unknown Unknown Unknown
bayesR 00000000004358FC Unknown Unknown Unknown
bayesR 000000000040BC57 Unknown Unknown Unknown
bayesR 00000000004146B1 Unknown Unknown Unknown
bayesR 000000000040322C Unknown Unknown Unknown
libc.so.6 00007F879EB1B445 Unknown Unknown Unknown
bayesR 0000000000403129 Unknown Unknown Unknown

I don't have any problem running the program with the example data, so I checked the differences between the simulated data and my own. Columns 5 and 6 of the simulated .bim file contain numeric codes (0, 1 or 2), whereas columns 5 and 6 of default PLINK output contain the allele letters themselves (A, C, G, T). For example:

Simulated data:
1 rs1 0 88671 1 2
1 rs2 0 114576 1 2
1 rs3 0 115699 2 1
1 rs4 0 155552 2 1
1 rs5 0 175528 1 2

My data:
2323-211_BBMergePreAssembled_Trinity_TR10034_c1_g1_i1 . 0 558 T C
2323-211_BBMergePreAssembled_Trinity_TR10054_c0_g1_i1 . 0 1134 G A
2323-211_BBMergePreAssembled_Trinity_TR10092_c0_g1_i1 . 0 1108 A T
2323-211_BBMergePreAssembled_Trinity_TR10114_c1_g3_i1 . 0 441 G C
2323-211_BBMergePreAssembled_Trinity_TR10129_c0_g1_i1 . 0 56 T C
(We don't have chromosome mapping data for this organism; these are probes from which SNPs are called.)

Is there a way of converting from the alleles to the dosage information when using this format, or indeed, is this actually the problem? Someone else here suggested that their problem was with missing data, but even when removing all individuals with missing phenotype data I still run into the error.

Any help would be appreciated,
Thanks,
Tal

syntheke · 2018-06-11T22:21:20Z

BayesR only supports the PLINK binary (BED) file format, which can encode missing genotypes. To make it work for your data, you need to convert it to best-guess genotypes. Modifying BayesR to handle dosage information would be straightforward: it only requires changing the routines that load and scale the genotype information.
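To illustrate what "best-guess genotypes" means compared with dosages: given the probabilities of carrying 0, 1 or 2 copies of the counted allele, a best-guess call keeps only the most likely genotype, while the dosage is the expected allele count. This is a minimal sketch, not code from BayesR or PLINK; the function names and the 0.9 confidence threshold (below which the call is set to missing) are hypothetical choices.

```python
from typing import Optional, Sequence


def best_guess_genotype(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Turn (P(0 copies), P(1 copy), P(2 copies)) into a hard call.

    Returns 0, 1 or 2, or None (missing) when the most likely genotype
    has a probability below `threshold` (a hypothetical cut-off).
    """
    if len(probs) != 3:
        raise ValueError("expected exactly three genotype probabilities")
    best = max(range(3), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


def dosage_from_probs(probs: Sequence[float]) -> float:
    """Expected allele count (the 'dosage'), which BayesR does not read."""
    return probs[1] + 2.0 * probs[2]


if __name__ == "__main__":
    print(best_guess_genotype([0.02, 0.95, 0.03]))  # 1
    print(best_guess_genotype([0.40, 0.35, 0.25]))  # None (too uncertain)
```

Treating uncertain calls as missing relies on the point above that the BED format, and therefore BayesR, can handle missing genotypes.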

tshalev · 2018-06-12T17:08:28Z

I'm not entirely certain I understand. The main issue I can see is that the example data has 0, 1 and 2 in the fifth and sixth columns of the .bim file, whereas PLINK normally outputs nucleotide letter codes in those columns. I'm actually not sure how 0, 1 and 2 make sense there, since this information describes specific SNPs rather than the genotype of an individual at a SNP (unless I am misunderstanding something).

I suppose a more direct question would be: how do I get .bed, .bim and .fam files that look exactly like those in the example data, starting from a .vcf file (the standard format for storing SNP data)?

Sorry for the trouble and thanks again,
Tal

syntheke · 2018-06-13T01:53:25Z

The software does not use any information in the .bim file; it only counts the number of rows to determine the number of SNPs in the .bed file. Searching for "vcf to plink" turns up several ways to do this conversion. However, if your VCF file does not contain genotypes, it will not work.
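Since BayesR takes the SNP count from the number of .bim rows, one quick sanity check is whether the .bed file size matches the .bim and .fam row counts. A SNP-major PLINK 1 .bed file starts with the three bytes 0x6C 0x1B 0x01 and then stores 2 bits per genotype, i.e. ceil(n_individuals / 4) bytes per SNP. The helper names below are hypothetical, and this is only a consistency sketch; it does not reproduce how BayesR itself reads the files.

```python
import os

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, SNP-major mode


def count_rows(path: str) -> int:
    """Count non-empty lines, e.g. SNPs in a .bim or individuals in a .fam."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def expected_bed_size(n_snps: int, n_individuals: int) -> int:
    """Bytes in a SNP-major .bed: 3 header bytes + ceil(n_individuals/4) per SNP."""
    bytes_per_snp = (n_individuals + 3) // 4
    return len(BED_MAGIC) + n_snps * bytes_per_snp


def check_plink_fileset(prefix: str) -> None:
    """Raise ValueError if prefix.bed is inconsistent with prefix.bim/.fam."""
    n_snps = count_rows(prefix + ".bim")
    n_ind = count_rows(prefix + ".fam")
    with open(prefix + ".bed", "rb") as handle:
        header = handle.read(3)
    if header != BED_MAGIC:
        raise ValueError("not a SNP-major PLINK 1 .bed file")
    actual = os.path.getsize(prefix + ".bed")
    expected = expected_bed_size(n_snps, n_ind)
    if actual != expected:
        raise ValueError(f"{prefix}.bed has {actual} bytes, expected {expected} "
                         f"for {n_snps} SNPs x {n_ind} individuals")
    print(f"OK: {n_snps} SNPs, {n_ind} individuals")
```

For the fileset in the original command (`-bfile plink`), this would be called as `check_plink_fileset("plink")`.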

tshalev · 2018-08-03T16:56:00Z

OK, I figured out the issue. Recoding the plink file after converting to VCF fixed the problem. Thanks!

qadrikazmi · 2019-02-20T13:09:36Z

Hi @tshalev,
can you please share the PLINK command you used?
I am getting the same error but cannot solve it, even after recoding.

tshalev · 2019-02-20T17:27:08Z

@qadrikazmi, I'm kind of embarrassed to say, but I think the problem was that I was entering the phenotype data into the .bim file in Excel :p. When I added it using scripts, the problem went away.

qadrikazmi · 2019-02-20T17:29:03Z

Yes @tshalev, the issue was that it only accepts a space-delimited phenotype (.fam) file.
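Before running BayesR, a .fam file can be checked against what this thread found: six fields per line, space-separated rather than tab-separated, with the phenotype in column 6. This is a hypothetical checker based only on the comments here, not on BayesR's source. The CRLF check is an added assumption, since files saved from Excel on Windows often use CRLF line endings; the thread does not say whether BayesR tolerates them.

```python
from typing import Iterable, List


def fam_problems(lines: Iterable[str]) -> List[str]:
    """List formatting problems in .fam lines (hypothetical check)."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.endswith("\r"):
            problems.append(f"line {number}: Windows (CRLF) line ending")
            line = line[:-1]
        if "\t" in line:
            problems.append(f"line {number}: contains tab characters")
        fields = line.split()  # splits on any whitespace, tabs included
        if len(fields) != 6:
            problems.append(f"line {number}: expected 6 fields, found {len(fields)}")
            continue
        try:
            float(fields[5])  # phenotype column must parse as a number
        except ValueError:
            problems.append(f"line {number}: phenotype {fields[5]!r} is not numeric")
    return problems
```

Usage: `fam_problems(open("plink.fam"))` returns an empty list when no problems are found.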

tshalev · 2019-02-20T17:44:11Z

Ah yes, sorry, it's been a while. Yes, I meant the .fam file. For PLINK, I generated the files using:
plink --vcf file.vcf --allow-extra-chr --double-id --vcf-half-call m
I then imported the .fam file into R, had it enter the phenotype data, and afterwards had no problems running BayesR.
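The R step described here (writing phenotypes into column 6 of the .fam file) can be sketched in Python as follows. The phenotype file layout ("FID IID value" per line) and the function names are hypothetical. Individuals without a phenotype get PLINK's default missing code -9; this thread does not say how BayesR treats that value. Re-joining the fields with single spaces also produces the space-separated layout that fixed the error for others in this thread.

```python
from typing import Dict, Iterable, List, Tuple


def read_phenotypes(lines: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """Parse 'FID IID value' lines (hypothetical phenotype file layout)."""
    pheno = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            pheno[(fields[0], fields[1])] = fields[2]
    return pheno


def add_phenotypes(fam_lines: Iterable[str],
                   pheno: Dict[Tuple[str, str], str],
                   missing: str = "-9") -> List[str]:
    """Overwrite column 6 of each .fam line and re-join with single spaces."""
    out = []
    for line in fam_lines:
        fields = line.split()  # splits on tabs and spaces alike
        if not fields:
            continue  # skip blank lines
        fields[5] = pheno.get((fields[0], fields[1]), missing)
        out.append(" ".join(fields))
    return out
```

To apply it, read the .fam and phenotype files, then write `"\n".join(result) + "\n"` back to the .fam file.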

GabrieleNocchi · 2019-07-01T15:27:22Z

I am having the same issue and I am struggling to make it work.

I make my BED files using the --make-bed flag in PLINK. I also need to add the phenotypes to my .fam file after converting from VCF to PLINK format. I have tried adding the phenotypes both programmatically and manually, but I still get the following error:

At line 572 of file baymods.f90
Fortran runtime error: End of file

The simulated example data, on the other hand, work fine. I am struggling to understand why.

GabrieleNocchi · 2019-07-01T15:28:34Z

Did you use the --make-bed flag to make the PLINK files?

tshalev · 2019-07-01T15:50:28Z

I did not. I converted my VCF file directly to PLINK binary format (.bed, .bim, .fam) using the plink --vcf call. I then added the phenotype information to the .fam file in R (i.e., writing to the sixth column of the .fam file from a separate file containing my phenotype data). After that it runs fine.

GabrieleNocchi · 2019-07-01T15:50:52Z

OK, I got it. I spent quite a bit of time on this, so let me write it down in case somebody else encounters the same issue, as it was not entirely clear to me from the comments above:

I thought my problem was similar to tshalev's, as I had originally added my phenotypes to the .fam file both manually and programmatically (using sed), simply replacing the PLINK default of -9, which stands for a missing phenotype, with my phenotype.

In the end, the issue was simply that the .fam file needs to be SPACE-separated, not TAB-separated. This is a bit strange, because PLINK seems to generate the .fam file tab-separated, so you have to edit it and change those tabs into spaces to make it work.
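The fix described above (turning the tabs in the .fam file into spaces) can be scripted rather than done by hand. This is a minimal sketch with hypothetical function names: it collapses any run of tabs or spaces between fields into a single space, drops blank lines, and writes Unix line endings. It assumes the file contains only the six whitespace-free .fam fields per line, as PLINK writes them.

```python
def to_space_separated(text: str) -> str:
    """Rewrite whitespace-delimited records as single-space-separated lines."""
    rows = [" ".join(line.split()) for line in text.splitlines()]
    rows = [row for row in rows if row]  # drop blank lines
    return "".join(row + "\n" for row in rows)


def normalize_fam(path: str) -> None:
    """Rewrite a .fam file in place as space-separated text."""
    with open(path) as handle:
        text = handle.read()
    with open(path, "w", newline="\n") as handle:  # force Unix line endings
        handle.write(to_space_separated(text))
```

For the fileset used earlier in this thread, that would be `normalize_fam("plink.fam")`. Keeping a copy of the original file first is prudent, since the rewrite happens in place.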

Maybe it was not the case in older plink versions.

GabrieleNocchi · 2019-07-01T15:53:34Z

Many thanks for your reply, tshalev. I actually sorted out my issue after I wrote my message: I replaced the tabs in the .fam file with spaces and it was accepted.

tshalev closed this as completed Aug 3, 2018
