End-of-file during read error, discrepancy between PLINK .bim file and simulated .bim file? #9

Closed
tshalev opened this issue Jun 11, 2018 · 13 comments

Comments

@tshalev

tshalev commented Jun 11, 2018

Hi,

I am eager to use BayesR in my genomic prediction as I have heard good things; however, I seem to be having an issue. When I try to run the program with my data I get this error:

bayesR -bfile plink -out test_bayesR -numit 10000 -burnin 2000

forrtl: severe (24): end-of-file during read, unit -5, file Internal List-Directed Read
Image PC Routine Line Source
bayesR 00000000004A9A2E Unknown Unknown Unknown
bayesR 00000000004A84C6 Unknown Unknown Unknown
bayesR 0000000000461022 Unknown Unknown Unknown
bayesR 000000000041D5CB Unknown Unknown Unknown
bayesR 000000000041CB32 Unknown Unknown Unknown
bayesR 00000000004370D3 Unknown Unknown Unknown
bayesR 00000000004358FC Unknown Unknown Unknown
bayesR 000000000040BC57 Unknown Unknown Unknown
bayesR 00000000004146B1 Unknown Unknown Unknown
bayesR 000000000040322C Unknown Unknown Unknown
libc.so.6 00007F879EB1B445 Unknown Unknown Unknown
bayesR 0000000000403129 Unknown Unknown Unknown

I don't have any problem running the program with the example data, so I checked the differences between the simulated data and my own and found that columns 5 and 6 of the simulated data are either 0, 1 or 2, whereas columns 5 and 6 of default PLINK output are the reference and alternate alleles themselves, i.e. A, T, G, C. For example:

Simulated data:
1 rs1 0 88671 1 2
1 rs2 0 114576 1 2
1 rs3 0 115699 2 1
1 rs4 0 155552 2 1
1 rs5 0 175528 1 2

My data:
2323-211_BBMergePreAssembled_Trinity_TR10034_c1_g1_i1 . 0 558 T C
2323-211_BBMergePreAssembled_Trinity_TR10054_c0_g1_i1 . 0 1134 G A
2323-211_BBMergePreAssembled_Trinity_TR10092_c0_g1_i1 . 0 1108 A T
2323-211_BBMergePreAssembled_Trinity_TR10114_c1_g3_i1 . 0 441 G C
2323-211_BBMergePreAssembled_Trinity_TR10129_c0_g1_i1 . 0 56 T C
(We don't have chromosome mapping data for this organism; these are probes from which SNPs are called.)

Is there a way of converting from the alleles to the dosage information when using this format, or indeed, is this actually the problem? Someone else here suggested that their problem was with missing data, but even when removing all individuals with missing phenotype data I still run into the error.

Any help would be appreciated,
Thanks,
Tal

@syntheke
Owner

BayesR only supports the plink binary BED file format, and it handles missing genotype information. To make it work for your data you need to convert to best-guess genotypes. Extending bayesR to handle dosage information is straightforward: modify the routines for loading and scaling the genotype information.

@tshalev
Author

tshalev commented Jun 12, 2018

I'm not entirely certain I understand. The main issue I can see is that the example data has 0, 1 and 2 in the fifth and sixth columns of the .bim file, whereas PLINK normally outputs nucleotide letter codes in those columns. I'm actually not sure how 0 1 and 2 make sense, since this information references specific SNPs, and not the genotype of an individual at a SNP (unless I am misunderstanding something).

I suppose a straightforward question would be: How do I get to having the .bed .bim and .fam files looking exactly like those in the example data, from a .vcf file (the standard for storing SNP data)?

Sorry for the trouble and thanks again,
Tal

@syntheke
Owner

The software does not use any information in the bim-file; it only counts the number of rows to know the number of SNPs in the bed-file. Googling "vcf to plink" shows quite a few ways to do this. However, if your vcf file does not contain genotypes it will not work.
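
Since only the row counts matter, one quick sanity check is whether the three files agree with each other. The sketch below is not part of BayesR; the function names and the `prefix` argument are made up for illustration. It relies on the PLINK 1 binary layout: a 3-byte header followed by one block per SNP, with four genotypes packed into each byte.

```python
import os

# PLINK 1 .bed header: two magic bytes plus 0x01 for variant-major mode.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def count_lines(path):
    """Count non-empty lines (one per SNP in .bim, one per sample in .fam)."""
    with open(path, "r") as handle:
        return sum(1 for line in handle if line.strip())


def expected_bed_size(n_samples, n_snps):
    """3 header bytes, then ceil(n_samples / 4) bytes for every SNP."""
    bytes_per_snp = (n_samples + 3) // 4
    return 3 + n_snps * bytes_per_snp


def check_fileset(prefix):
    """Return a list of problems for prefix.bed/.bim/.fam (empty if consistent)."""
    problems = []
    n_snps = count_lines(prefix + ".bim")
    n_samples = count_lines(prefix + ".fam")
    with open(prefix + ".bed", "rb") as handle:
        if handle.read(3) != BED_MAGIC:
            problems.append("unexpected .bed header")
    actual = os.path.getsize(prefix + ".bed")
    expected = expected_bed_size(n_samples, n_snps)
    if actual != expected:
        problems.append(f".bed is {actual} bytes, expected {expected}")
    return problems
```

Calling `check_fileset("plink")` on the files from the first post would return an empty list if the .bed size matches the .bim and .fam row counts. A mismatch suggests the files were edited or regenerated separately.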

@tshalev
Author

tshalev commented Aug 3, 2018

OK I figured out the issue. Recoding the plink file after converting to VCF fixed the problem. Thanks!

tshalev closed this as completed Aug 3, 2018

@qadrikazmi

Hi @tshalev,
could you please share the plink command you used?
I am getting the same error and have not been able to solve it, even after recoding.

@tshalev
Author

tshalev commented Feb 20, 2019

@qadrikazmi, kind of embarrassed to say but I think the problem was that I was entering the phenotype data into the .bim file in Excel :p. When I added it using scripts the problem went away.

@qadrikazmi

Yes @tshalev, the issue was that it accepts only a space-delimited phenotype (.fam) file.
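
A small checker along these lines could flag the delimiter problem before running BayesR. This is only a sketch with hypothetical function names. The tab and field-count checks follow from this thread. The carriage-return check is a guess, prompted by the Excel remark above, and is not something the thread confirms.

```python
def fam_line_problems(line):
    """Return a list of likely formatting problems in one .fam line."""
    problems = []
    body = line.rstrip("\n")
    if body.endswith("\r"):
        # Assumption: Windows line endings (e.g. from Excel) may also upset the reader.
        problems.append("carriage return (Windows line ending)")
        body = body.rstrip("\r")
    if "\t" in body:
        problems.append("tab delimiter")
    n_fields = len(body.split())
    if n_fields != 6:
        problems.append(f"{n_fields} fields instead of 6")
    return problems


def check_fam(path):
    """Map 1-based line numbers to the problems found on that line."""
    report = {}
    # newline="" keeps the original line endings so "\r" can be detected.
    with open(path, "r", newline="") as handle:
        for number, line in enumerate(handle, start=1):
            found = fam_line_problems(line)
            if found:
                report[number] = found
    return report
```

An empty dictionary from `check_fam("plink.fam")` means every line has six space-separated fields and no tabs.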

@tshalev
Author

tshalev commented Feb 20, 2019

Ah yes, sorry it's been a while. Yes I meant the .fam file. For plink, I generated the file using
plink --vcf file.vcf --allow-extra-chr --double-id --vcf-half-call m
I then imported into R and had it enter the phenotype data, and afterwards had no problems running BayesR.

@GabrieleNocchi

I am having the same issue and I am struggling to make it work.

I make my BED files using the function --make-bed in plink. I also need to add the phenotypes to my .fam file following conversion to plink from VCF. I have tried adding the phenotypes both programmatically and manually but I still get the following error:

At line 572 of file baymods.f90
Fortran runtime error: End of file

The simulated example data, by contrast, work fine. I am struggling to understand why.

@GabrieleNocchi

> Ah yes, sorry it's been a while. Yes I meant the .fam file. For plink, I generated the file using
> plink --vcf file.vcf --allow-extra-chr --double-id --vcf-half-call m
> I then imported into R and had it enter the phenotype data, and afterwards had no problems running BayesR.

Did you use the flag --make-bed to make the plink files?

@tshalev
Author

tshalev commented Jul 1, 2019

> > Ah yes, sorry it's been a while. Yes I meant the .fam file. For plink, I generated the file using
> > plink --vcf file.vcf --allow-extra-chr --double-id --vcf-half-call m
> > I then imported into R and had it enter the phenotype data, and afterwards had no problems running BayesR.
>
> Did you use the flag --make-bed to make the plink files?

I did not. I converted my vcf file directly to plink binary format (bed, bim, fam) using the plink --vcf call. I then added phenotype information to the fam file in R (i.e., writing to the sixth column of the fam file from a different file with my phenotype data). After that it runs fine.
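
The R step itself is not shown in the thread. As a rough equivalent, here is a Python sketch of writing phenotypes into the sixth column. The function name and the dictionary input are hypothetical. It matches samples on the second column (the within-family ID), which with `--double-id` is the same as the VCF sample ID, and it writes space-separated output.

```python
def add_phenotypes(fam_lines, phenotypes):
    """Return space-separated .fam lines with column 6 filled from `phenotypes`.

    fam_lines  -- iterable of lines from the original .fam (tabs or spaces)
    phenotypes -- dict mapping the ID in column 2 to a phenotype value
    Samples without an entry keep whatever was already in column 6.
    """
    out = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        if fields[1] in phenotypes:
            fields[5] = str(phenotypes[fields[1]])
        out.append(" ".join(fields))
    return out
```

The returned lines can be written back with `"\n".join(lines) + "\n"`. Because each line is rebuilt with single spaces, the output also avoids the tab-delimiter problem discussed in this thread.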

@GabrieleNocchi

Ok, I got it. I spent quite a bit of time on this, so let me write it up in case somebody else encounters the same issue, as it was not clear to me from the comments above:

So, I thought my problem was similar to tshalev's, as I originally added my phenotypes to the .fam file manually and also programmatically using sed, simply replacing the plink default -9, which stands for no phenotype, with my phenotype.

In the end, the issue was simply that the .fam file needs to be SPACE-separated, not TAB-separated. It is a bit strange, because plink seems to generate a tab-separated .fam file, so you have to edit it and change those tabs into spaces to make it work.

Maybe it was not the case in older plink versions.
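
The tab-to-space fix described above can be sketched as follows. The function names and file paths are placeholders, and this is only one way to do it.

```python
def tabs_to_spaces(text):
    """Collapse each run of tabs/spaces to one space and drop blank lines."""
    lines = [" ".join(line.split()) for line in text.splitlines() if line.strip()]
    return "".join(line + "\n" for line in lines)


def convert_fam(src_path, dst_path):
    """Read a .fam file and write a space-separated copy with "\n" line endings."""
    with open(src_path, "r", newline="") as src:
        text = src.read()
    with open(dst_path, "w", newline="\n") as dst:
        dst.write(tabs_to_spaces(text))
```

Writing to a new path, for example `convert_fam("plink.fam", "plink_spaces.fam")`, keeps the original file for comparison. The converted file then has to be renamed to match the prefix passed to `-bfile`.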

@GabrieleNocchi

> > > Ah yes, sorry it's been a while. Yes I meant the .fam file. For plink, I generated the file using
> > > plink --vcf file.vcf --allow-extra-chr --double-id --vcf-half-call m
> > > I then imported into R and had it enter the phenotype data, and afterwards had no problems running BayesR.
> >
> > Did you use the flag --make-bed to make the plink files?
>
> I did not. I converted my vcf file directly to plink binary format (bed, bim, fam) using the plink --vcf call. I then added phenotype information to the fam file in R (i.e., writing to the sixth column of the fam file from a different file with my phenotype data). After that it runs fine.

Many thanks for your reply, tshalev. I actually sorted out my issue after I wrote my message. I replaced the tabs in the .fam file with spaces and it was accepted.
