Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

epo file format #25

Open
xinkaitong opened this issue Nov 26, 2021 · 27 comments
Open

epo file format #25

xinkaitong opened this issue Nov 26, 2021 · 27 comments

Comments

@xinkaitong
Copy link

hello, grenaud! now, I just have an outgroup vcf file (other mammals not human), could you tell me how can I transform vcf file to epo file?

@grenaud
Copy link
Owner

grenaud commented Nov 26, 2021 via email

@xixifa
Copy link

xixifa commented Oct 6, 2023

Hello,
If I want to calculate β2 using an outgroup to call substitutions and ancestral alleles, can I just get epo files from outgroup vcf file, without common ancestor?
Thank you very much

@grenaud
Copy link
Owner

grenaud commented Oct 6, 2023

Yes! You need to convert VCF to ACF and run replaceancestor I believe. Let me know if that works!

@xixifa
Copy link

xixifa commented Oct 6, 2023

Thank you. I will try it !
If I use vcfm2acf to convert VCF to ACF , then I use usepopsrootanc to tell the root(outgroup), finally I use the replaceanc, the resulting file is the epo file that I need in betascan2 software?
Does this epo file include all the individuals I calculate later?

@xixifa
Copy link

xixifa commented Oct 6, 2023

Hello, I have another question.
Is the file2.acf.gz in https://github.com/grenaud/glactools the result using the glactools with vcfm2acf --onlyGT --fai parameter?
I am not sure if using these parameters is correct?

Thanks

@xixifa
Copy link

xixifa commented Oct 6, 2023

Hello,
the file1.acf.gz only have a outgroup individual:
image
when I use the "usepopsrootanc" parameter, I only tell the outgroup, then the error is as follows
image
I think it lack ancestral information, it is right? so the file1.acf.gz needs the root and ancestor information?

@xixifa
Copy link

xixifa commented Oct 7, 2023

Hello,
What does "ChimpHumanAncestor " refer to specifically?

Thank you very much

@grenaud
Copy link
Owner

grenaud commented Oct 7, 2023

Thank you. I will try it ! If I use vcfm2acf to convert VCF to ACF , then I use usepopsrootanc to tell the root(outgroup), finally I use the replaceanc, the resulting file is the epo file that I need in betascan2 software? Does this epo file include all the individuals I calculate later?

Would you describe seems like it will work. The EPO is only human+chimp+other great apes and the inferred ancestral states, there is a single human "individual" and that is the reference

@grenaud
Copy link
Owner

grenaud commented Oct 7, 2023

Hello, I have another question. Is the file2.acf.gz in https://github.com/grenaud/glactools the result using the glactools with vcfm2acf --onlyGT --fai parameter? I am not sure if using these parameters is correct?

Thanks

I am not sure how this was generated but acf files can be generated in very different ways.

@grenaud
Copy link
Owner

grenaud commented Oct 7, 2023

There is root and ancestor. In the case of humans, we can think of the chimps as the "root" as all humans are equitistant to the chimp. However there is a chimp-human ancestor, we do not have access to the sequence of its genome as it died out 6 million years ago. However we can make a probabilistic inference of what this genome had in terms of alleles. This is precisely what EPO does.

@xixifa
Copy link

xixifa commented Oct 9, 2023

Hello,
Thank you for your reply!

  1. May I ask if I have only root and ancestor, but no groups like other apes, is it OK to generate the follow file.acf.gz as my epo file for Betascan2?
    image

  2. After I get the epo file, I get an error adding the index file? How to create this index file?
    image

  3. The first line of the epo file seems to be different from the example you showed. Is it correct for my epo file?
    image
    Thanks

@grenaud
Copy link
Owner

grenaud commented Oct 11, 2023

How did you generate this file? Could you paste the command lines?

@xixifa
Copy link

xixifa commented Oct 12, 2023

I think I misunderstood your meaning.
Is file4.acf.gz in your example (glactools replaceanc file2.acf.gz file3.acf.gz > file4.acf.gz) equal to the 1kg_chr6.AFRsubset.acf.gz in following code ( glactools vcfm2acf --onlyGT --epo all.epo.gz --fai human_g1k_v37.fasta.fai chr6.AFRsubset.recode.vcf > 1kg_chr6.AFRsubset.acf.gz)?
These two files are actually equivalent, right?

@grenaud
Copy link
Owner

grenaud commented Oct 12, 2023

file4.acf.gz is just a placeholder name, it could be anything.

@xixifa
Copy link

xixifa commented Oct 13, 2023

sorry,

  1. glactools replaceanc file2.acf.gz <(glactools usepopsrootanc -u file1.acf.gz chimp ChimpHumanAncestor ) > file4.acf.gz
  2. glactools vcfm2acf --onlyGT --epo all.epo.gz --fai human_g1k_v37.fasta.fai chr6.AFRsubset.recode.vcf > 1kg_chr6.AFRsubset.acf.gz

What I understand is that if there is no epo file, file4 generated according to step 1 and 1kg_chr6.AFRsubset.acf.gz generated using epo in 2 are both acf files containing outgroup information. steps 1 and 2 can generate the same file, and the steps are equivalent, is it right?

@grenaud
Copy link
Owner

grenaud commented Oct 15, 2023

While I encourage people to use file descriptors, they are not great for debugging because you don't know which one fails.

Could you try to break down these commands into three separate ones, produce the output for each and inspect that using glactools view?

@xixifa
Copy link

xixifa commented Oct 15, 2023

Yes, I used 3 commands to generate the file successfully.

Since my species is not human, I want to know if the output file of 1 is the equivalent of the output file of 2 ?

Thank you very much

@grenaud
Copy link
Owner

grenaud commented Oct 15, 2023

Could you please paste the commands? I am not sure what is file 1 and file 2 here.

@xixifa
Copy link

xixifa commented Oct 15, 2023

image
I mean these two ways above. Since my object is not human, there is no epo file, so I can only use the method of 1. I am not sure whether the file I generated is the input file of betascan2? the following is my output
image

@grenaud
Copy link
Owner

grenaud commented Oct 15, 2023

Again, please no file descriptors, just single commands.

@xixifa
Copy link

xixifa commented Oct 15, 2023

The following is code of my test file.
IN1 is outgroup, ZA1 is ancestor.
test.chr24.vcf is my pop1 file.

glactools vcfm2acf --onlyGT --fai $ref_genome outgroup_root_test.chr24.vcf > outgroup_root_test.chr24.acf.gz
glactools vcfm2acf --onlyGT --fai $ref_genome test.chr24.vcf > test.chr24.acf.gz
glactools usepopsrootanc outgroup_root_test.chr24.acf.gz IN1 ZA1 > outgroup_root_test_identify.chr24.acf.gz
glactools replaceanc test.chr24.acf.gz outgroup_root_test_identify.chr24.acf.gz > outgroup_root_test_merge.chr24.acf.gz

@grenaud
Copy link
Owner

grenaud commented Oct 15, 2023

ok thank you! which command fails?

@xixifa
Copy link

xixifa commented Oct 15, 2023

No error reported.
I'm just not sure if the outgroup_root_test_merge.chr24.acf.gz is the betascan2 input file?

@grenaud
Copy link
Owner

grenaud commented Oct 15, 2023

ah no it is not! you need to use glactools acf2betascan

@xixifa
Copy link

xixifa commented Oct 15, 2023

glactools acf2betascan --useroot outgroup_root_test_merge.chr24.acf.gz | gzip > outgroup_root_test.chr24.beta.txt.gz

Is it right to do ?

@grenaud
Copy link
Owner

grenaud commented Oct 15, 2023

should be!

@xixifa
Copy link

xixifa commented Oct 15, 2023

Ok! Thank you very much !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants