
ABBABABAwindows.py output halts half-way through scaffold (possibly due to outgroup?) #108

Open
jahnringge opened this issue Feb 14, 2024 · 2 comments


@jahnringge

Hi Simon,

I am trying to use the ABBABABAwindows.py script to investigate population structure. I am running the script with the following command:

python ABBABABAwindows.py -g global_newform.geno.gz -f phased -o sliding_window_output_trial.csv -w 5000 -m 100 -s 1000 -P1 gamma -P2 4 -P3 6 -O alpha -T 10 --popsFile global_6clusters.txt --writeFailedWindows

The first few rows are written correctly, but after some time the script stops adding to the output file. I have re-run the command several times, and the output stops at a different point each time (e.g. after 200 lines, after ca. 500 lines, and most recently at 302 lines). After the output CSV file stops being modified, the script keeps running and repeatedly prints this status line:

1733 windows queued 1729 results received 370 results written.

It keeps printing this until I halt it manually.

The script runs fine when I do not use the chosen outgroup ("alpha"). However, my data set is limited and this is the only outgroup available to me. I would also prefer to keep it because I have previously used it successfully for a whole-genome ABBA-BABA analysis with your freq.py script (in case it is relevant, its derived allele frequency is 0).

Do you have any suggestions for how to get around this issue? I would really appreciate an answer.

Best,
Jahn

@jahnringge
Author

jahnringge commented Feb 15, 2024

Update: I have now run the script with several combinations of my populations, and the problem is not the outgroup alone but the combination of populations in the command. For example, -P1 3 -P2 1 -P3 6 -O gamma works, but -P1 3 -P2 6 -P3 1 -O gamma does not.

Do you have any idea why that could be? Could you explain what kind of input data ABBABABAwindows.py expects in order to work properly?

Best,
Jahn

@simonhmartin
Owner

Hi Jahn, this behaviour can occur when there is an error in one of the worker threads but not in the main script. This is usually because a thread encounters an illegal nucleotide character. I suggest making a very short geno file, using something like gunzip -c global_newform.geno.gz | head -n 1000 > temp.geno, and re-running with that. If it works, gradually increase the file size until you find the line that causes the error.
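As a complement to narrowing things down by file size, here is a minimal Python sketch that scans the geno file directly and reports genotype fields containing unexpected characters. It is not part of genomics_general. It assumes (not confirmed in this thread) that the file is tab-separated with one header line, that the first two columns are scaffold and position, and that phased genotypes look like A|G with N for missing data. The allowed character set ALLOWED and the names find_bad_genotypes and main are hypothetical and may need adjusting for your data.

```python
import gzip
import sys

# Assumption: characters considered valid in a phased geno file
# (bases, N for missing data, and the "|" or "/" allele separators).
ALLOWED = set("ACGTN|/")


def find_bad_genotypes(lines, allowed=ALLOWED):
    """Yield (line_number, column_index, genotype) for each genotype field
    that is empty or contains a character outside `allowed`.

    `lines` is any iterable of text lines. Line numbers are 1-based and
    include the header, so they match `head -n` / `sed -n` numbering.
    column_index is 0-based.
    """
    for line_number, line in enumerate(lines, start=1):
        if line_number == 1:
            continue  # skip the header line (assumed: #CHROM POS sample1 ...)
        # Only "\n" is stripped, so stray "\r" (Windows line endings) gets reported.
        fields = line.rstrip("\n").split("\t")
        # Columns 0 and 1 are assumed to be scaffold and position.
        for column_index, genotype in enumerate(fields[2:], start=2):
            if not genotype or not set(genotype) <= allowed:
                yield line_number, column_index, genotype


def main(path):
    # Open gzipped or plain files in text mode so lines are str.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, column_index, genotype in find_bad_genotypes(handle):
            # Print a 1-based column number, usable with `cut -f`.
            print(f"line {line_number}, column {column_index + 1}: {genotype!r}")


if __name__ == "__main__":
    main(sys.argv[1])
```

Saved as, for example, check_geno.py, it would be run as python check_geno.py global_newform.geno.gz. Whether a flagged character actually breaks ABBABABAwindows.py depends on the script's own parser, so treat hits as candidates to inspect rather than confirmed culprits.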
