ABBABABAwindows.py output halts half-way through scaffold (possibly due to outgroup?) #108

jahnringge · 2024-02-14T14:45:35Z

Hi Simon,

I am trying to use the ABBABABAwindows.py script to investigate population structure. I am running the script with the input

python ABBABABAwindows.py -g global_newform.geno.gz -f phased -o sliding_window_output_trial.csv -w 5000 -m 100 -s 1000 -P1 gamma -P2 4 -P3 6 -O alpha -T 10 --popsFile global_6clusters.txt --writeFailedWindows

but while the first few rows print well, the script stops adding to the output file after some time. I have re-run the command several times, and the output stops at a different place every time (e.g. after 200 lines, after ca. 500 lines, most recently at 302 lines...). Additionally, after the output csv file stops being modified, the script keeps running and repeatedly prints this status line:

1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.
1733 windows queued 1729 results received 370 results written.

and it just keeps going until I halt it manually.

The script runs well when I do not use the chosen outgroup ("alpha"). However, I have a limited data set and this is the only outgroup available to me, and I would really prefer to keep this outgroup, as I have previously used it for whole-genome ABBA-BABA analysis - utilising your freq.py script - successfully (in case this is relevant, it's derived allele freq. = 0).

Do you have any suggestions for how to circumvent this issue? Would really appreciate an answer on this.

Best,
Jahn

jahnringge · 2024-02-15T08:31:33Z

Update: I have now tried to run this script with several group combinations of my data, and the issue is not just the outgroup but rather the combination of populations used in the command, e.g. -P1 3 -P2 1 -P3 6 -O gamma works but -P1 3 -P2 6 -P3 1 -O gamma does not.

Do you have any idea as to why that could be? Could you perhaps explain what kind of data ABBABABAwindows.py is looking for in order to function properly?

Best,
Jahn

simonhmartin · 2024-02-20T11:11:17Z

Hi Jahn, this behaviour can occur when there's an error in one of the threads but not in the main script. This is usually because one of them encounters an illegal nucleotide character. What I suggest is making a very short geno file using something like gunzip -c global_newform.geno.gz | head -n 1000 > temp.geno and re-running with that. If it works, slowly increase the file size until you find the line that is causing the error.
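As an alternative to growing the file by hand, the same search for an illegal character could be sketched as a direct scan of the geno file. This is only a rough sketch and not part of the genomics_general scripts: the function names find_bad_genotypes and scan_geno_file are made up for illustration, and it assumes a tab-separated file with one header line, scaffold and position in the first two columns, and genotype fields that should contain only A, C, G, T, N and the separators / and |. Adjust the allowed set if your file legitimately uses other codes.

```python
import gzip

# Assumption: a valid genotype field contains only these characters.
ALLOWED = set("ACGTN/|")


def find_bad_genotypes(lines, allowed=ALLOWED):
    """Return (line_number, column_name, value) for each suspicious genotype.

    `lines` is any iterable of text lines; the first line is treated as the header.
    """
    bad = []
    header = None
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if header is None:
            header = fields
            continue
        # Assumption: the first two columns are scaffold and position.
        for name, value in zip(header[2:], fields[2:]):
            # Flag empty fields and fields with any character outside `allowed`.
            if not value or not set(value) <= allowed:
                bad.append((line_number, name, value))
    return bad


def scan_geno_file(path):
    """Scan a gzipped geno file and return the suspicious genotypes."""
    with gzip.open(path, "rt") as handle:
        return find_bad_genotypes(handle)
```

Calling scan_geno_file("global_newform.geno.gz") and printing the returned tuples would list the line number, sample column and offending value for each hit, which can then be compared against the populations used in the failing command.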
