How can the search parameters be modified? #33
Comments
Yes, absolutely.
Another possibility is modifying the minimum score threshold. The default threshold is 4, which roughly means that, with the other parameters left at their defaults, you need at least 4 shell or cloud genes close together to get an RGP. If you feel this is not strict enough and only want the regions with many more genes, you can change this threshold, as such:
This will set the threshold to 8 instead of the default 4. There are other parameters, but they are less straightforward to explain. You can see them all by running
Afterward, you can regenerate the 'plastic_regions.tsv' file by running
If you do start tweaking the parameters, you might find the following command useful:
Hello! I have a question about the persistent penalty and variable gain parameters.
If I increase or decrease these values, what should I expect? Thanks for taking the time to answer these basic questions.
Hello,
Taken alone, those two parameters more or less oppose each other.
The persistent penalty defaults to 3. Decreasing it might fuse two RGPs that are close together along the genome but separated by some persistent genes. Increasing it might divide an RGP into multiple components if it includes persistent genes.
The variable gain defaults to 1. Increasing it might fuse two RGPs that are close together along the genome, while decreasing it might divide an RGP into multiple components if it includes persistent genes.
In any case, having persistent genes in the middle of RGPs is relatively rare, so modifying those parameters slightly should not have much impact. Changing them greatly might no longer give you biologically meaningful results, as you may group RGPs together over long stretches of persistent genes.
If you want to understand in more detail how all of those parameters interact, the full method is described in this preprint: https://www.biorxiv.org/content/10.1101/2020.03.26.007484v1.full
In part 2.1.1, parameter p in the formula corresponds to the persistent penalty and parameter v to the variable gain. Only parts 2.1.1 and 2.1.2 are of interest for understanding how the RGPs are predicted.
If something is unclear, do not hesitate to ask more questions :)
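To make the interplay described above more concrete, here is a toy sketch in Python. It is not PPanGGOLiN's code and it does not reproduce the exact formula from the preprint: it assumes a flat penalty per persistent gene, a flat gain per shell or cloud gene, a running score floored at 0, and a region that is kept only if its peak score reaches the minimum score. The function name `toy_rgp_regions` and the 'P'/'V' string encoding are invented for this illustration.

```python
def toy_rgp_regions(genes, persistent_penalty=3, variable_gain=1, min_score=4):
    """Toy model of the scoring idea, NOT the actual PPanGGOLiN algorithm.

    genes: string where 'P' is a persistent gene and 'V' a shell/cloud gene.
    Returns a list of (start, end, peak_score) tuples with inclusive indices.
    """
    regions = []
    score = 0          # running score, never allowed to drop below 0
    start = None       # index where the current candidate region began
    peak = 0           # highest score seen in the current candidate region
    peak_end = None    # index at which that highest score was first reached

    for i, gene in enumerate(genes):
        if gene == "V":
            score += variable_gain
        else:
            score = max(0, score - persistent_penalty)

        if score > 0:
            if start is None:
                start, peak = i, 0
            if score > peak:
                peak, peak_end = score, i
        elif start is not None:
            # Score fell back to 0: close the candidate region.
            if peak >= min_score:
                regions.append((start, peak_end, peak))
            start = None

    if start is not None and peak >= min_score:
        regions.append((start, peak_end, peak))
    return regions


# Five variable genes, one persistent gene, then four variable genes.
genes = "VVVVV" + "P" + "VVVV"

print(toy_rgp_regions(genes))                        # [(0, 9, 6)]
print(toy_rgp_regions(genes, persistent_penalty=5))  # [(0, 4, 5), (6, 9, 4)]
print(toy_rgp_regions(genes, persistent_penalty=5, variable_gain=2))  # [(0, 9, 13)]
print(toy_rgp_regions(genes, min_score=8))           # []
```

In this toy model, raising the penalty from 3 to 5 splits the single region in two, raising the gain fuses it again, and raising the minimum score to 8 discards the region entirely. The real method may behave differently in the details, so treat this only as intuition and refer to parts 2.1.1 and 2.1.2 of the preprint for the actual formula.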
Since this is from May and there have been no other questions since, I will close this issue. If you have any other questions, please do not hesitate to reopen it.
Regards
I am analyzing some bacterial strains in which I am sure there are RGPs, and so far PPanGGOLiN has worked wonders. However, I get many RGPs. Is there a way to make the search stricter? Could the threshold be modified so that I obtain fewer RGPs?
Thanks in advance