Possible to obtain percent nucleotide divergence from motus snv_call output? #39

Closed
mikemc opened this issue Oct 31, 2019 · 2 comments
mikemc commented Oct 31, 2019

I am trying to determine whether some mOTUs in my samples have more diversity within them than others. In particular, I would like to compare the percent nucleotide divergence (number of SNVs per bp) between two samples for mOTU 1 with the same quantity for mOTU 2. The distances returned by motus snv_call do not seem suited to this analysis: they appear to be normalized differently for different mOTUs, and they also depend on how much variation is seen across the samples. For example, if I leave out some samples that contain a lot of diversity for a given mOTU and rerun motus snv_call, the distances between the remaining samples increase.

Is there a way to convert these distances to units of nucleotide changes per total marker-gene length, so that they can be compared across mOTUs and so that the distance between two samples does not depend on the amount of diversity across other samples?
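To make the requested quantity concrete, here is a minimal sketch of a per-bp divergence that uses only the two samples being compared. It is not part of mOTUs or metaSNV: the function name and the input layout (one consensus base per marker-gene position, per sample) are assumptions made for illustration.

```python
def per_bp_divergence(calls_a, calls_b):
    """Fraction of shared positions at which two samples' consensus bases differ.

    calls_a, calls_b: dict mapping (gene_id, position) -> consensus base.
    Hypothetical input layout; only positions called in BOTH samples count
    toward the denominator, so other samples never affect the result.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return float("nan")  # no overlapping positions, divergence undefined
    mismatches = sum(1 for pos in shared if calls_a[pos] != calls_b[pos])
    return mismatches / len(shared)


# Toy example: three shared positions, one mismatch.
sample_1 = {("geneA", 1): "A", ("geneA", 2): "C", ("geneA", 3): "G", ("geneA", 4): "T"}
sample_2 = {("geneA", 1): "A", ("geneA", 2): "T", ("geneA", 3): "G", ("geneA", 5): "T"}
divergence = per_bp_divergence(sample_1, sample_2)
```

Because the denominator is the number of positions covered in both samples, values computed this way are on the same per-bp scale for every mOTU.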

Thank you for your help!

Contributor

LucasPaoli commented Nov 6, 2019

Hi Mike,

The distances themselves cannot be converted back to the information you are looking for.

However, if you run motus snv_call with the -k option, it will keep the intermediate files.

With those you can either (1) parse the filtered output for the number of SNVs per mOTU or (2) run:
python /path/to/metasnv/metaSNV_DistDiv.py --filt /path/to/snv_call_output/filtered-m5-d10-b80-c5-p0.9 --div --n_threads n

This command will create new files in the distance directory of the output (in this case distances-m5-d10-b80-c5-p0.9), including some with the suffix .diversity. These are matrices of the intra- and inter-sample nucleotide diversity, i.e. a measure of the number of SNVs weighted by their frequency. The intra-sample diversities are on the diagonal.
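For intuition about what "SNVs weighted by their frequency" can mean, here is a minimal sketch using one common textbook definition of nucleotide diversity: the probability that two randomly drawn bases differ at a position, averaged over all positions. This is an assumption for illustration and may not match the exact formula metaSNV_DistDiv.py uses; the function names and input layout are hypothetical.

```python
def site_diversity(freqs_a, freqs_b):
    """Probability that a base drawn from A differs from one drawn from B."""
    alleles = freqs_a.keys() | freqs_b.keys()
    return 1.0 - sum(freqs_a.get(x, 0.0) * freqs_b.get(x, 0.0) for x in alleles)


def nucleotide_diversity(sample_a, sample_b, n_positions):
    """Mean per-position diversity between two samples.

    sample_a, sample_b: dict mapping position -> {base: frequency}, both
    listing the same SNV positions (assumed layout). n_positions is the
    total number of positions considered; invariant positions contribute 0.
    Passing the same sample twice gives its intra-sample diversity
    (without any finite-sample correction).
    """
    shared = sample_a.keys() & sample_b.keys()
    total = sum(site_diversity(sample_a[p], sample_b[p]) for p in shared)
    return total / n_positions


# Toy data: two SNV positions in a 100 bp region.
a = {10: {"A": 0.5, "G": 0.5}, 25: {"C": 1.0}}
b = {10: {"A": 1.0}, 25: {"C": 0.25, "T": 0.75}}
samples = {"a": a, "b": b}

# Matrix with intra-sample values on the diagonal, inter-sample off it.
matrix = {
    (i, j): nucleotide_diversity(samples[i], samples[j], 100)
    for i in samples
    for j in samples
}
```

Dividing by the total number of positions rather than the number of SNVs keeps the values on a per-bp scale that does not change when other samples are added or removed.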

From your post I think this is what you want :)

Cheers,

Lucas

Author

mikemc commented Nov 16, 2019

Thanks for your advice, Lucas. I wasn't able to get your option (2) to work, as I get the errors

ERROR: No such file '/.all_cov.tab',
ERROR: No such file '/.all_perc.tab',
ERROR: No such file '/bed_header'

and can't seem to tell metaSNV_DistDiv.py where these files are (they are in the snv_call_output/ folder), and copying them into the filtered-m5-d10-b80-c5-p0.9 folder didn't make a difference.

But I've decided for my project to work with the BAM files produced by motus map_snv to do my own genotyping and distance calculations.

mikemc closed this as completed Nov 16, 2019