Calling indels (GATK) #164

Closed
gustavo-miranda opened this issue Jul 23, 2019 · 2 comments

@gustavo-miranda

Hi Brant,

I am using the seqcap_pop scripts within some of the phyluce tools and I am having a problem with indel calling.

When calling indels, we have to provide the path to the match-contigs-to-probes FASTA file (-R) and to the merged BAM file (-I). In my match-contigs-to-probes folder I have several FASTA files, one for each individual of SpeciesA. In my merge-bams folder I have a single BAM file (and its index) in which the BAMs of all specimens are merged.

The problem is that when I set -R to path/to/match-contigs-to-probes/SpeciesA_specimen050.fasta, GATK only reads the information for SpeciesA_specimen050 in the BAM file and gives me an error for all other specimens. Here is what the output looks like:

MESSAGE: Badly formed genome location: Contig uce-4998_montanus051 given as location, but this contig isn't present in the Fasta sequence dictionary

Is there a way to make GATK read a file listing the paths to all of the FASTA files, so that the program runs on all specimens at once?

Here are the commands I am using:
java -Xmx2g -jar ~/anaconda/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R /path/to/4_match-contigs-to-probes/Genus_species.fasta \
    -I /path/to/7_merge-bams/Genus_species.bam \
    --minReadsAtLocus 7 \
    -o /path/to/8_GATK/Genus_species.intervals

This might be of interest to @mgharvey too.

Thank you.
Gustavo

@brantfaircloth
Member

Hi Gustavo,

As the GATK code is currently written, it requires a single reference assembly to which the data from all of your individuals have been aligned. So, you cannot have the code align data for multiple individuals to multiple references.

Hope that helps clarify,
-b
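
For readers hitting the same "isn't present in the Fasta sequence dictionary" message, the mismatch described above can be checked before running GATK. What follows is only a rough sketch, not part of phyluce or GATK: it compares the contig names in a reference FASTA with the @SQ names in a BAM header. It assumes the header has already been dumped to text (for example with `samtools view -H`), and the function names and the toy contig name `uce-4998_specimen050` are hypothetical.

```python
def fasta_contig_names(fasta_text):
    """Return the set of sequence names (first token after '>') in FASTA text."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def sam_header_contig_names(header_text):
    """Return the contig names (SN: fields of @SQ lines) from a SAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def contigs_missing_from_reference(fasta_text, header_text):
    """List BAM header contigs that the reference FASTA does not contain."""
    reference = fasta_contig_names(fasta_text)
    return [n for n in sam_header_contig_names(header_text) if n not in reference]


if __name__ == "__main__":
    # Toy data: a reference built from one specimen, and a merged BAM header
    # that also names a contig from a second specimen.
    toy_fasta = ">uce-4998_specimen050\nACGTACGT\n"
    toy_header = (
        "@HD\tVN:1.4\tSO:coordinate\n"
        "@SQ\tSN:uce-4998_specimen050\tLN:8\n"
        "@SQ\tSN:uce-4998_montanus051\tLN:8\n"
    )
    print(contigs_missing_from_reference(toy_fasta, toy_header))
    # prints ['uce-4998_montanus051']
```

An empty list means every contig named in the BAM header exists in the reference passed to -R. Any names that are listed belong to reads aligned against a different reference, which is the situation the reply above rules out.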

@gustavo-miranda
Author

Hi Brant,

Yes, it does clarify. Thank you!

Gustavo
