
Question: Tabix index to find min(POS) and max(POS) in each contig #1652

Open
lindenb opened this issue Feb 14, 2023 · 0 comments

lindenb (Contributor) commented Feb 14, 2023

Hi all, I have a question about the API. I want to scan a vcf.gz file and its .tbi index to extract the minimum and maximum variant POS for each contig.

I'm not sure whether I should use the Chunk, the LinearIndex, etc.
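For the LinearIndex option mentioned above, here is some background. In the tabix/BAI scheme, each contig's linear index stores the smallest virtual file offset of any record that overlaps each 16 kb window. A BGZF virtual offset packs the compressed block address into the upper 48 bits and the offset inside the uncompressed block into the lower 16 bits, so comparing virtual offsets as unsigned 64-bit numbers follows file order. The sketch below builds such an index from an in-memory list of records. `LinearIndexSketch`, `Rec`, `voffset` and `build` are hypothetical names, not htsjdk code. The gap-filling rule is also an assumption, modelled on how htslib finishes an index: leading empty windows take the first record's offset, and later empty windows copy the previous entry.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only; not htsjdk code.
public class LinearIndexSketch {
    // Linear-index windows are 16 kb wide, i.e. (position - 1) >> 14 for a 1-based position.
    static final int LINEAR_SHIFT = 14;

    // Hypothetical record: 1-based start, inclusive end, and the virtual offset where its line begins.
    record Rec(int start, int end, long voffset) {}

    // BGZF virtual offset: block address in the upper 48 bits, in-block offset in the lower 16 bits.
    static long voffset(long blockAddress, int offsetInBlock) {
        return (blockAddress << 16) | (offsetInBlock & 0xFFFF);
    }

    // Builds one contig's linear index; records must be in file order, sorted by start.
    static long[] build(List<Rec> recs) {
        if (recs.isEmpty()) return new long[0];
        int lastWindow = 0;
        for (Rec r : recs) lastWindow = Math.max(lastWindow, (r.end() - 1) >> LINEAR_SHIFT);
        long[] linear = new long[lastWindow + 1];
        Arrays.fill(linear, -1L); // -1 marks "no record overlaps this window yet"
        for (Rec r : recs) {
            int first = (r.start() - 1) >> LINEAR_SHIFT;
            int last = (r.end() - 1) >> LINEAR_SHIFT;
            for (int w = first; w <= last; w++) {
                // records arrive in file order, so the first one to touch a window has the smallest offset
                if (linear[w] == -1L) linear[w] = r.voffset();
            }
        }
        // Assumed gap filling: leading empty windows get the first record's offset,
        // later empty windows copy the previous entry, so the array is non-decreasing.
        long previous = recs.get(0).voffset();
        for (int w = 0; w < linear.length; w++) {
            if (linear[w] == -1L) linear[w] = previous;
            previous = linear[w];
        }
        return linear;
    }

    public static void main(String[] args) {
        List<Rec> chr1 = List.of(
                new Rec(100, 100, voffset(0, 0)),
                new Rec(20_000, 20_000, voffset(0, 50)),
                new Rec(50_000, 50_000, voffset(1000, 10)));
        System.out.println(Arrays.toString(build(chr1))); // prints [0, 50, 50, 65536010]
    }
}
```

In this model the entries come out non-decreasing. As a result, the first entry points at the contig's first record, and the last entry points at a record at or before the contig's final one.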

So here is a snippet of my code so far, where I use the first and last entries of the LinearIndex.

```java
final TabixIndex tbi = new TabixIndex(...);
final List<String> contigs = tbi.getSequenceNames();
final BinningIndexContent[] binIndexContents = tbi.getIndices();
(...)
// loop over each contig
for (int tid = 0; tid < contigs.size(); tid++) {
    // name of the current chromosome
    final String contig = contigs.get(tid);
    // guard: a contig without records may have no index content
    if (binIndexContents[tid] == null) continue;
    // linear index for the current chromosome
    final LinearIndex linearIndex = binIndexContents[tid].getLinearIndex();
    // guard: get(0) would fail on an empty linear index
    if (linearIndex.size() == 0) continue;

    // virtual offset for the first variant. Is it OK? Are the offsets ordered?
    long offset = linearIndex.get(0);
    blocCompressedInputStream.seek(offset);
    // extract the first variant
    li = vcfCodec.makeSourceFromStream(blocCompressedInputStream);
    if (!li.hasNext()) continue;
    final VariantContext firstVariant = vcfCodec.decode(li);
    // also check that the record really belongs to this contig
    if (firstVariant == null || !firstVariant.getContig().equals(contig)) {
        System.err.println("No variant for " + contig);
        continue;
    }
    // virtual offset for the last variant
    offset = linearIndex.get(linearIndex.size() - 1);
    blocCompressedInputStream.seek(offset);
    li = vcfCodec.makeSourceFromStream(blocCompressedInputStream);
    VariantContext lastVariant = firstVariant;
    // read forward until the end of the file or the next contig
    while (li.hasNext()) {
        final VariantContext ctx2 = vcfCodec.decode(li);
        if (ctx2 == null) break;
        if (!ctx2.getContig().equals(contig)) break;
        lastVariant = ctx2;
    }
    // firstVariant.getStart() is the min POS, lastVariant.getStart() the max POS
    (...)
}
```
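You can check the logic of the snippet without a real file by modelling the data lines as an in-memory list ordered by virtual offset. In the sketch below, `MinMaxPosSketch`, `Line`, `MinMax`, `seek` (a stand-in for seeking the BGZF stream) and `minMax` are hypothetical names. The minimum POS comes from the record at the first linear-index entry. The maximum comes from scanning forward from the last entry until the contig changes. A contig with an empty linear index, or whose first record belongs to another contig, yields no result.

```java
import java.util.List;
import java.util.Optional;

// Illustrative model only; not htsjdk code.
public class MinMaxPosSketch {
    // Hypothetical stand-in for one VCF data line: CHROM, POS and the virtual offset where the line starts.
    record Line(String contig, int pos, long voffset) {}

    record MinMax(int minPos, int maxPos) {}

    // Stand-in for seeking a BGZF stream: index of the first line starting at or after voffset.
    static int seek(List<Line> file, long voffset) {
        for (int i = 0; i < file.size(); i++) {
            if (Long.compareUnsigned(file.get(i).voffset(), voffset) >= 0) return i;
        }
        return file.size();
    }

    // Min/max POS of one contig using only the first and last linear-index entries.
    static Optional<MinMax> minMax(List<Line> file, String contig, long[] linear) {
        if (linear == null || linear.length == 0) return Optional.empty(); // nothing indexed for this contig
        int i = seek(file, linear[0]);
        if (i >= file.size() || !file.get(i).contig().equals(contig)) return Optional.empty();
        // lines are sorted by POS within a contig, so the first one has the smallest POS
        int minPos = file.get(i).pos();
        int maxPos = minPos;
        // the last entry points at the first line overlapping the last 16 kb window;
        // every later line of this contig is read to reach the final one
        for (int j = seek(file, linear[linear.length - 1]);
                j < file.size() && file.get(j).contig().equals(contig); j++) {
            maxPos = Math.max(maxPos, file.get(j).pos());
        }
        return Optional.of(new MinMax(minPos, maxPos));
    }

    public static void main(String[] args) {
        List<Line> file = List.of(
                new Line("chr1", 100, 0L),
                new Line("chr1", 20_000, 50L),
                new Line("chr1", 50_000, 65_536_010L),
                new Line("chr1", 50_010, 65_536_060L),
                new Line("chr2", 5, 65_536_100L));
        long[] chr1Linear = {0L, 50L, 50L, 65_536_010L};
        minMax(file, "chr1", chr1Linear).ifPresent(m ->
                System.out.println("chr1 " + m.minPos() + "-" + m.maxPos())); // prints chr1 100-50010
    }
}
```

In this model, the forward scan only covers lines from the one the last entry points at, which is why it is cheaper than a full pass. A long record that starts well before the last window pulls that starting point further back.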

Does this look OK, or should I use another method or another class?
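Whichever index-based method is used, a plain sequential scan gives a reference answer to compare against on a small test file. The sketch below works on already-decompressed VCF text lines, leaving out the reading of the `.vcf.gz` itself. `FullScanSketch` and `minMaxByContig` are hypothetical names. It skips header lines starting with `#` and keeps the smallest and largest POS (column 2) for each CHROM (column 1).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative baseline only; reads every data line instead of using the index.
public class FullScanSketch {
    // Returns contig -> {minPos, maxPos}, in the order contigs first appear.
    static Map<String, int[]> minMaxByContig(List<String> vcfLines) {
        Map<String, int[]> result = new LinkedHashMap<>();
        for (String line : vcfLines) {
            if (line.isEmpty() || line.startsWith("#")) continue; // meta-information and header lines
            String[] cols = line.split("\t", 3); // only CHROM and POS are needed
            int pos = Integer.parseInt(cols[1]);
            result.merge(cols[0], new int[] {pos, pos},
                    (old, cur) -> new int[] {Math.min(old[0], cur[0]), Math.max(old[1], cur[1])});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "##fileformat=VCFv4.2",
                "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
                "chr1\t100\t.\tA\tG\t.\t.\t.",
                "chr1\t50010\t.\tC\tT\t.\t.\t.",
                "chr2\t5\t.\tG\tA\t.\t.\t.");
        minMaxByContig(lines).forEach((contig, mm) ->
                System.out.println(contig + "\t" + mm[0] + "\t" + mm[1]));
    }
}
```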
