Brice Letcher edited this page Apr 28, 2020 · 1 revision

Coverage files

Allele base counts

Filename: allele_base_coverage.json

This file contains per base coverage counts for alleles.

Consider a read which maps exactly to the PRG and overlaps some number of allele bases. Allele base counts are incremented for every overlapping read. This file contains separate counts for each allele base.

Example

The following example describes two sites. The first site consists of two alleles and the second site consists of three alleles.

{
	"allele_base_counts": [
		[
			[0, 0, 0],
			[1, 1, 0]
		],
		[
			[0, 0, 1],
			[2, 2, 0],
			[2, 2, 0, 1, 3]
		]
	]
}

The third allele of the second site consists of five bases and therefore five counts:

import json

with open("allele_base_coverage.json") as handle:
    data = json.load(handle)
sites = data["allele_base_counts"]
first_site = sites[0]
second_site = sites[1]

assert second_site[2] == [2, 2, 0, 1, 3]
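
Per-base counts are often easier to compare once they are summarised per allele. The following is a minimal sketch, not part of gramtools itself: the helper name `mean_allele_coverage` is hypothetical, and taking the mean is only one possible summary. It uses the example data shown above.

```python
import json

# Example data copied from the JSON shown above.
raw = """
{"allele_base_counts": [
    [[0, 0, 0], [1, 1, 0]],
    [[0, 0, 1], [2, 2, 0], [2, 2, 0, 1, 3]]
]}
"""
data = json.loads(raw)


def mean_allele_coverage(sites):
    """Return, for each site, the mean per-base count of each allele.

    Hypothetical helper: the mean is an illustrative summary only.
    """
    return [
        [sum(counts) / len(counts) if counts else 0.0 for counts in site]
        for site in sites
    ]


means = mean_allele_coverage(data["allele_base_counts"])
for site_index, site_means in enumerate(means):
    print(site_index, site_means)
```

The result keeps the same site and allele ordering as the file, so `means[1][2]` refers to the third allele of the second site.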

Grouped allele counts

Filename: grouped_allele_counts.json

Consider a single read which maps exactly to the PRG multiple times. Let's refer to each distinct mapping of a single read as a "mapping instance". When two different mapping instances overlap a common site, the overlapped alleles are grouped together. Then, mapping coverage counts are aggregated for each allele group.
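
The grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the gramtools implementation: each mapping instance is modelled as a dictionary from site index to the allele ID it overlaps, each read adds one to the count of its allele group at a site, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def group_read_alleles(mapping_instances):
    """Collect, per site, the alleles overlapped by any mapping instance of one read.

    Assumption: each instance is a dict {site_index: allele_id}.
    """
    groups = defaultdict(set)
    for instance in mapping_instances:
        for site_index, allele_id in instance.items():
            groups[site_index].add(allele_id)
    # A sorted tuple gives each allele group a stable, hashable identity.
    return {site: tuple(sorted(alleles)) for site, alleles in groups.items()}


def aggregate_group_counts(reads, num_sites):
    """Count, per site, how many reads produced each allele group.

    Assumption: one read contributes one count to its group at each site.
    """
    site_counts = [Counter() for _ in range(num_sites)]
    for mapping_instances in reads:
        for site_index, group in group_read_alleles(mapping_instances).items():
            site_counts[site_index][group] += 1
    return site_counts


reads = [
    [{0: 0}, {0: 2}],  # one read, two mapping instances: alleles 0 and 2 of site 0
    [{0: 0}, {0: 2}],  # a second read with the same ambiguity
    [{0: 2, 1: 5}],    # a uniquely mapping read crossing sites 0 and 1
]
site_counts = aggregate_group_counts(reads, num_sites=2)
print(site_counts)
```

Here the two ambiguous reads both contribute to the group `(0, 2)` at site 0, while the uniquely mapping read contributes to single-allele groups.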

Example

{
	"grouped_allele_counts": {
		"site_counts": [
			{
				"0": 10,
				"1": 3,
				"14": 10
			},
			{
				"3": 30,
				"2": 2,
				"14": 1
			}
		],
		"allele_groups": {
			"0": [0, 2],
			"1": [0, 2, 3],
			"2": [0, 2, 4],
			"3": [2, 5],
			"14": [7, 8]
		}
	}
}

In general, the file has the following structure:

{
	"grouped_allele_counts": {
		"site_counts": [
			{
				"<allele_group_id>": <count>,
				...
			},
			<site_index>,
			...
		],
		"allele_groups": {
			"<allele_group_id>": [<allele_id>, ...],
			...
		}
	}
}
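
The counts of the first site can be read per allele group as follows:

import json

with open("grouped_allele_counts.json") as handle:
    data = json.load(handle)
grouped_allele_counts = data["grouped_allele_counts"]

sites = grouped_allele_counts["site_counts"]
allele_groups = grouped_allele_counts["allele_groups"]

site = sites[0]
# Each key is an allele group ID; look up the allele IDs it stands for.
for allele_group_id, count in site.items():
    allele_ids = allele_groups[allele_group_id]
    print(allele_ids, count)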
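
Group counts can also be projected back onto individual alleles. The sketch below is an illustrative summary rather than something gramtools is documented to do: it credits every allele in a group with that group's full count, and the helper name `per_allele_totals` is hypothetical. It uses the example data shown above.

```python
from collections import Counter

# Example data copied from the JSON shown above.
grouped_allele_counts = {
    "site_counts": [
        {"0": 10, "1": 3, "14": 10},
        {"3": 30, "2": 2, "14": 1},
    ],
    "allele_groups": {
        "0": [0, 2],
        "1": [0, 2, 3],
        "2": [0, 2, 4],
        "3": [2, 5],
        "14": [7, 8],
    },
}


def per_allele_totals(site, allele_groups):
    """Sum group counts per allele ID for one site.

    Assumption: every allele in a group receives the group's full count.
    """
    totals = Counter()
    for allele_group_id, count in site.items():
        for allele_id in allele_groups[allele_group_id]:
            totals[allele_id] += count
    return dict(totals)


allele_groups = grouped_allele_counts["allele_groups"]
totals = [
    per_allele_totals(site, allele_groups)
    for site in grouped_allele_counts["site_counts"]
]
print(totals)
```

Because an ambiguous read is credited to every allele in its group, these totals can exceed the number of reads; they are best read as an upper bound on support for each allele.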

Allele sum coverage

Filename: allele_sum_coverage

This file contains coverage information for each allele within the PRG.

Each row (line) represents a variant site within the PRG. Each column (space-separated within a single line) holds the coverage count of one allele of that site.

Example

0 0 0
1 0
0 3

This example describes the coverage information for three sites. The first site consists of three alleles. The second and third sites each consist of two alleles. Read mapping instances have overlapped the first allele of the second site once (hence: 1). Similarly, read mapping instances have overlapped the second allele of the third site three times.
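
The row and column layout described above can be read with a few lines of Python. This is a minimal sketch: the function name `parse_allele_sum_coverage` is hypothetical, and it assumes the file holds only whitespace-separated integers, one site per line.

```python
def parse_allele_sum_coverage(text):
    """Parse the file contents into one list of allele counts per site.

    Assumption: blank lines carry no information and are skipped.
    """
    return [
        [int(field) for field in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]


# Example contents copied from above.
example = "0 0 0\n1 0\n0 3\n"
coverage = parse_allele_sum_coverage(example)

# First allele of the second site, second allele of the third site.
print(coverage[1][0], coverage[2][1])
```

To read a real file, the same function can be applied to the text returned by `open("allele_sum_coverage").read()`.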