Issue #18. Add pre-computed outputs for the valid files #19
I just realised that some genes have duplicates on both chromosome X and chromosome Y.
For entries in output.csv where the chromosome is "Y", it is OK to replace the chromosome with "X". All of these genes have paralogs (duplicate genes) with identical chromosome coordinates, irrespective of whether the copy is on X or Y (output pasted below). As an aside, modify the
[
{
"A": {
"input": "NM_172245.4:c.780+135G>A",
"vcf_string": [
"Y-1294596-G-A",
"LRG_186-30797-G-A",
"X-1294596-G-A"
]
}
},
{
"A": {
"input": "NM_006883.2:c.*5G>A",
"vcf_string": [
"Y-658834-G-A",
"X-658834-G-A",
"LRG_710-39491-G-A"
]
}
},
{
"T": {
"vcf_string": [
"Y-631003-C-T",
"X-631003-C-T",
"LRG_710-11660-C-T"
],
"input": "NM_000451.4:c.106C>T"
}
},
{
"A": {
"input": "NM_172245.4:c.663C>A",
"vcf_string": [
"Y-1294344-C-A",
"LRG_186-30545-C-A",
"X-1294344-C-A"
]
}
},
{
"T": {
"vcf_string": [
"Y-630946-A-T",
"X-630946-A-T",
"LRG_710-11603-A-T"
],
"input": "NM_000451.4:c.49A>T"
}
},
{
"C": {
"input": "NM_172245.4:c.960G>C",
"vcf_string": [
"Y-1303936-G-C",
"LRG_186-40137-G-C",
"X-1303936-G-C"
]
}
},
{
"T": {
"vcf_string": [
"Y-630421-C-T",
"X-630421-C-T",
"LRG_710-11078-C-T"
],
"input": "NM_006883.2:c.-432-45C>T"
}
},
{
"A": {
"input": "NM_000451.4:c.506G>A",
"vcf_string": [
"Y-640840-G-A",
"X-640840-G-A",
"LRG_710-21497-G-A"
]
}
},
{
"T": {
"input": "NM_172245.4:c.855C>T",
"vcf_string": [
"Y-1300535-C-T",
"LRG_186-36736-C-T",
"X-1300535-C-T"
]
}
},
{
"T": {
"input": "NM_000451.4:c.391G>T",
"vcf_string": [
"Y-634731-G-T",
"X-634731-G-T",
"LRG_710-15388-G-T"
]
}
},
{
"T": {
"vcf_string": [
"Y-1288900-C-T",
"LRG_186-25101-C-T",
"X-1288900-C-T"
],
"input": "NM_172245.4:c.473+12C>T"
}
},
{
"T": {
"vcf_string": [
"Y-634803-G-T",
"X-634803-G-T",
"LRG_710-15460-G-T"
],
"input": "NM_000451.4:c.463G>T"
}
},
{
"T": {
"input": "NM_172245.4:c.132G>T",
"vcf_string": [
"Y-1285833-G-T",
"LRG_186-22034-G-T",
"X-1285833-G-T"
]
}
},
{
"T": {
"input": "NM_000451.4:c.583C>T",
"vcf_string": [
"Y-641037-C-T",
"X-641037-C-T",
"LRG_710-21694-C-T"
]
}
},
{
"G": {
"vcf_string": [
"Y-641092-C-G",
"X-641092-C-G",
"LRG_710-21749-C-G"
],
"input": "NM_000451.4:c.633+5C>G"
}
},
{
"A": {
"vcf_string": [
"Y-1288439-G-A",
"LRG_186-24640-G-A",
"X-1288439-G-A"
],
"input": "NM_172245.4:c.220-80G>A"
}
},
{
"A": {
"vcf_string": [
"Y-1303909-G-A",
"LRG_186-40110-G-A",
"X-1303909-G-A"
],
"input": "NM_172245.4:c.947-14G>A"
}
}
]
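The Y-to-X replacement described above can be sketched in Python. This is a minimal illustration rather than the project's actual code: it works on records shaped like the pasted output instead of on output.csv (whose column layout is not shown here), and the helper names `split_vcf`, `y_matches_x`, `y_to_x` and `collapse_to_x` are hypothetical.

```python
import json

def split_vcf(vcf_string):
    """Split 'CHROM-POS-REF-ALT' into parts; rsplit keeps IDs such as 'LRG_186' whole."""
    chrom, pos, ref, alt = vcf_string.rsplit("-", 3)
    return chrom, int(pos), ref, alt

def y_matches_x(vcf_strings):
    """True if every Y entry has an X entry with the same position, ref and alt.

    Also True when there are no Y entries at all.
    """
    parsed = [split_vcf(s) for s in vcf_strings]
    x_sites = {p[1:] for p in parsed if p[0] == "X"}
    y_sites = {p[1:] for p in parsed if p[0] == "Y"}
    return y_sites <= x_sites

def y_to_x(vcf_string):
    """Rewrite a chromosome-Y string to chromosome X; leave everything else untouched."""
    chrom, pos, ref, alt = split_vcf(vcf_string)
    if chrom == "Y":
        chrom = "X"
    return f"{chrom}-{pos}-{ref}-{alt}"

def collapse_to_x(vcf_strings):
    """Apply y_to_x to each string and drop the duplicates this creates, keeping order."""
    return list(dict.fromkeys(y_to_x(s) for s in vcf_strings))

# Two records copied from the pasted output above.
records = json.loads("""
[
  {"A": {"input": "NM_172245.4:c.780+135G>A",
         "vcf_string": ["Y-1294596-G-A", "LRG_186-30797-G-A", "X-1294596-G-A"]}},
  {"T": {"input": "NM_000451.4:c.106C>T",
         "vcf_string": ["Y-631003-C-T", "X-631003-C-T", "LRG_710-11660-C-T"]}}
]
""")

for record in records:
    for alt, entry in record.items():
        # Only collapse when the Y and X coordinates really are identical.
        if y_matches_x(entry["vcf_string"]):
            entry["vcf_string"] = collapse_to_x(entry["vcf_string"])
        print(entry["input"], entry["vcf_string"])
```

Checking `y_matches_x` before collapsing guards against a Y entry whose coordinates differ from its X counterpart, in which case the record is left unchanged.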
Ok, I'm now filtering on those starting with
Added basic logic to pre-translate the valid HGVS entries. I've left it running on my machine to translate the rest; it will take 1-2 hours.