Issue #18. Add pre-computed outputs for the valid files #19

Merged (14 commits) on Nov 27, 2023

PeterKnealeCMRI

Added basic logic to pre-translate the valid HGVS entries.
I've left it running on my machine to translate the rest; it will take 1-2 hours.

HGVS,Chromosome,Position,Ref,Alt
NM_001110556.2:c.3396G>T,X,154360399,C,A
NM_000284.4:c.1172_*3del,X,19359651,TAAGGG,T
NM_004006.3:c.8547+18C>T,X,31496770,G,A
NM_172245.4:c.780+135G>A,Y,1294596,G,A
NM_001353921.2:c.145A>G,X,63724597,T,C
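
The rows above look like the `CHROM-POS-REF-ALT` strings returned by the recoder, split into columns. A minimal sketch of that split, assuming the string always has exactly four hyphen-separated fields (the helper name `vcf_string_to_row` is hypothetical and is not taken from this PR):

```python
def vcf_string_to_row(hgvs: str, vcf_string: str) -> list[str]:
    """Split 'CHROM-POS-REF-ALT' into CSV fields, prefixed with the HGVS input."""
    parts = vcf_string.split("-")
    if len(parts) != 4:
        # Assumption: anything else is malformed and should not be written out.
        raise ValueError(f"unexpected vcf_string: {vcf_string!r}")
    chrom, pos, ref, alt = parts
    return [hgvs, chrom, pos, ref, alt]


row = vcf_string_to_row("NM_000284.4:c.1172_*3del", "X-19359651-TAAGGG-T")
print(",".join(row))  # NM_000284.4:c.1172_*3del,X,19359651,TAAGGG,T
```

Raising on an unexpected shape, rather than guessing, keeps a bad translation from silently landing in the output file.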

@h-joshi

h-joshi commented Nov 15, 2023

I just realised that some genes have copies on both chromosome X and chromosome Y.

## Request (2)
curl -X "POST" "https://rest.ensembl.org/variant_recoder/homo_sapiens?fields=None&vcf_string=1" \
     -H 'Content-Type: application/json' \
     -H 'Accept: application/json' \
     -d $'{
  "ids": [
    "NM_172245.4:c.780+135G>A",
    "NM_006883.2:c.*5G>A",
    "NM_000451.4:c.106C>T",
    "NM_172245.4:c.663C>A",
    "NM_000451.4:c.49A>T",
    "NM_172245.4:c.960G>C",
    "NM_006883.2:c.-432-45C>T",
    "NM_000451.4:c.506G>A",
    "NM_172245.4:c.855C>T",
    "NM_000451.4:c.391G>T",
    "NM_172245.4:c.473+12C>T",
    "NM_000451.4:c.463G>T",
    "NM_172245.4:c.132G>T",
    "NM_000451.4:c.583C>T",
    "NM_000451.4:c.633+5C>G",
    "NM_172245.4:c.220-80G>A",
    "NM_172245.4:c.947-14G>A"
  ]
}'

For entries in output.csv where the chromosome is "Y", it is OK to replace the chromosome with "X". All of these genes have paralogs (duplicate genes) whose coordinates are identical on X and Y (output pasted below).

As an aside, modify the download.py script so that only the entry beginning with X- is picked up as the translated VCF string.

[
  {
    "A": {
      "input": "NM_172245.4:c.780+135G>A",
      "vcf_string": [
        "Y-1294596-G-A",
        "LRG_186-30797-G-A",
        "X-1294596-G-A"
      ]
    }
  },
  {
    "A": {
      "input": "NM_006883.2:c.*5G>A",
      "vcf_string": [
        "Y-658834-G-A",
        "X-658834-G-A",
        "LRG_710-39491-G-A"
      ]
    }
  },
  {
    "T": {
      "vcf_string": [
        "Y-631003-C-T",
        "X-631003-C-T",
        "LRG_710-11660-C-T"
      ],
      "input": "NM_000451.4:c.106C>T"
    }
  },
  {
    "A": {
      "input": "NM_172245.4:c.663C>A",
      "vcf_string": [
        "Y-1294344-C-A",
        "LRG_186-30545-C-A",
        "X-1294344-C-A"
      ]
    }
  },
  {
    "T": {
      "vcf_string": [
        "Y-630946-A-T",
        "X-630946-A-T",
        "LRG_710-11603-A-T"
      ],
      "input": "NM_000451.4:c.49A>T"
    }
  },
  {
    "C": {
      "input": "NM_172245.4:c.960G>C",
      "vcf_string": [
        "Y-1303936-G-C",
        "LRG_186-40137-G-C",
        "X-1303936-G-C"
      ]
    }
  },
  {
    "T": {
      "vcf_string": [
        "Y-630421-C-T",
        "X-630421-C-T",
        "LRG_710-11078-C-T"
      ],
      "input": "NM_006883.2:c.-432-45C>T"
    }
  },
  {
    "A": {
      "input": "NM_000451.4:c.506G>A",
      "vcf_string": [
        "Y-640840-G-A",
        "X-640840-G-A",
        "LRG_710-21497-G-A"
      ]
    }
  },
  {
    "T": {
      "input": "NM_172245.4:c.855C>T",
      "vcf_string": [
        "Y-1300535-C-T",
        "LRG_186-36736-C-T",
        "X-1300535-C-T"
      ]
    }
  },
  {
    "T": {
      "input": "NM_000451.4:c.391G>T",
      "vcf_string": [
        "Y-634731-G-T",
        "X-634731-G-T",
        "LRG_710-15388-G-T"
      ]
    }
  },
  {
    "T": {
      "vcf_string": [
        "Y-1288900-C-T",
        "LRG_186-25101-C-T",
        "X-1288900-C-T"
      ],
      "input": "NM_172245.4:c.473+12C>T"
    }
  },
  {
    "T": {
      "vcf_string": [
        "Y-634803-G-T",
        "X-634803-G-T",
        "LRG_710-15460-G-T"
      ],
      "input": "NM_000451.4:c.463G>T"
    }
  },
  {
    "T": {
      "input": "NM_172245.4:c.132G>T",
      "vcf_string": [
        "Y-1285833-G-T",
        "LRG_186-22034-G-T",
        "X-1285833-G-T"
      ]
    }
  },
  {
    "T": {
      "input": "NM_000451.4:c.583C>T",
      "vcf_string": [
        "Y-641037-C-T",
        "X-641037-C-T",
        "LRG_710-21694-C-T"
      ]
    }
  },
  {
    "G": {
      "vcf_string": [
        "Y-641092-C-G",
        "X-641092-C-G",
        "LRG_710-21749-C-G"
      ],
      "input": "NM_000451.4:c.633+5C>G"
    }
  },
  {
    "A": {
      "vcf_string": [
        "Y-1288439-G-A",
        "LRG_186-24640-G-A",
        "X-1288439-G-A"
      ],
      "input": "NM_172245.4:c.220-80G>A"
    }
  },
  {
    "A": {
      "vcf_string": [
        "Y-1303909-G-A",
        "LRG_186-40110-G-A",
        "X-1303909-G-A"
      ],
      "input": "NM_172245.4:c.947-14G>A"
    }
  }
]
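
Before swapping "Y" for "X", the claim that the coordinates match can be checked against a response like the one above. This is only a sketch: it assumes the response is a list of objects keyed by allele, each holding a `vcf_string` list as in the pasted output, and the function name `xy_coordinates_match` is made up for illustration.

```python
import json


def xy_coordinates_match(response: list) -> bool:
    """Return True if every Y- entry has an X- entry with the same POS-REF-ALT."""
    for entry in response:
        for allele_info in entry.values():
            if not isinstance(allele_info, dict):
                continue  # skip any non-allele keys (assumption: they may exist)
            strings = allele_info.get("vcf_string", [])
            # Drop the two-character "X-" / "Y-" prefix and compare the rest.
            x = {s[2:] for s in strings if s.startswith("X-")}
            y = {s[2:] for s in strings if s.startswith("Y-")}
            if y and x != y:
                return False
    return True


# Two entries copied from the output above.
sample = json.loads("""
[
  {"A": {"input": "NM_172245.4:c.780+135G>A",
         "vcf_string": ["Y-1294596-G-A", "LRG_186-30797-G-A", "X-1294596-G-A"]}},
  {"T": {"vcf_string": ["Y-631003-C-T", "X-631003-C-T", "LRG_710-11660-C-T"],
         "input": "NM_000451.4:c.106C>T"}}
]
""")
print(xy_coordinates_match(sample))  # True
```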

@PeterKnealeCMRI
Author

OK, I'm now filtering on those starting with X-. As you can see below at number 4, it's skipping the Y- and LRG_ entries.

Progress 1/1954
        Examining X-154360399-C-A
                Accepted
Progress 2/1954
        Examining X-19359651-TAAGGG-T
                Accepted
Progress 3/1954
        Examining X-31496770-G-A
                Accepted
Progress 4/1954
        Examining Y-1294596-G-A
                Ignored
        Examining LRG_186-30797-G-A
                Ignored
        Examining X-1294596-G-A
                Accepted
Progress 5/1954
        Examining X-63724597-T-C
                Accepted
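
The actual change to download.py is not shown in this thread. A filter that produces a log like the one above might look roughly like this, assuming the first candidate starting with `X-` is the one kept (`pick_x_entry` and its `log` parameter are hypothetical names):

```python
def pick_x_entry(vcf_strings, log=print):
    """Return the first candidate on chromosome X, logging each decision."""
    accepted = None
    for candidate in vcf_strings:
        log(f"        Examining {candidate}")
        if accepted is None and candidate.startswith("X-"):
            accepted = candidate
            log("                Accepted")
        else:
            # Y-, LRG_ and any further X- candidates are skipped.
            log("                Ignored")
    return accepted


chosen = pick_x_entry(["Y-1294596-G-A", "LRG_186-30797-G-A", "X-1294596-G-A"])
print(chosen)  # X-1294596-G-A
```

Returning `None` when no `X-` candidate exists leaves it to the caller to decide whether that input should be reported or retried.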

@PeterKnealeCMRI PeterKnealeCMRI merged commit 070aeea into master Nov 27, 2023
1 check passed
@PeterKnealeCMRI PeterKnealeCMRI deleted the hgvs_to_vcf_translation_file branch November 27, 2023 03:06