Position Converter no longer helpful #14

Open
maddiegillentine opened this issue Oct 20, 2022 · 7 comments
Labels: question (Further information is requested)

Comments

@maddiegillentine

The Position Converter in version 2 of Mutalyzer was great at distinguishing between transcripts, but version 3 does not seem to provide that function as easily. Is it possible to have the previous features back?

@mihailefter
Contributor

The Position Converter from Mutalyzer 2 did not semantically check or normalize the input and output descriptions. Moreover, it did not take into account that exonic sequences may differ between chromosomal (NC_(NM_)) and RefSeq transcript (NM_) reference sequences. Please check sections 5.2 and 7 from here for more details.

To lift (convert) a description over to another reference sequence, the Mapper tool from Mutalyzer 3 can be used. In this example, NM_003002.4:c.274G>T is mapped to NC_000011.10(NM_003002.4):c.274G>T (note that the "Slice to" box has to be set to transcript). It seems that there are no differences in the exonic sequences between NM_003002.4 and NC_000011.10(NM_003002.4). In another example, however, the exonic sequence of NM_032790.3 differs from that of NC_000012.12(NM_032790.3), and the Mapper outputs NC_000012.12(NM_032790.3):c.[132_134del;277A>T] for NM_032790.3:c.274A>T, since it takes those differences into account.
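
One way to read the second output, sketched under the assumption that the chromosomal exonic sequence carries three extra bases at c.132_134 relative to NM_032790.3: positions downstream of that stretch shift by three, which is why c.274 becomes c.277. The function name and the single-stretch model are illustrative only and do not describe how the Mapper is implemented.

```python
# Illustrative only: models one stretch of extra bases in the chromosomal
# exonic sequence relative to the RefSeq transcript (an assumption).
EXTRA_START = 132  # first extra base, taken from c.[132_134del;277A>T]
EXTRA_END = 134    # last extra base, taken from the same description


def transcript_to_chromosomal(pos: int) -> int:
    """Shift a transcript c. position to the chromosomal c. numbering."""
    extra_len = EXTRA_END - EXTRA_START + 1
    # Positions before the extra stretch are unaffected; later ones shift.
    return pos if pos < EXTRA_START else pos + extra_len


print(transcript_to_chromosomal(274))  # 277
print(transcript_to_chromosomal(100))  # 100
```

The deletion c.132_134del removes the extra bases, and the substitution is then reported at the shifted position.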

Please note that the mapping can also be performed in the other direction, from a chromosomal reference to a RefSeq transcript, as here.

To help users, we provide the chromosomal descriptions in the "Normalizer" whenever a transcript is used as input, but only if the exonic sequences are identical (check this example). When this is not the case, a warning message is issued, as here.

@Felicis76

How can we get the gene ID and the coding position when we only have the genome version and "chr14:g.1524854C>T"? The position converter in version 2 of Mutalyzer could do this before.

@jfjlaros
Member

> The position converter for the version 2 of mutalyzer can do this

This is true, but the answer Mutalyzer 2 gives (NC_000014.8:g.1524854C>T) is wrong: there is no C at that position in the reference sequence. This has been the main problem with the Mutalyzer 2 Position Converter: it only converts positions and does not check whether the variant description is correct. This is why the Position Converter in Mutalyzer 3 accepts only positions and no longer accepts variant descriptions.
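
The kind of check that was missing can be sketched as follows. This is a minimal illustration on a toy sequence, not Mutalyzer's actual validation logic; the function name and the sequence are made up for the example.

```python
def reference_base_matches(reference: str, position: int, expected: str) -> bool:
    """Return True if the 1-based position in `reference` holds `expected`."""
    if not 1 <= position <= len(reference):
        return False  # position falls outside the reference sequence
    return reference[position - 1] == expected.upper()


# Toy reference sequence (uppercase), not real genomic data.
toy_reference = "ATGGT"

print(reference_base_matches(toy_reference, 2, "T"))  # True
print(reference_base_matches(toy_reference, 2, "C"))  # False
print(reference_base_matches(toy_reference, 9, "A"))  # False
```

A converter that only shifts coordinates would accept all three inputs, while a checking tool would reject the last two.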

In Mutalyzer 3, the functionality you need can be found in the Normalizer under the "Equivalent Descriptions" fold. E.g., NC_000011.10:g.112088970del shows a list of eight descriptions on transcripts, all of which are guaranteed to be correct. If you want to know more about one of these variants, e.g., NC_000011.10(NM_003002.4):c.274del, you can follow the link and open the "Reference Sequence Information" fold. There you will find the gene name, its aliases, the HGNC ID and other information.

The only thing that is not supported at the moment is a translation from chromosome number to accession number (e.g., chr14 to NC_000014.8). We are currently in discussion with the HGVS nomenclature committee to see if and how this should be supported. Perhaps something like GRCh38(chr14):g.1524854C>T would be in line with the current nomenclature.
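
Until such a translation is supported, it can be done on the client side with a small lookup table. The sketch below is hypothetical: the function name is invented, and the table holds only accessions that appear in this thread, so a real table would need every chromosome of each build.

```python
# Accessions taken from examples in this thread; deliberately incomplete.
ACCESSIONS = {
    ("GRCh37", "chr11"): "NC_000011.9",
    ("GRCh37", "chr14"): "NC_000014.8",
    ("GRCh38", "chr11"): "NC_000011.10",
    ("GRCh38", "chr12"): "NC_000012.12",
}


def to_accession_description(build: str, description: str) -> str:
    """Replace a leading chromosome name with its accession number."""
    chromosome, separator, variant = description.partition(":")
    if not separator:
        raise ValueError(f"no ':' in description: {description!r}")
    try:
        accession = ACCESSIONS[(build, chromosome)]
    except KeyError:
        raise ValueError(f"unknown chromosome {chromosome!r} for {build}") from None
    return f"{accession}:{variant}"


# The description is only rewritten, not validated against the reference.
print(to_accession_description("GRCh37", "chr14:g.1524854C>T"))
```

The rewritten description still has to be run through the Normalizer, since the lookup does not check the reference base.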

@Felicis76

Thank you.
Is Mutalyzer 3 usable with GRCh37?

@jfjlaros
Member

> Is Mutalyzer 3 usable with GRCh37?

Yes, it should work for every build of any organism. GRCh37 and GRCh38 are preloaded though, to speed things up a little.

jfjlaros added the "question" (Further information is requested) label on Oct 29, 2022

@budsonjelmont

Hello,

I'm trying to determine how to replicate the behavior of the Mutalyzer v2 Position Converter using the v3 API. In the Mutalyzer v2 API, I routinely used the position converter endpoint to convert from cDNA descriptions, such as the following, to genomic coordinates:

NM_000518.4:c.-79A>G

In Mutalyzer V2, this input returns the expected result, NC_000011.9:g.5248330T>C, when using the position converter. As I understand it, the new Position Converter endpoint handles positions only, so isn't quite what I'm looking for. I've tried the Mapper using the arguments below, but this input throws errors due to being out of bounds of the reference sequence and a purported mismatch between the transcript and genome sequence, which seems incorrect:

https://mutalyzer.nl/mapper?description=NM_000518.4%3Ac.-79A%3EG&reference_id=NC_000011.9&slice_to=transcript&filter=true

Is there a better way to achieve the output I'm looking for, or am I using the v3 API incorrectly?

Thank you!
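
For reference, the Mapper URL quoted in this comment can be assembled from its query parameters as sketched below. The base URL and parameter names are copied from that URL. The helper name and its defaults are illustrative, and the sketch only reproduces the request without addressing the errors it triggers.

```python
from urllib.parse import urlencode

# Base URL and parameter names copied from the URL quoted in the comment.
MAPPER_URL = "https://mutalyzer.nl/mapper"


def mapper_url(description: str, reference_id: str,
               slice_to: str = "transcript", filter_: bool = True) -> str:
    """Build a Mapper URL with a percent-encoded query string."""
    query = urlencode({
        "description": description,
        "reference_id": reference_id,
        "slice_to": slice_to,
        "filter": str(filter_).lower(),
    })
    return f"{MAPPER_URL}?{query}"


print(mapper_url("NM_000518.4:c.-79A>G", "NC_000011.9"))
```

`urlencode` percent-encodes the `:` and `>` characters, so the description can be passed in its plain HGVS form.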

@jfjlaros
Member

jfjlaros commented Nov 4, 2022

> this input throws errors due to being out of bounds of the reference sequence and a purported mismatch between the transcript and genome sequence, which seems incorrect

The description NM_000518.4:c.-79A>G is indeed invalid according to the HGVS nomenclature guidelines:

> a coding or non-coding DNA reference sequence does not contain intron or 5’ and 3’ gene flanking sequences and can not be used to describe variants in introns and up/down-stream of the gene

The only way to describe variants outside of a transcript is to use a genomic reference sequence, e.g., NC_000011.9(NM_000518.5):c.-79A>G is acceptable (and correct).
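
The difference between the two forms can be made explicit by splitting a description into its parts. The sketch below uses a simplified regular expression that does not cover the full HGVS grammar; the function names are hypothetical and only the NC_ prefix is treated as chromosomal.

```python
import re

# Simplified pattern; it does not cover the full HGVS grammar.
DESCRIPTION = re.compile(
    r"^(?P<reference>[^():]+)"        # e.g. NC_000011.9 or NM_000518.4
    r"(?:\((?P<selector>[^()]+)\))?"  # optional transcript selector
    r":(?P<system>[a-z])\."           # coordinate system, e.g. c or g
    r"(?P<variant>.+)$"
)


def split_description(description: str) -> dict:
    """Split a description into reference, selector, system and variant."""
    match = DESCRIPTION.match(description)
    if match is None:
        raise ValueError(f"unrecognised description: {description!r}")
    return match.groupdict()


def uses_chromosomal_reference(description: str) -> bool:
    """True when the reference is a chromosome accession (NC_ prefix)."""
    return split_description(description)["reference"].startswith("NC_")


print(split_description("NC_000011.9(NM_000518.5):c.-79A>G"))
print(uses_chromosomal_reference("NM_000518.4:c.-79A>G"))
```

A client could use such a check to route transcript-only descriptions through the Normalizer before asking for positions outside the transcript.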

The Normalizer will suggest a "Chromosomal Description" when the following conditions are met:

  1. The input description is interpretable.
  2. The transcript ID used is present in the NCBI assembly annotation.

E.g., for NM_000518.5:c.-10A>G the suggestion NC_000011.9(NM_000518.5):c.-10A>G is given. The genomic description can be obtained by following the link; it is listed in the "Equivalent Descriptions" fold.

Perhaps Mutalyzer 3 can give some suggestions when one or both of the conditions are not met. In this case, the remark that NC_000011.9(NM_000518.5) is the best match for NM_000518.4 in the GRCh37 assembly annotation would help.

5 participants