HGNC Symbols #7

Closed
czackl opened this issue Jul 11, 2022 · 3 comments

Comments

@czackl
Collaborator

czackl commented Jul 11, 2022

Immunedeconv requires HGNC symbols. The spatial datasets I worked with typically contained ENSEMBL IDs or some form of gene symbol, but not always HGNC symbols.

Is this something we should require the user to manage, or would it be beneficial to implement something to convert the gene names? Often a simple toupper() more or less fixes the issue.

Trying to cut down on dependencies here :)

The toupper() solution fixes the HGNC symbols that are present in lowercase. All other genes are "filtered" by the intersect step between signature and sample as performed in omnideconv. This could remove genes that are otherwise "usable".
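
To make the effect concrete, here is a minimal base R sketch of the toupper() plus intersect behaviour described above. The gene vectors are made up for illustration, and the intersect call only stands in for the filtering step; it is not the actual omnideconv code.

```r
# Hypothetical inputs: signature genes in HGNC style, sample genes in mixed styles.
signature_genes <- c("CD4", "CD8A", "MS4A1", "NKG7")
sample_genes <- c("Cd4", "cd8a", "MS4A1", "ENSG00000105374")

# Without normalisation only exact matches survive the intersect step.
kept_raw <- intersect(signature_genes, sample_genes)

# toupper() recovers symbols that differ only by case ...
kept_upper <- intersect(signature_genes, toupper(sample_genes))

# ... but a signature gene whose sample entry uses another namespace
# (here an ENSEMBL-style identifier) is still dropped.
dropped <- setdiff(signature_genes, kept_upper)
```

In this toy case only one gene matches before normalisation and three match after it, while the signature gene with no symbol-style counterpart in the sample is lost silently.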

@LorenzoMerotto

I would say that the issue is relevant for first-generation methods, not really for second-generation ones:

  • 2nd gen methods: Since users provide both the single-cell dataset and the bulk one, I would say all gene nomenclatures are fine, as long as both use the same format. Since we don't have a predefined one (I think), it would be too complex to implement all the possible gene name combinations.
  • 1st gen methods: For immunedeconv we always asked the users to provide the expression matrix with gene names, so either HGNC symbols (for human) or MGI symbols (for mouse). We could theoretically implement a gene conversion approach, similar to what I did for the mouse data. However, this one would have to use biomaRt and connect to the ENSEMBL database, whose servers are not reliable at all. I tried to connect to it lately and it never worked.

I don't know if there is a standard for gene names for spatial data, but I guess it depends on the way the data are processed. To use first-generation methods we have to obtain HGNC/MGI symbols. The toupper() could be a solution, but I don't know how often it would work and how many genes would be preserved.

@LorenzoMerotto

One easy solution could be to use the good old annotables package. I used it years ago, but perhaps it could come in handy. I was planning to contribute by adding the latest mouse genome version but haven't heard back from the author yet.
The only problem here would be that the gene names might not be as up to date as in the ENSEMBL database, but they are certainly more accessible (the data is stored locally).
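
A rough sketch of what such a local lookup could look like in base R, with no server connection involved. The `lookup` data frame is a hand-written stand-in, not data taken from annotables; its column names `ensgene` and `symbol` are assumed to mirror that package's tables and may differ. The helper `ensembl_to_symbol` is hypothetical.

```r
# Stand-in for a locally stored annotation table (hand-written rows for illustration).
lookup <- data.frame(
  ensgene = c("ENSG00000141510", "ENSG00000010610", "ENSG00000153563"),
  symbol = c("TP53", "CD4", "CD8A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map ENSEMBL gene IDs to symbols via the local table.
ensembl_to_symbol <- function(ids, lookup) {
  # Drop version suffixes such as ".18" before matching.
  ids <- sub("\\.[0-9]+$", "", ids)
  # match() returns NA for IDs missing from the table, so unmapped genes
  # come back as NA and the caller can decide how to handle them.
  lookup$symbol[match(ids, lookup$ensgene)]
}

converted <- ensembl_to_symbol(
  c("ENSG00000141510.18", "ENSG00000153563", "ENSG00000000000"),
  lookup
)
```

Returning NA for unmapped IDs, instead of dropping them, keeps the output aligned with the input rows and makes it easy to report how many genes could not be converted.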

@czackl
Collaborator Author

czackl commented Jul 11, 2022

I think I confused the human / mouse methods here, but the question stays the same. (I also didn't know about the lowercase mouse gene names.)

Good comments! I think I will implement something to check the type of gene symbols provided and output a message to the user (e.g. "this is a human method and you probably provided mouse data", and the other way round).

Edit: I will leave this open until implemented.
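
For illustration, one possible shape of such a check in base R. This is only a sketch and not the implementation that eventually closed the issue: the function names, the regular expressions, and the 50% threshold are all assumptions, and the heuristic will misclassify symbols that break the usual casing conventions.

```r
# Hypothetical helper: guess the dominant naming style of a gene vector.
guess_gene_id_type <- function(genes) {
  genes <- genes[!is.na(genes) & nzchar(genes)]
  if (length(genes) == 0) return("unknown")
  # ENSEMBL gene IDs such as ENSG... or ENSMUSG..., with optional version suffix.
  is_ensembl <- grepl("^ENS[A-Z]*G[0-9]{11}(\\.[0-9]+)?$", genes)
  # Human-style symbols: contain letters and are fully uppercase (e.g. CD8A).
  is_upper <- !is_ensembl & grepl("[A-Za-z]", genes) & genes == toupper(genes)
  # Mouse-style symbols: leading capital followed by lowercase/digits (e.g. Cd8a).
  is_title <- grepl("^[A-Z][a-z0-9]+$", genes) & grepl("[a-z]", genes)
  fractions <- c(
    ensembl = mean(is_ensembl),
    human_symbol = mean(is_upper),
    mouse_symbol = mean(is_title)
  )
  best <- names(fractions)[which.max(fractions)]
  # Assumed threshold: require a majority before committing to a guess.
  if (fractions[[best]] < 0.5) "unknown" else best
}

# Hypothetical wrapper: warn when the detected style does not fit the method.
check_gene_symbols <- function(genes, expected = c("human_symbol", "mouse_symbol")) {
  expected <- match.arg(expected)
  detected <- guess_gene_id_type(genes)
  if (detected != expected) {
    warning(
      "This method expects ", expected, " identifiers, but the input looks like ",
      detected, ". Did you provide data from the wrong organism or ID type?"
    )
  }
  invisible(detected)
}
```

Calling `check_gene_symbols(rownames(expr), "human_symbol")` on a hypothetical expression matrix `expr` with mouse-style row names would then raise a warning rather than letting the intersect step silently discard most genes.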

@czackl czackl closed this as completed Nov 15, 2022

2 participants