INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result. #213

Open
Teezi opened this issue Dec 11, 2023 · 6 comments

Comments

@Teezi

Teezi commented Dec 11, 2023

Hi,

I'm using MafAnnotator.py and encountering numerous warnings: INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result.

I'm wondering how I can define the cancer type for my sample or disable these warnings.

Many thanks!

@jjc2718

jjc2718 commented Dec 21, 2023

I don't have any affiliation with OncoKB, but I was looking into this recently for a project of mine and here's what I found:

  • It looks like you can specify a cancer type for each sample using the ONCOTREE_CODE or CANCER_TYPE headers in your MAF file (see here in the code for annotating samples).
  • The name format comes from OncoTree. I tried some examples, and it seems like most of the top-level nodes there should work (e.g. "Liver Cancer", "Melanoma", "Breast Cancer", etc). It would be good to see some examples of this, though, since there are no cancer type columns in the example MAFs in the data directory.
  • I think whether or not cancer type is specified only affects the "level" of therapeutic implications for each variant; the oncogenic/neutral/unknown and mutation effect annotations appear to me to be unchanged. I only compared a few annotated MAFs manually, though, before and after specifying a cancer type - it would be good to get confirmation of exactly what part of the annotation process the cancer type specification is influencing from someone on the OncoKB team.
  • You should be able to turn off the warnings by changing the logging level here to something higher than INFO (e.g. logging.WARN should work).
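The last point can be sketched in a few lines. This is only a sketch: the logger name "AnnotatorCore" is an assumption, inferred from the "INFO:AnnotatorCore:" prefix of the message, and the `basicConfig` call stands in for whatever logging setup the annotator scripts actually do.

```python
import logging

# Stand-in for the annotator's own logging setup (assumption).
logging.basicConfig(level=logging.INFO)

# Assumption: the message is emitted through a logger named "AnnotatorCore",
# as suggested by the "INFO:AnnotatorCore:" prefix.
annotator_log = logging.getLogger("AnnotatorCore")

# Raise the threshold: INFO records are dropped, warnings and errors still show.
annotator_log.setLevel(logging.WARNING)

# This record is now suppressed.
annotator_log.info(
    "Cancer type for the sample should be defined for a more accurate result."
)
# This one is still printed.
annotator_log.warning("warnings are still reported")
```

Note that this only hides the message. Without a cancer type, the annotation itself is unchanged.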

Hope this helps! I'd be particularly curious what the answer is to my third point - I annotated a large number of MAF files without specifying a cancer type, but we're primarily interested in the oncogenic vs. neutral variant annotations. It would be good to know if I need to re-annotate them or if it won't have any effect on those calls.

@zhx828
Member

zhx828 commented Dec 22, 2023

Sorry about the late reply! To your third question: it affects the Therapeutic/Diagnostic/Prognostic implications. The tumor type summary will not be included if no cancer type is given.

@paulsalachan

Hi,

I get the same warnings when running MafAnnotator.py.

INFO:AnnotatorCore:Cancer type for the sample should be defined for a more accurate result

I have tried including either CANCER_TYPE or ONCOTREE_CODE in the clinical data file provided as input using the -c option.
According to the documentation, the cancer type should be assigned based on the clinical data file as it has the highest priority. So something must be going wrong. Having the cancer type column in the input file -i does not help either.

However, there are no warnings when the default tumor type -t is set, but this is only possible when you have one cancer type in your dataset. I guess I could subset the data for each cancer type and run the annotation separately, but that would defeat the purpose of the -c option? Also, there does not seem to be any check on whether the cancer type specified by the -t option is valid, so I could specify some random string, and it would not complain or give a warning about the cancer type not being specified.

Do you know what could be going on here? Ideally, I would like to be able to specify different levels of ONCOTREE_CODE and get output for those levels.

Thanks for your help.

@zhx828
Member

zhx828 commented Feb 13, 2024

Hi @paulsalachan, the example script passes a clinical file to most of the annotator scripts, so -c should work. Does the clinical file you created also have a SAMPLE_ID column? I'm happy to take a look at your files if you send me a snapshot.

We currently do not have any checks on the cancer type, which I think would be a good thing to support. #214
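The SAMPLE_ID question can be checked before running the annotator. A minimal sketch, assuming the clinical file is tab-delimited and that headers must match exactly. The column names are the ones mentioned in this thread, and `check_clinical_header` is a hypothetical helper, not part of oncokb-annotator.

```python
import csv
import io

def check_clinical_header(tsv_text):
    """Return a list of problems found in the header of a tab-delimited clinical file."""
    # Read only the first row; an empty file yields an empty header.
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"), [])
    problems = []
    if "SAMPLE_ID" not in header:
        problems.append("missing SAMPLE_ID column")
    # At least one of the two cancer type columns should be present.
    if not {"ONCOTREE_CODE", "CANCER_TYPE"} & set(header):
        problems.append("missing ONCOTREE_CODE or CANCER_TYPE column")
    return problems

# Illustrative input only: the sample ID and code are made up for the example.
example = "Tumor_Sample_Barcode\tONCOTREE_CODE\nS1\tMEL\n"
print(check_clinical_header(example))  # ['missing SAMPLE_ID column']
```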

@paulsalachan

Hi @zhx828, thank you for your quick reply. That resolved it. In the clinical file I had the Sample ID column but the column was named 'Tumor_Sample_Barcode' instead. When I renamed it to 'SAMPLE_ID', it is annotating without any warnings, so that's great! A suggestion would be to be able to provide either 'SAMPLE_ID' or 'Tumor_Sample_Barcode' as column header, so that it is consistent with the header in the Maf file. Thanks for your time with the help!
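The rename described above can be scripted. A minimal sketch, assuming a tab-delimited clinical file with a single header row (no leading comment lines). `rename_sample_column` is a hypothetical helper and the sample rows are illustrative.

```python
import csv
import io

def rename_sample_column(tsv_text, old="Tumor_Sample_Barcode", new="SAMPLE_ID"):
    """Return tab-delimited text with the `old` header renamed to `new`."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    header = rows[0]
    # Only rename when the target column is not already present.
    if new not in header and old in header:
        header[header.index(old)] = new
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()

# Illustrative clinical file content; the sample IDs and codes are made up.
clinical = "Tumor_Sample_Barcode\tONCOTREE_CODE\nS1\tMEL\nS2\tBRCA\n"
fixed = rename_sample_column(clinical)
print(fixed.splitlines()[0])
```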

@zhx828
Member

zhx828 commented Feb 14, 2024

Oh, Tumor_Sample_Barcode is supposed to be supported, but for some reason it was not for the clinical file. I made a patch to fix the issue: https://github.com/oncokb/oncokb-annotator/releases/tag/v3.4.1
