Annotate diseases/phenotypes using chatGPT #19
@KittyMurphy please document your progress on this here |
Annotating HPO phenotypes using chatGPT via gptstudio
Set up
Attempt #1
Here I'm using the congenital onset terms (without HPO ID) that were provided to us by Peter Robinson. Will also try:
Below is a subset of
Attempt #2
What if I run the prompt one phenotype at a time, with 3 iterations?
Below is a subset of
Attempt #3
Here I'm repeating attempt #1 with the addition of providing chatGPT with the definition of each congenital onset term.
Here is a subset of
Attempt #4
Here I'm repeating attempt #2 with the addition of providing chatGPT with the definition of each congenital onset term.
Here is a subset of
|
That prompt is not including the description of the phenotype is it?
From: Kitty Murphy ***@***.***>
Sent: Sunday, March 26, 2023 11:55:25 AM
To: neurogenomics/RareDiseasePrioritisation ***@***.***>
Cc: Skene, Nathan G ***@***.***>; Mention ***@***.***>
Subject: Re: [neurogenomics/RareDiseasePrioritisation] Annotate diseases/phenotypes using chatGPT (Issue #19)
Annotating HPO phenotypes using chatGPT via gptstudio
Set up
install.packages("gptstudio")
library(gptstudio)
# Load HPO terms
terms_dt = HPOExplorer::load_phenotype_to_genes(3)
terms_cols = list(name="Phenotype",
id="ID")
# Get unique terms and their ID's
terms_dt_sub <- unique(terms_dt[, unname(unlist(terms_cols)), with = FALSE])
Attempt #1
Here I'm using the congenital onset terms (without HPO ID) that were provided to us by Peter Robinson. Will also try:
* inputting HPO ID into prompt
* asking chatGPT to add column with HPO ID
# congenital onset terms without HPO ID
congenital_onset <- "Syndactyly;
Ventricular septal defect; Atrioventricular canal defect;
Atrial septal defect; Abnormal connection of the cardiac segments;
Fetal anomaly; Neural tube defect;
Coloboma; Microtia; Cryptotia;
Cupped ear; Cleft helix; Low-set ears;
Synotia; Holoprosencephaly; Exstrophy;
Abdominal wall defect; Abnormal lung lobation;
Unilateral primary pulmonary dysgenesis"
# define the effects you need answers to e.g. does the phenotype cause death
effects <- "mental retardation, death, impaired mobility,
physical malformations, blindness, sensory impairments,
immunodeficiency, cancer, reduced fertility."
# define the columns of the output table
table_columns <- "phenotype, mental retardation, death, impaired mobility,
physical malformations, blindness, sensory impairments, immunodeficiency, cancer,
reduced fertility, congenital onset, justification."
# define chatGPT prompt
question = paste("Do:",
congenital_onset,
", typically cause:",
effects,
"Do they have congenital onset?",
"You must give one-word yes or no answers and give a justification for why they do or don't have congenital onset.",
"You must provide the output in .tsv format with columns:",
table_columns)
question <- gsub("\n", "", question)
# run chatgpt 5 times for the same prompt
n <- 5
run_chatgpt <- function(q){
  # send the prompt passed in as `q`
  all_res <- gptstudio::openai_create_chat_completion(prompt = q)
  # parse the tab-separated reply into a data.table
  data.table::fread(text = all_res[["choices"]]$message.content)
}
res_allPheno <- lapply(seq_len(n), function(x) run_chatgpt(question))
res_allPheno_dt <- data.table::rbindlist(res_allPheno, fill = TRUE,
                                         use.names = TRUE,
                                         idcol = "iteration")
# order alphabetically so that you can compare results across phenotypes
res_allPheno_dt <- res_allPheno_dt[order(res_allPheno_dt$phenotype), ]
Below is a subset of res_allPheno_dt. The answers chatGPT gives across iterations of the same prompt are not consistent, e.g. look at mental retardation for coloboma. A coloboma is an area of missing tissue in the eye and, from a quick Google search, is not associated with mental retardation.
iteration phenotype mental retardation death impaired mobility physical malformations blindness sensory impairments immunodeficiency cancer reduced fertility congenital onset justification
1 Atrioventricular canal defect Yes Yes Yes Yes No No No No No Yes Congenital heart defect present at birth
2 Atrioventricular canal defect Yes, in some cases May lead to premature death no May lead to growth failure, fatigue or rapid breathing May lead to vision problems None None None No AV canal defect is present at birth and is a congenital condition.
3 Atrioventricular canal defect Yes Yes No Yes No No No No No Yes Atrioventricular canal defect is a congenital heart defect in which there is an opening in the center of the heart where the walls separating the heart chambers should be.
4 Atrioventricular canal defect Yes Possible None Physical malformations No No No No No Yes Congenital onset is typical of this phenotype as it is a result of abnormal development of the heart during fetal development.
5 Atrioventricular canal defect Yes Yes No Yes No No No No No Yes It is a congenital heart defect that is present at birth.
1 Cleft helix No No No Yes No No No No No Yes Congenital ear malformation present at birth
2 Cleft helix No None None May lead to physical malformations of the ear None None None None Yes Cleft helix is present at birth and is a congenital condition.
3 Cleft helix No No No Yes No No No No No Yes Cleft helix is a congenital anomaly characterized by a cleft or gap in the top part of the ear.
4 Cleft helix No None None Physical malformations No No No No No Yes Congenital onset is typical of this phenotype as it is a result of incomplete development of the ear during fetal development.
5 Cleft helix No No No Yes No No No No No Yes A cleft helix is a rare congenital malformation of the ear.
1 Coloboma Yes No No Yes Yes Yes No No No Yes Present at birth and can affect vision and eye structure
2 Coloboma No May lead to vision problems or blindness May depend on location on the body None May lead to vision problems or blindness May lead to hearing loss or deafness None None No Coloboma is present at birth and is a congenital condition.
3 Coloboma Yes No No Yes Yes Yes No No No Yes Coloboma is a congenital anomaly characterized by a gap or hole in one of the structures of the eye.
4 Coloboma No None None Physical malformations Possible Possible No No No Yes Congenital onset is typical of this phenotype as it is a result of incomplete fusion of the tissues that form the eye during fetal development.
5 Coloboma Yes No No Yes Yes No No No No Yes A coloboma is a birth defect that affects the eye.
1 Cryptotia No No No Yes No No No No No Yes Congenital ear malformation present at birth
2 Cryptotia No None None May lead to physical malformations of the ear None None None None Yes Cryptotia is present at birth and is a congenital condition.
3 Cryptotia No No No Yes No No No No No Yes Cryptotia is a congenital anomaly characterized by a hidden ear that is partially or completely covered by skin.
4 Cryptotia No None None Physical malformations No No No No No Yes Congenital onset is typical of this phenotype as it is a result of abnormal development of the ear during fetal development.
5 Cryptotia No No No Yes No No No No No Yes Cryptotia is a congenital ear deformity.
1 Cupped ear No No No Yes No No No No No Yes Congenital ear malformation present at birth
2 Cupped ear No None None May lead to physical malformations of the ear None None None None Yes Cupped ear is present at birth and is a congenital condition.
3 Cupped ear No No No Yes No No No No No Yes Cupped ear is a congenital anomaly characterized by an ear that is shaped like a cup and protrudes outward from the side of the head.
4 Cupped ear No None None Physical malformations No No No No No Yes Congenital onset is typical of this phenotype as it is a result of abnormal development of the ear during fetal development.
5 Cupped ear No No No Yes No No No No No Yes A cupped ear is a congenital malformation.
1 Exstrophy Yes No Yes Yes No No No No No Yes Present at birth and affects bladder and pelvic development
2 Exstrophy No None None May lead to physical malformations of the abdominal wall or pelvic organs None None None May lead to reduced fertility Yes Exstrophy is present at birth and is a congenital condition.
3 Exstrophy Yes No Yes Yes No No No No No Yes Exstrophy is a congenital anomaly characterized by a defect in the abdominal wall or bladder.
4 Exstrophy No None None Physical malformations No No No No No Yes Congenital onset is typical of this phenotype as it is a result of abnormal development of the abdominal wall during fetal development.
5 Exstrophy Yes No Yes Yes No No No No No Yes Exstrophy is a congenital abnormality where the bladd
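The table above mixes one-word answers with free text ("Yes, in some cases", "May lead to premature death"), which makes iterations hard to compare directly. A minimal sketch, in base R, of how answers could be normalised before comparison; `normalise_answer` is a hypothetical helper (not part of gptstudio or HPOExplorer) and the example answers are copied from the table.

```r
# Hypothetical helper: map answers to "Yes"/"No" (ignoring case and
# surrounding whitespace); anything that is not a one-word yes/no becomes NA.
normalise_answer <- function(x) {
  x <- tolower(trimws(x))
  out <- rep(NA_character_, length(x))
  out[x == "yes"] <- "Yes"
  out[x == "no"] <- "No"
  out
}

answers <- c("Yes", "no", "Yes, in some cases",
             "May lead to premature death", " No ")
clean <- normalise_answer(answers)

# Fraction of answers that did not follow the one-word instruction
non_compliant <- mean(is.na(clean))
```

Keeping non-compliant answers as NA, rather than guessing a yes or no from the free text, leaves them visible for manual review.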
Attempt #2
What if I run the prompt one phenotype at a time, with 3 iterations?
# split on ";" and trim whitespace/newlines left over from the multi-line string
congenital_onset_split <- as.list(trimws(strsplit(congenital_onset, ";")[[1]]))
results_list <- list()
for (j in 1:3) {
res_individualPheno <- lapply(seq_len(length(congenital_onset_split)), function(i){
pheno <- congenital_onset_split[[i]]
question = paste("Does",
pheno,
"typically cause:",
effects,
"Does",
pheno,
"have congenital onset?",
"You must give one-word yes or no answers and give a justification for why it does or doesn't have congenital onset.",
"You must provide the output in .tsv format with columns:",
table_columns)
question <- gsub("\n", "", question)
print(question)
all_res <- gptstudio::openai_create_chat_completion(prompt = question)
choices <- fread(all_res[["choices"]]$message.content)
return(choices)
})
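results_list[[j]] <- res_individualPheno # store the result in the list
}
# flatten the per-iteration lists into one list of tables
res_flat <- unlist(results_list, recursive = FALSE)
# note: the id column numbers each response, in order of iteration then phenotype
res_individualPheno_dt <- data.table::rbindlist(res_flat, fill = TRUE,
                                                use.names = TRUE,
                                                idcol = "iteration")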
# order alphabetically so that you can compare results across phenotypes
res_individualPheno_dt <- res_individualPheno_dt[order(res_individualPheno_dt$phenotype), ]
Below is a subset of res_individualPheno_dt. I've shown the same phenotypes as for res_allPheno_dt for comparison. There seems to be more consistency across the iterations when you run chatGPT on each phenotype individually.
phenotype mental retardation death impaired mobility physical malformations blindness sensory impairments immunodeficiency cancer reduced fertility congenital onset justification
Atrioventricular canal defect no no no yes no no no no no yes NA
Atrioventricular canal defect No No No Yes No No No No No Yes NA
Atrioventricular canal defect No No No Yes No No No No No Yes NA
Cleft helix No No No Yes No No No No No Yes NA
Cleft helix No No No Yes No No No No No Yes NA
Cleft helix No No No Yes No No No No No Yes NA
Coloboma No No No Yes Yes Yes No No No Yes NA
Coloboma No No No Yes Yes Yes No No No Yes NA
Coloboma no no no yes yes yes no no no yes NA
Cryptotia No No No Yes No No No No No Yes NA
Cryptotia No No No Yes No No No No No Yes NA
Cryptotia No No No Yes No No No No No Yes NA
Cupped ear No No No Yes No No No No No Yes NA
Cupped ear no no no yes no no no no no yes NA
Cupped ear No No No Yes No No No No No Yes NA
Exstrophy No No Yes Yes No No No No Yes Yes NA
Exstrophy No No Yes Yes No No No Yes Yes Yes NA
Exstrophy No No Yes Yes No No No No Yes Yes NA
@bschilder @NathanSkene
|
Nice progress @KittyMurphy. That's interesting about the responses being more consistent when provided individually. Wondering if this has to do with informational overload like we were discussing before. Might be an aspect of chatGPT that other people have noticed and documented.
One thing that would be helpful is to come up with a function that computes consistency scores for each metric. That will give us at least some quantitative metric of performance (tho not exactly the ground truth). Something like:
dat <- xlsx::read.xlsx("~/Downloads/annot.xlsx", 1)
avg <- dplyr::group_by(dat, phenotype) |> dplyr::summarise(mental.retardation_consistency = 1/length(unique(mental.retardation)))
avg
After computing the within-phenotype consistency, you can compute mean consistency:
mean(avg$mental.retardation_consistency)
# 0.75
@NathanSkene I believe this is only providing chatGPT with the name of the phenotype, not the full description of it. Thus, any other information about the disease is being pulled from the LLM itself. |
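The consistency idea above (1 divided by the number of distinct answers per phenotype) could be applied to every annotation column at once. A rough sketch in base R, assuming a table with one row per phenotype per iteration; `consistency_scores` is a hypothetical helper and the data are toy values, not real chatGPT output.

```r
# Toy data: three iterations for each of two phenotypes.
dat <- data.frame(
  phenotype = c("Coloboma", "Coloboma", "Coloboma",
                "Cryptotia", "Cryptotia", "Cryptotia"),
  mental.retardation = c("Yes", "No", "Yes", "No", "No", "No"),
  blindness = c("Yes", "Yes", "yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each annotation column, compute
# 1 / (number of distinct answers) within each phenotype, then average
# across phenotypes. A score of 1 means every iteration agreed.
consistency_scores <- function(dat, id_col = "phenotype") {
  cols <- setdiff(names(dat), id_col)
  sapply(cols, function(col) {
    # lower-case so that "Yes" and "yes" count as the same answer
    per_pheno <- tapply(tolower(dat[[col]]), dat[[id_col]],
                        function(v) 1 / length(unique(v)))
    mean(per_pheno)
  })
}

scores <- consistency_scores(dat)
```

Lower-casing before counting is a design choice: without it, purely cosmetic differences such as "no" vs "No" would be scored as disagreement.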
Good idea to get some stats on it. Could also use the scoring to compare ChatGPT 3 vs 4 consistency: I expect some folks will be interested.
Including the HPO description might help it get a more consistent understanding of what the phenotype is. Brian, do you know how the descriptions can be accessed programmatically?
From: Brian M. Schilder ***@***.***>
Sent: Sunday, March 26, 2023 3:53:41 PM
To: neurogenomics/RareDiseasePrioritisation ***@***.***>
Cc: Skene, Nathan G ***@***.***>; Mention ***@***.***>
Subject: Re: [neurogenomics/RareDiseasePrioritisation] Annotate diseases/phenotypes using chatGPT (Issue #19)
|
Already working on adding the description, @bschilder I assume the best way to get this is to use the definition column in HPOExplorer::make_phenos_dataframe? |
Yeah, that'll work. Or the subfunction which is more direct: |
The current prompts do not include a statement for "Do not consider indirect effects". Would be worth adding this in and seeing if it makes any difference. |
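Both suggestions in this thread (passing the phenotype definition and adding "Do not consider indirect effects") amount to optional pieces of the prompt. A minimal sketch of how the prompt could be assembled so that each piece can be switched on or off and the effect compared; `build_prompt` is a hypothetical helper, and the definition string below is a placeholder rather than the real HPO definition.

```r
# Hypothetical helper: build a single-phenotype prompt, optionally including
# a definition and the "indirect effects" instruction.
build_prompt <- function(pheno, effects, table_columns,
                         definition = NULL, direct_only = TRUE) {
  parts <- c(
    paste0("Does ", pheno, " typically cause: ", effects),
    # NULL entries are dropped by c(), so optional parts simply disappear
    if (!is.null(definition)) paste0("The definition of ", pheno, " is: ", definition),
    paste("Does", pheno, "have congenital onset?"),
    "You must give one-word yes or no answers.",
    if (direct_only) "Do not consider indirect effects.",
    paste("You must provide the output in .tsv format with columns:", table_columns)
  )
  # collapse to one line and squeeze repeated whitespace/newlines
  gsub("\\s+", " ", paste(parts, collapse = " "))
}

effects <- "death, impaired mobility, blindness."
table_columns <- "phenotype, death, impaired mobility, blindness, congenital onset, justification."

p_with <- build_prompt("Coloboma", effects, table_columns,
                       definition = "An area of missing tissue in the eye.")  # placeholder text
p_without <- build_prompt("Coloboma", effects, table_columns, direct_only = FALSE)
```

Running the same phenotypes through `p_with`-style and `p_without`-style prompts would give two annotation tables whose consistency can then be compared.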
I tried out AutoGPT to see if this might be a useful avenue. Here’s what I learned: Pros
Cons
|
I have now performed a trial run to annotate phenotypes using chat gpt via selenium. Initially we asked gpt to provide the output in .tsv format, but I had difficulty extracting this from the chat interface into python. To overcome this, I asked gpt to provide the output as python code that I could then run to generate a data frame. @bschilder noted that earlier versions of gpt could sometimes be lazy when asked for code.
Here is a prompt example:
Here is the trial run using ~100 phenotypes (note, there are ~200 because I think I appended the results twice by mistake): annot_HPO_gpt_test.csv
@NathanSkene noted that the phenotype 'Azoospermia' is not being annotated as reducing fertility. This is worrying as upon a literature search of this phenotype:
Next, I want to:
|
Thanks @KittyMurphy ! A couple of other ideas for reducing token usage (tho whether this helps will depend on how OpenAI counts 'tokens', which i'm still not totally clear on):
|
Annotation output checks
All of the annotation validation procedures described below can be rerun with any new annotations using the new internal function:
Check phenotype names
Check that chatGPT hasn't modified the phenotype names such that we can't link them back to the input HPO terms.
d <- data.table::fread(path, key = "Phenotype")
annot <- HPOExplorer::load_phenotype_to_genes()
d$Phenotype[!d$Phenotype %in% annot$Phenotype]
# character(0)
✅ All phenotypes in HPO gene annotations file verbatim.
Check annotation consistency
For phenotypes that chatGPT annotated more than once, how consistent are the Y/N annotations it gave for each?
nm <- names(d)[!names(d) %in% c("Phenotype","Justification")]
# proportion of "Yes" answers per phenotype and annotation column
d_mean <- d[,lapply(.SD,function(x){mean(x=="Yes")}),.SDcols=nm, by="Phenotype"]
# a phenotype is consistent when that proportion is exactly 0 or 1
d_consist <- lapply(d_mean[,-1], function(x) sum(x %in% c(0,1))/nrow(d_mean))
d_consist
mean(unlist(d_consist))
# 0.9770833
✅ At least in this small subsampling, 9/10 annotation columns are 100% consistent across chatGPT runs. This results in an average consistency score of 97.7% across all annotations. "Reduced_Fertility" is one to look out for, as it does not appear to always provide the same annotation here (77%, which may seem not too bad but remember that baseline is 50% as the options are binary).
Check phenotype classifications
As some of these phenotypes belong to specific branches of the HPO that should be guaranteed to have a particular annotation (e.g. all forms of blindness phenotypes cause Blindness ('Yes')), we can use this information to validate the chatGPT-provided annotations. While we can confirm annotations that we would expect (true positives vs. false negatives), this doesn't really let us definitively say whether some phenotypes do NOT cause a given condition such as blindness (true negatives).
d$HPO_ID <- harmonise_phenotypes(phenotypes = d$Phenotype,
                                 as_hpo_ids = TRUE)
## Find matching HPO branches
hpo <- get_hpo()
queries <- list(
Intellectual_Disability=c("intellectual disability"),
Impaired_Mobility=c("Abnormal central motor function",
"Abnormality of movement"),
Physical_Malformations=c("malformation","morphology"),
Blindness=c("^blindness"),
Sensory_Impairments=c("Abnormality of vision",
"Abnormality of the sense of smell",
"Abnormality of taste sensation",
"Somatic sensory dysfunction",
"Hearing abnormality"
),
Immunodeficiency=c("Immunodeficiency"),
Cancer=c("Neoplasm","Cancer"),
Reduced_Fertility=c("Decreased fertility")
)
tiers <- lapply(queries, function(q){
terms <- grep(paste(q,collapse = "|"),
hpo$name,
ignore.case = TRUE, value = TRUE)
ontologyIndex::get_descendants(ontology = hpo,
roots = names(terms),
exclude_roots = FALSE) |>
unique()
})
annot_check <- lapply(seq_len(nrow(d)), function(i){
r <- d[i,]
cbind(
r[,c("Phenotype","HPO_ID")],
lapply(stats::setNames(names(tiers),names(tiers)),
function(x){
if(r$HPO_ID %in% tiers[[x]]){
r[,x,with=FALSE][[1]]=="Yes"
} else {
NA
}
}) |> data.table::as.data.table()
)
}) |> data.table::rbindlist()
### Proportion of rows where the annotation is NA (not checkable)
missing_rate <- sapply(
annot_check[,names(tiers),with=FALSE],
function(x){sum(is.na(x))/length(x)})
missing_rate
True positive rate
### Proportion of checkable rows where the annotation was TRUE
true_pos_rate <- sapply(annot_check[,names(tiers),with=FALSE], function(x){sum(na.omit(x)==TRUE)/length(na.omit(x))})
true_pos_rate
False negative rate
### Proportion of checkable rows where the annotation was FALSE
false_neg_rate <- sapply(annot_check[,names(tiers),with=FALSE], function(x){sum(na.omit(x)==FALSE)/length(na.omit(x))})
false_neg_rate
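To make the three rates above concrete, here is a small worked example on a single toy column (made-up values, not taken from the real annotations). It only illustrates the arithmetic: NA rows count towards the missing rate and are excluded from the other two.

```r
# One annot_check-style column:
#   TRUE  = phenotype is in the relevant HPO branch and was annotated "Yes"
#   FALSE = phenotype is in the branch but was not annotated "Yes"
#   NA    = phenotype is not in the branch, so it cannot be checked
x <- c(TRUE, NA, FALSE, TRUE, NA)

checkable <- x[!is.na(x)]

missing_rate   <- mean(is.na(x))    # 2 of 5 rows are uncheckable
true_pos_rate  <- mean(checkable)   # 2 of 3 checkable rows are TRUE
false_neg_rate <- mean(!checkable)  # 1 of 3 checkable rows is FALSE
```

Because both rates share the same denominator (the checkable rows), they always sum to 1 for a given column.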
|
I have since updated the prompt twice.
Example prompt 1.1:
I need to annotate phenotypes as to whether they typically cause: intellectual disability, death, impaired mobility, physical malformations, blindness, sensory impairments, immunodeficiency, cancer, reduced fertility? Do they always have congenital onset? You must give one-word yes or no answers. Do not consider indirect effects. You must provide the output in python code as a data frame called df with columns: phenotype, intellectual_disability, death, impaired_mobility, physical_malformations, blindness, sensory_impairments, immunodeficiency, cancer, reduced_fertility, congenital_onset. Also add justification columns for each outcome. These are the phenotypes: Recurrent urinary tract infections; Neurogenic bladder; Urinary urgency
Here are the results for ~500 phenotypes: gpt_hpo_annotations.csv.
The issue here was that we were getting answers other than yes or no for some of the phenotypic outcomes, e.g. 'can be', 'may be'. To get around this, we decided to add a scale for the phenotypic outcomes, so instead of yes or no answers we ask chat gpt to answer using a scale of: never, rarely, often, always. Due to limited token usage we had to drop the number of phenotypes in each prompt to two.
Example prompt 1.2:
I need to annotate phenotypes as to whether they typically cause: intellectual disability, death, impaired mobility, physical malformations, blindness, sensory impairments, immunodeficiency, cancer, reduced fertility? Do they have congenital onset? To answer, use a severity scale of: never, rarely, often, always. Do not consider indirect effects. You must provide the output in python code as a data frame called df with columns: phenotype, intellectual_disability, death, impaired_mobility, physical_malformations, blindness, sensory_impairments, immunodeficiency, cancer, reduced_fertility, congenital_onset. Also add justification columns for each outcome.
These are the phenotypes: Urinary urgency; Hypoplasia of the uterus
Here are the results so far: gpt_hpo_annotations_scale.csv
Currently waiting for help from Eugene to get this set up on a remote machine so that it can run 24/7, and it will probably take ~2 weeks. |
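Two steps described above lend themselves to small helpers: splitting the phenotype list into batches of two per prompt, and turning the never/rarely/often/always answers into ordered numbers for later analysis. A sketch in base R, assuming the answers come back as plain strings; `batch_phenotypes` and `scale_to_score` are hypothetical names, not functions from any package used in this thread.

```r
# Split phenotypes into batches of a fixed size (two per prompt, as above).
batch_phenotypes <- function(phenos, batch_size = 2) {
  split(phenos, ceiling(seq_along(phenos) / batch_size))
}

# Map the four-level scale onto integers 0-3; anything off-scale
# (e.g. "can be") becomes NA so it can be flagged and rerun.
scale_levels <- c("never", "rarely", "often", "always")
scale_to_score <- function(x) {
  match(tolower(trimws(x)), scale_levels) - 1L
}

batches <- batch_phenotypes(c("Urinary urgency", "Hypoplasia of the uterus",
                              "Neurogenic bladder"))
# One "; "-separated phenotype string per prompt
prompt_phenos <- vapply(batches, paste, character(1), collapse = "; ")

scores <- scale_to_score(c("Never", "often", "can be", "Always"))
```

Treating the scale as ordered integers assumes the four levels are roughly evenly spaced, which is a simplification; an ordered factor would avoid that assumption.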
@KittyMurphy I'm looking into some resources that might be helpful: ChatGPT File uploader (google chrome extension) Bing Chat: Microsoft's iteration of ChatGPT: |
Update
Stage 1
Stage 2
annot=HPOExplorer::load_phenotype_to_genes()
length(unique(annot$hpo_name))
# [1] 10969
@KittyMurphy is running the last of these now.
Stage 3
hpo=HPOExplorer::get_hpo()
length(unique(hpo$name))
# [1] 18057 |
I've actually been using the below code to get the phenotypes:
I'll make sure I run the remaining 15 phenotypes that are called with |
@KittyMurphy Could you check whether this discrepancy stems from:
|
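One way to track down a discrepancy like this is to compare the two name lists directly. A minimal sketch with toy vectors; in practice `annotated` would be the phenotype names already sent to chatGPT and `reference` the list they are expected to match (for example `unique(annot$hpo_name)`). `compare_names` is a hypothetical helper.

```r
# Toy name vectors standing in for the real lists.
annotated <- c("Coloboma", "Cryptotia", "cupped ear")
reference <- c("Coloboma", "Cryptotia", "Cupped ear", "Synotia")

# Hypothetical helper: report names present in one list but not the other,
# and which of the missing names differ only by letter case.
compare_names <- function(annotated, reference) {
  missing <- setdiff(reference, annotated)
  list(
    missing = missing,
    case_mismatch = missing[tolower(missing) %in% tolower(annotated)],
    unexpected = setdiff(annotated, reference)
  )
}

res <- compare_names(annotated, reference)
```

Separating case-only mismatches from genuinely missing terms helps distinguish renamed phenotypes from ones that were never run.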
(checked boxes indicate at least an initial attempt has been made)
Annotations
Models
Related
Some of my initial attempts are documented within this R package:
https://github.com/neurogenomics/gptPhD
@KittyMurphy once you have a chance please report your progress here. I'll do the same.