# Medical Term Definition Analysis for ICD-11 Codes

This notebook processes single-word medical terms from the ICD-11 classification and retrieves their definitions from the Merriam-Webster Medical Dictionary API. The workflow includes:

- Loading and filtering single-word medical terms from the ICD-11 dataset
- Fetching definitions from the Merriam-Webster Medical Dictionary API
- Processing and cleaning the retrieved definitions
- Creating a mapping between ICD-11 codes and their corresponding medical definitions
- Keeping only terms with a single, unambiguous definition for further analysis

In [30]:
import pandas as pd

df = pd.read_csv("Bert_new_embeddings/icd11-25_data_raw.csv")
print("Loaded CSV from Bert_new_embeddings/icd11-25_data_raw.csv")


Loaded CSV from Bert_new_embeddings/icd11-25_data_raw.csv


In [31]:
# Count number of words in each title
df['title_word_count'] = df['title'].str.split().str.len()

# Show counts for 1-word, 2-word, etc.
word_count_stats = df['title_word_count'].value_counts().sort_index()
print("Number of titles by word count:")
print(word_count_stats)

# Filter only titles with one word
one_word_titles = df[df['title_word_count'] == 1][['title', 'code']]
print(f"\nNumber of one-word titles: {len(one_word_titles)}")

# Save to CSV
one_word_titles.to_csv('one_word_icd11_titles_codes.csv', index=False)

Number of titles by word count:
title_word_count
1      749
2     1963
3     2301
4     2115
5     1748
6     1394
7     1039
8      830
9      553
10     385
11     269
12     175
13     115
14      96
15      80
16      52
17      42
18      19
19      12
20       6
21       9
22       2
23       2
24       1
25       1
26       1
29       1
Name: count, dtype: int64

Number of one-word titles: 749
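
As a small illustration of the counting step above, the sketch below applies the same `str.split().str.len()` idiom to a toy frame. The titles are made up for illustration and are not taken from the dataset. Whitespace splitting treats a hyphenated term as one word, and a missing title yields NaN rather than 0.

```python
import pandas as pd

# Toy titles, hypothetical and not taken from the ICD-11 file
toy = pd.DataFrame(
    {"title": ["Measles", "Typhoid fever", "Post-traumatic stress disorder", None]}
)

# Split on whitespace and count the tokens; missing titles stay NaN
toy["title_word_count"] = toy["title"].str.split().str.len()

# Comparing NaN with 1 is False, so missing titles are excluded automatically
one_word = toy[toy["title_word_count"] == 1]
print(toy)
print(one_word["title"].tolist())
```

If hyphenated or slash-joined terms should count as several words, the split pattern would need to change accordingly.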


In [32]:
one_word_titles[one_word_titles["title"] == 'Helminthiases']

,title,code
145,Helminthiases,
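
The lookup above shows a one-word title whose `code` is empty. A quick check along the following lines, shown here on a toy frame with placeholder codes, can reveal how many such titles exist before they propagate into later steps.

```python
import pandas as pd

# Hypothetical stand-in for one_word_titles; the codes are placeholders
toy_titles = pd.DataFrame({
    "title": ["Measles", "Helminthiases", "Mumps"],
    "code": ["X1", None, "X2"],
})

# Rows whose code is missing (None/NaN)
missing_code = toy_titles[toy_titles["code"].isna()]
print(f"Titles without a code: {len(missing_code)}")
print(missing_code["title"].tolist())
```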


In [13]:
import csv
import os
import time
from urllib.parse import quote

import pandas as pd
import requests

# 1) Build the list of one-word titles to look up
diseases = one_word_titles['title'].tolist()
# diseases = ['Typhoid']

# 2) API settings
# Read the key from the environment rather than hard-coding it in the notebook.
# MW_MEDICAL_API_KEY is a placeholder name; use whichever variable you export.
API_KEY = os.environ["MW_MEDICAL_API_KEY"]
BASE_URL = "https://www.dictionaryapi.com/api/v3/references/medical/json/{}?key={}"

# 3) Helper to fetch definitions for one term
def fetch_definitions(term):
    """Return a list of all shortdef strings returned for 'term'."""
    # The term is a URL path segment, so percent-encode it;
    # quote_plus would encode spaces as '+', which is a query-string convention.
    url_term = quote(term, safe="")
    url = BASE_URL.format(url_term, API_KEY)
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    data = resp.json()

    definitions = []
    # The API may return suggestions (strings) if no entry is found,
    # so we only process dict entries that have 'shortdef'.
    for entry in data:
        if isinstance(entry, dict) and 'shortdef' in entry:
            definitions.extend(entry['shortdef'])
    return definitions

# 4) Main loop: query each disease and save the results to CSV
with open("diseases_definitions.csv", "w", newline="", encoding="utf-8") as fout:
    writer = csv.writer(fout)
    writer.writerow(["Disease", "Definitions"])

    for disease in diseases:
        print(f"Looking up: {disease}")
        try:
            defs = fetch_definitions(disease)
            # Join multiple definitions with "; " (an empty list yields "").
            # Note: later filtering treats ';' as the separator, so a single
            # definition that itself contains ';' will look like several.
            defs_text = "; ".join(defs)
            writer.writerow([disease, defs_text])
            print(f"  → Retrieved {len(defs)} definitions.")
        except Exception as e:
            # Record the failure in the CSV and continue with the next term
            print(f"  ⚠️ Error for '{disease}': {e}")
            writer.writerow([disease, "ERROR: " + str(e)])

        # Be polite to the API
        time.sleep(0.5)

print("Done! Results in diseases_definitions.csv")


Looking up: Toxoplasmosis
  → Retrieved 1 definitions.
Looking up: Tetanus
  → Retrieved 7 definitions.
Looking up: Lobomycosis
  → Retrieved 0 definitions.
Looking up: Babesiosis
  → Retrieved 3 definitions.
Looking up: Varicella
  → Retrieved 3 definitions.
Looking up: Rickettsialpox
  → Retrieved 1 definitions.
Looking up: Leptospirosis
  → Retrieved 1 definitions.
Looking up: Phaeohyphomycosis
  → Retrieved 0 definitions.
Looking up: Clonorchiasis
  → Retrieved 1 definitions.
Looking up: Cimicosis
  → Retrieved 0 definitions.
Looking up: Giardiasis
  → Retrieved 1 definitions.
Looking up: Gnathostomiasis
  → Retrieved 1 definitions.
Looking up: Actinomycosis
  → Retrieved 1 definitions.
Looking up: Ascariasis
  → Retrieved 1 definitions.
Looking up: Buffalopox
  → Retrieved 0 definitions.
Looking up: Angiostrongyliasis
  → Retrieved 0 definitions.
Looking up: Onchocerciasis
  → Retrieved 1 definitions.
Looking up: Measles
  → Retrieved 6 definitions.
Looking up: Leprosy
  → Retriev
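
The counts above include every `shortdef` from every entry in the response, which may cover related entries and not only the queried headword. The sketch below runs on a hand-written payload, not a live response, and shows one way to keep only entries whose headword matches the term. It assumes each entry carries a `meta.id` field with an optional `:n` homograph suffix; that assumption should be checked against the API documentation. `shortdefs_for_headword` is a hypothetical helper name.

```python
def shortdefs_for_headword(data, term):
    """Collect shortdefs only from entries whose id matches `term`."""
    wanted = term.casefold()
    definitions = []
    for entry in data:
        # Suggestions come back as plain strings; skip them
        if not isinstance(entry, dict):
            continue
        entry_id = entry.get("meta", {}).get("id", "")
        # Assumption: ids look like "tetanus" or "tetanus:1"
        headword = entry_id.split(":")[0].casefold()
        if headword == wanted:
            definitions.extend(entry.get("shortdef", []))
    return definitions


# Hand-written payload for illustration only; not real API output
sample = [
    {"meta": {"id": "tetanus"}, "shortdef": ["definition A", "definition B"]},
    {"meta": {"id": "tetanus toxin"}, "shortdef": ["definition C"]},
]
print(shortdefs_for_headword(sample, "Tetanus"))
print(shortdefs_for_headword(["tetany", "titanous"], "Tetanus"))
```

With a filter like this, a term would count as having one definition only when its own headword entry has a single short definition.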

In [27]:
# Terms with no definition were written as empty strings, which read back as NaN and are dropped here
diseases_definitions = pd.read_csv("diseases_definitions.csv").dropna()

# Attach ICD codes from one_word_titles by disease name (a left join keeps every looked-up term)
diseases_definitions = diseases_definitions.merge(
    one_word_titles.rename(columns={'title': 'Disease', 'code': 'ICD_Code'}),
    on='Disease',
    how='left'
)

# Keep diseases with only one definition (no ';' in Definitions).
# Caveats: a definition that itself contains ';' is excluded,
# and rows recorded as "ERROR: ..." are not filtered out by this step.
single_definition = diseases_definitions[
    ~diseases_definitions['Definitions'].str.contains(';', regex=False)
].reset_index(drop=True)
print(f"Number of diseases with a single definition: {len(single_definition)}")
single_definition.head(10)

Number of diseases with a single definition: 388


,Disease,Definitions,ICD_Code
0,Toxoplasmosis,infection with or disease caused by a sporozoa...,1F57
1,Rickettsialpox,"a disease characterized by fever, chills, head...",1C32
2,Leptospirosis,any of several diseases of humans and domestic...,1B91
3,Clonorchiasis,infestation with or disease caused by the Chin...,1F80
4,Giardiasis,infestation with or disease caused by a flagel...,1A31
5,Gnathostomiasis,infestation with or disease caused by nematode...,1F67
6,Ascariasis,infestation with or disease caused by ascarids,1F62
7,Cryptosporidiosis,infection with or disease caused by cryptospor...,1A32
8,Helminthiases,infestation with or disease caused by parasiti...,
9,Mumps,an acute contagious virus disease caused by a ...,1D80
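
The single-definition filter relies on `;` never appearing inside a definition, which dictionary text does not guarantee. One hedged alternative is to store the list of definitions as JSON in the CSV cell and count the entries after parsing. The column name `Definitions_json` and the sample strings below are illustrative assumptions.

```python
import json

import pandas as pd

# Illustrative rows: "B" has ONE definition that happens to contain a semicolon
rows = [
    {"Disease": "A", "Definitions_json": json.dumps(["first sense", "second sense"])},
    {"Disease": "B", "Definitions_json": json.dumps(["one sense; with a semicolon"])},
]
toy_defs = pd.DataFrame(rows)

# Count definitions by parsing the JSON list instead of searching for ';'
toy_defs["n_defs"] = toy_defs["Definitions_json"].map(lambda s: len(json.loads(s)))
single = toy_defs[toy_defs["n_defs"] == 1]
print(single["Disease"].tolist())
```

A `;`-based filter would have discarded "B" here, while the parsed count keeps it.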


In [25]:
single_definition[['ICD_Code', 'Definitions']].to_csv('single_definition_no_disease.csv', index=False)
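
Because the merge is a left join, rows such as `Helminthiases` reach this export with an empty `ICD_Code`. If the goal is a strict code-to-definition mapping, a filter along these lines could be applied before saving. The frame below is a toy stand-in with placeholder codes, and the output filename in the final comment is hypothetical.

```python
import pandas as pd

# Toy stand-in for single_definition; codes and definitions are placeholders
toy_single = pd.DataFrame({
    "Disease": ["Measles", "Helminthiases", "Mumps"],
    "Definitions": ["def 1", "def 2", "def 3"],
    "ICD_Code": ["X1", None, "X2"],
})

# Keep only rows that actually have a code, then drop exact duplicates
mapping = (
    toy_single.dropna(subset=["ICD_Code"])
    .drop_duplicates(subset=["ICD_Code", "Definitions"])
    .reset_index(drop=True)
)
print(mapping[["ICD_Code", "Definitions"]])
# mapping[["ICD_Code", "Definitions"]].to_csv("mapping_with_codes_only.csv", index=False)
```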