# 🧪 Use Case: Drug Safety Monitoring (Pharmacovigilance)

### Business Goal:

Identify and highlight adverse drug reactions (ADRs) and key drug names in unstructured medical reports using TF-IDF.

### 🧾 Example Corpus (3 Sample Reports)

```python
docs = [
    "Patient reported nausea and headache after taking paracetamol.",
    "Paracetamol overdose caused liver damage and required hospitalization.",
    "No adverse effects observed with ibuprofen; patient showed improvement."
]
```

### ✅ Applying TF-IDF

TF-IDF is used to extract the most important keywords from each report. First, import the libraries:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd
```

### Initialize and fit TF-IDF


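```python
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(docs)
X
```

```
<3x18 sparse matrix of type '<class 'numpy.float64'>'
	with 20 stored elements in Compressed Sparse Row format>
```

The 18 columns are the vocabulary left after English stop words such as "and", "after", "no" and "with" are removed; only 20 of the 3 × 18 = 54 cells are non-zero.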

### Convert to DataFrame

```python
df = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())
```

```python
print(df.round(2))
```

```
   adverse  caused  damage  effects  headache  hospitalization  ibuprofen  \
0     0.00    0.00    0.00     0.00      0.44             0.00       0.00   
1     0.00    0.39    0.39     0.00      0.00             0.39       0.00   
2     0.39    0.00    0.00     0.39      0.00             0.00       0.39   

   improvement  liver  nausea  observed  overdose  paracetamol  patient  \
0         0.00   0.00    0.44      0.00      0.00         0.33     0.33   
1         0.00   0.39    0.00      0.00      0.39         0.30     0.00   
2         0.39   0.00    0.00      0.39      0.00         0.00     0.30   

   reported  required  showed  taking  
0      0.44      0.00    0.00    0.44  
1      0.00      0.39    0.00    0.00  
2      0.00      0.00    0.39    0.00  
```
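The scores in the table can be reproduced by hand. Assuming the scikit-learn defaults used above (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`), each weight is the raw term count times a smoothed idf, and every row is then scaled to unit length. Worked for report 0, with $n = 3$ reports and every term counted once:

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w(t, d) = \frac{\mathrm{tf}(t, d)\,\mathrm{idf}(t)}
               {\sqrt{\sum_{t'} \big(\mathrm{tf}(t', d)\,\mathrm{idf}(t')\big)^2}}

\mathrm{idf}(\text{nausea}) = \ln\tfrac{4}{2} + 1 \approx 1.693,
\qquad
\mathrm{idf}(\text{paracetamol}) = \ln\tfrac{4}{3} + 1 \approx 1.288

\text{norm}_0 = \sqrt{4 \cdot 1.693^2 + 2 \cdot 1.288^2} \approx \sqrt{14.78} \approx 3.845

w(\text{nausea}, 0) \approx \tfrac{1.693}{3.845} \approx 0.44,
\qquad
w(\text{paracetamol}, 0) \approx \tfrac{1.288}{3.845} \approx 0.33
```

Report 0 has four single-report terms (headache, nausea, reported, taking) and two two-report terms (paracetamol, patient), which is where the factors 4 and 2 in the norm come from.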



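### 🧠 Interpretation:

- Paracetamol (like patient) appears in two of the three reports → moderate TF-IDF (0.30 to 0.33).
- Headache and nausea (0.44), and liver and overdose (0.39), each occur in only one report → high TF-IDF.
- This surfaces candidate drug safety indicators without reading every report manually. TF-IDF measures distinctiveness, not medical relevance, though: generic words such as "reported" and "taking" score as high as "headache", and "adverse" scores 0.39 in a report that says "No adverse effects", so the top terms still need domain filtering or review.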



### 💬 Key Takeaway:

TF-IDF helps pharma teams automatically surface distinctive terms, such as symptoms and drug names, in clinical narratives, which speeds up the screening of large volumes of text.

