<div class='alert' style='background-color: #1c1a1e; color: #f5f4f0; padding:16px 26px; border-radius:20px; font-size:40px;'><B>GP Patient Survey</b> - Anonymize</div>
<div style='margin:0px 26px; color:#1c1a1e; font-size:16px;'>
<ol>
<li><B>Notebook Documentation</B>: Documenting a Jupyter Notebook is crucial for ensuring that the analysis it contains is understandable and reproducible, not only by others but also by your future self.  
Best practices in documentation start with a clear, informative introduction: what the notebook aims to achieve, the dataset being used, and any prerequisite knowledge or context needed to follow the analysis.  
Each cell, especially those containing key computations or decisions, should be accompanied by Markdown cells explaining the rationale behind the code, any assumptions made, and a summary of the results. Inline comments within the code cells can clarify complex lines of code or non-obvious steps.  
Visualizations should include titles, axis labels, and legends where appropriate to make them self-explanatory. Finally, the notebook should conclude with a summary of findings, any conclusions drawn, and possible next steps. This narrative structure turns your notebook from a mere collection of code cells into a coherent story about your data analysis journey.
</ol>
</div>

# Libraries & Data

In [1]:
# Import default libraries
import matplotlib.pyplot as plt
import pandas as pd 
import numpy as np
import seaborn as sns
import warnings
import datetime 
import os 

# Set the maximum number of rows and columns to be displayed
pd.options.display.max_rows = 1000
pd.options.display.max_columns = 1000

# High-resolution plots and inline Matplotlib
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

# Suppress warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# "magic commands" to enable autoreload of your imported packages
%load_ext autoreload
%autoreload 2

## Loading Data

In [2]:
from gp_patient_survey.params import *
from gp_patient_survey.utils import *
from gp_patient_survey.data import *

In [3]:
data = load_google_sheet()

[🏁] FUCTION: load_google_sheet()
[✔️] Completed: load_google_sheet() - Time taken: 2.6586 seconds


In [4]:
data.head(2)

| | time | surgery | phone | appointment_time | making_appointment | overall_experience | website | free_text |
|---|---|---|---|---|---|---|---|---|
| 0 | 2024-02-22 12:00:42 | Earls Court Surgery | Very easy | Very satisfied | Very good | Very good | Very good | |
| 1 | 2024-02-22 12:01:16 | Emperor's Gate Health Centre | Very easy | Very satisfied | Very good | Very good | Very good | |


In [5]:
data.isna().sum()

time                  0
surgery               0
phone                 0
appointment_time      0
making_appointment    0
overall_experience    0
website               0
free_text             0
dtype: int64
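`isna()` reports zero missing values for `free_text`, yet the rows shown by `data.head(2)` (and the loop outputs further down) contain no text. That pattern suggests the column holds empty strings rather than NaN, and `isna()` does not count empty strings. A minimal sketch of a hypothetical helper, `is_blank`, that treats `None`, NaN and whitespace-only strings as missing:

```python
import math


def is_blank(value) -> bool:
    """Return True for None, NaN, or strings that are empty after stripping whitespace."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        return value.strip() == ""
    # Any other type (e.g. a number) is treated as non-blank
    return False


# Synthetic examples, not survey data
responses = ["", "   ", None, float("nan"), "Reception staff were helpful"]
print([is_blank(r) for r in responses])  # → [True, True, True, True, False]
```

Applied to the dataframe, `data['free_text'].map(is_blank).sum()` would count the empty responses, and `data.loc[~data['free_text'].map(is_blank)]` would keep only the rows worth sending through the NER models.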

# Anonymize Names with spaCy

In [11]:
import spacy

# Load the small English spaCy model
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Function to anonymize names in text
def anonymize_names(text):
    # Process the text
    doc = nlp(text)
    # Iterate over the entities
    anonymized_text = text
    for ent in doc.ents:
        # Check if the entity is a person
        if ent.label_ == "PERSON":
            # Replace the person's name with [PERSON]
            anonymized_text = anonymized_text.replace(ent.text, "[PERSON]")
    return anonymized_text

# Example text
text = "John Doe and Jane Smith went to Paris."
anonymized_text = anonymize_names(text)

print("Original Text:", text)
print("Anonymized Text:", anonymized_text)


Original Text: John Doe and Jane Smith went to Paris.
Anonymized Text: [PERSON] and [PERSON] went to Paris.
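`anonymize_names` uses `str.replace`, which rewrites every occurrence of the entity string anywhere in the text, including inside longer words: a detected name such as "Ann" would also alter "Annual". spaCy entities carry character offsets (`ent.start_char`, `ent.end_char`), so splicing by offset only touches the span the model actually flagged. A minimal sketch; the helper name `redact_spans` is hypothetical and the offsets in the example are written by hand rather than produced by spaCy.

```python
def redact_spans(text: str, spans: list[tuple[int, int]], placeholder: str = "[PERSON]") -> str:
    """Replace each (start, end) character span of `text` with `placeholder`.

    Overlapping or touching spans are merged so each region is redacted once.
    """
    merged: list[list[int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it instead of adding a new one
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    pieces = []
    cursor = 0
    for start, end in merged:
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces)


text = "Ann reviewed the Annual report."
print(text.replace("Ann", "[PERSON]"))  # substring replace also alters "Annual"
print(redact_spans(text, [(0, 3)]))     # offset-based: only the flagged span changes
```

With the `nlp` object loaded above, the spans would come from `[(ent.start_char, ent.end_char) for ent in nlp(text).ents if ent.label_ == "PERSON"]`.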


## Test it on the free-text rows of the dataframe

In [13]:
for index, row in data.iterrows():
    # Keep the original casing: spaCy's NER model uses capitalization as a strong cue,
    # so lowercasing the text first would likely cause it to miss many names
    text = row['free_text']
    print("++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print(text)
    print("++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++")
    print(anonymize_names(text))

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++

## Hugging Face Anonymization

In [14]:
from transformers import pipeline

# Load a pre-trained NER pipeline
ner_pipeline = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english", aggregation_strategy="simple")

# Function to anonymize names in text
def anonymize_names_with_transformers(text):
    # Run the NER pipeline on the input text
    entities = ner_pipeline(text)
    anonymized_text = text
    # Iterate over detected entities
    for entity in entities:
        # Check if the entity is a person
        if entity['entity_group'] == 'PER':
            # Replace the detected name with [PERSON]
            anonymized_text = anonymized_text.replace(entity['word'], '[PERSON]')
    return anonymized_text

# Example text
text = "John Doe and Jane Smith went to Paris last summer."
anonymized_text = anonymize_names_with_transformers(text)

print("Original Text:", text)
print("Anonymized Text:", anonymized_text)

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Original Text: John Doe and Jane Smith went to Paris last summer.
Anonymized Text: [PERSON] and [PERSON] went to Paris last summer.
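With `aggregation_strategy="simple"`, each entity returned by the pipeline is a dict with `entity_group`, `score`, `word`, `start` and `end` keys. `word` is rebuilt from the tokenizer's tokens and is not guaranteed to match the original substring character for character, so `anonymized_text.replace(entity['word'], ...)` can silently miss a name; replacing by `start`/`end` avoids that, and `score` allows a confidence cut-off. A sketch assuming that dict layout and non-overlapping spans; `redact_entities` and its `min_score` parameter are hypothetical, and the entities in the example are hand-written, not real model output.

```python
def redact_entities(text: str, entities: list[dict], group: str = "PER",
                    min_score: float = 0.0, placeholder: str = "[PERSON]") -> str:
    """Replace entities of `group` scoring at least `min_score`, using their character offsets.

    Assumes the entity spans do not overlap.
    """
    selected = [
        (e["start"], e["end"])
        for e in entities
        if e["entity_group"] == group and e["score"] >= min_score
    ]
    # Splice from the end of the text backwards so earlier offsets stay valid
    for start, end in sorted(selected, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


text = "Dr O'Brien at Earls Court was kind."
entities = [  # hand-written, shaped like aggregated pipeline output
    {"entity_group": "PER", "score": 0.99, "word": "O ' Brien", "start": 3, "end": 10},
    {"entity_group": "PER", "score": 0.45, "word": "Earls", "start": 14, "end": 19},
]
print(redact_entities(text, entities, min_score=0.5))
```

In the notebook, `redact_entities(text, ner_pipeline(text), min_score=...)` could replace the body of `anonymize_names_with_transformers`. Any threshold is a judgement call: for anonymization, a low threshold (over-redacting rather than leaking a name) is generally the safer direction.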


In [16]:
for index, row in data.iterrows():
    text = row['free_text']
    print("++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print(text)
    print("++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++")
    print(anonymize_names_with_transformers(text))

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++++++++

++ Original Text +++++++++++++++++++++++++++++++++++++++++++++++++++++

++ Anonymized Text +++++++++++++++++++++++++++++++++++++++++++++
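Both models give the same result on the toy sentence, but they can disagree on real responses. A small sketch that runs two anonymizers over the same texts and collects the positions where their outputs differ, as candidates for manual review. `find_disagreements` is a hypothetical helper, and the two functions in the example are simple stand-ins, not the spaCy or Hugging Face models.

```python
from typing import Callable, Iterable


def find_disagreements(texts: Iterable, anonymize_a: Callable[[str], str],
                       anonymize_b: Callable[[str], str]) -> list[tuple[int, str, str]]:
    """Return (position, result_a, result_b) for each non-blank text where the two anonymizers differ."""
    disagreements = []
    for position, text in enumerate(texts):
        # Skip missing or empty responses: there is nothing to anonymize
        if not isinstance(text, str) or not text.strip():
            continue
        result_a, result_b = anonymize_a(text), anonymize_b(text)
        if result_a != result_b:
            disagreements.append((position, result_a, result_b))
    return disagreements


# Stand-in anonymizers for illustration only
def only_john(t: str) -> str:
    return t.replace("John", "[PERSON]")


def john_and_ann(t: str) -> str:
    return t.replace("John", "[PERSON]").replace("Ann", "[PERSON]")


print(find_disagreements(["", "John was kind", "Ann was late"], only_john, john_and_ann))
```

On the survey data this would be called as `find_disagreements(data['free_text'], anonymize_names, anonymize_names_with_transformers)`; the returned positions are row positions, which line up with the default integer index shown by `data.head(2)`.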