### Anonymizing ###
https://python.langchain.com/v0.1/docs/guides/productionization/safety/presidio_data_anonymization/reversible/

In [1]:
# Install necessary packages
%pip install --upgrade --quiet langchain langchain-experimental langchain-openai presidio-analyzer presidio-anonymizer spacy Faker
# ! python -m spacy download en_core_web_lg

Note: you may need to restart the kernel to use updated packages.


In [1]:
from langchain_experimental.data_anonymizer import PresidioReversibleAnonymizer

anonymizer = PresidioReversibleAnonymizer(
    analyzed_fields=["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD"],
    # A fixed Faker seed makes the generated fake data reproducible for this demo.
    # In production, omit the faker_seed parameter (it defaults to None).
    faker_seed=42,
)

anonymizer.anonymize(
    "My name is Slim Shady, call me at 313-666-7440 or email me at real.slim.shady@gmail.com. "
    "By the way, my card number is: 4916 0387 9536 0861"
)


'My name is Maria Lynch, call me at 7344131647 or email me at jamesmichael@example.com. By the way, my card number is: 4838637940262'

In [2]:
# These fake values are known in advance because faker_seed was fixed above
fake_name = "Maria Lynch"
fake_phone = "7344131647"
fake_email = "jamesmichael@example.com"
fake_credit_card = "4838637940262"

anonymized_text = f"""{fake_name} recently lost his wallet. 
Inside is some cash and his credit card with the number {fake_credit_card}. 
If you would find it, please call at {fake_phone} or write an email here: {fake_email}.
{fake_name} would be very grateful!"""

print(anonymized_text)

Maria Lynch recently lost his wallet. 
Inside is some cash and his credit card with the number 4838637940262. 
If you would find it, please call at 7344131647 or write an email here: jamesmichael@example.com.
Maria Lynch would be very grateful!


In [3]:
print(anonymizer.deanonymize(anonymized_text))

Slim Shady recently lost his wallet. 
Inside is some cash and his credit card with the number 4916 0387 9536 0861. 
If you would find it, please call at 313-666-7440 or write an email here: real.slim.shady@gmail.com.
Slim Shady would be very grateful!
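
The round trip above works because the anonymizer remembers which fake value stands for which original. The following is a minimal conceptual sketch of that idea in plain Python, not the library's actual implementation. The function name `naive_deanonymize` is hypothetical, and the mapping shape `{entity_type: {fake_value: original_value}}` mirrors what `anonymizer.deanonymizer_mapping` exposes.

```python
# Conceptual sketch only: this is NOT how PresidioReversibleAnonymizer is implemented.
# The mapping shape is {entity_type: {fake_value: original_value}}.
deanonymizer_mapping = {
    "PERSON": {"Maria Lynch": "Slim Shady"},
    "PHONE_NUMBER": {"7344131647": "313-666-7440"},
    "EMAIL_ADDRESS": {"jamesmichael@example.com": "real.slim.shady@gmail.com"},
    "CREDIT_CARD": {"4838637940262": "4916 0387 9536 0861"},
}


def naive_deanonymize(text: str, mapping: dict) -> str:
    """Replace every known fake value with its original (exact matches only)."""
    for fake_to_original in mapping.values():
        for fake, original in fake_to_original.items():
            text = text.replace(fake, original)
    return text


restored = naive_deanonymize(
    "Maria Lynch would be very grateful! Call 7344131647.", deanonymizer_mapping
)
print(restored)  # -> Slim Shady would be very grateful! Call 313-666-7440.
```

Exact string replacement like this only restores values that come back unchanged. If downstream text reformats a fake value (for example, by adding separators to a phone number), this sketch would leave it untouched.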


## Using with LangChain Expression Language ##

With LCEL, we can chain anonymization and deanonymization together with the rest of our application. The following example anonymizes the input before it is sent to an LLM (without deanonymizing the response for now):

In [4]:
text = """Slim Shady recently lost his wallet. 
Inside is some cash and his credit card with the number 4916 0387 9536 0861. 
If you would find it, please call at 313-666-7440 or write an email here: real.slim.shady@gmail.com."""

In [5]:
from langchain_core.prompts.prompt import PromptTemplate
from langchain_openai import ChatOpenAI

anonymizer = PresidioReversibleAnonymizer()

template = """Rewrite this text into an official, short email:

{anonymized_text}"""
prompt = PromptTemplate.from_template(template)
llm = ChatOpenAI(temperature=0)

chain = {"anonymized_text": anonymizer.anonymize} | prompt | llm
response = chain.invoke(text)
print(response.content)

Subject: Lost Wallet

Dear Sir/Madam,

I am writing to inform you that Patrick Ferrell has recently lost his wallet. Inside the wallet is some cash and his credit card with the number 3571341135737089. If you happen to find it, please contact us at 001-562-667-8690 or email us at hfoster@example.org.

Thank you for your attention to this matter.

Sincerely,
[Your Name]


In [6]:
# Append a deanonymization step so the LLM's reply is mapped back to the original values
chain = chain | (lambda ai_message: anonymizer.deanonymize(ai_message.content))
response = chain.invoke(text)
print(response)

Subject: Lost Wallet

Dear Sir/Madam,

I am writing to inform you that Slim Shady has recently lost his wallet. Inside the wallet is some cash and his credit card with the number 4916 0387 9536 0861. If you happen to find it, please contact us at 313-666-7440 or email us at real.slim.shady@gmail.com.

Thank you for your attention to this matter.

Sincerely,
[Your Name]


In [7]:

# Inspect the stored mapping: {entity_type: {fake_value: original_value}}
anonymizer.deanonymizer_mapping

{'PERSON': {'Patrick Ferrell': 'Slim Shady'},
 'CREDIT_CARD': {'3571341135737089': '4916 0387 9536 0861'},
 'PHONE_NUMBER': {'001-562-667-8690': '313-666-7440'},
 'EMAIL_ADDRESS': {'hfoster@example.org': 'real.slim.shady@gmail.com'}}

In [8]:
# Anonymizing new text adds newly detected entities to the existing mapping
print(
    anonymizer.anonymize(
        "Do you have his VISA card number? Yep, it's 4001 9192 5753 7193. I'm John Doe by the way."
    )
)

anonymizer.deanonymizer_mapping

Do you have his VISA card number? Yep, it's 3525816562749902. I'm David Wright by the way.


{'PERSON': {'Patrick Ferrell': 'Slim Shady', 'David Wright': 'John Doe'},
 'CREDIT_CARD': {'3571341135737089': '4916 0387 9536 0861',
  '3525816562749902': '4001 9192 5753 7193'},
 'PHONE_NUMBER': {'001-562-667-8690': '313-666-7440'},
 'EMAIL_ADDRESS': {'hfoster@example.org': 'real.slim.shady@gmail.com'}}

In [9]:
# Values seen before reuse their existing fake counterparts; the mapping is unchanged
print(
    anonymizer.anonymize(
        "My VISA card number is 4001 9192 5753 7193 and my name is John Doe."
    )
)

anonymizer.deanonymizer_mapping

My VISA card number is 3525816562749902 and my name is David Wright.


{'PERSON': {'Patrick Ferrell': 'Slim Shady', 'David Wright': 'John Doe'},
 'CREDIT_CARD': {'3571341135737089': '4916 0387 9536 0861',
  '3525816562749902': '4001 9192 5753 7193'},
 'PHONE_NUMBER': {'001-562-667-8690': '313-666-7440'},
 'EMAIL_ADDRESS': {'hfoster@example.org': 'real.slim.shady@gmail.com'}}
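
The two cells above show that a value seen before (here the card number and the name John Doe) gets the same fake replacement again, and the mapping does not grow. The following is a minimal sketch of that lookup-before-generate behavior, under stated assumptions: the helper `fake_for` and the counter-based placeholders are hypothetical stand-ins, whereas the real anonymizer detects entities with Presidio and generates values with Faker.

```python
from itertools import count

# Hypothetical helper for illustration; not part of langchain_experimental.
_counter = count(1)
mapping = {}  # {entity_type: {fake_value: original_value}}


def fake_for(entity_type: str, original: str) -> str:
    """Return the fake already assigned to `original`, or create a new one."""
    known = mapping.setdefault(entity_type, {})
    for fake, seen_original in known.items():
        if seen_original == original:
            return fake  # reuse: same original value, same fake value
    fake = f"<{entity_type}_{next(_counter)}>"  # placeholder instead of Faker output
    known[fake] = original
    return fake


first = fake_for("PERSON", "John Doe")
second = fake_for("PERSON", "John Doe")  # already known, so no new entry is added
other = fake_for("PERSON", "Slim Shady")
print(first, second, other)  # -> <PERSON_1> <PERSON_1> <PERSON_2>
```

Reusing the existing fake keeps references consistent across messages, so text that mentions the same person twice can still be deanonymized to a single original value.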

In [10]:
# We can save the deanonymizer mapping as a JSON or YAML file

anonymizer.save_deanonymizer_mapping("deanonymizer_mapping.json")
# anonymizer.save_deanonymizer_mapping("deanonymizer_mapping.yaml")

In [11]:
# A fresh instance starts with an empty mapping
anonymizer = PresidioReversibleAnonymizer()

anonymizer.deanonymizer_mapping

{}

In [12]:
# Restore the previously saved mapping into the new instance
anonymizer.load_deanonymizer_mapping("deanonymizer_mapping.json")

anonymizer.deanonymizer_mapping

{'PERSON': {'Patrick Ferrell': 'Slim Shady', 'David Wright': 'John Doe'},
 'CREDIT_CARD': {'3571341135737089': '4916 0387 9536 0861',
  '3525816562749902': '4001 9192 5753 7193'},
 'PHONE_NUMBER': {'001-562-667-8690': '313-666-7440'},
 'EMAIL_ADDRESS': {'hfoster@example.org': 'real.slim.shady@gmail.com'}}
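
To make the save and load round trip concrete, here is a small standard-library sketch. It assumes the saved file is a plain JSON serialization of the mapping dictionary, which is an assumption and not something confirmed by the output above. It uses a temporary directory so nothing is left on disk.

```python
import json
import tempfile
from pathlib import Path

# Subset of the mapping shown above: {entity_type: {fake_value: original_value}}
mapping = {
    "PERSON": {"Patrick Ferrell": "Slim Shady", "David Wright": "John Doe"},
    "PHONE_NUMBER": {"001-562-667-8690": "313-666-7440"},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "deanonymizer_mapping.json"
    # Assumed on-disk format: the nested dict dumped as JSON
    path.write_text(json.dumps(mapping, indent=2), encoding="utf-8")
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored == mapping)  # -> True
```

Because the mapping holds the original values, a file like this deserves the same protection as the raw data it was derived from.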