This code refers to the blog:

Title: [EN] Santa Claus Meets Generative AI: Deciphering Handwritten Christmas Letters with LLM, LangChain, and Elasticsearch

Author: Alex Salgado

Link: https://discuss.elastic.co/t/dec-22nd-2023-en-santa-claus-meets-genai-deciphering-handwritten-christmas-letters-with-llm-langchain-and-elasticsearch/347311

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/salgado/christmas-2023/blob/main/04_advent_calendar_en.ipynb)


# Santa Claus Meets Generative AI: Deciphering Handwritten Christmas Letters with LLM, LangChain and Elasticsearch

In the heart of the North Pole, Santa Claus' team of elves faced a formidable logistical challenge: how to handle millions of letters from children around the world. With a determined look, Santa Claus decided that it was time to incorporate artificial intelligence into the Christmas operation.

Sitting at his computer, equipped with the latest in AI technology, Santa Claus began working on a Python script in a Jupyter Notebook. The goal was simple but ambitious: use Large Language Models (LLMs) and LangChain to interpret handwritten letters, extract the necessary data, and insert it into Elasticsearch in an organized way.


In [23]:
!pip install python-dotenv elasticsearch langchain openai




The first step was to set up the environment variables that would be used as credentials for accessing the OpenAI and Elasticsearch APIs.

In [None]:
import os
from dotenv import load_dotenv

# Path to your .env file. In Colab, mount Google Drive first and change this path to your own file.
env_path = '/content/drive/MyDrive/@Blogs/04-Advent-2023/env_advent'
load_dotenv(env_path)

# OpenAI API Key
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
OPENAI_API_URL = "https://api.openai.com/v1/chat/completions"

# Elastic cloud credentials
es_cloud_id = os.getenv('cloud_id')
es_user = os.getenv('cloud_user')
es_pass = os.getenv('cloud_pass')
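`load_dotenv` does not raise an error when the file is missing, so a wrong path only surfaces later as an authentication failure. A quick check can catch missing credentials before the first API call. This is a minimal sketch: `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the list simply mirrors the variables read above.

```python
import os

# Hypothetical list mirroring the variables this notebook reads from the .env file
REQUIRED_VARS = ["OPENAI_API_KEY", "cloud_id", "cloud_user", "cloud_pass"]


def missing_env_vars(names):
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.getenv(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All credentials loaded.")
```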


In [25]:
from PIL import Image
import requests
import numpy as np


Next, with a scanned image of a Christmas letter, Santa Claus wrote a script to extract the text using OpenAI's "gpt-4-vision-preview" model. This crucial step turned the handwriting into digital text.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema.messages import HumanMessage, SystemMessage

image_path = 'https://i.imgur.com/IxC9lgd.png'

chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=512)
result = chat.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "What is in the picture? Please provide a detailed introduction."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_path,
                        "detail": "auto",
                    },
                },
            ]
        )
    ]
)


print(result.content)
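The cell above passes a public image URL, but a scanned letter usually lives on disk. OpenAI's vision input also accepts base64-encoded images as `data:` URLs in the same `image_url` field. The sketch below assumes that format and the same message content structure used above; `image_to_data_url`, `vision_content`, and the example file path are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL.

    The MIME type is guessed from the file extension (e.g. .png -> image/png).
    """
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def vision_content(prompt, image_url, detail="auto"):
    """Build the text + image content list passed to HumanMessage above."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
    ]


# Usage (hypothetical path; requires the `chat` model created above):
# message = HumanMessage(content=vision_content(
#     "Transcribe this handwritten letter.", image_to_data_url("scans/letter_001.png")))
# result = chat.invoke([message])
```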

Next, LangChain went into action: a prompt sent the transcribed text to "gpt-3.5-turbo" to identify key elements such as the child's name and the wish list.

In [31]:
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema import StrOutputParser

chain = ChatOpenAI(model="gpt-3.5-turbo", max_tokens=1024)

prompt = PromptTemplate.from_template(
"""
Extract the wish list and the child's name from the text below and return the data in JSON format with the following keys:
- "child_name", "wishlist".

{santalist}

"""
)

runnable = prompt | chain | StrOutputParser()
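The line `prompt | chain | StrOutputParser()` uses the LangChain Expression Language (LCEL), where `|` connects components so that each one's output becomes the next one's input. The toy class below only illustrates that idea in plain Python; it is not how LangChain implements runnables, and `Step`, `fill_prompt`, `fake_model`, and `to_string` are hypothetical stand-ins.

```python
class Step:
    """Toy stand-in for a LangChain runnable: wraps a single function."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b returns a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


# Hypothetical stand-ins for the prompt, the chat model, and the output parser
fill_prompt = Step(lambda v: f"Extract the list from: {v['santalist']}")
fake_model = Step(lambda text: {"content": text.upper()})
to_string = Step(lambda message: message["content"])

pipeline = fill_prompt | fake_model | to_string
print(pipeline.invoke({"santalist": "a doll"}))  # EXTRACT THE LIST FROM: A DOLL
```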

In [32]:
letter = result.content
wishlist = runnable.invoke({"santalist": letter})
print(wishlist)

{
  "child_name": "Maria",
  "wishlist": [
    "Barbie Dreamhouse Adventures",
    "My Little Pony"
  ]
}


Santa Claus decided to enrich the data a bit by also asking the AI to estimate the weight of these gifts. This way, he could build a list in Kibana with each child's gifts split into bags that fit within the space of the sleigh. What organization!
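Splitting gifts into bags of limited capacity is a bin-packing problem. A common heuristic is first-fit decreasing: sort the loads from heaviest to lightest and put each one into the first bag that still has room. It is fast but does not always find the minimum number of bags. This is a minimal sketch, assuming each child's gifts travel together in one bag and a hypothetical per-bag capacity; `pack_bags` and the example weights are hypothetical, not part of the original notebook.

```python
def pack_bags(weights, capacity):
    """First-fit decreasing: assign each child's gift weight to a bag.

    weights: dict mapping child name -> total gift weight (kg)
    capacity: maximum load per bag (kg)
    Returns a list of bags, each a list of child names.
    """
    bags = []   # child names in each bag
    loads = []  # current total weight of each bag
    # Place the heaviest loads first; this usually wastes less space
    for child, weight in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        if weight > capacity:
            raise ValueError(f"{child}'s gifts ({weight} kg) exceed one bag")
        for i, load in enumerate(loads):
            if load + weight <= capacity:
                bags[i].append(child)
                loads[i] += weight
                break
        else:
            # No existing bag has room, so open a new one
            bags.append([child])
            loads.append(weight)
    return bags


# Hypothetical per-child totals (kg) and bag capacity
example = {"Maria": 0.5, "Mike": 1.5, "Theo": 3.0, "Isabella": 2.5}
print(pack_bags(example, capacity=4.0))  # [['Theo', 'Maria'], ['Isabella', 'Mike']]
```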

In [33]:
chain = ChatOpenAI(model="gpt-3.5-turbo", max_tokens=1024)

prompt = PromptTemplate.from_template(
"""

{santalist_json}

From the JSON above, add a new attribute called 'weight'
containing the total estimated weight, in kilograms, of all items in the list.
First estimate the weight of each item individually,
then sum these values to obtain the total weight.
Use only the numerical value, and return only the updated JSON.


"""
)

runnable = prompt | chain | StrOutputParser()

In [34]:
new_wishlist = runnable.invoke({"santalist_json": wishlist})
print(new_wishlist)

{
  "child_name": "Maria",
  "wishlist": [
    "Barbie Dreamhouse Adventures",
    "My Little Pony"
  ],
  "weight": 0.5
}
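The indexing step below calls `json.loads` directly on the model's reply, which fails if the model wraps its answer in a Markdown code fence or omits a field, as chat models sometimes do. A minimal defensive parser, assuming the three keys produced above; `parse_llm_json` is a hypothetical helper that could replace the bare `json.loads` call.

```python
import json
import re

# Matches an optional Markdown code fence (three backticks, optional "json" tag)
_FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_llm_json(text):
    """Parse JSON returned by an LLM, tolerating a surrounding Markdown fence."""
    cleaned = text.strip()
    match = _FENCE.match(cleaned)
    if match:
        cleaned = match.group(1)
    data = json.loads(cleaned)
    # Check the keys the later steps rely on
    for key in ("child_name", "wishlist", "weight"):
        if key not in data:
            raise KeyError(f"missing key in model output: {key}")
    # The model may return the weight as a string such as "0.5"
    data["weight"] = float(data["weight"])
    return data


# Usage: data = parse_llm_json(new_wishlist)
```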


In [35]:
# Insert into Elasticsearch

Now, with the data structured, it was time to load it into Elasticsearch.

In [36]:
from elasticsearch import Elasticsearch

In [37]:
es = Elasticsearch(cloud_id=es_cloud_id,
                  basic_auth=(es_user, es_pass)
                  )
es.info() # should return cluster info


ObjectApiResponse({'name': 'instance-0000000001', 'cluster_name': 'c731f6bbb8314543bb3648440b501e47', 'cluster_uuid': 'pdZVQFRuTr2u3yh4l0sZyg', 'version': {'number': '8.11.2', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '76013fa76dcbf144c886990c6290715f5dc2ae20', 'build_date': '2023-12-05T10:03:47.729926671Z', 'build_snapshot': False, 'lucene_version': '9.8.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})
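Without an explicit mapping, Elasticsearch infers field types from the first document it sees (dynamic mapping): strings become `text` fields with a `keyword` sub-field, and a whole-number `weight` such as `3` would map the field as `long` rather than `float`. Creating the index up front avoids that. This sketch assumes the elasticsearch-py 8.x client created above; `SANTA_MAPPINGS` and `ensure_index` are hypothetical names, and the field types are a suggestion rather than the original notebook's setup.

```python
# Hypothetical explicit mapping for the santa_claus_list index
SANTA_MAPPINGS = {
    "properties": {
        "child_name": {"type": "keyword"},
        "wishlist": {"type": "keyword"},  # an array of keywords needs no special type
        "weight": {"type": "float"},      # estimated total weight in kg
    }
}


def ensure_index(es, index_name, mappings=SANTA_MAPPINGS):
    """Create the index with explicit mappings unless it already exists.

    `es` is an elasticsearch.Elasticsearch (8.x) client.
    Returns True if the index was created, False if it already existed.
    """
    if es.indices.exists(index=index_name):
        return False
    es.indices.create(index=index_name, mappings=mappings)
    return True


# Usage, before the first es.index(...) call:
# ensure_index(es, "santa_claus_list")
```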

In [38]:
import json

In [39]:
# Parse the JSON string
json_string = new_wishlist
data = json.loads(json_string)

# Index name
index_name = "santa_claus_list"

# Index the document
response = es.index(index=index_name, document=data)

# Print the response from Elasticsearch
print(response)

{'_index': 'santa_claus_list', '_id': 'nV8XhIwBrZtDJJrwaZX4', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 9, '_primary_term': 1}
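Indexing one document per request is fine for a demo, but for millions of letters the Python client's `elasticsearch.helpers.bulk` sends many documents per request. It consumes an iterable of action dicts naming the target `_index` and the document `_source`. A sketch, assuming the model replies have been collected into a list of JSON strings; `bulk_actions`, `raw_outputs`, and `rejected` are hypothetical names.

```python
import json


def bulk_actions(index_name, raw_outputs, rejected):
    """Turn raw LLM JSON strings into actions for elasticsearch.helpers.bulk.

    Strings that are not valid JSON are appended to `rejected` instead of
    being sent, so one bad letter does not abort the whole batch.
    """
    for raw in raw_outputs:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            rejected.append(raw)
            continue
        yield {"_index": index_name, "_source": doc}


# Usage (requires the `es` client created above):
# from elasticsearch import helpers
# rejected = []
# ok, errors = helpers.bulk(es, bulk_actions("santa_claus_list", raw_outputs, rejected))
```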


Using Kibana, the Elasticsearch visualization interface, Santa Claus and the elves could then easily search and analyze the data. This gave them a clear view of this year's gift trends and the most frequent letter locations, and even let them identify letters that expressed particular or urgent wishes.

In [None]:
# For example, this Elasticsearch Query Language (ES|QL) query, run from the Kibana Dev Tools console:

POST /_query?format=txt
{
  "query": """
FROM santa_claus_list
| STATS  sum_toy = SUM(weight) BY child_name
| LIMIT 100
  """
}

# result
    sum_toy    |  child_name
---------------+---------------
30.5           |Maria
1.5            |Mike
3.0            |Theo
2.5            |Isabella
40.0           |William
30.0           |Olivia
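To make the `STATS ... BY` step concrete: for each distinct `child_name`, ES|QL adds up `weight` across all documents with that name, so the total covers every letter indexed for a child. The rough plain-Python analogy below uses hypothetical documents, not the index contents shown above, and ignores ES|QL details such as null handling; `sum_by` is a hypothetical helper.

```python
from collections import defaultdict


def sum_by(docs, value_field, group_field):
    """Plain-Python analogy of ES|QL `STATS total = SUM(value) BY group`.

    Documents missing either field are skipped in this sketch.
    """
    totals = defaultdict(float)
    for doc in docs:
        if value_field in doc and group_field in doc:
            totals[doc[group_field]] += doc[value_field]
    return dict(totals)


# Hypothetical documents, not the notebook's real index contents
docs = [
    {"child_name": "Maria", "weight": 0.5},
    {"child_name": "Maria", "weight": 2.0},
    {"child_name": "Theo", "weight": 3.0},
]
print(sum_by(docs, "weight", "child_name"))  # {'Maria': 2.5, 'Theo': 3.0}
```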



With this innovative solution, Santa Claus was not only able to fulfill requests more efficiently but also gained valuable insights into the joys and hopes of children around the world, all thanks to the power of AI, LangChain, and Elasticsearch. This Christmas promised to be the most magical and well-organized ever!