<div style="width: 100%; overflow: hidden;">
    <div style="width: 150px; float: left;"> <img src="data/D4Sci_logo_ball.png" alt="Data For Science, Inc" align="left" border="0"> </div>
    <div style="float: left; margin-left: 10px;"> <h1>LangChain for Generative AI</h1>
<h1>LangChain</h1>
        <p>Bruno Gonçalves<br/>
        <a href="http://www.data4sci.com/">www.data4sci.com</a><br/>
            @bgoncalves, @data4sci</p></div>
</div>

In [1]:
from collections import Counter
from pprint import pprint
from operator import itemgetter

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

import torch

import openai
from openai import OpenAI

import transformers
from transformers import pipeline
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results


import langchain
from langchain.chains import create_sql_query_chain

import langchain_openai
from langchain_openai import ChatOpenAI

import langchain_anthropic
from langchain_anthropic import ChatAnthropic

from langchain_community.tools import DuckDuckGoSearchRun

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough

import langchain_community
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool


import watermark

%load_ext watermark
%matplotlib inline

We start by printing out the versions of the libraries we're using, for future reference.

In [2]:
%watermark -n -v -m -g -iv

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.12.3

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 23.5.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: bd2418d30c476bc5452faaa2f8e9e7fa77b6d594

langchain_openai   : 0.1.8
watermark          : 2.4.3
langchain_anthropic: 0.1.15
torch              : 2.3.0
pandas             : 2.1.4
numpy              : 1.26.4
transformers       : 4.41.1
langchain_community: 0.2.1
matplotlib         : 3.8.0
openai             : 1.30.5
langchain          : 0.2.2



Load the default figure style

In [3]:
plt.style.use('./d4sci.mplstyle')

# OpenAI

The first step is to generate an API key on the OpenAI website and store it in the "OPENAI_API_KEY" environment variable on your local machine. Without it, we won't be able to do anything. You can find your API key in your user settings: https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key. Then we are ready to instantiate the client, which reads the key from that variable automatically.
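Before doing so, it can help to fail early, with a clear message, if the key is missing. The sketch below is a hypothetical helper (`require_api_key` is not part of the OpenAI or LangChain libraries); it only assumes the key lives in an environment variable, as described above.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise a helpful error.

    Hypothetical helper: the OpenAI client reads OPENAI_API_KEY on its own,
    this check just makes a missing key obvious before any request is sent.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; add it to your environment "
            "before instantiating the client."
        )
    return key
```

The same check can be reused for the Anthropic section, e.g. `require_api_key("ANTHROPIC_API_KEY")`.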

In [4]:
client = OpenAI()

In [5]:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "What was Superman's weakness?",
        },
    ],
)

In [6]:
print(response)

ChatCompletion(id='chatcmpl-9Wk4KIhvcyqEQlya3JCWv9DPs9tz3', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Superman's primary weakness is Kryptonite, a radioactive element from his home planet, Krypton. When exposed to Kryptonite, Superman's powers are significantly diminished, and prolonged exposure can be fatal to him. Kryptonite typically appears in green form but exists in several other varieties, each with different effects on Superman and others. Additionally, Superman is vulnerable to magic and suffers from the same physical needs (such as the need to breathe) and limitations when facing magical creatures or spells. His powers are also significantly reduced when he is away from the yellow sun that powers him.", role='assistant', function_call=None, tool_calls=None))], created=1717590612, model='gpt-4o-2024-05-13', object='chat.completion', system_fingerprint='fp_319be4768e', usage=CompletionUsage(completion_tokens=113, prompt

In [7]:
print(response.choices[0].message.content)

Superman's primary weakness is Kryptonite, a radioactive element from his home planet, Krypton. When exposed to Kryptonite, Superman's powers are significantly diminished, and prolonged exposure can be fatal to him. Kryptonite typically appears in green form but exists in several other varieties, each with different effects on Superman and others. Additionally, Superman is vulnerable to magic and suffers from the same physical needs (such as the need to breathe) and limitations when facing magical creatures or spells. His powers are also significantly reduced when he is away from the yellow sun that powers him.


# LangChain

We instantiate the LangChain interface to OpenAI's chat models.

In [8]:
model = ChatOpenAI(model="gpt-4o")

In [9]:
messages = [
    HumanMessage(content="What was Superman's weakness?"),  # the question comes from the user
]

model.invoke(messages)

AIMessage(content="Superman's primary weakness is Kryptonite, a radioactive substance from his home planet, Krypton. Exposure to Kryptonite can weaken Superman, strip him of his powers, and even potentially kill him with prolonged exposure. There are various forms of Kryptonite, each with different effects, but the most commonly depicted is green Kryptonite. Other colors, such as red, blue, gold, and black Kryptonite, have unique and often unpredictable effects on Superman.\n\nIn addition to Kryptonite, Superman is also vulnerable to magic. Unlike his physical invulnerability to most forms of attack, magic can harm him. Furthermore, Superman's powers are dependent on the radiation from Earth's yellow sun, so prolonged absence from such radiation or exposure to red sun radiation can also weaken him.", response_metadata={'token_usage': {'completion_tokens': 151, 'prompt_tokens': 13, 'total_tokens': 164}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_319be4768e', 'finish_reason': 'sto

In [10]:
parser = StrOutputParser()
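Conceptually, this parser takes the AIMessage returned by the model and keeps only its text content, dropping the metadata. The plain-Python sketch below mimics that behavior; `FakeAIMessage` and `parse_to_str` are hypothetical stand-ins used for illustration, not LangChain's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for LangChain's AIMessage."""
    content: str
    response_metadata: dict = field(default_factory=dict)


def parse_to_str(message) -> str:
    """Keep only the text of a message, discarding the metadata."""
    # Plain strings pass through unchanged; messages are reduced to .content
    if isinstance(message, str):
        return message
    return message.content


msg = FakeAIMessage(
    content="Superman's primary weakness is Kryptonite.",
    response_metadata={"finish_reason": "stop"},
)
print(parse_to_str(msg))
```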

In [11]:
result = model.invoke(messages)

In [12]:
parser.invoke(result)

"Superman's primary weakness is Kryptonite, a mineral from his home planet of Krypton. There are several types of Kryptonite, each with different effects, but the most commonly known is Green Kryptonite, which weakens Superman and can potentially kill him with prolonged exposure. Other forms of Kryptonite, like Red Kryptonite, have unpredictable effects, while Gold Kryptonite can strip Kryptonians of their powers permanently.\n\nIn addition to Kryptonite, Superman is also vulnerable to magic and can be overpowered by beings of sufficient strength or technology. His powers are derived from Earth's yellow sun, so being deprived of sunlight or exposed to a red sun similar to Krypton's can weaken or nullify his abilities."

Let us create our first chain. Stages of a chain are connected with the pipe character, '|'.

In [13]:
chain = model | parser

Now, whenever we call __invoke()__ on the chain, it runs all the steps in order, passing the output of each stage as the input to the next.
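To make the idea concrete, here is a plain-Python sketch: each stage exposes an `invoke()` method, and `a | b` builds a new stage that feeds the output of `a` into `b`. The `Step` class and the fake model and parser are hypothetical; LangChain's own Runnable classes do much more (batching, streaming, async).

```python
class Step:
    """Hypothetical minimal runnable that wraps a one-argument function."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b returns a new Step that runs a, then passes its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


fake_model = Step(lambda prompt: {"content": f"Answer to: {prompt}"})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("What was Superman's weakness?"))
```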

In [14]:
chain.invoke(messages)

"Superman's primary weakness is Kryptonite, a radioactive mineral from his home planet of Krypton. Exposure to Kryptonite weakens Superman, stripping him of his powers and making him vulnerable to harm. Different forms of Kryptonite have varying effects, with green Kryptonite being the most well-known and harmful to Superman. Other forms, like red Kryptonite, have unpredictable and often bizarre effects.\n\nIn addition to Kryptonite, Superman is also vulnerable to magic and can be harmed by magical spells and weapons. His powers also rely on the energy from Earth's yellow sun, so being deprived of solar energy or exposed to red sunlight can weaken him as well."

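We can also create templates for our prompts. Placeholders are field names in curly braces, following Python's str.format conventions (LangChain's default template format; its PromptTemplate can optionally use Jinja2 syntax instead).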

In [15]:
system_template = "Translate the following into {language}:"
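Since the default template format follows Python's str.format field syntax, the standard library can show both which fields a template expects and how they get filled. This is a stdlib-only sketch of the idea, not LangChain's internal code.

```python
from string import Formatter

template_text = "Translate the following into {language}:"

# Formatter().parse yields (literal_text, field_name, format_spec, conversion)
field_names = [
    name for _, name, _, _ in Formatter().parse(template_text) if name is not None
]
print(field_names)  # ['language']

print(template_text.format(language="italian"))
# Translate the following into italian:
```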

We can then combine several messages, each with its own role, into a single chat prompt template.

In [16]:
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", system_template),
        ("user", "{text}"),
    ]
)
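A rough plain-Python picture of what this template holds: a list of (role, template) pairs, all filled in from the same dictionary of values. The `format_messages_sketch` function is hypothetical and returns (role, text) tuples rather than LangChain message objects.

```python
def format_messages_sketch(pairs, values):
    """Fill every (role, template) pair from the same values dictionary."""
    # str.format ignores keyword arguments a given template does not use
    return [(role, template.format(**values)) for role, template in pairs]


pairs = [
    ("system", "Translate the following into {language}:"),
    ("user", "{text}"),
]
filled = format_messages_sketch(
    pairs,
    {"language": "italian", "text": "Be the change that you wish to see in the world."},
)
for role, text in filled:
    print(f"{role}: {text}")
```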

To fill in the template, we must provide a value for each of its fields:

In [17]:
result = prompt_template.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

result

ChatPromptValue(messages=[SystemMessage(content='Translate the following into italian:'), HumanMessage(content='Be the change that you wish to see in the world.')])
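If a field is left out, LangChain raises an error listing the missing variables instead of sending a half-filled prompt. A small sketch of an equivalent up-front check; `missing_fields` is a hypothetical helper built only on the standard library.

```python
from string import Formatter


def missing_fields(templates, values):
    """Return the sorted placeholder names that `values` does not supply."""
    required = {
        name
        for template in templates
        for _, name, _, _ in Formatter().parse(template)
        if name is not None
    }
    return sorted(required - set(values))


templates = ["Translate the following into {language}:", "{text}"]
print(missing_fields(templates, {"language": "italian"}))  # ['text']
print(missing_fields(templates, {"language": "italian", "text": "Hi"}))  # []
```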

Calling __to_messages()__ shows the full list of messages that will be sent to the model:

In [18]:
result.to_messages()

[SystemMessage(content='Translate the following into italian:'),
 HumanMessage(content='Be the change that you wish to see in the world.')]

In [19]:
chain = prompt_template | model | parser

In [20]:
chain.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

'Sii il cambiamento che vuoi vedere nel mondo.'

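# Anthropic

Because every LangChain chat model exposes the same interface, we can reuse the same prompt template and parser with one of Anthropic's Claude models. ChatAnthropic expects an API key in the "ANTHROPIC_API_KEY" environment variable.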

In [21]:
model_a = ChatAnthropic(model="claude-3-opus-20240229")

In [22]:
chain_a = prompt_template | model_a | parser

In [23]:
chain_a.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

'Sii il cambiamento che desideri vedere nel mondo.'

# Message History

In [24]:
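# Maps each session id to its own chat history
store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    # Create an empty history the first time a session id is seen
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


# Wrap the model so each call loads, and then updates, the history of its session
with_message_history = RunnableWithMessageHistory(model, get_session_history)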

In [25]:
config = {"configurable": {"session_id": "abc2"}}

In [26]:
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Bob")],
    config=config,
)

response.content

'Hi, Bob! How can I assist you today?'

In [27]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'You mentioned that your name is Bob. How can I assist you further?'

In [28]:
config = {"configurable": {"session_id": "abc3"}}

response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

"I'm sorry, but I don't have access to that information. How can I assist you today?"

In [29]:
config = {"configurable": {"session_id": "abc2"}}

response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'Your name is Bob. How can I help you today, Bob?'
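In other words, the wrapper looks up the history for the given session_id, sends the stored messages together with the new one, and then saves both the question and the reply. That is why session "abc2" remembers Bob while "abc3" does not. Below is a stdlib-only sketch of that bookkeeping; `toy_store`, `toy_history`, `invoke_with_history`, and `fake_respond` are hypothetical names, and (role, text) tuples stand in for LangChain message objects.

```python
from typing import Callable

Conversation = list[tuple[str, str]]

toy_store: dict[str, Conversation] = {}


def toy_history(session_id: str) -> Conversation:
    # setdefault creates an empty history the first time a session id is seen
    return toy_store.setdefault(session_id, [])


def invoke_with_history(
    session_id: str, user_text: str, respond: Callable[[Conversation], str]
) -> str:
    history = toy_history(session_id)
    # Past turns are sent first, followed by the new user message
    reply = respond(history + [("human", user_text)])
    # Both sides of the exchange are saved for the next call
    history.extend([("human", user_text), ("ai", reply)])
    return reply


def fake_respond(conversation: Conversation) -> str:
    # Hypothetical stand-in for the model: it only "knows" what it was sent
    *earlier, (_, latest) = conversation
    if "name?" not in latest:
        return "Hello! How can I help?"
    if any("Bob" in text for _, text in earlier):
        return "Your name is Bob."
    return "I don't know your name."


invoke_with_history("abc2", "Hi! I'm Bob", fake_respond)
print(invoke_with_history("abc2", "What's my name?", fake_respond))
print(invoke_with_history("abc3", "What's my name?", fake_respond))
```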

In [30]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | model | parser
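The MessagesPlaceholder marks a slot in the prompt that is replaced by the whole list of messages passed under the "messages" key, so the system instruction always comes first. A hypothetical plain-Python sketch of that expansion, using (role, text) tuples:

```python
SYSTEM_TEXT = (
    "You are a helpful assistant. "
    "Answer all questions to the best of your ability."
)


def expand_prompt(inputs: dict) -> list[tuple[str, str]]:
    # The fixed system message, followed by every message under "messages"
    return [("system", SYSTEM_TEXT)] + list(inputs["messages"])


expanded = expand_prompt({"messages": [("human", "hi! I'm bob")]})
for role, text in expanded:
    print(f"{role}: {text}")
```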

In [31]:
response = chain.invoke({"messages": [HumanMessage(content="hi! I'm bob")]})

response

'Hi Bob! How can I help you today?'

In [32]:
with_message_history = RunnableWithMessageHistory(chain, get_session_history)

In [33]:
config = {"configurable": {"session_id": "abc5"}}

In [34]:
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Jim")],
    config=config,
)

response

'Hi Jim! How can I assist you today?'

In [35]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response

'Your name is Jim! How can I help you today?'

# Database Integration

In [36]:
db = SQLDatabase.from_uri("sqlite:///data/Northwind_small.sqlite")

In [37]:
print(db.dialect)

sqlite


In [38]:
print(db.get_usable_table_names())

['Category', 'Customer', 'CustomerCustomerDemo', 'CustomerDemographic', 'Employee', 'EmployeeTerritory', 'Order', 'OrderDetail', 'Product', 'Region', 'Shipper', 'Supplier', 'Territory']
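SQLDatabase wraps a SQLAlchemy connection, and the table names ultimately come from the database's own catalog. For SQLite, that catalog is the sqlite_master table. A stdlib-only sketch with a small in-memory database; the two tables and their columns are a hypothetical schema for illustration, not the real Northwind one.

```python
import sqlite3

# In-memory database with two illustrative tables (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (Id TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, LastName TEXT);
    INSERT INTO Customer VALUES ('ALFKI', 'Alfreds Futterkiste');
    INSERT INTO Employee VALUES (1, 'Davolio');
    """
)

# sqlite_master lists tables, indexes, views, and triggers; keep only tables
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)  # ['Customer', 'Employee']
```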


In [39]:
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)  # temperature=0 favors the most likely tokens, making the generated SQL (nearly) deterministic

In [40]:
write_query = create_sql_query_chain(llm, db)

In [41]:
response = write_query.invoke({"question": "How many customers are there"}) 
response

'SELECT COUNT("Id") AS TotalCustomers FROM Customer;'

In [42]:
db.run(response)

'[(91,)]'
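Note that db.run() returns the rows as a string, '[(91,)]', rather than as Python objects, which is convenient for pasting into a prompt. The sketch below uses sqlite3 and a throwaway table with hypothetical data to reproduce that layout, and shows how ast.literal_eval can turn it back into a list of tuples when an actual number is needed.

```python
import ast
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?)", [("A",), ("B",), ("C",)])

rows = conn.execute('SELECT COUNT("Id") AS TotalCustomers FROM Customer;').fetchall()
as_text = str(rows)  # same string layout as the db.run() output above
print(as_text)  # [(3,)]

# literal_eval safely parses the string back into Python objects
count = ast.literal_eval(as_text)[0][0]
print(count + 1)  # 4
```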

In [43]:
execute_query = QuerySQLDataBaseTool(db=db)

In [44]:
sql_chain = write_query | execute_query

In [45]:
sql_chain.invoke({"question": "How many employees are there"})

'[(9,)]'

In [46]:
answer_prompt = PromptTemplate.from_template(
    """Given the following user question, corresponding SQL query, and SQL result, answer the user question.

Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """
)

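# Build an 'answer' chain: fill in answer_prompt, send it to the LLM, and parse the reply into a string
answer = answer_prompt | llm | StrOutputParser()

# Keep the original question, add the generated SQL under "query",
# then add the result of running that query under "result"
chain = (
    RunnablePassthrough.assign(query=write_query).assign(
        result=itemgetter("query") | execute_query
    )
    | answer
)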

chain.invoke({"question": "How many employees are there"})

'There are a total of 9 employees.'
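RunnablePassthrough.assign passes its input dictionary through unchanged and adds new keys computed from it. Chaining two assign calls lets the second step ("result") see the key produced by the first ("query"), and the final dictionary supplies every field of answer_prompt. A plain-Python sketch of those semantics; `assign_sketch`, `fake_write_query`, and `fake_execute` are hypothetical stand-ins for the LangChain pieces.

```python
def assign_sketch(**computers):
    """Return a step that copies its input dict and adds computed keys."""

    def step(inputs: dict) -> dict:
        outputs = dict(inputs)  # the original keys pass through untouched
        for key, compute in computers.items():
            outputs[key] = compute(inputs)
        return outputs

    return step


def fake_write_query(inputs: dict) -> str:
    # Hypothetical stand-in for the LLM-backed write_query chain
    return "SELECT COUNT(*) FROM Employee;"


def fake_execute(query: str) -> str:
    # Hypothetical stand-in for QuerySQLDataBaseTool
    return "[(9,)]"


add_query = assign_sketch(query=fake_write_query)
# Equivalent in spirit to itemgetter("query") | execute_query
add_result = assign_sketch(result=lambda inputs: fake_execute(inputs["query"]))

state = add_result(add_query({"question": "How many employees are there"}))
print(state)
```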

In [47]:
search = DuckDuckGoSearchRun()
search.run("When will the next solar eclipse be?")

"Get Ready for These Upcoming Eclipses! More Eclipses Solar Eclipses Date Solar Eclipse Type Geographic Region of Visibility Oct. 2, 2024 Annular An annular solar eclipse will be visible in South America, and a partial eclipse will be visible in South America, Antarctica, Pacific Ocean, Atlantic Ocean, North America March 29, 2025 Partial Europe, Asia, […] It will be 20 years before there's a chance to witness a total solar eclipse in the United States again. According to NASA, after Monday's total solar eclipse, the next one viewable from the ... This map of eclipse paths from 2024 to 2044 reveals that Australia hit the jackpot: Over just 11 years, the continent (lower right) will see four total solar eclipses — in 2028, 2030, 2037 and 2038. On April 8, 2024, a total solar eclipse moved across North America, passing over Mexico, the United States, and Canada. A total solar eclipse happens when the Moon passes between the Sun and Earth, completely blocking the face of the Sun. The sky 

<center>
     <img src="data/D4Sci_logo_full.png" alt="Data For Science, Inc" align="center" border="0" width=300px> 
</center>