## What is LangChain?

LangChain is a framework for building applications powered by large language models (LLMs), enabling dynamic interactions with tools, APIs, and data sources. It simplifies the development of AI-driven systems such as chatbots, autonomous agents, and data processing pipelines by integrating language models into existing workflows.

![LangChain Icon](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQlnc1hwPoxs1LQSbeAIk_JfVEwONTMLDrdRA&s)

### Why use LangChain with AWS?
Combining LangChain with AWS allows developers to leverage LangChain's powerful LLM framework alongside AWS's scalable infrastructure, enabling seamless deployment, efficient resource management, and integration with AWS services like Lambda, SageMaker, and DynamoDB for building robust, production-grade AI applications.

## How to use LangChain

#### Installing the necessary libraries

In [1]:
!pip install python-dotenv \
    boto3 \
    langchain \
    langchain_community \
    langchain_experimental \
    presidio-anonymizer \
    presidio-analyzer \
    poppler-utils \
    neo4j \
    wget \
    pymupdf \
    langchain_aws \
    chromadb \
    json-repair \
    pandas \
    numexpr \
    duckduckgo-search \
    beautifulsoup4 \
    Faker -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


#### Importing the necessary libraries and setting environment variables

In [3]:
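# `!export` runs in a throwaway subshell, so the variable would not persist; %env sets it for the kernel
%env AWS_PROFILE=stormy-api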

In [4]:
from dotenv import load_dotenv
import boto3
import os
# BedrockChat is deprecated in newer releases in favour of langchain_aws.ChatBedrock
from langchain_community.chat_models import BedrockChat

# load_dotenv()
AWS_PROFILE = 'stormy-api'
print(AWS_PROFILE)

stormy-api


#### How to choose which LLM LangChain uses

To define the LLM, we need to know two things beforehand.

First, the provider of that LLM. Supported providers are listed on [the LangChain website](https://python.langchain.com/docs/integrations/providers/). In this tutorial we use Bedrock, as it has been chosen as the primary provider for Dolby.

Second, the model itself, identified by its name and version. To keep LLM-based solutions backward compatible, many providers continue to serve older versions of their top models alongside the newest ones.

To find the correct model ID for configuring the chosen chat model on Bedrock, see [the list of supported models](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html).
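A Bedrock model ID such as `anthropic.claude-3-5-sonnet-20240620-v1:0` packs the provider, the model name and a version suffix into a single string. The helper below is a hypothetical illustration (not part of boto3 or LangChain) that splits an ID, assuming the `<provider>.<model>-v<major>[:<minor>]` shape of the ID used in this tutorial; other shapes, such as cross-region inference profile IDs with a region prefix, are not handled.

```python
import re
from typing import NamedTuple, Optional


class BedrockModelId(NamedTuple):
    provider: str
    model: str
    version: Optional[str]


# Assumed shape: "<provider>.<model>-v<major>[:<minor>]"
_MODEL_ID_RE = re.compile(
    r"^(?P<provider>[a-z0-9-]+)\.(?P<model>.+?)(?:-(?P<version>v\d+(?::\d+)?))?$"
)


def parse_model_id(model_id: str) -> BedrockModelId:
    """Hypothetical helper: split a Bedrock-style model ID into its parts."""
    match = _MODEL_ID_RE.match(model_id)
    if match is None:
        raise ValueError(f"Unrecognised model ID: {model_id!r}")
    return BedrockModelId(match["provider"], match["model"], match["version"])


print(parse_model_id("anthropic.claude-3-5-sonnet-20240620-v1:0"))
```

Pinning the full ID, version included, is what keeps a solution stable when the provider releases a newer model under the same name.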


In [15]:
session = boto3.Session(profile_name=AWS_PROFILE)
# session = boto3.Session()
model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"

bedrock = session.client(service_name="bedrock", region_name="us-west-2")
bedrock_runtime = session.client(
    service_name="bedrock-runtime", region_name="us-west-2"
)
llm = BedrockChat(model_id=model_id, client=bedrock_runtime)

In [16]:
session

Session(region_name='us-west-2')

### How to use LangChain?
This section shows how to use basic LangChain features and prompting techniques to develop applications.

#### How to run inference?
The most basic task is simply asking the LLM a question that it is likely to answer correctly based on its training data. Examples of tasks that are "baked into" LLMs are:

1. **Text Summarization**  
   Generate concise summaries of long texts without requiring training on the specific dataset.

2. **Sentiment Analysis**  
   Determine the sentiment (positive, negative, neutral) of a given text input.

3. **Language Translation**  
   Translate text between languages without prior fine-tuning for the target pair.

4. **Content Classification**  
   Categorize text into predefined categories based on context and semantics.

5. **Question Answering**  
   Provide accurate answers to user queries based on context or provided documents.

6. **Text Generation**  
   Produce coherent and contextually relevant text based on a prompt.

7. **Named Entity Recognition (NER)**  
   Identify and extract entities such as names, dates, and locations from text.

8. **Code Generation**  
   Write syntactically correct code snippets based on plain-text descriptions or specifications.

In [17]:
from langchain.prompts import PromptTemplate

# Example 1: Text Summarization
summarization_prompt = PromptTemplate(
    input_variables=["text"],
    template="Please provide a concise summary of the following text: {text}"
)
summarization_chain = summarization_prompt | llm

# Input example for summarization
input_text = "LangChain is a framework for developing applications powered by large language models. It provides tools to integrate models with external data sources and build advanced applications."
summary = summarization_chain.invoke(input_text)
print(summary.content)

# Example 2: Sentiment Analysis
sentiment_prompt = PromptTemplate(
    input_variables=["text"],
    template="Determine the sentiment of the following text (positive, negative, or neutral). Provide no additional explanation, just a label: {text}"
)
sentiment_chain = sentiment_prompt | llm

# Input example for sentiment analysis
sentiment_text = "I absolutely love the new features introduced in this product!"
sentiment = sentiment_chain.invoke(sentiment_text)
print("Sentiment:", sentiment.content)


Concise summary:

LangChain is a framework for creating language model-powered applications, offering tools to connect models with external data and develop sophisticated applications.
Sentiment: Positive
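The `|` operator used above composes a prompt template and a model into a chain: the output of each step becomes the input of the next. The toy sketch below imitates that idea in plain Python so it runs without AWS credentials; `Step`, `toy_prompt` and `toy_llm` are hypothetical stand-ins, not LangChain's actual `Runnable` implementation.

```python
from typing import Any, Callable


class Step:
    """Toy composable step (a hypothetical stand-in, not LangChain's Runnable)."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Left-to-right composition: the output of self becomes the input of other
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# "Prompt template" step: fills the text slot
toy_prompt = Step(lambda text: f"Please provide a concise summary of the following text: {text}")
# Fake "model" step: reports what it received instead of calling an LLM
toy_llm = Step(lambda rendered: f"[fake model received {len(rendered)} characters]")

toy_chain = toy_prompt | toy_llm
print(toy_chain.invoke("LangChain is a framework for developing applications powered by LLMs."))
```

LangChain's real runnables layer streaming, batching and async execution on top of this composition idea, which is why the same chain objects also expose `.stream()` and `.batch()`.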


#### Streaming
To show users the output in real time as it is generated, use the streaming feature, especially when expecting long outputs.
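Conceptually, streaming means the model returns an iterator of partial chunks instead of one finished string, so the caller can display each chunk as soon as it arrives. In the sketch below, `fake_stream` is a hypothetical generator standing in for `llm.stream(...)` (which yields message chunks with a `.content` attribute); it simply splits a canned answer into words.

```python
import time
from typing import Iterator


def fake_stream(answer: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for llm.stream(): yields the answer piece by piece."""
    for word in answer.split(" "):
        time.sleep(delay)  # simulate generation latency between chunks
        yield word + " "


received = []
for piece in fake_stream("Streaming lets users read the answer while it is generated."):
    print(piece, end="", flush=True)  # show each piece immediately
    received.append(piece)
print()

# The full answer is just the concatenation of all chunks
full_answer = "".join(received).rstrip()
```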

In [18]:
question = "Give me an example of streamlit code for a simple dashboard."
for chunk in llm.stream(question):
    print(chunk.content, end="", flush=True)

Certainly! Here's an example of a simple Streamlit dashboard that includes a title, some text, a chart, and a user input field:

```python
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set page title
st.set_page_config(page_title="Simple Dashboard")

# Add a title
st.title("Simple Streamlit Dashboard")

# Add some text
st.write("This is a simple example of a Streamlit dashboard.")

# Create some sample data
data = pd.DataFrame({
    'x': range(10),
    'y': np.random.randn(10)
})

# Create a line chart
st.subheader("Line Chart")
st.line_chart(data)

# Create a bar chart
st.subheader("Bar Chart")
fig, ax = plt.subplots()
ax.bar(data['x'], data['y'])
st.pyplot(fig)

# Add a slider
st.subheader("Slider Input")
number = st.slider("Select a number", 0, 100, 50)
st.write(f"You selected: {number}")

# Add a text input
st.subheader("Text Input")
user_input = st.text_input("Enter some text")
if user_input:
    st.write(f"You entered: {user_input}")
```

#### Batching
To optimize processing time and handle multiple inputs efficiently, batching is recommended. This approach is especially useful when analyzing large datasets or executing repeated tasks, as it allows multiple inputs to be processed in parallel, reducing latency and improving throughput.
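As far as we know, LangChain's default `.batch()` runs the individual `invoke` calls concurrently on a thread pool (parallelism can be capped with `config={"max_concurrency": ...}`) and returns results in input order. The stdlib sketch below reproduces that pattern; `classify` is a hypothetical keyword rule standing in for the sentiment chain, and `run_batch` is our own helper name.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def classify(text: str) -> str:
    """Hypothetical stand-in for sentiment_chain.invoke(): a slow keyword rule."""
    time.sleep(0.1)  # simulate model latency
    lowered = text.lower()
    if any(w in lowered for w in ("love", "fantastic", "thrilled")):
        return "Positive"
    if any(w in lowered for w in ("horrible", "frustrating")):
        return "Negative"
    return "Neutral"


def run_batch(func: Callable[[str], str], inputs: Sequence[str], max_concurrency: int = 4) -> List[str]:
    # executor.map runs the calls concurrently but yields results in input order
    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
        return list(executor.map(func, inputs))


texts_to_classify = [
    "I absolutely love the new features introduced in this product!",
    "The user experience was horrible and frustrating.",
    "This is an average product; nothing special but not terrible either.",
    "Fantastic! I'm thrilled with this service.",
]
start = time.perf_counter()
labels = run_batch(classify, texts_to_classify)
elapsed = time.perf_counter() - start
print(labels, f"{elapsed:.2f}s")  # roughly one call's latency instead of four
```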

In [23]:
batch_texts = [
    "I absolutely love the new features introduced in this product!",
    "The user experience was horrible and frustrating.",
    "This is an average product; nothing special but not terrible either.",
    "Fantastic! I'm thrilled with this service.",
]

# Use .batch() to process multiple texts concurrently
batch_sentiments = sentiment_chain.batch(batch_texts)

# Display results
for text, sentiment in zip(batch_texts, batch_sentiments):
    print(f"Text: {text}\nSentiment: {sentiment.content}\n")

Text: I absolutely love the new features introduced in this product!
Sentiment: Positive

Text: The user experience was horrible and frustrating.
Sentiment: Negative

Text: This is an average product; nothing special but not terrible either.
Sentiment: Neutral

Text: Fantastic! I'm thrilled with this service.
Sentiment: Positive



### More advanced prompting techniques
This part of the tutorial covers more advanced prompting techniques that allow for more precise and grounded answers.

#### Few-shot prompting


In [24]:
from langchain.schema import SystemMessage, HumanMessage, AIMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

# Define the system message and sample dialogues
system_message = "Consider yourself as a seasoned novelist."

user_dialogue1 = "Fantasy novel, opening scene. Describe a mystical forest filled with magical creatures."
sample_response1 = """The forest was alive with wonder. Golden leaves rustled softly in the breeze, their edges glinting like precious metals.
Strange, luminous flowers cast a gentle glow, revealing the silhouettes of creatures unseen in the daylight.
A unicorn with a shimmering coat moved gracefully between the trees, while tiny fairies darted about, their laughter like the tinkling of bells."""

user_dialogue2 = "Thriller, a tense chase through a crowded city market."
sample_response2 = """The market was a blur of colors and sounds.
People shouted, bartering over fresh produce and exotic goods.
Amidst the chaos, a man in a dark coat pushed his way through, glancing back frequently.
Behind him, a figure moved with relentless precision, their eyes locked on the target.
The air was thick with tension, every footstep echoing the urgency of the chase."""

user_dialogue3 = "Romantic drama, a heartfelt confession at a picturesque beach."
sample_response3 = """The sun was setting, casting a warm golden glow over the serene beach.
Waves lapped gently at the shore, creating a soothing soundtrack to the evening.
Two figures stood facing each other, the world around them fading away.
'I’ve loved you since the first moment I saw you,' one confessed, their voice trembling with emotion.
The other’s eyes filled with tears, the weight of unspoken feelings finally finding release."""

examples = [
    {"input": user_dialogue1, "output": sample_response1},
    {"input": user_dialogue2, "output": sample_response2},
    {"input": user_dialogue3, "output": sample_response3},
]
# This is a prompt template used to format each individual example.
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_message),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)

chain = final_prompt | llm

print(
    chain.invoke({"input": "Scify, music, poem"}).content
)

Here's a science fiction-themed poem about music:

Cosmic Symphony

In the void between stars, a melody grows
Pulsars keep time, as the universe flows
Quantum strings vibrate, an ethereal song
Echoing through space, where no one belongs

Alien harmonies, beyond human ears
Resonate through eons, across light-years
Plasma harps played by solar winds
A galactic orchestra, as dimensions spin

Nebulae hum with celestial tunes
Asteroids percussion, comets swoon
Black holes conduct with gravity's might
An eternal concert in the endless night

Spacefarers listen, their hearts enthralled
By cosmic music, unheard by all
In this vast expanse, cold and dark
Beats the rhythm of creation's spark
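`FewShotChatMessagePromptTemplate` turns each example into a human/AI message pair, so the model sees the examples as earlier turns of the conversation, placed between the system message and the real request. The plain-Python sketch below mimics that expansion with `(role, content)` tuples; `build_few_shot_messages` is a hypothetical helper, not the LangChain API.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


def build_few_shot_messages(
    system: str, shots: List[Dict[str, str]], user_input: str
) -> List[Message]:
    """Hypothetical helper: system message, then example pairs, then the real input."""
    messages: List[Message] = [("system", system)]
    for shot in shots:
        # Each example becomes a fake past exchange: the user asks, the assistant answers
        messages.append(("human", shot["input"]))
        messages.append(("ai", shot["output"]))
    messages.append(("human", user_input))  # the real request comes last
    return messages


demo = build_few_shot_messages(
    "Consider yourself as a seasoned novelist.",
    [{"input": "Thriller, a chase.", "output": "The market was a blur..."}],
    "Sci-fi, music, poem",
)
for role, content in demo:
    print(f"{role}: {content}")
```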


### Chain-of-Thought Reasoning

In [25]:
problem = """
Alex, a 35-year-old film enthusiast from Brazil, is watching a documentary about iconic movie soundtracks.
He becomes captivated by a song from a film where a bride seeks revenge.
Intrigued by the lead actress, he investigates her background and travels to her hometown.
Once there, Alex’s attention shifts to the local sports culture.
He learns about a famous basketball team known for its rich history and legacy.
What is the second name of this basketball team?
"""

cot_query = f"""Problem statement:
{problem}
First, list systematically and in detail all the problems in this problem
that need to be solved before we can arrive at the correct answer.
Then, solve each sub problem using the answers of previous problems
and reach a final solution.
"""

print(llm.invoke(cot_query).content)

Let's break down the problems we need to solve:

1. Identify the film with a bride seeking revenge and its soundtrack.
2. Determine the lead actress in this film.
3. Find out the hometown of this lead actress.
4. Identify the famous basketball team in this hometown.
5. Determine the second name of this basketball team.

Now, let's solve each sub-problem:

1. Identify the film with a bride seeking revenge and its soundtrack:
   The most famous film fitting this description is "Kill Bill" directed by Quentin Tarantino. The soundtrack features several iconic songs.

2. Determine the lead actress in this film:
   The lead actress in "Kill Bill" is Uma Thurman.

3. Find out the hometown of this lead actress:
   Uma Thurman was born in Boston, Massachusetts, USA.

4. Identify the famous basketball team in this hometown:
   The famous basketball team in Boston is the Boston Celtics.

5. Determine the second name of this basketball team:
   The second name of the team is "Celtics".

Therefore,

In [9]:
problem = """
take the last letters of the words in "New Hampshire" and concatenate them.
"""
cot_query = f"""Problem statement:
{problem}
First, list systematically and in detail all the problems in this problem that need to be solved before we can arrive at the correct answer. Then, solve each sub problem using the answers of previous problems and reach a final solution.
"""

print(llm.invoke(cot_query).content)


Let's break down the problem and identify the sub-problems that need to be solved:

1. Identifying the words in "New Hampshire"
2. Determining the last letter of each word
3. Concatenating the last letters

Now, let's solve each sub-problem:

1. Identifying the words in "New Hampshire":
   - The phrase "New Hampshire" consists of two words: "New" and "Hampshire"

2. Determining the last letter of each word:
   - For "New": The last letter is "w"
   - For "Hampshire": The last letter is "e"

3. Concatenating the last letters:
   - We need to join the last letters we identified in step 2 in the order they appear in the original phrase

Final solution:
Now that we have solved each sub-problem, we can combine the results to get our final answer.

The last letter of "New" is "w", and the last letter of "Hampshire" is "e".
Concatenating these letters in order gives us: "we"

Therefore, the final answer is "we".
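When a task has a deterministic answer, as this one does, the model's chain-of-thought result can be checked with ordinary code instead of being trusted. A minimal check (`last_letters` is our own helper name):

```python
def last_letters(phrase: str) -> str:
    """Concatenate the last letter of every whitespace-separated word."""
    return "".join(word[-1] for word in phrase.split())


model_answer = "we"  # final answer reported by the model above
expected = last_letters("New Hampshire")
print(expected, expected == model_answer)
```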


### Forcing models to be consistent by example

In [10]:
template = f"""
Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.

Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.

Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.

Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of
lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.

Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does
he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so
in total he has 7 + 2 = 9 toys. The answer is 9.

Q: There were nine computers in the server room. Five more computers were installed each day, from
monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 =
20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers.
The answer is 29.

Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many
golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On
Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.

Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent $15. She has $8 left.

Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
A: 
"""
print(llm.invoke(template).content)

Let's approach this step-by-step:

1. When you were 6, your sister was half your age.
   Half of 6 is 3, so your sister was 3 years old when you were 6.

2. This means your sister is 3 years younger than you.

3. Now you're 70 years old.

4. To find your sister's current age, we subtract the age difference:
   70 - 3 = 67

Therefore, your sister is 67 years old now.
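The exemplars end with a fixed phrase ("The answer is N."), which is what makes few-shot chain-of-thought output easy to post-process. The response above did not reproduce that phrase, so a parser needs a fallback. A minimal sketch, assuming that when the phrase is missing the last number in the text is the answer (a heuristic, not a guarantee); `extract_answer` is a hypothetical helper.

```python
import re
from typing import Optional

ANSWER_RE = re.compile(r"The answer is\s+\$?(-?\d+(?:\.\d+)?)")
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_answer(response: str) -> Optional[float]:
    """Prefer the 'The answer is N' pattern; otherwise fall back to the last number."""
    match = ANSWER_RE.search(response)
    if match:
        return float(match.group(1))
    numbers = NUMBER_RE.findall(response)  # heuristic fallback
    return float(numbers[-1]) if numbers else None


# Cross-check the arithmetic directly: the age gap never changes
age_gap = 6 - 6 / 2
print(extract_answer("Therefore, your sister is 67 years old now."), 70 - age_gap)
```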


### Tree of thought

In [11]:
template = f"""
Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is {"when I was 6 my sister was half my age. Now I’m 70 how old is my sister?"}
 
"""
print(llm.invoke(template).content)

Here's how three different experts might approach this problem step-by-step:

Expert 1 (Mathematician):
Step 1: Let's define variables. Let x be your current age and y be your sister's current age.

Expert 2 (Logician):
Step 1: We need to establish the relationship between your age and your sister's age when you were 6.

Expert 3 (Problem Solver):
Step 1: First, let's focus on the key information: when you were 6, your sister was half your age.

Expert 1:
Step 2: When you were 6, your sister was half your age. So we can write: y - 64 = (6) / 2 = 3

Expert 2:
Step 2: If you were 6 and she was half your age, then she was 3 years old at that time.

Expert 3:
Step 2: This means when you were 6, your sister was 3 years old.

Expert 1:
Step 3: Now we know that x = 70 (your current age) and y - 64 = 3. Solving for y: y = 67

Expert 2:
Step 3: The age difference between you and your sister is constant. It's 6 - 3 = 3 years.

Expert 3:
Step 3: The age gap between you and your sister is 3 years,
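The prompt above only simulates Tree of Thought inside a single completion. In the original formulation the program itself explores a tree: it asks the model for several candidate next steps ("thoughts"), asks it to rate them, keeps the most promising few, and repeats. The sketch below shows that breadth-first search with beam pruning on a toy puzzle (reach a target number using "+3" and "*2"); `propose` and `evaluate` are hypothetical stand-ins for the two LLM calls.

```python
from typing import List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]  # (current value, operations applied so far)


def propose(state: State) -> List[State]:
    """Hypothetical stand-in for the LLM call that suggests next thoughts."""
    value, steps = state
    return [(value + 3, steps + ("+3",)), (value * 2, steps + ("*2",))]


def evaluate(state: State, target: int) -> float:
    """Hypothetical stand-in for the LLM call that scores a partial solution."""
    value, _ = state
    return -abs(target - value)  # closer to the target = more promising


def tree_of_thought(start: int, target: int, beam_width: int = 3,
                    max_depth: int = 6) -> Optional[Tuple[str, ...]]:
    frontier: List[State] = [(start, ())]
    for _ in range(max_depth):
        # Expand every kept state into its candidate next thoughts
        candidates = [child for state in frontier for child in propose(state)]
        # Prune dead ends: for positive values both operations only grow the number,
        # so a branch that overshoots cannot recover (like an expert leaving)
        candidates = [c for c in candidates if c[0] <= target]
        for value, steps in candidates:
            if value == target:
                return steps
        # Keep only the most promising branches (beam search)
        candidates.sort(key=lambda s: evaluate(s, target), reverse=True)
        frontier = candidates[:beam_width]
        if not frontier:
            break
    return None


print(tree_of_thought(1, 10))
```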

### Using external functions

In [26]:
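from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

# Note: this rebinds `llm` with a higher temperature (0.9); all later cells use this instance
llm = BedrockChat(model_id=model_id, client=bedrock_runtime, model_kwargs={"temperature": 0.9})

# "ddg-search" is the DuckDuckGo search tool (requires the duckduckgo-search package)
tools = load_tools(["ddg-search"], llm=llm)
# Initialize a zero-shot ReAct agent (initialize_agent and agent.run are deprecated in recent LangChain releases)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
# Test the agent
agent.run("Who is Albert Einstein and what was his age at the time of his death?")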

> Entering new AgentExecutor chain...
To answer this question, I'll need to search for information about Albert Einstein and his death. Let's start with a general search query.

Action: duckduckgo_search
Action Input: Albert Einstein biography and death

Observation: Albert Einstein (born March 14, 1879, Ulm, Württemberg, Germany—died April 18, 1955, Princeton, New Jersey, U.S.) was a German-born physicist who developed the special and general theories of relativity and won the Nobel Prize for Physics in 1921 for his explanation of the photoelectric effect.Einstein is generally considered the most influential physicist of the 20th century. A brief biography of Albert Einstein (March 14, 1879 - April 18, 1955), the scientist whose theories changed the way we think about the universe. ... Einstein's later years and death. Albert Einstein Death. Albert Einstein passed away on April 18, 1955, in Princeton, New Jersey, USA. He died at the age of 76 fr

'Albert Einstein was a German-born physicist who is widely considered the most influential physicist of the 20th century. He developed the special and general theories of relativity and won the Nobel Prize for Physics in 1921 for his explanation of the photoelectric effect. Einstein was born on March 14, 1879, in Ulm, Württemberg, Germany, and died on April 18, 1955, in Princeton, New Jersey, USA. At the time of his death, Albert Einstein was 76 years old. He died from an abdominal aortic aneurysm.'
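Under the hood, a zero-shot ReAct agent runs a loop like the one in the log above: the model writes a thought plus an `Action` / `Action Input` pair, the framework runs that tool and appends the result as an `Observation`, and the loop ends when the model writes a `Final Answer`. The sketch below is a simplified illustration of that loop, not LangChain's `AgentExecutor`; the scripted model replies and the `search` tool are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Patterns for the ReAct text format seen in the agent log
ACTION_RE = re.compile(r"Action:\s*(?P<tool>[^\n]+?)\s*\nAction Input:\s*(?P<input>[^\n]+)")
FINAL_RE = re.compile(r"Final Answer:\s*(?P<answer>.+)", re.DOTALL)


def run_react(model: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
              question: str, max_steps: int = 5) -> str:
    """Simplified ReAct loop: think, act, observe, until a final answer appears."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(scratchpad)  # the model sees everything written so far
        scratchpad += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final["answer"].strip()
        action = ACTION_RE.search(reply)
        if action is None:
            observation = "Invalid format: expected an Action or a Final Answer."
        elif action["tool"] not in tools:
            observation = f"{action['tool']} is not a valid tool."
        else:
            observation = tools[action["tool"]](action["input"].strip())
        scratchpad += f"Observation: {observation}\n"
    raise RuntimeError(f"No final answer after {max_steps} steps")


def scripted_model(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical fake LLM that returns pre-written replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


demo_model = scripted_model([
    "I should look this up.\nAction: search\nAction Input: Albert Einstein death",
    "Final Answer: Albert Einstein died in 1955 at the age of 76.",
])
demo_tools = {"search": lambda q: "Einstein died on April 18, 1955, aged 76."}  # hypothetical tool
print(run_react(demo_model, demo_tools, "How old was Einstein when he died?"))
```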

### Anonymization

In [13]:
from langchain_experimental.data_anonymizer import PresidioAnonymizer, PresidioReversibleAnonymizer


text = """Slim Shady recently lost his wallet.
Inside is some cash and his credit card with the number 4916 0387 9536 0861.
If you would find it, please call at 313-666-7440 or write an email here: real.slim.shady@gmail.com."""

anonymizer = PresidioReversibleAnonymizer()

template = """Rewrite this text into an official, short email:

{anonymized_text}"""
prompt = PromptTemplate.from_template(template)

chain = {"anonymized_text": anonymizer.anonymize} | prompt | llm | (lambda ai_message: anonymizer.deanonymize(ai_message.content))
response = chain.invoke(text)
print(response)

Here's a rewritten version as a short, official email:

Subject: Lost Wallet - Slim Shady

Dear Sir/Madam,

We are writing to inform you that Slim Shady has recently lost his wallet. If found, please contact us using the information below:

Phone: 313-666-7440
Email: real.slim.shady@gmail.com

Thank you for your assistance in this matter.

Sincerely,
[Your Name]
[Your Position/Organization]

Note: I've intentionally omitted specific details about the wallet's contents, including the credit card number, for security reasons. It's generally not advisable to share such sensitive information in emails or public notices.
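The reversible anonymizer swaps detected personal data for substitutes before the text reaches the model and keeps a mapping so the substitutes in the answer can be swapped back. The sketch below illustrates that round trip with plain regular expressions and numbered placeholder tokens. This is a deliberately simpler technique than Presidio, which combines pattern recognizers with NER models, and it does not detect names such as "Slim Shady"; `anonymize_regex` and `deanonymize_regex` are hypothetical helpers.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only; Presidio uses many more recognizers plus NER models
PII_PATTERNS = {
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def anonymize_regex(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each match with a numbered placeholder and remember the original value."""
    mapping: Dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        def to_placeholder(match: "re.Match[str]") -> str:
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(to_placeholder, text)
    return text, mapping


def deanonymize_regex(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back wherever the placeholders appear."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


wallet_note = "Card 4916 0387 9536 0861, call 313-666-7440 or mail real.slim.shady@gmail.com."
masked, mapping = anonymize_regex(wallet_note)
print(masked)
print(deanonymize_regex(masked, mapping))
```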


### How to do RAG: basic and more advanced approaches


In [19]:
import wget
pdf_url =  "https://www.bu.edu/geneva/files/2010/08/Easy_recipes.pdf"
wget.download(pdf_url)

'Easy_recipes.pdf'

In [20]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA


persist_directory = "./storage25"
pdf_path = "Easy_recipes.pdf"

loader = PyMuPDFLoader(pdf_path)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=10)
texts = text_splitter.split_documents(documents)
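`chunk_size` and `chunk_overlap` control how the documents are cut: each chunk holds at most `chunk_size` characters, and neighbouring chunks share `chunk_overlap` characters so that text near a boundary appears in both. `RecursiveCharacterTextSplitter` additionally tries to split on paragraph, line and word boundaries first; the simplified fixed-window splitter below ignores separators and only illustrates the two parameters (`split_fixed` is a hypothetical helper).

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With `chunk_overlap=1`, the last character of each chunk is repeated as the first character of the next one.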

In [21]:
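def normalize_text(str_):
    # Turn non-breaking spaces and tabs into regular spaces (deleting them would glue words together)
    # and drop carriage returns
    return str_.replace("\xa0", " ").replace("\t", " ").replace("\r", "")

for text in texts:
    text.page_content = normalize_text(text.page_content)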

In [None]:
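from langchain_aws import BedrockEmbeddings

# Reuse the Bedrock runtime client created earlier so the same AWS profile and region are used
embeddings = BedrockEmbeddings(client=bedrock_runtime)
vectordb = Chroma.from_documents(documents=texts,
                                 embedding=embeddings,
                                 persist_directory=persist_directory)
# With chromadb >= 0.4 documents are persisted automatically; this call only matters for older versions
vectordb.persist()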

In [18]:
query = "Chilli"
docs = vectordb.similarity_search(query)
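`similarity_search` embeds the query with the same embedding model as the documents and returns the chunks whose vectors are closest to it (four by default; the retriever below asks for `k=5`). To illustrate the ranking step without Bedrock, the sketch swaps in toy bag-of-words vectors and cosine similarity. Real embeddings capture meaning rather than shared words, and the distance metric Chroma uses may differ; `embed`, `cosine` and `top_k` are hypothetical helpers.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (hypothetical stand-in for BedrockEmbeddings)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 4) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the k best."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


recipe_chunks = [
    "chilli con carne with beans and minced beef",
    "banana bread with walnuts",
    "vegetarian chilli with sweet potato",
]
for score, chunk in top_k("chilli", recipe_chunks, k=2):
    print(f"{score:.2f}  {chunk}")
```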

In [19]:
retriever = vectordb.as_retriever(search_kwargs={"k": 5})

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

# Simple interactive loop: type "exit" to stop
while True:
    user_input = input("Enter a query: ")
    if user_input.strip() == "exit":
        break

    try:
        # RetrievalQA reads the question from the "query" key and returns the answer under "result"
        llm_response = qa.invoke({"query": user_input})
        print(llm_response["result"])
    except Exception as err:
        print("Exception occurred. Please try again:", err)

In [20]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
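According to the LangChain API reference, `create_history_aware_retriever` passes the input straight to the retriever when the chat history is empty; otherwise it first asks the LLM (with the prompt above) to rewrite the latest question into a standalone one and retrieves with that. The control flow can be sketched as follows; `rewrite` and `retrieve` are hypothetical stand-ins for the LLM call and the vector-store retriever.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, content)


def history_aware_retrieve(
    user_question: str,
    history: List[Turn],
    rewrite: Callable[[List[Turn], str], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Sketch of the control flow only; rewrite and retrieve are hypothetical callables."""
    if not history:
        # No history: the question is already standalone
        return retrieve(user_question)
    # With history: resolve references such as "it" before searching
    standalone = rewrite(history, user_question)
    return retrieve(standalone)


searched: List[str] = []


def demo_rewrite(history: List[Turn], user_question: str) -> str:
    # Hypothetical canned rewrite standing in for the LLM call
    return "What are common ways of making chilli con carne?"


def demo_retrieve(search_query: str) -> List[str]:
    # Hypothetical retriever that only records what it was asked to search for
    searched.append(search_query)
    return []


history_aware_retrieve("What is chilli con carne?", [], demo_rewrite, demo_retrieve)
history_aware_retrieve(
    "What are common ways of doing it?",
    [("human", "What is chilli con carne?"), ("ai", "A dish of minced beef and beans.")],
    demo_rewrite,
    demo_retrieve,
)
print(searched)
```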

In [21]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
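With the "stuff" strategy, all retrieved chunks are concatenated and substituted into the `{context}` placeholder of the system prompt, so the whole context has to fit in the model's context window. A minimal sketch of that formatting step; `stuff_context` is a hypothetical helper, and the blank-line separator mirrors what we understand to be the default `document_separator` (`"\n\n"`) of `create_stuff_documents_chain`.

```python
from typing import List

SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question.\n\n{context}"
)


def stuff_context(template: str, chunks: List[str], separator: str = "\n\n") -> str:
    """Join every retrieved chunk and drop the result into the {context} slot."""
    return template.format(context=separator.join(chunks))


print(stuff_context(SYSTEM_TEMPLATE, ["Chilli con carne is easy to cook.", "Serve with rice."]))
```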

In [22]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is Chilli con carne?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

second_question = "What are common ways of doing it?"
ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

print(ai_msg_2["answer"])

I apologize, but the given context does not provide information about common ways of preparing chilli con carne. The text only mentions that it's an easy recipe to learn and cook, but doesn't give details about specific preparation methods. To answer this question accurately, I would need more information about cooking techniques for chilli con carne.


### Graph RAG

In [23]:
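import os

# Read the Neo4j connection details from environment variables (e.g. a .env file) instead of hard-coding credentials
URI = os.environ["NEO4J_URI"]
USERNAME = os.environ["NEO4J_USERNAME"]
PASSWORD = os.environ["NEO4J_PASSWORD"]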

In [24]:
import os
from neo4j import GraphDatabase

In [25]:
# Connect to the database
# Keep the driver open (no `with` block) because later cells reuse it; call driver.close() when done
driver = GraphDatabase.driver(URI, auth=(USERNAME, PASSWORD))
driver.verify_connectivity()
print("Connection established.")

Connection established.


In [26]:
def execute_query(driver, query, parameters = None):
    """
    Executes a Cypher query on Neo4j and returns the result.
    """
    try:
        # Open a session with Neo4j
        with driver.session() as session:
            # Run the query within a transaction
            result = session.run(query, parameters)
            # Extract and return results as a list of dictionaries
            return [record.data() for record in result]
    except Exception as e:
        print(f"An error occurred: {e}")
        return []

In [27]:
execute_query(driver, 
    """ LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/Foutse/Neo4j_experiences/main/dataset_movie/small_grouped_data.csv' AS l FIELDTERMINATOR ',' 
CREATE(m:MovieLens{M_movieId:toInteger(l.movieId), M_rating:apoc.convert.fromJsonList(l.rating),M_userId:apoc.convert.fromJsonList(l.userId), M_timestamp:l.timestamp,M_title:l.title,M_summary:l.summary,M_year:toInteger(l.movie_year),M_genres:apoc.convert.fromJsonList(l.genres), M_Avg_rating:toFloat(l.Avg_rating),M_unique_tags:apoc.convert.fromJsonList(l.unique_tags), M_unique_Uid_tags:apoc.convert.fromJsonList(l.unique_Uid_tags),M_round_ratings:apoc.convert.fromJsonList(l.round_ratings), M_rating_vector:apoc.convert.fromJsonList(l.rating_vector),M_round_ratingsup:apoc.convert.fromJsonList(l.round_ratingsup), M_round_ratingsdown:apoc.convert.fromJsonList(l.round_ratingsdown),M_rating_vectorup:apoc.convert.fromJsonList(l.rating_vectorup), M_rating_vectordown:apoc.convert.fromJsonList(l.rating_vectordown)}) 
RETURN count(m)
""")


[{'count(m)': 9}]

In [28]:
import pandas as pd
pd.read_csv("https://raw.githubusercontent.com/Foutse/Neo4j_experiences/main/dataset_movie/small_grouped_data.csv")

| Unnamed: 0 | movieId | rating | userId | timestamp | title | genres | Avg_rating | unique_tags | unique_Uid_tags | round_ratings | rating_vector | round_ratingsup | round_ratingsdown | rating_vectorup | rating_vectordown | summary | movie_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 125974 | [3.5, 4.0] | [89, 596] | [datetime.date(2018, 3, 7), datetime.date(2018... | X-Men: Apocalypse (2016) | ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'] | 3.75 | [] | [] | [4, 4] | [0.0, 0.0, 0.0, 1.0, 0.0] | [4, 4] | [4.0, 4.0] | [0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0] | Summary not found | 2016.0 |
| 1 | 138610 | [1.5] | [610] | [datetime.date(2017, 5, 3)] | Batman v Superman: Dawn of Justice (2016) | ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'] | 1.5 | [] | [] | [2] | [0.0, 1.0, 0.0, 0.0, 0.0] | [2] | [2.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | 20 years after a horrific accident during a sm... | 2016.0 |
| 2 | 153070 | [3.5] | [567] | [datetime.date(2018, 5, 2)] | Gods of Egypt (2016) | ['Adventure', 'Fantasy'] | 3.5 | ['nightmare'] | [567] | [4] | [0.0, 0.0, 0.0, 1.0, 0.0] | [4] | [4.0] | [0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0] | Summary not found | 2016.0 |
| 3 | 167746 | [3.5, 4.0, 4.0, 5.0, 1.0, 3.5, 5.0] | [111, 125, 249, 380, 517, 567, 596] | [datetime.date(2018, 1, 20), datetime.date(201... | Split (2017) | ['Drama', 'Horror', 'Thriller'] | 3.714286 | ['funny', 'heartwarming'] | [567] | [4, 4, 4, 5, 1, 4, 5] | [0.14285714285714285, 0.0, 0.0, 0.571428571428... | [4, 4, 4, 5, 1, 4, 5] | [4.0, 4.0, 4.0, 5.0, 1.0, 4.0, 5.0] | [0.14285714285714285, 0.0, 0.0, 0.571428571428... | [0.14285714285714285, 0.0, 0.0, 0.571428571428... | A cooler-than-ever Bruce Wayne must deal with ... | 2017.0 |
| 4 | 170551 | [2.5] | [599] | [datetime.date(2017, 6, 26)] | Fifty Shades Darker (2017) | ['Drama', 'Romance'] | 2.5 | [] | [] | [3] | [0.0, 0.0, 1.0, 0.0, 0.0] | [3] | [2.0] | [0.0, 0.0, 1.0, 0.0, 0.0] | [0.0, 1.0, 0.0, 0.0, 0.0] | Summary not found | 2017.0 |
| 5 | 170945 | [3.5] | [567] | [datetime.date(2018, 5, 2)] | Dave Chappelle: The Age of Spin (2017) | ['Comedy'] | 3.5 | ['Suspenseful', 'paranoia'] | [567] | [4] | [0.0, 0.0, 0.0, 1.0, 0.0] | [4] | [4.0] | [0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0] | Secure within a desolate home as an unnatural ... | 2017.0 |
| 6 | 184471 | [3.5, 0.5, 3.0, 3.0] | [62, 153, 380, 514] | [datetime.date(2018, 6, 3), datetime.date(2018... | The Commuter (2018) | ['Crime', 'Drama', 'Mystery', 'Thriller'] | 2.5 | ['Alicia Vikander', 'adventure', 'video game a... | [62] | [4, 1, 3, 3] | [0.25, 0.0, 0.5, 0.25, 0.0] | [4, 1, 3, 3] | [4.0, 0.0, 3.0, 3.0] | [0.25, 0.0, 0.5, 0.25, 0.0] | [0.0, 0.0, 0.5, 0.25, 0.0] | Lara Croft, the fiercely independent daughter ... | 2018.0 |
| 7 | 187595 | [4.0, 4.0, 3.5, 3.0, 5.0] | [62, 380, 414, 514, 586] | [datetime.date(2018, 6, 14), datetime.date(201... | Tomb Raider (2018) | ['Action', 'Adventure', 'Fantasy'] | 3.9 | ['Emilia Clarke', 'star wars'] | [62] | [4, 4, 4, 3, 5] | [0.0, 0.0, 0.2, 0.6, 0.2] | [4, 4, 4, 3, 5] | [4.0, 4.0, 4.0, 3.0, 5.0] | [0.0, 0.0, 0.2, 0.6, 0.2] | [0.0, 0.0, 0.2, 0.6, 0.2] | Board the Millennium Falcon and journey to a g... | 2018.0 |
| 8 | 193565 | [3.5] | [184] | [datetime.date(2018, 9, 16)] | Dogman (2018) | ['Crime', 'Drama'] | 3.5 | ['anime', 'comedy', 'gintama', 'remaster'] | [184] | [4] | [0.0, 0.0, 0.0, 1.0, 0.0] | [4] | [4.0] | [0.0, 0.0, 0.0, 1.0, 0.0] | [0.0, 0.0, 0.0, 1.0, 0.0] | The silver-haired samurai Sakata Gintoki inves... | 2018.0 |


In [29]:
from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph

In [30]:
graph = Neo4jGraph(
    url=URI, username=USERNAME, password=PASSWORD
)
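chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    verbose=False,
    return_intermediate_steps=True,  # expose the generated Cypher and the rows it returned
    allow_dangerous_requests=True,  # acknowledge that LLM-generated queries run against the database
)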
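Conceptually, `GraphCypherQAChain` answers a question in three steps: the LLM writes a Cypher query from the question and the graph schema, Neo4j runs that query, and the LLM turns the returned rows into an answer. This is why `return_intermediate_steps=True` gives us both a `query` and a `context` entry. The sketch below is a simplified, dependency-free model of that loop, with stub functions standing in for the LLM and the database. `graph_qa` and the stubs are hypothetical; the real chain adds prompt templates and further options on top.

```python
from typing import Callable


def graph_qa(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], list],
    answer: Callable[[str, list], str],
) -> dict:
    """Simplified text-to-Cypher QA loop (hypothetical helper, not LangChain's code)."""
    # Step 1: the LLM turns the question plus the graph schema into a Cypher query
    cypher = generate_cypher(question, schema)
    # Step 2: the database executes the generated query and returns rows
    context = run_query(cypher)
    # Step 3: a second LLM call phrases an answer from the returned rows
    result = answer(question, context)
    # Mirror the shape of the chain output used in this notebook
    return {
        "query": question,
        "result": result,
        "intermediate_steps": [{"query": cypher}, {"context": context}],
    }


# Stubs stand in for the LLM and Neo4j so the flow can run offline
demo = graph_qa(
    "Who founded Dolby Laboratories?",
    schema="(:Person)-[:FOUNDED]->(:Company)",
    generate_cypher=lambda q, s: "MATCH (p:Person)-[:FOUNDED]->(c:Company) RETURN p.id AS founder",
    run_query=lambda cypher: [{"founder": "Ray Dolby"}],
    answer=lambda q, rows: f"{rows[0]['founder']} founded Dolby Laboratories.",
)
print(demo["result"])
```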


In [31]:
res = chain.invoke("what is the highest rated movie of all time? Provide me with the description of the movie.")

In [32]:
print(res['intermediate_steps'][0]['query'])

MATCH (m:MovieLens)
WHERE m.M_Avg_rating IS NOT NULL
WITH m ORDER BY m.M_Avg_rating DESC LIMIT 1
RETURN m.M_title AS Title, m.M_Avg_rating AS Rating, m.M_summary AS Description
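To sanity-check the generated query, we can reproduce its logic in plain Python on the sample rows printed earlier. This assumes the rating column in that output (for example 3.714286 for Split (2017)) is the value loaded into the graph as `M_Avg_rating`, and it only covers the rows visible above, not the full dataset.

```python
# Title and average rating for the rows visible in the DataFrame output above
rows = [
    {"M_title": "Split (2017)", "M_Avg_rating": 3.714286},
    {"M_title": "Fifty Shades Darker (2017)", "M_Avg_rating": 2.5},
    {"M_title": "Dave Chappelle: The Age of Spin (2017)", "M_Avg_rating": 3.5},
    {"M_title": "The Commuter (2018)", "M_Avg_rating": 2.5},
    {"M_title": "Tomb Raider (2018)", "M_Avg_rating": 3.9},
    {"M_title": "Dogman (2018)", "M_Avg_rating": 3.5},
]

# WHERE m.M_Avg_rating IS NOT NULL
rated = [r for r in rows if r["M_Avg_rating"] is not None]
# WITH m ORDER BY m.M_Avg_rating DESC LIMIT 1
top = sorted(rated, key=lambda r: r["M_Avg_rating"], reverse=True)[:1]
print(top)
```

Within these rows, Tomb Raider (2018) is the top-rated title at 3.9, matching the context the chain retrieved. The same row carries a Star Wars summary in the source data, so the mismatched description the model points out comes from the data, not from the query.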


In [33]:
print(res['intermediate_steps'][1])

{'context': [{'Title': 'Tomb Raider (2018)', 'Rating': 3.9, 'Description': "Board the Millennium Falcon and journey to a galaxy far, far away in an epic action-adventure that will set the course of one of the Star Wars saga's most unlikely heroes."}]}


In [34]:
print(res['result'])

Based on the information provided, I can't determine the highest rated movie of all time. However, I can tell you about one movie that has been rated. "Tomb Raider" from 2018 has a rating of 3.9 out of 5. Interestingly, the description provided seems to be for a different movie, possibly a Star Wars film. It reads: "Board the Millennium Falcon and journey to a galaxy far, far away in an epic action-adventure that will set the course of one of the Star Wars saga's most unlikely heroes."


In [35]:
from langchain_experimental.graph_transformers import LLMGraphTransformer

# Reuse the same `llm` to extract nodes and relationships from raw text
llm_transformer = LLMGraphTransformer(llm=llm)

In [36]:
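import requests
from bs4 import BeautifulSoup

def scrape_dolby_wiki(url="https://en.wikipedia.org/wiki/Dolby"):
    """Download a Wikipedia page and return its text content."""
    response = requests.get(url, timeout=30)

    # Fail loudly: returning None here would break Document(page_content=text) later
    if response.status_code != 200:
        raise RuntimeError(f"Failed to retrieve page with status code {response.status_code}")

    soup = BeautifulSoup(response.content, "html.parser")

    # Extract all text on the page and shrink double line breaks to single ones
    # (replacing them with "" would glue the last word of one block to the next)
    text = soup.text.strip().replace("\n\n", "\n")
    return text

text = scrape_dolby_wiki()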
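If you prefer to avoid the extra dependency, Python's standard-library `html.parser` can do a similar job. The sketch below is a swapped-in alternative, not what the cell above does: it collects text fragments, explicitly skips the contents of `<script>` and `<style>` tags, and joins the fragments one per line. `VisibleTextParser` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect non-empty text fragments, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # One fragment per line, no blank lines
    return "\n".join(parser.parts)


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Dolby</h1><script>var x = 1;</script>"
    "<p>Ray Dolby founded Dolby Laboratories.</p></body></html>"
)
print(html_to_text(sample))
```

With a live page this would be used as `html_to_text(requests.get(url, timeout=30).text)`.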


In [37]:
from langchain_core.documents import Document

documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")

Nodes:[Node(id='Dolby Vision', type='Technology', properties={}), Node(id='Dolby Theatre', type='Location', properties={}), Node(id='Ray Dolby', type='Person', properties={}), Node(id='Dolby Surround', type='Technology', properties={}), Node(id='Dolby Noise Reduction system', type='Technology', properties={}), Node(id='ATSC standards', type='Standard', properties={}), Node(id='Dolby TrueHD', type='Technology', properties={}), Node(id='Dolby Atmos', type='Technology', properties={}), Node(id='Dolby Laboratories, Inc.', type='Company', properties={}), Node(id='Dolby Digital', type='Technology', properties={})]
Relationships:[Relationship(source=Node(id='Ray Dolby', type='Person', properties={}), target=Node(id='Dolby Laboratories, Inc.', type='Company', properties={}), type='FOUNDED', properties={}), Relationship(source=Node(id='Dolby Laboratories, Inc.', type='Company', properties={}), target=Node(id='Dolby Noise Reduction system', type='Technology', properties={}), type='CREATED', prop
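The cell above sends the whole article to the model as a single `Document`, and the extraction returns ten nodes for the entire page. For long pages, a common approach is to split the text into overlapping chunks and convert each chunk separately, so every LLM call stays well within the model's context window. LangChain ships text splitters for this (for example `RecursiveCharacterTextSplitter`); the stdlib-only sketch below only shows the idea with fixed-size character windows. `split_text` is a hypothetical helper, the sizes are arbitrary assumptions, and cutting on raw characters can split words, which the overlap partly compensates for.

```python
def split_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list:
    """Split text into overlapping fixed-size character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each window starts `step` characters after the previous one
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

The chunks could then replace the single document, e.g. `documents = [Document(page_content=c) for c in split_text(text)]`, before calling `llm_transformer.convert_to_graph_documents(documents)`, which returns one graph document per input document.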

In [38]:
for node in graph_documents[0].nodes:
    print(node)

id='Dolby Vision' type='Technology' properties={}
id='Dolby Theatre' type='Location' properties={}
id='Ray Dolby' type='Person' properties={}
id='Dolby Surround' type='Technology' properties={}
id='Dolby Noise Reduction system' type='Technology' properties={}
id='ATSC standards' type='Standard' properties={}
id='Dolby TrueHD' type='Technology' properties={}
id='Dolby Atmos' type='Technology' properties={}
id='Dolby Laboratories, Inc.' type='Company' properties={}
id='Dolby Digital' type='Technology' properties={}
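Each extracted node carries an `id` and a `type`, and each relationship links a source node to a target node with a type such as `FOUNDED`. To make it concrete how such a structure could be written to Neo4j, here is a simplified illustration that turns look-alike dataclasses into Cypher `MERGE` statements. `SimpleNode`, `SimpleRelationship`, `quote` and `to_cypher` are hypothetical names; this is not how `Neo4jGraph.add_graph_documents` is implemented. Real code should pass values as query parameters instead of formatting strings, and labels containing spaces would need backticks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleNode:
    id: str
    type: str


@dataclass(frozen=True)
class SimpleRelationship:
    source: SimpleNode
    target: SimpleNode
    type: str


def quote(value: str) -> str:
    # Escape backslashes and double quotes for a Cypher string literal
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_cypher(nodes, relationships) -> list:
    # MERGE creates a node only if one with the same label and id does not exist yet
    statements = [f"MERGE (:{n.type} {{id: {quote(n.id)}}})" for n in nodes]
    for r in relationships:
        statements.append(
            f"MATCH (s:{r.source.type} {{id: {quote(r.source.id)}}}), "
            f"(t:{r.target.type} {{id: {quote(r.target.id)}}}) "
            f"MERGE (s)-[:{r.type}]->(t)"
        )
    return statements


ray = SimpleNode("Ray Dolby", "Person")
lab = SimpleNode("Dolby Laboratories, Inc.", "Company")
stmts = to_cypher([ray, lab], [SimpleRelationship(ray, lab, "FOUNDED")])
for stmt in stmts:
    print(stmt)
```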


In [39]:
graph.add_graph_documents(graph_documents)

In [40]:
chain = GraphCypherQAChain.from_llm(llm, graph=graph, verbose=False, return_intermediate_steps=True, allow_dangerous_requests=True)

In [41]:
chain.invoke("Who is Ray Dolby?")

{'query': 'Who is Ray Dolby?',
 'result': 'Ray Dolby is the founder of Dolby Laboratories, Inc. He established this renowned company, which has become a significant player in audio technology and innovation.',
 'intermediate_steps': [{'query': 'MATCH (p:Person {id: "Ray Dolby"})-[:FOUNDED]->(o:Organization)\nRETURN p.id AS Person, COLLECT(o.id) AS FoundedOrganizations'},
  {'context': [{'Person': 'Ray Dolby',
     'FoundedOrganizations': ['Dolby Laboratories, Inc.']}]}]}
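When debugging a chain like this, it helps to see the generated Cypher and the rows it returned next to the final answer, since the answer can only be as good as those rows. The small formatter below reads the dictionary structure shown above (step 0 holds `query`, step 1 holds `context`). `explain_chain_result` is a hypothetical helper.

```python
def explain_chain_result(res: dict) -> str:
    """Format question, generated Cypher, row count and answer (hypothetical helper)."""
    steps = res.get("intermediate_steps", [])
    # With return_intermediate_steps=True, step 0 has the Cypher and step 1 the returned rows
    cypher = steps[0]["query"] if len(steps) > 0 else "(none)"
    context = steps[1]["context"] if len(steps) > 1 else []
    lines = [
        f"Question: {res['query']}",
        f"Cypher:\n{cypher}",
        f"Rows returned: {len(context)}",
        f"Answer: {res['result']}",
    ]
    return "\n".join(lines)


# The dictionary returned by chain.invoke("Who is Ray Dolby?") above
ray_dolby_res = {
    "query": "Who is Ray Dolby?",
    "result": "Ray Dolby is the founder of Dolby Laboratories, Inc. He established this renowned company, which has become a significant player in audio technology and innovation.",
    "intermediate_steps": [
        {"query": 'MATCH (p:Person {id: "Ray Dolby"})-[:FOUNDED]->(o:Organization)\nRETURN p.id AS Person, COLLECT(o.id) AS FoundedOrganizations'},
        {"context": [{"Person": "Ray Dolby", "FoundedOrganizations": ["Dolby Laboratories, Inc."]}]},
    ],
}
print(explain_chain_result(ray_dolby_res))
```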

### Additional resources
* [Multimodal RAG example](https://colab.research.google.com/drive/15_IkDc_zV8VFqe6LXwNHkkflgc9RKb5k?usp=sharing)