# **Building a Chatbot with an LLM**

In this project, I build a chatbot powered by a large language model (LLM) that can answer questions by pulling information from sources like webpages and CSV files. I used LangChain to set up the chatbot, allowing it to read through documents, break them down into smaller chunks, and use those pieces to answer questions.


## Setting up the API

In [None]:
# Install OpenAI library
!pip install openai



In [None]:
import openai
import config  # Local config.py module that stores the API key as `api_key`
from openai import OpenAI

In [None]:
# Initialize OpenAI client with your own API key
client = OpenAI(api_key=config.api_key)

## Generating Text

In [None]:
def generate_text(prompt):
    """
    Generate a response from GPT model using the provided prompt.

    Parameters:
    prompt (list): A list of dictionaries with 'role' and 'content' for chat completion.

    Returns:
    str: The generated text response.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0, # Low temperature for more deterministic responses
        messages=prompt
        )
    return response.choices[0].message.content.strip()

In [None]:
prompt = [
    {"role": "user", "content": "Who won the world series in 2020?"},
  ]

In [None]:
generate_text(prompt)


"The Los Angeles Dodgers won the World Series in 2020. They defeated the Tampa Bay Rays, clinching the championship in six games. This victory marked the Dodgers' first World Series title since 1988."

## Summarising Text

In [None]:
def text_summarizer(prompt):
    """
    Summarizes text by extracting keywords from a given block of text.

    Parameters:
    prompt (str): The input text to summarize.

    Returns:
    str: The list of extracted keywords.
    """
    response = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[
        {
          "role": "system",
          "content": "You will be provided with a block of text, and your task is to extract a list of keywords from it."
        },
        {
          "role": "user",
          "content": "A flying saucer seen by a guest house, a 7ft alien-like figure coming out of a hedge and a \"cigar-shaped\" UFO near a school yard.\n\nThese are just some of the 450 reported extraterrestrial encounters from one of the UK's largest mass sightings in a remote Welsh village.\n\nThe village of Broad Haven has since been described as the \"Bermuda Triangle\" of mysterious craft sightings and sightings of strange beings.\n\nResidents who reported these encounters across a single year in the late seventies have now told their story to the new Netflix documentary series 'Encounters', made by Steven Spielberg's production company.\n\nIt all happened back in 1977, when the Cold War was at its height and Star Wars and Close Encounters of the Third Kind - Spielberg's first science fiction blockbuster - dominated the box office."
        },
        {
          "role": "assistant",
          "content": "flying saucer, guest house, 7ft alien-like figure, hedge, cigar-shaped UFO, school yard, extraterrestrial encounters, UK, mass sightings, remote Welsh village, Broad Haven, Bermuda Triangle, mysterious craft sightings, strange beings, residents, single year, late seventies, Netflix documentary series, Steven Spielberg, production company, 1977, Cold War, Star Wars, Close Encounters of the Third Kind, science fiction blockbuster, box office."
        },
        {
          "role": "user",
          "content": "Each April, in the village of Maeliya in northwest Sri Lanka, Pinchal Weldurelage Siriwardene gathers his community under the shade of a large banyan tree. The tree overlooks a human-made body of water called a wewa – meaning reservoir or \"tank\" in Sinhala. The wewa stretches out besides the village's rice paddies for 175-acres (708,200 sq m) and is filled with the rainwater of preceding months.    \n\nSiriwardene, the 76-year-old secretary of the village's agrarian committee, has a tightly-guarded ritual to perform. By boiling coconut milk on an open hearth beside the tank, he will seek blessings for a prosperous harvest from the deities residing in the tree. \"It's only after that we open the sluice gate to water the rice fields,\" he told me when I visited on a scorching mid-April afternoon.\n\nBy releasing water into irrigation canals below, the tank supports the rice crop during the dry months before the rains arrive. For nearly two millennia, lake-like water bodies such as this have helped generations of farmers cultivate their fields. An old Sinhala phrase, \"wewai dagabai gamai pansalai\", even reflects the technology's centrality to village life; meaning \"tank, pagoda, village and temple\"."
        },
        {
          "role": "assistant",
          "content": "April, Maeliya, northwest Sri Lanka, Pinchal Weldurelage Siriwardene, banyan tree, wewa, reservoir, tank, Sinhala, rice paddies, 175-acres, 708,200 sq m, rainwater, agrarian committee, coconut milk, open hearth, blessings, prosperous harvest, deities, sluice gate, rice fields, irrigation canals, dry months, rains, lake-like water bodies, farmers, cultivate, Sinhala phrase, technology, village life, pagoda, temple."
        },
        {
          "role": "user",
          "content": prompt
        }
      ],
      temperature=0.5,
      max_tokens=256
    )
    return response.choices[0].message.content.strip()

In [None]:
prompt = "Master Reef Guide Kirsty Whitman didn't need to tell me twice. Peering down through my snorkel mask in the direction of her pointed finger, I spotted a huge male manta ray trailing a female in perfect sync – an effort to impress a potential mate, exactly as Whitman had described during her animated presentation the previous evening. Having some knowledge of what was unfolding before my eyes on our snorkelling safari made the encounter even more magical as I kicked against the current to admire this intimate undersea ballet for a few precious seconds more."
print(prompt)

Master Reef Guide Kirsty Whitman didn't need to tell me twice. Peering down through my snorkel mask in the direction of her pointed finger, I spotted a huge male manta ray trailing a female in perfect sync – an effort to impress a potential mate, exactly as Whitman had described during her animated presentation the previous evening. Having some knowledge of what was unfolding before my eyes on our snorkelling safari made the encounter even more magical as I kicked against the current to admire this intimate undersea ballet for a few precious seconds more.


In [None]:
text_summarizer(prompt)

'Master Reef Guide, Kirsty Whitman, snorkel mask, manta ray, female, potential mate, animated presentation, snorkelling safari, encounter, magical, current, intimate, undersea ballet, precious seconds.'
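The function above relies on few-shot prompting: a system instruction, a couple of worked user/assistant pairs, and then the real input. As a minimal sketch of that pattern, the message list could be assembled by a small helper. The name `build_few_shot_messages` and the shortened example pairs are my own assumptions for illustration, not part of the OpenAI library.

```python
def build_few_shot_messages(system_prompt, examples, user_text):
    """Build a chat `messages` list: system prompt, example pairs, then the real input."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in examples:
        # Each example is shown to the model as a finished user/assistant exchange
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The text we actually want processed always goes last
    messages.append({"role": "user", "content": user_text})
    return messages


# Shortened, made-up example pairs purely for illustration
examples = [
    ("A flying saucer was seen by a guest house.", "flying saucer, guest house"),
    ("The tank supports the rice crop.", "tank, rice crop"),
]
messages = build_few_shot_messages(
    "You will be provided with a block of text, and your task is to extract a list of keywords from it.",
    examples,
    "A manta ray trailed a female in perfect sync.",
)
print([m["role"] for m in messages])
```

A list built this way could be passed as the `messages` argument of `client.chat.completions.create`, which keeps the long example texts out of the function body.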

## Poetic Chatbot

In [None]:
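def poetic_chatbot(prompt):
    """
    Answer a question in verse, using two few-shot examples to set the poetic style.

    Parameters:
    prompt (str): The user's question or topic.

    Returns:
    str: The poetic response.
    """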
    response = client.chat.completions.create(
        model = "gpt-4o-mini",
        messages = [
            {
                "role": "system",
                "content": "You are a poetic chatbot."
            },
            {
                "role": "user",
                "content": "When was Google founded?"
            },
            {
                "role": "assistant",
                "content": "In the late '90s, a spark did ignite, Google emerged, a radiant light. By Larry and Sergey, in '98, it was born, a search engine new, on the web it was sworn."
            },
            {
                "role": "user",
                "content": "Which country has the youngest president?"
            },
            {
                "role": "assistant",
                "content": "Ah, the pursuit of youth in politics, a theme we explore. In Austria, Sebastian Kurz did implore, at the age of 31, his journey did begin, leading with vigor, in a world filled with din."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature = 1,
        max_tokens=256
    )
    return response.choices[0].message.content.strip()

In [None]:
prompt = "Data Analytics at Langara College"
poetic_chatbot(prompt)

"At Langara, where knowledge takes its flight,  \nData Analytics shines, a beacon of light.  \nCourses designed to sharpen your mind,  \nIn numbers and trends, insights you'll find.  \n\nWith skills in demand, you'll learn to decode,  \nPatterns and stories in data bestowed.  \nFrom programming to statistics, a journey profound,  \nIn this college of learning, bright futures abound."

In [None]:
prompt = "What is the most popular course at Langara College?"
poetic_chatbot(prompt)

'At Langara College, where knowledge takes flight, The most popular course shines bright, In the realm of arts and sciences combined, The Health and Human Services often aligned. Yet varied are paths, as students explore, Their passions unfurling, seeking knowledge galore.'

## **LangChain**
 *What is LangChain?*

LangChain is an open-source framework for connecting large language models (LLMs) to external data sources, like documents or databases, in a pipeline (a "chain"). The main goal is to combine the power of LLMs with your own data for more interactive and contextual applications.
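To make the pipeline idea concrete, here is a toy sketch of the retrieval step in plain Python. It is not LangChain's API: the bag-of-words `embed` function stands in for a real embedding model, and the three chunks are invented for illustration. The point is only to show how the chunk most similar to a question gets selected before the LLM sees it.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use a learned model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    question_vector = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(question_vector, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Invented chunks standing in for pieces of a loaded document
chunks = [
    "the program covers data visualization and statistics",
    "tuition fees are listed on the admissions page",
    "the campus is located in vancouver",
]
top_chunks = retrieve("is data visualization taught", chunks)
print(top_chunks)
```

In a real chain, the retrieved chunks are placed into the prompt as context so the model can answer from them rather than from memory alone.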

In [None]:
# Community-maintained integrations for LangChain (document loaders, vector stores such as FAISS)
!pip install langchain-community
# Tokenizer for OpenAI models, used to count and split tokens
!pip install tiktoken
# Integration between LangChain and OpenAI’s language and embedding models
!pip install langchain-openai
# Library for efficient similarity search over embeddings
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.7 kB)
Downloading faiss_cpu-1.8.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.8.0.post1


## **Webpage Querying with LangChain**

In [None]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
# This OpenAI is LangChain's LLM wrapper; it shadows the OpenAI client class imported from `openai` earlier
from langchain_openai import OpenAI, OpenAIEmbeddings

In [None]:
# Loading a webpage and querying it
url = "https://langara.ca/programs-and-courses/programs/data-analytics/index.html"

In [None]:
loader = WebBaseLoader(url) # Set up a loader for the webpage
raw_documents = loader.load() # Fetch the page and extract its text
text_splitter = RecursiveCharacterTextSplitter() # Create a text splitter with default settings
documents = text_splitter.split_documents(raw_documents) # Split the text into smaller chunks
embeddings = OpenAIEmbeddings(openai_api_key = config.api_key) # Embedding model that converts text to vectors
vectorstore = FAISS.from_documents(documents, embeddings)  # Embed the chunks and index them in FAISS for fast retrieval
memory = ConversationBufferMemory(memory_key = "chat_history", return_messages=True) # Store conversation history

# Define a conversational retrieval chain
qa = ConversationalRetrievalChain.from_llm(OpenAI(openai_api_key=config.api_key, temperature=0), vectorstore.as_retriever(), memory=memory)

In [None]:
query = "Is visualization taught at Langara?"
result = qa({"question": query})
result["answer"]

' The Data Analytics program at Langara offers courses on data visualization, which is the process of presenting data in a visual format such as charts, graphs, and maps. These courses aim to teach students how to effectively communicate their findings and insights from data analysis through visual representations. Some of the topics covered in these courses may include data visualization principles, tools and techniques, and best practices for creating impactful visualizations. Students will also have the opportunity to work with real-life data sets and develop their skills in using industry-standard software applications for data visualization.'
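The splitting step is easy to overlook, so here is a rough sketch of what chunking with overlap means. This is a simplified fixed-window version written for illustration: `RecursiveCharacterTextSplitter` is smarter and tries to break on paragraph, line, and word boundaries first. The function name `split_with_overlap` is my own.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which tends to help retrieval. The LangChain splitter accepts `chunk_size` and `chunk_overlap` arguments if the defaults produce chunks that are too large for a given page.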

## **Loading a CSV File and Asking Questions**

In [None]:
from langchain_community.document_loaders.csv_loader import CSVLoader

In [None]:
# Loading a CSV file
loader = CSVLoader(file_path="HollywoodsMostProfitableStories.csv")
raw_documents = loader.load()  # Load the CSV, one document per row
text_splitter = RecursiveCharacterTextSplitter()  # Create a text splitter with default settings
documents = text_splitter.split_documents(raw_documents)  # Split any long rows into chunks
embeddings = OpenAIEmbeddings(openai_api_key=config.api_key)  # Embedding model that converts text to vectors
vectorstore = FAISS.from_documents(documents, embeddings)  # Embed the chunks and index them in FAISS
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)  # Store chat history

# Define the conversational retrieval chain
qa = ConversationalRetrievalChain.from_llm(
    OpenAI(openai_api_key=config.api_key, temperature=0.2),
    vectorstore.as_retriever(),  # Use FAISS as a retriever
    memory=memory  # Keep track of the conversation history
)

In [None]:
query = "What studios are mentioned in the dataset?"
result = qa({"question": query})
result["answer"]

' Paramount, Summit, Disney'
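It helps to picture what the chatbot actually searches over here. As far as I understand it, `CSVLoader` turns each row into its own small text document of `column: value` lines. The sketch below approximates that with the standard `csv` module. The column names and the two rows are hypothetical stand-ins, not taken from the real file.

```python
import csv
import io

# Hypothetical two-row sample; the real file's columns and values may differ
sample_csv = io.StringIO(
    "Film,Studio,Year\n"
    "Example Film A,Disney,2010\n"
    "Example Film B,Paramount,2011\n"
)


def rows_to_documents(file_obj):
    """Turn each CSV row into a dict with 'column: value' text and its row number."""
    documents = []
    for row_number, row in enumerate(csv.DictReader(file_obj)):
        text = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append({"page_content": text, "metadata": {"row": row_number}})
    return documents


docs = rows_to_documents(sample_csv)
print(docs[0]["page_content"])
```

Because the retriever passes only a handful of the most similar rows to the model, a dataset-wide question like the one above may get an answer that lists only some of the studios in the file.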

### **Summary**

I wanted to create a smart chatbot that could answer questions by pulling information from different sources, like webpages and CSV files. To make this happen, I used a tool called LangChain, which helps the chatbot read through documents, break them down into smaller chunks, and use those chunks to answer specific questions.  

For example, I had the chatbot look at the Langara Data Analytics program webpage and answer questions about the courses they offer. I also used a movie dataset to ask the chatbot about which studios were mentioned. It was really exciting to see the chatbot pull relevant information and give accurate answers.

### **I Learned...**

- How to teach the chatbot to read and make sense of large pieces of text from webpages and documents.
- Why it’s important to organize the information so the chatbot can easily find what it needs to answer your questions.
- How to help the chatbot remember past questions and answers, so it can keep the conversation feeling natural and smooth.
- How to bring together different tools to create a system that can give smart, data-driven answers to specific questions.
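
The memory point can be sketched without LangChain at all. This is a simplified stand-in for what a conversation buffer does, assuming the same `role`/`content` message format used earlier in the notebook. The class name `ChatHistory` and the follow-up question are hypothetical.

```python
class ChatHistory:
    """Keep past questions and answers so each new prompt includes the conversation so far."""

    def __init__(self):
        self.messages = []

    def add_exchange(self, question, answer):
        """Record one completed question/answer pair."""
        self.messages.append({"role": "user", "content": question})
        self.messages.append({"role": "assistant", "content": answer})

    def build_prompt(self, new_question):
        """Return the stored history plus the new question, without modifying the history."""
        return self.messages + [{"role": "user", "content": new_question}]


history = ChatHistory()
history.add_exchange(
    "Is visualization taught at Langara?",
    "Yes, the Data Analytics program offers courses on data visualization.",
)
prompt = history.build_prompt("Which tools do those courses use?")
print(len(prompt))
```

Since the follow-up question arrives together with the earlier exchange, the model can work out what "those courses" refers to. A list like `prompt` has the same shape that `generate_text` expects.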

### **How the LLM Applies**

The chatbot uses a language model (LLM) to understand your questions and find the best answers. It can read and process information from webpages or CSV files, then pull out the most relevant details to give you an accurate response. The LLM helps the chatbot pick out key information from the data, so it can respond in a helpful way. This project shows how smart chatbots can turn data into useful answers, making them perfect for things like customer support, research, and a lot more.