# Part 1: Basics of GPT and LLM

**-----------------------------------------------------------------------------------------------------------------**

*In this lecture we are going to explore:*

1. What are GPT models?
2. How do GPT models work?
3. Python implementation of LangChain.

**-----------------------------------------------------------------------------------------------------------------**

![1 GPT 1.png](attachment:02b91b3c-4bc1-41f1-ad2e-5acaf43bcd6e.png)

* A brief illustration of the technical evolution of GPT-series models.

* Generative pretrained transformers (GPTs) are a family of large language models (LLMs) based on a transformer deep learning architecture.
* Developed by OpenAI, these foundation models power ChatGPT and other generative AI applications capable of simulating human-created output.

* **Why is GPT important?**
* GPT models have accelerated generative AI development thanks to their transformer architecture, a type of neural network introduced in 2017 in the Google Brain paper *Attention Is All You Need*.
* Transformer models including GPT and BERT have powered many notable developments in generative AI since then, with OpenAI’s ChatGPT chatbot taking center stage.

![1 GPT 2.png](attachment:c2bdffd5-04b7-4c17-a573-ae8c83cd5fdf.png)

* An illustration of a typical data preprocessing pipeline for pre-training large language models.

* Large language models, also known as LLMs, are very large deep learning models that are pre-trained on vast amounts of data.
* The underlying transformer, as originally proposed, is a neural network that consists of an encoder and a decoder with self-attention capabilities; GPT models use only the decoder stack.
* The encoder and decoder extract meanings from a sequence of text and understand the relationships between words and phrases in it.
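
As a rough illustration of the self-attention mentioned above, the sketch below computes scaled dot-product attention for a toy sequence in plain Python. The vectors are made-up numbers and the function names (`softmax`, `self_attention`) are hypothetical; real transformers add learned projection matrices and multiple attention heads, which are left out here.

```python
import math

def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights): one output vector and one weight row per query.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(row, values))
                        for i in range(len(values[0]))])
    return outputs, weights

# Toy "token" vectors (made-up numbers). Queries, keys and values are all the
# same here because learned projections are left out of this sketch.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence, which is the sense in which the model relates words and phrases to each other.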

* **Three common learning approaches exist:**

1. Zero-shot learning: Base LLMs can respond to a broad range of requests without explicit training, often through prompts, although answer accuracy varies.
2. Few-shot learning: By providing a few relevant training examples, base model performance significantly improves in that specific area.
3. Fine-tuning: This is an extension of few-shot learning in that data scientists train a base model to adjust its parameters with additional data relevant to the specific application.
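
With a chat-style API, the difference between zero-shot and few-shot prompting can be sketched as two ways of building the message list. The helper names and the sentiment examples below are hypothetical, made up for this illustration. Fine-tuning changes the model's parameters rather than the prompt, so it is not shown.

```python
def zero_shot_messages(question):
    """Zero-shot: send only the question, with no worked examples."""
    return [{"role": "user", "content": question}]

def few_shot_messages(examples, question):
    """Few-shot: place (input, output) example pairs before the real question."""
    messages = []
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": question})
    return messages

# Hypothetical sentiment-labelling examples, made up for this illustration.
examples = [
    ("The support agent was fantastic.", "positive"),
    ("I waited an hour and nobody answered.", "negative"),
]
question = "The call was resolved quickly."

print(zero_shot_messages(question))
print(few_shot_messages(examples, question))
```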

![1 GPT 3.png](attachment:5dbe3c27-2772-422e-8849-41164c6909a6.png)

# How do LLMs work?
* A simplified version of the LLM training process

![1 GPT 4.png](attachment:33e08c6c-3899-4b12-b0b2-39614dd7a4bc.png)

References: https://www.databricks.com/glossary/large-language-models-llm

# Part 2: Setting up the API

In [None]:
# The client-style calls used below (openai.OpenAI) need version 1.0 or later.
# pip install "openai>=1.0"

In [None]:
import openai


In [None]:
api_key="your secret key here"

In [None]:
openai.api_key = api_key
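
Rather than pasting the key into the notebook, one option is to read it from an environment variable. The sketch below assumes the variable is called `OPENAI_API_KEY`, the same name this notebook sets later through `os.environ`. The helper `load_api_key` is hypothetical, and the placeholder value is there only so the cell runs without a real key.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in the given environment variable, or raise."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Demonstration only: a placeholder stands in when no real key is configured.
os.environ.setdefault("OPENAI_API_KEY", "your secret key here")
api_key = load_api_key()
print("Key loaded, length:", len(api_key))
```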

In [None]:
# Make payment or check from here
# https://platform.openai.com/usage
# https://platform.openai.com/settings/organization/billing/overview



## Generating Text

In [None]:
import os

In [None]:
# Set up the OpenAI client with the key defined above
client = openai.OpenAI(api_key=api_key)

In [None]:
# Temperature reference: https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api/172683

In [None]:
def generate_text(prompt):
    response = client.chat.completions.create(  # Use chat.completions.create instead of completions.create
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],  # Use the correct message format
        max_tokens=50,
        temperature=0.7
    )
    return response.choices[0].message.content.strip()

In [None]:
# Teleperformance Resource: https://en.wikipedia.org/wiki/Teleperformance

In [None]:
prompt ="Tell me about Teleperformance Private Company"
generated_text = generate_text(prompt)
print(generated_text)




Teleperformance is a global outsourcing company that offers customer service, technical support, sales, and other business process outsourcing services to companies around the world. The company was founded in 1978 in Paris, France, and has since grown to become one of
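
The output above stops mid-sentence because `max_tokens=50` was reached. A chat completion reports this through the `finish_reason` field of each choice, which is `"length"` when the token limit cut the reply short. The sketch below checks that field on a hand-built stand-in object, so it runs without calling the API. The helper name `was_truncated` is hypothetical.

```python
from types import SimpleNamespace

def was_truncated(response):
    """True when the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"

# A hand-built stand-in for an API response, so the sketch runs offline.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(
        finish_reason="length",
        message=SimpleNamespace(content="...has since grown to become one of"),
    )]
)

if was_truncated(fake_response):
    print("Output was cut off; consider a larger max_tokens.")
```

In the notebook, the same check could be applied to the object returned by `client.chat.completions.create(...)` before returning its text.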


In [None]:
# You must use the messages format when working with gpt-3.5-turbo or gpt-4,
# because these models are designed for chat-based interactions. The messages parameter is required and cannot be omitted.

In [None]:
# prompt ="Write a short poem about AI"


## Customizing the Output

In [None]:
def generate_text(prompt, max_tokens, temperature):
    response = client.chat.completions.create(        # Use chat.completions.create instead of completions.create
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],  # Use the correct message format
        max_tokens=max_tokens,
        temperature=temperature
    )
    return response.choices[0].message.content.strip()






In [None]:
generated_text = generate_text(prompt, 50, 0)
print(prompt, generated_text)

Tell me about Teleperformance Private Company Teleperformance is a global outsourcing company that provides customer service and technical support to businesses around the world. The company was founded in 1978 in Paris, France, and has since grown to become one of the largest outsourcing companies in the world.

Tele


In [None]:
generated_text = generate_text(prompt, 500, 1)
print(prompt, generated_text)

Tell me about Teleperformance Private Company Teleperformance is a private company that is a global leader in customer experience management. The company was founded in 1978 and is headquartered in Paris, France. Teleperformance provides a wide range of outsourcing services, including customer care, technical support, and sales support.

The company has a presence in over 80 countries and operates in more than 160 languages and dialects. Teleperformance serves a variety of industries, including telecommunications, financial services, healthcare, retail, and technology. 

Teleperformance prides itself on its commitment to excellence and innovation in customer service. The company has been recognized with numerous awards for its work in the industry, including being named a Leader in the 2021 Gartner Magic Quadrant for Customer Service BPO.

Overall, Teleperformance is a trusted partner for businesses looking to improve their customer experience and drive growth through outsourcing servi

In [None]:
# With a temperature of 1, the response is more random.
# With only a few tokens allowed, that difference is hard to see, so we raise max_tokens.
# With max_tokens set to 50 and a temperature of 0, we get the following response:
# "Once upon a time, there was a little girl who was born with a very special gift."
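
To see why temperature changes randomness, here is a minimal sketch of the textbook formulation: the model's scores (logits) are divided by the temperature before a softmax turns them into probabilities. The logits are made-up numbers and `token_probabilities` is a hypothetical helper. How the OpenAI API handles a temperature of exactly 0 is not covered by this sketch, which requires a positive value.

```python
import math

def token_probabilities(logits, temperature):
    """Softmax over logits after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

Lower temperatures push almost all the probability onto the top-scoring token, while higher temperatures spread it out, so sampled text varies more.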

# Part 3: Summarising Text

In [None]:
# Here we create a function that takes a large chunk of text and summarizes it by picking out the keywords.
# Because the model understands language, it can pick out the most important words and phrases for us.
# Inside the function, we pass messages to the model with instructions and examples of how we want it to respond.
#
# There are three roles that can be used for messages:
# 1. System messages give the model instructions on what to do and how it should respond and behave.
# 2. User messages are examples of user input. Here, the user input is the block of text we want to summarize.
# 3. Assistant messages are examples of the correct output for a user input. Here, they are the keywords
#    pulled out of the block of text provided by the user, and they show the model how it should respond.

In [None]:

# Set up the OpenAI client with the key defined earlier
client = openai.OpenAI(api_key=api_key)

def text_summarizer(prompt):
    response = client.chat.completions.create(  # Updated function call
        model="gpt-3.5-turbo",
        messages=[
              {
          "role": "system",
          "content": "You will be provided with a block of text, and your task is to extract a list of keywords from it."
        },
        {
          "role": "user",
          "content": "A flying saucer seen by a guest house, a 7ft alien-like figure coming out of a hedge and a \"cigar-shaped\" UFO near a school yard.\n\nThese are just some of the 450 reported extraterrestrial encounters from one of the UK's largest mass sightings in a remote Welsh village.\n\nThe village of Broad Haven has since been described as the \"Bermuda Triangle\" of mysterious craft sightings and sightings of strange beings.\n\nResidents who reported these encounters across a single year in the late seventies have now told their story to the new Netflix documentary series 'Encounters', made by Steven Spielberg's production company.\n\nIt all happened back in 1977, when the Cold War was at its height and Star Wars and Close Encounters of the Third Kind - Spielberg's first science fiction blockbuster - dominated the box office."
        },
        {
          "role": "assistant",
          "content": "flying saucer, guest house, 7ft alien-like figure, hedge, cigar-shaped UFO, school yard, extraterrestrial encounters, UK, mass sightings, remote Welsh village, Broad Haven, Bermuda Triangle, mysterious craft sightings, strange beings, residents, single year, late seventies, Netflix documentary series, Steven Spielberg, production company, 1977, Cold War, Star Wars, Close Encounters of the Third Kind, science fiction blockbuster, box office."
        },
        {
          "role": "user",
          "content": "Each April, in the village of Maeliya in northwest Sri Lanka, Pinchal Weldurelage Siriwardene gathers his community under the shade of a large banyan tree. The tree overlooks a human-made body of water called a wewa – meaning reservoir or \"tank\" in Sinhala. The wewa stretches out besides the village's rice paddies for 175-acres (708,200 sq m) and is filled with the rainwater of preceding months.    \n\nSiriwardene, the 76-year-old secretary of the village's agrarian committee, has a tightly-guarded ritual to perform. By boiling coconut milk on an open hearth beside the tank, he will seek blessings for a prosperous harvest from the deities residing in the tree. \"It's only after that we open the sluice gate to water the rice fields,\" he told me when I visited on a scorching mid-April afternoon.\n\nBy releasing water into irrigation canals below, the tank supports the rice crop during the dry months before the rains arrive. For nearly two millennia, lake-like water bodies such as this have helped generations of farmers cultivate their fields. An old Sinhala phrase, \"wewai dagabai gamai pansalai\", even reflects the technology's centrality to village life; meaning \"tank, pagoda, village and temple\"."
        },
        {
          "role": "assistant",
          "content": "April, Maeliya, northwest Sri Lanka, Pinchal Weldurelage Siriwardene, banyan tree, wewa, reservoir, tank, Sinhala, rice paddies, 175-acres, 708,200 sq m, rainwater, agrarian committee, coconut milk, open hearth, blessings, prosperous harvest, deities, sluice gate, rice fields, irrigation canals, dry months, rains, lake-like water bodies, farmers, cultivate, Sinhala phrase, technology, village life, pagoda, temple."
        },
        {
          "role": "user",
          "content": prompt
        }
        ],
        temperature=0.5,
        max_tokens=256
    )
    return response.choices[0].message.content.strip()






In [None]:
# Example usage
prompt = """Teleperformance secured multiple United Kingdom government and NHS contracts to operate support services for the public during the COVID-19 pandemic. In 2020, Teleperformance launched Teleperformance Cloud Campus in Portugal, with mobile cloud-enabled workstations for virtual onboarding, training, and employee meetings. There are currently cloud campus hubs in the Philippines and Mexico."""
summary = text_summarizer(prompt)
print(summary)

Teleperformance, United Kingdom, government contracts, NHS contracts, support services, COVID-19 pandemic, 2020, Teleperformance Cloud Campus, Portugal, mobile cloud-enabled workstations, virtual onboarding, training, employee meetings, cloud campus hubs, Philippines, Mexico.


In [None]:
prompt = "Teleperformance employs content moderators for platforms including TikTok, with moderators in Tunisia and Colombia, reviewing explicit content to remove it from the platform. Its Trust and Safety branch and moderation program were launched in 2019"
print(prompt)

In [None]:
text_summarizer(prompt)
# The returned text lists the keywords from the passage we passed in.
# In this lesson, we've seen how we can provide messages to our model to get it to work
# with different tasks.

'Teleperformance, United Kingdom, government contracts, NHS, support services, COVID-19 pandemic, 2020, Teleperformance Cloud Campus, Portugal, mobile cloud-enabled workstations, virtual onboarding, training, employee meetings, cloud campus hubs, Philippines, Mexico.'

## Poetic Chatbot

In [None]:
# In the poetic_chatbot function, the messages parameter of client.chat.completions.create() is a list that defines the conversation history.
# Each message in the list has a role ("system", "user", or "assistant") and a content field that contains the actual text.

# Message breakdown:
# - System message ("role": "system")
#     "content": "You are a poetic chatbot."
#     This sets the AI's behavior, instructing it to respond poetically.
# - User messages ("role": "user")
#     These represent the questions asked by the user.
#     Example: "content": "When was Google founded?"
# - Assistant messages ("role": "assistant")
#     These are the AI's poetic responses.
#     Example: "content": "In the late '90s, a spark did ignite..."
# - Dynamic user prompt ("role": "user")
#     The final message inserts the prompt parameter, so whatever input the user provides is added to the end of the conversation.
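
Because each request carries the whole message list, a multi-turn chat means appending every new user and assistant turn before the next call. The sketch below keeps that running list in a small hypothetical `Conversation` class. The assistant text is hard-coded here, whereas in real use it would come from the model's reply.

```python
class Conversation:
    """Keeps the running message list that is re-sent on every request."""

    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})

chat = Conversation("You are a poetic chatbot.")
chat.add_user("When was Google founded?")
# In real use this text would come from the model's reply.
chat.add_assistant("In the late '90s, a spark did ignite...")
chat.add_user("Who founded it?")

for message in chat.messages:
    print(message["role"], "->", message["content"])
```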

In [None]:
def poetic_chatbot(prompt):
    response = client.chat.completions.create(
        model = "gpt-3.5-turbo",
        messages = [
            {
                "role": "system",
                "content": "You are a poetic chatbot."
            },
            {
                "role": "user",
                "content": "When was Google founded?"
            },
            {
                "role": "assistant",
                "content": "In the late '90s, a spark did ignite, Google emerged, a radiant light. By Larry and Sergey, in '98, it was born, a search engine new, on the web it was sworn."
            },
            {
                "role": "user",
                "content": "Which country has the youngest president?"
            },
            {
                "role": "assistant",
                "content": "Ah, the pursuit of youth in politics, a theme we explore. In Austria, Sebastian Kurz did implore, at the age of 31, his journey did begin, leading with vigor, in a world filled with din."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature = 1,
        max_tokens=256
    )
    return response.choices[0].message.content.strip()

In [None]:
prompt = "How Teleperformance deals with employees"
ans=poetic_chatbot(prompt)

In [None]:
ans

'In the realm of customer care, Teleperformance takes great heed, nurturing employees, their success indeed. Training and support they provide with care, fostering growth, ensuring employees fare. In a culture of empathy and respect, they lead, a team united, their greatness does seed.'

# Part 4: LangChain

In [None]:
# A GPT model can only answer from the data it's been trained on.
# Let's see what happens when we ask this model a more specific question.
# For example, we can create a new prompt that asks: what is the next course to be uploaded on the 365 Data Science platform?
# The model gives a response, but there's no actual answer in it.
# The model can't tell us which course will be uploaded next, because it just doesn't have this data available.
# GPT models have been trained on a lot of data, but this data only goes as far as 2021, so they might
# not have the most up-to-date information.
# But there is a solution.
# Using LangChain, we can import our own data and have it read by our language models.
# The models can then reference this data when creating a response.
# So let's have a look at what LangChain is and what it can be used for.

In [None]:
prompt = "What was the CSAT score achieved for the luxury automotive brand, and how did it compare to the target?"
poetic_chatbot(prompt)

'In the realm of luxury, the CSAT score did rise, surpassing their target, reaching for the skies. Customer satisfaction high, a sign of great esteem, in the world of luxury, it was but a dream.'

In [None]:
# pip install langchain_community

In [None]:
# Current LangChain releases ship these integrations in the langchain_community,
# langchain_openai and langchain_text_splitters packages.
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [None]:
url = "https://www.tp.com/industries/automotive/"

In [None]:
# Teleperformance Resource: https://www.tp.com/industries/automotive/

In [None]:
loader = WebBaseLoader(url)

In [None]:
raw_documents = loader.load()

In [None]:
# raw_documents

In [None]:
# The next step is to take our data and chunk it up into small pieces.
# This makes sure we only pass the smallest, most relevant pieces of text to the language model.
# We create our text splitter with RecursiveCharacterTextSplitter.
# We then create our documents by calling text_splitter.split_documents() on our raw documents.
# This takes the text from our URL and breaks it up into small pieces for us.

In [None]:
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(raw_documents)
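
To make the idea of chunking concrete, here is a simplified stand-in written in plain Python. It cuts text into fixed-size character windows that overlap, which is not the algorithm `RecursiveCharacterTextSplitter` uses (that one tries to split on separators such as paragraph breaks first). The function name `chunk_text` and the sizes are assumptions chosen for the illustration.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "Teleperformance supports automotive brands. " * 20
chunks = chunk_text(sample, chunk_size=200, overlap=40)
print(len(chunks), "chunks; first chunk length:", len(chunks[0]))
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.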

In [None]:
embeddings = OpenAIEmbeddings(openai_api_key = api_key)



In [None]:
pip install tiktoken

In [None]:
# FAISS (Facebook AI Similarity Search)
# CPU-only build:
# !pip install faiss-cpu

# or, if you're using GPUs:
!pip install faiss-gpu

In [None]:
# pip install --upgrade openai langchain langchain-community faiss-cpu



In [None]:
pip install --upgrade langchain langchain-community openai

In [None]:
from openai import RateLimitError, APIConnectionError, APIError, Timeout
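
The exception classes imported above are typically used to retry a request after a transient failure. A minimal sketch of retrying with exponential backoff follows. The helper `call_with_retries` and its parameters are hypothetical, and the demonstration uses a simulated failure plus a fake `sleep` so it runs offline and instantly.

```python
import time

def call_with_retries(func, retry_on=(Exception,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Offline demonstration with a function that fails twice, then succeeds.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"

delays = []  # records the waits instead of actually sleeping
outcome = call_with_retries(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(outcome, delays)
```

With the real client, one could pass `retry_on=(RateLimitError, APIConnectionError)` and wrap the API call in a `lambda`.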


In [None]:
os.environ["OPENAI_API_KEY"] =api_key

In [None]:
# Do not pin openai to 0.27.0 here: the client-style calls in this notebook need openai>=1.0.


In [None]:


from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Initialize embeddings correctly
embeddings = OpenAIEmbeddings()

# Assuming 'documents' is a list of LangChain Document objects
vectorstore = FAISS.from_documents(documents, embeddings)
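
Conceptually, a vector store keeps one embedding vector per chunk and returns the chunks whose vectors are closest to the query's vector. The sketch below shows that idea with made-up 3-dimensional vectors and cosine similarity in plain Python. It is an illustration only: real embeddings have far more dimensions, and the distance metric and indexing FAISS actually uses may differ. The names `cosine_similarity`, `most_similar` and `toy_store` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vector, store, k=1):
    """Return the k (text, score) pairs whose vectors best match the query."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up 3-dimensional "embeddings"; real ones have many more dimensions.
toy_store = [
    ("CSAT results for a luxury automotive brand", [0.9, 0.1, 0.0]),
    ("Cloud campus hubs in the Philippines", [0.1, 0.8, 0.3]),
    ("Content moderation program", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of a CSAT question

print(most_similar(query_vector, toy_store, k=2))
```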


In [None]:
# We can now create a memory object, which is needed to track the inputs and outputs so that the model
# can hold a conversation.
# To create our memory we use ConversationBufferMemory.
# We set memory_key to "chat_history" and specify return_messages=True.
# After that, we can initialize the conversational retrieval chain.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
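
As a rough mental model of what a buffer memory does, the sketch below stores every question and answer and hands the full history back under one key. `SimpleBufferMemory` is a hypothetical toy class, not LangChain's implementation, and the example turn reuses the 80% target quoted later in this notebook.

```python
class SimpleBufferMemory:
    """Stores every question/answer pair under a single history key."""

    def __init__(self, memory_key="chat_history"):
        self.memory_key = memory_key
        self._turns = []

    def save_turn(self, question, answer):
        self._turns.append(("human", question))
        self._turns.append(("ai", answer))

    def load(self):
        """Return the history in the shape {memory_key: [(speaker, text), ...]}."""
        return {self.memory_key: list(self._turns)}

toy_memory = SimpleBufferMemory()
toy_memory.save_turn("What was the CSAT target?", "The target was 80%.")
print(toy_memory.load())
```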




In [None]:
# We'll use ConversationalRetrievalChain.from_llm.
# We provide our OpenAI API key and specify the temperature of the response.
# Remember that the temperature controls how random the response is.
# We then pass our vector store as the retriever, and pass our memory.
# And that's it.
# We're now ready to ask our model questions based on the text we loaded from the web page above.

qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(openai_api_key=api_key,
                                                  model="gpt-3.5-turbo",
                                                  temperature=0),
                                           vectorstore.as_retriever(),
                                           memory=memory)
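
At a high level, a retrieval chain looks up the chunks most relevant to the question and places them in the prompt alongside the chat history. The sketch below shows only that assembly step, with a hypothetical helper `build_rag_prompt` and a hand-written chunk that paraphrases the CSAT fact from the loaded page. It is not the prompt template LangChain uses internally.

```python
def build_rag_prompt(question, retrieved_chunks, chat_history=()):
    """Combine history, retrieved context and the question into one prompt."""
    history = "\n".join(f"{speaker}: {text}" for speaker, text in chat_history)
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Chat history:\n{history or '(none)'}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hand-written chunk paraphrasing the fact quoted from the web page.
retrieved = ["97% CSAT, which significantly exceeded the target of 80% for a luxury automotive brand."]
rag_prompt = build_rag_prompt(
    "What was the CSAT score achieved for the luxury automotive brand?",
    retrieved,
)
print(rag_prompt)
```

Because the answer now sits in the prompt itself, the model no longer has to rely on what it saw during training.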


### Query 1

In [None]:
# Teleperformance Resources: https://www.tp.com/industries/automotive/

In [None]:
# Fact given on website: 97% CSAT WHICH SIGNIFICANTLY EXCEEDED THEIR TARGET OF 80% FOR A LUXURY AUTOMOTIVE BRAND.

In [None]:
query1 = "What was the CSAT score achieved for the luxury automotive brand, and how did it compare to the target?"

In [None]:
result = qa.invoke({"question": query1})




In [None]:
result["answer"]

'The CSAT score achieved for the luxury automotive brand significantly exceeded their target of 80%.'

### Query 2

In [None]:
# Teleperformance Resources: https://www.tp.com/industries/automotive/

# Fact given on website: " Together, we're pushing boundaries, pioneering new technologies and redefining what's possible. Their innovation and support are critical to helping us deliver the world-class vehicles our customers have come to expect. "
# Vice President of Global Purchasing and Supply for a leading US auto manufacturer

In [None]:
query2= "Together, we're pushing boundaries, pioneering new technologies and redefining what's possible. Their innovation and support are critical to helping us deliver the world-class vehicles our customers have come to expect., who said this?"

In [None]:
result = qa.invoke({"question": query2})


In [None]:
result["answer"]

'The quote "Together, we\'re pushing boundaries, pioneering new technologies and redefining what\'s possible. Their innovation and support are critical to helping us deliver the world-class vehicles our customers have come to expect" was said by the Vice President of Global Purchasing and Supply for a leading US auto manufacturer.'