### GPT For Me: A LangChain Introduction


In [None]:
# pip install langchain
# pip install python-dotenv

#### Getting Started

In [62]:
# Put your values in a .env file; load it before reading any variables
from dotenv import load_dotenv
import os

load_dotenv()

DEPLOYMENT_NAME = os.environ["DEPLOYMENT_NAME"]

#### Temperature 
Temperature in LLMs is a parameter that controls the randomness of the generated response. The higher the temperature, the more creative the response appears; the lower the temperature, the more precise and repeatable the model's generations appear to be.
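
To make the effect concrete, here is a minimal sketch of temperature scaling applied to a softmax over token scores. The function name and the example scores are hypothetical, and hosted models may handle edge cases such as a temperature of zero differently from this sketch, which simply rejects non-positive values.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # sharper distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the low temperature almost all probability mass sits on the top-scoring token, while the high temperature spreads it across the alternatives, which is why sampled output looks more varied.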

In [63]:
from langchain.chat_models import AzureChatOpenAI

# A low temperature keeps answers focused and repeatable
llm = AzureChatOpenAI(deployment_name=DEPLOYMENT_NAME, temperature=0.2)


In [64]:
llm.predict("List the last two Microsoft CEOs")

<openai.openai_response.OpenAIResponse object at 0x000001A6D768AF10>


'1. Satya Nadella (2014-present)\n2. Steve Ballmer (2000-2014)'

## Prompt Template

In [37]:
from langchain import PromptTemplate

template = "List the last two {company} CEOs"

prompt = PromptTemplate.from_template(template)


In [34]:
prompt_value = prompt.format(company="Apple")

In [39]:
llm.predict(prompt_value)

<openai.openai_response.OpenAIResponse object at 0x000001A6D7274610>


'1. Tim Cook (2011-present)\n2. Steve Jobs (1997-2011)'

What if the prompt needs multiple input values?

In [41]:
new_prompt_template = "List the last {count} {company} CEOs"

new_prompt = PromptTemplate.from_template(new_prompt_template)
# equivalent to:
# new_prompt = PromptTemplate(template=new_prompt_template,
#                       input_variables=["count", "company"])

new_prompt_value = new_prompt.format(count="three", company="Google")
llm.predict(new_prompt_value)

<openai.openai_response.OpenAIResponse object at 0x000001A6D71EA390>


'1. Sundar Pichai (2015-present)\n2. Larry Page (2011-2015)\n3. Eric Schmidt (2001-2011)'
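
The template syntax is the same curly-brace placeholder style used by Python's `str.format`. As a rough illustration of how placeholder names can be discovered from a template string, the sketch below uses only the standard library. The helper `template_variables` is hypothetical and is not LangChain's implementation.

```python
from string import Formatter

def template_variables(template):
    """Return the placeholder names in a format-style template, in order."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names

template = "List the last {count} {company} CEOs"
variables = template_variables(template)
filled = template.format(count="three", company="Google")
print(variables)
print(filled)
```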

## Chunking

There are multiple text splitters available in LangChain. You can choose to split by character, code, or tokens. LangChain also provides a Markdown header splitter.

For this example, we'll use the Recursive Character Text Splitter. It takes a list of separators and tries to split on them in order until the chunks are small enough. The default list is `["\n\n", "\n", " ", ""]`. This has the effect of trying to keep paragraphs (and then sentences, and then words) together as long as possible, since those tend to be the most semantically related pieces of text.
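
The idea of trying separators in order can be sketched in plain Python. This is a simplified illustration, not the library's code: the function `recursive_split` is hypothetical, it ignores chunk overlap, and it may merge pieces differently from LangChain's splitter.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters.

    Separators are tried in order, so paragraph breaks are preferred over
    line breaks, and line breaks over spaces.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    chunks, current = [], ""
    for part in (p for p in text.split(sep) if p):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small parts together
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # The part itself is too long: retry with finer separators.
            chunks.extend(recursive_split(part, chunk_size, remaining))
            current = ""
    if current:
        chunks.append(current)
    return chunks

sample = "aaa bbb ccc\n\nddd eee"
pieces = recursive_split(sample, chunk_size=7)
print(pieces)
```

The second paragraph of the sample fits within the limit and stays whole, while the first is too long and is broken up at spaces instead.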

Overlap between consecutive chunks helps ensure that content spanning a chunk boundary isn't cut off from its context.
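
As a rough picture of what overlap means, the sketch below cuts a string into fixed-size windows where each window repeats the tail of the previous one. The helper `sliding_chunks` is hypothetical and works on raw characters, whereas the recursive splitter builds its overlap from whole separated pieces, so its boundaries will differ.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

windows = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(windows)
```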

In [74]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

with open("internal_doc.txt", encoding="utf8") as file:
    internal_doc = file.read()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100, 
    chunk_overlap= 20
)

split_texts = text_splitter.create_documents([internal_doc])
print(split_texts[50])
print(split_texts[51])


page_content='Analyze and predict threats to your secrets: Ensure that all secret accesses are analyzed and' metadata={}
page_content='are analyzed and systems are in place that can detect a deviation from normal usage and alert' metadata={}


## Embedding
You can upload the chunks to Azure Cognitive Search, a Microsoft service. It also serves as a vector store, which we can later use as a retriever for our LLM.
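
Conceptually, a vector store ranks stored texts by how close their embedding vectors are to the query's vector. The sketch below shows that ranking with cosine similarity over a tiny in-memory list. The three-dimensional vectors and the helper names are made up for illustration; real embeddings have many more dimensions, and the similarity measure a hosted service uses depends on how its index is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, store, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy store: (text, made-up embedding) pairs.
toy_store = [
    ("Rotate secrets regularly", [0.9, 0.1, 0.0]),
    ("Quarterly sales summary", [0.0, 0.2, 0.9]),
    ("Audit access to secrets", [0.8, 0.3, 0.1]),
]
results = top_k([1.0, 0.2, 0.0], toy_store, k=2)
print(results)
```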

In [None]:
#pip install --index-url=https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ azure-search-documents==11.4.0a20230509004
#pip install azure-search-documents
#pip install azure-identity

In [66]:
AZURE_COGNITIVE_SEARCH_URL = os.environ["AZURE_COGNITIVE_SEARCH_URL"]
AZURE_COGNITIVE_SEARCH_PASSWORD = os.environ["AZURE_COGNITIVE_SEARCH_PASSWORD"]
AZURE_COGNITIVE_SEARCH_INDEX_NAME = os.environ["AZURE_COGNITIVE_SEARCH_INDEX_NAME"]


In [70]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.azuresearch import AzureSearch
import uuid
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient


embeddings: OpenAIEmbeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=200)
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=AZURE_COGNITIVE_SEARCH_URL,
    azure_search_key=AZURE_COGNITIVE_SEARCH_PASSWORD,
    index_name=AZURE_COGNITIVE_SEARCH_INDEX_NAME,
    embedding_function=embeddings.embed_query,
)

# Build one index document per chunk, each with a unique id
docs_to_embed = []
for doc in split_texts:
    idx_fields = {
        "id": str(uuid.uuid1()),
        "content": doc.page_content,
    }
    docs_to_embed.append(idx_fields)

search_client = SearchClient(
    AZURE_COGNITIVE_SEARCH_URL,
    AZURE_COGNITIVE_SEARCH_INDEX_NAME,
    AzureKeyCredential(AZURE_COGNITIVE_SEARCH_PASSWORD),
)
try:
    # Upload the chunks to the Azure Cognitive Search index
    search_client.upload_documents(docs_to_embed)
except Exception as e:
    print(e)


## Chains


In [101]:
from langchain.chains import LLMChain
ceo_chain = LLMChain(llm=llm, prompt=new_prompt, output_key="ceos_list", verbose=True)
ceo_chain.run({
    "count": "three",
    "company": "Netflix"
})



> Entering new  chain...
Prompt after formatting:
List the last three Netflix CEOs
<openai.openai_response.OpenAIResponse object at 0x000001A6D71388D0>

> Finished chain.


'1. Reed Hastings (2002-present)\n2. Ted Sarandos (co-CEO, 2020-present)\n3. Greg Peters (co-CEO, 2020-present)'

#### Retrieval Chains


In [82]:
from langchain.chains import RetrievalQA
from langchain.retrievers import AzureCognitiveSearchRetriever

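# With no explicit connection arguments, the retriever reads its service name,
# index name, and API key from environment variables
retriever = AzureCognitiveSearchRetriever(content_key="content")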
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

qa.run("tell me about secrets")

got to search function
got here 
response is <urllib3.response.HTTPResponse object at 0x000001A6D983F6A0>
<openai.openai_response.OpenAIResponse object at 0x000001A6D9315650>


'In the context of cybersecurity, secrets refer to sensitive information that is used to authenticate or authorize access to systems, applications, or data. Examples of secrets include passwords, cryptographic keys, tokens, and certificates. Adversaries may attempt to steal secrets in order to gain unauthorized access to systems or data. They may use various techniques such as phishing, social engineering, or exploiting vulnerabilities in software or systems to obtain secrets. It is important to protect secrets by using strong authentication mechanisms, encrypting sensitive data, and implementing access controls to limit who can access secrets. Additionally, it is important to regularly monitor and audit access to secrets to detect and respond to any unauthorized access attempts.'
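
The `"stuff"` chain type places all retrieved documents into a single prompt and sends that prompt to the model once. A minimal sketch of the idea follows. The prompt wording and the helper `build_stuff_prompt` are hypothetical and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate retrieved documents into one context block, then add the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved chunks.
retrieved = [
    "Ensure that all secret accesses are analyzed.",
    "Alert on deviations from normal usage.",
]
stuffed_prompt = build_stuff_prompt("tell me about secrets", retrieved)
print(stuffed_prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved text fits within the model's context window.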

##### Sequential Chain 

In [102]:
from langchain.chains import SequentialChain
second_template = """Given a list of CEOs, return their age of when they became CEOs.
CEOs: {ceos_list}
"""
ceo_age_prompt = PromptTemplate.from_template(second_template)
ceo_age_chain = LLMChain(llm=llm, prompt=ceo_age_prompt, output_key="answer", verbose=True)
final_chain = SequentialChain(chains=[ceo_chain, ceo_age_chain],
                                    input_variables = ["count", "company"],
                                    verbose=True)

final_chain({
    "count": "two",
    "company": "Apple"
})




> Entering new  chain...


> Entering new  chain...
Prompt after formatting:
List the last two Apple CEOs
<openai.openai_response.OpenAIResponse object at 0x000001A6D7151490>

> Finished chain.


> Entering new  chain...
Prompt after formatting:
Given a list of CEOs, return their age of when they became CEOs.
CEOs: 1. Tim Cook (2011-present)
2. Steve Jobs (1997-2011)

<openai.openai_response.OpenAIResponse object at 0x000001A6D7218790>

> Finished chain.

> Finished chain.


{'count': 'two',
 'company': 'Apple',
 'answer': '3. Jeff Bezos (1994-present)\n4. Mark Zuckerberg (2004-present)\n5. Sundar Pichai (2015-present)\n\n1. Tim Cook became CEO in 2011 at the age of 50.\n2. Steve Jobs became CEO in 1997 at the age of 42.\n3. Jeff Bezos became CEO in 1994 at the age of 30.\n4. Mark Zuckerberg became CEO in 2004 at the age of 19.\n5. Sundar Pichai became CEO in 2015 at the age of 43.'}
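
The link between the two chains is the `output_key` of the first (`ceos_list`), which matches the input variable of the second prompt. The data flow can be sketched with plain functions and a shared dictionary. The names `run_sequential`, `fake_ceo_chain`, and `fake_age_chain` are hypothetical stand-ins that return canned strings instead of calling a model, and unlike the output above this sketch returns every intermediate value.

```python
def run_sequential(steps, inputs):
    """Run steps in order; each step reads the shared state and stores its result."""
    state = dict(inputs)
    for step, output_key in steps:
        state[output_key] = step(state)
    return state

def fake_ceo_chain(state):
    # Stand-in for the first LLM call.
    return f"<{state['count']} CEOs of {state['company']}>"

def fake_age_chain(state):
    # Stand-in for the second LLM call; it consumes the first step's output.
    return f"ages for {state['ceos_list']}"

result = run_sequential(
    [(fake_ceo_chain, "ceos_list"), (fake_age_chain, "answer")],
    {"count": "two", "company": "Apple"},
)
print(result["answer"])
```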

## Agents



In [104]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

# The serpapi tool needs a SERPAPI_API_KEY environment variable to be set
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools=tools, llm=llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

agent.run("What new Bing chat feature did Microsoft announce this week?")


ValidationError: 1 validation error for SerpAPIWrapper
__root__
  Did not find serpapi_api_key, please add an environment variable `SERPAPI_API_KEY` which contains it, or pass  `serpapi_api_key` as a named parameter. (type=value_error)
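
The error above means the SerpAPI key was not found in the environment. One way to catch this earlier is to check for required variables before building the agent. The helper `missing_env_vars` below is a hypothetical sketch using only the standard library; the `env` argument exists so the check can be tried against a plain dictionary.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Demonstrated against an explicit, empty mapping rather than the real environment.
missing = missing_env_vars(["SERPAPI_API_KEY"], env={})
if missing:
    print("Set these variables before creating the agent:", ", ".join(missing))
```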