# Langchain Quick Start
https://python.langchain.com/docs/get_started/quickstart

Create an OpenAI API key at https://platform.openai.com/ and export it in your shell:
export OPENAI_API_KEY="..."

In [3]:
# Set the OpenAI API key directly in code (avoid committing real keys)
# import os

# os.environ['OPENAI_API_KEY'] = 'xxx'

In [3]:
# Enter the OpenAI API key at a hidden prompt instead of hard-coding it
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
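The two approaches above (an exported environment variable and a hidden prompt) can be combined so that the prompt only appears when the variable is missing. This is a minimal sketch; `resolve_api_key` and its `prompt_fn` parameter are hypothetical names introduced for illustration, not part of LangChain or the OpenAI client.

```python
import getpass
import os


def resolve_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass.getpass):
    """Return the key from the environment, prompting only when it is missing.

    `prompt_fn` is a hypothetical hook that makes the function easy to test;
    by default it is the hidden prompt from the standard library.
    """
    key = os.environ.get(env_var)
    if not key:
        key = prompt_fn(f"{env_var}: ")
        # Store the key so libraries that read the variable can find it.
        os.environ[env_var] = key
    return key
```

Calling `resolve_api_key()` once at the top of the notebook keeps the key out of the saved cells.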

In [7]:
os.environ['OPENAI_API_KEY']

'sk-...'

In [8]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

llm = OpenAI()  # or pass the key explicitly: OpenAI(openai_api_key="...")


In [9]:
llm.predict("Hello!")

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details..


KeyboardInterrupt: 
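The RateLimitError above reports an exhausted quota, which retrying is unlikely to fix; the message itself points to the plan and billing details. For transient rate limits, the retry-and-wait pattern visible in the log can be sketched as follows. This is a stdlib illustration under stated assumptions, not LangChain's retry code; `call_with_backoff` and its parameters are hypothetical names.

```python
import time


def call_with_backoff(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), waiting base_delay * 2**attempt seconds between failures.

    The last failure is re-raised so the caller still sees the real error.
    `sleep` is injectable (an assumption made here) so the delays can be tested.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

In real code the `except` clause would name the specific transient error type rather than catching every exception.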

In [None]:
chat_model = ChatOpenAI()

In [5]:
chat_model.predict("hi, I am learning large language models.")

"That's great! Large language models can be very powerful tools for natural language processing tasks. Is there anything specific you would like to know or discuss about them?"

In [6]:
text = "What is genre of stories"

llm.predict(text)

'?\n\nGenre refers to a type or category of literature, such as romance, fantasy, science fiction, mystery, thriller, horror, and comedy.'

In [7]:
chat_model.predict("What is the genre of a simple story that a mouse saved a lion.")

"The genre of a simple story where a mouse saves a lion could be classified as a fable or a children's story."

In [8]:
chat_model.predict("What is a fable.")

'A fable is a short fictional narrative that typically features animals, plants, or inanimate objects as characters. It often conveys a moral or lesson at the end. Fables have been used throughout history to teach and entertain, and they typically employ anthropomorphic characters (animals with human qualities) to illustrate a moral or ethical principle. The most famous collection of fables is attributed to the ancient Greek storyteller Aesop, whose fables are still widely known and read today.'

In [12]:
chat_model.predict("What's the typical values show in kid's story. For example, self-reliance, rationality")

"In children's stories, some typical values that are often portrayed include:\n\n1. Self-reliance: Encouraging children to believe in themselves and their abilities, teaching them to be independent and have confidence in their decision-making skills.\n\n2. Kindness: Promoting empathy, compassion, and being considerate towards others. Children are often encouraged to help and support one another.\n\n3. Honesty: Teaching the importance of telling the truth and being trustworthy. Stories often highlight the consequences of lying or being deceitful.\n\n4. Perseverance: Instilling the value of persistence and determination, teaching children to never give up even when faced with challenges or setbacks.\n\n5. Respect: Emphasizing the importance of treating others with respect, regardless of differences in age, gender, or background. Children are encouraged to listen, be polite, and consider the feelings of others.\n\n6. Courage: Encouraging children to face their fears and overcome obstacles

Use the predict_messages method to run over a list of messages.

In [9]:
from langchain.schema import HumanMessage

text = "What would be a good company name for a company that makes colorful socks?"
messages = [HumanMessage(content=text)]

llm.predict_messages(messages)

AIMessage(content='\n\nSocktastic!', additional_kwargs={}, example=False)

In [10]:
chat_model.predict_messages(messages)

AIMessage(content='VibrantSox', additional_kwargs={}, example=False)

Prompt templates

In [11]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("What is a good name for a company that makes {product}?")
prompt.format(product="colorful socks")

'What is a good name for a company that makes colorful socks?'

Prompt templates have several advantages over raw string formatting. You can "partial" out variables, e.g. format only some of the variables at a time. You can also compose templates, combining several of them into a single prompt.
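To make "partialing" and composition concrete, here is a minimal sketch using only Python string formatting. It illustrates the idea and is not LangChain's implementation; `partial_format` and `_KeepMissing` are hypothetical helpers, and the sketch only handles plain `{name}` placeholders without format specs.

```python
class _KeepMissing(dict):
    """Dict that leaves unknown placeholders untouched when formatting."""

    def __missing__(self, key):
        return "{" + key + "}"


def partial_format(template, **known):
    """Fill only the variables in `known`; the rest stay as {placeholders}."""
    return template.format_map(_KeepMissing(known))


# Compose two template fragments into one prompt, then fill it in two steps.
role = "You are a {tone} naming consultant. "
question = "What is a good name for a company that makes {product}?"
combined = role + question

half_filled = partial_format(combined, product="colorful socks")
final_prompt = half_filled.format(tone="playful")
```

Here `half_filled` still contains `{tone}`, so the same partially filled template can be reused with different tones.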

In [13]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain
from langchain.schema import BaseOutputParser

class CommaSeparatedListOutputParser(BaseOutputParser):
    """Parse the output of an LLM call to a comma-separated list."""


    def parse(self, text: str):
        """Parse the output of an LLM call."""
        return text.strip().split(", ")

template = """You are a helpful assistant who generates comma separated lists.
A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more."""
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
chain = LLMChain(
    llm=ChatOpenAI(),
    prompt=chat_prompt,
    output_parser=CommaSeparatedListOutputParser()
)
chain.run("colors")
# e.g. ['red', 'blue', 'green', 'yellow', 'orange'] (the exact items vary between runs)

['red', 'blue', 'green', 'yellow', 'purple']

In [14]:
! pip install pypdf

Collecting pypdf
  Downloading pypdf-3.15.1-py3-none-any.whl (271 kB)
Installing collected packages: pypdf
Successfully installed pypdf-3.15.1


https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf

Load a PDF with pypdf into a list of documents, where each document contains the page content and metadata that includes the page number.
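The shape of that result can be pictured with a small stand-in class. This is only a sketch of the data structure: `PageDocument` and `pages_to_documents` are hypothetical names, and the 0-based `page` index and the `source` key are assumptions about the metadata layout rather than something shown in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class PageDocument:
    """Stand-in for a loaded document: text plus a metadata dictionary."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def pages_to_documents(page_texts, source):
    """Wrap each page's text with its source path and page index (assumed 0-based)."""
    return [
        PageDocument(page_content=text, metadata={"source": source, "page": index})
        for index, text in enumerate(page_texts)
    ]


docs = pages_to_documents(["first page text", "second page text"], "example.pdf")
```

Keeping the page number in the metadata is what later lets a search result be traced back to a page via `doc.metadata["page"]`.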

In [15]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("../sampledata/Ohana_Resources.pdf")
pages = loader.load_and_split()

In [16]:
print(f'You have {len(pages)} document(s) in your data')

You have 118 document(s) in your data


In [17]:
pages[0]

Document(page_content='Source/Topic:\nMotivational\nInterviewing\nin\nJuly,\nfor\nthe\nSchool\nMental\nHealth\nWorkforce\nInformation:\nWorkshop\n1:\nCognitive\nBehavioral\nTherapy\n(CBT)\n&\nMotivational\nInterviewing\nas\nSchool\nMental\nProviders \nThinking\nThrough\nHow\nto\nIntegrate\nInterventions\nMonday,\nJuly\n24,\n2023\n●\n3:00pm\n-\n5:00pm\nMotivational\ninterviewing\nand\nstandard\nversions\nof\nCognitive\nBehavioral\nTherapy\nare\npowerful\nand\neffective\nstrategies\nto\nhelp\nstudents\nand\ntheir\nfamilies\nrealize\nchange.\nAlthough\nboth\nare\nvery\neffective,\npractitioners\nare\noften\nchallenged\nby\nthe\ndecision\nto\nuse\none\nintervention\nfor\na\nvariety\nof\nschool-based\nsocial\nand\nemotional\nissues.\nIn\nthis\nsession,\nwe\nwill\nconsider\nwhen\nto\nuse\ncommon\nMI\nand\nCBT\ninterventions\nbased\non\nthe\nstages\nof\nchange\nbeing\nexperienced\nby\nthe\nstudent.\nWe\nwill\nalso\nexplore\nwhich\nMI\nand\nCBT\ninterventions\noverlap,\nwhich\nhave\nspecific\n

In [20]:
! pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0


$ conda install -c conda-forge faiss

In [23]:
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())

In [24]:
faiss_index

<langchain.vectorstores.faiss.FAISS at 0x7f417525f790>

In [25]:
docs = faiss_index.similarity_search("How will the community be engaged?", k=2)
for doc in docs:
    print(str(doc.metadata["page"]) + ":", doc.page_content[:300])

73: Community
is
engaged
and
invested
in
the
development
of
the
program
Time
Consuming
Based
on
empirically
supported
intervention
principles
Assumes
the
core
components
of
an
evidence-base
d
program
are
applicable
across
cultural
groups
Tests
the
applicability
of
generic/
universal
prevention
principle
96: Keep
in
mind
that
the
Opioid
Response
Network
cannot
provide
assistance
in
obtaining
grants,
nor
can
it
allocate
funds
to
support
your
event,
but
we
can
provide
someone
to
train/educate
or
provide
feedback
on
your
work.
Simply
put,
the
network
provides
education
and
training
at
the
local
level
to
ma
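The search results above contain a line break after nearly every word, which is how this PDF's text came out of the loader. A minimal sketch for collapsing that whitespace before display, assuming paragraph breaks do not need to be preserved (`normalize_whitespace` is a hypothetical helper):

```python
def normalize_whitespace(text):
    """Collapse every run of spaces, tabs, and newlines into a single space."""
    return " ".join(text.split())


snippet = "Community\nis\nengaged\nand\ninvested\nin\nthe\ndevelopment"
cleaned = normalize_whitespace(snippet)
```

Applying the same function to `doc.page_content` inside the print loop would make the 300-character previews far more readable.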


In [26]:
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

from langchain.text_splitter import RecursiveCharacterTextSplitter
import os

In [27]:
!pip install unstructured

Collecting unstructured
  Downloading unstructured-0.10.0-py3-none-any.whl (1.5 MB)
Collecting emoji
  Downloading emoji-2.8.0-py2.py3-none-any.whl (358 kB)
Collecting python-magic
  Downloading python_magic-0.4.27-py2.py3-none-any.whl (13 kB)
Collecting nltk
  Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)
Collecting lxml
  Downloading lxml-4.9.3-cp310-cp310-manylinux_2_28_x86_64.whl (7.9 MB)
Collecting chardet
  Downloading ch

In [30]:
!pip install pdf2image

Collecting pdf2image
  Downloading pdf2image-1.16.3-py3-none-any.whl (11 kB)
Installing collected packages: pdf2image
Successfully installed pdf2image-1.16.3


In [32]:
!pip install pdfminer

Collecting pdfminer
  Downloading pdfminer-20191125.tar.gz (4.2 MB)
  Preparing metadata (setup.py) ... done
Collecting pycryptodome
  Downloading pycryptodome-3.18.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Building wheels for collected packages: pdfminer
  Building wheel for pdfminer (setup.py) ... done
  Created wheel for pdfminer: filename=pdfminer-20191125-py3-none-any.whl size=6140065 sha256=c6f54d1681ca06a5019a20bfc277945d83d9a9aa2d44ce838aeeab09cdacc152
  Stored in directory: /home/lkk/.cache/pip/wheels/4e/c1/68/f7bd0a8f514661f76b5cbe3b5f76e0033d79f1296012cbbf72
Successfully built pdfminer
Installing collected packages: pycryptodome, pdfminer
Success

In [34]:
!pip install pdfminer.six

Collecting pdfminer.six
  Downloading pdfminer.six-20221105-py3-none-any.whl (5.6 MB)
Installing collected packages: pdfminer.six
Successfully installed pdfminer.six-20221105


In [28]:
from langchain.document_loaders import OnlinePDFLoader
loader = OnlinePDFLoader("https://www.samhsa.gov/data/sites/default/files/reports/rpt23266/National_Directory_MH_facilities.pdf")

In [35]:
data = loader.load()

print(data)

[nltk_data] Downloading package punkt to /home/lkk/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/lkk/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


[Document(page_content='Behavioral Health Services Information System Series\n\nNational Directory of Mental Health Treatment Facilities 2020\n\nDEPARTMENT OF HEALTH AND HUMAN SERVICES Substance Abuse and Mental Health Services Administration\n\nACKNOWLEDGMENTS\n\nThis publication was prepared for the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA), U.S. Department of Health and Human Services (HHS). Work was performed under Contract HHSS283200700048I/HHSS283201600001C, Reference No. 283-16-0490. The Contracting Officer\'s Representative at SAMHSA was Nichele Waller.\n\nSAMHSA complies with applicable federal civil rights laws and does not discriminate on the basis of race, color, national origin, age, disability, or sex. SAMHSA cumple con las leyes federales de derechos civiles aplicables y no discrimina por motivos de raza, color, nacionalidad, edad, discapacidad o sexo.\n\nPUBLIC DOMAIN NOTICE\n\nAll mat

In [37]:
len(data)

1

In [38]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

In [39]:
print(f'Now you have {len(texts)} documents')

Now you have 1827 documents
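The single loaded document became 1827 chunks because the splitter caps each chunk at `chunk_size` characters and prefers to break at coarse separators before fine ones. The sketch below illustrates that idea in plain Python. It is a simplified stand-in with no overlap (matching `chunk_overlap=0`), not LangChain's actual algorithm, and `split_text` with its default separators is a hypothetical helper.

```python
def split_text(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and recurses with finer ones only
    for pieces that are still too long; falls back to hard slicing.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    for position, sep in enumerate(separators):
        if sep in text:
            pieces = text.split(sep)
            finer = separators[position + 1:]
            break
    else:
        # No separator occurs in the text: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(split_text(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


demo_chunks = split_text("aaaa bbbb cccc\n\ndddd eeee", chunk_size=10)
```

With `chunk_size=10` the first paragraph is too long, so it is split again on spaces, while the second paragraph fits and stays whole.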


In [40]:
texts[0]

Document(page_content='Behavioral Health Services Information System Series\n\nNational Directory of Mental Health Treatment Facilities 2020\n\nDEPARTMENT OF HEALTH AND HUMAN SERVICES Substance Abuse and Mental Health Services Administration\n\nACKNOWLEDGMENTS\n\nThis publication was prepared for the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA), U.S. Department of Health and Human Services (HHS). Work was performed under Contract HHSS283200700048I/HHSS283201600001C, Reference No. 283-16-0490. The Contracting Officer\'s Representative at SAMHSA was Nichele Waller.\n\nSAMHSA complies with applicable federal civil rights laws and does not discriminate on the basis of race, color, national origin, age, disability, or sex. SAMHSA cumple con las leyes federales de derechos civiles aplicables y no discrimina por motivos de raza, color, nacionalidad, edad, discapacidad o sexo.\n\nPUBLIC DOMAIN NOTICE\n\nAll mate

In [41]:
from langchain.embeddings import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()

In [42]:
embeddings = embeddings_model.embed_documents(
    [
        "Hi there!",
        "Oh, hello!",
        "What's your name?",
        "My friends call me World",
        "Hello World!"
    ]
)
len(embeddings), len(embeddings[0])

(5, 1536)

In [43]:
embedded_query = embeddings_model.embed_query("What was the name mentioned in the conversation?")
embedded_query[:5]

[0.005367529112845659,
 -0.0005752103170379996,
 0.03887332230806351,
 -0.0029528012964874506,
 -0.008912870660424232]
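Once documents and a query are embedded as vectors, answering "which text is closest to the query?" comes down to comparing vectors. The sketch below ranks toy 2-dimensional vectors by cosine similarity. Cosine similarity is one common choice and an assumption here; a vector store may use a different metric. `cosine_similarity` and `top_k` are hypothetical helpers, and the toy vectors stand in for the 1536-dimensional embeddings above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.1]
best = top_k(toy_query, toy_docs, k=2)
```

With real data, `toy_docs` would be the `embeddings` list and `toy_query` the `embedded_query` vector from the cells above.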

# Question Answering
https://python.langchain.com/docs/use_cases/question_answering/

In [53]:
! pip install openai chromadb

Collecting chromadb
  Downloading chromadb-0.4.6-py3-none-any.whl (405 kB)
Collecting posthog>=2.4.0
  Downloading posthog-3.0.2-py2.py3-none-any.whl (37 kB)
Collecting pypika>=0.48.9
  Downloading PyPika-0.48.9.tar.gz (67 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting overrides>=7.3.1
  Downloading overrides-7.4.0-py3-none-any.whl (17 kB)
Collecting chroma-hnswlib==0.7.2
  Downloading chroma-hnswlib-0.7.2.tar.gz (31 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collec

In [55]:
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')  # prompt for the key instead of hard-coding it

In [56]:
from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
index = VectorstoreIndexCreator().from_loaders([loader])

AuthenticationError: Incorrect API key provided: sk-22F6T***************************************LOXB. You can find your API key at https://platform.openai.com/account/api-keys.

https://python.langchain.com/docs/integrations/vectorstores/pinecone

https://colab.research.google.com/drive/1L4zYgzawqf-YUoFZbntKT_jNKGSU6kBM?usp=sharing#scrollTo=2lnC3ziClq2o

In [44]:
! pip install pinecone-client

Collecting pinecone-client
  Downloading pinecone_client-2.2.2-py3-none-any.whl (179 kB)
Collecting dnspython>=2.0.0
  Downloading dnspython-2.4.2-py3-none-any.whl (300 kB)
Collecting loguru>=0.5.0
  Downloading loguru-0.7.0-py3-none-any.whl (59 kB)
Installing collected packages: loguru, dnspython, pinecone-client
Successfully installed dnspython-2.4.2 loguru-0.7.0 pinecone-client-2.2.2


In [45]:
import pinecone

  from tqdm.autonotebook import tqdm


In [51]:
pinecone.init(
    api_key=getpass.getpass('Pinecone API Key:'),  # find at app.pinecone.io; never hard-code real keys
    environment='northamerica-northeast1-gcp'  # next to api key in console
)
index_name = "langchainindex" # put in the name of your pinecone index here

In [47]:
texts

[Document(page_content='Behavioral Health Services Information System Series\n\nNational Directory of Mental Health Treatment Facilities 2020\n\nDEPARTMENT OF HEALTH AND HUMAN SERVICES Substance Abuse and Mental Health Services Administration\n\nACKNOWLEDGMENTS\n\nThis publication was prepared for the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA), U.S. Department of Health and Human Services (HHS). Work was performed under Contract HHSS283200700048I/HHSS283201600001C, Reference No. 283-16-0490. The Contracting Officer\'s Representative at SAMHSA was Nichele Waller.\n\nSAMHSA complies with applicable federal civil rights laws and does not discriminate on the basis of race, color, national origin, age, disability, or sex. SAMHSA cumple con las leyes federales de derechos civiles aplicables y no discrimina por motivos de raza, color, nacionalidad, edad, discapacidad o sexo.\n\nPUBLIC DOMAIN NOTICE\n\nAll mat

In [49]:
from langchain.vectorstores import Chroma, Pinecone

In [52]:
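# Pass the embeddings model, not the precomputed vector list stored in `embeddings`.
# The index named by index_name must already exist in your Pinecone project.
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings_model, index_name=index_name)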

ValueError: No active indexes found in your Pinecone project, are you sure you're using the right API key and environment?

In [None]:
query = "What is the leading cause of death for Asian American and Pacific Islander youth aged 12-19 years old"
docs = docsearch.similarity_search(query)