# Week 2

*Welcome back! Scroll down to week 4.*

We first install the required software: LangChain and `pypdf`, which LangChain's PDF loaders depend on.

In [1]:
!pip install --upgrade pip 
!pip install --upgrade langchain pypdf



In [2]:
!pip install pypdf


We then import LangChain's `pypdf`-based loaders: one for a single file and one for a whole directory.

In [3]:
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

Let's first load our PDF... 

In [4]:
loader = PyPDFLoader("Data/2021-census-population-occupied-private-dwellings-community-2001-2021.pdf")

In [5]:
pdf = loader.load_and_split()

In [6]:
loader = PyPDFDirectoryLoader("Data/Supreme Court opinions 2014/")

In [7]:
many_pdfs = loader.load_and_split()
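
`load_and_split()` returns chunks of text rather than whole files. As a rough illustration of what "splitting" means, here is a minimal fixed-size splitter with overlap. It is a simplified stand-in, not LangChain's own splitter; `split_text`, `chunk_size`, and `overlap` are names assumed for this sketch only.

```python
def split_text(text, chunk_size=40, overlap=10):
    """Cut text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a chunk
    boundary also appears at the start of the next chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Made-up sample text, only for demonstration.
sample = "The quick brown fox jumps over the lazy dog. " * 3
chunks = split_text(sample)
for chunk in chunks:
    print(repr(chunk))
```

Smaller chunks give more precise retrieval hits but less context per hit, so the chunk size is a trade-off rather than a fixed rule.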

Having loaded both the single PDF and a directory of PDFs, let's now load the CSV. 

In [8]:
from langchain.document_loaders.csv_loader import CSVLoader

In [9]:
loader = CSVLoader("Data/Urban_Design_and_Architecture_Awards_Recipients.csv")

In [10]:
csv = loader.load()
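
Judging by the retrieval output later in this notebook, each CSV row becomes one document whose text is a list of `column: value` lines, with the row number and source file kept as metadata. A minimal sketch of that idea using only the standard library follows. It is not the `CSVLoader` implementation; `rows_to_documents` and the sample data are made up for illustration.

```python
import csv
import io


def rows_to_documents(csv_text, source):
    """Turn each CSV row into a dict with page_content and metadata."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader):
        # One "column: value" line per field, joined into a single text block.
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append({
            "page_content": content,
            "metadata": {"row": row_number, "source": source},
        })
    return documents


# Made-up sample data, not taken from the awards file.
sample = "AWARD_WINNER,AWARD_YEAR\nExample Project,2007\nAnother Project,2009\n"
docs = rows_to_documents(sample, source="example.csv")
print(docs[0]["page_content"])
print(docs[1]["metadata"])
```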

Having loaded our data, we'll now download and load the embedding model.

In [11]:
!pip install sentence_transformers > /dev/null

In [12]:
from langchain.embeddings import HuggingFaceEmbeddings, SentenceTransformerEmbeddings

In [13]:
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

Let's try embedding some text. Observe the output. Once you've tried it, scroll down to continue.

In [14]:
text = "This is a test document."

In [15]:
embeddings.embed_query(text)

[-0.03833852708339691,
 0.1234646886587143,
 -0.028642937541007996,
 0.05365273728966713,
 0.00884535163640976,
 -0.03983931988477707,
 -0.07300585508346558,
 0.04777132347226143,
 -0.030462520197033882,
 0.05497976765036583,
 0.08505293726921082,
 0.0366566926240921,
 -0.005319987423717976,
 -0.0022331285290420055,
 -0.06071098893880844,
 -0.027237899601459503,
 -0.011351611465215683,
 -0.042437728494405746,
 0.009129906073212624,
 0.100815549492836,
 0.07578731328248978,
 0.06911718100309372,
 0.009857481345534325,
 -0.0018377420492470264,
 0.026249045506119728,
 0.032902419567108154,
 -0.07177435606718063,
 0.028384260833263397,
 0.061709530651569366,
 -0.052529558539390564,
 0.03366165980696678,
 0.07446815818548203,
 0.07536036521196365,
 0.03538402169942856,
 0.06713403761386871,
 0.010798039846122265,
 0.08167023211717606,
 0.01656291075050831,
 0.03283059597015381,
 0.03632567450404167,
 0.002172845648601651,
 -0.09895741194486618,
 0.005046740174293518,
 0.05089650675654411,
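
The output is a long list of floats: the embedding vector for our text. Texts with similar meanings should land close together in this vector space, and one common way to measure closeness is cosine similarity. The sketch below uses tiny hand-written vectors instead of real embeddings so that it runs without the model. The vectors and their labels are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (hypothetical values).
v_cat = [0.9, 0.1, 0.0]
v_kitten = [0.8, 0.2, 0.1]
v_invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(v_cat, v_kitten))   # close to 1: similar direction
print(cosine_similarity(v_cat, v_invoice))  # close to 0: unrelated direction
```

With the real model, the same function can be applied to `embeddings.embed_query(a)` and `embeddings.embed_query(b)` for any two strings `a` and `b`.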
 

We now have a working embedding function. Let's install Chroma.

In [16]:
!pip install -U chromadb

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
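
The tokenizers warning above is harmless: it appears because `!pip` forks a new process after the embedding model's tokenizer has already been used. The message itself suggests setting `TOKENIZERS_PARALLELISM`; a minimal sketch of doing that from Python, assuming you are happy to disable tokenizer parallelism for this notebook:

```python
import os

# Set this before the tokenizer is first used (ideally in the first cell).
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```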




In [17]:
from langchain.vectorstores import Chroma

Let's make a vector store for our loaded documents!

In [18]:
db = Chroma.from_documents(pdf, embeddings)

In [19]:
db.add_documents(csv)

['8189c296-ace5-11ee-83fb-acde48001122',
 '8189c32c-ace5-11ee-83fb-acde48001122',
 '8189c368-ace5-11ee-83fb-acde48001122',
 '8189c39a-ace5-11ee-83fb-acde48001122',
 '8189c3c2-ace5-11ee-83fb-acde48001122',
 '8189c3ea-ace5-11ee-83fb-acde48001122',
 '8189c412-ace5-11ee-83fb-acde48001122',
 '8189c430-ace5-11ee-83fb-acde48001122',
 '8189c458-ace5-11ee-83fb-acde48001122',
 '8189c480-ace5-11ee-83fb-acde48001122',
 '8189c4a8-ace5-11ee-83fb-acde48001122',
 '8189c4d0-ace5-11ee-83fb-acde48001122',
 '8189c4f8-ace5-11ee-83fb-acde48001122',
 '8189c516-ace5-11ee-83fb-acde48001122',
 '8189c53e-ace5-11ee-83fb-acde48001122',
 '8189c566-ace5-11ee-83fb-acde48001122',
 '8189c58e-ace5-11ee-83fb-acde48001122',
 '8189c5b6-ace5-11ee-83fb-acde48001122',
 '8189c5de-ace5-11ee-83fb-acde48001122',
 '8189c5fc-ace5-11ee-83fb-acde48001122',
 '8189c624-ace5-11ee-83fb-acde48001122',
 '8189c64c-ace5-11ee-83fb-acde48001122',
 '8189c674-ace5-11ee-83fb-acde48001122',
 '8189c69c-ace5-11ee-83fb-acde48001122',
 '8189c6c4-ace5-

In [20]:
db.add_documents(many_pdfs)

['86a99e86-ace5-11ee-83fb-acde48001122',
 '86a99f9e-ace5-11ee-83fb-acde48001122',
 '86a99fe4-ace5-11ee-83fb-acde48001122',
 '86a9a020-ace5-11ee-83fb-acde48001122',
 '86a9a048-ace5-11ee-83fb-acde48001122',
 '86a9a07a-ace5-11ee-83fb-acde48001122',
 '86a9a0ac-ace5-11ee-83fb-acde48001122',
 '86a9a0d4-ace5-11ee-83fb-acde48001122',
 '86a9a0fc-ace5-11ee-83fb-acde48001122',
 '86a9a12e-ace5-11ee-83fb-acde48001122',
 '86a9a160-ace5-11ee-83fb-acde48001122',
 '86a9a188-ace5-11ee-83fb-acde48001122',
 '86a9a1ba-ace5-11ee-83fb-acde48001122',
 '86a9a1e2-ace5-11ee-83fb-acde48001122',
 '86a9a20a-ace5-11ee-83fb-acde48001122',
 '86a9a23c-ace5-11ee-83fb-acde48001122',
 '86a9a264-ace5-11ee-83fb-acde48001122',
 '86a9a296-ace5-11ee-83fb-acde48001122',
 '86a9a2be-ace5-11ee-83fb-acde48001122',
 '86a9a2f0-ace5-11ee-83fb-acde48001122',
 '86a9a318-ace5-11ee-83fb-acde48001122',
 '86a9a340-ace5-11ee-83fb-acde48001122',
 '86a9a372-ace5-11ee-83fb-acde48001122',
 '86a9a39a-ace5-11ee-83fb-acde48001122',
 '86a9a3c2-ace5-

Let's try retrieving a relevant document.

In [21]:
query = "An award concerning art."
db.similarity_search(query)

[Document(page_content='\ufeffX: 591976.7816\nY: 4790547.4424\nOBJECTID: 86\nAWARD_WINNER: The James North Art Crawl\nPROJECT_DESCRIPTION: On the second Friday evening of every month this event programs the historic James Street North streetscape from Murray to King Street with an eclectic array of gallery openings, performances and outdoor art reflective of the emerging arts community in t\nRECIPIENT: The Gallery and Studies of the James North Community\nAWARD_YEAR: 2007\nCATEGORY: Award of Merit for Visionary Project\nLOCATION: James Street North between Murray and King Streets\nCOMMUNITY: Hamilton\nLATITUDE: 43.2621251\nLONGITUDE: -79.8667415', metadata={'row': 85, 'source': 'Data/Urban_Design_and_Architecture_Awards_Recipients.csv'}),
 Document(page_content='\ufeffX: 591942.8531\nY: 4790007.4241\nOBJECTID: 45\nAWARD_WINNER: Empire Times\nPROJECT_DESCRIPTION: The project is an adaptive re-use of an historic building into a performing arts centre and affordable housing for artists. T

In [22]:
query = "What exceptions does Rule 606(b)(1) contain?"
db.similarity_search(query)

[Document(page_content='2 PEREZ v. MORTGAGE BANKERS ASSN. \nSCALIA , J., concurring in judgment \nadministrators whose zeal might otherwise have carried \nthem to excesses not contemplated in legislation creating\ntheir offices.” United States v. Morton Salt Co., 338 U. S. \n632, 644 (1950). The Act guards against excesses in rule-\nmaking by requiring notice and comment.  Before an \nagency makes a rule, it normally must notify the public of\nthe proposal, invite them to comment on its shortcomings, \nconsider and respond to their arguments, and explain itsfinal decision in a statement of the rule’s basis and pur-\npose. 5 U. S. C. §553(b)–(c); ante, at 2. \nThe APA exempts interpretive rules from these re-\nquirements.  §553(b)(A). But this concession to agencies\nwas meant to be more modest in its effects than it is today.For despite exempting interpretive rules from notice and\ncomment, the Act provides that “the reviewing court \nshall . . . interpret constituti onal and statutory
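
Conceptually, `similarity_search` embeds the query and returns the stored documents whose vectors are nearest to it. The sketch below does this by brute force over a tiny in-memory list. It assumes cosine similarity as the measure (the metric a real store uses depends on its configuration), and the texts and vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_search(query_vector, store, k=2):
    """Return the k texts whose vectors are most similar to query_vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, vector) pairs standing in for embedded documents.
store = [
    ("award for a public art event", [0.9, 0.1, 0.0]),
    ("court opinion on agency rulemaking", [0.0, 0.2, 0.9]),
    ("award for a performing arts centre", [0.8, 0.3, 0.1]),
]
query_vector = [1.0, 0.0, 0.0]

print(similarity_search(query_vector, store, k=2))
```

Note that a search like this always returns the nearest neighbours, whether or not they actually answer the question. In the output above, the top hit shown for the Rule 606(b)(1) query is a passage about the APA's notice-and-comment requirements, so it is worth reading the results rather than trusting the ranking blindly.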

# Week 3

We'll now try making an agent. We'll start by downloading a library to let us run a local language model.

If you're on Windows, a non-Apple Silicon Mac, or Linux, use this command:

In [23]:
!pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.27.tar.gz (9.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.4/9.4 MB 16.5 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Downloading numpy-1.24.4-cp38-cp38-macosx_10_9_x86_64.whl.metadata (5.6 kB)
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 22.8 MB/s eta 0:00:00


If you're on Apple Silicon, use this command instead:

In [None]:
!CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python  --upgrade --force-reinstall --no-cache-dir

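Let's import the LangChain wrapper for the library and set up a callback that streams the model's output as it is generated.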

In [24]:
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

With that loaded, let's download a language model.

In [25]:
!wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_0.gguf

zsh:1: command not found: wget
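
The `wget` command above failed because `wget` isn't installed on this machine, so the model file still needs to be fetched some other way. One option that needs no extra tools is Python's standard library. The sketch below is one possible approach, not the only one; `filename_from_url` and `download` are helper names made up here.

```python
import os
import urllib.request
from urllib.parse import urlparse

MODEL_URL = (
    "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/"
    "resolve/main/mistral-7b-instruct-v0.1.Q4_0.gguf"
)


def filename_from_url(url):
    """Use the last path component of the URL as the local filename."""
    return os.path.basename(urlparse(url).path)


def download(url, destination=None):
    """Download url to destination, skipping the download if the file exists."""
    destination = destination or filename_from_url(url)
    if os.path.exists(destination):
        return destination
    urllib.request.urlretrieve(url, destination)
    return destination


print(filename_from_url(MODEL_URL))

# The model is a multi-gigabyte file, so the call is left commented out:
# download(MODEL_URL)
```

The resulting filename matches the `model_path` used when loading the model.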



Okay! Let's try loading it.

In [25]:
llm = LlamaCpp(
    model_path="mistral-7b-instruct-v0.1.Q4_0.gguf",
    temperature=0.8,
    max_tokens=200,
    n_ctx=4096,
    top_p=1,
    n_gpu_layers=-1,
    f16_kv=True,
    verbose=True,
    callback_manager=callback_manager
)

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from mistral-7b-instruct-v0.1.Q4_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attent

Let's try it out!

In [27]:
llm("Where is Chesterfield, Mo.?")

  warn_deprecated(
Llama.generate: prefix-match hit



A: Chesterfield, Mo. is a city located in the state of Missouri, United States. It is about 20 miles southwest of St. Louis and is part of the St. Louis metropolitan area.


llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =      16.12 ms /    48 runs   (    0.34 ms per token,  2977.30 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   41248.29 ms /    48 runs   (  859.34 ms per token,     1.16 tokens per second)
llama_print_timings:       total time =   41488.62 ms


'\nA: Chesterfield, Mo. is a city located in the state of Missouri, United States. It is about 20 miles southwest of St. Louis and is part of the St. Louis metropolitan area.'
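
The `llama_print_timings` lines tell us how fast the model runs on this machine. The arithmetic behind the `eval time` line can be reproduced as follows, using the numbers from the output above; the extrapolation to a full-length reply assumes the per-token speed stays constant.

```python
# Values copied from the "eval time" line above.
eval_time_ms = 41248.29
tokens = 48

ms_per_token = eval_time_ms / tokens
tokens_per_second = 1000 / ms_per_token

print(f"{ms_per_token:.2f} ms per token")
print(f"{tokens_per_second:.2f} tokens per second")

# Rough estimate for a reply that uses all max_tokens=200 tokens.
max_tokens = 200
full_reply_seconds = max_tokens * ms_per_token / 1000
print(f"about {full_reply_seconds:.0f} s for a {max_tokens}-token reply")
```

At roughly one token per second, every extra token in a prompt or reply is noticeable, which matters once the agent starts sending long prompts.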

Having verified the language model works, let's try making a tool for it to access our vector store.

In [28]:
from langchain.chains import RetrievalQA

Let's define our agent.

In [29]:
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.tools import BaseTool
from langchain.chains import LLMMathChain
from langchain.utilities import SerpAPIWrapper
from langchain.agents.agent_toolkits import create_retriever_tool

In [30]:
retriever = db.as_retriever()

In [31]:
tools = [
    create_retriever_tool(
        retriever,
        name="Search knowledge",
        description="Useful for when you need to answer a question. If the user asks a question concerning the Supreme Court or Hamilton, Ontario, find the answer to their question with this tool. Only use this tool once.",
    ),
]

In [32]:
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

  warn_deprecated(
  warn_deprecated(
  warn_deprecated(


And let's try it out.

In [33]:
agent.run("Who, in Hamilton, won an award for art?")

  warn_deprecated(
  warn_deprecated(
  warn_deprecated(
Llama.generate: prefix-match hit




> Entering new AgentExecutor chain...
 First, try searching knowledge.
Action: Search knowledge.
Action Input: Hamilton, art award.


llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =       9.15 ms /    25 runs   (    0.37 ms per token,  2731.34 tokens per second)
llama_print_timings: prompt eval time =  163720.46 ms /   190 tokens (  861.69 ms per token,     1.16 tokens per second)
llama_print_timings:        eval time =   20754.69 ms /    24 runs   (  864.78 ms per token,     1.16 tokens per second)
llama_print_timings:       total time =  184661.72 ms
  warn_deprecated(
Llama.generate: prefix-match hit


 First, try searching knowledge.
Action: Search knowledge.
Action Input: Hamilton, art award.
Observation: Search knowledge. is not a valid tool, try one of [Search knowledge].
Thought: I see, the search didn't work, let me try using [Search knowledge] instead.
Action: Search knowledge.
Action Input: Who won an art award in Hamilton?.
 I see, the search didn't work, let me try using [Search knowledge] instead.
Action: Search knowledge.
Action Input: Who won an art award in Hamilton?.
Observation: Search knowledge. is not a valid tool, try one of [Search knowledge].
Thought:


llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =      14.71 ms /    42 runs   (    0.35 ms per token,  2855.98 tokens per second)
llama_print_timings: prompt eval time =   17659.80 ms /    21 tokens (  840.94 ms per token,     1.19 tokens per second)
llama_print_timings:        eval time =   35373.34 ms /    41 runs   (  862.76 ms per token,     1.16 tokens per second)
llama_print_timings:       total time =   53235.12 ms
  warn_deprecated(
Llama.generate: prefix-match hit


 Alright, searching didn't yield any results, it looks like I need to think about this more.
Final Answer: Unknown.
 Alright, searching didn't yield any results, it looks like I need to think about this more.
Final Answer: Unknown.

> Finished chain.



llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =      10.60 ms /    30 runs   (    0.35 ms per token,  2831.26 tokens per second)
llama_print_timings: prompt eval time =   17939.50 ms /    21 tokens (  854.26 ms per token,     1.17 tokens per second)
llama_print_timings:        eval time =   25211.42 ms /    29 runs   (  869.36 ms per token,     1.15 tokens per second)
llama_print_timings:       total time =   43300.58 ms


'Unknown.'
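
The trace shows why the agent gave up: the model wrote the tool name as `Search knowledge.` with a trailing period, and that string does not exactly match the registered name `Search knowledge`. The sketch below reproduces the mismatch and shows one hypothetical workaround, cleaning the name before the lookup. It is an illustration of the failure mode, not LangChain's parsing code; `parse_action`, `lookup_exact`, and `lookup_normalized` are made-up helpers.

```python
import re
import string

# A stand-in tool registry with the same tool name as our agent.
TOOLS = {"Search knowledge": lambda query: f"(results for {query!r})"}


def parse_action(llm_output):
    """Extract the tool name and tool input from ReAct-style text."""
    match = re.search(r"Action:\s*(.*?)\s*\n\s*Action Input:\s*(.*)", llm_output)
    if match is None:
        raise ValueError("no action found in model output")
    return match.group(1), match.group(2).strip()


def lookup_exact(name):
    """Exact-match lookup: fails on any stray character."""
    return TOOLS.get(name)


def lookup_normalized(name):
    """Strip surrounding whitespace and trailing punctuation before the lookup."""
    cleaned = name.strip().rstrip(string.punctuation).strip()
    return TOOLS.get(cleaned)


# Model output copied from the trace above.
output = (
    " First, try searching knowledge.\n"
    "Action: Search knowledge.\n"
    "Action Input: Hamilton, art award."
)
name, tool_input = parse_action(output)

print(repr(name))                            # the name includes the period
print(lookup_exact(name))                    # None: no such tool
print(lookup_normalized(name)(tool_input))   # found once the period is stripped
```

Other possible remedies include a tool name without spaces, a lower temperature, or a larger model that follows the expected format more reliably; which of these helps most here is untested.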

# Week 4

Welcome back! We'll now try making an interface for our agent. First, let's download Gradio.

In [34]:
!pip install gradio==3.48.0


Next, let's define a `predict` function. This will let our Gradio interface talk to our agent.

In [35]:
import gradio as gr

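# Unused: Gradio supplies the chat history as the second argument to `predict`.
history = []

def predict(message, history):
    # Wrap the user's message in Mistral's instruction tags before passing it to the agent.
    response = agent.run("<s>[INST] " + message + " [/INST]")
    return response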
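
Gradio's `ChatInterface` calls this function with the new message and the chat history, and displays whatever string it returns. Because each model call takes a while on this machine, it can help to check the wiring with a stand-in agent first. The sketch below does that; `format_prompt`, `make_predict`, `fake_agent`, and `predict_stub` are names made up for this example, and the real agent is not involved.

```python
def format_prompt(message):
    """Wrap a user message in Mistral's instruction tags."""
    return "<s>[INST] " + message + " [/INST]"


def make_predict(run_agent):
    """Build a Gradio-style callback around any function that answers a prompt."""
    def predict(message, history):
        # `history` is accepted because Gradio passes it, but it is not used here.
        return run_agent(format_prompt(message))
    return predict


def fake_agent(prompt):
    """Stand-in for agent.run: echoes the prompt it was given."""
    return f"agent received: {prompt}"


predict_stub = make_predict(fake_agent)
reply = predict_stub("Who won an award for art?", history=[])
print(reply)
```

Swapping `fake_agent` for `agent.run` gives the same behaviour as the `predict` function defined above.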

With that done, let's try loading an interface in just a few lines of code.

In [36]:
gr.ChatInterface(predict, theme="soft", title="My Fancy Chatbot").launch(share=False)

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




  warn_deprecated(
  warn_deprecated(
  warn_deprecated(
Llama.generate: prefix-match hit




> Entering new AgentExecutor chain...
 I'm just a program, so I don't have feelings or emotions
Action: Search knowledge
Action Input: "how do robots feel?"
 I'm just a program, so I don't have feelings or emotions
Action: Search knowledge
Action Input: "how do robots feel?"
Observation: [Document(page_content='2 YATES v. UNITED STATES \nKAGAN , J., dissenting \nmore conventional result: A “tangible object” is an object\nthat’s tangible.  I would apply the statute that Congress\nenacted and affirm the judgment below. \nI \nWhile the plurality starts its analysis with §1519’s\nheading, see ante, at 10 (“We note first §1519’s caption”), I \nwould begin with §1519’s text. When Congress has notsupplied a definition, we generally give a statutory term its ordinary meaning. See, e.g., Schindler Elevator Corp. \nv. United States ex rel. Kirk , 563 U. S. ___, ___ (2011) (slip \nop., at 5). As the plurality must acknowledge, the ordi-\nnary meaning of “tan


llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =      13.78 ms /    36 runs   (    0.38 ms per token,  2612.86 tokens per second)
llama_print_timings: prompt eval time =   17196.28 ms /    19 tokens (  905.07 ms per token,     1.10 tokens per second)
llama_print_timings:        eval time =   31035.91 ms /    35 runs   (  886.74 ms per token,     1.13 tokens per second)
llama_print_timings:       total time =   48418.68 ms
  warn_deprecated(
Llama.generate: prefix-match hit


Super! It works. We can now talk to an agent hooked up to some data through a fancy website.