# Week 2

*Welcome back! Scroll down to week 4.*

We first install the required software: LangChain and `pypdf`, the library its PDF loaders rely on.

In [1]:
!pip install --upgrade pip 
!pip install --upgrade langchain pypdf



In [2]:
!pip install pypdf



We then import LangChain's PDF loaders, which are built on `pypdf`.

In [3]:
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

Let's first load a single PDF, then a whole directory of PDFs.

In [4]:
loader = PyPDFLoader("Data/2021-census-population-occupied-private-dwellings-community-2001-2021.pdf")

In [5]:
pdf = loader.load_and_split()

In [6]:
loader = PyPDFDirectoryLoader("Data/Supreme Court opinions 2014/")

In [7]:
many_pdfs = loader.load_and_split()
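`load_and_split()` returns a list of smaller chunks rather than whole files, which is what makes fine-grained retrieval possible later. As a rough illustration of the idea only, here is fixed-size chunking with overlap in plain Python. This is not LangChain's splitter, and the `split_text` helper, `chunk_size`, and `overlap` values are made up for the example.

```python
from typing import List


def split_text(source_text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Cut text into windows of chunk_size characters that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    pieces = []
    for start in range(0, len(source_text), step):
        pieces.append(source_text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(source_text):
            break
    return pieces


# Hypothetical page text standing in for one page of a PDF.
page = "The quick brown fox jumps over the lazy dog. " * 3
chunks = split_text(page)
for index, chunk in enumerate(chunks):
    print(index, repr(chunk))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.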

Having loaded both the single PDF and a directory of PDFs, let's now load the CSV. 

In [8]:
from langchain.document_loaders.csv_loader import CSVLoader

In [9]:
loader = CSVLoader("Data/Urban_Design_and_Architecture_Awards_Recipients.csv")

In [10]:
csv = loader.load()
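Judging by the search results further down, each CSV row becomes one document whose text is a list of `column: value` lines, with the row number and source file kept as metadata. The sketch below imitates that shape with the standard library. It is an approximation for illustration, not the loader's code; `rows_to_documents` and the sample rows are hypothetical. The module is imported under an alias so it does not overwrite the notebook's `csv` variable.

```python
import csv as csv_module  # alias avoids clobbering the notebook's `csv` list
import io

# Made-up sample data; the leading \ufeff imitates a byte-order mark in the file.
sample = "\ufeffAWARD_WINNER,AWARD_YEAR\nExample Project A,2001\nExample Project B,2002\n"


def rows_to_documents(handle, source):
    """Turn each CSV row into a dict with 'page_content' and 'metadata' keys."""
    documents = []
    for row_number, row in enumerate(csv_module.DictReader(handle)):
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"row": row_number, "source": source}}
        )
    return documents


raw_docs = rows_to_documents(io.StringIO(sample), "sample.csv")
clean_docs = rows_to_documents(io.StringIO(sample.lstrip("\ufeff")), "sample.csv")
print(repr(raw_docs[0]["page_content"]))
print(repr(clean_docs[0]["page_content"]))
```

The first printed string starts with `\ufeff`, just like the retrieved documents later in this notebook. That marker comes from the file's encoding; reading the file as `utf-8-sig` is the usual way to drop it.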

Having loaded our data, we'll now download and load the embedding model.

In [11]:
!pip install sentence_transformers > /dev/null

In [12]:
from langchain.embeddings import HuggingFaceEmbeddings, SentenceTransformerEmbeddings

In [13]:
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

Let's try embedding some text. Observe the output. Once you've tried it, scroll down to continue.

In [14]:
text = "This is a test document."

In [15]:
embeddings.embed_query(text)

[-0.03833852708339691,
 0.1234646886587143,
 -0.028642937541007996,
 0.05365273728966713,
 0.00884535163640976,
 -0.03983931988477707,
 -0.07300585508346558,
 0.04777132347226143,
 -0.030462520197033882,
 0.05497976765036583,
 0.08505293726921082,
 0.0366566926240921,
 -0.005319987423717976,
 -0.0022331285290420055,
 -0.06071098893880844,
 -0.027237899601459503,
 -0.011351611465215683,
 -0.042437728494405746,
 0.009129906073212624,
 0.100815549492836,
 0.07578731328248978,
 0.06911718100309372,
 0.009857481345534325,
 -0.0018377420492470264,
 0.026249045506119728,
 0.032902419567108154,
 -0.07177435606718063,
 0.028384260833263397,
 0.061709530651569366,
 -0.052529558539390564,
 0.03366165980696678,
 0.07446815818548203,
 0.07536036521196365,
 0.03538402169942856,
 0.06713403761386871,
 0.010798039846122265,
 0.08167023211717606,
 0.01656291075050831,
 0.03283059597015381,
 0.03632567450404167,
 0.002172845648601651,
 -0.09895741194486618,
 0.005046740174293518,
 0.05089650675654411,
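
The output is a long list of floats: the embedding of our sentence. Texts with similar meaning should get vectors that point in similar directions. Cosine similarity is one common way to measure that. The sketch below uses tiny made-up 3-dimensional vectors in place of real embeddings, and the vector store used later may rely on a different distance measure.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not real model output.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))
print(cosine_similarity(cat, car))
```

To compare real embeddings, the same function could be applied to two results of `embeddings.embed_query(...)`.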
 

We now have a working embedding function. Let's install Chroma.

In [16]:
!pip install -U chromadb

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
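
The warning above is harmless, and it names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. One way to do that from inside the notebook is sketched below, assuming the cell runs before the embedding model is first used (for example, at the very top).

```python
import os

# "false" disables tokenizer parallelism up front, which is what the library
# falls back to anyway after printing the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```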




In [17]:
from langchain.vectorstores import Chroma

Let's make a vector store for our loaded documents!

In [18]:
db = Chroma.from_documents(pdf, embeddings)

In [19]:
db.add_documents(csv)

['ea857076-abf8-11ee-8113-acde48001122',
 'ea857116-abf8-11ee-8113-acde48001122',
 'ea857152-abf8-11ee-8113-acde48001122',
 'ea857184-abf8-11ee-8113-acde48001122',
 'ea8571b6-abf8-11ee-8113-acde48001122',
 'ea8571e8-abf8-11ee-8113-acde48001122',
 'ea857210-abf8-11ee-8113-acde48001122',
 'ea857242-abf8-11ee-8113-acde48001122',
 'ea85726a-abf8-11ee-8113-acde48001122',
 'ea857292-abf8-11ee-8113-acde48001122',
 'ea8572ba-abf8-11ee-8113-acde48001122',
 'ea8572d8-abf8-11ee-8113-acde48001122',
 'ea857300-abf8-11ee-8113-acde48001122',
 'ea85731e-abf8-11ee-8113-acde48001122',
 'ea857346-abf8-11ee-8113-acde48001122',
 'ea857364-abf8-11ee-8113-acde48001122',
 'ea85738c-abf8-11ee-8113-acde48001122',
 'ea8573b4-abf8-11ee-8113-acde48001122',
 'ea8573dc-abf8-11ee-8113-acde48001122',
 'ea8573fa-abf8-11ee-8113-acde48001122',
 'ea857418-abf8-11ee-8113-acde48001122',
 'ea857440-abf8-11ee-8113-acde48001122',
 'ea85745e-abf8-11ee-8113-acde48001122',
 'ea8574d6-abf8-11ee-8113-acde48001122',
 'ea857508-abf8-

In [20]:
db.add_documents(many_pdfs)

['56da954e-abf9-11ee-8113-acde48001122',
 '56da9652-abf9-11ee-8113-acde48001122',
 '56da968e-abf9-11ee-8113-acde48001122',
 '56da96b6-abf9-11ee-8113-acde48001122',
 '56da96e8-abf9-11ee-8113-acde48001122',
 '56da971a-abf9-11ee-8113-acde48001122',
 '56da9742-abf9-11ee-8113-acde48001122',
 '56da976a-abf9-11ee-8113-acde48001122',
 '56da979c-abf9-11ee-8113-acde48001122',
 '56da97c4-abf9-11ee-8113-acde48001122',
 '56da97ec-abf9-11ee-8113-acde48001122',
 '56da981e-abf9-11ee-8113-acde48001122',
 '56da9846-abf9-11ee-8113-acde48001122',
 '56da986e-abf9-11ee-8113-acde48001122',
 '56da9896-abf9-11ee-8113-acde48001122',
 '56da98be-abf9-11ee-8113-acde48001122',
 '56da98e6-abf9-11ee-8113-acde48001122',
 '56da9918-abf9-11ee-8113-acde48001122',
 '56da9940-abf9-11ee-8113-acde48001122',
 '56da9968-abf9-11ee-8113-acde48001122',
 '56da9990-abf9-11ee-8113-acde48001122',
 '56da99b8-abf9-11ee-8113-acde48001122',
 '56da99e0-abf9-11ee-8113-acde48001122',
 '56da9a08-abf9-11ee-8113-acde48001122',
 '56da9a3a-abf9-

Let's try retrieving a relevant document.

In [21]:
query = "An award concerning art."
db.similarity_search(query)

[Document(page_content='\ufeffX: 591976.7816\nY: 4790547.4424\nOBJECTID: 86\nAWARD_WINNER: The James North Art Crawl\nPROJECT_DESCRIPTION: On the second Friday evening of every month this event programs the historic James Street North streetscape from Murray to King Street with an eclectic array of gallery openings, performances and outdoor art reflective of the emerging arts community in t\nRECIPIENT: The Gallery and Studies of the James North Community\nAWARD_YEAR: 2007\nCATEGORY: Award of Merit for Visionary Project\nLOCATION: James Street North between Murray and King Streets\nCOMMUNITY: Hamilton\nLATITUDE: 43.2621251\nLONGITUDE: -79.8667415', metadata={'row': 85, 'source': 'Data/Urban_Design_and_Architecture_Awards_Recipients.csv'}),
 Document(page_content='\ufeffX: 591942.8531\nY: 4790007.4241\nOBJECTID: 45\nAWARD_WINNER: Empire Times\nPROJECT_DESCRIPTION: The project is an adaptive re-use of an historic building into a performing arts centre and affordable housing for artists. T

In [22]:
query = "What exceptions does Rule 606(b)(1) contain?"
db.similarity_search(query)

[Document(page_content='2 PEREZ v. MORTGAGE BANKERS ASSN. \nSCALIA , J., concurring in judgment \nadministrators whose zeal might otherwise have carried \nthem to excesses not contemplated in legislation creating\ntheir offices.” United States v. Morton Salt Co., 338 U. S. \n632, 644 (1950). The Act guards against excesses in rule-\nmaking by requiring notice and comment.  Before an \nagency makes a rule, it normally must notify the public of\nthe proposal, invite them to comment on its shortcomings, \nconsider and respond to their arguments, and explain itsfinal decision in a statement of the rule’s basis and pur-\npose. 5 U. S. C. §553(b)–(c); ante, at 2. \nThe APA exempts interpretive rules from these re-\nquirements.  §553(b)(A). But this concession to agencies\nwas meant to be more modest in its effects than it is today.For despite exempting interpretive rules from notice and\ncomment, the Act provides that “the reviewing court \nshall . . . interpret constituti onal and statutory
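
Conceptually, `similarity_search` embeds the query, compares it against every stored vector, and returns the closest documents. The toy store below sketches that loop in plain Python. It is not how Chroma is implemented: `ToyVectorStore` is hypothetical, and word counts stand in for a real embedding model.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(sentence.lower().replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self.entries: List[Tuple[str, Counter]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Store each text next to its vector so it can be returned later.
        for item in texts:
            self.entries.append((item, embed(item)))

    def similarity_search(self, query_text: str, k: int = 2) -> List[str]:
        query_vector = embed(query_text)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True
        )
        return [item for item, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "An award for a public art gallery",
    "A ruling on agency rulemaking procedures",
    "Census population counts by community",
])
results = store.similarity_search("award concerning art", k=2)
print(results)
```

A real vector store adds persistence and an index so it does not have to scan every entry, but the ranking idea is the same.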

# Week 3

We'll now try making an agent. We'll start by downloading a library to let us run a local language model.

If you're on Windows, a non-Apple Silicon Mac, or Linux, use this command:

In [23]:
!pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.27.tar.gz (9.4 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Downloading numpy-1.26.3-cp311-cp311-macosx_10_9_x86_64.whl.metadata (61 kB)
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (

If you're on Apple Silicon, use this command instead:

In [None]:
!CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir

Let's load the software.

In [24]:
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

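With that loaded, let's download a language model. The `wget` command is not installed by default on macOS, which is why the cell below fails in this run; `curl -L -O` followed by the same URL is one alternative.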

In [25]:
!wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_0.gguf

zsh:1: command not found: wget
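
If `wget` is unavailable, as in the output above, the file can also be fetched from Python itself. The sketch below uses only the standard library; `filename_from_url` and `download_if_missing` are hypothetical helper names. The actual download call is left commented out because the file is several gigabytes.

```python
import os
import urllib.request
from urllib.parse import urlparse

MODEL_URL = (
    "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/"
    "resolve/main/mistral-7b-instruct-v0.1.Q4_0.gguf"
)


def filename_from_url(url: str) -> str:
    """Return the last path component of the URL, used as the local file name."""
    return os.path.basename(urlparse(url).path)


def download_if_missing(url: str, directory: str = ".") -> str:
    """Download the file into `directory` unless it is already there."""
    target = os.path.join(directory, filename_from_url(url))
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target


print(filename_from_url(MODEL_URL))
# download_if_missing(MODEL_URL)  # uncomment to fetch the model
```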


Okay! With the model file in the working directory, let's try loading it.

In [25]:
llm = LlamaCpp(
    model_path="mistral-7b-instruct-v0.1.Q4_0.gguf",
    temperature=0.8,
    max_tokens=200,
    n_ctx=4096,
    top_p=1,
    n_gpu_layers=-1,
    f16_kv=True,
    verbose=True,
    callback_manager=callback_manager
)

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from mistral-7b-instruct-v0.1.Q4_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attent

Let's try it out!

In [27]:
llm("Where is Chesterfield, Mo.?")

Llama.generate: prefix-match hit



A: Chesterfield is a city located in St. Louis County, Missouri, in the United States. It is situated approximately 20 miles south of downtown St. Louis and 10 miles northwest of the Illinois River.


llama_print_timings:        load time =   13687.86 ms
llama_print_timings:      sample time =      14.01 ms /    50 runs   (    0.28 ms per token,  3569.64 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   43443.33 ms /    50 runs   (  868.87 ms per token,     1.15 tokens per second)
llama_print_timings:       total time =   43629.89 ms


'\nA: Chesterfield is a city located in St. Louis County, Missouri, in the United States. It is situated approximately 20 miles south of downtown St. Louis and 10 miles northwest of the Illinois River.'
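
The `llama_print_timings` lines tell us how fast the model is generating. As a quick check of the arithmetic, the snippet below recomputes the per-token figures from the `eval time` line above (43443.33 ms for 50 runs). `tokens_per_second` is just a helper written for this example.

```python
def tokens_per_second(elapsed_ms: float, tokens: int) -> float:
    """Convert an elapsed time in milliseconds and a token count into tokens/s."""
    return tokens / (elapsed_ms / 1000.0)


eval_ms, runs = 43443.33, 50  # copied from the eval time line above
ms_per_token = round(eval_ms / runs, 2)
rate = round(tokens_per_second(eval_ms, runs), 2)
print(ms_per_token)  # 868.87
print(rate)          # 1.15
```

At roughly one token per second, a 200-token answer takes over three minutes, so this is the number to watch when changing settings such as `n_gpu_layers`.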

Having verified the language model works, let's try making a tool for it to access our vector store.

In [28]:
from langchain.chains import RetrievalQA

Let's define our agent.

In [29]:
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.tools import BaseTool
from langchain.chains import LLMMathChain
from langchain.utilities import SerpAPIWrapper
from langchain.agents.agent_toolkits import create_retriever_tool

In [30]:
retriever = db.as_retriever()

In [31]:
tools = [
    create_retriever_tool(
        retriever,
        name="Search knowledge",
        description="Useful for when you need to answer a question. If the user asks a question concerning the Supreme Court or Hamilton, Ontario, find the answer to their question with this tool. Only use this tool once.",
    ),
]

In [32]:
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
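
A zero-shot ReAct agent works by asking the model to write its reasoning followed by either an `Action:` / `Action Input:` pair or a `Final Answer:` line, and then parsing that text to decide what to do next. The sketch below shows the parsing step in simplified form. It is an illustration rather than LangChain's actual parser, and `parse_react_output` is a hypothetical name.

```python
import re
from typing import Dict


def parse_react_output(model_text: str) -> Dict[str, str]:
    """Classify model output as a tool call or a final answer."""
    if "Final Answer:" in model_text:
        answer = model_text.split("Final Answer:")[-1].strip()
        return {"type": "finish", "output": answer}
    match = re.search(
        r"Action\s*:\s*(.*?)\s*\n\s*Action\s*Input\s*:\s*(.*)", model_text, re.DOTALL
    )
    if match is None:
        raise ValueError("could not find an action or a final answer")
    return {
        "type": "action",
        "tool": match.group(1).strip(),
        "tool_input": match.group(2).strip().strip('"'),
    }


step = parse_react_output(
    "We should use the search knowledge tool.\n"
    "Action: Search knowledge\n"
    'Action Input: "awards for art"'
)
print(step)
done = parse_react_output("We found the winners.\nFinal Answer: The James North Art Crawl")
print(done)
```

Small local models do not always follow this format exactly, which is one reason the tool description above tells the model to use the tool only once.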

And let's try it out.

In [None]:
agent.run("Who, in Hamilton, won an award for art?")



> Entering new AgentExecutor chain...


Llama.generate: prefix-match hit


# Week 4

Welcome back! We'll now try making an interface for our agent. First, let's download Gradio.

In [35]:
!pip install gradio==3.48.0

Collecting gradio==3.48.0
  Using cached gradio-3.48.0-py3-none-any.whl.metadata (17 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio==3.48.0)
  Using cached aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting altair<6.0,>=4.2.0 (from gradio==3.48.0)
  Using cached altair-5.2.0-py3-none-any.whl.metadata (8.7 kB)
Collecting ffmpy (from gradio==3.48.0)
  Using cached ffmpy-0.3.1-py3-none-any.whl
Collecting gradio-client==0.6.1 (from gradio==3.48.0)
  Using cached gradio_client-0.6.1-py3-none-any.whl.metadata (7.1 kB)
Collecting httpx (from gradio==3.48.0)
  Using cached httpx-0.26.0-py3-none-any.whl.metadata (7.6 kB)
Collecting matplotlib~=3.0 (from gradio==3.48.0)
  Using cached matplotlib-3.8.2-cp311-cp311-macosx_10_12_x86_64.whl.metadata (5.8 kB)
Collecting orjson~=3.0 (from gradio==3.48.0)
  Using cached orjson-3.9.10-cp311-cp311-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl.metadata (49 kB)
Collecting pandas<3.0,>=1.0 (from gradio==3.48.0)
  Using cache

Next, let's define a `predict` function. This will let our Gradio interface talk to our agent.

In [41]:
import gradio as gr

history = []

def predict(message, history):
    response = agent.run("<s>[INST] " + message + " [/INST]")
    return response
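
Note that `predict` ignores its `history` argument, so every message is answered without any memory of earlier turns. The sketch below shows one way earlier turns could be folded into a Mistral-style instruct prompt. It assumes history arrives as a list of `[user, assistant]` pairs and that the model expects the `[INST] ... [/INST]` template used above. `build_mistral_prompt` is a hypothetical helper, and how well the agent copes with the longer prompt is untested here.

```python
from typing import Sequence


def build_mistral_prompt(message: str, past_turns: Sequence[Sequence[str]]) -> str:
    """Combine earlier turns and the new message into one instruct prompt."""
    prompt = "<s>"
    for user_turn, assistant_turn in past_turns:
        # Each finished exchange is closed with </s>.
        prompt += f"[INST] {user_turn} [/INST] {assistant_turn}</s>"
    prompt += f"[INST] {message} [/INST]"
    return prompt


print(build_mistral_prompt("Who won an award for art?", []))
print(build_mistral_prompt("And in which year?", [["Who won?", "The James North Art Crawl."]]))
```

With an empty history the result is identical to the string built in `predict`, so swapping it in would not change single-turn behaviour.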

With that done, let's try loading an interface in just a few lines of code.

In [50]:
gr.ChatInterface(predict, theme="soft", title="My Fancy Chatbot").launch(share=True)

Running on local URL:  http://127.0.0.1:7868
Running on public URL: https://e8d93b29f04146810c.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)

It does not appear that there was an award given for an artist specifically in Hamilton, Ontario. I should check the source of the data to see if it contains any information about artists specifically from Hamilton.
Action: Search knowledge
Action Input: "Artists from Hamilton, Ontario"
Observation: [Document(page_content="\ufeffX: 591517.3657\nY: 4789987.399\nOBJECTID: 99\nAWARD_WINNER: The Art Gallery of Hamilton Renewal\nPROJECT_DESCRIPTION: Alterations and additions to the 1970's era Art Gallery building to provide more income generating space, resolve building envelope concerns and improve the presence of the building in the city.\nRECIPIENT: Kuwabara Payne McKenna Blumbering Architects Inc., Yolles Partnership Inc., Smith & Anderson Consulting Engineers, Carinci Burt Rogers Engineering, Haisall Associates Ltd.


llama_print_timings:        load time =     234.60 ms
llama_print_timings:      sample time =       8.21 ms /    61 runs   (    0.13 ms per token,  7429.96 tokens per second)
llama_print_timings: prompt eval time =   16901.42 ms /  1122 tokens (   15.06 ms per token,    66.38 tokens per second)
llama_print_timings:        eval time =    1970.18 ms /    60 runs   (   32.84 ms per token,    30.45 tokens per second)
llama_print_timings:       total time =   19220.88 ms
Traceback (most recent call last):
  File "/Users/srhm/Library/jupyterlab-desktop/jlab_server/lib/python3.8/site-packages/gradio/routes.py", line 534, in predict
    output = await route_utils.call_process_api(
  File "/Users/srhm/Library/jupyterlab-desktop/jlab_server/lib/python3.8/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/srhm/Library/jupyterlab-desktop/jlab_server/lib/python3.8/site-packages/gradio/blocks.py", line 1550, in process



> Entering new AgentExecutor chain...


Llama.generate: prefix-match hit


To find out which businesses won awards for art last year, we should use the search knowledge tool.
Action: Search knowledge
Action Input: "awards for art businesses last year"
Observation: [Document(page_content="\ufeffX: 591517.3657\nY: 4789987.399\nOBJECTID: 99\nAWARD_WINNER: The Art Gallery of Hamilton Renewal\nPROJECT_DESCRIPTION: Alterations and additions to the 1970's era Art Gallery building to provide more income generating space, resolve building envelope concerns and improve the presence of the building in the city.\nRECIPIENT: Kuwabara Payne McKenna Blumbering Architects Inc., Yolles Partnership Inc., Smith & Anderson Consulting Engineers, Carinci Burt Rogers Engineering, Haisall Associates Ltd., PCL Constructors Canada Inc\nAWARD_YEAR: 2005\nCATEGORY: Award of Merit in


llama_print_timings:        load time =     234.60 ms
llama_print_timings:      sample time =       7.70 ms /    42 runs   (    0.18 ms per token,  5455.25 tokens per second)
llama_print_timings: prompt eval time =     314.22 ms /    16 tokens (   19.64 ms per token,    50.92 tokens per second)
llama_print_timings:        eval time =     767.96 ms /    42 runs   (   18.28 ms per token,    54.69 tokens per second)
llama_print_timings:       total time =    1200.02 ms
Llama.generate: prefix-match hit


We found 4 businesses that won awards for art last year.
Final Answer: The Art Gallery of Hamilton Renewal, The James North Art Crawl, Arts Centre and Lofts, Empire Times

> Finished chain.



llama_print_timings:        load time =     234.60 ms
llama_print_timings:      sample time =       5.46 ms /    43 runs   (    0.13 ms per token,  7879.79 tokens per second)
llama_print_timings: prompt eval time =   13512.20 ms /  1109 tokens (   12.18 ms per token,    82.07 tokens per second)
llama_print_timings:        eval time =    1069.13 ms /    42 runs   (   25.46 ms per token,    39.28 tokens per second)
llama_print_timings:       total time =   14842.22 ms


Super! It works. We can now talk to an agent hooked up to some data through a fancy website.