# Week 2

*Welcome back! Scroll down to Week 4.*

We first install the required software: LangChain and `pypdf`, which LangChain's PDF loaders depend on.

In [1]:
!pip install --upgrade pip 
!pip install --upgrade langchain pypdf

Collecting pip
  Using cached pip-24.0-py3-none-any.whl.metadata (3.6 kB)
Using cached pip-24.0-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.3.2
    Uninstalling pip-23.3.2:
      Successfully uninstalled pip-23.3.2
Successfully installed pip-24.0
Collecting langchain
  Using cached langchain-0.1.16-py3-none-any.whl.metadata (13 kB)
Collecting pypdf
  Using cached pypdf-4.2.0-py3-none-any.whl.metadata (7.4 kB)
Collecting langchain-community<0.1,>=0.0.32 (from langchain)
  Using cached langchain_community-0.0.32-py3-none-any.whl.metadata (8.5 kB)
Collecting langchain-core<0.2.0,>=0.1.42 (from langchain)
  Using cached langchain_core-0.1.42-py3-none-any.whl.metadata (5.9 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Downloading langsmith-0.1.46-py3-none-any.whl.metadata (13 kB)
Collecting orjson<4.0.0,>=3.9.14 (from langsmith<0.2.0,>=0.1.17->langchain)
  Using cached orjson-3.10.0-cp38-cp38-m

In [2]:
!pip install pypdf



We then import LangChain's `pypdf`-based loaders.

In [3]:
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

Let's first load a single PDF, then a whole directory of PDFs.

In [4]:
loader = PyPDFLoader("Data/2021-census-population-occupied-private-dwellings-community-2001-2021.pdf")

In [5]:
pdf = loader.load_and_split()

In [6]:
loader = PyPDFDirectoryLoader("Data/Supreme Court opinions 2014/")

In [7]:
many_pdfs = loader.load_and_split()
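
`load_and_split()` reads each PDF and breaks its text into smaller chunks so that each piece can be embedded on its own. The sketch below shows the general idea with a fixed-size window and some overlap. It is a simplified stand-in, not LangChain's actual splitter, and `split_text`, `chunk_size`, and `overlap` are names chosen for this example.

```python
def split_text(text, chunk_size=40, overlap=10):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a boundary
    appears in both neighbouring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Made-up sample text for illustration.
sample = "The quick brown fox jumps over the lazy dog. " * 3
for chunk in split_text(sample):
    print(repr(chunk))
```

Smaller chunks make retrieval more precise, while the overlap reduces the chance that a sentence is cut in half at a chunk boundary.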

Having loaded both the single PDF and a directory of PDFs, let's now load the CSV. 

In [8]:
from langchain.document_loaders.csv_loader import CSVLoader

In [9]:
loader = CSVLoader("Data/Urban_Design_and_Architecture_Awards_Recipients.csv")

In [10]:
csv = loader.load()
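
`CSVLoader` turns each row of the file into its own document. Judging by the search results later in this notebook, each document's text is a list of `column: value` lines, with the row number and source path stored as metadata. A rough standard-library sketch of that transformation follows; `rows_to_documents` and the two sample rows are made up for illustration.

```python
import csv
import io

def rows_to_documents(csv_text, source):
    """Turn each CSV row into a dict with page_content and metadata."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader):
        # One "column: value" line per field, joined with newlines.
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"row": row_number, "source": source}}
        )
    return documents

# Made-up sample data, loosely modelled on the awards file.
sample_csv = "AWARD_WINNER,AWARD_YEAR\nExample Gallery,2007\nExample Theatre,2009\n"
for doc in rows_to_documents(sample_csv, "sample.csv"):
    print(doc)
```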

Having loaded our data, we'll now download and load the embedding model.

In [11]:
!pip install sentence_transformers > /dev/null

In [12]:
from langchain.embeddings import HuggingFaceEmbeddings, SentenceTransformerEmbeddings

In [13]:
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

Let's try embedding some text. Observe the output. Once you've tried it, scroll down to continue.

In [14]:
text = "This is a test document."

In [15]:
embeddings.embed_query(text)

[-0.03833852708339691,
 0.1234646886587143,
 -0.028642937541007996,
 0.05365273728966713,
 0.00884535163640976,
 -0.03983931988477707,
 -0.07300585508346558,
 0.04777132347226143,
 -0.030462520197033882,
 0.05497976765036583,
 0.08505293726921082,
 0.0366566926240921,
 -0.005319987423717976,
 -0.0022331285290420055,
 -0.06071098893880844,
 -0.027237899601459503,
 -0.011351611465215683,
 -0.042437728494405746,
 0.009129906073212624,
 0.100815549492836,
 0.07578731328248978,
 0.06911718100309372,
 0.009857481345534325,
 -0.0018377420492470264,
 0.026249045506119728,
 0.032902419567108154,
 -0.07177435606718063,
 0.028384260833263397,
 0.061709530651569366,
 -0.052529558539390564,
 0.03366165980696678,
 0.07446815818548203,
 0.07536036521196365,
 0.03538402169942856,
 0.06713403761386871,
 0.010798039846122265,
 0.08167023211717606,
 0.01656291075050831,
 0.03283059597015381,
 0.03632567450404167,
 0.002172845648601651,
 -0.09895741194486618,
 0.005046740174293518,
 0.05089650675654411,
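
Each number above is one coordinate of the embedding vector. As a rough illustration of why that matters, the sketch below compares vectors with cosine similarity, a common way to score how close two embeddings are. The three-dimensional vectors are made-up stand-ins, not real model output (all-MiniLM-L6-v2 produces 384 dimensions), and `cosine_similarity` is a helper written for this example.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors standing in for real embeddings.
art_award = [0.9, 0.1, 0.0]
gallery_opening = [0.8, 0.2, 0.1]
court_ruling = [0.0, 0.2, 0.9]

print(cosine_similarity(art_award, gallery_opening))  # close to 1: similar
print(cosine_similarity(art_award, court_ruling))     # close to 0: unrelated
```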
 

We now have a working embedding function. Let's install Chroma.

In [15]:
!pip install -U chromadb

Collecting chromadb
  Using cached chromadb-0.4.24-py3-none-any.whl.metadata (7.3 kB)
Using cached chromadb-0.4.24-py3-none-any.whl (525 kB)
Installing collected packages: chromadb
  Attempting uninstall: chromadb
    Found existing installation: chromadb 0.4.22
    Uninstalling chromadb-0.4.22:
      Successfully uninstalled chromadb-0.4.22
Successfully installed chromadb-0.4.24


In [16]:
from langchain.vectorstores import Chroma

Let's make a vector store for our loaded documents!

In [17]:
db = Chroma.from_documents(pdf, embeddings)

In [18]:
db.add_documents(csv)

['1d63e35b-1af3-46bc-bca3-414173f4179d',
 'cdefcc56-5f66-454c-a33d-5b4125cdf056',
 '6554f03e-de2c-42e1-a303-ae06327cdea8',
 '4f4afa1e-7860-4f6a-8d50-e6ad5e427b23',
 'e6a9b650-783c-4f32-aa5f-79d7efe46700',
 '72f9431f-4892-4e81-a6e6-dd489ffa83a7',
 '44a21dbb-c0c5-4413-91e4-68afb7db6715',
 '03bae676-ab83-4edf-83ff-10e9a9f59f70',
 '328f86a7-60d1-4f84-93b1-551c0c267dd9',
 '374f93a3-f549-480c-ab07-f93046ed385c',
 'dd160691-4b82-488e-823f-35a2c9f4eec1',
 'ec1663b4-9f96-4e13-884a-023286b89428',
 'ba155dce-83f7-455f-953f-40e270ad4085',
 'a7cb4eae-2b80-4092-81e7-b07f15ae132c',
 '3f7cd559-524a-44fe-aa90-68011ac8ec12',
 '400d9e32-aff6-4124-b2fc-76ea93fe86e2',
 'c2ba5a54-ccc8-4367-972c-2505cb329fbb',
 '58dd7d8f-9b7c-48d3-9682-ff3b0beb0774',
 '44fda813-0d56-4511-a2fc-9023ca82c6b1',
 'aca7dbe4-a7b3-4387-9bca-d448712b5bf8',
 'd4228d81-5826-4452-a23f-9e429b506612',
 '8e535177-3dff-4ddd-bfd0-ba56294be63d',
 'f5667083-aa60-44c6-a579-e0239423d3e2',
 'b24ce5c8-15cb-423f-822e-68788c368bee',
 '5b5aef73-4231-

In [19]:
db.add_documents(many_pdfs)

['d4c9153e-d139-4807-925d-73ba64a8887f',
 'd18f3c72-e575-461b-b1a0-d9ec2bf8f4fd',
 '39484ed2-16b4-4b05-967e-c97a83de5e9e',
 'b99657eb-cc95-41cf-bd03-7072389d0317',
 '06a1cfda-ee19-4039-9b12-5b0dbc62ac14',
 '82cf09d6-9b61-4892-a6b5-7272f1e39701',
 'a4cb407f-4678-458e-bd7f-3285abe2ab78',
 '459d8282-67a2-4e03-beab-52e5d8e7678e',
 'c1346430-a642-4a9e-a2b4-098bf2a136b9',
 '2a10b35c-e490-4e67-915e-3766bbf18755',
 'f3d3bc11-fca0-48fc-a959-f13454ff8736',
 '6c589778-b651-4085-b702-cde716669712',
 'fcb95ba2-66a2-4b6b-98fa-10e1e2f27a41',
 '6c68a38e-3a8c-4c2d-8ec4-ce40e3ea147f',
 '9cfba994-5e95-47f3-828d-b60a226781e7',
 'a49073d7-1608-4ba5-bda5-dcd95053280c',
 'dfafe63b-c5dc-4d41-9c61-823b06f91d78',
 '68ea6881-26a1-470f-b802-f4b099596d29',
 '1cd306be-4299-4e18-bebe-ddb349ca32b1',
 'a4210a0a-2c42-4f61-a8b4-8a023f7deee1',
 '0888bd4d-cecf-4a64-a87f-97e8eb2e8e05',
 '391d70cf-fe1f-4a76-8a4c-a365cfbd882c',
 '0173189b-b477-42e3-a43e-a9000b4f0da0',
 '0295caa1-2a29-48dc-81de-2c04296bd684',
 '760c638b-6bc8-

Let's try retrieving a relevant document.

In [21]:
query = "An award concerning art."
db.similarity_search(query)

[Document(page_content='\ufeffX: 591976.7816\nY: 4790547.4424\nOBJECTID: 86\nAWARD_WINNER: The James North Art Crawl\nPROJECT_DESCRIPTION: On the second Friday evening of every month this event programs the historic James Street North streetscape from Murray to King Street with an eclectic array of gallery openings, performances and outdoor art reflective of the emerging arts community in t\nRECIPIENT: The Gallery and Studies of the James North Community\nAWARD_YEAR: 2007\nCATEGORY: Award of Merit for Visionary Project\nLOCATION: James Street North between Murray and King Streets\nCOMMUNITY: Hamilton\nLATITUDE: 43.2621251\nLONGITUDE: -79.8667415', metadata={'row': 85, 'source': 'Data/Urban_Design_and_Architecture_Awards_Recipients.csv'}),
 Document(page_content='\ufeffX: 591942.8531\nY: 4790007.4241\nOBJECTID: 45\nAWARD_WINNER: Empire Times\nPROJECT_DESCRIPTION: The project is an adaptive re-use of an historic building into a performing arts centre and affordable housing for artists. T

In [22]:
query = "What exceptions does Rule 606(b)(1) contain?"
db.similarity_search(query)

[Document(page_content='2 PEREZ v. MORTGAGE BANKERS ASSN. \nSCALIA , J., concurring in judgment \nadministrators whose zeal might otherwise have carried \nthem to excesses not contemplated in legislation creating\ntheir offices.” United States v. Morton Salt Co., 338 U. S. \n632, 644 (1950). The Act guards against excesses in rule-\nmaking by requiring notice and comment.  Before an \nagency makes a rule, it normally must notify the public of\nthe proposal, invite them to comment on its shortcomings, \nconsider and respond to their arguments, and explain itsfinal decision in a statement of the rule’s basis and pur-\npose. 5 U. S. C. §553(b)–(c); ante, at 2. \nThe APA exempts interpretive rules from these re-\nquirements.  §553(b)(A). But this concession to agencies\nwas meant to be more modest in its effects than it is today.For despite exempting interpretive rules from notice and\ncomment, the Act provides that “the reviewing court \nshall . . . interpret constituti onal and statutory
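
Conceptually, `similarity_search` embeds the query and returns the stored documents whose vectors are closest to it. The sketch below mimics that with a tiny in-memory list and cosine similarity. It is an illustration of the idea, not how Chroma is implemented; the `top_k` helper and the two-dimensional vectors are invented for this example.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vector, store, k=2):
    """Return the k (score, text) pairs most similar to query_vector."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Made-up two-dimensional "embeddings": (text, vector) pairs.
store = [
    ("art crawl award", [0.9, 0.1]),
    ("performing arts centre", [0.8, 0.3]),
    ("notice-and-comment rulemaking", [0.1, 0.9]),
]

for score, text in top_k([1.0, 0.0], store):
    print(f"{score:.3f}  {text}")
```

Note that a nearest-neighbour search always returns the closest documents it has, even when none of them actually answers the question, so it is worth reading the results rather than trusting the top hit.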

# Week 3

We'll now try making an agent. We'll start by downloading a library to let us run a local language model.

If you're on Windows, a non-Apple Silicon Mac, or Linux, use this command:

In [23]:
!pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.27.tar.gz (9.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.4/9.4 MB 16.5 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0 (from llama-cpp-python)
  Downloading typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Collecting numpy>=1.20.0 (from llama-cpp-python)
  Downloading numpy-1.24.4-cp38-cp38-macosx_10_9_x86_64.whl.metadata (5.6 kB)
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 22.8 MB/s eta 0:00:00


If you're on Apple Silicon, use this command:

In [None]:
!CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python  --upgrade --force-reinstall --no-cache-dir

Let's load the software.

In [20]:
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

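With that loaded, let's download a language model. If `wget` isn't installed (as in the output below), running `curl -L -O` with the same URL should work instead.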

In [25]:
!wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_0.gguf

zsh:1: command not found: wget




Okay! Let's try loading it.

In [21]:
llm = LlamaCpp(
    model_path="mistral-7b-instruct-v0.1.Q4_0.gguf",
    temperature=0.8,
    max_tokens=200,
    n_ctx=4096,
    top_p=1,
    n_gpu_layers=-1,
    f16_kv=True,
    verbose=True,
    callback_manager=callback_manager
)

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from mistral-7b-instruct-v0.1.Q4_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attent

Let's try it out!

In [27]:
llm("Where is Chesterfield, Mo.?")

  warn_deprecated(
Llama.generate: prefix-match hit



A: Chesterfield, Mo. is a city located in the state of Missouri, United States. It is about 20 miles southwest of St. Louis and is part of the St. Louis metropolitan area.


llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =      16.12 ms /    48 runs   (    0.34 ms per token,  2977.30 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   41248.29 ms /    48 runs   (  859.34 ms per token,     1.16 tokens per second)
llama_print_timings:       total time =   41488.62 ms


'\nA: Chesterfield, Mo. is a city located in the state of Missouri, United States. It is about 20 miles southwest of St. Louis and is part of the St. Louis metropolitan area.'
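
The `llama_print_timings` lines report how fast the model ran. As a quick check of how the figures relate, the snippet below recomputes the per-token numbers from the `eval time` line above (41248.29 ms over 48 runs); `per_token_stats` is a helper named for this example.

```python
def per_token_stats(total_ms, runs):
    """Return (milliseconds per token, tokens per second)."""
    ms_per_token = total_ms / runs
    tokens_per_second = 1000.0 / ms_per_token
    return ms_per_token, tokens_per_second

ms_per_token, tokens_per_second = per_token_stats(41248.29, 48)
print(f"{ms_per_token:.2f} ms per token, {tokens_per_second:.2f} tokens per second")
# prints "859.34 ms per token, 1.16 tokens per second"
```

At roughly one token per second, a 200-token reply takes over three minutes, which is in line with the total times reported later in this notebook.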

Having verified the language model works, let's try making a tool for it to access our vector store.

In [22]:
from langchain.chains import RetrievalQA

Let's define our agent.

In [23]:
# Only the names used below are imported.
from langchain.agents import initialize_agent, AgentType
from langchain.agents.agent_toolkits import create_retriever_tool

In [24]:
retriever = db.as_retriever()

In [25]:
tools = [
    create_retriever_tool(
        retriever,
        name="Search knowledge",
        description="Useful for when you need to answer a question. If the user asks a question concerning the Supreme Court or Hamilton, Ontario, find the answer to their question with this tool. Only use this tool once.",
    ),
]

In [26]:
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

  warn_deprecated(


And let's try it out.

In [33]:
agent.run("Who, in Hamilton, won an award for art?")

  warn_deprecated(
  warn_deprecated(
  warn_deprecated(
Llama.generate: prefix-match hit




> Entering new AgentExecutor chain...
 First, try searching knowledge.
Action: Search knowledge.
Action Input: Hamilton, art award.


llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =       9.15 ms /    25 runs   (    0.37 ms per token,  2731.34 tokens per second)
llama_print_timings: prompt eval time =  163720.46 ms /   190 tokens (  861.69 ms per token,     1.16 tokens per second)
llama_print_timings:        eval time =   20754.69 ms /    24 runs   (  864.78 ms per token,     1.16 tokens per second)
llama_print_timings:       total time =  184661.72 ms
  warn_deprecated(
Llama.generate: prefix-match hit


 First, try searching knowledge.
Action: Search knowledge.
Action Input: Hamilton, art award.
Observation: Search knowledge. is not a valid tool, try one of [Search knowledge].
Thought: I see, the search didn't work, let me try using [Search knowledge] instead.
Action: Search knowledge.
Action Input: Who won an art award in Hamilton?.
 I see, the search didn't work, let me try using [Search knowledge] instead.
Action: Search knowledge.
Action Input: Who won an art award in Hamilton?.
Observation: Search knowledge. is not a valid tool, try one of [Search knowledge].
Thought:


llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =      14.71 ms /    42 runs   (    0.35 ms per token,  2855.98 tokens per second)
llama_print_timings: prompt eval time =   17659.80 ms /    21 tokens (  840.94 ms per token,     1.19 tokens per second)
llama_print_timings:        eval time =   35373.34 ms /    41 runs   (  862.76 ms per token,     1.16 tokens per second)
llama_print_timings:       total time =   53235.12 ms
  warn_deprecated(
Llama.generate: prefix-match hit


 Alright, searching didn't yield any results, it looks like I need to think about this more.
Final Answer: Unknown.
 Alright, searching didn't yield any results, it looks like I need to think about this more.
Final Answer: Unknown.

> Finished chain.



llama_print_timings:        load time =   13993.65 ms
llama_print_timings:      sample time =      10.60 ms /    30 runs   (    0.35 ms per token,  2831.26 tokens per second)
llama_print_timings: prompt eval time =   17939.50 ms /    21 tokens (  854.26 ms per token,     1.17 tokens per second)
llama_print_timings:        eval time =   25211.42 ms /    29 runs   (  869.36 ms per token,     1.15 tokens per second)
llama_print_timings:       total time =   43300.58 ms


'Unknown.'
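
The agent gave up with `'Unknown.'` because the model wrote the tool name as `Search knowledge.` with a trailing period, which did not exactly match the registered name `Search knowledge`. One possible workaround, sketched below in plain Python, is to normalize the tool name before looking it up. This illustrates the idea only and is not LangChain's output parser; `parse_action` and `resolve_tool` are hypothetical helpers.

```python
import re

def parse_action(llm_output):
    """Extract (tool name, tool input) from ReAct-style text, or None."""
    match = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", llm_output)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

def resolve_tool(name, valid_tools):
    """Match a tool name, ignoring case and trailing punctuation."""
    cleaned = name.strip().rstrip(".!?:").lower()
    for tool in valid_tools:
        if tool.lower() == cleaned:
            return tool
    return None

# The first model output from the run above.
output = " First, try searching knowledge.\nAction: Search knowledge.\nAction Input: Hamilton, art award."
tool_name, tool_input = parse_action(output)
print(resolve_tool(tool_name, ["Search knowledge"]), "|", tool_input)
```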

# Week 4

Welcome back! We'll now try making an interface for our agent. First, let's download Gradio.

In [27]:
!pip install gradio==3.48.0





Next, let's define a `predict` function. This will let our Gradio interface talk to our agent.

In [31]:
import gradio as gr

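def predict(message, history):
    # Gradio passes the chat history on every call; it is not used here,
    # so each message is answered independently.
    # Wrap the message in Mistral's [INST] ... [/INST] instruction format.
    response = agent.run("<s>[INST] " + message + " [/INST]")
    # To bypass the agent and query the model directly, use this instead:
    # response = llm("<s>[INST] " + message + " [/INST]")
    return response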
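
The `predict` function above ignores `history`, so the chatbot forgets earlier turns. If you want to include them, one approach is to fold the previous turns into the prompt. The sketch below assumes that `history` is a list of `[user_message, assistant_reply]` pairs (the shape Gradio 3.x's `ChatInterface` is understood to pass) and that the model expects Mistral's `[INST] ... [/INST]` format; `build_prompt` is a name made up for this example.

```python
def build_prompt(message, history):
    """Fold earlier chat turns and the new message into one prompt string."""
    prompt = "<s>"
    for user_message, assistant_reply in history:
        # Each completed turn: instruction, reply, end-of-sequence marker.
        prompt += f"[INST] {user_message} [/INST] {assistant_reply}</s>"
    prompt += f"[INST] {message} [/INST]"
    return prompt

# Made-up example history.
history = [["Where is Hamilton?", "Hamilton is a city in Ontario, Canada."]]
print(build_prompt("Who won an art award there?", history))
```

Inside `predict`, the result could be passed to the model in place of the hand-built string. Keep an eye on the 4096-token context window (`n_ctx`) as the history grows.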

With that done, let's try loading an interface in just a few lines of code.

In [32]:
gr.ChatInterface(predict, theme="soft", title="My Fancy Chatbot").launch(share=False)

Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.




Llama.generate: prefix-match hit


 The United States Supreme Court is the highest judicial body in the United States and is responsible for interpreting the Constitution and federal laws. It is located in Washington, D.C., and is made up of nine justices who are nominated by the President and confirmed by the Senate for life tenure.

The Supreme Court has the power to review and rule on cases that raise important Constitutional questions or that have national significance. Its decisions have a far-reaching impact on American society and can set legal precedents for the entire nation.

The Court operates in two main seasons: an annual term that runs from October to June, during which it hears oral arguments and issues decisions, and a summer recess period that lasts from July to September. The justices also meet in private conferences to discuss and vote on cases throughout the year.

In addition to its judicial responsibilities, the Supreme Court also plays an important role in shaping the legal profession


llama_print_timings:        load time =   15033.43 ms
llama_print_timings:      sample time =      66.59 ms /   200 runs   (    0.33 ms per token,  3003.23 tokens per second)
llama_print_timings: prompt eval time =   15744.64 ms /    16 tokens (  984.04 ms per token,     1.02 tokens per second)
llama_print_timings:        eval time =  186829.48 ms /   200 runs   (  934.15 ms per token,     1.07 tokens per second)
llama_print_timings:       total time =  203589.84 ms
Llama.generate: prefix-match hit


 I don't have personal opinions. However, I can provide information about Clarence Thomas. Clarence Thomas is an American lawyer and jurist who has been a justice of the Supreme Court of the United States since 1990. He was nominated by President George H.W. Bush and confirmed by the Senate in a contentious process that included allegations of sexual misconduct, which he denied.

Thomas has been criticized for his conservative views on issues such as affirmative action, campaign finance reform, and abortion rights. Some have also raised concerns about his judicial philosophy and his perceived close ties to powerful conservative donors and interest groups. However, Thomas maintains that he is an independent judge who applies the law fairly and without bias.


llama_print_timings:        load time =   15033.43 ms
llama_print_timings:      sample time =      52.51 ms /   158 runs   (    0.33 ms per token,  3008.72 tokens per second)
llama_print_timings: prompt eval time =   12732.39 ms /    13 tokens (  979.41 ms per token,     1.02 tokens per second)
llama_print_timings:        eval time =  143323.17 ms /   157 runs   (  912.89 ms per token,     1.10 tokens per second)
llama_print_timings:       total time =  156854.30 ms


Super! It works. We can now talk to an agent hooked up to our data through a web interface.