### Install Dependencies

In [1]:
%pip install --q langchain-community
%pip install --q langchain
%pip install --q duckduckgo-search

# TODO: add the following packages to poetry
%pip install --q requests
%pip install --q beautifulsoup4
%pip install --q python-dotenv

#################################
# Required for PaperSpace Gradient
%pip install --q typing-inspect==0.8.0 typing_extensions==4.5.0
%pip install --q pydantic==1.10.8

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pytest 7.2.1 requires attrs>=19.2.0, but you have attrs 18.2.0 which is incompatible.
gradient 2.0.6 requires marshmallow<3.0, but you have marshmallow 3.21.0 which is incompatible.
Note: you may need to restart the kernel to use updated packages.

### Setup Ollama

In [28]:
# %> curl -fsSL https://ollama.com/install.sh | sh
# %> ollama serve
# %> ollama pull mixtral:instruct     (gemma:7b-instruct | mistral:instruct)

!ollama list

NAME            	ID          	SIZE 	MODIFIED           
mixtral:instruct	7708c059a8bb	26 GB	About a minute ago	


### Accelerator Info

In [5]:
!nvidia-smi

Wed Feb 28 09:46:17 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.04   Driver Version: 525.116.04   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA RTX A6000    Off  | 00000000:00:05.0 Off |                  Off |
| 30%   32C    P8    23W / 300W |      1MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

### Load Environment Variables & Config
This includes:
- LANGCHAIN_TRACING_V2
- LANGCHAIN_API_KEY

In [3]:
%load_ext dotenv
%dotenv GenieSearch.env
%env LANGCHAIN_TRACING_V2

'True'
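
Before running the chains, it can help to confirm that both tracing variables are actually set. The following is a minimal sketch using only the standard library; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced here, not part of dotenv or LangChain.

```python
import os

# The two variables listed above; the tuple name is an assumption for this sketch.
REQUIRED_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.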

### Single Question Prompt

In [29]:
template = """{text}
-----------
Using the above text, answer in short the following question:
> {question}
-----------
If the question cannot be answered using the text, simply summarize the text. Include all
factual information, numbers, stats etc if available.
"""

In [30]:
from langchain_core.prompts import ChatPromptTemplate
SINGLE_QUESTION_PROMPT = ChatPromptTemplate.from_template(template)
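
To see roughly what text the model receives, the template can be filled in with plain `str.format`, which uses the same `{placeholder}` syntax as the f-string format that `ChatPromptTemplate.from_template` uses by default. This is only a sketch: the template below is a copy of the one above, `render_prompt` is a hypothetical helper, and the sample text and question are made up.

```python
# Copy of the single-question template, repeated so the example is self-contained.
template = """{text}
-----------
Using the above text, answer in short the following question:
> {question}
-----------
If the question cannot be answered using the text, simply summarize the text. Include all
factual information, numbers, stats etc if available.
"""


def render_prompt(text: str, question: str) -> str:
    """Fill both placeholders and return the final prompt string."""
    return template.format(text=text, question=question)


rendered = render_prompt(
    "LangSmith is a platform for debugging LLM apps.",
    "What is LangSmith?",
)
print(rendered)
```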

### Ingest Documents

In [31]:
import requests
from bs4 import BeautifulSoup

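def scrape_url(url: str) -> str:
    """Fetch a web page and return its visible text, or an error message on a non-200 response."""
    try:
        # Send a GET request to the web page; the timeout keeps a slow host from hanging the cell
        response = requests.get(url, timeout=10)

        # If the request succeeded
        if response.status_code == 200:
            # Parse the HTML and extract the visible text
            soup = BeautifulSoup(response.text, "html.parser")
            text_content = soup.get_text(separator=" ", strip=True)
            return text_content
        else:
            return f"Failed to retrieve page: {response.status_code}"
    except Exception as e:
        # Network or parsing error: log it and fall through to the empty string
        print(e)

    return ""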

In [32]:
url = "https://blog.langchain.dev/announcing-langsmith/"

content = scrape_url(url)[:10000]

### Setup Local LLM & Embedding Models

In [33]:
LLM_MODEL = "mixtral:instruct"  # ("gemma:7b-instruct" | "mistral:instruct")
TEMPERATURE = 0.9


#### Load & Test Local LLM Model

#### Testing the Ollama model. No need to import separate llm model.

In [34]:
from langchain_community.llms import Ollama

_test_llm_model_ = Ollama(
    model=LLM_MODEL,
    temperature=TEMPERATURE,
)

_test_llm_model_.invoke("Who are you?")


' I am a helpful assistant here to provide information and answer questions to the best of my ability. How can I help you today?'

### Web Search Agent

In [35]:
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

NUM_RESULTS_PER_QUESTION = 3
ddg_search = DuckDuckGoSearchAPIWrapper()

async def web_search(query: str, num_results: int = NUM_RESULTS_PER_QUESTION):
    web_query_results = ddg_search.results(query, num_results)
    return [r["link"] for r in web_query_results]


In [36]:
await web_search("What is langsmith?")

['https://blog.logrocket.com/langsmith-test-llms-ai-applications/',
 'https://blog.langchain.dev/announcing-langsmith/',
 'https://cheatsheet.md/langchain-tutorials/langsmith.en']
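
`web_search` is declared `async`, but the DuckDuckGo call inside it is blocking, so the event loop is held while the request runs. One possible alternative, sketched here with a stand-in search function, is to hand the blocking call to a worker thread with `asyncio.to_thread` (Python 3.9+). `fake_search` and `web_search_threaded` are hypothetical names; the stub takes the place of `ddg_search.results` so the example runs offline.

```python
import asyncio


def fake_search(query: str, max_results: int) -> list[dict]:
    """Hypothetical blocking search stub that mimics the shape of real results."""
    return [
        {"title": f"Result {i}", "link": f"https://example.com/{i}"}
        for i in range(max_results)
    ]


async def web_search_threaded(query: str, num_results: int = 3) -> list[str]:
    # Run the blocking call in a worker thread so the event loop stays free.
    results = await asyncio.to_thread(fake_search, query, num_results)
    return [r["link"] for r in results]


# In a script; inside a notebook cell, use `await web_search_threaded(...)` instead.
links = asyncio.run(web_search_threaded("What is langsmith?"))
print(links)
```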

### Setup Chat Chain

In [37]:
import asyncio
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

scrape_and_summarize_chain = (
    RunnablePassthrough.assign(text=lambda x: scrape_url(x["url"])[:10000])
    | SINGLE_QUESTION_PROMPT
    | ChatOllama(model=LLM_MODEL)
    | StrOutputParser()
)

web_search_chain = (
    RunnablePassthrough.assign(urls=lambda x: asyncio.run(web_search(x["question"])))
    | (lambda x: [{"question": x["question"], "url": u} for u in x["urls"]])
    | scrape_and_summarize_chain.map()
)


chat_response = web_search_chain.invoke({"question": "Martin Fabbri"})
chat_response

[' The text does not provide enough information to answer the question. It only mentions that Robby Fabbri scored a goal for the Detroit Red Wings in their game against the Columbus Blue Jackets, but it does not provide any additional details about him.',
 ' The text does not provide any information about a person or entity named "Martin Fabbri." The text itself is an error message indicating that there was a failure to retrieve a page, and the specific error code is 403, which typically means that the server understood the request but refuses to fulfill it. Without more context, it\'s not possible to say why the page retrieval failed or what it might have contained.',
 ' The text does not provide any information about a person or entity named "Martin Fabbri." The text itself is an error message, specifically a 403 error, which indicates that the user attempting to access a web page does not have permission to do so.']
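
Two of the three summaries above describe a 403 error page rather than the question, because `scrape_url` returns its failure message as if it were page text. Below is a minimal sketch of one way to drop such results before they reach the model. `is_usable_page` and `filter_pages` are hypothetical helpers, and the prefix check assumes the `"Failed to retrieve page:"` message format used by `scrape_url`.

```python
# Assumed to match the failure message produced by scrape_url.
FAILURE_PREFIX = "Failed to retrieve page:"


def is_usable_page(text: str) -> bool:
    """True when the scraped text is non-empty and not a failure message."""
    stripped = text.strip()
    return bool(stripped) and not stripped.startswith(FAILURE_PREFIX)


def filter_pages(pages: dict[str, str]) -> dict[str, str]:
    """Keep only the URL -> text pairs worth sending to the model."""
    return {url: text for url, text in pages.items() if is_usable_page(text)}


# Placeholder data standing in for real scrape results.
pages = {
    "https://example.com/a": "Some article text about the topic.",
    "https://example.com/b": "Failed to retrieve page: 403",
    "https://example.com/c": "",
}
usable = filter_pages(pages)
print(list(usable))
```

Filtering before summarizing also saves one model call per failed URL.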

#### Research Web Search

In [39]:
questions_generation = (
    "Write 3 google search queries to search online that form an objective "
    "opinion from the following question: {question}\n "
    "Return a python list in the following format: "
    "['query 1', 'query 2', 'query 3']\n "
    "Do not include anything else after the list."
)

SEARCH_PROMPT = ChatPromptTemplate.from_messages([("user", questions_generation)])

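import ast

search_qanda_chain = (
    SEARCH_PROMPT
    | ChatOllama(model=LLM_MODEL)
    | StrOutputParser()
    # Parse the model's list literal without executing arbitrary code
    | (lambda s: ast.literal_eval(s.strip()))
    | (lambda l: [{"question": q} for q in l])
)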

full_research_chain = search_qanda_chain | web_search_chain.map()

lol = full_research_chain.invoke({"question": "what is json?"})
lol

[[" JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.\n\nJSON is built on two structures:\n\n* A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.\n* An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.\n\nAn object is an unordered set of name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, 
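
The model is not guaranteed to reply with a clean list literal, and a chatty or malformed reply would make the parsing step raise. The sketch below shows one defensive approach: parse with `ast.literal_eval`, which only evaluates literals, and fall back to the original question when the reply is not a list of strings. `parse_query_list` is a hypothetical helper, and the fallback policy is an assumption rather than something the notebook prescribes.

```python
import ast


def parse_query_list(raw: str, fallback: str) -> list[str]:
    """Parse a model reply such as "['a', 'b']" into a list of query strings."""
    try:
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError, TypeError):
        # Not a Python literal at all: search for the original question instead.
        return [fallback]
    if isinstance(parsed, list) and parsed and all(isinstance(q, str) for q in parsed):
        return parsed
    # A literal, but not a non-empty list of strings.
    return [fallback]


print(parse_query_list(" ['what is json', 'json format spec']", "what is json?"))
print(parse_query_list("Sure! Here are some queries...", "what is json?"))
```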

### Report Writer Chain

In [40]:
def collapse_list_of_lists(list_of_lists):
    content = []
    for l in list_of_lists:
        content.append("\n\n".join(l))
    return "\n\n".join(content)

In [41]:
collapse_list_of_lists(lol)

' JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.\n\nJSON is built on two structures:\n\n* A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.\n* An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.\n\nAn object is an unordered set of name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or
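
To make the shape concrete: the input holds one inner list of summaries per generated query, and the output is a single string with blank lines between summaries. Here is a small worked example with placeholder strings, alongside a one-line variant using `itertools.chain` that gives the same result as long as no inner list is empty.

```python
from itertools import chain


def collapse_list_of_lists(list_of_lists):
    """Join every inner list, then join the results, using blank-line separators."""
    content = []
    for summaries in list_of_lists:
        content.append("\n\n".join(summaries))
    return "\n\n".join(content)


# Placeholder summaries: two for the first query, one for the second.
nested = [["summary A1", "summary A2"], ["summary B1"]]

collapsed = collapse_list_of_lists(nested)
flat = "\n\n".join(chain.from_iterable(nested))

print(repr(collapsed))
print(collapsed == flat)
```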

### Research Writer Prompt

In [47]:
WRITER_SYSTEM_PROMPT = (
    "You are an AI critical thinker research assistant. Your sole purpose "
    "is to write well written, critically acclaimed, objective and structured "
    "reports on given text."
)

RESEARCH_REPORT_TEMPLATE = """Information:
--------
{research_summary}
--------
Using the above information, answer the following question or topic: "{question}" 
in a detailed research report format. The report should focus on the answer to the question,
should be well structured, informative,
in depth, with facts and numbers if available and a minimum of 1,200 words.
You should strive to write the report as long as you can using all relevant 
and necessary information provided.
You must write the report with markdown syntax.
You MUST determine your own concrete and valid opinion based on the given 
information. Do NOT defer to general and meaningless conclusions.
Write all used source urls at the end of the report, and make sure to not 
add duplicated sources, but only one reference for each.
You must write the report in APA format.
Please do your best, this is very important to my career."""

research_writer_prompt = ChatPromptTemplate.from_messages(
    [("system", WRITER_SYSTEM_PROMPT), ("user", RESEARCH_REPORT_TEMPLATE)]
)

### Research Writer Chain

In [48]:
research_writer_chain = (
    RunnablePassthrough.assign(
        research_summary=full_research_chain | collapse_list_of_lists
    )
    | research_writer_prompt
    | ChatOllama(model=LLM_MODEL)
    | StrOutputParser()
)

In [49]:
research_writer_chain.invoke({"question": "What is the difference between LangChain and LangSmith?"})

" Introduction\n\nLangChain and LangSmith are two tools used in the development of language models, but they serve different purposes. While LangChain is an open-source tool for building and deploying large language models (LLMs), LangSmith is a platform for managing LLM applications. This report will provide an in-depth comparison between the two tools, including their features, use cases, and advantages.\n\nLangChain\n\nLangChain is an open-source library that provides various modules and classes to create a chat-based AI environment. It allows developers to build language models by initializing an agent that generates responses based on descriptions. LangChain can be installed using pip, and the runtime environment must be configured by setting some environment variables. The library also provides a ChatOpenAI class that loads specific tools for different tasks.\n\nLangSmith\n\nLangSmith is a platform that allows developers to set up a development environment and use LangChain compo