Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, llama2-7b-chat uses about 8GB of VRAM.
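As a rough sanity check on memory, the sketch below estimates the size of the weights alone at different precisions. It assumes a round 7 billion parameters and ignores everything else that occupies VRAM at run time (KV cache, activations, CUDA context), so measured usage will be higher than these numbers. `weight_gb` is a hypothetical helper written for this illustration.

```python
# Weight-only memory estimate; real VRAM usage is higher than this.
PARAMS = 7e9  # assumption: round parameter count for a "7B" model


def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Return the size of the weights in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_gb(PARAMS, bits):.1f} GB")
```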

In [None]:
%pip install llama-index
%pip install transformers accelerate bitsandbytes
%pip install llama-index-readers-web
%pip install llama-index-llms-huggingface
%pip install llama-index-embeddings-huggingface
%pip install llama-index-program-openai
%pip install llama-index-agent-openai

## Setup

### Data

In [None]:
from llama_index.readers.web import BeautifulSoupWebReader

url = "https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots"

documents = BeautifulSoupWebReader().load_data([url])

### LLM

This should run on a T4 GPU in the free tier of Google Colab.

In [None]:
# huggingface api token for downloading llama2
hf_token = "<input your Hugging Face token here>"

In [None]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.core.prompts import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    query_wrapper_prompt=PromptTemplate("<s> [INST] {query_str} [/INST] "),
    context_window=3900,
    model_kwargs={"token": hf_token, "quantization_config": quantization_config},
    tokenizer_kwargs={"token": hf_token},
    device_map="auto",
)
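The `query_wrapper_prompt` above wraps every query in Llama 2's `[INST] ... [/INST]` chat markers before it reaches the model. As a small illustration of what that template produces, here is a plain-Python stand-in that loads no model; `wrap_query` is a hypothetical helper, not part of LlamaIndex.

```python
# Same template string as the one passed to HuggingFaceLLM above.
LLAMA2_WRAPPER = "<s> [INST] {query_str} [/INST] "


def wrap_query(query_str: str) -> str:
    """Fill the wrapper template with the user's query."""
    return LLAMA2_WRAPPER.format(query_str=query_str)


# repr() makes the trailing space after [/INST] visible.
print(repr(wrap_query("How do OpenAI and Meta differ on AI tools?")))
```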

In [None]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = "local:BAAI/bge-small-en-v1.5"


### Index Setup

In [None]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents)

In [None]:
from llama_index.core.indices import SummaryIndex

summary_index = SummaryIndex.from_documents(documents)

### Helpful Imports / Logging

In [None]:
from llama_index.core.response.notebook_utils import display_response

In [None]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)

In [None]:
query_engine = vector_index.as_query_engine(response_mode="compact")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

**`Final Response:`** OpenAI and Meta differ on AI tools in the following ways:
OpenAI:
* OpenAI is focused on developing AI tools that are useful for lots of things, such as generating text, images, and voices.
* OpenAI's latest updates include adding a voice feature to its large language model, allowing users to interact with it via voice.
* OpenAI's AI tools are designed to be more personalized, empathetic, and available.
* OpenAI's AI voice feature is still in its early stages and is only available to ChatGPT Plus subscribers.
* OpenAI's AI tools are not necessarily designed to entertain or create a synthetic social network.

Meta:
* Meta is focused on developing AI tools that can be used in its messaging apps, such as generating stickers and placing AI characters on every major surface of its products.
* Meta's AI tools are designed to create a partially synthetic social network, where AI characters will be placed on every major surface of its products.
* Meta's AI tools are designed to be more personalized, engaging, and entertaining.
* Meta's A

### Refine

In [None]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

**`Final Response:`** Based on the new context provided, OpenAI and Meta differ in their approach to AI tools in the following ways:
1. Focus: OpenAI tends to focus on developing language models, while Meta is building LLMs and chatbots for its messaging apps with a focus on entertainment and personality-driven chatbots.
2. Personality: OpenAI's chatbot has a dry, sterile text unadorned by any hint of style, while Meta's chatbots are personality-driven and have 28 different personalities.
3. Availability: The voice feature of ChatGPT is only available to ChatGPT Plus subscribers, while Meta's chatbots are available in its messaging apps.
Overall, OpenAI and Meta have different approaches to developing AI tools, with OpenAI focusing on language models and efficiency, while Meta is more focused on entertainment and personality-driven chatbots.

### Tree Summarize

In [None]:
query_engine = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

**`Final Response:`** OpenAI and Meta have different approaches to AI tools. OpenAI focuses on developing AI language models that can be used for various tasks such as text generation, voice interactions, and image creation. They have also created chatbots that can be used as virtual assistants. On the other hand, Meta is focusing on creating AI characters that can be used in their messaging apps. They have partnered with celebrities to create personalities for their chatbots. OpenAI tends to present its products as productivity tools, while Meta is in the entertainment business and is building LLMs. OpenAI's AI tools are more focused on practical applications, while Meta's AI tools are more geared towards entertainment and personalization.
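The three response modes above differ mainly in how the retrieved chunks are fed to the LLM. The sketch below is a simplified, library-independent approximation of each strategy, not LlamaIndex's implementation: `CountingLLM` is a hypothetical stand-in that only counts calls, and packing is done by character count here, whereas the real library works with tokens and the context window.

```python
from typing import Callable, List


class CountingLLM:
    """Hypothetical stand-in for an LLM that just counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer-{self.calls}"


def refine(chunks: List[str], llm: Callable[[str], str]) -> str:
    """One LLM call per chunk; each call revises the previous answer."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Existing answer: {answer}\nNew context: {chunk}")
    return answer


def compact(chunks: List[str], llm: Callable[[str], str], max_chars: int) -> str:
    """Pack chunks into as few prompts as fit, then refine over the packed prompts."""
    packed: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + "\n" + chunk
        if current and len(candidate) > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return refine(packed, llm)


def tree_summarize(chunks: List[str], llm: Callable[[str], str], fan_in: int = 2) -> str:
    """Summarize groups of chunks, then summarize the summaries, until one remains."""
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        level = [llm("\n".join(level[i:i + fan_in])) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


chunks = ["chunk-1", "chunk-2", "chunk-3", "chunk-4"]
for name, run in [
    ("refine", lambda m: refine(chunks, m)),
    ("compact", lambda m: compact(chunks, m, max_chars=16)),
    ("tree_summarize", lambda m: tree_summarize(chunks, m, fan_in=2)),
]:
    model = CountingLLM()
    run(model)
    print(f"{name}: {model.calls} LLM calls")
```

Fewer calls matter with a slow local model, which is one reason `compact` is a sensible default.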

## Router Query Engine

In [None]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

### Single Selector

In [None]:
from llama_index.core.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    select_multi=False
)

response = query_engine.query("What was mentioned about Meta?")

display_response(response)

ValueError: ignored

### Multi Selector

In [None]:
from llama_index.core.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    select_multi=True,
)

response = query_engine.query("What was mentioned about Meta? Summarize with any other companies mentioned in the entire document.")

display_response(response)

ValueError: ignored
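A router works by showing the LLM the tool descriptions, asking it to pick one (or several), and then parsing the reply. The tracebacks above are elided, so the cause of the `ValueError` is not shown; one plausible cause, and only an assumption here, is that a small local model does not return the structured output the selector expects. The sketch below illustrates that failure mode with hypothetical helpers (`build_selector_prompt`, `parse_choice`) and is not LlamaIndex's selector.

```python
import json
import re
from typing import Dict

# Tool names and descriptions copied from the notebook.
TOOLS: Dict[str, str] = {
    "vector_search": "Useful for searching for specific facts.",
    "summary": "Useful for summarizing an entire document.",
}


def build_selector_prompt(tools: Dict[str, str], query: str) -> str:
    """Number the tools and ask for a JSON reply naming one of them."""
    lines = [f"({i}) {name}: {desc}" for i, (name, desc) in enumerate(tools.items(), start=1)]
    return (
        "Choices:\n" + "\n".join(lines)
        + f"\n\nQuestion: {query}\n"
        + 'Reply with JSON like {"choice": 1, "reason": "..."}.'
    )


def parse_choice(reply: str, tools: Dict[str, str]) -> str:
    """Extract the chosen tool name, raising ValueError on malformed replies."""
    match = re.search(r"\{.*?\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the LLM reply")
    data = json.loads(match.group(0))  # JSONDecodeError is a ValueError subclass
    try:
        index = int(data["choice"]) - 1
    except (KeyError, TypeError) as exc:
        raise ValueError("reply has no usable 'choice' field") from exc
    names = list(tools)
    if not 0 <= index < len(names):
        raise ValueError(f"choice {index + 1} is out of range")
    return names[index]


print(build_selector_prompt(TOOLS, "What was mentioned about Meta?"))
print(parse_choice('Sure! {"choice": 2, "reason": "needs the whole document"}', TOOLS))
```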

## SubQuestion Query Engine

In [None]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.core.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    verbose=True,
)

response = query_engine.query("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")

display_response(response)

OutputParserException: ignored

## SQL Query Engine

Here, we download and use a sample SQLite database with 11 tables containing information about music, playlists, and customers. We limit the query engine to a few of those tables for this test.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
!curl https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip -o /content/chinook.zip
!unzip /content/chinook.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  298k  100  298k    0     0  1334k      0 --:--:-- --:--:-- --:--:-- 1338k
Archive:  /content/chinook.zip
  inflating: chinook.db              


In [None]:
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, select, column

engine = create_engine("sqlite:////content/chinook.db")

In [None]:
from llama_index.core import SQLDatabase

sql_database = SQLDatabase(engine)

In [None]:
from llama_index.core.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
)

In [None]:
response = query_engine.query("What are some albums?")

display_response(response)

NotImplementedError: ignored

In [None]:
response = query_engine.query("What are some artists? Limit it to 5.")

display_response(response)

NotImplementedError: ignored

This last query should require a more complex join.

In [None]:
response = query_engine.query("What are some tracks from the artist AC/DC? Limit it to 3")

display_response(response)

NotImplementedError: ignored

In [None]:
print(response.metadata['sql_query'])
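For reference, the join that the AC/DC question calls for can be written by hand. The sketch below uses an in-memory SQLite database with a cut-down schema that is assumed to mirror the `artists`, `albums`, and `tracks` tables of the sample database; the column names are that assumption, and the rows are placeholders made up for illustration, not data from `chinook.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed, simplified schema; the real tables have more columns.
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks  (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);

    -- Placeholder rows for illustration only.
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Other Artist');
    INSERT INTO albums  VALUES (1, 'Album A', 1), (2, 'Album B', 2);
    INSERT INTO tracks  VALUES
        (1, 'Track 1', 1), (2, 'Track 2', 2), (3, 'Track 3', 1),
        (4, 'Track 4', 1), (5, 'Track 5', 1);
    """
)

# tracks -> albums -> artists, filtered by artist name and capped at 3 rows.
sql = """
    SELECT tracks.Name
    FROM tracks
    JOIN albums  ON tracks.AlbumId = albums.AlbumId
    JOIN artists ON albums.ArtistId = artists.ArtistId
    WHERE artists.Name = ?
    ORDER BY tracks.TrackId
    LIMIT 3
"""
track_names = [row[0] for row in conn.execute(sql, ("AC/DC",))]
print(track_names)
```

Comparing a hand-written query like this with `response.metadata['sql_query']` is a quick way to judge whether the model produced a sensible join.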

## Programs

Depending on the LLM, you will have to test with either `OpenAIPydanticProgram` or `LLMTextCompletionProgram`.

In [None]:
from typing import List
from pydantic import BaseModel

from llama_index.core.program import LLMTextCompletionProgram
from llama_index.program.openai import OpenAIPydanticProgram

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

In [None]:
from llama_index.core.output_parsers import PydanticOutputParser

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Album),
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=True,
)

The call below seems to error out only because the model ran out of output tokens. We could fix this by setting `max_new_tokens` on the `HuggingFaceLLM` constructor to a value higher than the default of 256.

In [None]:
output = program(movie_name="The Shining")

ValidationError: ignored

In [None]:
print(output)
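To see why an output-token limit can turn into a parsing failure, the sketch below cuts a JSON reply in half and tries to parse it. It uses only the standard `json` module rather than Pydantic, and the album content is made up for illustration; the notebook's `ValidationError` is assumed, as the author suggests, to stem from the same kind of truncation.

```python
import json
from typing import Optional

# Made-up reply in the shape of the Album model (illustrative values only).
full_reply = json.dumps(
    {
        "name": "Example Album",
        "artist": "Example Artist",
        "songs": [
            {"title": "Song One", "length_seconds": 180},
            {"title": "Song Two", "length_seconds": 240},
        ],
    }
)

# Simulate generation stopping early because the token budget ran out.
truncated_reply = full_reply[: len(full_reply) // 2]


def try_parse(text: str) -> Optional[dict]:
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print("full reply parses:     ", try_parse(full_reply) is not None)
print("truncated reply parses:", try_parse(truncated_reply) is not None)
```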

## Data Agent

Similar to programs, OpenAI LLMs will use `OpenAIAgent`, while other LLMs will use `ReActAgent`.

In [None]:
from llama_index.core.agent import ReActAgent
from llama_index.agent.openai import OpenAIAgent

agent = ReActAgent.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

In [None]:
response = agent.chat("Hello!")
print(response)

Response: Great, I'm happy to help you with your question! Can you please provide more context or clarify what you need help with? For example, what kind of information are you trying to find or what question are you trying to answer?
Great, I'm happy to help you with your question! Can you please provide more context or clarify what you need help with? For example, what kind of information are you trying to find or what question are you trying to answer?


Interesting tool inputs and responses, but I guess it works lol

In [None]:
response = agent.chat("What was mentioned about Meta? How Does it differ from how OpenAI is talked about?")
print(response)

Response: Great, let's get started! Based on your question, I understand that you want to know the difference between Meta and OpenAI. To answer this question, I will use the `summary` tool to summarize some relevant information about both Meta and OpenAI.
Action: summary
Action Input: {"text": "Meta and OpenAI are both AI research organizations, but they have some key differences. Meta was founded in 2015 by Mark Zuckerberg and is focused on developing AI technologies for Facebook and other Facebook-owned platforms. OpenAI, on the other hand, was founded in 2015 by a group of prominent AI researchers and is focused on developing AI technologies for a wide range of applications, including but not limited to natural language processing, computer vision, and robotics. Additionally, OpenAI is a non-profit organization, while Meta is a for-profit company. Both organizations have made significant contributions to the field of AI and have published numerous research papers and