<a href="https://colab.research.google.com/github/kfahn22/Colab_notebooks/blob/main/Mistral_7b_instruct_Coding_Train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Documentation from LlamaIndex on [using LLMs](https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html)

Based on [this notebook](https://colab.research.google.com/drive/1ZAdrabTJmZ_etDp10rjij_zME2Q3umAQ?usp=sharing).

Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, `mistralai/Mistral-7B-Instruct-v0.1` uses about 12GB of VRAM and 8.5GB of RAM. I used a T4 High-RAM instance for this notebook.
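
As a rough back-of-the-envelope check, the sketch below estimates the memory needed for the model weights alone at different precisions. The parameter count (about 7.24 billion) is an assumption taken from the public model card, and `weight_memory_gb` is a hypothetical helper. Observed usage will be higher, because activations, the KV cache, CUDA overhead, and the embedding model loaded later are not counted here.

```python
# Assumed parameter count for Mistral-7B (approximate, not measured here).
N_PARAMS = 7.24e9


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.2f} GB")
```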

In [1]:
!pip install huggingface_hub



In [2]:
from huggingface_hub import notebook_login

notebook_login()


In [3]:
!pip install git+https://github.com/run-llama/llama_index

Collecting git+https://github.com/run-llama/llama_index
  Cloning https://github.com/run-llama/llama_index to /tmp/pip-req-build-og821e00
  Running command git clone --filter=blob:none --quiet https://github.com/run-llama/llama_index /tmp/pip-req-build-og821e00
  Resolved https://github.com/run-llama/llama_index to commit e5b163daff3b9cfc3fe9396e8f48e1fced66f211
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting dataclasses-json (from llama-index==0.9.46)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting deprecated>=1.2.9.3 (from llama-index==0.9.46)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting dirtyjson<2.0.0,>=1.0.8 (from llama-index==0.9.46)
  Downloading dirtyjson-1.0.8-py3-none-any.whl (25 kB)
Collecting httpx (from llama-index==0.9.46)

In [4]:
!pip install transformers accelerate bitsandbytes

Collecting accelerate
  Downloading accelerate-0.26.1-py3-none-any.whl (270 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.42.0-py3-none-any.whl (105.0 MB)
Installing collected packages: bitsandbytes, accelerate
Successfully installed accelerate-0.26.1 bitsandbytes-0.42.0


## Setup

### Data

In [14]:
from llama_index import download_loader

BeautifulSoupWebReader = download_loader("BeautifulSoupWebReader")

loader = BeautifulSoupWebReader()

# Page slugs on thecodingtrain.com to load and index.
challenges = [
    "about",
    "challenges",
    "tracks",
    "showcase",
    "challenges/1-starfield",
    "challenges/2-menger-sponge",
    "challenges/168-the-mandelbulb",
    "challenges/178-climate-spiral",
    "challenges/179-wolfram-ca",
    "challenges/180-falling-sand",
]
urls = [f"https://thecodingtrain.com/{challenge}" for challenge in challenges]

# Alternative: load only the home page.
# documents = loader.load_data(urls=['https://thecodingtrain.com/'])

documents = loader.load_data(urls)
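
The URL list above is built by appending page slugs to the site root. Here is a minimal sketch of the same idea that also drops repeated slugs, so no page is fetched and indexed twice. `build_urls` is a hypothetical helper, not part of LlamaIndex.

```python
BASE_URL = "https://thecodingtrain.com"


def build_urls(slugs, base_url=BASE_URL):
    """Turn page slugs into full URLs, dropping duplicates but keeping order."""
    unique_slugs = dict.fromkeys(slugs)  # dicts preserve insertion order
    return [f"{base_url}/{slug}" for slug in unique_slugs]


slugs = ["about", "challenges/178-climate-spiral", "challenges/178-climate-spiral"]
print(build_urls(slugs))
```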

### LLM

This should run on a free-tier T4 instance.

In [6]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.1",
    query_wrapper_prompt=PromptTemplate("<s>[INST] {query_str} [/INST] </s>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
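    # tokenizer_kwargs={},
    # Note: transformers only applies these sampling settings when sampling is
    # enabled ("do_sample": True); otherwise decoding is greedy.
    generate_kwargs={"temperature": 0.2, "top_k": 5, "top_p": 0.95},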
    device_map="auto",
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
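
To make the role of `query_wrapper_prompt` concrete, here is a plain-Python sketch of how a template with a `{query_str}` placeholder wraps a query in Mistral's `[INST] ... [/INST]` instruction markers. It uses `str.format` as a stand-in for LlamaIndex's `PromptTemplate`, and `wrap_query` is a hypothetical helper.

```python
# Same template string as in the cell above.
TEMPLATE = "<s>[INST] {query_str} [/INST] </s>\n"


def wrap_query(query: str, template: str = TEMPLATE) -> str:
    """Substitute the user's query into the instruction template."""
    return template.format(query_str=query)


print(wrap_query("What is the Coding Train?"))
```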



In [15]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")

### Index Setup

In [16]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

In [17]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)
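
The two indexes differ mainly in how they pick nodes at query time: a vector index returns the few nodes whose embeddings are closest to the query, while a summary index hands every node to the response synthesizer. The toy sketch below illustrates that difference with hand-written 2-D vectors. The embeddings, node names, and function names are invented for illustration and do not use LlamaIndex.

```python
import math

# Toy "embeddings": invented 2-D vectors, not real model output.
NODES = {
    "Falling Sand challenge page": (0.9, 0.1),
    "Climate Spiral challenge page": (0.2, 0.8),
    "About page": (0.5, 0.5),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_retrieve(query_vec, nodes, top_k=2):
    """Vector-index style: return the top_k most similar nodes."""
    ranked = sorted(nodes, key=lambda name: cosine(query_vec, nodes[name]), reverse=True)
    return ranked[:top_k]


def summary_retrieve(nodes):
    """Summary-index style: return every node, in order."""
    return list(nodes)


query_vec = (1.0, 0.0)  # pretend this embeds a question about falling sand
print(vector_retrieve(query_vec, NODES))
print(summary_retrieve(NODES))
```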

### Helpful Imports / Logging

In [18]:
from llama_index.response.notebook_utils import display_response

In [19]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)
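
Roughly speaking, `compact` mode packs the retrieved chunks into as few prompts as will fit the context window, which reduces the number of LLM calls. The sketch below shows that packing step using a character budget instead of a token count. The budget, the chunk texts, and the `pack_chunks` helper are assumptions for illustration.

```python
def pack_chunks(chunks, max_chars):
    """Greedily merge consecutive chunks so each group stays within max_chars."""
    groups, current = [], ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            groups.append(current)  # current group is full; start a new one
            current = chunk
        else:
            current = candidate
    if current:
        groups.append(current)
    return groups


chunks = ["a" * 40, "b" * 40, "c" * 40]
packed = pack_chunks(chunks, max_chars=100)
print(len(chunks), "chunks ->", len(packed), "prompts")
```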

In [20]:
query_engine = vector_index.as_query_engine(response_mode="compact")

response = query_engine.query("What is the Coding Train?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** The Coding Train is a community-focused educational platform that teaches computer programming to beginners and curious individuals. It was founded by Dan Shiffman in 2015 and offers a variety of resources, including video tutorials, live streaming events, and a social media presence. The platform's main focus is on teaching the fundamentals of computer programming, but it also covers a range of topics and languages, including JavaScript, p5.js, and Git. The Coding Train's goal is to make coding accessible and fun for everyone, regardless of their prior experience.

### Refine
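
`refine` mode makes one LLM call per retrieved chunk: the first call drafts an answer, and each later call is asked to revise that draft in light of the next chunk. A minimal sketch of the loop, with a stub standing in for the LLM (`fake_llm` and `refine_answer` are hypothetical names):

```python
def fake_llm(prompt: str) -> str:
    """Stub LLM: echoes the last line of the prompt so the flow is visible."""
    return prompt.splitlines()[-1].upper()


def refine_answer(query, chunks, llm=fake_llm):
    """Draft an answer from the first chunk, then refine it once per extra chunk."""
    calls = 0
    answer = None
    for chunk in chunks:
        if answer is None:
            prompt = f"Question: {query}\nContext:\n{chunk}"
        else:
            prompt = (
                f"Question: {query}\nExisting answer: {answer}\n"
                f"New context:\n{chunk}"
            )
        answer = llm(prompt)
        calls += 1
    return answer, calls


answer, calls = refine_answer("q", ["first chunk", "second chunk"])
print(answer, calls)
```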

In [21]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("what is the featured challenge")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** The featured challenge is #180 - Falling Sand.

In [24]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("Looking at the Passenger Showcase, which showcases use Wolfram CA?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** Looking at the Passenger Showcase, which showcases use Wolfram CA?

The following showcases use Wolfram CA:

* From Wolfram CA Bees, Parametres/arguments
* From Wolfram CA Rainbow colored falling dots
* From Wolfram CA Rules Switching Infinite Canvas Wolfram CA
* From Wolfram CA Climate Spiral with only JS Chaining
* From Wolfram CA Pseudo-Islamic tiling
* From Wolfram CA Hexagonal Maze Generator
* From Wolfram CA Menger Sponge Fractal Wolfram Alpha CA Double Rule
* From Wolfram CA Interactive starfield in full 3D!
* From Wolfram CA Wolfram Elementary CA RINGS!!
* From Wolfram CA Hexagonal Maze Generator
* From Wolfram CA Maze Generator Follower Clock

Note: The list may not be exhaustive and there may be other showcases that use Wolfram CA.

### Tree Summarize
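
`tree_summarize` builds the answer bottom-up: chunks are summarized in groups, the summaries are summarized again, and so on until one response remains. The sketch below shows that recursion with a stub summarizer. The group size and the names `fake_summarize` and `tree_summarize` are assumptions, not LlamaIndex's implementation.

```python
def fake_summarize(texts):
    """Stub for an LLM summarization call: joins the inputs in brackets."""
    return "[" + " + ".join(texts) + "]"


def tree_summarize(chunks, group_size=2, summarize=fake_summarize):
    """Repeatedly summarize groups of texts until a single summary is left."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = list(chunks)
    while len(level) > 1:
        level = [
            summarize(level[i:i + group_size])
            for i in range(0, len(level), group_size)
        ]
    return level[0]


print(tree_summarize(["A", "B", "C"]))
```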

In [27]:
query_engine = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine.query("Looking at the Passenger Showcase, which challenge has the most showcases")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** The Passenger Showcase on The Coding Train website features various projects created by viewers. To determine which challenge has the most showcases, we need to count the number of projects associated with each challenge and compare them.

Here's a Python script that can help us with this task:
```python
import requests
from bs4 import BeautifulSoup

# URL of the Passenger Showcase page
url = "https://thecodingtrain.com/showcase"

# Send a GET request to the URL and parse the HTML content
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Find all the challenge sections on the page
challenge_sections = soup.find_all("section", class_="challenge")

# Initialize a dictionary to store the count of showcases for each challenge
showcase_counts = {}

# Iterate through each challenge section and count the number of showcases
for challenge_section in challenge_sections:
    # Find all the project sections within the challenge section
    project_sections = challenge_section.find_all("section",
```

(The response is cut off here, most likely by the `max_new_tokens=256` limit set on the LLM.)

## Router Query Engine

In [29]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)
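
A router query engine uses a selector to choose among tools based on their descriptions, then forwards the query to the chosen tool or tools. In LlamaIndex the selection is driven by the LLM; the sketch below swaps in simple keyword overlap so that it runs without a model. `Tool`, `score`, and `route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str


def score(query: str, tool: Tool) -> int:
    """Count words shared by the query and the tool description (LLM stand-in)."""
    punctuation = ".,?!"
    query_words = {w.strip(punctuation).lower() for w in query.split()}
    desc_words = {w.strip(punctuation).lower() for w in tool.description.split()}
    return len(query_words & desc_words)


def route(query, tools, select_multi=False):
    """Return the best tool name, or all tools with a positive score if select_multi."""
    ranked = sorted(tools, key=lambda t: score(query, t), reverse=True)
    if select_multi:
        return [t.name for t in ranked if score(query, t) > 0]
    return [ranked[0].name]


tools = [
    Tool("vector_search", "Useful for searching for specific facts."),
    Tool("summary", "Useful for summarizing an entire document."),
]
print(route("Give me specific facts about the climate spiral", tools))
```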

### Single Selector

In [30]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=False
)

response = query_engine.query("what is the climate spiral")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** The climate spiral is a visual representation of the changing temperatures over time, illustrating the urgent need to address climate change. It was originally designed by the climate scientist Ed Hawkins and is a graphical depiction of global temperature anomalies. The spiral shows how temperatures have increased over time, with the outermost part of the spiral representing the earliest time period and the innermost part representing the most recent time period. The spiral is often used to communicate the severity of climate change and the need for immediate action to mitigate its effects.

### Multi Selector

In [34]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=True,
)

response = query_engine.query("Looking at the related challenges, which challenge is similar to Falling Sand?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** Looking at the related challenges, which challenge is similar to Falling Sand?

There are several challenges on The Coding Train's website that are similar to Falling Sand. One such challenge is Wolfram CA, which involves coding a p5.js visualization of the Wolfram Elementary Cellular Automaton. Like Falling Sand, Wolfram CA involves creating a simulation using a grid of pixels and simple rules. Another challenge that is similar to Falling Sand is Climate Spiral, which involves creating a visual representation of changing temperatures over time using p5.js and temperature data. Both challenges involve creating simulations using a grid of pixels and simple rules, similar to Falling Sand.

## SubQuestion Query Engine
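
A sub-question query engine breaks a complex question into simpler sub-questions, sends each one to a tool, and then combines the partial answers. The sketch below shows that three-step flow with hard-coded stand-ins for the LLM steps. Every function and the sample question are hypothetical.

```python
def decompose(question):
    """Stand-in for the LLM step that generates (tool_name, sub_question) pairs."""
    return [
        ("vector_search", "What is Falling Sand?"),
        ("vector_search", "What is Wolfram CA?"),
    ]


def answer_sub_question(tool_name, sub_question):
    """Stand-in for running one sub-question against the named tool."""
    return f"<{tool_name} answer to: {sub_question}>"


def sub_question_query(question):
    """Decompose, answer each part, then synthesize a final response."""
    partials = [answer_sub_question(tool, sq) for tool, sq in decompose(question)]
    # A real engine would ask the LLM to synthesize; here we just join the parts.
    return " ".join(partials)


result = sub_question_query("Compare Falling Sand and Wolfram CA")
print(result)
```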

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    verbose=True,
)

response = query_engine.query("")

display_response(response)

Note: I haven't gotten this to work properly yet.

## Data Agent

Similar to programs, OpenAI LLMs will use `OpenAIAgent`, while other LLMs will use `ReActAgent`.
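
A ReAct-style agent alternates between deciding what to do and calling a tool, feeding each tool result (the observation) back into the next step until it can answer. The loop below sketches that cycle with a scripted stand-in for the LLM. `scripted_llm`, `TOOLS`, and `react_loop` are hypothetical and much simpler than LlamaIndex's `ReActAgent`.

```python
PREFIX = "Observation: "

TOOLS = {
    "vector_search": lambda q: f"(facts about {q})",
}


def scripted_llm(history):
    """Stand-in LLM: call a tool first, then answer once an observation exists."""
    observations = [h for h in history if h.startswith(PREFIX)]
    if not observations:
        return {"action": "vector_search", "input": "falling sand"}
    return {"answer": f"Based on {observations[-1][len(PREFIX):]}"}


def react_loop(question, llm=scripted_llm, tools=TOOLS, max_steps=5):
    """Run action/observation steps until the LLM returns an answer."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm(history)
        if "answer" in step:
            return step["answer"]
        result = tools[step["action"]](step["input"])
        history.append(f"{PREFIX}{result}")
    return "No answer within the step limit."


print(react_loop("What is Falling Sand?"))
```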

TODO: Figure out what the vector tool and summary tool should be.

In [None]:
from llama_index.agent import OpenAIAgent, ReActAgent

agent = ReActAgent.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

In [None]:
response = agent.chat("Hello!")
print(response)

In [None]:
response = agent.chat("")
print(response)