### Wednesday, January 10, 2024

Did some cleanup of /root/.cache/huggingface/hub and wanted to make sure this still runs.

### Monday, January 8, 2024

I'm trying to run this again on the hfpt_Dec14 container ... yup! It still runs! Nice!

### Tuesday, January 2, 2024

Ran this again to make sure it still works ... Yup! It still runs! Nice!

### Wednesday, December 27, 2023

I'm trying to run this again on the hfpt_Dec14 container using 'mistralai/Mistral-7B-Instruct-v0.2'

1) pip install playwright
2) playwright install-deps (again, this installed a ton of stuff!)
3) playwright install (this installs Chromium and Firefox)
4) pip install html2text

Nice! Now it all runs!
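
A quick way to check the Python side of these steps before re-running the notebook is sketched below. It only checks that the two packages are importable; it cannot tell whether `playwright install-deps` or `playwright install` were run, since those install system libraries and browser binaries rather than Python modules. The helper name `missing_packages` is my own, not something from the blog post.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages the notebook needed beyond what the container already had.
required = ["playwright", "html2text"]
missing = missing_packages(required)
if missing:
    print("pip install " + " ".join(missing))
else:
    print("playwright and html2text are importable")
```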

### Thursday, December 14, 2023

Gonna give this another go, to see if things now magically work ...

I manually created this notebook from the Medium blog post, but I just saw that the author DOES include a [notebook link](https://github.com/madhavthaker1/llm/blob/main/rag/e2e_rag.ipynb), and the blog post does NOT show all of the code. I added the missing code, and now this all runs! Nice!

### Thursday, November 30, 2023

Attempting to run this again ...

docker container start hfpt_Oct28

### Sunday, November 26, 2023

[Build your own RAG with Mistral-7B and LangChain](https://medium.com/@thakermadhav/build-your-own-rag-with-mistral-7b-and-langchain-97d0c92fa146)

In [1]:
# !pip install -q torch datasets
# !pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

In [1]:
import os
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline
)
from datasets import load_dataset
from peft import LoraConfig, PeftModel

from langchain.text_splitter import CharacterTextSplitter
from langchain.document_transformers import Html2TextTransformer
from langchain.document_loaders import AsyncChromiumLoader

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

from langchain.prompts import PromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.llms import HuggingFacePipeline
from langchain.chains import LLMChain

In [2]:
# !ls /home/rob/Data2/huggingface/transformers
!ls /root/.cache/huggingface/hub

models--NousResearch--Llama-2-7b-chat-hf     version.txt
models--mistralai--Mistral-7B-Instruct-v0.2  version_diffusers_cache.txt
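
The folder names in this listing follow the hub cache naming pattern: a `models--` prefix, then the repo id with each `/` replaced by `--`. A minimal sketch of that mapping, handy for working out which folder to copy into a container (the function name `hub_cache_dirname` is my own):

```python
def hub_cache_dirname(repo_id, repo_type="model"):
    """Map a Hugging Face repo id to its folder name inside the hub cache."""
    return f"{repo_type}s--" + repo_id.replace("/", "--")


print(hub_cache_dirname("mistralai/Mistral-7B-Instruct-v0.2"))
print(hub_cache_dirname("NousResearch/Llama-2-7b-chat-hf"))
```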


In [None]:
# docker cp /home/rob/Data3/huggingface/transformers/models--mistralai--Mistral-7B-Instruct-v0.2 c8324b70601d:///root/.cache/huggingface/hub
# Successfully copied 14.5GB to c8324b70601d:///root/.cache/huggingface/hub

### Tokenizer

In [3]:
import transformers

# model_name = 'mistralai/Mistral-7B-Instruct-v0.1'  # superseded by v0.2 below
model_name = 'mistralai/Mistral-7B-Instruct-v0.2'

model_config = transformers.AutoConfig.from_pretrained(
    model_name,
)

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

### Bits and Bytes parameters

In [5]:
# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

### Set up quantization config

In [6]:
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

In [7]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

In [8]:
# Check whether the GPU supports bfloat16 (compute capability 8.0 or higher)
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

Your GPU supports bfloat16: accelerate training with bf16=True


### Load pre-trained config

In [9]:
# This loads the model directly into the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)

# 8.7s   5392 MiB VRAM
# 10.3s
# 15.3s

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

### Count number of trainable parameters

In [10]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return (
        f"trainable model parameters: {trainable_model_params}\n"
        f"all model parameters: {all_model_params}\n"
        f"percentage of trainable model parameters: "
        f"{100 * trainable_model_params / all_model_params:.2f}%"
    )


In [11]:
print(print_number_of_trainable_model_parameters(model))


trainable model parameters: 262410240
all model parameters: 3752071168
percentage of trainable model parameters: 6.99%
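
These numbers look odd for a "7B" model, so here is a back-of-the-envelope check. The sketch assumes the layer dimensions from Mistral-7B's published config (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of 128 dims, vocabulary 32000) and assumes that 4-bit loading packs two weights into each stored byte while leaving embeddings, `lm_head`, and norm weights unquantized. Under those assumptions the arithmetic reproduces both counts above.

```python
# Mistral-7B dimensions (assumed from the model's published config.json).
VOCAB = 32000
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
KV_DIM = 1024  # 8 key/value heads x 128 dims per head

# Linear weights inside one decoder layer (these are the ones quantized to 4 bits).
attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q_proj, o_proj, k_proj, v_proj
mlp = 3 * HIDDEN * INTERMEDIATE                         # gate_proj, up_proj, down_proj
quantized_weights = LAYERS * (attention + mlp)

# Weights left unquantized: token embeddings, lm_head, and all RMSNorm vectors.
unquantized = 2 * VOCAB * HIDDEN + LAYERS * 2 * HIDDEN + HIDDEN

# Two 4-bit values share one stored byte, so numel() sees half as many elements.
packed = quantized_weights // 2

print("unquantized (reported as trainable):", unquantized)
print("all parameters as counted by numel():", packed + unquantized)
print("original parameter count:", quantized_weights + unquantized)
```

So the 6.99% figure says nothing about fine-tuning here: it is just the share of stored elements that were not quantized.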


### Build Mistral text generation pipeline

In [12]:
text_generation_pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=1000,
)
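
To get a feel for what `temperature=0.2` and `repetition_penalty=1.1` do, here is a toy sketch on a 3-token vocabulary. It is a simplified illustration of the two ideas, not the library's code, and the function names are my own. One caveat, as far as I know: `temperature` only matters when sampling is switched on (`do_sample=True`), which this pipeline call does not set explicitly.

```python
import math


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Make already-generated tokens less likely (for penalty > 1.0)."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink; negative scores move further below zero.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature < 1.0 sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, -1.0]  # toy scores, one per vocabulary entry
penalized = apply_repetition_penalty(logits, seen_token_ids=[0], penalty=1.1)
print(softmax_with_temperature(penalized, temperature=1.0))
print(softmax_with_temperature(penalized, temperature=0.2))
```

The second line of output puts much more probability on the top token than the first, which is why a low temperature gives more repeatable answers.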


In [13]:
mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

### Load and chunk documents. Load chunked documents into FAISS index

In [14]:
import nest_asyncio
nest_asyncio.apply()

In [15]:
# Articles to index
articles = ["https://www.fantasypros.com/2023/11/rival-fantasy-nfl-week-10/",
            "https://www.fantasypros.com/2023/11/5-stats-to-know-before-setting-your-fantasy-lineup-week-10/",
            "https://www.fantasypros.com/2023/11/nfl-week-10-sleeper-picks-player-predictions-2023/",
            "https://www.fantasypros.com/2023/11/nfl-dfs-week-10-stacking-advice-picks-2023-fantasy-football/",
            "https://www.fantasypros.com/2023/11/players-to-buy-low-sell-high-trade-advice-2023-fantasy-football/"]


In [16]:
# Firing this for the first time produced an error ...
# ImportError: playwright is required for AsyncChromiumLoader. Please install it with `pip install playwright`.
# pip install playwright


# Scrapes the blogs above
loader = AsyncChromiumLoader(articles)


In [17]:
# This does not run ... sigh.
# The error message suggests running the following command from the terminal
# playwright install-deps
# I did this and it appeared to install a ton of stuff ...
# OK Nice! This now runs!

# This does the work of pulling the docs and loading them ... 
docs = loader.load()

# 1m 41.0s

In [18]:
# Running this for the first time generates an error ... solution is to ..
# pip install html2text

# Converts HTML to plain text 
html2text = Html2TextTransformer()
docs_transformed = html2text.transform_documents(docs)

In [19]:
# Chunk text
text_splitter = CharacterTextSplitter(chunk_size=100, 
                                      chunk_overlap=0)
chunked_documents = text_splitter.split_documents(docs_transformed)

Created a chunk of size 148, which is longer than the specified 100
Created a chunk of size 4288, which is longer than the specified 100
Created a chunk of size 131, which is longer than the specified 100
Created a chunk of size 230, which is longer than the specified 100
Created a chunk of size 500, which is longer than the specified 100
Created a chunk of size 207, which is longer than the specified 100
Created a chunk of size 365, which is longer than the specified 100
Created a chunk of size 312, which is longer than the specified 100
Created a chunk of size 515, which is longer than the specified 100
Created a chunk of size 584, which is longer than the specified 100
Created a chunk of size 1119, which is longer than the specified 100
Created a chunk of size 257, which is longer than the specified 100
Created a chunk of size 103, which is longer than the specified 100
Created a chunk of size 136, which is longer than the specified 100
Created a chunk of size 230, which is longer t
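
Why all the warnings? My understanding is that `CharacterTextSplitter` splits on a single separator (a blank line, `"\n\n"`, by default) and then merges neighbouring pieces up to `chunk_size` characters. A piece that is already longer than `chunk_size` has no separator inside it to break on, so it is kept whole and reported. The sketch below is a simplified pure-Python model of that behaviour, not LangChain's actual code; `split_on_separator` is a made-up name.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Split on `separator`, then greedily merge pieces up to `chunk_size` chars."""
    pieces = [piece for piece in text.split(separator) if piece.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits, keep merging
        else:
            if current:
                chunks.append(current)
            current = piece  # may itself exceed chunk_size; it is kept whole
    if current:
        chunks.append(current)
    return chunks


# One long paragraph with no blank line inside it cannot be broken up.
text = "short one\n\n" + "x" * 250 + "\n\nshort two"
chunks = split_on_separator(text, chunk_size=100)
for chunk in chunks:
    if len(chunk) > 100:
        print(f"Created a chunk of size {len(chunk)}, which is longer than the specified 100")
```

If hard size limits matter, LangChain's `RecursiveCharacterTextSplitter` is the usual alternative, since it falls back to finer separators when a piece is too long.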

In [20]:
!ls /root/.cache/torch/sentence_transformers

BAAI_bge-large-en-v1.5
BAAI_bge-small-en-v1.5
hkunlp_instructor-xl
sentence-transformers_all-MiniLM-L6-v2
sentence-transformers_all-mpnet-base-v2
sentence-transformers_clip-ViT-B-32
sentence-transformers_multi-qa-mpnet-base-dot-v1
thenlper_gte-large


In [21]:
# Load chunked documents into the FAISS index
db = FAISS.from_documents(chunked_documents, 
                          HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2'))
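
Conceptually, the index stores one embedding vector per chunk, and a lookup returns the chunks whose vectors sit closest to the query's vector. The toy sketch below uses hand-made 3-dimensional vectors in place of real embeddings (the real ones from `all-mpnet-base-v2` have 768 dimensions) and an exhaustive search by Euclidean distance. The real FAISS index is far more optimized; the names and the `k=4` default here are illustrative assumptions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query_vector, indexed_chunks, k=4):
    """Return the texts of the k entries whose vectors are closest to the query."""
    ranked = sorted(indexed_chunks, key=lambda item: l2_distance(query_vector, item[1]))
    return [text for text, _ in ranked[:k]]


# Made-up vectors standing in for real embeddings.
toy_index = [
    ("running back workload notes", [0.9, 0.1, 0.0]),
    ("wide receiver sleeper picks", [0.1, 0.9, 0.0]),
    ("kicker streaming advice", [0.0, 0.1, 0.9]),
]
print(nearest_chunks([0.8, 0.2, 0.0], toy_index, k=2))
```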


In [None]:
# docker cp c8324b70601d://root/.cache/torch/sentence_transformers/sentence-transformers_all-mpnet-base-v2 /home/rob/Data3/sentence_transformers
# Successfully copied 439MB to c8324b70601d:///root/.cache/huggingface/hub

In [22]:
retriever = db.as_retriever()

### Create PromptTemplate and LLMChain

In [23]:
prompt_template = """
### [INST] Instruction: Answer the question based on your fantasy football knowledge. Here is context to help:

{context}

### QUESTION:
{question} [/INST]
 """

# Create prompt from prompt template 
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

# Create llm chain 
llm_chain = LLMChain(llm=mistral_llm, prompt=prompt)
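
To see the exact text the model receives, the template can be filled in by hand. With its default settings, `PromptTemplate` behaves much like Python's `str.format` on the `{context}` and `{question}` placeholders, so this sketch uses plain `str.format` and a placeholder context string instead of real retrieved chunks.

```python
template = """
### [INST] Instruction: Answer the question based on your fantasy football knowledge. Here is context to help:

{context}

### QUESTION:
{question} [/INST]
 """

# Placeholder values; in the notebook the context comes from the retriever.
filled = template.format(
    context="(retrieved chunks would go here)",
    question="Should I start Gibbs next week for fantasy?",
)
print(filled)
```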

### Build RAG Chain

In [24]:
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)
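
The pipe syntax hides two steps: the dict stage calls each of its values with the same input question, and the resulting dict is then handed to the chain. Here is a plain-Python sketch of that data flow with stand-in functions (every name below is hypothetical; nothing here calls LangChain or the model).

```python
def fake_retriever(question):
    """Stand-in for `retriever`: returns canned context for any question."""
    return ["chunk about running backs", "chunk about matchups"]


def passthrough(value):
    """Stand-in for RunnablePassthrough: returns its input unchanged."""
    return value


def fake_llm_chain(inputs):
    """Stand-in for `llm_chain`: echoes its inputs and adds a 'text' field."""
    return {**inputs, "text": f"(answer based on {len(inputs['context'])} chunks)"}


def rag(question):
    # Step 1: the dict stage feeds the same question to every value.
    stage_one = {"context": fake_retriever(question), "question": passthrough(question)}
    # Step 2: the resulting dict is piped into the chain.
    return fake_llm_chain(stage_one)


result = rag("Should I start Gibbs next week for fantasy?")
print(sorted(result))  # ['context', 'question', 'text']
```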

In [25]:
rag_chain.invoke("Should I start Gibbs next week for fantasy?")

# 9.3s
# 6690 MiB VRAM

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'context': [Document(page_content='could start cutting into his workload. Furthermore, his rest of the season\nschedule isn’t fantasy-friendly. Try to flip Edwards and a WR3 for Kenneth\nWalker or Tony Pollard', metadata={'source': 'https://www.fantasypros.com/2023/11/players-to-buy-low-sell-high-trade-advice-2023-fantasy-football/'}),
  Document(page_content='“ **Gus Edwards** has been on fire lately. He is the RB1 over the past three\nweeks, averaging 22.2 half-point PPR fantasy points and two rushing touchdowns\nper game. However, over 54% of his fantasy production came from the six\nrushing touchdowns. Meanwhile, the veteran averaged only 6.5 fantasy points\nper game over the first six contests. He had more than six fantasy points only\nonce, in the Week 2 matchup where Edwards found the end zone. The veteran\nrunning back is a touchdown-or-bust player, and Keaton Mitchell', metadata={'source': 'https://www.fantasypros.com/2023/11/players-to-buy-low-sell-high-trade-advice-2023-fan