In [19]:
! pip install --quiet chromadb tiktoken openai langchain-community langchain-together langchain-mistralai langchain-openai


## Data Loading

First, we load the [Mixtral paper](https://arxiv.org/pdf/2401.04088.pdf).

In [1]:
from langchain_community.document_loaders import OnlinePDFLoader
loader = OnlinePDFLoader("https://arxiv.org/pdf/2401.04088.pdf")
data = loader.load()

## Splitting

Next, we split the document into chunks based on token count, [using the tiktoken encoder](https://python.langchain.com/docs/modules/data_connection/document_transformers/split_by_token#tiktoken).

We use [recursive](https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter) splitting: 

> This has the effect of trying to keep all paragraphs (and then sentences, and then words) together.

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=2000,chunk_overlap=100)
all_splits = text_splitter.split_documents(data)
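
To make the idea of recursive splitting concrete, here is a minimal, simplified sketch in plain Python. It is not LangChain's implementation: `count_tokens` and `split_recursively` are hypothetical helpers, whitespace-separated words stand in for tiktoken tokens, and chunk overlap is left out.

```python
def count_tokens(text):
    # Stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def split_recursively(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first, falling back to finer ones."""
    if count_tokens(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-split into groups of chunk_size words.
        words = text.split()
        return [" ".join(words[i:i + chunk_size])
                for i in range(0, len(words), chunk_size)]

    sep, finer = separators[0], separators[1:]
    pieces = []
    for part in text.split(sep):
        if part.strip():
            # Parts that are still too long are split with finer separators.
            pieces.extend(split_recursively(part, chunk_size, finer))

    # Greedily merge neighbouring pieces while they still fit in one chunk.
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if current and count_tokens(candidate) > chunk_size:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "one two three\n\nfour five\n\nsix seven eight nine ten"
chunks = split_recursively(text, chunk_size=4)
for chunk in chunks:
    print(repr(chunk))
```

The first two paragraphs fit within the limit and stay whole; only the third, which is too long, is broken down at a finer separator.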

## Indexing

First, set up an account with Mistral to access their embedding model [here](https://console.mistral.ai/users/api-keys/).

We'll embed [using Mistral](https://python.langchain.com/docs/integrations/text_embedding/mistralai) and index the chunks using a local vectorstore, [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma).

In [15]:
import os
from langchain_mistralai import MistralAIEmbeddings
from langchain_community.vectorstores import Chroma
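# Option 1: Mistral embeddings (requires MISTRAL_API_KEY in the environment)
mistral_api_key = os.environ.get("MISTRAL_API_KEY")
embedding = MistralAIEmbeddings(mistral_api_key=mistral_api_key)
# Option 2: OpenAI embeddings (requires OPENAI_API_KEY in the environment).
# This assignment overrides the Mistral embedding above, so OpenAI embeddings are what get indexed.
from langchain_openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings()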

In [16]:
# Build vectorstore
vectorstore = Chroma.from_documents(
    documents=all_splits,
    collection_name="rag-chroma",
    embedding=embedding,
)
retriever = vectorstore.as_retriever()
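
Conceptually, the retriever embeds the question and returns the chunks whose embeddings are closest to it. The sketch below illustrates that idea with toy three-dimensional vectors and cosine similarity. The helper names and vectors are made up for illustration, and the distance metric and index that Chroma actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    # Rank (text, vector) pairs by similarity to the query, highest first.
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
toy_index = [
    ("chunk about routing", [1.0, 0.0, 0.0]),
    ("chunk about benchmarks", [0.0, 1.0, 0.0]),
    ("chunk about bias", [0.0, 0.9, 0.1]),
]
query = [0.0, 1.0, 0.2]
results = top_k(query, toy_index, k=2)
print(results)
```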

## Prompt

This is the prompt we'll use for RAG.

In [8]:
from langchain_core.prompts import ChatPromptTemplate
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

## RAG chain

Compose our retriever, prompt, and LLM.

For the LLM, we can use [Together](https://python.langchain.com/docs/integrations/llms/together) or the [Mistral](https://python.langchain.com/docs/integrations/chat/mistralai) [API](https://docs.mistral.ai/api/).

In [11]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# Option 1: Mistral API (reuses mistral_api_key from the indexing step)
from langchain_mistralai.chat_models import ChatMistralAI
llm = ChatMistralAI(model="Mistral-7B-v0.2",
                    mistral_api_key=mistral_api_key)

# Option 2: Together API. This assignment overrides the Mistral LLM above,
# so the chain below runs on Mixtral served by Together.
from langchain_community.llms import Together
llm = Together(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    temperature=0.0,
    max_tokens=2000,
    top_k=1,
)

# RAG chain
chain = (
    RunnableParallel({"context": retriever, 
                      "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
)
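
To see how data flows through the chain, here is a rough plain-Python sketch of the same steps. `fake_retriever` and `fake_llm` are hypothetical stand-ins so the example runs without API keys, and joining the retrieved text with blank lines is a simplification of how the real chain passes retrieved documents into the prompt.

```python
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def fake_retriever(question):
    # Stand-in for the vectorstore retriever: returns chunks of text.
    return ["Mixtral is a sparse mixture-of-experts language model."]


def fake_llm(prompt_text):
    # Stand-in for the LLM call: a real model would generate an answer here.
    return f"[stub answer for a prompt of {len(prompt_text)} characters]"


def build_prompt(question):
    # Mirrors the parallel step: the same question feeds both the retriever
    # ("context") and the passthrough ("question").
    inputs = {"context": fake_retriever(question), "question": question}
    return template.format(
        context="\n\n".join(inputs["context"]),
        question=inputs["question"],
    )


def run_chain(question):
    # prompt -> llm; the output is already a string, so no parser is needed.
    return fake_llm(build_prompt(question))


prompt_text = build_prompt("What is Mixtral?")
print(prompt_text)
print(run_chain("What is Mixtral?"))
```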

In [12]:
chain.invoke("What is the Mixtral bias on the BBQ benchmark?")

'\nAnswer: The Mixtral bias on the BBQ benchmark is 56.0%.'

## Optional: Use LangSmith for tracing.

Set up LangSmith [as discussed here](https://python.langchain.com/docs/langsmith/walkthrough): 

* Trace: https://smith.langchain.com/public/6d3f07ef-25b8-4392-ba3a-d5ddfc26d980/r
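
As a minimal sketch, tracing is typically switched on through environment variables set before the chain runs. The project name below is a made-up example, and the API key is assumed to come from your own LangSmith account.

```python
import os

# Enable LangSmith tracing for subsequent LangChain calls.
os.environ["LANGCHAIN_TRACING_V2"] = "true"

# Hypothetical project name used to group traces; keep any existing value.
os.environ.setdefault("LANGCHAIN_PROJECT", "mixtral-rag")

# The API key should be set outside the notebook rather than hard-coded.
if "LANGCHAIN_API_KEY" not in os.environ:
    print("LANGCHAIN_API_KEY is not set; traces will not be uploaded.")
```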