# Chat with your data

### Disclaimer: We did not find this solution optimal for this kind of data, but we still want to share it with you. Later in this notebook we split the data into chunks and store them in a vector index. This works well for running text such as a Wikipedia article, but less well for tabular data.

### This file is for educational purposes; it explains the flow from data to chat.

### The easiest way to run this file is in Google Colab.
### https://colab.research.google.com/
### Log in with your Google account and add this file under "Upload".

### This notebook requires a GPU runtime. Click 'Runtime' -> 'Change runtime type' and choose T4 GPU.
### In Colab you can run all cells with Ctrl+F9, or press Shift+Enter to run the selected cell.

In [None]:
!pip install -qU \
  transformers==4.31.0 \
  sentence-transformers==2.2.2 \
  pinecone-client==2.2.2 \
  datasets==2.14.0 \
  accelerate==0.21.0 \
  einops==0.6.1 \
  langchain==0.0.240 \
  xformers==0.0.20 \
  bitsandbytes==0.41.0


In [None]:
from torch import cuda
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model_id = 'sentence-transformers/all-MiniLM-L6-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32}
)

The database in this program is Pinecone.
You can create a free index:

https://www.pinecone.io/


Log in, navigate to your API keys and create a new key. Paste the key and environment in the cell below.


In [None]:
import pinecone

pinecone.init(api_key="ENTER-KEY-HERE", environment="gcp-starter")
pinecone.list_indexes()

Different embedding models produce vectors with different numbers of dimensions. If you embed some text and check the length of one embedding, you will see how many dimensions your model produces. See the cell below.

Read more about it here:
https://docs.pinecone.io/docs/choosing-index-type-and-size

In [None]:
docs = [
    "Here you can test embedding your text",
    "And find out how many dimensions the embedding has",
    "The number of dimensions is an important setting for the vector index"
]

embeddings = embed_model.embed_documents(docs)

print(f"We have {len(embeddings)} doc embeddings, each with "
      f"a dimensionality of {len(embeddings[0])}.")





To use the Llama 2 model you have to request access from Meta. There are a few steps:
1. Create a free account on Hugging Face: https://huggingface.co/
2. Search for this model in the search field: meta-llama/Llama-2-13b-chat-hf
3. Request access to the model.
4. You will then get an email in which you have to accept the terms for using the model.
5. When that is done, the model's page on Hugging Face will show this text: Gated model You have been granted access to this model
6. On Hugging Face, navigate to your profile -> Settings -> Access Tokens. Copy the token and paste it in the cell below.


In [None]:
from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-13b-chat-hf' # Change this if you want another model from Huggingface

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, need auth token for these
# Change the string to your token
hf_auth = 'ENTER-HUGGINGFACE-KEY-HERE'
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)
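
A back-of-the-envelope calculation shows why the 4-bit setting matters on a T4, which has 16 GB of memory. This is only a rough sketch: the parameter count of 13e9 is an approximation, and the estimate covers the weights alone, so actual usage will be higher.

```python
# Rough weight-memory estimate. It ignores activations, the KV cache and
# quantization overhead, so real usage will be somewhat higher.
PARAMS = 13e9  # approximate parameter count of Llama-2-13b (assumption)


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(PARAMS, 16)  # weights stored in 16-bit precision
int4_gb = weight_memory_gb(PARAMS, 4)   # weights stored in 4-bit precision

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{int4_gb:.1f} GB")
```

At 16 bits the weights alone would not fit on the GPU, while at 4 bits they leave room for the rest of the pipeline.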


In [None]:
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")

In [None]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

In [None]:
generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    temperature=0.0,  # 'randomness' of outputs; lower values are more deterministic
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)

In [None]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

Here you create the vector index in the database.

In [None]:
import time

index_name = 'index'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=len(embeddings[0]),
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)
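
The index above is created with `metric='cosine'`. As a minimal sketch of what that metric measures, here is cosine similarity in plain Python. The function name and the toy vectors are only illustrative; Pinecone computes this for you on the server side.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors pointing in the same direction score close to 1,
# regardless of their length.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Vectors at a right angle score 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(same_direction, orthogonal)
```

Because only the direction matters, two texts with similar meaning can match even if their embeddings differ in magnitude.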

Run cell below to see the stats of your index.

In [None]:
index = pinecone.Index(index_name)
index.describe_index_stats()

Time to insert data. In the cell below is the class for a DataFrameLoader.

In [None]:
from typing import Any, Iterator, List

import pandas as pd

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class DataFrameLoader2(BaseLoader):
    """Load from a Pandas DataFrame."""

    def __init__(self, data_frame: Any, ewo_wo_no_column: str = "EWO WO No"):
        """Initialize with a Pandas DataFrame.

        Args:
            data_frame: Pandas DataFrame.
            ewo_wo_no_column: Name of the column to include as metadata. Defaults to "EWO WO No".
        """
        if not isinstance(data_frame, pd.DataFrame):
            raise ValueError(
                f"Expected data_frame to be a pd.DataFrame, got {type(data_frame)}"
            )
        self.data_frame = data_frame
        self.ewo_wo_no_column = ewo_wo_no_column

    def lazy_load(self) -> Iterator[Document]:
        """Lazy load records from the Pandas DataFrame."""

        for _, row in self.data_frame.iterrows():
            text = "\n".join([f"{key}: {value}" for key, value in row.items()])
            metadata = row.to_dict()
            ewo_wo_no_value = row[self.ewo_wo_no_column]
            metadata["EWO WO No"] = ewo_wo_no_value
            yield Document(page_content=text, metadata=metadata, column_names=list(self.data_frame.columns))

    def load(self) -> List[Document]:
        """Load full Pandas DataFrame."""
        return list(self.lazy_load())
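
To see what the loader puts in `page_content`, here is the same "key: value" join applied to a single row, written as a plain dictionary. The values are made up for illustration; only the column names come from this notebook.

```python
# Hypothetical row. The values are invented for illustration.
row = {
    "EWO WO No": 1001,
    "EWO WO Directive": "Leaking valve",
    "EWO WO Work Done": "Replaced valve",
}

# Same join as in lazy_load(): one "column: value" line per column.
page_content = "\n".join(f"{key}: {value}" for key, value in row.items())

print(page_content)
```

Every column ends up in the text that is later embedded, which is why the loader works with any set of columns.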





In [None]:
import numpy as np

# Add the file to your folder and change the file path in the code.

# `orient` must be a string; "records" assumes the file is a list of row objects.
data = pd.read_json(r"/content/data.json", orient="records")




### There are many types of loaders that turn data into Documents. A Document consists of page_content and metadata.
### In the different types of loaders you can specify which data/column becomes page_content and which becomes metadata.

### Read all about it here: https://python.langchain.com/docs/modules/data_connection/document_loaders

All the columns are used as data. We chose this to make it easier to use data from any machine. All missing values are represented by 'null'. This is not strictly necessary, but consistent data helps the AI.

In [None]:
data.replace({np.nan: 'null', 'NA': 'null', 'N/A': 'null', 'Missing': 'null'}, inplace=True)
loader = DataFrameLoader2(data)


Normally you split your documents into chunks. You choose a chunk size and how many characters should overlap between consecutive chunks. A common rule of thumb is an overlap of about 10-20% of the chunk size; the cell below uses 75 of 600 characters (12.5%).

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=75)
splits = text_splitter.split_documents(loader.load())
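
To make chunk size and overlap concrete, here is a simplified fixed-window splitter. This is only a sketch: `split_with_overlap` is a hypothetical helper, and `RecursiveCharacterTextSplitter` is smarter because it tries to split on separators such as line breaks first.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

Each chunk starts with the last two characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk.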

A Pinecone vector store object needs the index, the function that embeds the text, and the name of the metadata field in the vector index where the texts are stored.

In [None]:
from langchain.vectorstores import Pinecone

text_field = 'text'  # field in metadata that contains text content

vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)

In the cell below, the texts are embedded and upserted to the vector index in batches of 32.

In [None]:
batch_size = 32
vectors = []

for doc_id, doc in enumerate(splits):
    text = str(doc.page_content)
    # embed_documents returns one vector per input text; take the only one
    values = embed_model.embed_documents([text])[0]

    vectors.append({
        'id': str(doc_id),  # unique ID for each vector
        'values': values,
        'metadata': {'text': text},  # matches text_field used by the vector store
    })

    # Upsert once a full batch has been collected
    if len(vectors) == batch_size:
        index.upsert(vectors)
        vectors = []  # reset for the next batch

# Upsert any remaining vectors (less than a full batch)
if vectors:
    index.upsert(vectors)
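
The batching pattern above can also be written as a small reusable helper. This is a sketch in plain Python; `batched` is a hypothetical name and the list of numbers stands in for your chunks.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 70 stand-in items split into batches of 32
batches = list(batched(list(range(70)), 32))
sizes = [len(batch) for batch in batches]
print(sizes)  # [32, 32, 6]
```

The last batch is simply shorter, so no separate "remaining vectors" step is needed.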

Run this code again to see your stats.

In [None]:
index = pinecone.Index(index_name)
index.describe_index_stats()

In [None]:
# You can change the prompt, but {question} and {context} are key placeholders that the model needs. Removing them will cause an error.


from langchain.prompts import PromptTemplate
prompt_template = """I will provide you with historical data regarding previous work orders (EWO) for a machine. By examining the historical data, you should respond by describing similar issues in the historical data and how those issues were resolved. Use all the columns and explain how a similar problem was previously resolved.

Before you do that, ask for a brief description of the issue that we will analyze using the historical data you've received.

This description may be concise and may not exactly match any previous ones in the history. Nevertheless, you should search the history for similar problems or solutions resembling the new issue description.

Especially, check what's mentioned in EWO WO Directive, EWO WO Work Done, and EWO WO Work Details. Also, think broadly because there might be misspellings or similar words with the same meaning.

Reply with the number of work orders available. Also, mention how many of them are similar to the issue we're currently analyzing.

Then, proceed with details about all of them. This includes all analyses you can perform about what was done, the time taken, parts replaced, downtime, etc. Time analyses are crucial. Specify the time taken to resolve the issue, the duration from discovery to operation, and, most importantly, thoroughly analyze the root cause. Include all time-related information.

If you encounter any issues while reading the file, do not mention it, just continue. Do not describe all the steps you take. The information you are getting is EWO and you will analyze it with as much information as needed. The EWO WO ID is the link between the texts. If I ask you for date or time, this is the correct format to look for: YYYY-MM-DD HH:MI:SS. Give the answer about the date if someone asks. If you don't know the answer, just say that you don't know, don't try to make up an answer. If you encounter technical terms that may not have direct translations in English, write the whole answer in English.
 Don't answer if the chat_history is empty.
{context}


Question: {question}

Answer in Swedish:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
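
To show what the two placeholders do, here is a minimal sketch using plain string formatting. The short template, the retrieved chunks and the question are made up, and joining the chunks with a blank line is an assumption about how the context is assembled.

```python
# Shortened stand-in for the prompt template above
template = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

# Hypothetical chunks returned by the retriever
retrieved_chunks = [
    "EWO WO No: 1001\nEWO WO Work Done: Replaced valve",
    "EWO WO No: 1002\nEWO WO Work Done: Tightened hose clamp",
]

# The retrieved chunks fill {context}; the user input fills {question}
context = "\n\n".join(retrieved_chunks)
prompt = template.format(context=context, question="How was the leak fixed?")

print(prompt)
```

The model never sees the whole index, only the chunks that were placed in the context for this particular question.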

Create a memory for the conversation

In [None]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history")

### Time for the final step - putting all the pieces together

In [None]:
from langchain.chains import RetrievalQA

llama_pipeline = RetrievalQA.from_chain_type(
    llm=llm, chain_type='stuff',
    chain_type_kwargs = {"prompt": PROMPT},
    memory = memory,
    # return_source_documents is not a retriever option, so it is not passed to as_retriever
    retriever=vectorstore.as_retriever(search_type="mmr", search_kwargs={'k': 5, 'fetch_k': 50})
)
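
The retriever above uses `search_type="mmr"` (maximal marginal relevance): it fetches `fetch_k` candidates and keeps `k` of them that are relevant to the question but not too similar to each other. Below is a minimal sketch of that idea in plain Python. The function names, the toy vectors and the `lambda_mult` weighting are illustrative assumptions, and the library's implementation may differ in details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for idx in remaining:
            relevance = cosine(query, candidates[idx])
            # Similarity to the closest already-selected candidate
            redundancy = max(
                (cosine(candidates[idx], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.1]
# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere
candidates = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]

picked = mmr(query, candidates, k=2)                            # balances both terms
picked_by_relevance = mmr(query, candidates, k=2, lambda_mult=1.0)  # relevance only
print(picked, picked_by_relevance)
```

With the balanced setting the second pick skips the near-duplicate in favor of the more different candidate, which is the behavior you want when many work orders contain almost identical text.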

Now you can run your question/query through the pipeline.