### Chat with your Logs with Llama3 and Ollama

Adapted from original code by Sascha Retter (https://blog.retter.jetzt/)

##### Chat with local Llama3 Model via Ollama in KNIME Analytics Platform — Also extract Logs into structured JSON Files
https://medium.com/p/aca61e4a690a

##### Ask Questions from your CSV with an Open Source LLM, LangChain & a Vector DB
https://www.tetranyde.com/blog/langchain-vectordb

##### Document Loaders in LangChain
https://medium.com/@varsha.rainer/document-loaders-in-langchain-7c2db9851123

##### Unleashing Conversational Power: A Guide to Building Dynamic Chat Applications with LangChain, Qdrant, and Ollama (or OpenAI’s GPT-3.5 Turbo)
https://medium.com/@ingridwickstevens/langchain-chat-with-your-data-qdrant-ollama-openai-913020ec504b


In [1]:
import os

import pandas as pd

# Document Loaders in LangChain
# https://medium.com/@varsha.rainer/document-loaders-in-langchain-7c2db9851123
from langchain_community.document_loaders import CSVLoader, PyPDFLoader, WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

from langchain.chains import RetrievalQA

model = "llama3:instruct"  # the model must already be available locally, e.g. pulled with 'ollama run llama3:instruct'
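
Because the model has to be present locally before the rest of the notebook runs, a quick availability check can save a confusing error later. The following is a minimal sketch, assuming the Ollama server listens on `http://localhost:11434` (the URL used further down) and that its `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field. The helper names `model_available` and `fetch_tags` are illustrative and not part of the original notebook.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def model_available(tags: dict, model: str) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    names = {m.get("name") for m in tags.get("models", [])}
    return model in names


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Offline illustration with a hand-written response of the assumed shape.
sample_tags = {"models": [{"name": "llama3:instruct"}, {"name": "mistral:latest"}]}
print(model_available(sample_tags, "llama3:instruct"))
print(model_available(sample_tags, "llama3:70b"))
```

With a server running, `model_available(fetch_tags(), model)` would perform the same check against the live model list.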

In [2]:
question = "What would be the best set of JSON columns to extract data from these Logfiles in a systematic way? Can you write a prompt?"

In [16]:
# you can omit the source_column argument
# https://medium.com/@varsha.rainer/document-loaders-in-langchain-7c2db9851123
loader = CSVLoader("../data/sample_logs.csv", source_column="Prompt")
documents = loader.load()
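
To make the loading step less opaque, here is a rough standard-library sketch of what a row-per-document CSV loader does: each row becomes one document whose text is a list of `column: value` lines, and `source_column` decides which cell is recorded as the document's source. This is an approximation for illustration, not LangChain's implementation. The function name `rows_to_documents`, the `Level` column, and the sample rows are made up.

```python
import csv
import io


def rows_to_documents(csv_text, source_column=None, source="sample.csv"):
    """Turn each CSV row into a dict with page_content and metadata."""
    docs = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        # One "column: value" line per cell, joined into the document text.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        # With source_column set, the source is taken from that cell;
        # otherwise it falls back to the file name.
        src = row[source_column] if source_column else source
        docs.append({"page_content": content, "metadata": {"source": src, "row": i}})
    return docs


# Hypothetical two-row CSV with a "Prompt" column like the one used above.
sample_csv = "Prompt,Level\nfirst log line,TRACE\nsecond log line,INFO\n"
docs = rows_to_documents(sample_csv, source_column="Prompt")
print(len(docs))
print(docs[0]["page_content"])
print(docs[1]["metadata"])
```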

In [18]:
type(documents)

list

In [19]:
# Create embeddings
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model=model)

In [20]:
chroma_db = Chroma.from_documents(
    documents,
    embedding=oembed,
    persist_directory="../data/vectorstore/chroma_logfiles",
)
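
Conceptually, the vector store keeps one embedding per document and answers a query by returning the documents whose embeddings lie closest to the query embedding. The sketch below illustrates that idea with hand-written three-dimensional vectors and cosine similarity. It is a toy model only: the vectors and texts are invented, real embeddings have far more dimensions, and Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the texts of the k stored items most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy store of (text, embedding) pairs; all values are made up.
store = [
    ("log line about routing", [0.9, 0.1, 0.0]),
    ("log line about timers", [0.0, 0.2, 0.9]),
    ("log line about interfaces", [0.7, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]
print(top_k(query, store, k=2))
```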

In [21]:
type(chroma_db)

langchain_community.vectorstores.chroma.Chroma

In [22]:
# load from disk
chroma_db = Chroma(persist_directory="../data/vectorstore/chroma_logfiles", embedding_function=oembed)

In [23]:
type(chroma_db)

langchain_community.vectorstores.chroma.Chroma

In [14]:
# LLM
llm = Ollama(model=model,
            verbose=True,
            callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))

print(f"Loaded LLM model {llm.model}")

Loaded LLM model llama3:instruct


In [15]:
# Initialize the RetrievalQA chain with the vector store retriever
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=chroma_db.as_retriever(),
)

# Use the 'invoke' method to handle the query instead of '__call__'
result = qa_chain.invoke({"query": question})
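
When no `chain_type` is passed, `RetrievalQA.from_chain_type` uses the "stuff" strategy: the retrieved documents are concatenated into a single context block and sent to the LLM together with the question. The sketch below shows that idea in plain Python. The template wording and the function name `build_stuff_prompt` are assumptions for illustration and do not reproduce LangChain's actual prompt.

```python
# Hypothetical template; LangChain's real default prompt is worded differently.
DEFAULT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(question, documents, template=DEFAULT_TEMPLATE):
    """Concatenate all retrieved documents and place them in one prompt."""
    context = "\n\n".join(documents)
    return template.format(context=context, question=question)


# Made-up retrieved snippets standing in for the retriever's output.
retrieved = ["first retrieved log entry", "second retrieved log entry"]
prompt = build_stuff_prompt("Which columns should be extracted?", retrieved)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved documents fit into the model's context window.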

Based on the provided log files, I can suggest the following set of JSON columns that could be extracted in a systematic way:

1. `timestamp`: Extract the timestamp (e.g., "03/22 08:53:38") from each log entry.
2. `log_level`: Determine the log level (e.g., "TRACE") and extract it as a separate column.
3. `component`: Identify the component or module that generated the log message (e.g., "router_forward_getOI", "rsvp_flow_stateMachine", etc.) and extract it as a separate column.
4. `source_address`: Extract the source IP address (e.g., "9.67.116.98") from each log entry.
5. `out_inf`: Extract any relevant information about the output interface or network interface (e.g., "9.67.116.98").
6. `gateway`: Extract the gateway IP address (e.g., "0.0.0.0") if present in the log entries.
7. `event_type`: Identify the type of event that triggered the log message (e.g., "reentering state RESVED", "received event from RAW-IP on interface").

Here's a sample JSON schema based on these extracted col
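
The columns suggested in the answer above can be tried out with a small regex-based parser. This is a sketch under stated assumptions: the sample line is assembled by hand from the example values quoted in the answer, so the exact layout of the real log files (separators, padding, field labels) may differ, and the patterns and the `parse_line` helper are illustrative only.

```python
import json
import re

# Assumed layout: "<MM/DD HH:MM:SS> <LEVEL> :...<component>: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<log_level>[A-Z]+)\s*:\.*"
    r"(?P<component>\w+):\s*"
    r"(?P<message>.*)$"
)
# Assumed "label: value" fields at the start of the message.
FIELD_RE = re.compile(r"(?P<key>source address|out inf|gateway):\s*(?P<value>\S+)")


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    field = FIELD_RE.match(record["message"])
    if field:
        # "source address" becomes the JSON key "source_address", and so on.
        record[field.group("key").replace(" ", "_")] = field.group("value")
    return record


# Hypothetical line built from the example values in the model's answer.
sample_line = "03/22 08:53:38 TRACE :...router_forward_getOI: source address: 9.67.116.98"
record = parse_line(sample_line)
print(json.dumps(record, indent=2))
```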

In [None]:
# Print the answer text (the chain returns a dict with 'query' and 'result' keys)
print(result["result"])