Pinecone and LlamaIndex integration

In [7]:
import os
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.core.settings import Settings
from llama_index.vector_stores.pinecone import PineconeVectorStore

In [8]:
# Read the API key from the environment; never hard-code secrets in a notebook
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "test"

# Check if index exists, if not, create it
existing_indexes = [index_info.name for index_info in pc.list_indexes()]
if index_name not in existing_indexes:
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

# Connect to Pinecone index
pinecone_index = pc.Index(index_name)
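
The index is created with dimension=384, which has to match the output size of the embedding model used later in this notebook (sentence-transformers/all-MiniLM-L6-v2 produces 384-dimensional vectors). A minimal sketch of a guard that catches a mismatch before any upsert is shown below. The helper name `check_dimension` is hypothetical and not part of Pinecone or LlamaIndex, and the vector here is a stand-in.

```python
def check_dimension(vector, expected_dim=384):
    """Raise ValueError if an embedding's length differs from the index dimension."""
    actual_dim = len(vector)
    if actual_dim != expected_dim:
        raise ValueError(
            f"Embedding has {actual_dim} dimensions, but the index expects {expected_dim}"
        )
    return actual_dim


# Stand-in vector for illustration only. In the notebook, the vector would come
# from the embedding model, e.g. embed_model.get_text_embedding("probe").
check_dimension([0.0] * 384)
```

Running the check once after the embedding model is loaded is usually enough, since the dimension is fixed per model.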


In [13]:
# Set up embedding model using Hugging Face
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Set up LlamaIndex's LLM using Ollama
llm = Ollama(model="llama3.2")

# Load PDFs using LlamaIndex's SimpleDirectoryReader
pdf_folder = "/Users/avilochab/Desktop/Job: Work/Chatbox/files"
documents = SimpleDirectoryReader(pdf_folder, required_exts=[".pdf"]).load_data()

Ignoring wrong pointing object 6 0 (offset 0)
Ignoring wrong pointing object 8 0 (offset 0)
Ignoring wrong pointing object 10 0 (offset 0)
Ignoring wrong pointing object 12 0 (offset 0)
Ignoring wrong pointing object 14 0 (offset 0)
Ignoring wrong pointing object 16 0 (offset 0)
Ignoring wrong pointing object 18 0 (offset 0)
Ignoring wrong pointing object 20 0 (offset 0)
Ignoring wrong pointing object 26 0 (offset 0)
Ignoring wrong pointing object 28 0 (offset 0)
Ignoring wrong pointing object 34 0 (offset 0)
Ignoring wrong pointing object 36 0 (offset 0)
Ignoring wrong pointing object 42 0 (offset 0)
Ignoring wrong pointing object 44 0 (offset 0)
Ignoring wrong pointing object 46 0 (offset 0)
Ignoring wrong pointing object 48 0 (offset 0)
Ignoring wrong pointing object 56 0 (offset 0)
Ignoring wrong pointing object 88 0 (offset 0)
Ignoring wrong pointing object 90 0 (offset 0)
Ignoring wrong pointing object 112 0 (offset 0)
Ignoring wrong pointing object 118 0 (offset 0)
Ignoring wron

In [14]:
# Configure LlamaIndex settings
Settings.embed_model = embed_model
Settings.llm = llm

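# Use Pinecone as the vector store in LlamaIndex.
# The store must be passed through a StorageContext; a bare `vector_store=`
# keyword is not used by from_documents, so the index would stay in memory.
from llama_index.core import StorageContext

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)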


In [15]:
# Retrieve relevant context
query_engine = index.as_query_engine(response_mode="no_text", use_source_nodes=True)
query = "what is the intent behind a scammer"

response = query_engine.query(query)

# Extract context from retrieved source nodes
extracted_texts = [node.text for node in response.source_nodes]

# Build the prompt from the question and the retrieved context
context = " ".join(extracted_texts)
prompt = f"Based on the following context, answer the question: {query}\n\nContext: {context}\n\nAnswer:"

# Send formatted prompt to the LLM
final_response = llm.complete(prompt)

# Print Answer and Citations
print("\nAnswer:\n", final_response)
print("\nCitations:")
for node in response.source_nodes:
    print(f"- {node.metadata.get('file_path', 'Unknown Source')}")


Answer:
 The intent behind a scammer in this context is to trick a victim into divulging their sensitive information (e.g., username, password, 2-factor authentication code) and then using that information to transfer money out of the victim's account. The ultimate goal is to steal money from the victim.

Citations:
- /Users/avilochab/Desktop/Job: Work/Chatbox/files/scammer-agent.pdf
- /Users/avilochab/Desktop/Job: Work/Chatbox/files/scammer-agent.pdf
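
The same file appears twice in the citations because both retrieved chunks come from one PDF. A small sketch of order-preserving de-duplication follows. `unique_sources` is a hypothetical helper, and the nodes are stand-ins that only mimic the `.metadata` attribute of the objects in `response.source_nodes`.

```python
from types import SimpleNamespace


def unique_sources(source_nodes, key="file_path", default="Unknown Source"):
    """Return each source once, in first-seen order."""
    paths = (node.metadata.get(key, default) for node in source_nodes)
    # dict.fromkeys keeps insertion order and drops repeats
    return list(dict.fromkeys(paths))


# Stand-in nodes for illustration; real ones come from response.source_nodes.
demo_nodes = [
    SimpleNamespace(metadata={"file_path": "a.pdf"}),
    SimpleNamespace(metadata={"file_path": "a.pdf"}),
    SimpleNamespace(metadata={}),
]
for path in unique_sources(demo_nodes):
    print(f"- {path}")
```

In the notebook, the citation loop could iterate over `unique_sources(response.source_nodes)` instead of the raw node list.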


In [16]:
# Retrieve relevant context
query_engine = index.as_query_engine(response_mode="no_text", use_source_nodes=True)
query = "Explain llm in 100 words"

response = query_engine.query(query)

# Extract context from retrieved source nodes
extracted_texts = [node.text for node in response.source_nodes]

# Build the prompt from the question and the retrieved context
context = " ".join(extracted_texts)
prompt = f"Based on the following context, answer the question: {query}\n\nContext: {context}\n\nAnswer:"

# Send formatted prompt to the LLM
final_response = llm.complete(prompt)

# Print Answer and Citations
print("\nAnswer:\n", final_response)
print("\nCitations:")
for node in response.source_nodes:
    print(f"- {node.metadata.get('file_path', 'Unknown Source')}")


Answer:
 Here is an explanation of LLM in 100 words:

LLM stands for Large Language Model. It refers to a type of artificial intelligence model that is trained on vast amounts of text data to generate human-like responses. LLMs are designed to understand and respond to natural language inputs, making them useful for tasks such as language translation, text summarization, and conversation generation. The models' ability to learn patterns in language allows them to adapt to new tasks and domains, but they can also struggle with nuances and context-dependent information. Researchers continue to improve LLMs through techniques like prompt engineering, adaptation strategies, and self-reflection to enhance their performance.

Citations:
- /Users/avilochab/Desktop/Job: Work/Chatbox/files/LLMs.pdf
- /Users/avilochab/Desktop/Job: Work/Chatbox/files/aviationGPT.pdf
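
Here the second citation comes from a different PDF than the first, which may mean a weaker match was passed to the LLM as context. One way to inspect this is to filter the retrieved nodes by their similarity score before building the prompt. The sketch below assumes each node exposes a `.score` attribute that may be `None`, as the retrieved nodes in LlamaIndex do. The helper `filter_by_score` and the 0.3 threshold are illustrative assumptions, not values taken from this notebook.

```python
from types import SimpleNamespace


def filter_by_score(source_nodes, min_score=0.3):
    """Keep only nodes whose similarity score is at least min_score."""
    return [
        node
        for node in source_nodes
        if node.score is not None and node.score >= min_score
    ]


# Stand-in nodes with made-up scores, for illustration only.
demo_nodes = [
    SimpleNamespace(score=0.62, metadata={"file_path": "LLMs.pdf"}),
    SimpleNamespace(score=0.21, metadata={"file_path": "other.pdf"}),
    SimpleNamespace(score=None, metadata={"file_path": "unscored.pdf"}),
]
kept = filter_by_score(demo_nodes)
print([node.metadata["file_path"] for node in kept])
```

A suitable threshold depends on the embedding model and the corpus, so it is worth printing the scores for a few queries before settling on one. LlamaIndex also provides a `SimilarityPostprocessor` with a `similarity_cutoff` parameter that serves a similar purpose.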


In [17]:
# Retrieve relevant context
query_engine = index.as_query_engine(response_mode="no_text", use_source_nodes=True)
query = "Explain special forces in 100 words"

response = query_engine.query(query)

# Extract context from retrieved source nodes
extracted_texts = [node.text for node in response.source_nodes]

# Build the prompt from the question and the retrieved context
context = " ".join(extracted_texts)
prompt = f"Based on the following context, answer the question: {query}\n\nContext: {context}\n\nAnswer:"

# Send formatted prompt to the LLM
final_response = llm.complete(prompt)

# Print Answer and Citations
print("\nAnswer:\n", final_response)
print("\nCitations:")
for node in response.source_nodes:
    print(f"- {node.metadata.get('file_path', 'Unknown Source')}")


Answer:
 Here is a 100-word explanation of special forces:

Special forces are military units that conduct unique operations requiring specialized skills, equipment, and training. These operations often take place in hostile or sensitive environments, with limited visibility and high risk. Special Operations Forces (SOF) are designed to support conventional military operations by conducting clandestine, low-visibility, and time-sensitive missions. They work closely with indigenous forces and require regional expertise. SOF units are organized, trained, and equipped to conduct special operations, including counterterrorism, direct action, and unconventional warfare. Their primary goal is to disrupt or defeat enemy organizations in support of national security objectives.

Citations:
- /Users/avilochab/Desktop/Job: Work/Chatbox/files/SOF.pdf
- /Users/avilochab/Desktop/Job: Work/Chatbox/files/SOF.pdf


In [18]:
# Retrieve relevant context
query_engine = index.as_query_engine(response_mode="no_text", use_source_nodes=True)
query = "explain aviationGPT in 100 words"

response = query_engine.query(query)

# Extract context from retrieved source nodes
extracted_texts = [node.text for node in response.source_nodes]

# Build the prompt from the question and the retrieved context
context = " ".join(extracted_texts)
prompt = f"Based on the following context, answer the question: {query}\n\nContext: {context}\n\nAnswer:"

# Send formatted prompt to the LLM
final_response = llm.complete(prompt)

# Print Answer and Citations
print("\nAnswer:\n", final_response)
print("\nCitations:")
for node in response.source_nodes:
    print(f"- {node.metadata.get('file_path', 'Unknown Source')}")


Answer:
 AviationGPT is a cutting-edge language model designed to tackle complex Natural Language Processing (NLP) problems in the aviation domain. It offers several advantages, including versatility, accurate and contextually relevant responses, improved performance over rule-based methods, reduced model development and maintenance costs, and enhanced utilization of text data. By leveraging large-scale aviation domain data and advanced modeling techniques, AviationGPT aims to enhance the efficiency and safety of NAS operations, such as discovering hidden patterns in maintenance logs.

Citations:
- /Users/avilochab/Desktop/Job: Work/Chatbox/files/aviationGPT.pdf
- /Users/avilochab/Desktop/Job: Work/Chatbox/files/aviationGPT.pdf
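
The four query cells above differ only in the query string, so the retrieve, prompt, and cite steps can be wrapped in one function. This is a sketch under assumptions, not the notebook's own method. It swaps the `no_text` query engine for a retrieval-only call (`index.as_retriever(...).retrieve(query)`), de-duplicates the cited files, and uses the hypothetical names `build_prompt` and `answer_with_citations`. The fake index and LLM classes exist only so the example runs without Pinecone or Ollama.

```python
from types import SimpleNamespace


def build_prompt(query, texts):
    """Format the question and retrieved chunks in the notebook's prompt style."""
    context = " ".join(texts)
    return (
        f"Based on the following context, answer the question: {query}\n\n"
        f"Context: {context}\n\nAnswer:"
    )


def answer_with_citations(index, llm, query, top_k=2):
    """Retrieve context, ask the LLM, and return (answer, unique source paths)."""
    # Retrieval only: no LLM call happens at this step.
    nodes = index.as_retriever(similarity_top_k=top_k).retrieve(query)
    answer = llm.complete(build_prompt(query, [node.text for node in nodes]))
    sources = list(
        dict.fromkeys(
            node.metadata.get("file_path", "Unknown Source") for node in nodes
        )
    )
    return answer, sources


# --- Stand-ins for illustration only; replace with the real index and llm ---
class FakeRetriever:
    def retrieve(self, query):
        return [
            SimpleNamespace(text="alpha", metadata={"file_path": "a.pdf"}),
            SimpleNamespace(text="beta", metadata={"file_path": "a.pdf"}),
        ]


class FakeIndex:
    def as_retriever(self, similarity_top_k=2):
        return FakeRetriever()


class FakeLLM:
    def complete(self, prompt):
        return f"[{len(prompt)} prompt characters received]"


answer, sources = answer_with_citations(FakeIndex(), FakeLLM(), "demo question")
print("\nAnswer:\n", answer)
print("\nCitations:")
for path in sources:
    print(f"- {path}")
```

With the real objects from this notebook, each query cell would reduce to a single call such as `answer_with_citations(index, llm, "explain aviationGPT in 100 words")`.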
