### Load CSV & Build Text

In [None]:
import pandas as pd

df = pd.read_csv("TNG.csv")

# Combine speaker, dialogue, and episode/scene metadata into one string per row
combined_texts = (
    df["who"].fillna('Unknown').astype(str) + ": " +
    df["text"].fillna('').astype(str) +
    " [Episode: " + df["Episode"].astype(str) +
    ", Scene: " + df["scenenumber"].astype(str) + "]"
).tolist()
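
The retrieved chunks later in this notebook show episode numbers as floats (`10.0`) and scene numbers with trailing spaces (`14 `). The following is a minimal sketch of a per-row formatter that tidies those values before they are embedded. `format_line` is a hypothetical helper, not part of the original notebook, and the `"?"` placeholder for missing metadata is an assumption.

```python
import math


def format_line(who, text, episode, scene):
    """Build one 'SPEAKER: text [Episode: N, Scene: M]' string from raw cell values."""

    def is_missing(value):
        # pandas represents empty cells as None or float NaN
        return value is None or (isinstance(value, float) and math.isnan(value))

    speaker = "Unknown" if is_missing(who) else str(who).strip()
    line = "" if is_missing(text) else str(text).strip()

    # A numeric column with gaps loads as float, so 10 becomes 10.0; undo that here.
    if is_missing(episode):
        ep = "?"  # assumption: placeholder for missing metadata
    elif isinstance(episode, float) and episode.is_integer():
        ep = str(int(episode))
    else:
        ep = str(episode).strip()

    sc = "?" if is_missing(scene) else str(scene).strip()
    return f"{speaker}: {line} [Episode: {ep}, Scene: {sc}]"


print(format_line("DATA", " I have always wished to be Human.", 11.0, "69 "))
```

Applied row by row (for example with a list comprehension over `zip(df["who"], df["text"], df["Episode"], df["scenenumber"])`), this would produce the same layout as `combined_texts` with cleaner metadata.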

### Chunk & Embed

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=50)
# Flatten the per-row chunk lists into one list
chunks = [chunk for t in combined_texts for chunk in splitter.split_text(t)]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks)
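
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is a sketch only and not the LangChain algorithm: `RecursiveCharacterTextSplitter` tries to break on separators such as newlines and spaces first, while this hypothetical `split_with_overlap` cuts at fixed character offsets.

```python
def split_with_overlap(text, chunk_size=800, chunk_overlap=50):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if len(text) <= chunk_size:
        # Short inputs come back as a single chunk (or nothing if empty)
        return [text] if text else []

    step = chunk_size - chunk_overlap  # how far each window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return pieces


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With the settings used above, any line of 800 characters or fewer stays in one piece, so short dialogue lines are embedded whole.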



### Store in Vector DB

In [3]:
import chromadb

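# In-memory client: the collection is lost when the kernel stops
client = chromadb.Client()
collection = client.create_collection("tng_docs")

# Add each chunk with its embedding, using the list index as the ID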

for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
    collection.add(
        documents=[chunk],
        embeddings=[embedding.tolist()],
        ids=[str(i)]
    )
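
Adding one chunk per call works but makes one round trip per row. A possible alternative, sketched below, is to pass several chunks per `collection.add` call. `add_in_batches` is a hypothetical helper and `batch_size=1000` is an arbitrary assumption; the safe upper bound depends on the Chroma setup. `FakeCollection` is a stand-in so the sketch runs without `chromadb`.

```python
def add_in_batches(collection, chunks, embeddings, batch_size=1000):
    """Call collection.add once per batch instead of once per chunk."""
    total = len(chunks)
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        collection.add(
            documents=list(chunks[start:end]),
            # Convert each vector (list or NumPy row) to plain Python floats
            embeddings=[[float(x) for x in vec] for vec in embeddings[start:end]],
            # Same ID scheme as the loop above: the position in the chunk list
            ids=[str(i) for i in range(start, end)],
        )


class FakeCollection:
    """Hypothetical stand-in that records calls; not part of chromadb."""

    def __init__(self):
        self.calls = []

    def add(self, documents, embeddings, ids):
        self.calls.append((documents, embeddings, ids))


fake = FakeCollection()
add_in_batches(fake, ["a", "b", "c", "d", "e"], [[0.1, 0.2]] * 5, batch_size=2)
print([ids for _, _, ids in fake.calls])
# [['0', '1'], ['2', '3'], ['4']]
```

In the notebook, the call would be `add_in_batches(collection, chunks, embeddings)` in place of the loop.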

### Query & Retrieval Cell

In [25]:
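# Retrieval step: embed the query and return the closest stored chunks
def semantic_search(query, top_n=5):
    """Return the top_n chunks nearest to the query embedding."""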
    query_emb = model.encode([query])[0].tolist()
    results = collection.query(
        query_embeddings=[query_emb],
        n_results=top_n
    )
    texts = results['documents'][0]
    return texts
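
Conceptually, `collection.query` ranks stored vectors by how close they are to the query vector. The sketch below shows that idea as an exhaustive search with cosine similarity over toy 2-D vectors. It is an illustration only: Chroma uses its own index and distance setting, and `cosine_similarity` and `brute_force_search` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def brute_force_search(query_vec, doc_vecs, docs, top_n=5):
    """Score every document against the query and return the best top_n."""
    scored = [(cosine_similarity(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_n]]


# Toy data: made-up 2-D vectors, not real sentence embeddings
toy_docs = ["about androids", "about starships", "about tea"]
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(brute_force_search([0.9, 0.1], toy_vecs, toy_docs, top_n=2))
# ['about androids', 'about tea']
```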

### Example Usage

In [26]:
query = "What does Data say about being human?"
retrieved = semantic_search(query)
for txt in retrieved:
    print(txt)

DATA:  I do not know for certain... but I believe that it is during my creative endeavors that I come closest to experiencing what it might be like to be human. [Episode: 10.0, Scene: 14 ]
DATA:  I have always wished to be Human. I study people carefully in order to more closely approximate Human behavior. [Episode: 11.0, Scene: 69 ]
DATA:  And you consider it important to please humans? [Episode: 12.0, Scene: 40 ]
Unknown:  Data watches intently as Timothy again laughs MOS -- much more "Human" than we've ever seen Data act. OFF Data's face. [Episode: 11.0, Scene: 68 ]
PICARD:  These are questions that mankind has been struggling with since creation. I am afraid your confusion, Data... is only human. [Episode: 12.0, Scene: 39 ]
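
The query asks what Data says, yet two of the five results are spoken by other characters, because the search ranks by meaning rather than by speaker. One possible follow-up, sketched here, is to post-filter the retrieved strings on their `WHO:` prefix. `filter_by_speaker` is a hypothetical helper; it assumes each chunk still starts with the speaker name, which would not hold for the later pieces of a line that was split into several chunks.

```python
def filter_by_speaker(texts, speaker):
    """Keep chunks whose 'WHO:' prefix matches speaker, ignoring case and tags like (V.O.)."""
    wanted = speaker.strip().upper()
    kept = []
    for text in texts:
        prefix, sep, _ = text.partition(":")
        if not sep:
            continue  # no speaker prefix, so the speaker cannot be determined
        name = prefix.split("(")[0].strip().upper()  # "PICARD (V.O.)" -> "PICARD"
        if name == wanted:
            kept.append(text)
    return kept


sample = [
    "DATA:  And you consider it important to please humans? [Episode: 12.0, Scene: 40 ]",
    "PICARD:  These are questions that mankind has been struggling with since creation. "
    "I am afraid your confusion, Data... is only human. [Episode: 12.0, Scene: 39 ]",
]
for line in filter_by_speaker(sample, "Data"):
    print(line)
```

Because filtering can leave fewer than five results, it may help to call `semantic_search` with a larger `top_n` first and trim afterwards.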


In [12]:
query = "What does Picard say about Space?"
retrieved = semantic_search(query)   
retrieved

['PICARD:  About space. About the universe you are preparing to enter. [Episode: 15.0, Scene: 7  ]',
 'PICARD:  So -- this energy, directed into space, then what? [Episode: 23.0, Scene: 22 ]',
 'PICARD:  And into specific patterns of matter. Much as our transporters do. [Episode: 1.0, Scene: 233]',
 "PICARD (V.O.):  to a place in the universe which is uncharted and unknown. Our ship's instruments... [Episode: 5.0, Scene: 62 ]",
 "PICARD (V.O.):  Captain's log, supplemental. While exploring a strange area in space without any form of matter or energy... [Episode: 2.0, Scene: 44 ]"]

### LLM Tools with Ollama

In [None]:
from ollama import chat

def ollama_llm_answer(question, context, model_name="llama3.2:1b"):
    messages = [
        {"role": "system", "content": "You are an expert on Star Trek: The Next Generation."},
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        }
    ]
    response = chat(model=model_name, messages=messages)
    return response.message.content

# Use after RAG retrieval:
rag_query = "What does Picard say about Data in TNG?"
retrieved = semantic_search(rag_query)
context = "\n".join(retrieved)
answer = ollama_llm_answer(rag_query, context, model_name="llama3.2:1b")  
print(answer)

In the Star Trek: The Next Generation series, Jean-Luc Picard's dialogue about Data is a recurring theme throughout the show. Here are some quotes and context that illustrate his views on Data:

* In Episode 4.0, "Yesterday's Enterprise" (Scene: 14), Picard says to Data, "I am making you aware of the... situation." This implies that he thinks Data may be a threat or an anomaly, which is why he wants to inform him.
* In Episode 17.0, "The Measure of a Man" (Scene: 78), Picard tells Data, "You are my son, Data." This indicates that Picard feels a paternal bond with Data, despite their artificial creation and the fact that they were not biologically related.
* In several episodes, including Episode 13.0, "Tapestry" (Scene: 26), Picard expresses concern about Data's humanity, particularly in situations where he is faced with tough moral choices. He seems to believe that Data's lack of emotions and instincts makes him less human.
* In Episode 16.0, "All Good Things..." (Scene: 31), Picard s