In [19]:
%load_ext dotenv
%dotenv
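`%dotenv` reads `KEY=VALUE` pairs from a `.env` file into `os.environ`, which is how `os.getenv("GOOGLE_AI_API_KEY")` picks up the API key later on. Below is a minimal stdlib-only sketch of that behaviour. It assumes a simple `.env` format (no multi-line values, no variable expansion) and assumes existing environment variables are not overwritten. `load_env_file` and `EXAMPLE_API_KEY` are hypothetical names and are not part of python-dotenv.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Hypothetical helper: parse simple KEY=VALUE lines into os.environ.

    Blank lines and '#' comments are skipped. Existing variables are left
    untouched (setdefault), assumed to mirror %dotenv's default behaviour.
    """
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


# Usage: write a throwaway .env file and load it.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    Path(env_path).write_text('# comment\nEXAMPLE_API_KEY="dummy-key"\n')
    loaded = load_env_file(env_path)

print(loaded)
print(os.getenv("EXAMPLE_API_KEY"))
```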

In [15]:
import sys
import os
# Make the local llama-lab checkout importable; adjust this path to your own clone.
sys.path.append("/home/neal/workspace/llama-lab")
from llama_app.populate.populate_db import store_articles_to_disk, clean_directory

In [16]:
path = "../data"
clean_directory(path)
store_articles_to_disk(path)
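`clean_directory` and `store_articles_to_disk` come from the llama-lab project, and their implementation is not shown here. The `file_path` metadata in the query output further down suggests that each article ends up in `../data` as a `<uuid>.txt` file. A hypothetical stand-in might look like the sketch below. `clean_dir` and `store_texts` are illustrative names, and an in-memory list of strings replaces the real article source.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def clean_dir(path: str) -> None:
    """Hypothetical: delete and recreate `path` so each run starts empty."""
    shutil.rmtree(path, ignore_errors=True)
    Path(path).mkdir(parents=True)


def store_texts(path: str, texts: list[str]) -> list[Path]:
    """Hypothetical: write each text to <path>/<uuid4>.txt as UTF-8."""
    written = []
    for text in texts:
        file_path = Path(path) / f"{uuid.uuid4()}.txt"
        file_path.write_text(text, encoding="utf-8")
        written.append(file_path)
    return written


# Usage with placeholder article bodies instead of real articles.
data_dir = Path(tempfile.mkdtemp()) / "data"
clean_dir(str(data_dir))
files = store_texts(str(data_dir), ["First article text.", "Second article text."])
print(sorted(p.name for p in files))
```

`SimpleDirectoryReader(path)` then loads every file in that directory as one `Document`.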

In [39]:
from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext, StorageContext
from llama_index.embeddings import GooglePaLMEmbedding
from llama_index.node_parser import SentenceSplitter
from llama_index.response.notebook_utils import (
    display_source_node,
    display_response,
)
from llama_index.vector_stores import PGVectorStore

from sqlalchemy import make_url
import psycopg2

connection_string = "postgresql://postgres:postgres@localhost:5432"
db_name = "vector_db"
# One-time setup: uncomment to (re)create the database before the first run.
# with psycopg2.connect(connection_string) as conn:
#     conn.autocommit = True
#     with conn.cursor() as c:
#         c.execute(f"DROP DATABASE IF EXISTS {db_name}")
#         c.execute(f"CREATE DATABASE {db_name}")

url = make_url(connection_string)
vector_store = PGVectorStore.from_params(
    database=db_name,
    host=url.host,
    password=url.password,
    port=url.port,
    user=url.username,
    table_name="composers",
    embed_dim=768,
)


# API key read from the environment (populated from .env by %dotenv).
# See https://cloud.google.com/docs/authentication/api-keys
api_key = os.getenv("GOOGLE_AI_API_KEY")

storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(
    embed_model=GooglePaLMEmbedding(
        model_name="models/embedding-gecko-001",
        api_key=api_key,
    )
)

documents = SimpleDirectoryReader(path).load_data()
print("Document ID:", documents[0].doc_id)

parser = SentenceSplitter()
print("constructing nodes")
nodes = parser.get_nodes_from_documents(documents)

print("building vector store index")
# Embeds every node with the Gecko model and adds the vectors to the Postgres-backed store.
index = VectorStoreIndex(nodes, service_context=service_context, storage_context=storage_context)
print("index built")
engine = index.as_query_engine()
response = engine.query("What people rank amongst the greatest Baroque composers and his influence during his lifetime?")

display_response(
    response, source_length=1000, show_source=True, show_source_metadata=True
)


Document ID: 8d9fcb4f-33d3-45f9-870b-8453f326e071
constructing nodes
building vector store index
index built


**`Final Response:`** Johann Sebastian Bach ranks amongst the greatest Baroque composers. During his lifetime, he had a significant influence on the music scene. He enriched established German styles through his mastery of counterpoint, harmonic, and motivic organization. Bach also adapted rhythms, forms, and textures from abroad, particularly from Italy and France. His compositions include hundreds of cantatas, both sacred and secular, as well as Latin church music, Passions, oratorios, and motets. Bach's music was primarily valued for his skills as an organist, but his keyboard music, such as The Well-Tempered Clavier, was also appreciated for its didactic qualities.

---

**`Source Node 1/2`**

**Node ID:** 0a36c3ac-bc77-4455-ac2a-5e0f6d39d4c5
**Similarity:** 0.7545058470960508
**Text:** Johann Sebastian Bach (31 March [O.S. 21 March] 1685 – 28 July 1750) was a German composer and musician of the late Baroque period. He is known for his orchestral music such as the Brandenburg Concertos; instrumental compositions such as the Cello Suites; keyboard works such as the Goldberg Variations and The Well-Tempered Clavier; organ works such as the Schubler Chorales and the Toccata and Fugue in D minor; and vocal music such as the St Matthew Passion and the Mass in B minor. Since the 19th-century Bach revival, he has been generally regarded as one of the greatest composers in the history of Western music. He has been called the "father of harmony".The Bach family already counted several composers when Johann Sebastian was born as the last child of a city musician, Johann Ambrosius, in Eisenach. After being orphaned at the age of 10, he lived for five years with his eldest brother Johann Christoph, after which he continued his musical education in Lüneburg. From 1703 he was ba...
**Metadata:** {'file_path': '../data/00ffed6f-4567-408f-9803-0ea5a1ce44c0.txt', 'file_name': '00ffed6f-4567-408f-9803-0ea5a1ce44c0.txt', 'file_type': 'text/plain', 'file_size': 3548, 'creation_date': '2024-01-06', 'last_modified_date': '2024-01-06', 'last_accessed_date': '2024-01-06'}

---

**`Source Node 2/2`**

**Node ID:** a23459d0-4105-4401-bf98-3d98632cae5a
**Similarity:** 0.7373157695285525
**Text:** Gustav Mahler (German: [ˈɡʊstaf ˈmaːlɐ]; 7 July 1860 – 18 May 1911) was an Austro-Bohemian Romantic composer, and one of the leading conductors of his generation. As a composer he acted as a bridge between the 19th-century Austro-German tradition and the modernism of the early 20th century. While in his lifetime his status as a conductor was established beyond question, his own music gained wide popularity only after periods of relative neglect, which included a ban on its performance in much of Europe during the Nazi era. After 1945 his compositions were rediscovered by a new generation of listeners; Mahler then became one of the most frequently performed and recorded of all composers, a position he has sustained into the 21st century. Born in Bohemia (then part of the Austrian Empire) to Jewish parents of humble origins, the German-speaking Mahler displayed his musical gifts at an early age. After graduating from the Vienna Conservatory in 1878, he held a succession of conducting ...
**Metadata:** {'file_path': '../data/fc5e522b-0eee-430f-aa10-4e93c9bcb033.txt', 'file_name': 'fc5e522b-0eee-430f-aa10-4e93c9bcb033.txt', 'file_type': 'text/plain', 'file_size': 2654, 'creation_date': '2024-01-06', 'last_modified_date': '2024-01-06', 'last_accessed_date': '2024-01-06'}
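The query engine embeds the question, ranks the stored node embeddings by similarity, and passes the best matches to the LLM. Here it passes two, which matches what we understand to be llama_index's default `similarity_top_k` of 2. The `Similarity` values are presumably cosine similarities, assuming PGVectorStore ranks by cosine distance. The toy sketch below shows that chunk, embed, and rank loop. It makes two substitutions: a bag-of-words counter stands in for the 768-dimensional Gecko embeddings, and a naive regex sentence splitter stands in for `SentenceSplitter`. Neither substitute is the llama_index algorithm.

```python
import math
import re
from collections import Counter


def split_sentences(text: str, max_sentences: int = 2) -> list[str]:
    """Naive stand-in for SentenceSplitter: group sentences into small chunks."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a sparse bag-of-words vector)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and keep the top_k best."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


docs = [
    "Bach was a composer of the Baroque period. He wrote many cantatas.",
    "Mahler was a Romantic composer. He was also a famous conductor.",
    "Postgres is a relational database. It can store vectors with pgvector.",
]
chunks = [chunk for doc in docs for chunk in split_sentences(doc, max_sentences=1)]
for score, chunk in retrieve("Which composer wrote Baroque cantatas?", chunks):
    print(f"{score:.3f}  {chunk}")
```

Even in this toy setup, the top-ranked chunk ("He wrote many cantatas.") has lost the name of its subject. This is one reason real splitters use larger chunks with some overlap.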

In [24]:
from llama_index import Document
from llama_index.schema import MetadataMode

document = Document(
    text="This is a super-customized document",
    metadata={
        "file_name": "super_secret_document.txt",
        "category": "finance",
        "author": "LlamaIndex",
    },
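    excluded_llm_metadata_keys=["file_name"],  # hidden from the LLM, still seen by the embedding model
    metadata_seperator="::",  # (sic) llama_index spells this field "seperator"
    metadata_template="{key}=>{value}",  # how each key/value pair is rendered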
    text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
)

print(
    "The LLM sees this: \n",
    document.get_content(metadata_mode=MetadataMode.LLM),
)
print(
    "The Embedding model sees this: \n",
    document.get_content(metadata_mode=MetadataMode.EMBED),
)

The LLM sees this: 
 Metadata: category=>finance::author=>LlamaIndex
-----
Content: This is a super-customized document
The Embedding model sees this: 
 Metadata: file_name=>super_secret_document.txt::category=>finance::author=>LlamaIndex
-----
Content: This is a super-customized document
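The two strings differ only in which metadata keys are dropped before formatting. Each key/value pair goes through `metadata_template`, the pairs are joined with `metadata_seperator`, and the result is slotted into `text_template`. The plain-Python sketch below applies those rules and reproduces both outputs. `render` is a hypothetical helper, not llama_index's own code.

```python
def render(
    text: str,
    metadata: dict[str, str],
    excluded_keys: list[str],
    sep: str = "::",
    metadata_template: str = "{key}=>{value}",
    text_template: str = "Metadata: {metadata_str}\n-----\nContent: {content}",
) -> str:
    """Hypothetical re-implementation of the Document.get_content formatting."""
    metadata_str = sep.join(
        metadata_template.format(key=k, value=v)
        for k, v in metadata.items()
        if k not in excluded_keys
    )
    return text_template.format(metadata_str=metadata_str, content=text)


metadata = {
    "file_name": "super_secret_document.txt",
    "category": "finance",
    "author": "LlamaIndex",
}
text = "This is a super-customized document"

llm_view = render(text, metadata, excluded_keys=["file_name"])  # like excluded_llm_metadata_keys
embed_view = render(text, metadata, excluded_keys=[])  # nothing excluded for embeddings
print(llm_view)
print(embed_view)
```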


In [48]:
prompts_dict = engine.get_prompts()

print(prompts_dict["response_synthesizer:text_qa_template"].default_template.template)

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 
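At query time the response synthesizer fills `{context_str}` with the retrieved node texts and `{query_str}` with the question. The minimal sketch below uses `str.format` on the template text printed above. Joining the chunks with blank lines is an assumption and may not match exactly how llama_index assembles the context.

```python
TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Placeholder chunks standing in for the two retrieved source nodes.
retrieved_chunks = [
    "Bach was a German composer of the late Baroque period.",
    "Mahler was an Austro-Bohemian Romantic composer.",
]
prompt = TEXT_QA_TEMPLATE.format(
    context_str="\n\n".join(retrieved_chunks),  # assumed chunk separator
    query_str="Who ranks amongst the greatest Baroque composers?",
)
print(prompt)
```

Because the less relevant Mahler chunk also lands in the context, the instruction to rely only on the context is what keeps the answer focused on what the retrieved text actually says.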
