In [18]:
%pip install llama-index-readers-file pymupdf
%pip install llama-index-vector-stores-chroma
%pip install llama-index-embeddings-together
%pip install trulens-eval


Note: you may need to restart the kernel to use updated packages.


Embedding model

In [27]:
import os
from dotenv import load_dotenv
load_dotenv()

api_key = os.getenv("TOGETHER_API_KEY")

from llama_index.embeddings.together import TogetherEmbedding

embed_model = TogetherEmbedding(
    model_name="togethercomputer/m2-bert-80M-8k-retrieval", api_key=api_key
)

In [28]:
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from IPython.display import Markdown, display
from llama_index.core import Settings
import chromadb

In [30]:
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("starter")

Loading data

In [74]:

from pathlib import Path
from llama_index.readers.file import PyMuPDFReader


loader = PyMuPDFReader()
documents = loader.load(file_path="./data/AGI.pdf")

print(len(documents))
print(documents[0])

48
Doc ID: 0298c9c3-859d-4954-aaee-72e2967e5a62
Text: Journal of Artiﬁcial General Intelligence 5(1) 1-46, 2014
Submitted 2013-2-12 DOI: 10.2478/jagi-2014-0001 Accepted 2014-3-15
Artiﬁcial General Intelligence: Concept, State of the Art, and Future
Prospects Ben Goertzel BEN@GOERTZEL.ORG OpenCog Foundation G/F, 51C
Lung Mei Village Tai Po, N.T., Hong Kong Editor: Tsvi Achler Abstract
In recent year...


In [75]:
from llama_index.core import Document

document = Document(text="\n\n".join([doc.text for doc in documents]))

In [82]:
from llama_index.core.node_parser import SentenceSplitter

text_splitter = SentenceSplitter(
    separator=" ",
    chunk_size=1024,
    chunk_overlap=20,
    paragraph_separator="\n\n\n",
    secondary_chunking_regex="[^,.;。]+[,.;。]?",
)
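
The `chunk_size` and `chunk_overlap` settings above control how much text each chunk holds and how much consecutive chunks share. Here is a minimal sketch of that idea, assuming a plain word-based split. The real `SentenceSplitter` counts tokens and prefers sentence boundaries, so `split_with_overlap` is a hypothetical helper for illustration only.

```python
def split_with_overlap(words, chunk_size, chunk_overlap):
    """Split a list of words into chunks of at most chunk_size words,
    where consecutive chunks share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


toy_words = [f"w{i}" for i in range(10)]
toy_chunks = split_with_overlap(toy_words, chunk_size=4, chunk_overlap=1)
for chunk in toy_chunks:
    print(chunk)
```

With `chunk_size=4` and `chunk_overlap=1`, each chunk starts with the last word of the previous one, so a sentence cut at a chunk boundary still appears with some context on both sides.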


In [83]:
import os
from dotenv import load_dotenv
from llama_index.llms.cohere import Cohere
load_dotenv()

cohere_key=os.getenv("CO_API_KEY")


In [84]:
llm = Cohere(api_key=cohere_key)

In [85]:
index = VectorStoreIndex.from_documents(
    documents, embed_model=embed_model
)
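
Querying the index built above comes down to embedding the query and ranking the stored chunks by similarity. A rough sketch of that ranking step, assuming cosine similarity and tiny hand-written vectors in place of real embeddings (`cosine_similarity` and `top_k` are hypothetical helpers, not llama-index functions):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-d "embeddings"; real ones come from the embedding model.
toy_doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
toy_query_vec = [0.9, 0.1]
print(top_k(toy_query_vec, toy_doc_vecs, k=2))  # [0, 1]
```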

In [125]:
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is the future of AGI?")
print(response)


The future of AGI (Artificial General Intelligence) is uncertain. Creating AGI systems that exhibit human-like general intelligence may require a revolutionary breakthrough or the gradual development of theory and experimentation. The progression of AGI research towards a more scientific and systematic approach is likely to yield more capable systems. However, the timeline and specifics of AGI's development are unclear. 

The current state of AGI research is such that ambitious goals, like those listed above, seem far off. The field relies heavily on the intuition and tinkering of researchers, and it's uncertain how much a more advanced AGI theory will impact practical system development. Still, progress in this direction is believed to benefit the AGI field overall, leading to more systematic and scientific design processes. 

The ultimate success of AGI will depend on various factors, some of which include the limitations of mathematics and computing, the potential for gradual theore

Sentence Window Retrieval

In [88]:
from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

In [89]:
Settings.llm = llm
Settings.embed_model = embed_model
Settings.node_parser = node_parser


In [90]:
sentence_index = VectorStoreIndex.from_documents(
    [document], embed_model=embed_model
)

In [91]:
sentence_index.storage_context.persist(persist_dir="persist/sentence_index")

In [92]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)

In [93]:
from llama_index.core.schema import NodeWithScore
from copy import deepcopy
from llama_index.core import Document

nodes = node_parser.get_nodes_from_documents([document])

scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]
nodes_old = [deepcopy(n) for n in nodes]

In [94]:
nodes_old[2].text

'Approaches to deﬁning the concept of Artiﬁcial\nGeneral Intelligence (AGI) are reviewed including mathematical formalisms, engineering, and\nbiology inspired perspectives. '

In [95]:
replaced_nodes = postproc.postprocess_nodes(scored_nodes)
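
What the parser and the postprocessor do together can be sketched in plain Python: each sentence becomes its own record, its neighbours are stored under the `window` metadata key, and the replacement step swaps the record text for that window. This is a simplified illustration with plain dicts instead of llama-index nodes; `build_sentence_windows` and `replace_with_window` are hypothetical names.

```python
def build_sentence_windows(sentences, window_size=3):
    """One record per sentence, with up to window_size neighbours on each side
    stored under the "window" metadata key."""
    records = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        records.append({
            "text": sentence,
            "metadata": {
                "original_text": sentence,
                "window": " ".join(sentences[lo:hi]),
            },
        })
    return records


def replace_with_window(records, target_metadata_key="window"):
    """Return copies of the records whose text is replaced by the stored window."""
    return [
        {**r, "text": r["metadata"].get(target_metadata_key, r["text"])}
        for r in records
    ]


toy_sentences = [f"Sentence {i}." for i in range(6)]
toy_records = build_sentence_windows(toy_sentences, window_size=1)
toy_replaced = replace_with_window(toy_records)

print(toy_records[2]["text"])   # Sentence 2.
print(toy_replaced[2]["text"])  # Sentence 1. Sentence 2. Sentence 3.
```

The single sentence is what gets embedded and matched, while the wider window is what the LLM reads when it writes the answer.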

In [159]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)

query = "Why do we need AGI?"
window_response = query_engine.query(
    query
)

print(window_response)

The text implies that AGI (Artificial General Intelligence) is being pursued as a way to create AI systems that can learn and think more like humans. AGI is positioned as a potential solution to the limitation of current AI technologies, which are narrow in scope and unable to generalize their learning to new situations. 

The development of AGI is framed as an emulation of the human learning environment, which could potentially enable machines to understand and tackle complex, interconnected real-world tasks. This would seemingly be a valuable tool for humans, especially when compared to the current state of AI, which is often task-specific and rigid in its application. 

The text also hints at the potential future usefulness of AGI technologies in making advancements in various fields, like robotics and human-AI interaction.


Answer relevancy test

In [157]:
import nest_asyncio
from tqdm.asyncio import tqdm_asyncio

nest_asyncio.apply()

In [124]:
from datasets import Dataset 
from ragas.metrics import answer_relevancy
from ragas import evaluate
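
For intuition about what an answer relevancy score measures, here is a toy sketch. It assumes the metric works roughly by generating questions from the answer and averaging their cosine similarity to the original question. The generated questions are hard-coded and the "embedding" is a bag of words, so this only shows the shape of the computation and is not how `ragas` implements it.

```python
import math
from collections import Counter


def bow_vector(text):
    """Toy embedding: lower-cased word counts (question marks dropped)."""
    return Counter(text.lower().replace("?", "").split())


def bow_cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def toy_answer_relevancy(question, generated_questions):
    """Mean similarity between the question and questions derived from the answer."""
    q = bow_vector(question)
    sims = [bow_cosine(q, bow_vector(g)) for g in generated_questions]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical questions an LLM might produce from an on-topic / off-topic answer.
on_topic = toy_answer_relevancy("What is AGI?", ["What does AGI mean?", "What is AGI?"])
off_topic = toy_answer_relevancy("What is AGI?", ["How do I bake bread?"])
print(on_topic > off_topic)  # True
```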

In [158]:
# Collect the window text of every post-processed node
context = [node.text for node in replaced_nodes]

data_samples = {
    'question': ['What is AGI?', 'Why do we need AGI?'],
    'answer': ['AGI stands for Artificial General Intelligence, which is a concept in artificial intelligence that refers to the development of machines that can perform any intellectual task that a human being can. AGI is also referred to as Strong AI or Human-Level AI.',
'The text implies that AGI (Artificial General Intelligence) is being pursued as a way to create AI systems that can learn and think more like humans. AGI is positioned as a potential solution to the limitation of current AI technologies, which are narrow in scope and unable to generalize their learning to new situations. The development of AGI is motivated by the desire to create robots or computer programs that can handle the complexity and interconnectedness of real-world human tasks. Proponents of AGI research believe that this approach is a crucial step towards creating more flexible and adaptable AI systems. These systems would be able to recognize and respond to diverse situations, learning and developing new skills as needed. The ultimate goal, as implied by the text, is to create AI with a level of intelligence comparable to that of humans, which can then be applied to a host of challenges and problems that require more sophisticated and nuanced solutions than current AI technologies can provide.' ],  
    # one list of context strings per question, so the columns have equal length
    'contexts': [context, context],
}
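
Before handing samples like these to an evaluator, it can help to check their shape. The sketch below assumes the column layout of `question`, `answer`, and `contexts`, with one list of context strings per question. `check_samples` is a hypothetical helper, and it uses toy data so it does not touch `data_samples`.

```python
def check_samples(samples):
    """Return a list of shape problems found in a question/answer/contexts dict."""
    problems = []
    n = len(samples.get("question", []))
    for column in ("question", "answer", "contexts"):
        if column not in samples:
            problems.append(f"missing column: {column}")
        elif len(samples[column]) != n:
            problems.append(f"{column} has {len(samples[column])} rows, expected {n}")
    for i, ctx in enumerate(samples.get("contexts", [])):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            problems.append(f"contexts[{i}] is not a list of strings")
    return problems


toy_good = {
    "question": ["q1", "q2"],
    "answer": ["a1", "a2"],
    "contexts": [["c1"], ["c2", "c3"]],
}
# A flat list of strings instead of one list per question.
toy_bad = {**toy_good, "contexts": ["c1", "c2", "c3"]}

print(check_samples(toy_good))  # []
print(check_samples(toy_bad))
```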


Auto Merging Retrieval

In [92]:
from llama_index.core.node_parser import HierarchicalNodeParser

node_parser = HierarchicalNodeParser.from_defaults()
    

In [95]:
nodes = node_parser.get_nodes_from_documents([document])

In [97]:
from llama_index.core.node_parser import get_leaf_nodes, get_root_nodes


In [100]:
leaf_nodes = get_leaf_nodes(nodes)


root_nodes = get_root_nodes(nodes)

len(leaf_nodes)
len(root_nodes)

7
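
The split into root and leaf nodes can be pictured with a small pure-Python sketch: text is cut into large chunks, each of which is cut again into smaller ones, and every chunk remembers its parent. The chunk sizes here are tiny word counts chosen for the demo (as far as I know the library default is `[2048, 512, 128]` tokens), and all function names are hypothetical.

```python
def build_hierarchy(words, chunk_sizes=(8, 4, 2)):
    """Split words into nested chunks and return a flat list of node dicts."""
    all_nodes = []

    def split(span_words, level, parent_id):
        size = chunk_sizes[level]
        for start in range(0, len(span_words), size):
            piece = span_words[start:start + size]
            node_id = len(all_nodes)
            all_nodes.append(
                {"id": node_id, "parent": parent_id, "children": [], "text": " ".join(piece)}
            )
            if parent_id is not None:
                all_nodes[parent_id]["children"].append(node_id)
            if level + 1 < len(chunk_sizes):
                split(piece, level + 1, node_id)  # recurse into the smaller level

    split(words, 0, None)
    return all_nodes


def toy_leaf_nodes(all_nodes):
    """Nodes without children: the smallest chunks, which get embedded."""
    return [n for n in all_nodes if not n["children"]]


def toy_root_nodes(all_nodes):
    """Nodes without a parent: the largest chunks."""
    return [n for n in all_nodes if n["parent"] is None]


toy_tree = build_hierarchy([f"w{i}" for i in range(16)])
print(len(toy_tree), len(toy_root_nodes(toy_tree)), len(toy_leaf_nodes(toy_tree)))  # 14 2 8
```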

In [102]:
from llama_index.core.storage.docstore import SimpleDocumentStore 
from llama_index.core import StorageContext


In [106]:
docstore = SimpleDocumentStore()
docstore.add_documents(nodes)

storage_context = StorageContext.from_defaults(docstore=docstore)

In [107]:
base_index = VectorStoreIndex(
    leaf_nodes,
    storage_context=storage_context,
)

Define Retriever

In [2]:
from llama_index.core.retrievers import AutoMergingRetriever
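
The idea behind auto-merging can be sketched without the library: retrieve leaf chunks, and when enough leaves of the same parent are hit, return the parent chunk instead. The threshold and the strict `>` comparison below are assumptions for illustration, and the real retriever can repeat the merge further up the hierarchy. `auto_merge` is a hypothetical helper working on plain ids.

```python
def auto_merge(retrieved_leaf_ids, parent_of, children_of, ratio_thresh=0.5):
    """Replace retrieved leaves with their parent when more than ratio_thresh
    of that parent's children were retrieved."""
    by_parent = {}
    for leaf in retrieved_leaf_ids:
        by_parent.setdefault(parent_of[leaf], []).append(leaf)

    merged = []
    for parent, leaves in by_parent.items():
        ratio = len(leaves) / len(children_of[parent])
        if ratio > ratio_thresh:
            merged.append(parent)   # enough siblings hit: return the parent
        else:
            merged.extend(leaves)   # too few: keep the individual leaves
    return merged


toy_children_of = {"P1": ["a", "b", "c", "d"], "P2": ["e", "f", "g", "h"]}
toy_parent_of = {leaf: p for p, leaves in toy_children_of.items() for leaf in leaves}

# Three of P1's four leaves were retrieved, but only one of P2's.
print(auto_merge(["a", "b", "c", "e"], toy_parent_of, toy_children_of))  # ['P1', 'e']
```

In this notebook the docstore in `storage_context` already holds every level of the hierarchy, which is what lets a retriever look up a parent from its leaves.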