In [1]:
from llama_index.core import (
    SimpleDirectoryReader,
    load_index_from_storage,
    VectorStoreIndex,
    StorageContext,
)
from llama_index.vector_stores.faiss import FaissVectorStore
from IPython.display import Markdown, display
from llama_index.embeddings.ollama import OllamaEmbedding
import faiss
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.response.notebook_utils import display_source_node
from llama_index.core import Settings

In [2]:
# load documents
documents = SimpleDirectoryReader(input_files=["../data/thinking_machines.txt"]).load_data()
embed_model = OllamaEmbedding(
    model_name="mxbai-embed-large:latest"
)
Settings.embed_model = embed_model
Settings.llm = None
# split where adjacent sentences differ most (above the 95th percentile of distances)
splitter = SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=embed_model,
)
nodes = splitter.get_nodes_from_documents(documents)

LLM is explicitly disabled. Using MockLLM.
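
The splitter above cuts the document wherever neighbouring sentences drift apart in meaning. As a rough illustration of what the value 95 passed as the breakpoint percentile threshold is understood to do, the sketch below computes cosine distances between adjacent sentence embeddings and starts a new chunk wherever a distance exceeds the 95th percentile of all distances. The toy sentences, the 2-d vectors and the helper names (`cosine_distance`, `percentile`, `semantic_chunks`) are invented for this example. It is a simplification, not LlamaIndex's actual implementation (it ignores `buffer_size`, for instance).

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def semantic_chunks(sentences, embeddings, threshold_pct=95):
    """Group consecutive sentences, breaking where the embedding distance spikes."""
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    cutoff = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > cutoff:  # semantic jump: close the current chunk
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Hypothetical data: three sentences about one topic, then two about another.
sentences = ["A1.", "A2.", "A3.", "B1.", "B2."]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]

chunks = semantic_chunks(sentences, embeddings)
for i, chunk in enumerate(chunks):
    print(i, " ---- ", chunk)
```

Lowering the percentile in this sketch produces more, smaller chunks, because more of the adjacent-sentence distances count as breakpoints.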


In [3]:
for i, node in enumerate(nodes):
    print(i, " ---- ", node)  # prints a truncated preview; use node.get_content() for the full text

0  ----  Node ID: 1a1af6e9-9da2-435f-a067-25713489c386
Text: When I was an entrepreneur about a decade ago, I landed on a
golden use case (or so I fantasised). It is to apply Natural Language
Processing (NLP) to convert english statements into structured facts
(Subject-Predicate-Object) that adhere to commonly-agreed Domain
Ontologies. For example, “Singapore’s economic inflation is estimated
at 4.5%” can...
1  ----  Node ID: a1eb6634-1ee4-448f-a52e-9cdb0175fa14
Text: With the very little that I have learned about this vast
history, I will try and categorise the different schools of thoughts
that existed and evolved this field. If you are a sincere seeker, you
may stitch the story also, starting with Aristotle and traversing all
the way through the history of first order logic, inferencing,
mathematical induc...
2  ----  Node ID: 690be0ea-28f2-4c64-88a5-a1fab2095c5f
Text: The AI winter(s) did not happen suddenly. There were some
radical voices that predicted it like that of Hubert Drey

In [4]:
# 1024 must match the length of the vectors produced by the embedding model
faiss_index = faiss.IndexFlatL2(1024)
faiss_index.is_trained

True
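
A flat L2 index needs no training, which is why `is_trained` is already `True`: it stores the vectors as-is and compares the query against every one of them. The pure-Python sketch below mimics that exhaustive search on toy data, to make two points concrete: the dimension given at construction has to match every vector, and results are ranked by squared L2 distance, so smaller is closer. The function names and sample vectors are made up for illustration; this is not how FAISS is implemented internally.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(stored, query, k):
    """Exhaustive nearest-neighbour search: returns (distance, position) pairs, closest first."""
    if any(len(v) != len(query) for v in stored):
        raise ValueError("query and stored vectors must share one dimension")
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(stored))
    return scored[:k]


# Hypothetical 2-d vectors standing in for 1024-d embeddings.
stored_vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
hits = flat_l2_search(stored_vectors, [1.0, 2.0], k=2)
print(hits)
```
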

In [5]:
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(
    nodes, storage_context=storage_context,
)

In [16]:
Settings.llm = None
query_engine = index.as_query_engine()
response = query_engine.query(
    "How can semantic datastores be used in reasoning"
)
for n in response.source_nodes:
    display_source_node(n, source_length=20000)

LLM is explicitly disabled. Using MockLLM.


**Node ID:** 1a1af6e9-9da2-435f-a067-25713489c386<br>**Similarity:** 224.02749633789062<br>**Text:** When I was an entrepreneur about a decade ago, I landed on a golden use case (or so I fantasised). It is to apply Natural Language Processing (NLP) to convert english statements into structured facts (Subject-Predicate-Object) that adhere to commonly-agreed Domain Ontologies. For example, “Singapore’s economic inflation is estimated at 4.5%” can be expressed as a triple with Singapore as the subject, 4.5 as the object and has_inflation as the predicate (semantic relatioship). All the Subjects, Objects and Predicates can be expressed as unique URIs so that the knowledge across millions of triples describing different aspects of the same subjects and objects can be stictched together. Such fact triples can be loaded onto Semantic Datastores (RDF or Property-Graph based) and queried, again by the same natural language processing capability. Even more so, these triples can be crunched by inferencing engines to derive semantically sound new knowledge and can reason these derivations in reverse too. The ultimate thinking machine! or so I fantasised.

Why this seemingly round-about approach to create a thinking machine as against the generative AI’s approach of predicting word-sequences? Why convert unstructured English to structured facts before processing it for an answer? And are there any other alternative approaches to creating thinking machines? Strangely, I have not heard much balanced critical opinions from contemporary computing historians or philosophers on this subject ever since LLMs were made public. It could also be that I am living under a certain other rock but all I hear is the whole industry monomaniacally adulating the genAI’s auto-regressive model, bullishly sidelining any reasonable criticisms. To correctly quantify the pleasantly surprising merits and some devastating demerits, we need to philosophically ponder on the humanity’s past quests on this matter and re-learn from it.<br>

**Node ID:** 690be0ea-28f2-4c64-88a5-a1fab2095c5f<br>**Similarity:** 274.6891784667969<br>**Text:** The AI winter(s) did not happen suddenly. There were some radical voices that predicted it like that of Hubert Dreyfus. They theorised that the magical activities like object recognition are not symbol manipulation problems. These are perhaps pattern matching problems. So here is our third category and lets call them the “thinking = pattern sifting” school. Thanks to them, we learnt that we can’t solve a problem of recognising an object, say a chair, even if we write thousands of rules. (Although its a core component of intelligence, is recognition a sufficient capability to create thinking machines? Let us come back to this shortly)

While progressing in parallel on the natural language processing frontier, we have recently landed on another possible solution to thinking machines — the LLMs. This is a variation of the pattern sifting approach, applied to word patterns in languages. By mastering the language syntax patterns sourced from an unlimited corpus and by using them as references to generate eloquent statements, these models seem to simulate thinking. This is the claim of the fourth school and lets call them the “thinking = sequence transduction” school. This also would mean that people perhaps are nothing but evolved sequence models.

I am not going to dive into the second category for now.<br>
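
The values labelled Similarity above (about 224 and 275) are far outside the usual cosine range of -1 to 1. With a `FaissVectorStore` over `IndexFlatL2`, they appear to be the raw squared L2 distances returned by FAISS, so the node with the smaller number is the better match. If a cosine-style score is easier to read, it can be recovered from the squared distance and the two vector norms through the identity ‖a − b‖² = ‖a‖² + ‖b‖² − 2·a·b. The sketch below checks that identity on made-up vectors; the helper names are hypothetical and nothing here calls LlamaIndex or FAISS.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(d2, norm_a, norm_b):
    """Recover cosine similarity from a squared L2 distance and the two norms."""
    return (norm_a ** 2 + norm_b ** 2 - d2) / (2 * norm_a * norm_b)


# Made-up vectors standing in for a query embedding and a node embedding.
query_vec = [3.0, 4.0]
node_vec = [4.0, 3.0]

norm_q = math.sqrt(dot(query_vec, query_vec))
norm_n = math.sqrt(dot(node_vec, node_vec))

direct = dot(query_vec, node_vec) / (norm_q * norm_n)
recovered = cosine_from_squared_l2(squared_l2(query_vec, node_vec), norm_q, norm_n)
print(direct, recovered)
```

For unit-length embeddings the identity reduces to cosine = 1 − d²/2, which is one reason some pipelines normalise vectors before indexing them.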

In [11]:
index.storage_context.persist(persist_dir="index")

In [13]:
vector_store = FaissVectorStore.from_persist_path("./index/default__vector_store.json")
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./index"
)
retrieved_index = load_index_from_storage(storage_context=storage_context)

In [15]:
retriever = retrieved_index.as_retriever()
response = retriever.retrieve(
    "How can semantic datastores be used in reasoning"
)
for n in response:
    display_source_node(n, source_length=20000)

**Node ID:** 690be0ea-28f2-4c64-88a5-a1fab2095c5f<br>**Similarity:** 326.43353271484375<br>**Text:** The AI winter(s) did not happen suddenly. There were some radical voices that predicted it like that of Hubert Dreyfus. They theorised that the magical activities like object recognition are not symbol manipulation problems. These are perhaps pattern matching problems. So here is our third category and lets call them the “thinking = pattern sifting” school. Thanks to them, we learnt that we can’t solve a problem of recognising an object, say a chair, even if we write thousands of rules. (Although its a core component of intelligence, is recognition a sufficient capability to create thinking machines? Let us come back to this shortly)

While progressing in parallel on the natural language processing frontier, we have recently landed on another possible solution to thinking machines — the LLMs. This is a variation of the pattern sifting approach, applied to word patterns in languages. By mastering the language syntax patterns sourced from an unlimited corpus and by using them as references to generate eloquent statements, these models seem to simulate thinking. This is the claim of the fourth school and lets call them the “thinking = sequence transduction” school. This also would mean that people perhaps are nothing but evolved sequence models.

I am not going to dive into the second category for now.<br>

**Node ID:** 1a1af6e9-9da2-435f-a067-25713489c386<br>**Similarity:** 328.9163513183594<br>**Text:** When I was an entrepreneur about a decade ago, I landed on a golden use case (or so I fantasised). It is to apply Natural Language Processing (NLP) to convert english statements into structured facts (Subject-Predicate-Object) that adhere to commonly-agreed Domain Ontologies. For example, “Singapore’s economic inflation is estimated at 4.5%” can be expressed as a triple with Singapore as the subject, 4.5 as the object and has_inflation as the predicate (semantic relatioship). All the Subjects, Objects and Predicates can be expressed as unique URIs so that the knowledge across millions of triples describing different aspects of the same subjects and objects can be stictched together. Such fact triples can be loaded onto Semantic Datastores (RDF or Property-Graph based) and queried, again by the same natural language processing capability. Even more so, these triples can be crunched by inferencing engines to derive semantically sound new knowledge and can reason these derivations in reverse too. The ultimate thinking machine! or so I fantasised.

Why this seemingly round-about approach to create a thinking machine as against the generative AI’s approach of predicting word-sequences? Why convert unstructured English to structured facts before processing it for an answer? And are there any other alternative approaches to creating thinking machines? Strangely, I have not heard much balanced critical opinions from contemporary computing historians or philosophers on this subject ever since LLMs were made public. It could also be that I am living under a certain other rock but all I hear is the whole industry monomaniacally adulating the genAI’s auto-regressive model, bullishly sidelining any reasonable criticisms. To correctly quantify the pleasantly surprising merits and some devastating demerits, we need to philosophically ponder on the humanity’s past quests on this matter and re-learn from it.<br>