In [1]:
# Reload modules automatically before executing user code
%load_ext autoreload
%autoreload 2

In [2]:
import sys
!{sys.executable} -m pip install -r requirements.txt

In [3]:
# Optional: ignore "unclosed" SSL socket warnings in case you see them.
# Python reports unclosed sockets as ResourceWarning, so that is the category to filter.
import warnings

warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)
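As a quick check of how `warnings.filterwarnings` matches, the sketch below emits an "unclosed" socket-style warning and counts how many get through for two different `category` values. The helper name `count_emitted` and the warning text are made up for illustration.

```python
import warnings


def count_emitted(category_to_ignore):
    """Return how many warnings survive an 'ignore' filter for the given category."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore", message="unclosed", category=category_to_ignore
        )
        # Illustrative text in the style Python uses for unclosed sockets.
        warnings.warn("unclosed <socket.socket fd=5>", ResourceWarning)
    return len(caught)


suppressed = count_emitted(ResourceWarning)  # filter category matches the warning
not_suppressed = count_emitted(ImportWarning)  # filter category does not match
print(suppressed, not_suppressed)
```

The filter only applies when both the message pattern and the category match the emitted warning.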

## Laying the foundations

### Storage

We're going to use Redis as the database for both the document contents and the vector embeddings. You will need the full Redis Stack because it includes RediSearch, the module that provides the vector search behind semantic search. More detail is in the [docs for Redis Stack](https://redis.io/docs/stack/get-started/install/docker/).

To set this up locally, install Docker and then run the following command: `docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest`.
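If you want to confirm that the container is reachable before going further, a minimal sketch like the one below checks whether a TCP port accepts connections. The helper `port_is_open` is a hypothetical name, not part of this project, and it only tests reachability, not that Redis itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("localhost", 6379)` and `port_is_open("localhost", 8001)` cover the two ports published by the `docker run` command.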

The code used here draws heavily on [this repo](https://github.com/RedisAI/vecsim-demo).

Once the Redis Stack container is running, follow the instructions below to open a Redis connection and create a Hierarchical Navigable Small World (HNSW) index for semantic search.
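To make "semantic search" concrete: an HNSW index is an approximate nearest-neighbour structure, so it speeds up the exact search sketched below rather than changing what is being searched for. This is a toy brute-force version, assuming cosine similarity as the metric; the document ids and 3-dimensional vectors are invented for illustration and are not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, k=2):
    """Return the ids of the k vectors most similar to the query (exact search)."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical toy "embeddings"; real ones have hundreds or thousands of dimensions.
toy_embeddings = {
    "doc:pairing": [0.9, 0.1, 0.0],
    "doc:testing": [0.7, 0.3, 0.1],
    "doc:holidays": [0.0, 0.2, 0.9],
}

top = nearest([1.0, 0.0, 0.0], toy_embeddings, k=2)
print(top)
```

Brute force compares the query against every stored vector. HNSW trades a small amount of recall for much faster lookups on large collections.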

In [4]:
# Check that Redis is set up and running
from database import get_redis_connection

redis_client = get_redis_connection()

redis_client.ping()

True

In [5]:
# Optional step to drop the indexes if they already exist
from importer import NOTION_INDEX_NAME, WEB_SCRAPE_INDEX_NAME

# redis_client.ft(NOTION_INDEX_NAME).dropindex()
# redis_client.ft(WEB_SCRAPE_INDEX_NAME).dropindex()

b'OK'

### Ingestion

We'll load our Notion pages into documents.

In [6]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
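With this configuration, each log message appears twice in the output: once from the handler that `basicConfig` installs (formatted as `LEVEL:logger:message`) and once from the extra `StreamHandler`, which prints the bare message. The sketch below reproduces that on an isolated logger with an in-memory stream; the logger name `demo_duplicate` is arbitrary.

```python
import io
import logging

stream = io.StringIO()

demo_logger = logging.getLogger("demo_duplicate")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo isolated from the root logger

# Handler 1: same format that logging.basicConfig uses by default.
formatted_handler = logging.StreamHandler(stream)
formatted_handler.setFormatter(
    logging.Formatter("%(levelname)s:%(name)s:%(message)s")
)
demo_logger.addHandler(formatted_handler)

# Handler 2: no formatter set, so only the message text is written.
demo_logger.addHandler(logging.StreamHandler(stream))

demo_logger.info("Creating index")
lines = stream.getvalue().splitlines()
print(lines)
```

Dropping the `addHandler` call is one way to get a single line per message, if the duplicated output is distracting.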

In [7]:
from importer import import_notion_data

notion_index = import_notion_data()

INFO:llama_index.vector_stores.redis:Creating index notion-fl-index
Creating index notion-fl-index
INFO:llama_index.vector_stores.redis:Added 102 documents to index notion-fl-index
Added 102 documents to index notion-fl-index
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 80449 tokens
> [build_index_from_nodes] Total embedding token usage: 80449 tokens


In [8]:
# Optional
# Confirms that the Redis database contains data

from importer import number_of_stored_notion_docs
print(number_of_stored_notion_docs())

1197


Next, we add the web-scraped data to its own index.

In [9]:
from importer import import_web_scrape_data

web_scrape_index = import_web_scrape_data()

INFO:llama_index.vector_stores.redis:Creating index web-scrape-fl-index
Creating index web-scrape-fl-index
INFO:llama_index.vector_stores.redis:Added 50 documents to index web-scrape-fl-index
Added 50 documents to index web-scrape-fl-index
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 44918 tokens
> [build_index_from_nodes] Total embedding token usage: 44918 tokens


In [10]:
# Optional
# Confirms that the Redis database contains data

from importer import number_of_stored_web_scrape_docs
print(number_of_stored_web_scrape_docs())

1247


In [13]:
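# Optional: run this cell if you haven't installed the NLTK stop words.
# In the downloader, go to the corpora tab, use the arrow keys to scroll down to the stop words, and press Enter to install.
# Alternatively, nltk.download("stopwords") fetches only that corpus without the interactive prompt.
import nltk
import ssl

# Workaround for SSL certificate errors during the download: this disables HTTPS certificate verification.
try: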
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()

NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------


True

In [14]:
from importer import compose_graph

graph = compose_graph()

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
> [build_index_from_nodes] Total embedding token us

[nltk_data] Downloading package stopwords to /Users/peter/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [15]:
# Optional
# Proves that the graph is built

# query_engine = graph.as_query_engine()
response = graph.query("What are some of the solutions that Focused Labs has created?")

print(str(response))
# print(response.get_formatted_sources())

INFO:llama_index.indices.keyword_table.retrievers:> Starting query: What are some of the solutions that Focused Labs has created?
> Starting query: What are some of the solutions that Focused Labs has created?
INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['created', 'labs', 'focused', 'solutions']
query keywords: ['created', 'labs', 'focused', 'solutions']
INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['labs', 'focused']
> Extracted keywords: ['labs', 'focused']
INFO:llama_index.vector_stores.redis:Using filters: *
Using filters: *
INFO:llama_index.vector_stores.redis:Querying index web-scrape-fl-index
Querying index web-scrape-fl-index
INFO:llama_index.vector_stores.redis:Found 2 results for query with id ['webscrapefocusedlabsdocs_6531c4b7-be65-4ab2-8ade-1c544f599da9', 'webscrapefocusedlabsdocs_5cdb55b6-3740-4026-a2a4-2fabeb2040d0']
Found 2 results for query with id ['webscrapefocusedlabsdocs_6531c4b7-be65-4ab2-8ade-1c544f599da9', 'websc

KeyboardInterrupt: 
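The log lines above suggest a two-step keyword lookup: the query is first reduced to candidate keywords, and only those that exist in the graph's keyword table are kept before the matching index is queried. The sketch below is a toy illustration of that idea, not LlamaIndex's implementation; the stop-word set, the function names, and the contents of `keyword_table` are all hypothetical.

```python
import re

# Hypothetical, tiny stop-word list; NLTK's stopwords corpus is far larger.
STOP_WORDS = {"what", "are", "some", "of", "the", "that", "has"}


def query_keywords(query):
    """Lowercase the query, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", query.lower())
    return {word for word in words if word not in STOP_WORDS}


def match_keywords(candidates, keyword_table):
    """Keep only the candidate keywords that the keyword table knows about."""
    return {word for word in candidates if word in keyword_table}


# Hypothetical keyword table mapping keywords to the sub-indexes they point at.
keyword_table = {
    "labs": ["index_a"],
    "focused": ["index_a", "index_b"],
}

candidates = query_keywords(
    "What are some of the solutions that Focused Labs has created?"
)
extracted = match_keywords(candidates, keyword_table)
print(sorted(candidates), sorted(extracted))
```

In this toy run, `solutions` and `created` are dropped because they are not in the table, which mirrors the narrowing visible in the log.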

In [1]:
from chat_engine import create_interactive_lang_chain_chat_engine

chat_engine = create_interactive_lang_chain_chat_engine()



> Entering new AgentExecutor chain...
Thought: I need to find the answer to the question "What is pair programming?" using the Focused Labs Domain Data Graph.

Action:
```
{
  "action": "Focused Labs Domain Data Graph",
  "action_input": {
    "question": "What is pair programming?"
  }
}
```

Observation: Pair programming at Focused Labs is a core collaboration practice where two equal partners work together on one computer, with two monitors, keyboards, and people. It involves one programmer writing code (the driver) while the other reviews each line of code (the observer or navigator). The two programmers switch roles frequently, and the practice is aimed at helping each other grow skills, sharing knowledge about the codebase and tools, and enabling the Apertures team with the benefits of Extreme Programming.
Thought: The answer to the question "What is pair programming?" is that it is a core collaboration practice at Focused 