# Weaviate Index Demo

#### Creating a Weaviate Client

In [1]:
import weaviate

In [2]:
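# Credentials for an authenticated (hosted) Weaviate instance.
# Not needed when connecting to the unauthenticated local instance below.
resource_owner_config = weaviate.AuthClientPassword(
    username="<username>",
    password="<password>",
)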

In [5]:
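# Option 1: hosted cluster, authenticated with resource_owner_config
# client = weaviate.Client("https://<cluster-id>.semi.network/", auth_client_secret=resource_owner_config)
# Option 2: local instance without authentication
client = weaviate.Client("http://localhost:8080")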
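
The cells above hard-code the URL and credentials. A minimal sketch of reading them from the environment instead is shown here. The helper `weaviate_settings` and the variable names `WEAVIATE_URL`, `WEAVIATE_USERNAME`, and `WEAVIATE_PASSWORD` are assumptions made for illustration; they are not part of the Weaviate client.

```python
import os


def weaviate_settings(env=None):
    """Collect connection settings from environment variables.

    The variable names are hypothetical. The URL falls back to the local
    instance used in this notebook.
    """
    env = os.environ if env is None else env
    settings = {"url": env.get("WEAVIATE_URL", "http://localhost:8080")}
    username = env.get("WEAVIATE_USERNAME")
    password = env.get("WEAVIATE_PASSWORD")
    # Only include credentials when both parts are present.
    if username and password:
        settings["username"] = username
        settings["password"] = password
    return settings


settings = weaviate_settings({})
print(settings["url"])
```

The resulting `settings["url"]` can be passed to `weaviate.Client`, and, when credentials are present, `weaviate.AuthClientPassword(username=..., password=...)` can be passed as `auth_client_secret`, as in the cells above.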

#### Load documents, build the GPTWeaviateIndex

In [6]:
from gpt_index import GPTWeaviateIndex, SimpleDirectoryReader
from IPython.display import Markdown, display

In [7]:
# load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()

In [10]:
len(documents)

1

In [8]:
index = GPTWeaviateIndex(documents, weaviate_client=client)

# NOTE: you can also set class_prefix explicitly instead of relying on the default.
# class_prefix = "test_prefix"
# index = GPTWeaviateIndex(documents, weaviate_client=client, class_prefix=class_prefix)

> Adding chunk: 		

What I Worked On

February 2021

Before col...
> Adding chunk: only up to age 25 and already there are such co...
> Adding chunk: clear that it was even possible. To find out, w...
> Adding chunk: a name for the kind of company Viaweb was, an "...
> Adding chunk: get their initial set of customers almost entir...
> Adding chunk: had smart people and built impressive technolog...
> [build_index_from_documents] Total LLM token usage: 0 tokens
> [build_index_from_documents] Total embedding token usage: 17621 tokens
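
The log above shows the single loaded document being split into several chunks before embedding. The following is a rough sketch of overlapping chunking, assuming a word-based split for simplicity. The function `split_into_chunks` and its default sizes are hypothetical; the splitter the library actually uses and its parameters are not shown in this notebook.

```python
def split_into_chunks(text, chunk_size=200, overlap=20):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words. This is an illustrative
    simplification, not the library's splitter.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


for chunk in split_into_chunks("a b c d e f g h i j", chunk_size=4, overlap=1):
    print(f"> Adding chunk: {chunk}")
```

The overlap keeps a little context shared between neighbouring chunks, so a sentence that straddles a boundary is not lost to both.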


In [9]:
# save index to disk
index.save_to_disk('index_weaviate.json')

In [11]:
schema = client.schema.get()
schema

{'classes': [{'class': 'Gpt_Index_7525487223651875604_Node',
   'description': 'Class for Gpt_Index_7525487223651875604_Node',
   'invertedIndexConfig': {'bm25': {'b': 0.75, 'k1': 1.2},
    'cleanupIntervalSeconds': 60,
    'stopwords': {'additions': None, 'preset': 'en', 'removals': None}},
   'properties': [{'dataType': ['string'],
     'description': 'Text property',
     'name': 'text',
     'tokenization': 'word'},
    {'dataType': ['string'],
     'description': 'Document id',
     'name': 'doc_id',
     'tokenization': 'word'},
    {'dataType': ['string'],
     'description': 'extra_info (in JSON)',
     'name': 'extra_info',
     'tokenization': 'word'},
    {'dataType': ['int'],
     'description': 'The index of the Node',
     'name': 'index'},
    {'dataType': ['int[]'],
     'description': 'The child_indices of the Node',
     'name': 'child_indices'},
    {'dataType': ['string'],
     'description': 'The ref_doc_id of the Node',
     'name': 'ref_doc_id',
     'tokenizatio
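
The raw schema dict is hard to scan. As a sketch, the nested structure can be condensed to one line per property. `summarize_schema` is a hypothetical helper, and `sample_schema` is a hand-copied subset of the output above rather than a live call to `client.schema.get()`.

```python
def summarize_schema(schema):
    """Map each class name to a {property name: first data type} dict."""
    summary = {}
    for cls in schema.get("classes", []):
        summary[cls["class"]] = {
            prop["name"]: prop["dataType"][0]
            for prop in cls.get("properties", [])
        }
    return summary


# Subset of the schema printed above, trimmed to the keys used here.
sample_schema = {
    "classes": [
        {
            "class": "Gpt_Index_7525487223651875604_Node",
            "properties": [
                {"dataType": ["string"], "name": "text"},
                {"dataType": ["string"], "name": "doc_id"},
                {"dataType": ["int"], "name": "index"},
                {"dataType": ["int[]"], "name": "child_indices"},
            ],
        }
    ]
}

for class_name, props in summarize_schema(sample_schema).items():
    print(class_name)
    for name, data_type in props.items():
        print(f"  {name}: {data_type}")
```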

In [14]:
import json
print(json.dumps(schema, indent=2))

{
  "classes": [
    {
      "class": "Gpt_Index_7525487223651875604_Node",
      "description": "Class for Gpt_Index_7525487223651875604_Node",
      "invertedIndexConfig": {
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        },
        "cleanupIntervalSeconds": 60,
        "stopwords": {
          "additions": null,
          "preset": "en",
          "removals": null
        }
      },
      "properties": [
        {
          "dataType": [
            "string"
          ],
          "description": "Text property",
          "name": "text",
          "tokenization": "word"
        },
        {
          "dataType": [
            "string"
          ],
          "description": "Document id",
          "name": "doc_id",
          "tokenization": "word"
        },
        {
          "dataType": [
            "string"
          ],
          "description": "extra_info (in JSON)",
          "name": "extra_info",
          "tokenization": "word"
        },
        {
       

In [15]:
# load index from disk
index = GPTWeaviateIndex.load_from_disk('index_weaviate.json', weaviate_client=client)

#### Query Index

In [16]:
# verbose=True prints the retrieved nodes and the intermediate responses
response = index.query("What did the author do growing up?", verbose=True)

> Top 1 nodes:
> [Node 0] 		

What I Worked On

February 2021

Before college the two main things I worked on, outside of s...
> Searching in chunk: 		

What I Worked On

February 2021

Before col...
> Initial response: 
The author grew up writing short stories, programming on an IBM 1401, and building a computer kit with a friend. He also wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He studied philosophy in college, but switched to AI and taught himself Lisp. He wrote a book about Lisp hacking and reverse-engineered SHRDLU. He also took art classes at Harvard and applied to art schools.
> Refine context: limited vocabulary. [2]

I'm only up to age 25 ...
> Refined response: 

The author grew up writing short stories, programming on an IBM 1401, and building a computer kit with a friend. He also wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He studied philosophy in college, bu
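
The verbose trace above suggests an answer-then-refine pattern: an initial response is produced from one chunk and then updated with further context. The following sketch shows that pattern with stand-in functions. `answer_with_refinement`, `fake_answer`, and `fake_refine` are hypothetical names, and the stubs only concatenate strings instead of calling an LLM; this is inferred from the log, not taken from the library's implementation.

```python
def answer_with_refinement(question, chunks, answer_fn, refine_fn):
    """Answer from the first chunk, then refine with each remaining chunk."""
    if not chunks:
        return None
    answer = answer_fn(question, chunks[0])
    for chunk in chunks[1:]:
        answer = refine_fn(question, answer, chunk)
    return answer


def fake_answer(question, chunk):
    # Stand-in for an LLM call that answers from a single chunk.
    return f"answer from [{chunk}]"


def fake_refine(question, answer, chunk):
    # Stand-in for an LLM call that updates an answer with new context.
    return f"{answer} + [{chunk}]"


result = answer_with_refinement(
    "What did the author do growing up?",
    ["chunk 0", "chunk 1"],
    fake_answer,
    fake_refine,
)
print(result)
```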

In [17]:
display(Markdown(f"<b>{response}</b>"))

<b>

The author grew up writing short stories, programming on an IBM 1401, and building a computer kit with a friend. He also wrote simple games, a program to predict how high his model rockets would fly, and a word processor. He studied philosophy in college, but switched to AI and taught himself Lisp. He wrote a book about Lisp hacking and reverse-engineered SHRDLU. He also took art classes at Harvard and applied to art schools, but was disappointed by the lack of teaching and learning in the painting department at the Accademia. He also had experience with 19th century studio painting conventions, such as having a little stove fed with kindling and a nude model sitting as close to it as possible.</b>