# MyScale Index Demo

#### Creating a MyScale Client

In [1]:
import logging
import sys

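# Send INFO-level logs to stdout.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# This second handler echoes each record again without the "LEVEL:logger:" prefix,
# which is why every log line in the outputs below appears twice.
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))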

In [2]:
import clickhouse_connect

# Initialize the client; replace the placeholders with your MyScale cluster credentials.
client = clickhouse_connect.get_client(
    host='YOUR_CLUSTER_HOST',
    port=8443,
    username='YOUR_USERNAME',
    password='YOUR_CLUSTER_PASSWORD'
)

INFO:numexpr.utils:Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.
INFO:clickhouse_connect.driver.ctypes:Successfully imported ClickHouse Connect C data optimizations
Successfully imported ClickHouse Connect C data optimizations
INFO:clickhouse_connect.driver.ctypes:Successfully import ClickHouse Connect C/Numpy optimizations
Successfully import ClickHouse Connect C/Numpy optimizations
INFO:clickhouse_connect.json_impl:Using python library for writing JSON byte strings
Using python library for writing JSON byte strings
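
To keep credentials out of the notebook, the connection arguments could be read from environment variables instead of being typed into the cell. The following is a minimal sketch, not part of the original demo: the helper `myscale_connection_kwargs` and the variable names `MYSCALE_HOST`, `MYSCALE_PORT`, `MYSCALE_USERNAME`, and `MYSCALE_PASSWORD` are hypothetical and chosen only for illustration.

```python
import os
from typing import Mapping


def myscale_connection_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for clickhouse_connect.get_client.

    The environment variable names are assumptions made for this sketch.
    """
    return {
        "host": env["MYSCALE_HOST"],
        # Fall back to 8443, the port used in the cell above.
        "port": int(env.get("MYSCALE_PORT", "8443")),
        "username": env["MYSCALE_USERNAME"],
        "password": env["MYSCALE_PASSWORD"],
    }


# Usage (needs network access and real credentials, so it is left commented out):
# import clickhouse_connect
# client = clickhouse_connect.get_client(**myscale_connection_kwargs())
```

A missing required variable raises `KeyError` immediately, which surfaces a configuration mistake before any connection attempt is made.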


#### Load Documents and Build the GPTMyscaleIndex

In [4]:
from gpt_index import GPTMyscaleIndex, SimpleDirectoryReader
from IPython.display import Markdown, display



In [5]:
# load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()

In [6]:
# initialize without metadata filter
index = GPTMyscaleIndex.from_documents(documents, myscale_client=client)

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Token indices sequence length is longer than the specified maximum sequence length for this model (3383 > 1024). Running this sequence through the model will result in indexing errors


INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:gpt_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17617 tokens
> [build_index_from_nodes] Total embedding token usage: 17617 tokens


#### Query Index

In [7]:
# Query the index; set the logging level to DEBUG for more detailed output
response = index.query("What did the author do growing up?")

INFO:gpt_index.token_counter.token_counter:> [query] Total LLM token usage: 3897 tokens
> [query] Total LLM token usage: 3897 tokens
INFO:gpt_index.token_counter.token_counter:> [query] Total embedding token usage: 8 tokens
> [query] Total embedding token usage: 8 tokens
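
The query cell mentions switching logging to DEBUG for more detail. One way to do that with the standard library is sketched below, assuming the root logger is the one configured in the first cell. Note that calling `logging.basicConfig` a second time has no effect once handlers are attached, so the level is set on the root logger directly.

```python
import logging
import sys

root = logging.getLogger()

# basicConfig is a no-op when the root logger already has handlers,
# so only call it if nothing has been configured yet.
if not root.handlers:
    logging.basicConfig(stream=sys.stdout)

# Lower the threshold so DEBUG records are emitted as well as INFO.
root.setLevel(logging.DEBUG)
```

Run this before the query cell; the level can be restored afterwards with `root.setLevel(logging.INFO)`.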


In [8]:
display(Markdown(f"<b>{response}</b>"))

<b>

Growing up, the author wrote short stories, programmed on an IBM 1401, wrote simple games and a word processor on a TRS-80, studied philosophy in college, learned Lisp, reverse-engineered SHRDLU, wrote a book about Lisp hacking, took art classes at Harvard, and painted still lives in his bedroom at night. He also had the opportunity to observe a nude model in his art classes, and learned that she made a living from modelling and making fakes for a local antique dealer.</b>