# 01 - Combined Extract and Build

## Setup

If you haven't already, install the toolkit and dependencies using the [Setup](./00-Setup.ipynb) notebook.

## Continuous ingest

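This notebook extracts from the source documents and builds the graph and vector index in a single call to `extract_and_build()`. See [Continuous ingest](https://github.com/awslabs/graphrag-toolkit/blob/main/docs/lexical-graph/indexing.md#continous-ingest).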

```python
%reload_ext dotenv
%dotenv

import os

from graphrag_toolkit.lexical_graph import LexicalGraphIndex, set_logging_config
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory, VectorStoreFactory
from graphrag_toolkit.lexical_graph.storage.graph.falkordb import FalkorDBGraphStoreFactory

from llama_index.readers.web import SimpleWebPageReader

# Register the FalkorDB factory before asking GraphStoreFactory for a graph store
GraphStoreFactory.register(FalkorDBGraphStoreFactory)

# Connection details come from the environment (loaded from .env by dotenv above)
graph_store = GraphStoreFactory.for_graph_store(os.environ['GRAPH_STORE'])
vector_store = VectorStoreFactory.for_vector_store(os.environ['VECTOR_STORE'])

graph_index = LexicalGraphIndex(
    graph_store,
    vector_store
)

doc_urls = [
    'https://docs.aws.amazon.com/neptune/latest/userguide/intro.html',
    'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html',
    'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-features.html',
    'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-vs-neptune-database.html'
]

# Fetch each page as plain text and record its source URL as metadata
docs = SimpleWebPageReader(
    html_to_text=True,
    metadata_fn=lambda url: {'url': url}
).load_data(doc_urls)

# Run extraction and build as one combined pipeline
graph_index.extract_and_build(docs, show_progress=True)

print('Complete')
```

```
Extracting propositions [nodes: 5, num_workers: 4]: 100%|██████████| 5/5 [00:10<00:00,  2.05s/it]
Extracting propositions [nodes: 10, num_workers: 4]: 100%|██████████| 10/10 [00:18<00:00,  1.83s/it]
Extracting topics [nodes: 5, num_workers: 4]: 100%|██████████| 5/5 [00:26<00:00,  5.33s/it]
Extracting topics [nodes: 10, num_workers: 4]: 100%|██████████| 10/10 [00:51<00:00,  5.13s/it]
INFO:graphrag_toolkit.lexical_graph.indexing.build.build_pipeline:Running build pipeline [batch_size: 4, num_workers: 2, job_sizes: [451, 199], batch_writes_enabled: True, batch_write_size: 25]
Building graph [batch_writes_enabled: True, batch_write_size: 25]: 100%|██████████| 199/199 [00:00<00:00, 51231.68it/s]
Building graph [batch_writes_enabled: True, batch_write_size: 25]: 100%|██████████| 451/451 [00:00<00:00, 54455.80it/s]
Building vector index [batch_writes_enabled: True, batch_write_size: 25]: 100%|██████████| 199/199 [00:00<00:00, 862258.78it/s]
INFO:botocore.tokens:Loading cached SSO token fo

Complete
```
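
The cell above reads `GRAPH_STORE` and `VECTOR_STORE` straight from `os.environ`, so a missing entry in `.env` surfaces only as a bare `KeyError`. As a minimal sketch, a small pre-flight check can report every missing variable at once. The helper names `missing_env_vars` and `require_env_vars` are hypothetical and not part of the toolkit; only the two variable names come from this notebook.

```python
import os

# Environment variables read by the notebook cell above
REQUIRED_VARS = ('GRAPH_STORE', 'VECTOR_STORE')


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    Hypothetical helper, not part of graphrag-toolkit.
    `env` defaults to os.environ; pass a dict to check other mappings.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_env_vars(required=REQUIRED_VARS, env=None):
    """Raise a RuntimeError that lists every missing variable."""
    missing = missing_env_vars(required, env)
    if missing:
        raise RuntimeError(
            'Missing environment variables: ' + ', '.join(missing)
        )
```

Calling `require_env_vars()` after `%dotenv` and before creating the stores would fail early with a single message naming each variable to add to `.env`.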
