# Test indexing functions

This notebook demonstrates how to set up and edit an OpenSearch index using the Haystack framework. Docker Desktop must be installed before you can run the OpenSearch container.

To install the necessary packages, run `pip install -e '.[search_backend]'`.

Before running this notebook, start an OpenSearch container (defined in `docker-compose.yml`) by running:
```
docker compose up localstack
```
Alternatively, follow the instructions in the Haystack documentation: https://docs.haystack.deepset.ai/v2.0/docs/opensearchbm25retriever
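
The container can take a little while to accept connections after `docker compose up` returns. The sketch below polls until a probe succeeds, so later cells do not fail with a connection error. The helper names `wait_until_ready` and `http_probe` are hypothetical and not part of this repository; the sketch uses only the Python standard library.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that returns True once `url` answers with any HTTP response."""

    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up
            return True
        except OSError:
            # Connection refused, timed out, or host unreachable
            return False

    return probe
```

One possible use is `wait_until_ready(http_probe("http://0.0.0.0:4566/opensearch/eu-west-2/rd-demo"))`, with the same host URL that the connection cell in this notebook uses. Whether that endpoint answers plain HTTP probes in your setup is an assumption worth checking.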

You will also need the JSON file `dummy-products-20241015.json`, which is [kept on the wiki](https://dsdmoj.atlassian.net/wiki/spaces/AN/pages/5214503074/Dummy+data) for privacy reasons. Copy it into the same directory as this notebook.

In [None]:
import json

from haystack import Document
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from search_backend.indexing_pipeline import IndexingPipeline

## Read data

In [None]:
with open('../tests/data/demo_data.json') as f:
    doc_list = json.load(f)

print(doc_list)

In [None]:
doc_list[:2]
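
Later cells build documents with `Document(**content)`, which passes every key of a record as a keyword argument. A record with a key the class does not accept is therefore likely to fail at construction time. The sketch below flags such keys up front. The set of allowed field names is an assumption, so check it against the `Document` class in your installed Haystack version; `unexpected_keys` is a hypothetical helper.

```python
# Assumed field names; verify against the Document class in your Haystack version
ASSUMED_DOCUMENT_FIELDS = {"id", "content", "blob", "meta", "score", "embedding", "sparse_embedding"}


def unexpected_keys(records, allowed=ASSUMED_DOCUMENT_FIELDS):
    """Map each record's position to the sorted list of keys not in `allowed`."""
    problems = {}
    for position, record in enumerate(records):
        extra = sorted(set(record) - set(allowed))
        if extra:
            problems[position] = extra
    return problems


# Made-up records for illustration only
sample = [
    {"content": "first product", "meta": {"team": "a"}},
    {"content": "second product", "title": "oops"},
]
print(unexpected_keys(sample))  # {1: ['title']}
```

In the notebook, `unexpected_keys(doc_list)` would return an empty dict if every record uses only the assumed fields.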

## Connect to OpenSearch container

In [None]:
# Connect to an existing OpenSearch document store
query_document_store = OpenSearchDocumentStore(
    hosts="http://0.0.0.0:4566/opensearch/eu-west-2/rd-demo",
    use_ssl=False,
    verify_certs=False,
    http_auth=("localstack", "localstack"),
)

## Initialise docstore and write the first two documents

In [None]:
docs = [Document(**content) for content in doc_list[:2]]

indexer = IndexingPipeline(query_document_store)
indexer.index_docs(docs)

### Check what's in the docstore

Count how many docs are currently in the docstore:

In [None]:
query_document_store.count_documents()

Check the contents of the docstore:

In [None]:
query_document_store.filter_documents()
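
The full output of `filter_documents()` gets hard to read as the docstore grows. A minimal sketch of a compact view follows, assuming each returned object exposes `.id` and `.content` attributes. `summarise` is a hypothetical helper, and the stand-in objects exist only so the example runs without a docstore.

```python
from types import SimpleNamespace


def summarise(docs, width=40):
    """Return one 'id | content' line per document, truncating long content."""
    lines = []
    for doc in docs:
        text = (doc.content or "").replace("\n", " ")
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{doc.id} | {text}")
    return lines


# Stand-in objects so the sketch runs without a docstore
fake_docs = [
    SimpleNamespace(id="a1", content="short text"),
    SimpleNamespace(id="b2", content="x" * 50),
]
for line in summarise(fake_docs, width=20):
    print(line)
```

Against the real store, the call would be `summarise(query_document_store.filter_documents())`.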

## Try adding more documents

In [None]:
docs = [Document(**content) for content in doc_list[2:3]]
indexer.index_docs(docs)

In [None]:
query_document_store.count_documents()

In [None]:
query_document_store.filter_documents()

In [None]:
docs = [Document(**content) for content in doc_list[3:4]]
indexer.index_docs(docs)

In [None]:
query_document_store.count_documents()

In [None]:
query_document_store.filter_documents()

## Check behaviour when trying to add a duplicate doc

In [None]:
# `docs` still holds doc_list[3:4], which was already indexed above
indexer.index_docs(docs)

In [None]:
query_document_store.count_documents()

In [None]:
query_document_store.filter_documents()
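
What happens to the count after a duplicate write depends on how the store treats a document whose ID already exists. Haystack document stores express this through a `DuplicatePolicy` setting (skip, overwrite or fail); which one `IndexingPipeline` applies is not shown in this notebook. The sketch below is a toy in-memory model of the three behaviours, with IDs derived from the text. It is not the OpenSearch or Haystack implementation, and all names in it are hypothetical.

```python
import hashlib


def content_id(content: str) -> str:
    """Derive a stable ID from the text (a stand-in for a content-based document ID)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class InMemoryStore:
    """Toy store keyed by ID, for illustration only."""

    POLICIES = {"skip", "overwrite", "fail"}

    def __init__(self):
        self._docs = {}

    def write(self, contents, policy="skip"):
        """Write texts under the given duplicate policy; return how many were stored."""
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        written = 0
        for content in contents:
            doc_id = content_id(content)
            if doc_id in self._docs:
                if policy == "fail":
                    raise ValueError(f"duplicate document: {doc_id}")
                if policy == "skip":
                    continue
            # New document, or an existing one under the 'overwrite' policy
            self._docs[doc_id] = content
            written += 1
        return written

    def count(self):
        return len(self._docs)


store = InMemoryStore()
store.write(["doc one", "doc two"])
store.write(["doc two"])  # duplicate is skipped under the default policy
print(store.count())  # 2
```

Under skip or overwrite the count stays the same after a duplicate write, so an unchanged count in the cells above does not by itself say which of the two applied.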

## Try removing a document

In [None]:
doc_ids = [doc.id for doc in query_document_store.filter_documents()]

In [None]:
# Pick the third ID in the list as the one to delete
id_to_delete = doc_ids[2]

In [None]:
indexer.delete_docs([id_to_delete], id_metafield="id")

In [None]:
query_document_store.count_documents()

In [None]:
query_document_store.filter_documents()
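
Rather than comparing the two listings by eye, the deletion can be confirmed by diffing the IDs from before and after. A minimal sketch follows; `diff_ids` is a hypothetical helper and the sample IDs are made up.

```python
def diff_ids(before, after):
    """Return (added, removed) ID sets between two snapshots."""
    before_set, after_set = set(before), set(after)
    return after_set - before_set, before_set - after_set


# Made-up IDs for illustration only
before = ["a", "b", "c"]
after = ["a", "b"]
added, removed = diff_ids(before, after)
print(sorted(added), sorted(removed))  # [] ['c']
```

In the notebook, `before` would be `doc_ids` and `after` a fresh `[doc.id for doc in query_document_store.filter_documents()]`; the deletion worked as intended if `removed == {id_to_delete}` and `added` is empty.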