# Load Snapshot

This notebook restores the VerbCL indices into a running ElasticSearch instance from the VerbCL snapshot archive. Before running it:

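* Set the hostname and port in `elastic-local.env` (`ELASTIC_HOST` and `ELASTIC_PORT`). The notebook reads them from its environment.
* Set `PATH_TO_VERBCL` to the path of `VerbCL.tar.xz`.
* Set `PATH_TO_DATA` to the directory where the archive will be extracted. The ElasticSearch nodes must be able to read it.
* Set `REPO_NAME` to the name of the new snapshot repository.
* `SNAP_NAME` must match the name of the snapshot stored in the archive.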
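The connection cell reads `ELASTIC_HOST` and `ELASTIC_PORT` with `os.getenv`, so `elastic-local.env` has to be loaded into the process environment before that cell runs. If nothing loads it for you, here is a minimal sketch. It assumes the file holds plain `KEY=VALUE` lines; `load_env_file` is a hypothetical helper, not part of the original notebook.

```python
import os
from pathlib import Path


def load_env_file(path="elastic-local.env"):
    """Copy KEY=VALUE pairs from a simple .env file into os.environ.

    Assumption: one assignment per line. Blank lines and lines starting
    with '#' are skipped, an optional leading 'export ' is dropped, and
    surrounding quotes are removed from values. Variables that are already
    set in the environment are not overwritten.
    """
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded
```

Calling `load_env_file("elastic-local.env")` before connecting is enough for this notebook. For anything more elaborate, the `python-dotenv` package's `load_dotenv()` does the same job more thoroughly.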

In [None]:
PATH_TO_VERBCL = "verbcl.tar.xz"
PATH_TO_DATA = "/data"
REPO_NAME = "verbcl_repository"
SNAP_NAME = "verbcl_1.0"

## Uncompress the Archive

In [None]:
import tarfile

In [None]:
# Extract the archive; the snapshot files end up in PATH_TO_DATA/VerbCL
with tarfile.open(PATH_TO_VERBCL, "r:*") as txz:
    txz.extractall(PATH_TO_DATA)
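`extractall` trusts the member paths stored in the archive. Below is a slightly more defensive sketch that rejects members resolving outside the target directory and checks that the `VerbCL` folder used as the repository location actually exists afterwards. The helper name `extract_snapshot_archive` is hypothetical. The sketch does not inspect symlink targets. On Python 3.12+, `extractall(..., filter="data")` is a stricter built-in alternative.

```python
import os
import tarfile


def extract_snapshot_archive(archive_path, target_dir, repo_subdir="VerbCL"):
    """Extract archive_path into target_dir and return the repository folder.

    Members whose resolved path would land outside target_dir are rejected,
    as a basic guard against path traversal in the archive.
    """
    target = os.path.realpath(target_dir)
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            dest = os.path.realpath(os.path.join(target, member.name))
            if os.path.commonpath([target, dest]) != target:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(target)
    # The repository location used later is PATH_TO_DATA/VerbCL
    repo_dir = os.path.join(target, repo_subdir)
    if not os.path.isdir(repo_dir):
        raise FileNotFoundError(f"expected snapshot folder not found: {repo_dir}")
    return repo_dir
```

In this notebook it would be called as `extract_snapshot_archive(PATH_TO_VERBCL, PATH_TO_DATA)`.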

## Connect to ElasticSearch

The cell below connects using only the host and port. If your instance requires authentication or TLS, adjust the client arguments; see the [documentation](https://elasticsearch-py.readthedocs.io/en/v7.12.1/api.html#elasticsearch).

In [None]:
import os
from elasticsearch import Elasticsearch

In [None]:
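# ELASTIC_HOST / ELASTIC_PORT are read from the environment (see elastic-local.env)
# More security? Adjust here
es = Elasticsearch(host=os.getenv("ELASTIC_HOST", "localhost"),
                   port=int(os.getenv("ELASTIC_PORT", "9200")))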
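For a secured instance, the extra arguments can be gathered from the environment as well. This is a sketch for the elasticsearch-py 7.x client linked above, whose connection accepts `http_auth`, `use_ssl`, `verify_certs` and `ca_certs`. The variables `ELASTIC_USER`, `ELASTIC_PASSWORD` and `ELASTIC_CA_CERTS` are hypothetical names chosen for this example.

```python
import os


def client_kwargs_from_env(environ=os.environ):
    """Build keyword arguments for elasticsearch-py 7.x's Elasticsearch().

    ELASTIC_HOST / ELASTIC_PORT are the variables the notebook already uses;
    ELASTIC_USER, ELASTIC_PASSWORD and ELASTIC_CA_CERTS are hypothetical.
    """
    kwargs = {
        "host": environ.get("ELASTIC_HOST", "localhost"),
        "port": int(environ.get("ELASTIC_PORT", "9200")),
    }
    user = environ.get("ELASTIC_USER")
    password = environ.get("ELASTIC_PASSWORD")
    if user and password:
        # HTTP basic authentication
        kwargs["http_auth"] = (user, password)
    ca_certs = environ.get("ELASTIC_CA_CERTS")
    if ca_certs:
        # Switch to HTTPS and verify the server certificate against this CA bundle
        kwargs.update(use_ssl=True, verify_certs=True, ca_certs=ca_certs)
    return kwargs
```

The client would then be created with `es = Elasticsearch(**client_kwargs_from_env())`, and `es.ping()` gives a quick connectivity check. The 8.x client takes different arguments (for example `hosts=[...]` and `basic_auth=`), so this sketch only fits the 7.x API.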

## Create the Repository

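* Register a new snapshot repository with the ElasticSearch instance.
* Declare it as a shared filesystem (`fs`) repository pointing to the extracted data.
* The location must be listed in the `path.repo` setting of every node; otherwise registration is rejected.
* `REPO_NAME` should not already be in use.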
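Registering a repository under an existing name updates that repository rather than failing, so it can be worth checking that `REPO_NAME` is free first. This sketch relies on `es.snapshot.get_repository()` returning a dict keyed by repository name when called without arguments, as in elasticsearch-py 7.x. The helper name is hypothetical.

```python
def ensure_new_repository_name(es, name):
    """Raise RuntimeError if `name` is already a registered snapshot repository."""
    # With no arguments, get_repository() returns {repo_name: {...}, ...}
    existing = es.snapshot.get_repository()
    if name in existing:
        raise RuntimeError(
            f"snapshot repository {name!r} already exists; choose another REPO_NAME"
        )
```

Run `ensure_new_repository_name(es, REPO_NAME)` just before the registration cell.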

In [None]:
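# Register the extracted folder as a shared filesystem ("fs") repository
es.snapshot.create_repository(
    repository=REPO_NAME,
    body={
        "type": "fs",
        "settings": {
            # Must be listed in path.repo on every ElasticSearch node
            "location": os.path.join(PATH_TO_DATA, "VerbCL"),
            "compress": True
        }
    }
)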
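Once the repository is registered, one can confirm that it really contains `SNAP_NAME` before restoring. This sketch calls `es.snapshot.get(repository=..., snapshot="_all")` and collects the `snapshot` field of each entry. The response is normally a top-level `snapshots` list. As a defensive assumption, the sketch also accepts a `responses` wrapper in case a given server version nests it that way.

```python
def snapshot_names(es, repository):
    """List the names of the snapshots stored in `repository`."""
    resp = es.snapshot.get(repository=repository, snapshot="_all")
    if "responses" in resp:
        # Assumed alternative layout: {"responses": [{"snapshots": [...]}, ...]}
        groups = [r.get("snapshots", []) for r in resp["responses"]]
    else:
        # Usual layout: {"snapshots": [{"snapshot": "<name>", ...}, ...]}
        groups = [resp.get("snapshots", [])]
    return [snap["snapshot"] for group in groups for snap in group]
```

For example: `assert SNAP_NAME in snapshot_names(es, REPO_NAME)`.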

## Restore the Indices

**This step creates the following indices in the instance** (the restore fails if an open index with the same name already exists):
* `verbcl_opinions`
* `verbcl_citation_graph`
* `verbcl_highlights`
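As an alternative to the plain restore call, the request can name the three indices explicitly and block until the restore has finished. This sketch uses the restore body fields `indices` and `include_global_state`, plus the `wait_for_completion` query parameter. It also raises the client-side `request_timeout` (10 s by default in the 7.x client), because a blocking restore of a large snapshot can take much longer. The one-hour value and the helper name are arbitrary choices for illustration.

```python
VERBCL_INDICES = ("verbcl_opinions", "verbcl_citation_graph", "verbcl_highlights")


def restore_verbcl(es, repository, snapshot, indices=VERBCL_INDICES, timeout_s=3600):
    """Restore only the VerbCL indices and block until the restore finishes."""
    return es.snapshot.restore(
        repository=repository,
        snapshot=snapshot,
        body={
            "indices": ",".join(indices),
            # Explicit here; false is also the restore default
            "include_global_state": False,
        },
        # Server side: respond only once the restore has completed
        wait_for_completion=True,
        # Client side: let the HTTP call last that long
        request_timeout=timeout_s,
    )
```

With this variant, `restore_verbcl(es, REPO_NAME, SNAP_NAME)` replaces the call below, and the separate wait step becomes unnecessary.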

In [None]:
es.snapshot.restore(repository=REPO_NAME, snapshot=SNAP_NAME)

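The restore runs in the background: the call above returns once the restore has started, not when it has finished. Wait for it to complete before running the check below.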
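One way to wait is to poll the index recovery API until every shard reports the `DONE` stage. This sketch assumes the 7.x response layout `{index: {"shards": [{"stage": ...}, ...]}}`. It also assumes the restore call has already returned, so the indices exist in the cluster. The `sleep` and `clock` parameters are there only so the loop can be exercised without real waiting, and the timeouts are arbitrary.

```python
import time


def wait_for_restore(es, indices, timeout_s=3600.0, poll_s=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll es.indices.recovery until all shards of `indices` are DONE.

    Returns True when the restore has completed, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        resp = es.indices.recovery(index=",".join(indices))
        done = all(
            idx in resp
            and resp[idx]["shards"]
            and all(shard["stage"] == "DONE" for shard in resp[idx]["shards"])
            for idx in indices
        )
        if done:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

In the notebook: `assert wait_for_restore(es, ["verbcl_opinions", "verbcl_citation_graph", "verbcl_highlights"])`.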

In [None]:
for idx in ["verbcl_opinions", "verbcl_citation_graph", "verbcl_highlights"]:
    assert es.indices.exists(index=idx), f"ERROR: {idx} does not exist"
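Existence alone does not show that data came back. As a further check, the count API (`es.count(index=...)`, which returns `{"count": N, ...}`) can report documents per index. This sketch assumes that none of the VerbCL indices is expected to be empty. The helper name is hypothetical.

```python
def restore_report(es, indices):
    """Print and return the document count of each restored index."""
    counts = {}
    for idx in indices:
        counts[idx] = es.count(index=idx)["count"]
        print(f"{idx}: {counts[idx]} documents")
    # Assumption: every VerbCL index should contain documents after the restore
    empty = [idx for idx, n in counts.items() if n == 0]
    if empty:
        raise RuntimeError(f"restored but empty: {empty}")
    return counts
```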