# Elasticsearch Population

In [None]:
ES_HOST = 'localhost'
ES_PORT = 9200
DATA_FILE = 'data/big_author_data.p'

Before we can query Elasticsearch, we must populate it. Loading the author data and writing it to an index takes only a few lines of Python.

You can read about the [simple Elasticsearch client](https://elasticsearch-py.readthedocs.io/en/master/) used below.

In [None]:
import pickle

with open(DATA_FILE, 'rb') as handle:
    author_data = pickle.load(handle)

In [None]:
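# Build one bulk action per author entry: the entry's own fields plus
# the metadata keys (_id, _index, _type) that the bulk helpers expect.
# This is a lazy generator, so it can be consumed only once.
es_records = (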
    {
        **entry,
        '_id': entry['id'],
        '_index': 'documents',
        '_type': 'document'
    }
    for entry in author_data
)
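To make the shape of a single bulk action concrete, here is a minimal sketch using two made-up entries. The `sample_entries` list, its fields, and the `to_bulk_action` helper are hypothetical and only for illustration; the real entries come from `DATA_FILE`. Each entry is copied and the metadata keys are added alongside its own fields, so the original dictionaries are left untouched.

```python
# Hypothetical sample entries; the real data is loaded from DATA_FILE.
sample_entries = [
    {'id': 'a1', 'name': 'Ada Lovelace'},
    {'id': 'b2', 'name': 'Alan Turing'},
]


def to_bulk_action(entry, index='documents', doc_type='document'):
    """Return a copy of the entry with the bulk metadata keys added."""
    return {**entry, '_id': entry['id'], '_index': index, '_type': doc_type}


sample_actions = [to_bulk_action(entry) for entry in sample_entries]

# Inspect the first action to see the document fields next to the metadata.
print(sample_actions[0])
```

Because `{**entry, ...}` builds a new dictionary, the source data can be reused to regenerate the actions if the indexing step has to be run again.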

The next step indexes every record, so it may take some time to execute.

In [None]:
from collections import deque
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

client = Elasticsearch([f'{ES_HOST}:{ES_PORT}'], sniff_on_start=True)

# parallel_bulk is lazy: nothing is sent until its generator is consumed.
# A deque with maxlen=0 drains the generator without storing the results.
deque(parallel_bulk(client, es_records), maxlen=0)
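Draining the generator with a `deque` discards the per-document results. `parallel_bulk` yields `(ok, info)` tuples, so if you would rather keep a tally, something like the following sketch could be used instead. The `summarize_bulk_results` helper and the `fake_results` list are hypothetical; the fake tuples only imitate the general shape of the real results so the example runs without a cluster.

```python
def summarize_bulk_results(results):
    """Count successes and collect the info of failed actions.

    `results` is any iterable of (ok, info) tuples, such as the
    generator returned by parallel_bulk.
    """
    succeeded = 0
    failures = []
    for ok, info in results:
        if ok:
            succeeded += 1
        else:
            failures.append(info)
    return succeeded, failures


# Stand-in results (an assumption, not real Elasticsearch output).
fake_results = [
    (True, {'index': {'_id': 'a1', 'status': 201}}),
    (False, {'index': {'_id': 'b2', 'status': 400}}),
    (True, {'index': {'_id': 'c3', 'status': 201}}),
]

succeeded, failures = summarize_bulk_results(fake_results)
print(f'{succeeded} indexed, {len(failures)} failed')
```

Against a live cluster, the call would look like `summarize_bulk_results(parallel_bulk(client, es_records, raise_on_error=False))`; with the default `raise_on_error=True`, a failed chunk raises an exception instead of being reported as a failed tuple.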

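To validate that this has worked, we can ask for the number of documents in the index. A search with `size` set to 0 returns no documents, only the response metadata, which includes the total hit count under `hits.total`.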

In [None]:
import requests

requests.get(f'http://{ES_HOST}:{ES_PORT}/documents/document/_search', params={'size': 0}).json()
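The count can be pulled out of the JSON response rather than read by eye. The shape of `hits.total` depends on the Elasticsearch version: older releases return a plain integer, while 7.x and later return an object with a `value` field. The sketch below handles both; `extract_total` is a hypothetical helper, and the two sample responses are trimmed-down stand-ins rather than real server output.

```python
def extract_total(search_response):
    """Return the total hit count from a parsed _search response."""
    total = search_response['hits']['total']
    # Elasticsearch 7+ wraps the count: {'value': 123, 'relation': 'eq'}
    if isinstance(total, dict):
        return total['value']
    # Older versions return the count directly as an integer.
    return total


# Trimmed-down example responses (assumed shapes, not real output).
legacy_response = {'hits': {'total': 42, 'hits': []}}
modern_response = {'hits': {'total': {'value': 42, 'relation': 'eq'}, 'hits': []}}

print(extract_total(legacy_response), extract_total(modern_response))
```

Comparing the extracted value against `len(author_data)` gives a quick check that every record was indexed, assuming `author_data` is a sized collection such as a list.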