## Check the Connection and the Number of Vectors

In [48]:
from elasticsearch import Elasticsearch

ES_SERVER = "10.94.253.5"

index_doc = 'november2019'    # index holding the article documents
index_vec = 'recent_vectors'  # index holding the embedding vectors

es = Elasticsearch(ES_SERVER)
# Count all documents in the vector index to confirm the connection works
print(es.count(index=index_vec, body={'query': {'match_all': {}}}))

{'count': 6272, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}
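
The count only confirms that the vector index is populated. For the cosine-similarity search used later in this notebook, the `vector` field in that index has to be mapped as `dense_vector`. A hedged sketch of such a mapping follows; the helper name `build_vector_mapping` and the value `dims=512` are assumptions, since the notebook does not show how the index was created or how long the vectors are.

```python
def build_vector_mapping(dims, field="vector"):
    """Build an index mapping with one dense_vector field of the given size."""
    if dims <= 0:
        raise ValueError("dims must be a positive integer")
    return {
        "mappings": {
            "properties": {
                field: {"type": "dense_vector", "dims": dims}
            }
        }
    }


# dims=512 is a placeholder; use the output length of the embedding model
mapping = build_vector_mapping(dims=512)
```

With the 7.x Python client, a dictionary like this could be passed as `body` to `es.indices.create` when setting up the index.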


## Define Queries and Embed Them into the Same Vector Space

In [43]:
from ProcessorVectorizer import Vectorizer

vt = Vectorizer('muse', '../../models/muse')
queries = ['Coronavirus prevention is important.', 'Data privacy was compromised.']
query_vectors = vt.vectorize(queries)
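
Because the queries are embedded by the same model as the articles, their vectors can be compared directly. The sketch below shows, in plain Python, the cosine similarity that such a comparison rests on. The three-dimensional vectors are made up for illustration and are far shorter than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not real embeddings
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
opposite = cosine_similarity([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])
```

Vectors pointing the same way score close to 1, unrelated (orthogonal) vectors score 0, and opposite vectors score -1, regardless of vector length.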

## View Results

In [53]:
for query_text, query_vector in zip(queries, query_vectors):
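    # Score every document by cosine similarity to the query vector.
    # The + 1.0 shifts the score from [-1, 1] to [0, 2], because
    # Elasticsearch does not accept negative scores.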
    query = {
        "script_score": {
            "query": {
                "match_all": {}
            },
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'vector') + 1.0",
                "params": {
                    "query_vector": query_vector
                }
            }
        }
    }

    body = {"query": query, "size": 100}
    response = es.search(index=index_vec, body=body, request_timeout=120)

    # Show only the top hit; results are sorted by descending score
    hits = response['hits']['hits']
    if hits:
        _id = hits[0]['_id']
        _score = hits[0]['_score']
        text = es.get(index=index_doc, id=_id)['_source']['article']['maintext']
        print(f'Query text: {query_text}\nScore: {_score}\n\n{text}\n\n{"=" * 50}\n')

Query text: Coronavirus prevention is important.
Score: 1.4756657

The estimated number of coronavirus deaths fluctuate depending on prior data. Current estimations in the US went up to a little less than 66,000 by August on Tuesday. Experts say these numbers depend on social distancing being maintained till then. States like Georgia recently announced lifts on their stay-at-home orders beginning this week. The exact revised death estimate of the impact of the coronavirus outbreak in the United States is 74,073.


Query text: Data privacy was compromised.
Score: 1.3049575

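Полиция задержала мошенников, которые продавали бесплатные цифровые пропуска. Бизнес подозреваемые наладили в мессенджере. Через канал предлагали получить цифровой пропуск для передвижения по Москве и области. У покупателя запрашивали паспортные данные и регистрационный номер машины. За свои услуги мошенники просили 3,5 тыс. рублей. После оплаты они переставали выходить на связь.

[English translation: Police detained fraudsters who were selling free digital passes. The suspects set up their business in a messenger app. Through a channel, they offered to obtain a digital pass for travel around Moscow and the surrounding region. Buyers were asked for their passport details and vehicle registration number. The fraudsters charged 3,500 rubles for their services. After payment, they stopped responding.]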
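
The printed scores include the `+ 1.0` offset from the script, so they lie between 0 and 2 rather than between -1 and 1. A minimal sketch of converting them back and keeping only sufficiently similar hits follows. The names `score_to_cosine` and `filter_hits`, the `min_cosine` threshold, and the sample response are hypothetical; the response is shaped like the `es.search` result used in the loop, and its two scores are borrowed from the two outputs purely for illustration.

```python
def score_to_cosine(score):
    """Undo the + 1.0 offset applied in the script_score query."""
    return score - 1.0


def filter_hits(response, min_cosine):
    """Return (id, cosine) pairs for hits at or above min_cosine."""
    kept = []
    for hit in response["hits"]["hits"]:
        cosine = score_to_cosine(hit["_score"])
        if cosine >= min_cosine:
            kept.append((hit["_id"], cosine))
    return kept


# Hypothetical response: the IDs are placeholders, the scores reuse
# the two values printed in the notebook output
sample_response = {
    "hits": {
        "hits": [
            {"_id": "doc-a", "_score": 1.4756657},
            {"_id": "doc-b", "_score": 1.3049575},
        ]
    }
}

# min_cosine=0.4 is an arbitrary illustrative threshold
relevant = filter_hits(sample_response, min_cosine=0.4)
```

The two results therefore correspond to cosine similarities of roughly 0.48 and 0.30. A threshold like this is one way to make use of all 100 returned hits instead of only the first.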


