<a href="https://colab.research.google.com/github/jeffvestal/elastic_jupyter_notebooks/blob/main/load_embedding_model_from_hf_to_elastic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Loading a Sentence Transformer model from Hugging Face into Elastic

This code will show you how to set up an ingest pipeline to generate vectors for documents on ingest.

Overview of steps
1. Set up our python environment
2. Set up index mapping
3. Configure ingest pipeline
4. Index a couple test documents

### Requirements
This notebook assumes you have already loaded an embedding model into Elasticsearch. If you haven't, please start with [this notebook example](https://github.com/jeffvestal/elastic_jupyter_notebooks/blob/main/load_embedding_model_from_hf_to_elastic.ipynb).


### Elastic version support
Requires Elastic version 8.0+ with a platinum or enterprise license (or trial license)

You can set up a [free trial elasticsearch Deployment in Elastic Cloud](https://cloud.elastic.co/registration).

# Setup
This section will set up the python environment with the required libraries

## Install and import required python libraries

Elastic uses the [eland python library](https://github.com/elastic/eland) to download models from Hugging Face hub and load them into elasticsearch

In [1]:
pip install eland

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting eland
  Downloading eland-8.3.0-py3-none-any.whl (143 kB)
Collecting elasticsearch<9,>=8.3
  Downloading elasticsearch-8.6.1-py3-none-any.whl (385 kB)
Collecting elastic-transport<9,>=8
  Downloading elastic_transport-8.4.0-py3-none-any.whl (59 kB)
Collecting urllib3<2,>=1.26.2
  Downloading urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
Installing collected packages: urllib3, elastic-transport, ela

In [2]:
pip install elasticsearch

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.26.0-py3-none-any.whl (6.3 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.12.0-py3-none-any.whl (190 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.12.0 tokenizers-0.13.2 transformers-4.26.0


In [4]:
pip install sentence_transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentence_transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting sentencepiece
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Building wheels for collected packages: sentence_transformers
  Building wheel for sentence_transformers (setup.py) ... done
  Created wheel for sentence_transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125938 sha256=3dd71edc95b5e5c3cebba9fac90c7fa7f18e909ac740311c75bb6922ed64381a
  Stored in directory: /root/.cache/pip/wheels/5e/6f/8c/d88aec621f3f5

In [5]:
pip install torch==1.11

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch==1.11
  Downloading torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.13.1+cu116
    Uninstalling torch-1.13.1+cu116:
      Successfully uninstalled torch-1.13.1+cu116
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.14.1+cu116 requires torch==1.13.1, but you have torch 1.11.0 which is incompatible.
torchtext 0.14.1 requires torch==1.13.1, but you have torch 1.11.0 which is incompatible.
torchaudio 0.13.1+cu116 requires torch==1.13.1, but you have torch 1.11.0 which is incompatible.
Suc

In [27]:
from pathlib import Path
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel
from elasticsearch import Elasticsearch, helpers
from elasticsearch.client import MlClient
from pprint import pprint

## Configure elasticsearch authentication
The recommended authentication approach is using the [Elastic Cloud ID](https://www.elastic.co/guide/en/cloud/current/ec-cloud-id.html) and a [cluster level API key](https://www.elastic.co/guide/en/kibana/current/api-keys.html).

You can use any method you wish to set the required credentials. We are using getpass in this example to prompt for credentials to avoid storing them in GitHub.

In [7]:
import getpass

In [8]:
es_cloud_id = getpass.getpass('Enter Elastic Cloud ID:  ')
es_api_id = getpass.getpass('Enter cluster API key ID:  ') 
es_api_key = getpass.getpass('Enter cluster API key:  ')

Enter Elastic Cloud ID:  ··········
Enter cluster API key ID:  ··········
Enter cluster API key:  ··········
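
If the notebook has to run unattended, prompting is not an option. A minimal sketch, assuming hypothetical environment variable names such as `ES_CLOUD_ID`, reads each value from the environment first and falls back to `getpass` only when the variable is missing. The helper name `get_secret` is illustrative and not part of any library.

```python
import getpass
import os


def get_secret(env_name, prompt):
    """Return the value of env_name, or prompt for it if unset or empty."""
    value = os.environ.get(env_name)
    if value:
        return value
    # Fall back to an interactive prompt so nothing is stored in the notebook.
    return getpass.getpass(prompt)


# Example usage (the environment variable names are assumptions):
# es_cloud_id = get_secret("ES_CLOUD_ID", "Enter Elastic Cloud ID:  ")
# es_api_id = get_secret("ES_API_ID", "Enter cluster API key ID:  ")
# es_api_key = get_secret("ES_API_KEY", "Enter cluster API key:  ")
```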


## Connect to Elastic Cloud

In [9]:
es = Elasticsearch(cloud_id=es_cloud_id, 
                   api_key=(es_api_id, es_api_key)
                   )
es.info() # should return cluster info

ObjectApiResponse({'name': 'instance-0000000001', 'cluster_name': 'a7bf48bf42ad403ab45dd6b90b860f85', 'cluster_uuid': 'gEbjuhUOSyCVzG4Gz2SQ2w', 'version': {'number': '8.6.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'f67ef2df40237445caa70e2fef79471cc608d70d', 'build_date': '2023-01-04T09:35:21.782467981Z', 'build_snapshot': False, 'lucene_version': '9.4.2', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})
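
Since this notebook requires version 8.0 or later, the version number in the response above can be checked programmatically. The following is a small sketch; `meets_min_version` is a hypothetical helper that only assumes the `version.number` field shown in the output.

```python
def meets_min_version(info_body, minimum=(8, 0)):
    """Return True if the cluster version in an es.info() body is >= minimum."""
    number = info_body["version"]["number"]  # e.g. "8.6.0"
    # Drop any suffix such as "-SNAPSHOT" before comparing numerically.
    core = number.split("-")[0]
    parts = tuple(int(p) for p in core.split(".")[:2])
    return parts >= minimum


# Example usage with the client created above:
# assert meets_min_version(es.info().body), "Elasticsearch 8.0+ is required"
```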

# Model Information and Status

## View information about the model
This step is not required, but it lets us get the `model_id` as it is stored in elasticsearch and verify that the model is deployed, running, and ready to use in our ingest pipeline.

In [10]:
m = es.ml.get_trained_models()
m.body

{'count': 17,
 'trained_model_configs': [{'model_id': 'bert-base-uncased',
   'model_type': 'pytorch',
   'created_by': 'api_user',
   'version': '8.1.0',
   'create_time': 1649359786787,
   'model_size_bytes': 0,
   'estimated_operations': 0,
   'license_level': 'platinum',
   'description': "Model bert-base-uncased for task type 'fill_mask'",
   'tags': [],
   'input': {'field_names': ['text_field']},
   'inference_config': {'fill_mask': {'vocabulary': {'index': '.ml-inference-native-000001'},
     'tokenization': {'bert': {'do_lower_case': True,
       'with_special_tokens': True,
       'max_sequence_length': 512,
       'truncate': 'first',
       'span': -1}},
     'num_top_classes': 0}},
   'location': {'index': {'name': '.ml-inference-native-000001'}}},
  {'model_id': 'bhadresh-savani__distilbert-base-uncased-emotion',
   'model_type': 'pytorch',
   'created_by': 'api_user',
   'version': '8.1.0',
   'create_time': 1649342073282,
   'model_size_bytes': 0,
   'estimated_operatio

## Set the model_id for ease of reference later
To make it easy to reference later, we will set `es_model_id` to the `model_id` listed in the output above

In [12]:
es_model_id = "sentence-transformers__msmarco-minilm-l-12-v3"
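
With many models in the cluster, scanning the full response by eye gets tedious. As a rough sketch, the hypothetical helper below pulls out only the `model_id` values of a given `model_type`, relying on the `trained_model_configs` structure visible in the output above.

```python
def list_model_ids(models_body, model_type="pytorch"):
    """Return the model_id of every trained model of the given type."""
    return [
        cfg["model_id"]
        for cfg in models_body.get("trained_model_configs", [])
        if cfg.get("model_type") == model_type
    ]


# Example usage with the response fetched earlier:
# list_model_ids(m.body)
```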

### *If* the model is not started we will need to deploy the model

You will only need to run this if the model hasn't been deployed. 

This will load the model on the ML nodes and start the process(es) making it available for the NLP task

Uncomment the code below to start the deployment.

In [None]:
#s = es.ml.start_trained_model_deployment(model_id=es_model_id)
#s.body

#### Verify the model started without issue
If you aren't sure whether the model has started, you can check here

In [13]:
stats = es.ml.get_trained_models_stats(model_id=es_model_id)
stats.body['trained_model_stats'][0]['deployment_stats']['nodes'][0]['routing_state']

{'routing_state': 'started'}
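
The check above looks at the first node only. A minimal sketch that covers every node, assuming the stats response has the same nesting as the expression above; `deployment_started` is a hypothetical helper name.

```python
def deployment_started(stats_body):
    """Return True if every node hosting the first model reports 'started'."""
    model_stats = stats_body["trained_model_stats"][0]
    nodes = model_stats.get("deployment_stats", {}).get("nodes", [])
    # No deployment_stats (or no nodes) means the model is not deployed.
    if not nodes:
        return False
    return all(
        node["routing_state"]["routing_state"] == "started" for node in nodes
    )


# Example usage with the stats fetched above:
# deployment_started(stats.body)
```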

# Elasticsearch index setup
Here we will configure an index template with settings and mappings to store our vectors and text data

The **important** part here is mapping our vector field as a `dense_vector` type with `index` set to `True`. This tells elasticsearch to build the HNSW graph for the vectors so we can use kNN search later.

## Define the index template
We will have the following fields

- `vectors` of type `dense_vector`
  - it is important to set `dims` to the number of dimensions that the model you will use outputs
- `title` of type `text`
- `summary` of type `text`

We will have 
- 1 primary shard
- 0 replicas (*note*: in production you will want at least 1 replica)

This will match new indices with the name matching the pattern of `jupyter-vector-demo*`

In [48]:
index_patterns = "jupyter-vector-demo*"
settings= {
        "number_of_shards": 1,
        "number_of_replicas": 0
    }
mappings= {
        "properties": {
            "vectors": {
                "type": "dense_vector",
                "dims": 384,
                "index" : True,
                "similarity" : "cosine"
            },
            "title": {
                "type": "text"
            },
            "summary": {
                "type": "text"
            }
        }
    }

## Apply the template
Here we apply the template and give it a name of `jupyter-vector-demo-template`. This is just the name we use to refer to the template if we need to modify it later on.

In [49]:
es.indices.put_template(name="jupyter-vector-demo-template", 
                        index_patterns=index_patterns,
                        settings=settings,
                        mappings=mappings
                        )


ObjectApiResponse({'acknowledged': True})
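
`put_template` targets the legacy index template API, which Elasticsearch has deprecated in favor of composable index templates. As a hedged sketch, the hypothetical helper below builds keyword arguments for the client's `put_index_template` call, which takes `index_patterns` as a list and nests settings and mappings under `template`. The `priority` value is an arbitrary assumption.

```python
def build_index_template_kwargs(name, pattern, settings, mappings, priority=100):
    """Build keyword arguments for a composable index template request."""
    return {
        "name": name,
        "index_patterns": [pattern],
        # Composable templates nest settings and mappings under `template`.
        "template": {"settings": settings, "mappings": mappings},
        "priority": priority,
    }


# Example usage with the variables defined above:
# kwargs = build_index_template_kwargs(
#     "jupyter-vector-demo-template", index_patterns, settings, mappings
# )
# es.indices.put_index_template(**kwargs)
```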

# The Ingest Pipeline

An ingest pipeline has one or more processors and processes documents before they are written into an elasticsearch index. 

Each processor is designed to perform a specific task, such as parsing fields or enriching data.

The main processor for this pipeline is the `inference` processor. The inference processor sends a specified field to a trained model and writes the model's output to a new field, alongside the original fields in the document.

To make it simpler to access the vector, we will copy it to a field named `vectors` and then remove the `ml` field tree, which is where the inference output is written by default.
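
To make the three steps concrete, here is a local simulation of what happens to a document as it passes through the pipeline. This is only an illustration under stated assumptions: `fake_embed` is a stand-in that returns a dummy vector, not the deployed model, and nothing here talks to Elasticsearch.

```python
import copy


def fake_embed(text, dims=4):
    """Stand-in for the deployed model: returns a deterministic dummy vector."""
    return [float(len(text) % (i + 2)) for i in range(dims)]


def run_pipeline_locally(doc, embed=fake_embed):
    """Mimic the inference, set, and remove processors on a plain dict."""
    doc = copy.deepcopy(doc)
    # inference: read `summary`, write the result under ml.inference
    doc["ml"] = {"inference": {"predicted_value": embed(doc["summary"])}}
    # set: copy the embedding to a top-level `vectors` field
    doc["vectors"] = doc["ml"]["inference"]["predicted_value"]
    # remove: drop the `ml` field tree
    del doc["ml"]
    return doc


# Example usage:
# run_pipeline_locally({"title": "A title", "summary": "A short summary."})
```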

## Configure the pipeline

In [18]:
pipeline_definition = {
    "description": "A pipeline for generating and storing vectors on ingest",
    "processors": [
        {
            # Run the `summary` field through the embedding model
            "inference": {
                "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
                "field_map": {
                    "summary": "text_field"
                }
            }
        },
        {
            # Copy the embedding to a top-level `vectors` field
            "set": {
                "field": "vectors",
                "copy_from": "ml.inference.predicted_value"
            }
        },
        {
            # Drop the default inference output tree
            "remove": {
                "field": "ml"
            }
        }
    ]
}



## Create the pipeline (or update it if it already exists)

In [19]:
resp = es.ingest.put_pipeline(
    id="jupyter-vector-demo-pipeline",
    description=pipeline_definition["description"],
    processors=pipeline_definition["processors"],
)
if resp["acknowledged"]:
    print("Pipeline created successfully")
else:
    print("Failed to create pipeline")



Pipeline created successfully


## Verify the pipeline
Not required, but it is nice to verify that everything looks correct.

In [24]:
pipeline = es.ingest.get_pipeline(id="jupyter-vector-demo-pipeline")
pipeline.body

{'jupyter-vector-demo-pipeline': {'description': 'A pipeline for generating and storing vectors on ingest',
  'processors': [{'inference': {'model_id': 'sentence-transformers__msmarco-minilm-l-12-v3',
     'field_map': {'summary': 'text_field'}}},
   {'set': {'field': 'vectors', 'copy_from': 'ml.inference.predicted_value'}},
   {'remove': {'field': 'ml'}}]}}
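
Beyond reading the stored definition back, the pipeline can be exercised without indexing anything by using the simulate pipeline API. A minimal sketch, assuming the Python client's `es.ingest.simulate` call with `id` and `docs` arguments; `build_simulate_docs` is a hypothetical helper that wraps (title, summary) pairs in the `_source` envelope the API expects.

```python
def build_simulate_docs(pairs):
    """Wrap (title, summary) pairs in the envelope used by the simulate API."""
    return [
        {"_source": {"title": title, "summary": summary}}
        for title, summary in pairs
    ]


# Example usage (requires the pipeline and a deployed model):
# sim_docs = build_simulate_docs([("A title", "A short summary.")])
# resp = es.ingest.simulate(id="jupyter-vector-demo-pipeline", docs=sim_docs)
# The transformed document is expected under resp["docs"][0]["doc"]["_source"].
```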

---
---
# Ingest Docs and Generate Vectors
---
---

## Create sample documents
These aren't real blogs, just samples ChatGPT created for me :)

In [25]:
samples = [["The Power of Word Embeddings in NLP", "Word embeddings have revolutionized the field of NLP."  ],  
    ["An Introduction to Transformer Models", "Transformer models have taken NLP by storm."  ],  
    ["Fine-Tuning BERT for Text Classification", "Fine-tuning BERT can lead to state-of-the-art results in text classification."  ],  
    ["Why GPT-3 is a Game Changer for NLP", "GPT-3 has set a new standard for language models in NLP."  ],  
    ["Using ELMO for Sentiment Analysis", "ELMO can effectively capture contextual information for sentiment analysis."  ],  
    ["The Rise of Pre-Trained Models in NLP", "Pre-trained models have become increasingly popular in NLP."  ]
]

## Create the list of docs to ingest

In [26]:
docs = [
    {   "_index": "jupyter-vector-demo",
        "_source": {
           "title": sample[0], 
           "summary": sample[1]
        }
    }
    for sample in samples
]

## Index the docs 
This will send a bulk index request to elastic, sending all the docs through the ingest pipeline, generating vectors, and storing them in elasticsearch

In [51]:
# The index is created automatically from the template on first write.
helpers.bulk(es, docs, pipeline="jupyter-vector-demo-pipeline")

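
By default `helpers.bulk` raises an exception when any document fails. Passing `raise_on_error=False` makes it return a `(success_count, errors)` tuple instead, which is easier to inspect. The sketch below assumes each error item is a dict keyed by the operation type (for example `index`) with `status` and `error` entries; `summarize_bulk_errors` is a hypothetical helper.

```python
def summarize_bulk_errors(errors):
    """Return (status, reason) pairs from helpers.bulk error items."""
    summary = []
    for item in errors:
        # Each item is keyed by the operation type, e.g. {"index": {...}}.
        for info in item.values():
            error = info.get("error", {})
            reason = error.get("reason") if isinstance(error, dict) else str(error)
            summary.append((info.get("status"), reason))
    return summary


# Example usage:
# ok, errors = helpers.bulk(es, docs, pipeline="jupyter-vector-demo-pipeline",
#                           raise_on_error=False)
# print(ok, summarize_bulk_errors(errors))
```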

## Verify one of the docs 
Let's take a look at one doc and see how it was indexed

In [35]:
result = es.search(index='jupyter-vector-demo', size=1)
result.body['hits']['hits'][0]['_source']

{'summary': 'Word embeddings have revolutionized the field of NLP.',
 'vectors': [0.010032681748270988,
  0.1762128621339798,
  0.025519631803035736,
  -0.1699194610118866,
  -0.023978114128112793,
  -0.17380699515342712,
  -0.16619500517845154,
  -0.4496205449104309,
  0.14203619956970215,
  -0.025377998128533363,
  -0.21256506443023682,
  0.3052826225757599,
  -0.048612333834171295,
  -0.25566211342811584,
  0.0038711531087756157,
  0.2568204402923584,
  -0.4086630940437317,
  0.3276959955692291,
  0.18598729372024536,
  -0.08290590345859528,
  -0.06666664034128189,
  0.33053335547447205,
  0.33372732996940613,
  -0.1446480005979538,
  0.4143035411834717,
  -0.11616694182157516,
  -0.003925261087715626,
  -0.002277584746479988,
  0.11438579857349396,
  -0.5439679026603699,
  0.27566054463386536,
  -0.0374893993139267,
  -0.08002748340368271,
  0.010440019890666008,
  -0.1600598394870758,
  0.3334594666957855,
  -0.10152608156204224,
  0.02321258932352066,
  0.20382066071033478,
  -0.
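
The `dims` value in the mapping (384 here) has to match the length of the vector the model produces. A quick sanity check, sketched with a hypothetical helper that works on the `_source` dict returned above:

```python
def has_expected_dims(source, field="vectors", expected_dims=384):
    """Return True if `field` holds a list of exactly expected_dims values."""
    vector = source.get(field)
    return isinstance(vector, list) and len(vector) == expected_dims


# Example usage with the search result above:
# has_expected_dims(result.body['hits']['hits'][0]['_source'])
```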

---
---
# kNN Search
---



## Generate Vector for Query

Before we can run an approximate k-nearest neighbor (kNN) query, we need to convert our query string to a vector.

Set a sample query doc

Depending on your specific model, you may need to change the field name from "text_field"

In [36]:
docs =  [
    {
      "text_field": "State of the art nlp models"
    }
  ]

We call the `_infer` endpoint supplying the model_id and the doc[s] we want to vectorize. 

In [37]:
vec = es.ml.infer_trained_model(model_id=es_model_id, docs=docs)

The vector for the first doc can be accessed in the response dict as shown below

In [38]:
doc_0_vector = vec['inference_results'][0]['predicted_value']
doc_0_vector

[-0.05313778668642044,
 0.2675938904285431,
 -0.1571311205625534,
 -0.16366317868232727,
 0.1534436196088791,
 0.4014796018600464,
 0.09830273687839508,
 -0.4107570946216583,
 0.6688247919082642,
 0.18063218891620636,
 0.23392875492572784,
 0.25056707859039307,
 0.1332893967628479,
 -0.027977390214800835,
 0.19046132266521454,
 0.11570954322814941,
 -0.24199819564819336,
 -0.1414170265197754,
 0.5337180495262146,
 0.5993724465370178,
 0.30228930711746216,
 0.09154966473579407,
 0.17977407574653625,
 0.14795929193496704,
 0.3506891429424286,
 -0.18918591737747192,
 0.41521453857421875,
 0.2111051082611084,
 0.038915835320949554,
 -0.09822694212198257,
 -0.1743984818458557,
 -0.24724091589450836,
 -0.35224899649620056,
 0.28879034519195557,
 0.3031083047389984,
 0.24868538975715637,
 -0.41746076941490173,
 0.009341837838292122,
 0.36109238862991333,
 -0.07405922561883926,
 0.33332574367523193,
 0.212000772356987,
 0.04666478931903839,
 0.0004928873386234045,
 0.15392138063907623,
 0.1711
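
Because the `vectors` field is mapped with `cosine` similarity, it can help to see how a query vector and a document vector are compared. The sketch below computes cosine similarity in plain Python and converts it to the score that Elasticsearch documents for cosine kNN, `(1 + cosine) / 2`. It is an illustration only, not how Elasticsearch computes scores internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def knn_cosine_score(a, b):
    """Map cosine similarity from [-1, 1] to a score in [0, 1]."""
    return (1 + cosine_similarity(a, b)) / 2


# Example usage: compare the query vector with a stored document vector.
# knn_cosine_score(doc_0_vector, result.body['hits']['hits'][0]['_source']['vectors'])
```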

## Run the Search

We will call the `_search` api and specify the `knn` section. 

This is a very simple example to get started. Elastic supports combining kNN search with "traditional" BM25 search. You can also filter documents to reduce the number of docs that need to be searched. See the [docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search-api.html) for more information.

### Create the search body

In [39]:
body = {
    "knn": {
        "field": "vectors",
        "query_vector": doc_0_vector,
        "k": 2,
        "num_candidates": 10
    }
}

In [44]:
knn = {
    "field": "vectors",
    "query_vector": doc_0_vector,
    "k": 2,
    "num_candidates": 10
  }
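
The `knn` clause also accepts an optional `filter` to narrow the documents considered. A small sketch of a builder that validates the basic constraints (`k` cannot exceed `num_candidates`, and `num_candidates` cannot exceed 10000) before a request is sent; `build_knn_clause` is a hypothetical helper, and the `match` filter in the usage example is only an illustration.

```python
def build_knn_clause(field, query_vector, k, num_candidates, filter_query=None):
    """Build a `knn` search clause, optionally restricted by a filter query."""
    if not 1 <= k <= num_candidates:
        raise ValueError("k must be between 1 and num_candidates")
    if num_candidates > 10000:
        raise ValueError("num_candidates cannot exceed 10000")
    clause = {
        "field": field,
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
    }
    if filter_query is not None:
        clause["filter"] = filter_query
    return clause


# Example usage: only consider docs whose title mentions "NLP".
# knn = build_knn_clause("vectors", doc_0_vector, 2, 10,
#                        filter_query={"match": {"title": "NLP"}})
```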

In [45]:
result = es.search(index='jupyter-vector-demo', knn=knn, size=1)


BadRequestError: ignored
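
One possible cause of a `BadRequestError` on a kNN search is that the target field is not mapped as an indexed `dense_vector`. This can happen when the index was created before the template was applied, so that `vectors` was mapped dynamically as a plain numeric field. The sketch below checks a `get_mapping` response for this; `knn_ready` is a hypothetical helper, and it assumes the mapping reports `index` explicitly, as it would for the template defined earlier.

```python
def knn_ready(mapping_body, index, field="vectors"):
    """Return True if `field` in `index` is an indexed dense_vector."""
    props = mapping_body[index]["mappings"].get("properties", {})
    field_mapping = props.get(field, {})
    return (
        field_mapping.get("type") == "dense_vector"
        and field_mapping.get("index", False) is True
    )


# Example usage:
# mapping = es.indices.get_mapping(index="jupyter-vector-demo").body
# knn_ready(mapping, "jupyter-vector-demo")
```

If the check returns `False`, deleting the index and re-running the bulk request after the template is in place should let the template mapping take effect.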