**Copyright 2023 The MediaPipe Authors. All Rights Reserved.**

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Text Embedding with MediaPipe Tasks

This notebook shows how to use the MediaPipe Tasks Python API to compare two pieces of text and get a score for how similar they are. The score is the cosine similarity of the two text embeddings: it ranges from -1 to 1, and identical texts score 1.
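To make the score concrete, here is a minimal pure-Python sketch of cosine similarity. The `cosine_similarity` helper and the toy vectors are illustrative assumptions, not part of the MediaPipe API; the notebook itself relies on MediaPipe's own implementation.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for real text embeddings.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # same direction: close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # orthogonal: 0
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # opposite direction: close to -1
```

The score depends only on the direction of the two vectors, not on their length, which is why scaling an embedding does not change its similarity to another one.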

## Preparation

Let's start with installing MediaPipe.

In [None]:
!pip install -q mediapipe

After installing your dependencies, you can download the text embedder model that will be used for this example.

In [None]:
#@title Start downloading here.
!wget -O embedder.tflite -q https://storage.googleapis.com/mediapipe-models/text_embedder/bert_embedder/float32/1/bert_embedder.tflite

## Running inference

To run inference with the MediaPipe Text Embedder task, initialize a `TextEmbedder` with the model that you downloaded earlier, then use it to embed two strings and compare the resulting embeddings. To try other inputs, edit the `first_text` and `second_text` values in the cell below (in Colab, they also appear as form fields beside the cell).

In [None]:
from mediapipe.tasks import python
from mediapipe.tasks.python import text

# Create your base options with the model that was downloaded earlier
base_options = python.BaseOptions(model_asset_path='embedder.tflite')

# Choose whether to L2-normalize and/or quantize the returned embeddings
l2_normalize = True #@param {type:"boolean"}
quantize = False #@param {type:"boolean"}

# Create the final set of options for the Embedder
options = text.TextEmbedderOptions(
    base_options=base_options, l2_normalize=l2_normalize, quantize=quantize)

with text.TextEmbedder.create_from_options(options) as embedder:
  # The two texts that will be compared
  first_text = "I'm feeling so good" #@param {type:"string"}
  second_text = "I'm okay I guess" #@param {type:"string"}

  # Convert both sets of text to embeddings
  first_embedding_result = embedder.embed(first_text)
  second_embedding_result = embedder.embed(second_text)

  # Calculate and print similarity
  similarity = text.TextEmbedder.cosine_similarity(
      first_embedding_result.embeddings[0],
      second_embedding_result.embeddings[0])
  print(similarity)
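The `l2_normalize` option used above asks the embedder to scale each embedding to unit length. As a rough illustration of why that is convenient, the pure-Python sketch below shows that, for unit-length vectors, a plain dot product gives the same value as cosine similarity. The helper names and the toy vectors are assumptions made for this example; they are not MediaPipe functions or real embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector so that its L2 (Euclidean) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(u, v):
    """Return the dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy vectors standing in for real text embeddings.
a = [3.0, 4.0]
b = [4.0, 3.0]

unit_a = l2_normalize(a)
unit_b = l2_normalize(b)

# Cosine similarity computed directly from the unnormalized vectors.
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Both values are approximately 0.96.
print(dot(unit_a, unit_b), cosine)
```

Normalizing does not change the similarity score itself; it only means that later comparisons can skip the division by the vector norms.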