## Connect to the Elasticsearch client with your credentials

In [2]:
from elasticsearch import Elasticsearch, helpers
from getpass import getpass

# Prompt for the Elastic Cloud credentials
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")
ELASTIC_API_KEY = getpass("Elastic API Key: ")

# Create an Elasticsearch client using the provided credentials
client = Elasticsearch(
    cloud_id=ELASTIC_CLOUD_ID,  # the Cloud ID can be found under deployment management
    api_key=ELASTIC_API_KEY,  # your API key for connecting to Elastic, found under Deployments - Security
)

from elasticsearch.client import MlClient


Get the sample lyrics data [we just downloaded with the API](/lyrics.ipynb).

In [19]:
import json
with open('data/hozier_songs.json', 'r') as f:
  songs = json.load(f)

As a reminder, this is the format of each of our lyrics documents after processing.

In [88]:
songs[0]

{'id': 84741219,
 'name': 'Cherry Wine ',
 'artist': 'Hozier',
 'lyrics': [{'line': 'Her eyes and words are so icy'},
  {'line': 'Oh but she burns'},
  {'line': 'Like rum on the fire'},
  {'line': 'Hot and fast and angry as she can be'},
  {'line': 'I walk my days on a wire.'},
  {'line': "It looks ugly, but it's clean,"},
  {'line': "Oh momma, don't fuss over me."},
  {'line': "The way she tells me I'm hers and she is mine"},
  {'line': 'Open hand or closed fist would be fine'},
  {'line': 'The blood is rare and sweet as cherry wine.'},
  {'line': 'Calls of guilty thrown at me'},
  {'line': '******* Th'}]}

## Put the data in an index

We are creating a nested field for the lyrics so we can use inner hits to retrieve the exact lines that match a search.

In [21]:
index_name = 'songs'

mappings = {
  "properties": {
    "lyrics": {
        "type": "nested",
        "properties": {
          "line": {
            "type": "text"
          }
        }
    },
  }
}

# Create the Elasticsearch index with the specified name (delete if already existing)
if client.indices.exists(index=index_name):
    client.indices.delete(index=index_name)
client.indices.create(index=index_name, mappings=mappings)

# Define a generator that turns each song document into a bulk indexing action
def generate_docs(data, index_name):
    for document in data:
        yield dict(_index=index_name, _id=f"{document['id']}", _source=document)


# Use helpers.bulk() to index the songs; the target index is already set on each action
load = helpers.bulk(client, generate_docs(songs, index_name))
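
Before sending anything to the cluster, it can help to look at what the generator actually yields. Here is a minimal sketch that runs the same generator over a made-up sample document; `sample_songs` and its contents are assumptions for illustration, not part of the real dataset.

```python
def generate_docs(data, index_name):
    """Yield one bulk action per song, reusing the song id as the document _id."""
    for document in data:
        yield {"_index": index_name, "_id": str(document["id"]), "_source": document}


# Hypothetical sample in the same shape as songs[0] (not real data)
sample_songs = [
    {
        "id": 1,
        "name": "Example Song",
        "artist": "Example Artist",
        "lyrics": [{"line": "first line"}, {"line": "second line"}],
    }
]

# Materialize the generator so the actions can be inspected without a cluster
actions = list(generate_docs(sample_songs, "songs"))
print(actions[0]["_index"], actions[0]["_id"], len(actions[0]["_source"]["lyrics"]))
```

Each action already carries its target `_index`, and using the song id as `_id` means that re-running the load overwrites existing documents instead of duplicating them.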

## Look for a specific line in a song

We can now use a nested query to look up words in our songs and get back the specific lines where they are mentioned.

In [125]:
query = {
    "nested": {
      "path": "lyrics",
      "query": {
        "match": {
          "lyrics.line": "love"
        }
      },
      "inner_hits" : {
        "docvalue_fields" : [
          "lyrics.line.keyword"
        ]
      }
    }
}

# Run the nested query, looking for lyric lines that mention "love"
response = client.search(index=index_name, query=query)

print(f'We get back {response["hits"]["total"]["value"]} songs that fit, here are the top results:')
for hit in response["hits"]["hits"]:
    print(f'From {hit["_source"]["artist"]} : {hit["_source"]["name"]}: ')
    for inner_hit in hit["inner_hits"]["lyrics"]["hits"]["hits"]:
        print(inner_hit["_source"]["line"])
    print()


We get back 3 songs that fit, here are the top results:
From Hozier : Take Me to Church : 
But I love it

From Hozier : Work Song : 
I'm so full of love I could barely eat

From Hozier : Someone New : 
I fall in love just a little ol' little bit
I fall in love just a little ol' little bit
I fall in love just a little ol' little bit



However, this is only returning exact matches, missing out on similar songs about "lovers", "loving", or any similar phrases which I might still want to find.
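
To see why, recall that a `match` query compares analyzed tokens, and the default analyzer for `text` fields lowercases and splits words but does not stem them. The sketch below is a rough pure-Python approximation of that behavior; the regex and both function names are my own stand-ins, not the actual Elasticsearch analyzer.

```python
import re


def rough_tokens(text):
    """Lowercase the text and split on anything that is not a letter, digit, or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_term(line, term):
    """Return True only if the query term appears as a whole token in the line."""
    return term.lower() in rough_tokens(line)


# Two lines taken from the lyrics; only the first contains the exact token "love"
for line in ["But I love it", "My lover's got humour"]:
    print(f"{line!r}: {matches_term(line, 'love')}")
```

The second line is clearly about love, but a token-level comparison has no way of knowing that.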

So we can take this a step further and add a semantic search model into the mix, to help us really look for meaning in the lyrics.



## Adding ELSER inference for semantic search

We will use a [foreach](https://www.elastic.co/guide/en/elasticsearch/reference/current/foreach-processor.html) processor to loop through all lines of the lyrics.

See [the ELSER Notebook](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/03-ELSER.ipynb) for a simple getting-started guide to semantic search, and [this document chunking example](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/document-chunking/with-index-pipelines.ipynb) for another instance of embedding inner hits.
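
Conceptually, the foreach processor applies one processor to every element of an array field, exposing the current element as `_ingest._value`. As a rough plain-Python sketch of that idea: `fake_elser` is a hypothetical stand-in that just counts words (the real model produces learned token weights), and `foreach_inference` is my own illustration, not an Elasticsearch API.

```python
import copy


def fake_elser(text):
    """Hypothetical stand-in for ELSER: weight each lowercase word by its count."""
    weights = {}
    for word in text.lower().split():
        weights[word] = weights.get(word, 0.0) + 1.0
    return weights


def foreach_inference(document, field="lyrics", infer=fake_elser):
    """Mimic foreach + inference by enriching every element of an array field."""
    enriched = copy.deepcopy(document)  # leave the input document untouched
    for value in enriched[field]:  # `value` plays the role of _ingest._value
        try:
            value["tokens"] = infer(value["line"])
        except Exception:
            # Plays the role of an on_failure handler: record the error on the element
            value["errors"] = "failed in foreach processor"
    return enriched


song = {
    "name": "Cherry Wine ",
    "lyrics": [{"line": "Oh but she burns"}, {"line": "Like rum on the fire"}],
}
enriched = foreach_inference(song)
print(enriched["lyrics"][0])
```

Every lyric line ends up with its own `tokens` object next to the original `line`, which is the shape a nested query on the lyrics can then search.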

In [110]:
client.ingest.put_pipeline(
    id="adding_ELSER_to_lyrics", 
    processors=
    [
        {
            "foreach": {
                "field": "lyrics",
                "processor": {
                    "inference": {
                        "model_id": ".elser_model_2",
                        "input_output": [
                            {"input_field": "_ingest._value.line", "output_field": "_ingest._value.tokens"}
                        ],
                        "on_failure" : [
                        {
                            "set" : {
                                "field": "_ingest._value.errors",
                                "value": "failed in foreach processor"
                            }
                        }]
                    }
                }
            }
        }
    ]
)

mappings = {
    "dynamic" : True,
    "properties" : 
    {
        "lyrics": {
            "type": "nested",
            "properties": {
                "line" : {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "tokens": { 
                    "type": "sparse_vector" 
                }
            }
        }
    }
}

# Create the new index for the enriched data
index_name_new = "songs_semantic"
if client.indices.exists(index=index_name_new):
    client.indices.delete(index=index_name_new)
client.indices.create(index=index_name_new, mappings=mappings)

# Reindex through the ELSER pipeline; this returns a task id instead of blocking
client.reindex(
    source={"index": index_name},
    dest={"index": index_name_new, "pipeline": "adding_ELSER_to_lyrics"},
    wait_for_completion=False,
)

ObjectApiResponse({'task': 'JqYuDbWsRueybLrxY3c9Cg:61831402'})
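
Because `wait_for_completion=False` returns immediately with a task id, the new index may still be filling up when the next query runs. Here is a minimal sketch of polling the tasks API until the reindex has finished; `wait_for_task` and its polling defaults are my own assumptions rather than part of the client library.

```python
import time


def wait_for_task(client, task_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll the tasks API until the task reports completed, or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.tasks.get(task_id=task_id)
        if status["completed"]:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} did not finish within {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage with the task id printed above:
# wait_for_task(client, "JqYuDbWsRueybLrxY3c9Cg:61831402")
```

Running inference on every lyric line takes a while, so a generous timeout is probably sensible for larger datasets.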

We can now run the same query again, but using `text_expansion` on the generated tokens rather than `match` directly on the text field.

In [123]:
query = {
    "nested": {
        "path": "lyrics",
        "query": {
            "text_expansion": {
                "lyrics.tokens": {
                    "model_id": ".elser_model_2",
                    "model_text": "love",
                }
            }
        },
        "inner_hits" : {
            "docvalue_fields" : [
                "lyrics.line.keyword"
            ]
        }
    }
}

# Run the semantic query, looking for lyric lines related to "love"
response = client.search(index=index_name_new, query=query)

print(f'We get back {response["hits"]["total"]["value"]} songs that fit, here are the top results:')
for hit in response["hits"]["hits"]:
    print(f'From {hit["_source"]["artist"]} : {hit["_source"]["name"]}: ')
    for inner_hit in hit["inner_hits"]["lyrics"]["hits"]["hits"]:
        print(inner_hit["_source"]["line"])
    print()


We get back 8 songs that fit, here are the top results:
From Hozier : Someone New : 
And so I fall in love just a little ol' little bit
I fall in love just a little ol' little bit
I fall in love just a little ol' little bit

From Hozier : Take Me to Church : 
My lover's got humour
But I love it
Is when I'm alone with you—

From Hozier : Work Song : 
I'm so full of love I could barely eat
She give me toothaches just from kissin' me
There's nothing sweeter than my baby

From Hozier : Cherry Wine : 
The way she tells me I'm hers and she is mine
Hot and fast and angry as she can be
Her eyes and words are so icy

From Noah Kahan feat. Hozier : Northern Attitude: 
You lose your friends, you lose your wife
You feelin' right? You feelin' proud?
Forgive my northern attitude

From Hozier : Butchered Tongue: 
Singin' at me as the first thing
And as a young man, blessed to pass so many road signs
A promise softly sung of somewhere else

From Hozier feat. Mavis Staples : Nina Cried Power : 
It's th

Using the semantic model, we now get a lot more results for "love". Pretty much every song in the small test dataset actually comes up, which either means our model's threshold is a bit too low for now, or that the lyrics do just happen to all be about love in some way. Both are interesting results!
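
One way to explore the threshold question is to look at the `_score` of each inner hit and drop the weak ones. The helper below is a small sketch of that idea; `lines_above_score` is my own name, the response fragment is hand-written with made-up scores, and the cutoff of 5.0 is arbitrary and would need tuning against real results.

```python
def lines_above_score(response, min_score, inner_name="lyrics"):
    """Return {song name: [(score, line), ...]} for inner hits scoring at least min_score."""
    results = {}
    for hit in response["hits"]["hits"]:
        kept = [
            (inner["_score"], inner["_source"]["line"])
            for inner in hit["inner_hits"][inner_name]["hits"]["hits"]
            if inner["_score"] >= min_score
        ]
        if kept:  # skip songs where no line clears the cutoff
            results[hit["_source"]["name"]] = kept
    return results


# Hand-written fragment shaped like a search response; the scores are invented
fake_response = {
    "hits": {
        "hits": [
            {
                "_source": {"name": "Take Me to Church"},
                "inner_hits": {
                    "lyrics": {
                        "hits": {
                            "hits": [
                                {"_score": 9.1, "_source": {"line": "But I love it"}},
                                {"_score": 2.3, "_source": {"line": "Is when I'm alone with you"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

filtered = lines_above_score(fake_response, min_score=5.0)
print(filtered)
```

With the real `response` from the query above, printing the scores next to each line should make it easier to judge where a sensible cutoff might sit.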

Now that we've seen this can work, next up we can try with some more data points, additional data sources, and more refined queries or hybrid search techniques.

In the next blog, I will add my Spotify listening history and trends to get some more personalized results based on my listening habits and preferences.