# Text Search with Elasticsearch

In [13]:
import docs_08 as docs

github_data = docs.read_github_data('evidentlyai', 'docs')
parsed_data = docs.parse_data(github_data)
chunks = docs.chunk_documents(parsed_data)


In [14]:
chunks[65]

{'start': 2000,
 'content': '\n\n- **Export scores** as JSON or Python dictionary.\n- **As a DataFrame**, either as a raw metrics table or by attaching scores to existing data rows.\n- **Generate visual reports** in Jupyter, Colab, or export as HTML\n- **Upload to Evidently Platform** to track evaluations over time\n\nThis exportability makes it easy to integrate Evidently into your existing workflows and pipelines –\xa0even if you are not using the Evidently Platform.\n\nHere is an example visual report showing various data quality metrics and test results. Other evaluations can be presented in the same way, or exported as raw scores:\n\n![](/images/concepts/report_test_preview.gif)\n\n**📌 Links:**\n\n- Quickstart for [LLM evaluation](/quickstart_llm) \n- Quickstart for [ML evaluation](/quickstart_ml)\n\nOr read on through this page for conceptual introduction.\n\n**2. Synthetic data generation [NEW]**\n\n<Check>\n  **TL;DR**: We have a nice config for structured synthetic data genera

In [15]:
from minsearch import Index

In [16]:
index = Index(
    text_fields=['content', 'filename', 'title', 'description']
)

index.fit(chunks)

<minsearch.minsearch.Index at 0x107109400>

In [17]:
search_results = index.search('how do I use llm-as-a-judge for evals')

In [18]:
print(search_results)

[{'start': 0, 'content': 'import CloudSignup from \'/snippets/cloud_signup.mdx\';\nimport CreateProject from \'/snippets/create_project.mdx\';\n\nIn this tutorial, we\'ll show how to evaluate text for custom criteria using LLM as the judge, and evaluate the LLM judge itself.\n\n<Info>\n  **This is a local example.** You will run and explore results using the open-source Python library. At the end, we’ll optionally show how to upload results to the Evidently Platform for easy exploration.\n</Info>\n\nWe\'ll explore two ways to use an LLM as a judge:\n\n- **Reference-based**. Compare new responses against a reference. This is useful for regression testing or whenever you have a "ground truth" (approved responses) to compare against.\n- **Open-ended**. Evaluate responses based on custom criteria, which helps evaluate new outputs when there\'s no reference available.\n\nWe will focus on demonstrating **how to create and tune the LLM evaluator**, which you can then apply in different contex

## Start Elasticsearch

```bash
docker run -it \
    --rm \
    --name elasticsearch \
    -m 4GB \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    -v es9_data:/usr/share/elasticsearch/data \
    docker.elastic.co/elasticsearch/elasticsearch:9.1.1
```

Check that it's working using curl:

```bash
curl http://localhost:9200
```
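The container can take some time to start, so the first `curl` may fail. Below is a minimal sketch that polls the same endpoint from Python until it responds. The helper name `wait_for_elasticsearch` is hypothetical, and it uses only the standard library rather than the Elasticsearch client:

```python
import http.client
import json
import time
import urllib.request


def wait_for_elasticsearch(url="http://localhost:9200", timeout=60.0, interval=1.0):
    """Poll the Elasticsearch root endpoint until it answers or `timeout` seconds pass.

    Returns the parsed JSON info document on success, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except (OSError, http.client.HTTPException, ValueError):
            # Connection refused, reset, or a non-JSON reply: the node is probably
            # still starting, so fall through and retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

On success, the returned document contains the server version under `info["version"]["number"]`.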

In [19]:
!uv add elasticsearch==9.1.1

Resolved 155 packages in 19ms
Audited 136 packages in 38ms


## Let's use it in Python

In [20]:
from elasticsearch import Elasticsearch
es_client = Elasticsearch('http://localhost:9200')

Define the index mapping:

In [22]:
index_settings = {
    "mappings": {
        "properties": {
            "start": {"type": "integer"},
            "content": {"type": "text"},
            "title": {"type": "text"},
            "description": {"type": "text"},
            "filename": {"type": "text"}
        }
    }
}

Create the index:

In [23]:
index_name = 'evidently-docs'
es_client.indices.create(index=index_name, body=index_settings)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'evidently-docs'})

To delete the index, run:

```python
# python
es_client.indices.delete(index=index_name)
```

To recreate an index (for example, to change the type of a field in the mapping), delete it and create it again.
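A small helper can wrap the delete-then-create steps. This is a sketch, not part of the original notebook: `recreate_index` is a hypothetical name, and it relies on the client's `indices.exists`, `indices.delete`, and `indices.create` methods:

```python
def recreate_index(es_client, index_name, index_settings):
    """Drop `index_name` if it exists, then create it again with `index_settings`.

    Note: deleting an index also deletes every document stored in it.
    """
    # indices.exists returns a response object that is truthy when the index exists
    if es_client.indices.exists(index=index_name):
        es_client.indices.delete(index=index_name)
    return es_client.indices.create(index=index_name, body=index_settings)
```

Usage: `recreate_index(es_client, index_name, index_settings)`, then index the chunks again.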

In [25]:
from tqdm.auto import tqdm

Index the chunks:

In [27]:
for doc in tqdm(chunks):
    es_client.index(index=index_name, document=doc)

  0%|          | 0/575 [00:00<?, ?it/s]
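The loop above sends one HTTP request per chunk. For larger collections, the client's `elasticsearch.helpers.bulk` helper batches many documents into fewer requests. A minimal sketch, assuming the same `es_client`, `chunks`, and `index_name`; the function names `make_bulk_actions` and `bulk_index` are hypothetical:

```python
def make_bulk_actions(docs, index_name):
    """Yield one bulk action per document (the default operation is 'index')."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


def bulk_index(es_client, docs, index_name):
    """Index all docs with the bulk helper instead of one request per document.

    Returns the number of successfully indexed documents.
    """
    # Imported here so make_bulk_actions can be used without the client installed.
    from elasticsearch.helpers import bulk

    success, _errors = bulk(es_client, make_bulk_actions(docs, index_name))
    return success
```

Usage: `bulk_index(es_client, chunks, index_name)`. Whichever way you index, new documents become searchable only after the next index refresh (about once per second by default); `es_client.indices.refresh(index=index_name)` forces one if you search right away.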

Search:

In [28]:
def elastic_search(query, num_results=15):
    es_query = {
        "size": num_results,
        "query": {
            "multi_match": {
                "query": query,
                "type": "best_fields",
                "fields": ["content", "filename", "title", "description"],
            }
        }
    }

    response = es_client.search(index=index_name, body=es_query)

    result_docs = []
    
    for hit in response['hits']['hits']:
        result_docs.append(hit['_source'])
    
    return result_docs
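`elastic_search` keeps only `_source` and drops each hit's relevance score. When tuning the query, it can help to look at the scores too. A sketch with a hypothetical helper `summarize_hits`, which takes the raw result of `es_client.search(...)`:

```python
def summarize_hits(response, max_results=5):
    """Return (score, filename, start) for the top hits of a search response.

    Works on the object returned by es_client.search(...) as well as on a plain
    dict with the same structure.
    """
    rows = []
    for hit in response["hits"]["hits"][:max_results]:
        source = hit["_source"]
        # .get() tolerates chunks that lack a filename or start offset
        rows.append((hit["_score"], source.get("filename"), source.get("start")))
    return rows
```

Hits arrive sorted by score, so the first row is the best match.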


To give some fields more weight, we can use boosting. For example, `title^3` makes matches in the title three times as important as matches in the other fields:

```python
"fields": ["content", "filename", "title^3", "description"],
```
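To try different weights without editing the query by hand, the query body can be generated. A sketch with a hypothetical helper `build_multi_match_query`; it produces the same structure as the body inside `elastic_search`, with `^boost` appended to the chosen fields:

```python
def build_multi_match_query(query, num_results=15, fields=None, boosts=None):
    """Build a multi_match query body, appending ^boost to selected fields."""
    fields = fields or ["content", "filename", "title", "description"]
    boosts = boosts or {}
    weighted = [f"{name}^{boosts[name]}" if name in boosts else name for name in fields]
    return {
        "size": num_results,
        "query": {
            "multi_match": {
                "query": query,
                "type": "best_fields",
                "fields": weighted,
            }
        },
    }
```

Usage: `es_client.search(index=index_name, body=build_multi_match_query("how do I use llm-as-a-judge for evals", boosts={"title": 3}))`. With `best_fields`, a document is scored mainly by its best-matching field, so a boost pays off when the boosted field is where the query matches.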

In [29]:
search_results = elastic_search('how do I use llm-as-a-judge for evals')

In [30]:
print(search_results)

[{'start': 3000, 'content': 's/customize_llm_judge#change-the-evaluator-llm) to see how you can select a different evaluator LLM. \n</Info>\n\n## 2.  Create the Dataset\n\nFirst, we\'ll create a toy Q&A dataset with customer support question that includes:\n\n- **Questions**. The inputs sent to the LLM app.\n- **Target responses**. The approved responses you consider accurate.\n- **New responses**. Imitated new responses from the system.\n- **Manual labels with explanation**. Labels that say if response is correct or not.\n\nWhy add the labels? It\'s a good idea to be the judge yourself before you write a prompt. This helps:\n\n- Formulate better criteria. You discover nuances that help you write a better prompt.\n- Get the "ground truth". You can use it to evaluate the quality of the LLM judge.\n\nUltimately, an LLM judge is a small ML system, and it needs its own evals\\!\n\n**Generate the dataframe**. Here\'s how you can create this dataset in one go:\n\n<Accordion title="Toy data t

## RAG

In [31]:
def search(query):
    results = elastic_search(
        query=query,
        num_results=15
    )

    return results

In [32]:
import json

instructions = """
You're an assistant that helps with the documentation.
Answer the QUESTION based on the CONTEXT from the search engine of our documentation.

Use only the facts from the CONTEXT when answering the QUESTION.

When answering the question, provide the reference to the file with the source.
Use the filename field for that. The repo url is: https://github.com/evidentlyai/docs/
Include code examples when relevant. 
If the question is discussed in multiple documents, cite all of them.

Don't use markdown or any formatting in the output.
""".strip()

prompt_template = """
<QUESTION>
{question}
</QUESTION>

<CONTEXT>
{context}
</CONTEXT>
""".strip()


def build_prompt(question, search_results):
    context = json.dumps(search_results)

    prompt = prompt_template.format(
        question=question,
        context=context
    ).strip()
    
    return prompt
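`build_prompt` serializes every field of each chunk, including `start`, which the model doesn't need. By default, `json.dumps` also escapes non-ASCII characters (such as the non-breaking spaces and emoji in these docs), which makes the context longer. A possible variation, sketched with a hypothetical `build_compact_context` helper:

```python
import json


def build_compact_context(search_results, fields=("filename", "title", "description", "content")):
    """Serialize search results for the prompt, keeping only the selected fields.

    ensure_ascii=False keeps non-ASCII characters as-is instead of turning them
    into escape sequences.
    """
    docs = [{key: doc[key] for key in fields if key in doc} for doc in search_results]
    return json.dumps(docs, ensure_ascii=False)
```

It can be used in place of `json.dumps(search_results)` inside `build_prompt`; `filename` stays in, since the instructions ask the model to cite it.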

In [34]:
from openai import OpenAI

openai_client = OpenAI()

def llm(user_prompt, instructions=None, model="gpt-4o-mini"):
    messages = []

    if instructions:
        messages.append({
            "role": "system",
            "content": instructions
        })

    messages.append({
        "role": "user",
        "content": user_prompt
    })

    response = openai_client.responses.create(
        model=model,
        input=messages
    )

    return response.output_text

In [35]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    response = llm(prompt, instructions=instructions)
    return response
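Since `rag` is just search, then prompt building, then an LLM call, the steps can be passed in as arguments so the pipeline can be tested without Elasticsearch or an OpenAI key. A sketch under that assumption; `rag_with` is a hypothetical name:

```python
def rag_with(query, search_fn, build_prompt_fn, llm_fn, instructions=None):
    """Run search -> prompt -> LLM with each step passed in as a function.

    Swapping in stubs makes it possible to test the pipeline offline.
    """
    search_results = search_fn(query)
    prompt = build_prompt_fn(query, search_results)
    return llm_fn(prompt, instructions=instructions)
```

`rag_with(query, search, build_prompt, llm, instructions=instructions)` behaves like `rag` with the system instructions passed through.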

In [36]:
result = rag("How can I build an eval report with llm as a judge?")

In [37]:
print(result)

To build an evaluation report using an LLM as a judge, follow these steps:

### 1. Installation and Imports
Make sure you have the required libraries installed. You will need the Evidently library for evaluation.

```bash
pip install evidently
```

Import necessary modules:

```python
import pandas as pd
import numpy as np

from evidently import Dataset, DataDefinition, Report, BinaryClassification
from evidently.descriptors import *
from evidently.presets import TextEvals, ClassificationPreset
from evidently.llm.templates import BinaryClassificationPromptTemplate
```

### 2. Set Up OpenAI API Key
Set up your OpenAI key as an environment variable.

```python
import os
os.environ["OPENAI_API_KEY"] = "YOUR_KEY"
```

### 3. Create the Dataset
Construct a dataset with customer support questions, including target responses, new responses, and manual labels.

```python
data = [
    ["Hi there, how do I reset my password?", 
     "To reset your password, click on 'Forgot Password' on the logi