Solr Automation Example

This is an example of how to automate Solr using the Solr API. The example is written in Python and uses the requests library to make HTTP requests to the Solr API.

Setup local solr instance

Use the docker-compose file to set up a local Solr instance.

docker-compose up -d

Workflow

Create the core with the name core.py script.
Update the schema with the schema.py script.
Create a document with the document.py script.
Test the document with the query.py script.
(Optional) Enable security with the security.py script.

Schema

The schema is defined in the schema.py script. The schema defines the fields that are available in the Solr core. You can adjust the schema to your needs.

What you would do is add or remove fields from the schema. This is done via the json file fields.json and after wards executed by the schema.py script again.

Machine Learning (PoC)

This script implements two different search approaches for Solr with machine learning capabilities.

Key Functions

index_document_with_embeddings():
- Creates vector embeddings for documents by combining all text fields
semantic_search():
- Executes pure vector-based search using KNN
hybrid_search():
- Combines text and vector search with configurable weights
load_solr_fields():
- Loads field configuration from fields.json

Semantic Search with Pretrained Model

Model: Uses pre-trained all-MiniLM-L6-v2 SentenceTransformer model
Process: Converts documents and queries into vector embeddings
Search: Performs k-nearest-neighbor (KNN) search based on vector similarity
Use Case: Best for finding semantically similar content regardless of exact keyword matches

Hybrid Search with Solr LTR (Text + Vector)

Approach: Combines traditional text search with vector-based search
Text Component: Uses Solr's edismax parser for enhanced text matching
Vector Component: KNN search as filter query for semantic similarity
Weighting: Configurable balance between text and vector search results

Workflow Overview

flowchart TD
    A[Start: main] --> B{SEMANTIC_WITH_PRETRAINED_MODEL?}
    B -->|Yes| C[Load SentenceTransformer Model<br/>all-MiniLM-L6-v2]
    B -->|No| D{HYBRID_SEARCH_WITH_SOLR_LTR?}
    C --> E[Generate Documents<br/>generate_documents 0, 1000]
    E --> F[Create Embeddings for each Document<br/>index_document_with_embeddings]
    F --> G[Combine Text Fields]
    G --> H[Generate Vector Embedding<br/>model.encode]
    H --> I[Add vector_field to Document]
    I --> J[Index Documents in Solr<br/>solr.add + commit]
    J --> K[Load Query from JSON<br/>sentences.json]
    K --> L[Perform Semantic Search<br/>semantic_search]
    L --> M[Encode Query to Vector]
    M --> N[Execute KNN Search<br/>fq: knn f=vector_field]
    N --> O[Return Top Results]
    D -->|Yes| P[Check if Documents Exist<br/>solr.search]
    D -->|No| Q[End]
    P --> R{Documents Found?}
    R -->|No| S[Generate & Index Documents<br/>with Embeddings]
    R -->|Yes| T[Perform Hybrid Search<br/>hybrid_search]
    S --> T
    T --> U[Encode Query to Vector]
    U --> V[Execute Combined Search<br/>q: text query<br/>fq: KNN vector search]
    V --> W[Return Weighted Results]
    O --> Q
    W --> Q
    style C fill: #e1f5fe
    style F fill: #fff3e0
    style L fill: #f3e5f5
    style T fill: #e8f5e8

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
grafana/provisioning/datasources		grafana/provisioning/datasources
json		json
prometheus		prometheus
solr		solr
tests		tests
.env		.env
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Solr Automation Example

Setup local solr instance

Workflow

Schema

Machine Learning (PoC)

Key Functions

Semantic Search with Pretrained Model

Hybrid Search with Solr LTR (Text + Vector)

Workflow Overview

About

Uh oh!

Releases

Packages

Uh oh!

Languages

r4mercur/solr-automation-python-example

Folders and files

Latest commit

History

Repository files navigation

Solr Automation Example

Setup local solr instance

Workflow

Schema

Machine Learning (PoC)

Key Functions

Semantic Search with Pretrained Model

Hybrid Search with Solr LTR (Text + Vector)

Workflow Overview

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages