OmniSim generates realistic, personalised, multi-turn conversational recommendation dialogues from item metadata alone. It combines LLM-based generation with Elasticsearch-grounded retrieval to mitigate hallucination and produces human-level lexical diversity across any item domain (movies, fashion, e-commerce, …).
GitHub: https://github.com/irecsys/OmniSim
Demo: http://34.72.93.183/ (available until June 14, 2026)
- Domain-agnostic — works with any item metadata CSV; no domain-specific templates required
- Retrieval-grounded — hybrid BM25 + dense kNN scoring (Equations 3–6 in the paper) mitigates hallucination and data leakage
- Three dialogue modes — Free (open-ended), Static (schema-driven), Adaptive (LLM-augmented attributes)
- Hybrid user modelling — combines short-term and long-term preference signals when interaction history is available
- Probabilistic behaviours — configurable chit-chat, explainable rejections, recommendation explanations
- Dual-track evaluation — NLP lexical-diversity metrics + LLM-as-a-Judge quality scores
- Streamlit dashboard — interactive browser for generated conversations and evaluation results
```text
OmniSim/
├── run.py # Main entry point — run simulations
├── build_index.py # Build / rebuild the Elasticsearch index
│
├── configs/
│ ├── system/
│ │ └── system.yaml # Global defaults (all parameters documented)
│ ├── imdb/
│ │ ├── imdb.yaml # IMDB movies dataset config
│ │ └── inputs/ # test_pairs.csv, test_items.csv, test_users.csv
│ ├── hm/
│ │ ├── hm.yaml # H&M fashion dataset config
│ │ └── inputs/ # test_pairs.csv, test_items.csv, test_users.csv
│ └── prompts/
│ ├── default.yaml # All LLM prompt templates (editable)
│ └── phrase_templates.yaml
│
├── data/
│ ├── imdb/ # items.csv (with embedding_vector), users.csv, interactions.csv
│ └── hm/ # handm.csv (with embedding_vector), users.csv, interactions.csv
│
├── utils/
│ ├── utils.py # Scoring, retrieval, LLM clients, dialogue acts
│ ├── simulator.py # Free / Static / Adaptive simulation engines
│ ├── quick_start.py # Orchestrator — parallel conversation generation
│ ├── configurator.py # Config loader
│ ├── dataset.py # CSV loader
│ ├── user_profile_builder.py # Build / cache user preference profiles
│ ├── evaluator.py # LLM-as-a-Judge evaluation
│ └── metrics.py # NLP lexical diversity metrics
│
├── scripts/
│ ├── build_es_index.py # Index items into Elasticsearch
│ ├── compute_nlp_metrics.py
│ └── run_judge_all.py
│
├── UI/
│ └── dashboard.py # Streamlit analytics dashboard
│
└── docs/
 └── API.md # Full API reference
```
- Python 3.10+
- Elasticsearch 8.x (local via Docker or remote; do not use ES 9.x)
- An OpenAI-compatible LLM API — Azure OpenAI, OpenAI, ThetaEdgeCloud, or any vLLM-compatible endpoint
- An embeddings API, or local `sentence-transformers` (runs offline, free)
```bash
# 1. Clone the repository
git clone https://github.com/irecsys/OmniSim.git
cd OmniSim
# 2. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Copy the environment template and fill in your credentials
cp .env.example .env
# Edit .env (see the API key instructions below)
```

Note on the `elasticsearch` client version: `requirements.txt` pins `elasticsearch>=8.13.0,<9.0.0`. The 9.x client is incompatible with an ES 8.x server and will produce `BadRequestError` (400).
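To confirm you have a matching client/server pair, the hypothetical helpers below (not part of OmniSim) compare major versions. This is a minimal sketch: the client version comes from `importlib.metadata`, and the server version from the `version.number` field of the JSON that Elasticsearch returns at its root URL, assuming security is disabled as in the local Docker setup.

```python
import json
import urllib.request
from importlib.metadata import version


def major(ver: str) -> int:
    """Return the major component of a version string such as '8.13.0'."""
    return int(ver.split(".")[0])


def is_compatible(client_ver: str, server_ver: str) -> bool:
    """True only when both the Python client and the server are 8.x."""
    return major(client_ver) == 8 and major(server_ver) == 8


def installed_client_version() -> str:
    # Version of the installed `elasticsearch` Python package.
    return version("elasticsearch")


def server_version(url: str = "http://localhost:9200") -> str:
    # The Elasticsearch root endpoint returns JSON containing version.number.
    # Assumes no authentication (xpack.security.enabled=false).
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)["version"]["number"]
```

Once Elasticsearch is running, `is_compatible(installed_client_version(), server_version())` should return `True`.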
Copy `.env.example` to `.env` and fill in the keys for the providers you use:

```bash
# OpenAI (chat + embeddings)
OPENAI_KEY=sk-...
# Azure OpenAI (chat + embeddings — same key for both)
AZURE_KEY=...
# ThetaEdgeCloud (open-source models via hosted API)
THETA_KEY=...
# Elasticsearch — only needed when xpack.security.enabled=true
ES_USER=elastic
ES_PWD=...
```

```bash
# Docker (recommended — security disabled for local use)
docker run -d --name es8 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -p 9200:9200 \
  docker.elastic.co/elasticsearch/elasticsearch:8.13.0
# Verify
curl http://localhost:9200
```

Edit `configs/system/system.yaml` and choose one of the options below:
```yaml
# ── Option A: Azure OpenAI (recommended) ─────────────────────
openai_provider: azure
chat_model: gpt-4o-mini # deployment name in your Azure resource
chat_endpoint: https://YOUR_RESOURCE.openai.azure.com/
chat_api_version: "2024-02-01"
embeddings_provider: azure
embeddings_model: text-embedding-3-small
embeddings_endpoint: https://YOUR_RESOURCE.openai.azure.com/
embeddings_api_version: "2024-02-01"
# ── Option B: OpenAI ──────────────────────────────────────────
openai_provider: openai
chat_model: gpt-4o-mini
embeddings_provider: openai
embeddings_model: text-embedding-3-small
# ── Option C: ThetaEdgeCloud (Llama) + local embeddings ──────
# Use num_workers: 1 to avoid 409 Conflict (no concurrent requests)
openai_provider: thetaedgecloud
chat_model: meta-llama/Meta-Llama-3.1-70B-Instruct
embeddings_provider: sentence-transformers # free, runs offline
embeddings_model: all-MiniLM-L6-v2
```

ThetaEdgeCloud note: the API does not support concurrent requests. Always run with `--num-workers 1` when using this provider.
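The provider choice determines which `.env` key must be filled in, and this can be checked before a run. A minimal sketch with hypothetical names (`PROVIDER_KEYS`, `parse_env`, `missing_keys`): the mapping is read off the `.env` comments in this README, OmniSim's own credential loading may work differently, and the tiny parser handles only plain `KEY=VALUE` lines.

```python
# Env var each provider needs, inferred from the .env comments in this README.
PROVIDER_KEYS = {
    "openai": "OPENAI_KEY",
    "azure": "AZURE_KEY",
    "thetaedgecloud": "THETA_KEY",
    "sentence-transformers": None,  # local model, no key needed
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict[str, str], chat_provider: str, embeddings_provider: str) -> list[str]:
    """Return required key names that are absent or empty (KeyError on unknown provider)."""
    needed = {PROVIDER_KEYS[p] for p in (chat_provider, embeddings_provider)}
    needed.discard(None)
    return sorted(k for k in needed if not env.get(k))
```

For Option C, `missing_keys(parse_env(open(".env").read()), "thetaedgecloud", "sentence-transformers")` returns `[]` once `THETA_KEY` is set.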
Both built-in datasets ship with pre-computed embeddings, so no API calls are needed at index-build time:

- `data/imdb/items.csv` → `embedding_vector` column (dim 1536)
- `data/hm/handm.csv` → `embedding_vector` column (dim 1536)
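A quick sanity check on a pre-computed embedding column can catch dimension mismatches before indexing. This is a sketch: `check_embeddings` is a hypothetical helper, and it assumes each `embedding_vector` cell is a JSON-style list such as `[0.12, -0.03, ...]`, a storage format this README does not specify.

```python
import csv
import json

EXPECTED_DIM = 1536  # dimension stated for both bundled datasets


def check_embeddings(path: str, column: str = "embedding_vector",
                     expected_dim: int = EXPECTED_DIM) -> int:
    """Return the number of rows checked; raise ValueError on a wrong-size vector."""
    csv.field_size_limit(2**31 - 1)  # vectors make long cells; raise the default limit
    n = 0
    with open(path, newline="", encoding="utf-8") as f:
        for n, row in enumerate(csv.DictReader(f), start=1):
            vec = json.loads(row[column])  # assumes a JSON-style list of floats
            if len(vec) != expected_dim:
                raise ValueError(f"row {n}: expected dim {expected_dim}, got {len(vec)}")
    return n
```

For example, `check_embeddings("data/imdb/items.csv")` would report how many rows carry a 1536-dimensional vector, under the format assumption above.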
```bash
# IMDB movies
python run.py --config configs/imdb/imdb.yaml --build-index
# H&M fashion
python run.py --config configs/hm/hm.yaml --build-index
```

```bash
# All enabled strategies, default mode (set in system.yaml)
python run.py --config configs/imdb/imdb.yaml
# Override dialogue mode
python run.py --config configs/imdb/imdb.yaml --mode adaptive
# One specific strategy
python run.py --config configs/imdb/imdb.yaml --strategy user_item_pairs
# Quick smoke-test: 2 pairs, 1 conversation each, serial execution
python run.py --config configs/imdb/imdb.yaml \
  --pairs-file configs/imdb/inputs/test_pairs.csv \
  --chats-per-entry 1 --num-workers 1
# Override parallel workers
python run.py --config configs/imdb/imdb.yaml --num-workers 4
```

Generated conversations are saved to:
```text
chats/{dataset}/{mode}/{strategy}/{run_timestamp}/
    {user_id}-{item_id}-{turns}-{attempts}-{succeed}-{timestamp}.txt
```

`succeed=1` means the target item was successfully recommended and accepted.
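Because the outcome is encoded in the file name, a run's success rate can be computed without opening any file. A sketch with hypothetical helpers (`parse_chat_name`, `success_rate`), assuming `user_id` and `item_id` contain no `-`; the timestamp may, since everything after the fifth `-` is kept as the timestamp.

```python
from pathlib import Path
from typing import NamedTuple


class ChatFile(NamedTuple):
    user_id: str
    item_id: str
    turns: int
    attempts: int
    succeed: bool
    timestamp: str


def parse_chat_name(name: str) -> ChatFile:
    """Split '{user_id}-{item_id}-{turns}-{attempts}-{succeed}-{timestamp}.txt'."""
    stem = name.removesuffix(".txt")
    user_id, item_id, turns, attempts, succeed, timestamp = stem.split("-", 5)
    return ChatFile(user_id, item_id, int(turns), int(attempts), succeed == "1", timestamp)


def success_rate(run_dir: str) -> float:
    """Fraction of conversations in a run folder whose target item was accepted."""
    chats = [parse_chat_name(p.name) for p in Path(run_dir).glob("*.txt")]
    return sum(c.succeed for c in chats) / len(chats) if chats else 0.0
```

Point `success_rate` at a single `{run_timestamp}` folder, e.g. `chats/imdb/adaptive/user_item_pairs/<run_timestamp>`.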
| Mode | Description | Best for |
|---|---|---|
| `free` | User describes preferences in open natural language | Maximum diversity |
| `static` | Bot asks about predefined metadata attributes (genre, language, …) | Slot-filling systems |
| `adaptive` | Like static, but bot also discovers additional relevant attributes via LLM | Most realistic |
Set `mode_refinement` in `system.yaml` or pass `--mode` at runtime.
Base retrieval score:
Note that we added a BM25 score; our UMAP'26 paper used cosine similarity only. All scores are normalized before being combined in the linear weighted formulas.
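$$\tilde{S}_{\text{base}}(q, i) = \lambda \cdot \tilde{S}_{\text{BM25}}(q, M_{i,d}) + (1 - \lambda) \cdot \tilde{S}_{\cos}(q, i)$$

Hybrid score with user profile:

$$\tilde{S}_{\text{hybrid}} = \alpha \cdot \tilde{S}_{\text{base}} + (1 - \alpha) \cdot \left[\beta \cdot \tilde{S}_{\cos}(i, p_{\text{short}}) + (1 - \beta) \cdot \tilde{S}_{\cos}(i, p_{\text{long}})\right]$$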
Threshold gate: if max(S̃_base) < τ across all retrieved candidates, the system stays in preference-elicitation mode and asks another clarifying question instead of recommending.
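As a worked example, take the defaults λ = 0.3, α = 0.7, β = 0.3 from the parameter table and purely illustrative normalized scores S̃_BM25 = 0.8, S̃_cos(q, i) = 0.6, S̃_cos(i, p_short) = 0.5, S̃_cos(i, p_long) = 0.9:

```math
\begin{aligned}
\tilde{S}_{\text{base}} &= 0.3 \cdot 0.8 + 0.7 \cdot 0.6 = 0.24 + 0.42 = 0.66 \\
\tilde{S}_{\text{hybrid}} &= 0.7 \cdot 0.66 + 0.3 \cdot \left[0.3 \cdot 0.5 + 0.7 \cdot 0.9\right] \\
&= 0.462 + 0.3 \cdot 0.78 = 0.462 + 0.234 = 0.696
\end{aligned}
```

A candidate set whose best S̃_base is 0.66 clears the default τ = 0.3, so the system is allowed to recommend rather than ask another question.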
| Parameter | Default | Meaning |
|---|---|---|
| `lambda_bm25` | 0.3 | λ: BM25 vs. cosine balance |
| `weight_es_score` | 0.7 | α: retrieval vs. user-profile weight |
| `weight_user_taste_short` | 0.3 | β: short-term vs. long-term preference ratio |
| `threshold_similarity` | 0.3 | τ: minimum max(S̃_base) required to trigger a recommendation |
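The two formulas and the threshold gate can be sketched in Python as follows. This is an illustration, not OmniSim's code: it takes scores that are already normalized (the normalization method is not described here), uses the parameter defaults above, and assumes that once the gate passes, candidates are ranked by S̃_hybrid. `Weights`, `base_score`, `hybrid_score` and `pick` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    lambda_bm25: float = 0.3              # λ: BM25 vs cosine balance
    weight_es_score: float = 0.7          # α: retrieval vs user-profile weight
    weight_user_taste_short: float = 0.3  # β: short- vs long-term preference
    threshold_similarity: float = 0.3     # τ: gate on max(S̃_base)


def base_score(bm25: float, cos_q: float, w: Weights) -> float:
    """S̃_base = λ·S̃_BM25 + (1 − λ)·S̃_cos(q, i); inputs already normalized."""
    return w.lambda_bm25 * bm25 + (1 - w.lambda_bm25) * cos_q


def hybrid_score(base: float, cos_short: float, cos_long: float, w: Weights) -> float:
    """S̃_hybrid = α·S̃_base + (1 − α)·[β·S̃_cos(i, p_short) + (1 − β)·S̃_cos(i, p_long)]."""
    b = w.weight_user_taste_short
    profile = b * cos_short + (1 - b) * cos_long
    return w.weight_es_score * base + (1 - w.weight_es_score) * profile


def pick(candidates: dict[str, tuple[float, float, float, float]],
         w: Weights = Weights()) -> str | None:
    """Return the item to recommend, or None to keep eliciting preferences.

    candidates maps item_id -> (bm25, cos_q, cos_short, cos_long), all normalized.
    """
    if not candidates:
        return None
    bases = {i: base_score(s[0], s[1], w) for i, s in candidates.items()}
    if max(bases.values()) < w.threshold_similarity:
        return None  # threshold gate: ask another clarifying question
    # Assumption: candidates that pass the gate are ranked by the hybrid score.
    return max(candidates, key=lambda i: hybrid_score(bases[i], candidates[i][2], candidates[i][3], w))
```

Note that the profile terms can reorder candidates: an item with a lower S̃_base can still win if it matches the user's short- and long-term profiles closely.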
Compute the NLP lexical-diversity metrics:

```bash
python scripts/compute_nlp_metrics.py \
  --folder chats/imdb/adaptive \
  --output results/adaptive_metrics.csv
```

Run the LLM-as-a-Judge evaluation:

```bash
python scripts/run_judge_all.py \
  --folder chats/imdb/adaptive/user_item_pairs/ALL \
  --output results/judge_adaptive.csv \
  --limit 50
```
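For intuition about what two of the lexical-diversity numbers measure, here is a minimal sketch of Distinct-n and type-token ratio (TTR). It assumes lowercase whitespace tokenization; `scripts/compute_nlp_metrics.py` may tokenize and aggregate differently, and MTLD and HDD are not covered.

```python
def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; the project's script may differ.
    return text.lower().split()


def distinct_n(tokens: list[str], n: int) -> float:
    """Distinct-n: number of unique n-grams divided by total n-grams."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0


def ttr(tokens: list[str]) -> float:
    """Type-token ratio: unique tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Higher values mean less repetition; for example, the utterance "I like action movies I like comedies" has 7 tokens but only 5 distinct ones, giving a TTR of 5/7.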
OmniSim ships with a Streamlit dashboard for exploring generated conversations and comparing dialogue modes side-by-side.
Install dashboard dependencies (if not already installed):
```bash
pip install "streamlit>=1.32.0" "plotly>=5.18.0"
```

Launch:

```bash
streamlit run UI/dashboard.py
# or with explicit host/port:
streamlit run UI/dashboard.py --server.port 8501 --server.address 0.0.0.0
```

Open http://localhost:8501 in your browser.
What you can explore:
| Panel | Description |
|---|---|
| Conversation Browser | Read any generated .txt file directly in the browser |
| Success Rate | Per-mode success rates and recommendation attempt distributions |
| Turn Statistics | Average turns, question counts, chit-chat frequency |
| NLP Metrics | Distinct-1/2, TTR, MTLD, HDD — compare Free vs Static vs Adaptive |
| LLM Judge Scores | Fluency, Conversational Quality, Content Quality (1–5 scale) |
The dashboard reads from the chats/ folder produced by run.py. Point it
at any run subfolder after a simulation completes.
- Prepare `items.csv` with, at minimum, an ID column, a title column, and a descriptive text column.
- Optionally add `users.csv` (demographics) and `interactions.csv` (ratings) to enable personalisation.
- Copy `configs/imdb/imdb.yaml` to `configs/mydata/mydata.yaml` and update:
  - `es_index`, `dataset`
  - `col_itemid`, `col_title`, `col_category`, `col_details`
  - `embedding_fields`, `item_attributes` (BM25 uses the auto-built `bm25_details` field)
  - `role_user`, `role_bot`
- Build the index:
  ```bash
  python run.py --config configs/mydata/mydata.yaml --build-index
  ```
- Run:
  ```bash
  python run.py --config configs/mydata/mydata.yaml
  ```
If your `items.csv` does not have a pre-computed embedding column, set `precomputed_embedding_col: ~` in your dataset config. OmniSim will then compute embeddings via the configured `embeddings_provider` at index-build time.
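Before building the index for a new dataset, it can help to confirm that the CSV actually contains the columns named in your config. A minimal sketch: `validate_items` is a hypothetical helper, and the column names you pass stand in for whatever you set as `col_itemid`, `col_title` and `col_details`.

```python
import csv


def validate_items(path: str, required: list[str]) -> dict:
    """Report required columns missing from the header and rows with blank required fields."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in required if c not in header]
        present = [c for c in required if c in header]
        # Short rows yield None for absent fields, so treat None as blank too.
        blank_rows = sum(
            1 for row in reader if any(not (row[c] or "").strip() for c in present)
        )
    return {"missing_columns": missing, "rows_with_blanks": blank_rows}
```

For example, `validate_items("data/mydata/items.csv", ["item_id", "title", "description"])` uses placeholder path and column names; substitute your own.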
If you use OmniSim in your research, please cite our paper below:
```bibtex
@article{zheng2026omnisim,
  title = {OmniSim: A LLM-Powered Open-Source Simulator for Generating Personalized
           and Adaptive Conversational Recommendation Dialogues},
  author = {Zheng, Yong and Zhang, Jian},
  journal = {Proceedings of the 34th ACM Conference on User Modeling, Adaptation and Personalization (UMAP)},
  year = {2026}
}
```

Licensed under Apache 2.0; see `LICENSE`.
