11 changes: 11 additions & 0 deletions .env.example
@@ -0,0 +1,11 @@
# Example Environment Variables for Lightspeed Stack
# Copy this file to .env and set appropriate values

# Required: User anonymization pepper (set to a secure random value)
# This is used for HMAC-based user ID hashing to protect user privacy
USER_ANON_PEPPER=your-secure-random-string-here

# Optional: OpenAI API Key (if using OpenAI models)
OPENAI_API_KEY=your-openai-api-key

# Optional: Other environment variables as needed for your configuration
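
One way to produce a suitable value for `USER_ANON_PEPPER` is Python's standard `secrets` module. This is a minimal sketch: the 32-byte length is an assumption rather than a documented project requirement, and `generate_pepper` is a hypothetical helper name that does not appear in this PR.

```python
import secrets


def generate_pepper(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from `num_bytes` random bytes."""
    # 32 bytes (256 bits) is an assumed length, not a project requirement.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into .env as USER_ANON_PEPPER=<value>
    print(generate_pepper())
```

The value should be generated once per environment and stored as a secret. Changing it later would presumably change every derived hash, so existing users would no longer map to their earlier records.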
11 changes: 11 additions & 0 deletions README.md
@@ -90,6 +90,17 @@ Lightspeed Core Stack is based on the FastAPI framework (Uvicorn). The service i
- please note that currently Python 3.14 is not officially supported
- all sources are made (backward) compatible with Python 3.12; it is checked on CI

## Environment Variables

The following environment variable is required for the service to start:

* `USER_ANON_PEPPER` - A secure random string used for user anonymization. Set this to a cryptographically secure random value:
```bash
export USER_ANON_PEPPER="your-secure-random-string-here"
```

**Security Note**: This value should be treated as a secret and kept secure. It's used for HMAC-based user ID hashing to protect user privacy while enabling usage analytics.

# Installation

Installation steps depend on the operating system. Please see the instructions for your system:
1 change: 1 addition & 0 deletions docker-compose.yaml
@@ -30,6 +30,7 @@ services:
- ./lightspeed-stack.yaml:/app-root/lightspeed-stack.yaml:Z
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
- USER_ANON_PEPPER=${USER_ANON_PEPPER:-default-pepper-for-development-only}
🛠️ Refactor suggestion

Remove insecure default pepper fallback.

Providing a default pepper invites accidental production deployments with a shared key, breaking privacy guarantees and cross-environment isolation.

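Use a required variable and fail fast if it's missing. Note that a bare ${USER_ANON_PEPPER} only produces a warning and an empty value when the variable is unset; the :? form makes Compose exit with an error:

-      - USER_ANON_PEPPER=${USER_ANON_PEPPER:-default-pepper-for-development-only}
+      - USER_ANON_PEPPER=${USER_ANON_PEPPER:?USER_ANON_PEPPER must be set}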

Optional (outside this line): keep a dev-only override in docker-compose.override.yaml or an .env file that is not committed.

🤖 Prompt for AI Agents
In docker-compose.yaml around line 33, the service currently falls back to a
hardcoded default pepper, which is insecure; remove the default fallback and use
the required-variable form ${USER_ANON_PEPPER:?USER_ANON_PEPPER must be set} so
that docker-compose exits with an error if the variable is unset or empty, and
ensure any developer-only value is provided via a non-committed .env or
docker-compose.override.yaml instead.

depends_on:
llama-stack:
condition: service_healthy
24 changes: 19 additions & 5 deletions src/app/endpoints/conversations.py
@@ -22,6 +22,7 @@
)
from utils.endpoints import check_configuration_loaded, validate_conversation_ownership
from utils.suid import check_suid
from utils.user_anonymization import get_anonymous_user_id

logger = logging.getLogger("app.endpoints.handlers")
router = APIRouter(tags=["conversations"])
@@ -158,7 +159,13 @@ async def get_conversations_list_endpoint_handler(

user_id, _, _ = auth

logger.info("Retrieving conversations for user %s", user_id)
# Get anonymous user ID for database lookup
anonymous_user_id = get_anonymous_user_id(user_id)

logger.info(
"Retrieving conversations for anonymous user %s",
anonymous_user_id,
)

with get_session() as session:
try:
@@ -167,7 +174,7 @@ async def get_conversations_list_endpoint_handler(
filtered_query = (
query
if Action.LIST_OTHERS_CONVERSATIONS in request.state.authorized_actions
else query.filter_by(user_id=user_id)
else query.filter_by(anonymous_user_id=anonymous_user_id)
)

user_conversations = filtered_query.all()
@@ -190,20 +197,27 @@ async def get_conversations_list_endpoint_handler(
]

logger.info(
"Found %d conversations for user %s", len(conversations), user_id
"Found %d conversations for anonymous user %s",
len(conversations),
anonymous_user_id,
)

return ConversationsListResponse(conversations=conversations)

except Exception as e:
logger.exception(
"Error retrieving conversations for user %s: %s", user_id, e
"Error retrieving conversations for anonymous user %s: %s",
anonymous_user_id,
e,
)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail={
"response": "Unknown error",
"cause": f"Unknown error while getting conversations for user {user_id}",
"cause": (
f"Unknown error while getting conversations for "
f"anonymous user {anonymous_user_id}"
),
},
) from e

10 changes: 8 additions & 2 deletions src/app/endpoints/feedback.py
@@ -21,6 +21,7 @@
ForbiddenResponse,
)
from utils.suid import get_suid
from utils.user_anonymization import get_anonymous_user_id

logger = logging.getLogger(__name__)
router = APIRouter(prefix="/feedback", tags=["feedback"])
@@ -134,7 +135,8 @@ def store_feedback(user_id: str, feedback: dict) -> None:
user_id (str): Unique identifier of the user submitting feedback.
feedback (dict): Feedback data to be stored, merged with user ID and timestamp.
"""
logger.debug("Storing feedback for user %s", user_id)
anonymous_user_id = get_anonymous_user_id(user_id)
logger.debug("Storing feedback for anonymous user %s", anonymous_user_id)
# Creates storage path only if it doesn't exist. The `exist_ok=True` prevents
# race conditions in case of multiple server instances trying to set up storage
# at the same location.
@@ -144,7 +146,11 @@ def store_feedback(user_id: str, feedback: dict) -> None:
storage_path.mkdir(parents=True, exist_ok=True)

current_time = str(datetime.now(UTC))
data_to_store = {"user_id": user_id, "timestamp": current_time, **feedback}
data_to_store = {
"anonymous_user_id": anonymous_user_id,
"timestamp": current_time,
**feedback,
}
Comment on lines 148 to +153
🛠️ Refactor suggestion

Prevent client payload from overriding server-controlled fields + use ISO 8601 timestamps.

With the current dict merge order, keys in feedback can overwrite anonymous_user_id and timestamp, allowing a client to spoof these values and corrupt analytics. Also, prefer isoformat() for stable, machine-readable timestamps.

Apply this diff:

-    current_time = str(datetime.now(UTC))
-    data_to_store = {
-        "anonymous_user_id": anonymous_user_id,
-        "timestamp": current_time,
-        **feedback,
-    }
+    current_time = datetime.now(UTC).isoformat()
+    # Ensure server-controlled fields cannot be overridden by client payload
+    data_to_store = {
+        **feedback,
+        "anonymous_user_id": anonymous_user_id,
+        "timestamp": current_time,
+    }
🤖 Prompt for AI Agents
In src/app/endpoints/feedback.py around lines 145 to 150, the current dict merge
lets client-sent feedback override server-controlled keys and uses
str(datetime.now(UTC)); change the merge order so feedback is merged first and
then the server-controlled fields overwrite it (i.e., build dict as {**feedback,
"anonymous_user_id": anonymous_user_id, "timestamp": timestamp}), and replace
str(datetime.now(UTC)) with datetime.now(UTC).isoformat() to produce a stable
ISO 8601 timestamp; ensure UTC is the timezone object used.
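
The effect of the merge order can be checked in isolation. A small sketch, assuming a hypothetical helper `build_feedback_record` (the PR builds the dict inline in `store_feedback`); `timezone.utc` is used here as the portable equivalent of the `UTC` alias imported in the PR:

```python
from datetime import datetime, timezone


def build_feedback_record(anonymous_user_id: str, feedback: dict) -> dict:
    """Merge client feedback with server-controlled fields; server values win."""
    return {
        **feedback,  # client payload is unpacked first...
        "anonymous_user_id": anonymous_user_id,  # ...so these later keys overwrite it
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# A client payload that tries to spoof the server-controlled fields.
spoofed = {"anonymous_user_id": "someone-else", "timestamp": "1970-01-01", "sentiment": 1}
record = build_feedback_record("anon-123", spoofed)
print(record["anonymous_user_id"])  # anon-123
```

In a dict display, later keys replace earlier ones, so placing `**feedback` first means that a payload containing `anonymous_user_id` or `timestamp` cannot change what is stored.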


# stores feedback in a file under unique uuid
feedback_file_path = storage_path / f"{get_suid()}.json"
28 changes: 10 additions & 18 deletions src/app/endpoints/query.py
@@ -36,6 +36,7 @@
get_system_prompt,
validate_conversation_ownership,
)
from utils.user_anonymization import get_anonymous_user_id
from utils.mcp_headers import mcp_headers_dependency, handle_mcp_headers_with_toolgroups
from utils.transcripts import store_transcript
from utils.types import TurnSummary
@@ -78,25 +79,30 @@ def is_transcripts_enabled() -> bool:
def persist_user_conversation_details(
user_id: str, conversation_id: str, model: str, provider_id: str
) -> None:
"""Associate conversation to user in the database."""
"""Associate conversation to user in the database using anonymous user ID."""
# Get anonymous user ID for database storage
anonymous_user_id = get_anonymous_user_id(user_id)

with get_session() as session:
existing_conversation = (
session.query(UserConversation)
.filter_by(id=conversation_id, user_id=user_id)
.filter_by(id=conversation_id, anonymous_user_id=anonymous_user_id)
.first()
)

if not existing_conversation:
conversation = UserConversation(
id=conversation_id,
user_id=user_id,
anonymous_user_id=anonymous_user_id,
last_used_model=model,
last_used_provider=provider_id,
message_count=1,
)
session.add(conversation)
logger.debug(
"Associated conversation %s to user %s", conversation_id, user_id
"Associated conversation %s to anonymous user %s",
conversation_id,
anonymous_user_id,
)
else:
existing_conversation.last_used_model = model
@@ -189,20 +195,6 @@ async def query_endpoint_handler(
),
)

if user_conversation is None:
logger.warning(
"User %s attempted to query conversation %s they don't own",
user_id,
query_request.conversation_id,
)
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail={
"response": "Access denied",
"cause": "You do not have permission to access this conversation",
},
)

try:
# try to get Llama Stack client
client = AsyncLlamaStackClientHolder().get_client()
14 changes: 0 additions & 14 deletions src/app/endpoints/streaming_query.py
@@ -552,20 +552,6 @@ async def streaming_query_endpoint_handler( # pylint: disable=too-many-locals
user_id=user_id, conversation_id=query_request.conversation_id
)

if user_conversation is None:
logger.warning(
"User %s attempted to query conversation %s they don't own",
user_id,
query_request.conversation_id,
)
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail={
"response": "Access denied",
"cause": "You do not have permission to access this conversation",
},
)

try:
# try to get Llama Stack client
client = AsyncLlamaStackClientHolder().get_client()
8 changes: 5 additions & 3 deletions src/models/database/conversations.py
@@ -3,7 +3,7 @@
from datetime import datetime

from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy import DateTime, func
from sqlalchemy import DateTime, func, ForeignKey

from models.database.base import Base

@@ -16,8 +16,10 @@ class UserConversation(Base): # pylint: disable=too-few-public-methods
# The conversation ID
id: Mapped[str] = mapped_column(primary_key=True)

# The user ID associated with the conversation
user_id: Mapped[str] = mapped_column(index=True)
# The anonymous user ID associated with the conversation
anonymous_user_id: Mapped[str] = mapped_column(
ForeignKey("user_mapping.anonymous_id"), index=True, nullable=False
)

# The last provider/model used in the conversation
last_used_model: Mapped[str] = mapped_column()
32 changes: 32 additions & 0 deletions src/models/database/user_mapping.py
@@ -0,0 +1,32 @@
"""User ID anonymization mapping model."""

from datetime import datetime

from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy import DateTime, func, Index, String

from models.database.base import Base


class UserMapping(Base): # pylint: disable=too-few-public-methods
"""Model for mapping real user IDs to anonymous UUIDs."""

__tablename__ = "user_mapping"

# Anonymous UUID used for all storage/analytics (primary key)
anonymous_id: Mapped[str] = mapped_column(
String(36), primary_key=True, nullable=False
)

# Original user ID from authentication (hashed for security)
user_id_hash: Mapped[str] = mapped_column(
String(64), index=True, unique=True, nullable=False
)

created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True),
server_default=func.now(), # pylint: disable=not-callable
)

# Index for efficient lookups
__table_args__ = (Index("ix_user_mapping_hash_lookup", "user_id_hash"),)
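
The body of `utils.user_anonymization.get_anonymous_user_id` is not shown in this diff. Based only on the column definitions above (a 64-character `user_id_hash` and a 36-character `anonymous_id`) and the README's mention of HMAC-based hashing, the lookup might work roughly as sketched below. Everything here is an assumption: the choice of HMAC-SHA256, the `hash_user_id` helper, and the dict-backed `InMemoryUserMapping` class, which stands in for the `user_mapping` table.

```python
import hashlib
import hmac
import uuid


def hash_user_id(user_id: str, pepper: str) -> str:
    """HMAC-SHA256 of the user ID keyed by the pepper, as 64 hex characters."""
    return hmac.new(pepper.encode(), user_id.encode(), hashlib.sha256).hexdigest()


class InMemoryUserMapping:
    """Dict-backed stand-in for the user_mapping table (illustration only)."""

    def __init__(self, pepper: str) -> None:
        self._pepper = pepper
        self._by_hash: dict[str, str] = {}  # user_id_hash -> anonymous_id

    def get_anonymous_user_id(self, user_id: str) -> str:
        """Return the existing anonymous ID for this user, or create one."""
        key = hash_user_id(user_id, self._pepper)
        # setdefault keeps the first UUID assigned to a given hash.
        return self._by_hash.setdefault(key, str(uuid.uuid4()))
```

Under these assumptions the stored anonymous ID is a random UUID, so it reveals nothing about the original ID, while the keyed hash lets the service find the same row again for a returning user.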
41 changes: 35 additions & 6 deletions src/utils/endpoints.py
@@ -13,6 +13,7 @@
from configuration import AppConfig
from utils.suid import get_suid
from utils.types import GraniteToolParser
from utils.user_anonymization import get_anonymous_user_id


logger = logging.getLogger("utils.endpoints")
@@ -21,23 +22,51 @@
def validate_conversation_ownership(
user_id: str, conversation_id: str, others_allowed: bool = False
) -> UserConversation | None:
"""Validate that the conversation belongs to the user.
"""
Validate that the conversation belongs to the user using anonymous ID lookup.

Validates that the conversation with the given ID belongs to the user with the given ID.
If `others_allowed` is True, it allows conversations that do not belong to the user,
which is useful for admin access.

Returns the conversation object if valid, raises HTTPException if not.
"""
# Get anonymous user ID for database lookup
anonymous_user_id = get_anonymous_user_id(user_id)

with get_session() as session:
conversation_query = session.query(UserConversation)

filtered_conversation_query = (
conversation_query.filter_by(id=conversation_id)
if others_allowed
else conversation_query.filter_by(id=conversation_id, user_id=user_id)
)
if others_allowed:
# If others_allowed is True, we can access any conversation by ID
filtered_conversation_query = conversation_query.filter_by(
id=conversation_id
)
else:
# If others_allowed is False, we can only access conversations belonging to this user
filtered_conversation_query = conversation_query.filter_by(
id=conversation_id, anonymous_user_id=anonymous_user_id
)

conversation: UserConversation | None = filtered_conversation_query.first()

if conversation is None:
logger.warning(
"User %s attempted to access conversation %s they don't own",
user_id,
conversation_id,
)
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail={
"response": "Forbidden: conversation does not belong to user",
"cause": (
f"User {user_id} does not have access to "
f"conversation {conversation_id}"
),
},
)

return conversation


20 changes: 14 additions & 6 deletions src/utils/transcripts.py
@@ -14,15 +14,16 @@
from models.requests import Attachment, QueryRequest
from utils.suid import get_suid
from utils.types import TurnSummary
from utils.user_anonymization import get_anonymous_user_id

logger = logging.getLogger("utils.transcripts")


def construct_transcripts_path(user_id: str, conversation_id: str) -> Path:
"""Construct path to transcripts."""
def construct_transcripts_path(anonymous_user_id: str, conversation_id: str) -> Path:
"""Construct path to transcripts using anonymous user ID."""
# these two normalizations are required by Snyk as it detects
# this Path sanitization pattern
uid = os.path.normpath("/" + user_id).lstrip("/")
uid = os.path.normpath("/" + anonymous_user_id).lstrip("/")
cid = os.path.normpath("/" + conversation_id).lstrip("/")
file_path = (
configuration.user_data_collection_configuration.transcripts_storage or ""
@@ -46,7 +47,7 @@ def store_transcript( # pylint: disable=too-many-arguments,too-many-positional-
"""Store transcript in the local filesystem.

Args:
user_id: The user ID (UUID).
user_id: The original user ID from authentication (will be anonymized).
conversation_id: The conversation ID (UUID).
query_is_valid: The result of the query validation.
query: The query (without attachments).
@@ -56,7 +57,14 @@ def store_transcript( # pylint: disable=too-many-arguments,too-many-positional-
truncated: The flag indicating if the history was truncated.
attachments: The list of `Attachment` objects.
"""
transcripts_path = construct_transcripts_path(user_id, conversation_id)
# Get anonymous user ID for storage
anonymous_user_id = get_anonymous_user_id(user_id)
logger.debug(
"Storing transcript for anonymous user %s",
anonymous_user_id,
)

transcripts_path = construct_transcripts_path(anonymous_user_id, conversation_id)
transcripts_path.mkdir(parents=True, exist_ok=True)

data_to_store = {
@@ -65,7 +73,7 @@ def store_transcript( # pylint: disable=too-many-arguments,too-many-positional-
"model": model_id,
"query_provider": query_request.provider,
"query_model": query_request.model,
"user_id": user_id,
"anonymous_user_id": anonymous_user_id, # Store anonymous ID only
"conversation_id": conversation_id,
"timestamp": datetime.now(UTC).isoformat(),
},