fix: adds onboarding rollback to allow user to edit embedding provider if ingestion fails by lucaseduoli · Pull Request #617 · langflow-ai/openrag

lucaseduoli · 2025-12-05T18:50:18Z

This pull request introduces several improvements to onboarding, health checks, and query logic for the chat application. The main focus is on robust error handling during onboarding, enhanced provider health checks, and ensuring queries only run when appropriate (e.g., after onboarding and when providers are healthy). It also adds a mutation to roll back onboarding if ingestion tasks fail. These changes collectively improve user experience, reliability, and correctness of the onboarding and chat flows.

Onboarding error handling and rollback:

Added a new useOnboardingRollbackMutation hook to allow rolling back onboarding when ingestion tasks fail, and integrated it into the onboarding card logic to automatically handle failed sample data files and reset onboarding steps with clear error messaging.
Ensured provider-specific onboarding configuration only marks a provider as "already configured" when the selected provider matches, preventing incorrect UI state.

Copilot

Pull request overview

This PR introduces robust error handling for the onboarding flow by adding a rollback mechanism when ingestion tasks fail, improves provider validation with detailed error messages, and ensures queries only run when appropriate (after onboarding completion and with healthy providers).

Added backend rollback endpoint to cancel tasks, delete ingested files, and reset configuration when onboarding fails
Enhanced provider validation to use full health checks during onboarding and improved error message extraction for all providers
Added onboarding completion state tracking in frontend to gate queries and prevent them from running during onboarding or when providers are unhealthy

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `src/utils/telemetry/message_id.py` | Added telemetry event for onboarding rollback |
| `src/main.py` | Added `/onboarding/rollback` POST endpoint for triggering rollback |
| `src/api/settings.py` | Implemented `rollback_onboarding` function to cancel tasks, delete documents, and reset config; changed onboarding validation to use full health checks |
| `src/api/provider_validation.py` | Added error parsing helpers and improved error messages for all provider validation failures |
| `frontend/contexts/chat-context.tsx` | Added `isOnboardingComplete` state tracking based on settings and localStorage |
| `frontend/components/provider-health-banner.tsx` | Improved error message display to handle duplicate errors gracefully |
| `frontend/components/chat-renderer.tsx` | Calls `setOnboardingComplete(true)` when onboarding finishes |
| `frontend/app/onboarding/_components/onboarding-card.tsx` | Added rollback mutation hook and logic to detect failed files (though mutation is never invoked) |
| `frontend/app/onboarding/_components/ibm-onboarding.tsx` | Only formatting changes (spaces to tabs) |
| `frontend/app/api/queries/useProviderHealthQuery.ts` | Added onboarding completion and active ingestion checks to gate health queries |
| `frontend/app/api/queries/useGetNudgesQuery.ts` | Added onboarding completion and LLM health checks to gate nudges queries |
| `frontend/app/api/queries/useGetConversationsQuery.ts` | Added onboarding completion check to gate conversations queries |
| `frontend/app/api/mutations/useOnboardingRollbackMutation.ts` | New mutation hook for calling the rollback endpoint |

Copilot · 2025-12-05T22:13:57Z

+				(!!debouncedEndpoint && !!debouncedApiKey && !!debouncedProjectId) ||
+				getFromEnv ||


The enabled condition for fetching IBM models has a logical flaw. It enables the query when getFromEnv is true OR when alreadyConfigured is true, even if the required credentials (endpoint, apiKey, projectId) are not provided.

When getFromEnv is true but the user hasn't provided endpoint or projectId, the query will be enabled but will likely fail or return no useful data. The correct logic should ensure all required fields are present when not using environment variables:

enabled: (!!debouncedEndpoint && !!debouncedProjectId && (getFromEnv || !!debouncedApiKey)) || alreadyConfigured,

This ensures the query only runs when:

Endpoint AND projectId are provided AND (env key is used OR api key is provided), OR

Already configured (reusing existing credentials)

Suggested change

-				(!!debouncedEndpoint && !!debouncedApiKey && !!debouncedProjectId) ||
-				getFromEnv ||
+				(
+					!!debouncedEndpoint &&
+					!!debouncedProjectId &&
+					(getFromEnv || !!debouncedApiKey)
+				) ||

Copilot · 2025-12-05T22:13:57Z

+  // Include hasChatError in queryKey so React Query refetches when it changes
+  // This ensures the health check runs with test_completion=true when chat errors occur
+  const testCompletion = params?.test_completion ?? hasChatError;
+  const queryKey = ["provider", "health", testCompletion, hasChatError];


The queryKey includes hasChatError redundantly. Line 117 computes testCompletion = params?.test_completion ?? hasChatError, and then line 118 creates the queryKey as ["provider", "health", testCompletion, hasChatError].

When params?.test_completion is undefined, testCompletion already equals hasChatError, so the extra key entry carries no new information. When params?.test_completion is set explicitly, the extra entry changes the key, and so triggers a refetch, whenever hasChatError changes, even though testCompletion itself has not changed.

Consider simplifying to:

const queryKey = ["provider", "health", testCompletion];

Suggested change

-  const queryKey = ["provider", "health", testCompletion, hasChatError];
+  const queryKey = ["provider", "health", testCompletion];

Copilot · 2025-12-05T22:13:58Z

+                yaml.dump(current_config.to_dict(), f, default_flow_style=False, indent=2)
+
+            # Update cached config
+            config_manager._config = current_config


Directly accessing the private attribute config_manager._config violates encapsulation and could lead to maintenance issues if the internal implementation of config_manager changes. While this approach works to bypass the save_config_file method (which would set edited=True), it's accessing a private implementation detail.

Consider adding a public method to config_manager like save_config_with_edited_flag(config, edited=True) to allow saving the configuration with explicit control over the edited flag, rather than directly accessing _config.

Suggested change

-            config_manager._config = current_config
+            config_manager.save_config_with_edited_flag(current_config, edited=False)
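
The public-method suggestion could look roughly like the following sketch. `Config`, `ConfigManager`, the `get_config` accessor, and the field names are hypothetical stand-ins rather than the project's actual classes, and JSON replaces the YAML dump only so the example runs on the standard library.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Config:
    """Hypothetical stand-in for the real configuration object."""

    embedding_provider: str = "openai"
    edited: bool = False


class ConfigManager:
    """Hypothetical manager; the real class and its fields may differ."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._config = Config()

    def save_config_with_edited_flag(self, config: Config, edited: bool) -> None:
        # Write the file and refresh the cache in one place, so callers
        # never need to assign to the private _config attribute.
        config.edited = edited
        # Assumption: JSON is used here instead of the YAML dump in the PR.
        self._path.write_text(json.dumps(asdict(config), indent=2))
        self._config = config

    def get_config(self) -> Config:
        return self._config
```

A rollback handler would then pass `edited=False` explicitly instead of writing the file by hand and patching the cache afterwards.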

Copilot · 2025-12-05T22:13:58Z

+                            details += f" (code: {code})"
+                        return details
+
+            # Anthropic format: {"error": {"message": "...", "type": "..."}}


The comment on line 87 says "Anthropic format" but the code on lines 88-89 checks for a direct "message" field in error_data, not under an "error" object. This is inconsistent with the comment and doesn't match typical Anthropic error response format which usually has {"error": {"message": "...", "type": "..."}}.

This block (lines 88-89) actually handles a generic format where message is at the root level, not Anthropic-specific. The comment should be updated to reflect this or the code should be moved to a more appropriate location in the conditional chain.

Suggested change

-            # Anthropic format: {"error": {"message": "...", "type": "..."}}
+            # Generic format: {"message": "..."}
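
To make the distinction concrete, here is a minimal sketch of how the two shapes might be told apart once the body has been decoded. The function name `message_from_error_data` and the branch order are assumptions for illustration, not the code in `src/api/provider_validation.py`.

```python
from typing import Any, Dict, Optional


def message_from_error_data(error_data: Dict[str, Any]) -> Optional[str]:
    """Pick a readable message out of a decoded error body (hypothetical helper)."""
    nested = error_data.get("error")
    # Nested format: {"error": {"message": "...", "type": "..."}}
    if isinstance(nested, dict) and "message" in nested:
        return str(nested["message"])
    # Generic format: {"message": "..."}
    if "message" in error_data:
        return str(error_data["message"])
    # No recognizable message field.
    return None
```

Checking the nested shape first keeps each comment attached to the branch it actually describes, which is the mismatch the review comment points out.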

Copilot · 2025-12-05T22:13:58Z

+    };
+  }, [settings?.edited]);
+
+  const setOnboardingComplete = useCallback((complete: boolean) => {


The setOnboardingComplete function allows manually setting the onboarding completion state, but this can lead to inconsistency with the computed state in the useEffect (lines 133-153).

The useEffect computes isOnboardingComplete based on settings?.edited && !stepKeyExists, but calling setOnboardingComplete(true) directly overrides this without updating the underlying conditions (localStorage or settings). This means:

If the useEffect runs again (e.g., on settings change), it might reset the state back

The manual state and computed state can diverge

Consider either:

Removing the manual setOnboardingComplete and relying solely on the computed state from settings and localStorage

Or having setOnboardingComplete also update the underlying conditions (localStorage/settings) to keep them in sync

Suggested change

-  const setOnboardingComplete = useCallback((complete: boolean) => {
+  const setOnboardingComplete = useCallback((complete: boolean) => {
+    if (typeof window !== "undefined") {
+      if (complete) {
+        // Remove the onboarding step key to indicate completion
+        localStorage.removeItem(ONBOARDING_STEP_KEY);
+      } else {
+        // Set the onboarding step key to indicate onboarding is not complete
+        localStorage.setItem(ONBOARDING_STEP_KEY, "true");
+      }
+    }

Copilot · 2025-12-05T22:13:59Z

+    // If any file failed, show error and jump back one step (like onboardingMutation.onError)
+    // Only handle if we haven't already handled this task
+    if (
+      taskWithFailedFile && 
+      !rollbackMutation.isPending && 
+      !isCompleted &&
+      !handledFailedTasksRef.current.has(taskWithFailedFile.task_id)
+    ) {
+      console.error("File failed in task, jumping back one step", taskWithFailedFile);
+
+      // Mark this task as handled to prevent infinite loops
+      handledFailedTasksRef.current.add(taskWithFailedFile.task_id);
+
+      // Extract error messages from failed files
+      const errorMessages: string[] = [];
+      if (taskWithFailedFile.files) {
+        Object.values(taskWithFailedFile.files).forEach((file) => {
+          if ((file.status === "failed" || file.status === "error") && file.error) {
+            errorMessages.push(file.error);
+          }
+        });
+      }
+
+      // Also check task-level error
+      if (taskWithFailedFile.error) {
+        errorMessages.push(taskWithFailedFile.error);
+      }
+
+      // Use the first error message, or a generic message if no errors found
+      const errorMessage = errorMessages.length > 0
+        ? errorMessages[0]
+        : "Sample data file failed to ingest. Please try again with a different configuration.";
+
+      // Set error message and jump back one step (exactly like onboardingMutation.onError)
+      setError(errorMessage);
+      setCurrentStep(totalSteps);
+      // Jump back one step after 1 second (go back to the step before ingestion)
+      // For embedding: totalSteps is 4, ingestion is step 3, so go back to step 2
+      // For LLM: totalSteps is 3, ingestion is step 2, so go back to step 1
+      setTimeout(() => {
+        // Go back to the step before the last step (which is ingestion)
+        const previousStep = totalSteps > 1 ? totalSteps - 2 : 0;
+        setCurrentStep(previousStep);
+      }, 1000);
+      return;


The rollbackMutation is defined but never called. When a file failure is detected, the code sets an error message and navigates back a step, but it doesn't actually invoke rollbackMutation.mutate() to trigger the backend cleanup (canceling tasks, deleting ingested files, resetting config). This means the rollback endpoint added in src/main.py and the rollback_onboarding function in src/api/settings.py will never be executed, leaving the system in an inconsistent state.

You should call rollbackMutation.mutate() when a failed file is detected to properly clean up the backend state before navigating back.

Copilot · 2025-12-05T22:13:59Z

+            # Generic format: {"detail": "..."}
+            if "detail" in error_data:
+                return error_data["detail"]
+    except (json.JSONDecodeError, ValueError, TypeError):


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change

-    except (json.JSONDecodeError, ValueError, TypeError):
+    except (json.JSONDecodeError, ValueError, TypeError):
+        # Parsing failed; return the original error text as fallback.
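
As an alternative to a comment-only `pass`, the handler can record why parsing failed before falling back. The sketch below is an assumption about how that might look; `detail_or_original` is a hypothetical name, and it covers only the `"detail"` branch shown in the hunk.

```python
import json
import logging

logger = logging.getLogger(__name__)


def detail_or_original(error_text: str) -> str:
    """Return the "detail" field if the body is a JSON object, else the raw text."""
    try:
        error_data = json.loads(error_text)
        # Generic format: {"detail": "..."}
        if isinstance(error_data, dict) and "detail" in error_data:
            return str(error_data["detail"])
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        # Parsing failed; log the reason and fall through to the raw text.
        logger.debug("Could not parse provider error body: %s", exc)
    return error_text
```

Logging at debug level keeps the user-facing message unchanged while leaving a trace for anyone diagnosing an unexpected provider response.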

lucaseduoli added 2 commits December 5, 2025 15:49

Added onboarding rollback on backend

6cc011d

Added onboarding rollback on frontend, when onboarding ingestion fails

474823d

lucaseduoli requested review from edwinjosechittilappilly and phact December 5, 2025 18:50

lucaseduoli self-assigned this Dec 5, 2025

lucaseduoli linked an issue Dec 5, 2025 that may be closed by this pull request

Watson onboarding only intermittent ingest failure #612

Closed

lucaseduoli mentioned this pull request Dec 5, 2025

Watson onboarding only intermittent ingest failure #612

Closed

phact requested a review from Copilot December 5, 2025 22:09

Copilot started reviewing on behalf of phact December 5, 2025 22:10

Copilot AI reviewed Dec 5, 2025

Merge branch 'main' into fix/onboarding_rollback

c9b1c94

phact approved these changes Dec 5, 2025

phact merged commit 4eb707a into main Dec 5, 2025
1 check failed

phact merged 3 commits into main from fix/onboarding_rollback
