[ENHANCEMENT] Add fallback handling when LLM returns non-parsable or partially structured output #430

### Context

While experimenting with the extraction flow, I noticed that in some cases the LLM response is not strictly parsable JSON, even when the prompt is well-structured.

This usually happens in edge cases such as:

* slightly malformed JSON (missing quotes, trailing text)
* partially structured responses
* unexpected formatting around the JSON block

Even though prompt improvements reduce this, it still happens occasionally and can break the pipeline at the parsing stage.

---

### Problem

Currently, the system seems to assume that the LLM response will always be valid JSON.

Because of this:

* `json.loads()` can fail and stop the pipeline
* there is no fallback mechanism to recover usable data
* a single formatting issue surfaces as an unhandled exception instead of a gracefully handled error
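A quick way to reproduce the failure: the snippet below feeds a few responses covering the edge cases listed under Context to a bare `json.loads()` call. The sample strings are made up for illustration, not taken from the pipeline's logs.

```python
import json

# Hypothetical raw LLM responses, one per edge case (plus a valid baseline).
samples = {
    "valid": '{"name": "Acme", "year": 2021}',
    "trailing_text": '{"name": "Acme", "year": 2021}\nLet me know if you need anything else!',
    "fenced": '```json\n{"name": "Acme", "year": 2021}\n```',
    "missing_quotes": '{name: "Acme", "year": 2021}',
}


def try_parse(raw: str):
    """Return (ok, error message) for a bare json.loads call with no fallback."""
    try:
        json.loads(raw)
        return True, None
    except json.JSONDecodeError as exc:
        return False, f"{exc.msg} (line {exc.lineno}, col {exc.colno})"


results = {name: try_parse(raw) for name, raw in samples.items()}
for name, (ok, err) in results.items():
    print(f"{name:15} ok={ok} {err or ''}")
```

Only the `valid` sample parses; the other three raise `json.JSONDecodeError`, which is what currently stops the pipeline.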

---

### Proposed Improvement

Introduce a fallback handling step when JSON parsing fails.

Possible approach:

* attempt to extract the JSON substring from the response (e.g., the contents of a fenced code block, or the first JSON object while ignoring surrounding text)
* retry parsing after minor cleanup (e.g., trimming extra text)
* if still invalid:

  * return a structured error, or
  * mark the output as `requires_review = true`
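A minimal sketch of these fallback steps using only the standard library. `parse_llm_json`, `ParseResult`, and its `strategy` field are hypothetical names, not existing code in this repository, and stripping trailing commas is just one example of "minor cleanup". Missing quotes are deliberately not repaired; such responses fall through to `requires_review=True`.

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# Assumption: when models add formatting, it is usually a ```json fence.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)
# Minor cleanup: drop a trailing comma before a closing brace/bracket.
# Caveat: this regex ignores string literals, so it could also alter a string
# that happens to contain ", }". Acceptable for a sketch, worth hardening later.
_TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


@dataclass
class ParseResult:
    data: Optional[Any]              # parsed JSON, or None on failure
    requires_review: bool            # True when no step produced valid JSON
    strategy: Optional[str] = None   # which step succeeded (useful for metrics)
    error: Optional[str] = None      # decoder message from the strict attempt
    raw: str = ""                    # original response, kept for manual review


def _loads(text: str) -> Tuple[bool, Any]:
    try:
        return True, json.loads(text)
    except json.JSONDecodeError:
        return False, None


def _first_json_value(text: str) -> Tuple[bool, Any]:
    """Decode the JSON value starting at the first '{' or '[', ignoring text after it."""
    decoder = json.JSONDecoder()
    starts = sorted(i for i in (text.find("{"), text.find("[")) if i != -1)
    for idx in starts:
        try:
            obj, _end = decoder.raw_decode(text, idx)
            return True, obj
        except json.JSONDecodeError:
            continue
    return False, None


def parse_llm_json(raw: str) -> ParseResult:
    text = raw.strip()
    # Step 1: strict parse, the normal path.
    try:
        return ParseResult(json.loads(text), False, "strict", raw=raw)
    except json.JSONDecodeError as exc:
        strict_error = f"{exc.msg} at line {exc.lineno}, column {exc.colno}"

    fence = _FENCE_RE.search(text)
    body = fence.group(1) if fence else text

    # Step 2: contents of a fenced block, if the model added one.
    if fence:
        ok, data = _loads(body)
        if ok:
            return ParseResult(data, False, "fenced", raw=raw)
    # Step 3: first JSON value, ignoring any prose before or after it.
    ok, data = _first_json_value(body)
    if ok:
        return ParseResult(data, False, "extracted", raw=raw)
    # Step 4: minor cleanup (trailing commas), then retry step 3.
    ok, data = _first_json_value(_TRAILING_COMMA_RE.sub(r"\1", body))
    if ok:
        return ParseResult(data, False, "cleaned", raw=raw)
    # Step 5: give up gracefully. No exception, just a flagged result.
    return ParseResult(None, True, error=strict_error, raw=raw)
```

`json.JSONDecoder.raw_decode` is used instead of a brace-matching regex because it respects nesting and string contents and returns where the first JSON value ends, so trailing text is simply ignored. Only the first `{` and the first `[` are tried, so a broken outer object is not silently replaced by a nested inner one. Recording `strategy` makes it easy to count how often each fallback actually fires.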

This ensures that the system degrades gracefully instead of failing completely.

---

### Why this helps

* improves robustness of the pipeline
* reduces hard failures due to minor formatting issues
* complements prompt improvements and validation steps
* makes the system more production-ready

---

### Scope

* Focus only on post-LLM response handling
* No changes to prompt design or schema
* Can be implemented as a small utility in the extraction pipeline

