This was flagged as worthy of follow-up when reviewing #818:
In granite33/output.py:226, the line computes:
citation["response_end"] = index + len(response_text_without_citations)
response_text is the individual sentence associated with this citation (looked up from response_sents_by_citation_id)
response_text_without_citations is the entire response with all citation tags stripped
index is where the individual sentence starts within the full response
So response_end ends up being (start of sentence) + (length of entire response). This overshoots the end of the actual sentence whenever the response contains anything besides that sentence, and lands past the end of the string entirely whenever index is greater than 0.
The correct formula should be:
citation["response_end"] = index + len(response_text)
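A minimal sketch of the difference between the two formulas, using a made-up two-sentence response rather than real model output. The variable names mirror the ones above; the sample text and the substring search used to obtain index are assumptions for illustration only, not necessarily how output.py locates the sentence.

```python
# Hypothetical sample data; not taken from output.py.
response_text_without_citations = "Paris is in France. Berlin is in Germany."
response_text = "Berlin is in Germany."  # sentence tied to one citation

# Assumption: the sentence start is found with a plain substring search.
index = response_text_without_citations.find(response_text)

buggy_end = index + len(response_text_without_citations)
fixed_end = index + len(response_text)

# The buggy end runs past the end of the whole response.
print(buggy_end > len(response_text_without_citations))  # True
# The fixed end slices out exactly the cited sentence.
print(response_text_without_citations[index:fixed_end] == response_text)  # True
```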
The granite32 version does this correctly:
citation["response_end"] = last_response_text_match["begin_idx"] + len(
    response_text
)
Impact: Every citation span in granite 3.3 output will have a response_end that points at or past the end of the full response rather than the end of the cited sentence. Downstream consumers that slice response[begin:end] would get a much larger substring than intended: in Python, everything from the start of the cited sentence to the end of the response.
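One way to catch this class of bug is a span check that could back a regression test. The sketch below is hypothetical: the helper find_bad_spans and the key names response_begin and response_text are assumptions for illustration (only response_end appears in the code quoted in this issue), so adjust them to the actual citation dict layout.

```python
def find_bad_spans(response: str, citations: list[dict]) -> list[dict]:
    """Return citations whose span does not slice out their own sentence."""
    bad = []
    for citation in citations:
        begin = citation["response_begin"]  # assumed key name
        end = citation["response_end"]
        # Python clamps an out-of-range end, so an overshooting span
        # silently returns everything up to the end of the response.
        if response[begin:end] != citation["response_text"]:  # assumed key name
            bad.append(citation)
    return bad


# Hypothetical sample data.
response = "Paris is in France. Berlin is in Germany."
sentence = "Paris is in France."
good = {"response_begin": 0, "response_end": len(sentence), "response_text": sentence}
overshoot = {"response_begin": 0, "response_end": len(response), "response_text": sentence}

print(find_bad_spans(response, [good, overshoot]) == [overshoot])  # True
```

Because slicing never raises on an out-of-range end, an explicit comparison like this is needed; the overshoot would otherwise go unnoticed.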
The code referenced above, at commit 301ca3e:
mellea/mellea/formatters/granite/granite3/granite33/output.py, line 226
mellea/mellea/formatters/granite/granite3/granite32/output.py, lines 291 to 293