Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor OpenAI Error Tracing #987

Merged
merged 19 commits into from
Dec 4, 2023

Conversation

umaannamalai
Copy link
Contributor

This PR refactors error tracing for OpenAI with the following changes:

  • Only pass error attrs to notice_error()

  • Attach completion_id/ embedding_id to error trace to distinguish AI errors

  • Send chat completion and embedding events on error as well

  • Add error: True to chat completion summary/ embedding events

  • Add is_response attr to chat completion messages that are LLM responses

@mergify mergify bot removed the merge-conflicts label Nov 21, 2023
Copy link

github-actions bot commented Nov 21, 2023

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ PYTHON bandit 1 0 4.95s
✅ PYTHON black 4 0 0 1.1s
✅ PYTHON flake8 4 0 0.59s
✅ PYTHON isort 4 0 0 0.2s
✅ PYTHON pylint 4 0 3.1s

See detailed report in MegaLinter reports
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

MegaLinter is graciously provided by OX Security

@codecov-commenter
Copy link

codecov-commenter commented Nov 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (develop-ai-limited-preview@7d4828e). Click here to learn what that means.

Additional details and impacted files
@@                      Coverage Diff                      @@
##             develop-ai-limited-preview     #987   +/-   ##
=============================================================
  Coverage                              ?   82.06%           
=============================================================
  Files                                 ?      191           
  Lines                                 ?    20041           
  Branches                              ?     3474           
=============================================================
  Hits                                  ?    16446           
  Misses                                ?     2597           
  Partials                              ?      998           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mergify mergify bot removed the tests-failing label Nov 21, 2023
@umaannamalai umaannamalai marked this pull request as ready for review November 21, 2023 20:46
@umaannamalai umaannamalai requested a review from a team as a code owner November 21, 2023 20:46
Copy link
Contributor

@hmstepanek hmstepanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good first whack at this. I noted a couple areas in the expected values for the exceptions that need to be changed-there are a lot more that I didn't specifically call out that are along the same lines. I'd prefer to see less drastic changes to the hook code (as I noted I'm not sure if this will work out in practice but maybe give it a try and see if it ends up being simpler).
There are quite a few linter errors that we might want to take a look at as well.

tests/mlmodel_openai/test_chat_completion.py Outdated Show resolved Hide resolved
tests/mlmodel_openai/test_chat_completion_error.py Outdated Show resolved Hide resolved
tests/mlmodel_openai/test_chat_completion_error.py Outdated Show resolved Hide resolved
tests/mlmodel_openai/test_chat_completion_error.py Outdated Show resolved Hide resolved
tests/mlmodel_openai/test_chat_completion_error.py Outdated Show resolved Hide resolved
exact_agents={
"error.message": "Must provide an 'engine' or 'model' parameter to create a <class 'openai.api_resources.chat_completion.ChatCompletion'>",
}
exact_agents={
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like your editor is changing the spacing on a lot of these to be 3 instead of 4 spaces. I'm not sure how black didn't fix that either but we shouldn't be dropping the 4th space on these.

tests/mlmodel_openai/test_embeddings_error.py Outdated Show resolved Hide resolved
tests/mlmodel_openai/test_embeddings_error.py Show resolved Hide resolved
newrelic/hooks/mlmodel_openai.py Outdated Show resolved Hide resolved
"embedding_id": embedding_id,
"appName": settings.app_name,
"api_key_last_four_digits": api_key_last_four_digits,
"span_id": span_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a bug in this code right now where we are capturing the wrong span id? Shouldn't we be capturing the span id inside the function trace? I just looked at the SDK code and they attach the function trace's span id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this would work in practice but what I was imaging for this design and what I would propose is that this code wouldn't change this much from the previous version and that our .get(attr_name, "") fallback logic would handle most of the cases where things don't exist because an error occurred. I'd propose maybe approaching the implementation from that perspective and see if that works out better?

Copy link
Contributor

@hmstepanek hmstepanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had couple small fixups to simplify the logic. I also think we shouldn't be trying to handle feedback in the case that an error occurs but I've put a message in the slack channel asking everyone what they think as well.

"request.model": request_args.get("model") or request_args.get("engine") or "",
"vendor": "openAI",
"ingest_source": "Python",
"response.organization": "" if exc_organization is None else exc_organization,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this already defaults to empty string above so you can simplify the logic here.

Suggested change
"response.organization": "" if exc_organization is None else exc_organization,
"response.organization": exc_organization,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I added this was because the organization attribute is found on the exception but it is None (at least for our test cases). The default of "" isn't returned since there was no attribute error so this is attempting to coerce that behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh ok that makes sense.

newrelic/hooks/mlmodel_openai.py Show resolved Hide resolved
newrelic/hooks/mlmodel_openai.py Outdated Show resolved Hide resolved
newrelic/hooks/mlmodel_openai.py Outdated Show resolved Hide resolved
newrelic/hooks/mlmodel_openai.py Outdated Show resolved Hide resolved
@mergify mergify bot removed the tests-failing label Nov 29, 2023
Copy link
Contributor

@hmstepanek hmstepanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sticking it out through this review process. It's been a lot of back and forth! There's only one minor issue here so I'm just gonna pre-approve this and assume you'll make that change before merging.

"request.model": request_args.get("model") or request_args.get("engine") or "",
"vendor": "openAI",
"ingest_source": "Python",
"response.organization": "" if exc_organization is None else exc_organization,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh ok that makes sense.

span_id,
trace_id,
"",
error_response_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have code inside create_message to default this correctly if it doesn't exist so just let that code run, otherwise we end up with uuid-0 instead of just uuid.

Suggested change
error_response_id,
None,

}
transaction.record_custom_event("LlmChatCompletionSummary", error_chat_completion_dict)

error_response_id = str(uuid.uuid4())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error_response_id = str(uuid.uuid4())

span_id,
trace_id,
"",
error_response_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error_response_id,
None,

}
transaction.record_custom_event("LlmChatCompletionSummary", error_chat_completion_dict)

error_response_id = str(uuid.uuid4())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error_response_id = str(uuid.uuid4())

@hmstepanek hmstepanek merged commit 091a815 into develop-ai-limited-preview Dec 4, 2023
47 checks passed
@hmstepanek hmstepanek deleted the refactor-error-tracing branch December 4, 2023 23:09
@umaannamalai umaannamalai added this to the v9.8.0 milestone Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants