[Python Otel] Manage call tracer life cycle use call arena. #37460
Closed
XuanWang-Amos added the `release notes: yes` label (indicates if the PR needs to be in release notes) on Aug 12, 2024.
@XuanWang-Amos For reference, can you please link, in the description, the PR where the new Core API was introduced?

It's added in this PR.
grpc-checks bot added the `per-call-memory/neutral` label and removed the `per-call-memory/decrease` label on Aug 13, 2024.
gnossen approved these changes on Aug 13, 2024.
yashykt reviewed on Aug 13, 2024 (three reviews).
grpc-checks bot added and removed the `per-call-memory/decrease` and `per-call-memory/neutral` labels on Aug 13, 2024.
yashykt approved these changes on Aug 14, 2024.
XuanWang-Amos added a commit to XuanWang-Amos/grpc that referenced this pull request on Aug 14, 2024:

We're seeing a segfault in Python CSM tests:

```
2024-08-03T09:49:45.720555997Z *** SIGSEGV received at time=1722678585 on cpu 0 ***
2024-08-03T09:49:45.721761998Z PC: @ 0x7847ffd5c1c9 (unknown) (unknown)
2024-08-03T09:49:45.722070502Z @ 0x7847fa309d8c 64 absl::lts_20240116::WriteFailureInfo()
2024-08-03T09:49:45.722175904Z @ 0x7847fa309a15 272 absl::lts_20240116::AbslFailureSignalHandler()
2024-08-03T09:49:45.722187675Z @ 0x7847ffc3d050 1592 (unknown)
2024-08-03T09:49:45.723432238Z @ 0x7847e97f9390 (unknown) (unknown)
2024-08-03T09:49:45.723487349Z @ ... and at least 1 more frames
2024-08-03T09:49:45.829702781Z [INFO tini (1)] Spawned child process '/xds_interop_client' with pid '7'
2024-08-03T09:49:45.829766869Z [DEBUG tini (1)] Received SIGCHLD
2024-08-03T09:49:45.829778749Z [DEBUG tini (1)] Reaped child with pid: '7'
2024-08-03T09:49:45.829787070Z [INFO tini (1)] Main child exited with signal (with signal 'Segmentation fault')
```

After investigation, we found that the call tracer was deleted before `RecordEnd` was called.

* To fix this, we decided to use an arena to manage the life cycle of `CallTracer`.
* Since `CallTracer` is created in another shared object library (`grpcio_observability`) that doesn't depend on gRPC core, we can't use `grpc_core::Arena` directly when creating the call tracer.
* As a workaround, we created a wrapper class, `ClientCallTracerWrapper`, to wrap the `CallTracer`, and added another core API, `grpc_call_tracer_set_and_manage`, so that the life cycle of the `CallTracer` is managed through the wrapper class.

Closes grpc#37460

COPYBARA_INTEGRATE_REVIEW=grpc#37460 from XuanWang-Amos:fix_otel_segfault 33c0b98
PiperOrigin-RevId: 662966853
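To make the failure mode concrete, here is a minimal sketch, assuming a simplified model in which the call only holds a non-owning reference to a tracer that is owned elsewhere, which is roughly the situation described above where the tracer was deleted before `RecordEnd` ran. `CallTracer`, `Call`, `RunWithEarlyRelease`, and `RunWithOwnerOutlivingCall` are hypothetical illustrations, not gRPC types. `std::weak_ptr` is used only so the premature deletion can be detected safely; with a plain raw pointer the same ordering would be a use-after-free.

```cpp
#include <cassert>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical stand-in for a call tracer; this is not the gRPC API.
struct CallTracer {
  std::string name;
  void RecordEnd() const { std::cout << name << ": RecordEnd\n"; }
};

// Hypothetical call that holds only a non-owning reference to its tracer.
// std::weak_ptr lets us *detect* that the tracer is gone; a plain raw
// pointer would instead be dereferenced after free (undefined behavior,
// which can surface as a SIGSEGV like the one in the CSM test log).
struct Call {
  std::weak_ptr<CallTracer> tracer;

  // Returns true if RecordEnd ran on a live tracer.
  bool Finish() const {
    if (auto t = tracer.lock()) {
      t->RecordEnd();
      return true;
    }
    std::cout << "tracer destroyed before RecordEnd\n";
    return false;
  }
};

// Problematic ordering: the owner releases the tracer before the call ends.
bool RunWithEarlyRelease() {
  auto owner = std::make_shared<CallTracer>(CallTracer{"early"});
  Call call{owner};
  owner.reset();         // last owning reference dropped too early
  return call.Finish();  // RecordEnd cannot run
}

// Safe ordering: the tracer outlives the call's final RecordEnd.
bool RunWithOwnerOutlivingCall() {
  auto owner = std::make_shared<CallTracer>(CallTracer{"ok"});
  Call call{owner};
  assert(!call.tracer.expired());  // tracer still alive at call end
  bool ok = call.Finish();
  owner.reset();  // released only after RecordEnd
  return ok;
}
```

`RunWithEarlyRelease()` returns `false` and `RunWithOwnerOutlivingCall()` returns `true`. The fix described in this commit aims to make the second ordering hold by construction, by tying the tracer's lifetime to the call's arena rather than to an external owner.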
XuanWang-Amos added a commit to XuanWang-Amos/grpc that referenced this pull request on Aug 14, 2024.
drfloob pushed a commit that referenced this pull request on Aug 14, 2024:
…backport) (#37479) Backport of #37460 to v1.66.x.
XuanWang-Amos added a commit that referenced this pull request on Aug 14, 2024:
…backport) (#37478) Backport of #37460 to v1.65.x.
sourabhsinghs pushed a commit to sourabhsinghs/grpc that referenced this pull request on Sep 26, 2024.
Labels: `bloat/none`, `lang/core`, `lang/Python`, `per-call-memory/neutral`, `per-channel-memory/neutral`, `release notes: yes` (indicates if the PR needs to be in release notes)
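We're seeing a segfault (`SIGSEGV`) in the Python CSM tests.

### The issue

After investigation, we found that the call tracer was deleted before `RecordEnd` was called.

### Why this fix

* To fix this, we decided to use an arena to manage the life cycle of `CallTracer`.
* Since `CallTracer` is created in another shared object library (`grpcio_observability`) that doesn't depend on gRPC core, we can't use `grpc_core::Arena` directly when creating the call tracer.
* As a workaround, we created a wrapper class, `ClientCallTracerWrapper`, to wrap the `CallTracer`, and added another core API, `grpc_call_tracer_set_and_manage`, so that the life cycle of the `CallTracer` is managed through the wrapper class.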
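The ownership scheme in the bullets above can be sketched as follows. This is a minimal, self-contained model under stated assumptions, not the gRPC implementation: `ToyArena` stands in for `grpc_core::Arena` and simply runs registered destructors when the call's arena is destroyed; `CallTracerBase`, `ToyCall`, `DemoTracer`, and `Events` are hypothetical; and `SetAndManageTracer` is a hypothetical analogue of `grpc_call_tracer_set_and_manage`, whose real signature is not shown in this PR. `ClientCallTracerWrapper` borrows its name from the PR, but its body here is illustrative.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a per-call arena (NOT grpc_core::Arena): it owns the
// objects created through it and destroys them, newest first, when the
// arena itself is destroyed at the end of the call.
class ToyArena {
 public:
  ToyArena() = default;
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;
  ~ToyArena() {
    for (auto it = dtors_.rbegin(); it != dtors_.rend(); ++it) (*it)();
  }

  template <typename T, typename... Args>
  T* New(Args&&... args) {
    T* obj = new T(std::forward<Args>(args)...);
    dtors_.push_back([obj] { delete obj; });
    return obj;
  }

 private:
  std::vector<std::function<void()>> dtors_;
};

// Hypothetical tracer interface known to both shared libraries.
class CallTracerBase {
 public:
  virtual ~CallTracerBase() = default;
  virtual void RecordEnd() = 0;
};

// Core side: takes ownership of a heap-allocated tracer created elsewhere.
class ClientCallTracerWrapper {
 public:
  explicit ClientCallTracerWrapper(CallTracerBase* tracer) : tracer_(tracer) {
    assert(tracer_ != nullptr);
  }
  CallTracerBase* get() const { return tracer_.get(); }

 private:
  std::unique_ptr<CallTracerBase> tracer_;
};

// Toy call: owns its arena and keeps only a non-owning tracer pointer.
struct ToyCall {
  ToyArena arena;
  CallTracerBase* tracer = nullptr;
  void End() {
    if (tracer != nullptr) tracer->RecordEnd();
  }
};

// Hypothetical analogue of grpc_call_tracer_set_and_manage: ownership moves
// into the call's arena, so the tracer cannot be freed before the call ends.
void SetAndManageTracer(ToyCall& call, CallTracerBase* tracer) {
  call.tracer = call.arena.New<ClientCallTracerWrapper>(tracer)->get();
}

// Observability side (hypothetical): counts what happens to the tracer.
struct Events {
  int record_end = 0;
  int destroyed = 0;
};

class DemoTracer : public CallTracerBase {
 public:
  explicit DemoTracer(Events* events) : events_(events) {}
  ~DemoTracer() override { ++events_->destroyed; }
  void RecordEnd() override { ++events_->record_end; }

 private:
  Events* events_;
};

Events RunCall() {
  Events events;
  {
    ToyCall call;
    SetAndManageTracer(call, new DemoTracer(&events));
    call.End();                     // RecordEnd runs on a live tracer
    assert(events.destroyed == 0);  // still owned by the arena
  }                                 // call and its arena destroyed here
  return events;
}
```

In this model the arena only has to know how to destroy `ClientCallTracerWrapper`, a type defined on the core side; the concrete tracer from the other library is then destroyed through its virtual destructor, so core never needs that library's types and the tracer lives exactly as long as the call. `RunCall()` records one `RecordEnd` followed by one destruction.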