From f84c36863007cd15134cd9680e30f7441a084997 Mon Sep 17 00:00:00 2001
From: "Mr. ChatGPT"
Date: Tue, 2 Jan 2024 04:04:01 +0000
Subject: [PATCH] feat: Added OpenTelemetry Tracing Support

This commit introduces OpenTelemetry tracing support to enhance the observability of the application. With this addition, developers and operators can now trace the execution of operations within the application, gaining insights into performance bottlenecks, errors, and overall flow through the service architecture.

Key Changes:

1. **Tracing Integration**:
   - Updated `otel_handler.trace` decorator to enable tracing across critical functions within the application, including camera control, face detection, and launcher operations.
   - Updated `README.md` with detailed instructions on testing traces and viewing them in Grafana Cloud, ensuring users can effectively utilize tracing capabilities.

2. **Source Code Enhancements**:
   - Augmented various classes and methods with the `@otel_handler.trace` decorator, embedding tracing into the core functionalities of the application.
   - Made necessary adjustments to configuration and credential management to support the OpenTelemetry tracing infrastructure.

3. **Documentation Update**:
   - Extended the `README.md` to include a new section on "Testing Traces," providing users with clear guidance on how to implement, test, and view traces.

Testing Done:

- **Pytests**: Ran the entire suite of automated tests to ensure existing functionalities remain unaffected and new tracing capabilities integrate seamlessly.
- **Real Hardware Testing**: Conducted multiple rounds of thorough testing on real hardware to ensure the traces accurately represent the application's behavior and performance under realistic conditions.
- **Grafana Cloud Verification**: Verified that the spans are correctly captured and displayed in Grafana Cloud, providing clear visibility into the application's operational traces.

Addresses GitHub Issue:

- This commit addresses GitHub issue #49, significantly enhancing the application's observability and troubleshooting capabilities through OpenTelemetry tracing.

With the addition of tracing, the application's operations can now be visualized and analyzed in greater detail, providing valuable insights for development, troubleshooting, and performance optimization.

ChatGPT links:

1. https://chat.openai.com/share/d9aaa3f5-d3b1-4ff5-81b7-68d204dbcc16
---
 README.md                          | 53 ++++++++++++++++++++++++++++++
 src/pygptcourse/camera_control.py  |  8 +++++
 src/pygptcourse/camera_manager.py  |  5 +++
 src/pygptcourse/credentials.py     |  2 +-
 src/pygptcourse/face_detector.py   |  2 ++
 src/pygptcourse/main.py            |  1 +
 src/pygptcourse/otel_decorators.py | 52 ++++++++++++++++++++++-------
 src/pygptcourse/tshirt_launcher.py | 17 ++++++++++
 8 files changed, 128 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index e19f965..45af9ea 100644
--- a/README.md
+++ b/README.md
@@ -625,6 +625,59 @@ Before you begin, ensure you have the following:

 After running the application and generating some data, you should see metrics appearing in your Grafana dashboard. Verify that the metrics make sense and reflect the application's operations accurately. Look for any discrepancies or unexpected behavior in metric reporting.

+### Testing Traces
+
+#### Tracing Functions
+
+To trace a function, decorate it with the `@otel_handler.trace` decorator:
+
+```python
+@otel_handler.trace
+def your_function_to_trace(arg1, arg2):
+    # Your function logic
+```
+
+#### Viewing Traces in Grafana Cloud
+
+After integrating enhanced tracing capabilities into your application using OpenTelemetry, you can visualize and analyze the traces in Grafana Cloud. Here's how to view the traces:
+
+##### Tracing Prerequisites
+
+- Ensure that your application is configured to send traces to Grafana Cloud's OTLP endpoint. This typically involves setting the correct endpoint, API token, and other necessary configuration in your application's OpenTelemetry setup.
+- Have access to a Grafana Cloud account where the traces are sent. Ensure you have the appropriate permissions to view and manage traces.
+
+##### Viewing Traces
+
+1. **Log in to Grafana Cloud**: Navigate to your Grafana Cloud instance and log in with your credentials.
+
+1. **Navigate to the Traces Section**:
+   - Once logged in, look for the "Explore" section in the left-hand menu.
+   - Within "Explore", you should see an option for "Traces" or "Tempo" (Grafana's tracing backend), depending on your Grafana Cloud setup.
+
+1. **Selecting Data Source**:
+   - If prompted, select the appropriate data source that corresponds to where your application sends its traces. This is typically the OTLP endpoint you configured in your application.
+
+1. **Exploring Traces**:
+   - **View Trace List**: You will see a list of recent traces. Each trace typically represents a request or transaction in your application.
+   - **Filtering and Searching**: Use available filters or search functionalities to find specific traces. You can filter by service, operation, duration, and other trace attributes.
+   - **Trace Details**: Click on a specific trace to view its detailed information, including spans, attributes, and any logs or errors captured.
+
+1. **Understanding Trace Details**:
+   - **Spans**: Each trace consists of multiple spans. Each span represents a unit of work in your application, like a function call or a database query.
+   - **Attributes**: Look at the attributes to understand more about each span, including function arguments, return values, and error messages.
+   - **Visualization**: Spans are typically visualized in a waterfall diagram showing the parent-child relationships and the time each span took.
+
+#### Tips for Effective Trace Analysis
+
+- **Correlate Logs and Metrics**: If possible, correlate trace data with logs and metrics to get a comprehensive view of the application behavior.
+- **Use Trace ID**: If you need to correlate a trace with logs or other data, use the trace ID as a reference.
+- **Regular Review**: Regularly review trace data to understand typical application behavior and identify areas for performance improvement or error correction.
+
+#### Grafana Cloud Support
+
+For more detailed instructions or troubleshooting, refer to the Grafana Cloud documentation or contact Grafana Cloud support. Ensure your Grafana Cloud and OpenTelemetry configurations are correctly set up for successful trace collection and visualization.
+
+
 ## Credits

 This code is based on the original source available at [https://github.com/hovren/pymissile](https://github.com/hovren/pymissile).
diff --git a/src/pygptcourse/camera_control.py b/src/pygptcourse/camera_control.py
index 969bd0d..b27ac57 100644
--- a/src/pygptcourse/camera_control.py
+++ b/src/pygptcourse/camera_control.py
@@ -1,5 +1,6 @@
 import time

+from pygptcourse.otel_decorators import otel_handler
 from pygptcourse.tshirt_launcher import (
     DOWN,
     LEFT,
@@ -20,6 +21,7 @@ class CameraControl:
     TOLERANCE = IMAGE_HEIGHT / (4 * 2)  # for 480, it is 60
     launch_count = 0

+    @otel_handler.trace
     def __init__(self, simulation_mode=False):
         self.simulation_mode = simulation_mode
         print("Starting initialization of Launcher")
@@ -27,11 +29,13 @@ def __init__(self, simulation_mode=False):
         print("Finished initialization of Launcher")
         self.current_camera_position = [self.TOTAL_TIME_LR, self.TOTAL_TIME_TB]

+    @otel_handler.trace
     def start(self):
         if not self.launcher.running:
             print("Starting launcher...")
             self.launcher.start()

+    @otel_handler.trace
     def move_camera(self, direction, duration):
         cmd = STOP
         prev_current_camera_position = self.current_camera_position.copy()
@@ -77,6 +81,7 @@ def move_camera(self, direction, duration):
         )
         self.launcher.move(cmd, duration)

+    @otel_handler.trace
     def move_camera_to_center(self):
         print("Moving camera to center")
         # Move to bottom left (0, TOTAL_TIME_TB)
@@ -93,6 +98,7 @@ def move_camera_to_center(self):
         self.move_camera("RIGHT", self.TOTAL_TIME_LR / 2)
         self.move_camera("UP", self.TOTAL_TIME_TB / 2)

+    @otel_handler.trace
     def check_and_move_camera(self, face_center):
         dx = face_center[0] - (self.IMAGE_WIDTH / 2)
         dy = face_center[1] - (self.IMAGE_HEIGHT / 2)
@@ -114,6 +120,7 @@ def check_and_move_camera(self, face_center):

         return moving

+    @otel_handler.trace
     def launch_if_aligned(self, face_center):
         moving = self.check_and_move_camera(face_center)
         if not moving:
@@ -122,6 +129,7 @@ def launch_if_aligned(self, face_center):
         else:
             print("Target not aligned. Holding launch.")

+    @otel_handler.trace
     def stop(self):
         self.launcher.running = False
         self.launcher.close()
diff --git a/src/pygptcourse/camera_manager.py b/src/pygptcourse/camera_manager.py
index 56b78af..2de6deb 100644
--- a/src/pygptcourse/camera_manager.py
+++ b/src/pygptcourse/camera_manager.py
@@ -1,16 +1,21 @@
 import cv2  # type: ignore

+from pygptcourse.otel_decorators import otel_handler
+

 class CameraManager:
+    @otel_handler.trace
     def __init__(self, resolution=(640, 480)):
         self.video_capture = cv2.VideoCapture(0)
         self.video_capture.set(3, resolution[0])  # Horizontal resolution
         self.video_capture.set(4, resolution[1])  # Vertical resolution

+    @otel_handler.trace
     def start(self):
         # Additional logic for starting the camera can be added here
         return self.video_capture

+    @otel_handler.trace
     def stop(self):
         # Stop and release the video capture
         self.video_capture.release()
diff --git a/src/pygptcourse/credentials.py b/src/pygptcourse/credentials.py
index 01fee33..ce0dff2 100644
--- a/src/pygptcourse/credentials.py
+++ b/src/pygptcourse/credentials.py
@@ -18,7 +18,7 @@ def __init__(self):
         ).decode("utf-8")
         self.endpoint = os.getenv("GRAFANA_OTLP_ENDPOINT")
         if self.endpoint:
-            self.trace_endpoint = self.endpoint + "/v1/traces"
+            self.traces_endpoint = self.endpoint + "/v1/traces"
             self.metrics_endpoint = self.endpoint + "/v1/metrics"
             self.logs_endpoint = self.endpoint + "/v1/logs"

diff --git a/src/pygptcourse/face_detector.py b/src/pygptcourse/face_detector.py
index ca005db..32a98f0 100644
--- a/src/pygptcourse/face_detector.py
+++ b/src/pygptcourse/face_detector.py
@@ -8,6 +8,7 @@ def __init__(self, face_images, image_loader):
         self.image_loader = image_loader
         self.face_encodings = self.load_and_encode_faces(face_images)

+    @otel_handler.trace
     def load_and_encode_faces(self, face_images):
         encodings = {}
         for name, image_path in face_images.items():
@@ -16,6 +17,7 @@ def load_and_encode_faces(self, face_images):
             encodings[name] = face_recognition.face_encodings(image)[0]
         return encodings

+    @otel_handler.trace
     def detect_faces(self, image):
         face_locations = face_recognition.face_locations(image)
         face_encodings = face_recognition.face_encodings(image, face_locations)
diff --git a/src/pygptcourse/main.py b/src/pygptcourse/main.py
index 4c4fa27..5babc7a 100755
--- a/src/pygptcourse/main.py
+++ b/src/pygptcourse/main.py
@@ -25,6 +25,7 @@ def is_display_available():
     return "DISPLAY" in os.environ


+@otel_handler.trace
 def main():
     parser = argparse.ArgumentParser(description="Run the camera control system.")
     parser.add_argument(
diff --git a/src/pygptcourse/otel_decorators.py b/src/pygptcourse/otel_decorators.py
index 3039c9e..e2a1c96 100644
--- a/src/pygptcourse/otel_decorators.py
+++ b/src/pygptcourse/otel_decorators.py
@@ -3,10 +3,14 @@
 from functools import wraps

 from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
+from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
 from opentelemetry.metrics import get_meter_provider, set_meter_provider
 from opentelemetry.sdk.metrics import MeterProvider
 from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
 from opentelemetry.sdk.resources import SERVICE_NAME, Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.trace import get_tracer_provider, set_tracer_provider

 from pygptcourse.credentials import OpenTelemetryCredentials

@@ -33,9 +37,11 @@ def get_count(self, labels=None):


 class OpenTelemetryHandler:
-    def __init__(self):
+    def __init__(self, trace_interval_minutes=5):
         self.creds = OpenTelemetryCredentials()
         self.enabled = self.creds.is_configured()
+        self.last_trace_time = 0
+        self.trace_interval_seconds = trace_interval_minutes * 60

         if self.enabled:
             try:
@@ -59,6 +65,21 @@ def __init__(self):

                 self.meter = get_meter_provider().get_meter(service_name, VERSION)

+                # Setup Tracing
+                # Initialize the tracer provider and trace exporter
+                self.otlp_trace_exporter = OTLPSpanExporter(
+                    endpoint=f"{self.creds.traces_endpoint}",
+                    headers={"authorization": f"Basic {self.creds.api_encoded_token}"},
+                )
+                trace_provider = TracerProvider(resource=self.resource)
+                trace_provider.add_span_processor(
+                    BatchSpanProcessor(self.otlp_trace_exporter)
+                )
+
+                # Set the fully configured tracer provider globally
+                set_tracer_provider(trace_provider)
+                self.tracer = get_tracer_provider().get_tracer(service_name, VERSION)
+
                 # Metric definitions
                 self.usb_failures = self.meter.create_counter(
                     "usb_connection_failures",
@@ -73,6 +94,7 @@ def __init__(self):
                     description="Total number of faces detected",
                     unit="int",
                 )
+
             except Exception as e:
                 # Handle initialization failure by disabling OpenTelemetry and using dummy metrics
                 self.enabled = False
@@ -93,21 +115,29 @@ def _initialize_dummy_metrics(self):
     def trace(self, func):
         @wraps(func)
         def wrapper(*args, **kwargs):
-            if self.enabled:
-                # If OTLP is enabled, do something before the function (e.g., start a span)
+            if not self.enabled:  # Skip all tracing if OTLP is not configured
+                return func(*args, **kwargs)

-                # Execute the function
-                result = func(*args, **kwargs)
+            # Use the stored tracer instance
+            with self.tracer.start_as_current_span(func.__name__) as span:
+                # Capture and log function arguments
+                span.set_attribute("arguments", str(args) + " " + str(kwargs))

-                # Do something after the function (e.g., end the span)
+                try:
+                    # Execute the wrapped function
+                    result = func(*args, **kwargs)

-                return result
-            else:
-                # If OTLP is not enabled, just execute the function
-                return func(*args, **kwargs)
+                    # Capture and log the return value
+                    span.set_attribute("return_value", str(result))
+                    return result
+                except Exception as e:
+                    # Capture and log the exception details
+                    span.set_attribute("error", True)
+                    span.record_exception(e)
+                    raise

         return wrapper


 # Global instance of the handler
-otel_handler = OpenTelemetryHandler()
+otel_handler = OpenTelemetryHandler(trace_interval_minutes=1)
diff --git a/src/pygptcourse/tshirt_launcher.py b/src/pygptcourse/tshirt_launcher.py
index f0d012d..9de9180 100644
--- a/src/pygptcourse/tshirt_launcher.py
+++ b/src/pygptcourse/tshirt_launcher.py
@@ -8,6 +8,8 @@
 import usb.core  # type: ignore
 import usb.util  # type: ignore

+from pygptcourse.otel_decorators import otel_handler
+
 VENDOR = 0x1941
 PRODUCT = 0x8021

@@ -50,28 +52,35 @@ def __init__(self):
         self.running = False
         super().__init__()

+    @otel_handler.trace
     def send_command(self, command):
         print(f"Simulated sending command {command}")

+    @otel_handler.trace
     def start(self):
         self.running = True
         print("Simulated launcher started")

+    @otel_handler.trace
     def stop(self):
         self.running = False
         print("Simulated launcher stopped")

+    @otel_handler.trace
     def fire(self):
         print("Simulated firing")

+    @otel_handler.trace
     def move(self, command, duration):
         print(f"Simulating move with command {command} for duration {duration}")

+    @otel_handler.trace
     def close(self):
         print("Simulated launcher closed")


 class Launcher(AbstractLauncher):
+    @otel_handler.trace
     def __init__(self):
         dev = usb.core.find(idVendor=VENDOR, idProduct=PRODUCT)

@@ -119,16 +128,19 @@ def __init__(self):
         # except usb.core.USBError, e:
         #     print("RESET ERROR", e)

+    @otel_handler.trace
     def start(self):
         self.running = True
         self.t = threading.Thread(target=self.read_process)
         self.t.start()
         self.running = True

+    @otel_handler.trace
     def stop(self):
         self.running = False
         print("Thread stopped")

+    @otel_handler.trace
     def read_process(self):
         abort_fire = False
         fire_complete_time = time.time()
@@ -199,12 +211,14 @@ def read_process(self):
         self.close()
         print("THREAD STOPPED")

+    @otel_handler.trace
     def read(self, length):
         try:
             return self.ep.read(length)
         except usb.core.USBError:
             return None

+    @otel_handler.trace
     def send_command(self, command):
         try:
             self.command = command
@@ -212,6 +226,7 @@ def send_command(self, command):
         except usb.core.USBError as e:
             print("SEND ERROR", e)

+    @otel_handler.trace
     def move(self, command, duration):
         try:
             self.send_command(command)
@@ -220,6 +235,7 @@ def move(self, command, duration):
         except usb.core.USBError as e:
             print("SEND ERROR", e)

+    @otel_handler.trace
     def fire(self):
         try:
             self.firing = True
@@ -230,6 +246,7 @@ def fire(self):

     # added to see if this would fix the overheating problem
     # after the program exits when connected to a Mac
+    @otel_handler.trace
     def close(self):
         self.stop()
         print("Closing connection")