Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive tracing issue in database operations when using opentelemetry-instrumentation-psycopg2 with psycopg2's ThreadedConnectionPool #1925

Closed
changemyminds opened this issue Aug 30, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@changemyminds
Copy link
Contributor

changemyminds commented Aug 30, 2023

Describe your environment

Language & Library

  • python version: 3.6.15
  • pip version: 21.3.1
  • psycopg2 version: 2.9.7
  • opentelemetry-instrumentation-psycopg2 version: 0.33b0
  • opentelemetry-exporter-otlp version: 1.12.0

Infra (Docker Image)

  • jaegertracing/all-in-one:1.42
  • otel/opentelemetry-collector:0.72.0
  • postgres:13.2-alpine

Steps to reproduce

Prepare

  • otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  logging:
    loglevel: debug

  jaeger:
    endpoint: jaeger-all-in-one:14250
    tls:
      insecure: true
processors:
  batch:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, jaeger]
      processors: [batch]
  • docker-compose.yaml
version: "3"

services:
  jaeger-all-in-one:
    image: jaegertracing/all-in-one:1.42
    restart: always
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686" # server frontend
      - "14268:14268" # HTTP collector
      - "14250:14250" # gRPC collector

  otel-collector:
    image: otel/opentelemetry-collector:0.72.0
    restart: always
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP Http receiver
    depends_on:
      - jaeger-all-in-one

  # database
  postgres:
    image: postgres:13.2-alpine
    environment:
      - POSTGRES_USER=root
      - POSTGRES_PASSWORD=12345678
      - POSTGRES_DB=example
    ports:
      - "5432:5432"
  • requirements.txt
psycopg2==2.9.7
opentelemetry-instrumentation-psycopg2>=0.33b0
opentelemetry-exporter-otlp>=1.12.0
  • index.py
import logging
import threading
import time
from psycopg2 import pool
import psycopg2
from opentelemetry import trace
from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# this database use the connection poll
class PostgresDatabasePool:
    def __init__(self, minconn, maxconn, host, database, user, password, port):
        self.db_pool = pool.ThreadedConnectionPool(
            minconn=minconn,
            maxconn=maxconn,
            host=host,
            database=database,
            user=user,
            password=password,
            port=port
        )

    def execute_query(self, query):
        conn = None
        cursor = None
        try:
            conn = self.db_pool.getconn()
            cursor = conn.cursor()
            cursor.execute(query)
            logging.info(f"execute command: {query}")
        except Exception as e:
            logging.error(f"SQL error: {e}")
        finally:
            if cursor:
                cursor.close()
            if conn:
                self.db_pool.putconn(conn)

# this database use the default connect
class PostgresDatabase:
    def __init__(self, host, database, user, password, port):
        self.conn = psycopg2.connect(
            host=host,
            database=database,
            user=user,
            password=password,
            port=port
        )
        self.conn.autocommit = True
        self.cursor = self.conn.cursor()

    def execute_query(self, query):
        if not self.cursor:
            logging.warning("Please connect first")
            return
        try:
            self.cursor.execute(query)
            logging.info(f"execute command: {query}")
        except Exception as e:
            logging.error(f"SQL error: {e}")

def delete1_with_db_pool(thread_name):
    while True:
        with tracer.start_as_current_span("delete1_with_db_pool", kind=trace.SpanKind.INTERNAL):
            dbPool.execute_query("DELETE FROM public.test1;")
            dbPool.execute_query("DELETE FROM public.test1;")
            dbPool.execute_query("DELETE FROM public.test1;")
            dbPool.execute_query("DELETE FROM public.test1;")
            dbPool.execute_query("DELETE FROM public.test1;")
        time.sleep(5)

def delete2_with_db_pool(thread_name):
    while True:
        with tracer.start_as_current_span("delete2_with_db_pool", kind=trace.SpanKind.INTERNAL):
            dbPool.execute_query("DELETE FROM public.test2;")
            dbPool.execute_query("DELETE FROM public.test2;")
            dbPool.execute_query("DELETE FROM public.test2;")
            dbPool.execute_query("DELETE FROM public.test2;")
            dbPool.execute_query("DELETE FROM public.test2;")
        time.sleep(5)

def delete1(thread_name):
    while True:
        with tracer.start_as_current_span("delete1", kind=trace.SpanKind.INTERNAL):
            db.execute_query("DELETE FROM public.test1;")
            db.execute_query("DELETE FROM public.test1;")
            db.execute_query("DELETE FROM public.test1;")
            db.execute_query("DELETE FROM public.test1;")
            db.execute_query("DELETE FROM public.test1;")
        time.sleep(5)

def delete2(thread_name):
    while True:
        with tracer.start_as_current_span("delete2", kind=trace.SpanKind.INTERNAL):
            db.execute_query("DELETE FROM public.test2;")
            db.execute_query("DELETE FROM public.test2;")
            db.execute_query("DELETE FROM public.test2;")
            db.execute_query("DELETE FROM public.test2;")
            db.execute_query("DELETE FROM public.test2;")
        time.sleep(5)

# tracing
resource = Resource(attributes={SERVICE_NAME: 'Demo_Bug'})
provider = TracerProvider(resource=resource)
otlp_exporter = OTLPSpanExporter(endpoint='localhost:4317', insecure=True)
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("test", "1.0.0")

# Psycopg2
Psycopg2Instrumentor().instrument()

# init database
dbPool = PostgresDatabasePool(1, 5, "localhost", "example", "root", "12345678", 5432)
db = PostgresDatabase("localhost", "example", "root", "12345678", 5432)

# create table
db.execute_query("CREATE TABLE IF NOT EXISTS public.test1 (id serial  NOT NULL , text varchar(64)  NOT NULL);")
db.execute_query("CREATE TABLE IF NOT EXISTS public.test2 (id serial  NOT NULL , text varchar(64)  NOT NULL);")

# init thread
threads = [threading.Thread(target=delete1_with_db_pool, args=("Thread-1",)), threading.Thread(target=delete2_with_db_pool, args=("Thread-2",)),
           threading.Thread(target=delete1, args=("Thread-3",)), threading.Thread(target=delete2, args=("Thread-4",))]

# start
for thread in threads:
    thread.start()

# wait
for thread in threads:
    thread.join()

Run command

docker-compose up -d
pip install -r requirements.txt
python index.py

What is the expected behavior?
The result of tracing methods delete1_with_db_pool and delete2_with_db_pool should be the same as delete1 and delete2.

What is the actual behavior?
image
image

Additional context
I suspect the problem lies in our use of the pool.ThreadedConnectionPool() function in the psycopg2 library to reuse the database connection. It seems that opentelemetry-instrumentation-psycopg2 hasn't taken this case.

#355
#381

@changemyminds changemyminds added the bug Something isn't working label Aug 30, 2023
@changemyminds changemyminds changed the title Recursion issue in database operations when using opentelemetry-instrumentation-psycopg2 with psycopg2's ThreadedConnectionPool Recursive tracing issue in database operations when using opentelemetry-instrumentation-psycopg2 with psycopg2's ThreadedConnectionPool Aug 31, 2023
@changemyminds
Copy link
Contributor Author

After I switched the Python version and ran the same code for testing, the final test result changed.

If you use Python version between 3.6 and 3.8 for testing, you will encounter a recursion issue. However, if you use versions between 3.9 and 3.11, the recursive issue does not appear. It's very strange.

Here are my test result after switching Python versions:

  • 3.6.15 (has a recursion issue)
  • 3.7.16 (has a recursion issue)
  • 3.8.16 (has a recursion issue)
  • 3.9.16 (works well)
  • 3.10.9 (works well)
  • 3.11.1 (works well)

I suspect that the root cause is the Python version, if you want to use psycopg2's ThreadedConnectionPool. make sure your Python version is above 3.9.x; otherwise, you will encounter a recursion issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant