Reduce latency between EC2-hosted database and Django containers #50

Open
MikeTheCanuck opened this issue Mar 31, 2017 · 3 comments

@MikeTheCanuck
Collaborator

MikeTheCanuck commented Mar 31, 2017

Summary

Backend Django API containers deployed to ECS are routinely and rapidly marked "unhealthy" by the ALB and replaced with a new container, which fails the same way, ad infinitum.

Details

Generally speaking, the backend Django-hosting containers are not a healthy lot. While some will respond to HTTP requests (either to the Swagger root or to the API endpoints themselves), nearly all of them are in some state of disrepair/inability to service client requests consistently.

Potential Issue: database latency

Requests to the Budget database are incredibly slow for non-trivial endpoints, even when running via a local container and talking to the EC2-hosted PostgreSQL:

  • 502/504 errors for the /ocrb/ and /history/ endpoints when no parameters are submitted
  • 5-15 second response time to the /code/ and /kpm/ endpoints

Oddly, parameterized (i.e. filtered) requests to these endpoints receive super-quick responses.

In the ECS environment, the containers aren't faring any better. There, however, the database is effectively "across the Internet": the container app is configured to reach the DB on its external IP address, which loses the benefits of hosting app and DB in the same AWS region.

Hell, submitting this request (/budget/history/?fiscal_year=2015-16) via the ECS container still 502'd, but when submitted through a local container, it responded after ~10 seconds.
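To put numbers on the filtered vs. unfiltered difference, and on local vs. ECS, something like the following could be run against each environment. It is only a rough sketch: the base URL is a placeholder, and `time_request` and `compare` are hypothetical helper names, not part of the project.

```python
import time
import urllib.error
import urllib.request


def time_request(url, timeout=60, opener=urllib.request.urlopen):
    """Return (status, seconds) for one GET.

    status is the HTTP status code, or None if the request timed out
    or the connection failed before any response arrived.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()  # include the time needed to download the body
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # e.g. 502 or 504 returned by the load balancer
    except (urllib.error.URLError, OSError):
        status = None
    return status, time.monotonic() - start


def compare(base_url):
    """Print timings for an unfiltered and a filtered request.

    base_url is a placeholder such as "http://localhost:8000/budget".
    """
    for path in ("/history/", "/history/?fiscal_year=2015-16"):
        status, seconds = time_request(base_url + path)
        print(f"{path}: status={status} time={seconds:.1f}s")
```

Calling `compare(...)` once with the local container's address and once with the ALB address would show how much of the delay is specific to ECS.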

Possible fixes (discussed in #49)

  1. Move to RDS
  2. Route from app to DB via private IP addresses in a single VPC
  3. Host the PostgreSQL database in an adjacent container

If we had any experience with RDS, the "right" (though likely more costly) answer would be to start with (1) for as many projects as can tolerate it. Since we have none, we're in danger of sinking days or weeks into figuring that deployment model out, when we have so many other critical tasks between now and Demo Day.

In the absence of (1), (2) sounds like next-best (though it adds more complexity to the branching setup we already have), and (3) seems least-good but might be our last resort.
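Whichever option we pick, the app side mostly comes down to which host the Django settings point at. A minimal sketch, assuming the settings read the host from an environment variable: the `DB_*` variable names, the default values, and the `database_settings` helper are all illustrative, not what the project currently uses.

```python
import os


def database_settings(env=os.environ):
    """Build a Django DATABASES dict with the DB host taken from the environment."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "budget"),        # placeholder name
            "USER": env.get("DB_USER", "postgres"),      # placeholder user
            "PASSWORD": env.get("DB_PASSWORD", ""),
            # Option (1): set DB_HOST to the RDS endpoint.
            # Option (2): set DB_HOST to the EC2 instance's private VPC address.
            # Option (3): set DB_HOST to the adjacent container's hostname.
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
            # Reuse connections for up to 60 seconds instead of reconnecting
            # on every request (an assumption, not a measured fix).
            "CONN_MAX_AGE": 60,
        }
    }


DATABASES = database_settings()
```

With this shape, switching between the three options is a change to the container's environment rather than to the image.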

@MikeTheCanuck
Collaborator Author

MikeTheCanuck commented Mar 31, 2017

Idea: dig into psycopg2, thread safety, "library-friendly lock"

Interesting information: in this gunicorn bug report I spotted the following about the psycopg2 adapter, and I wonder whether it is related:

Following your pointer, I had a look at the psycopg2 adapter - which we use to connect our Django app to Postgres - and discovered this section of the documentation which states:

Warning: Psycopg connections are not green thread safe and can’t be used concurrently by different green threads. Trying to execute more than one command at time using one cursor per thread will result in an error (or a deadlock on versions before 2.4.2).

Therefore, programmers are advised to either avoid sharing connections between coroutines or to use a library-friendly lock to synchronize shared connections, e.g. for pooling.

In other words - psycopg2 doesn't like green threads. Based on the behaviour we encountered, I would guess that this is the source of the error. The suggested way to deal with this issue, according to the psycopg docs, is to use a library which enables psycopg support for coroutines.

The recommended library is psycogreen.

I don't know squat about "green threads" so I'm hoping one of you fine folks recognizes whether this is relevant.
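For reference, if it turns out we are running gunicorn with green-thread (gevent) workers, which this issue does not confirm, the psycogreen patch would probably go in a gunicorn config file along these lines. The file name and the `worker_class` setting are assumptions about our setup.

```python
# gunicorn.conf.py (hypothetical file; only relevant with gevent workers)

# Assumption: the app is served by gunicorn's gevent worker class.
worker_class = "gevent"


def post_fork(server, worker):
    """Gunicorn server hook that runs in each worker right after it is forked."""
    # Imported here so the config file can be loaded even where
    # psycogreen is not installed.
    from psycogreen.gevent import patch_psycopg

    # Registers a gevent-aware wait callback with psycopg2 so a blocking
    # query yields to other greenlets instead of stalling the worker.
    patch_psycopg()
    worker.log.info("psycopg2 patched for gevent")
```

If we are on the default sync workers, none of this applies and the green-thread warning is probably a red herring.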

@MikeTheCanuck
Collaborator Author

Idea: reduce the ALB Health Check timeout

This comment about a conceptually-similar timeout in Heroku makes this approach seem very promising.
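If we try this, the change could be scripted with boto3 rather than clicked through the console. A sketch only: the ARN and the numbers are placeholders, `health_check_params` and `apply_health_check` are hypothetical helpers, and the timeout-below-interval guard reflects my understanding of the ALB constraint rather than anything verified against our setup.

```python
def health_check_params(target_group_arn, timeout_s, interval_s):
    """Build keyword arguments for the ELBv2 modify_target_group call."""
    # Assumption: ALB requires the health check timeout to be shorter
    # than the interval between checks.
    if timeout_s >= interval_s:
        raise ValueError("timeout must be shorter than the check interval")
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckTimeoutSeconds": timeout_s,
        "HealthCheckIntervalSeconds": interval_s,
    }


def apply_health_check(params):
    """Send the new health check settings to AWS (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("elbv2")
    return client.modify_target_group(**params)
```

For example, `apply_health_check(health_check_params(arn, 5, 30))` would set a 5-second timeout on a 30-second interval for the target group identified by `arn`.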

@MikeTheCanuck
Collaborator Author

Idea: investigate the use of uWSGI

This comment is one anecdote to give us hope?

@MikeTheCanuck changed the title from "Database latency leading to unhealthy containers?" to "Reduce latency between database and Django containers" on Apr 1, 2017
@MikeTheCanuck changed the title from "Reduce latency between database and Django containers" to "Reduce latency between EC2-hosted database and Django containers" on Apr 1, 2017