
Create/implement task definition for 2018 Transportation-Systems-Service project #104

Closed
iant01 opened this issue May 17, 2018 · 19 comments

Comments


iant01 commented May 17, 2018

Create the service subdirectory and service.yaml file for use in getting the service task definition into ECS.


iant01 commented May 18, 2018

Created PR16 with the changes needed to master.yaml to add the transportation-systems service, and a service.yaml file to define the task definition and load balancer listener rule for the service.

Right now, the following items have arbitrarily set values:
Host: staging-2018.civicpdx.org
Path: /transportation-systems
Port: 3000
Priority: 40 (needs to be before the civic-2018 service and the civic-lab service)
Memory: 2048 (2 GB; last year's service was a memory pig, so hopefully this year's will use less. Setting it high to start.)
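For reference, a rough sketch of what the relevant pieces of service.yaml could look like with these values, assuming the usual CloudFormation listener-rule/task-definition pattern for ECS services; resource names and parameter wiring here are illustrative assumptions, not the actual file contents:

```yaml
# Illustrative CloudFormation fragment only; names and Refs are assumptions.
ListenerRule:
  Type: AWS::ElasticLoadBalancingV2::ListenerRule
  Properties:
    ListenerArn: !Ref Listener
    Priority: 40                     # must evaluate before civic-2018 / civic-lab rules
    Conditions:
      - Field: host-header
        Values: [staging-2018.civicpdx.org]
      - Field: path-pattern
        Values: [/transportation-systems*]
    Actions:
      - Type: forward
        TargetGroupArn: !Ref TargetGroup

TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: transportation-systems
    ContainerDefinitions:
      - Name: transportation-systems
        Memory: 2048                 # hard limit in MiB (2 GB)
        PortMappings:
          - ContainerPort: 3000
```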

iant01 self-assigned this May 19, 2018

znmeb commented May 19, 2018

@iant01 How much memory did we use last year? And how do you measure it? Is there some way we can test this locally before deploying?


iant01 commented May 19, 2018

We can possibly use `docker stats` on a running host to get memory info for the running containers. Either a container developer would need to run the command on their local system, or we'd run it on another ECS instance. Since we can't ssh into the hacko container instance to run the command, we might be able to run the transportation-systems container on another ECS instance, but I have not had any success running the 2017 container in my AWS account, so I may not have success with the 2018 container either. I will give it a try.

There may be a Docker API that would work against the hacko ECS instance, but again we might need an access key to get in.


znmeb commented May 19, 2018

These are the API containers, right? If they look like this year's API images from the backend-examplar, then either there's an AWS way to monitor their usage or we'd need console access to the Docker host. :-(

@MikeTheCanuck

@znmeb, is there any chance of running the container locally, performing a few operations through the API (to load up some in-memory data), and running the `docker stats` command as Ian suggested above?

@MikeTheCanuck

There is no way we're going to throw 1/4 of our available memory at a new container "just in case" - this was only done last year as a last-minute, last-resort fix, and no one's had time to go back and characterize that pig since then.


znmeb commented May 19, 2018

Yeah, I can spin it up locally but this isn't the full API. Should I just use the Docker host default settings for container resource usage?

It would be really nice if we could build resource limiting into the images - interpreted languages like Python tend to take up all the RAM they can find even if they're sharing it with a dozen other containers / VMs they don't know about.

@MikeTheCanuck

I'm confused - why isn't the Docker image you'd spin up locally "the full API"? Isn't that one of the benefits of Docker, that the app you run locally and the one you deploy into production are identical?


znmeb commented May 19, 2018

It's the full API for the one database we had when we built the image. We have more data now, which will mean more models and more API endpoints and probably more RAM used.

@BrianHGrant

So we have some options for profiling Python and Django behavior, including running with DEBUG=True on the gunicorn server (-p) connecting to the AWS DB, some usage of the Django Debug Toolbar (not currently installed), or maybe New Relic if we need more advanced info than `docker stats` provides.

That said, there were some complexities to the transportation project last year that didn't exist when I left a bit ago. I will catch up this weekend, but I'm not sure if this will be an issue.

Good data on usage is great, though.

@MikeTheCanuck

Let's not go overboard here - the most significant thing we need to know is roughly how much RAM the Django app(s) in the container will consume, so that we can allocate a sufficient amount of RAM in the AWS CloudFormation template for this container. We generally start out with 100 MB and bump it in increments of 100 from there; we spent a lot of time last year debugging containers that wouldn't stay running because we had no idea what kind of memory load they would have.

However, we're not just going to throw RAM at these - this isn't an unlimited resource - so if there's some risk that they'll need more than 100MB, let's get a rough number based on some rough characterization. Thanks!


znmeb commented May 19, 2018

By the way, wouldn't DEBUG=True use more RAM?


iant01 commented May 19, 2018

Silly question... was the transportation container last year running a database itself rather than connecting to a database server, or was it a hybrid of both (keeping large amounts of data local after grabbing it from a remote DB server)?

@bhgrant8

Yes DEBUG would use more RAM.

But here is where I've got to:

`docker stats` gives a streaming output, so a point-in-time (PIT) snapshot of memory usage and a few other stats:

CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS

I then ran this against the docker container transportation-system-backend_api_production_1 on my host machine, using the current API.

During startup of the container using the prod flag (./bin/start.sh -p) and connecting to the AWS-hosted DB, we see CPU % maxing out around 85%, with memory usage going to around 152 MiB.

[screenshot: docker stats output for transportation-system-backend_api_production_1]
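Since `docker stats` streams continuously, a one-shot snapshot (`docker stats --no-stream`) is easier to record and parse. Here's a small sketch of pulling the usage half of the MEM USAGE column into MiB; the container name is the one from this thread, but the numeric values are illustrative:

```python
# One-shot snapshot could come from e.g.:
#   docker stats --no-stream --format "{{.Name}}\t{{.MemUsage}}"
# Units ordered longest-first so "MiB" isn't matched by the bare "B" suffix.
UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0, "B": 1 / (1024 ** 2)}

def mem_usage_mib(mem_usage_field: str) -> float:
    """Convert the usage half of a field like '152MiB / 7.786GiB' to MiB."""
    used = mem_usage_field.split("/")[0].strip()   # e.g. '152MiB'
    for unit, factor in UNITS.items():
        if used.endswith(unit):
            return float(used[: -len(unit)]) * factor
    raise ValueError(f"unrecognized unit in {used!r}")

line = "transportation-system-backend_api_production_1\t152MiB / 7.786GiB"
name, usage = line.split("\t")
print(name, mem_usage_mib(usage))  # → ... 152.0
```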

The thing I was seeing, though, is that MEM USAGE did not seem to drop by more than a few MiB; after a few queries using filters on the crash data, it reached ~225 MiB. So I started looking into what this figure actually includes.

First, I found Google's cAdvisor (https://github.com/google/cadvisor). It provides a GUI and 60 seconds of historical data, so it's a bit more useful than `docker stats`.

[screenshot: cAdvisor memory usage graph, May 19, 2018]

Looking into the MEM usage, I came across this issue, which documents the different types of memory being recorded:

google/cadvisor#638

tldr is:

Hot is the working set - pages that have been recently touched, as calculated by the kernel.

Total includes hot + cold memory, where cold pages are those that have not been touched in a while and can be reclaimed if there is global memory pressure.

or another way:

Total (memory.usage_in_bytes) = rss + cache
Working set = Total - inactive (not recently accessed memory = inactive_anon + inactive_file)

So the question becomes: which is the most important number?
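The two formulas quoted above can be sketched as a small calculation over the cgroup counters (the field names follow the cgroup memory controller's memory.stat; the sample numbers below are invented, not measurements from the container):

```python
# Sketch of the cadvisor formulas: Total = rss + cache (memory.usage_in_bytes),
# Working set = Total - inactive, inactive = inactive_anon + inactive_file.
MIB = 1024 ** 2

def working_set(usage_in_bytes: int, inactive_anon: int, inactive_file: int) -> int:
    """Hot (working set) memory: total usage minus not-recently-accessed pages."""
    return usage_in_bytes - (inactive_anon + inactive_file)

# e.g. 225 MiB total with 60 MiB of cold (inactive) pages that could be reclaimed:
total = 225 * MIB
hot = working_set(total, inactive_anon=10 * MIB, inactive_file=50 * MIB)
print(total // MIB, hot // MIB)  # → 225 165
```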

@bhgrant8

@iant01 I feel like there was some type of hybrid data store going on, but I was not directly on the project last year and am not completely sure of the full magic that was happening.

@MikeTheCanuck

Awesome data Brian, thank you.

When we allocate memory to each container, there’s no memory management to worry about - as in, the “cold” memory that could be reclaimed probably wouldn’t be, because there’s nothing else in the container that would appreciably request contended memory (it’d all be consumed by one process - gunicorn, Python, whatever the runtime host is).

So given we’re doing hard allocations per container, I’m going to conservatively assume that we should use the Total - and then round up to the nearest 100 MB (just to give us a little breathing room for edge cases and future API enhancements).

Based on this data, I’m inclined to allocate 300 MB to this transportation-systems container.
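The "round up to the nearest 100" rule is simple to make mechanical; a sketch (the 225 MiB observation is from Brian's numbers above, and indeed rounds up to the 300 MB Mike proposes):

```python
import math

def allocation_mb(observed_mb: float, step_mb: int = 100) -> int:
    """Round an observed Total memory figure up to the next allocation step."""
    return int(math.ceil(observed_mb / step_mb)) * step_mb

print(allocation_mb(225))  # → 300
print(allocation_mb(152))  # → 200
print(allocation_mb(300))  # → 300 (already on a step boundary)
```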


znmeb commented May 19, 2018

I've got the merged database ready for testing - I'm planning to build a local development environment from it at the May 20 build session so we can see what we have.


iant01 commented May 20, 2018

All of the discussion on memory use should be moved to its own new issue; this issue was intended for creating the service task definition to get things going in ECS.

This issue can be closed once the memory discussion is in its own issue and PR 16 has been merged.


iant01 commented May 20, 2018

On the question of which memory figure is relevant: it would be the Total memory size.
