2019Q4 - 3.4 Investigate dockerization of webcompat.com application #109

Closed
miketaylr opened this issue Oct 1, 2019 · 16 comments

@miketaylr

No description provided.

@miketaylr miketaylr added this to Planned in 2019 Q4 OKRs Oct 1, 2019
@karlcow

karlcow commented Oct 1, 2019

4791-conteneur-appart

@karlcow karlcow self-assigned this Oct 1, 2019
@karlcow

karlcow commented Oct 2, 2019

@karlcow karlcow moved this from Planned to In progress in 2019 Q4 OKRs Oct 2, 2019
@karlcow

karlcow commented Oct 10, 2019

Interesting take on the persistence of a sqlite db in between containers.

Persist the SQLite database between containers

Instead of trying to keep the container around, a much cleaner solution is to arrange for the database (or other files you’d like to persist) to be stored in the host’s filesystem. You can do this by adding another volume mount with -v to the run command. Here’s the full command, which stores the database with the source code:
https://developers.redhat.com/blog/2019/09/12/develop-with-flask-and-python-3-in-a-container-on-red-hat-enterprise-linux/
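
The quoted approach only helps if the application does not hard-code where its database file lives. A minimal sketch, assuming a hypothetical `SESSION_DB_PATH` environment variable (not an existing webcompat.com setting) that points into a directory mounted from the host with `-v`; the table layout is also illustrative:

```python
import os
import sqlite3
from pathlib import Path


def open_session_db(env=os.environ):
    """Open the SQLite file named by SESSION_DB_PATH (hypothetical variable).

    Pointing the variable at a directory mounted from the host keeps the
    file around when the container itself is replaced.
    """
    db_path = Path(env.get("SESSION_DB_PATH", "session.db"))
    # Create the parent directory if the mount point is still empty.
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(db_path))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT)"
    )
    conn.commit()
    return conn
```

With this in place the container can be thrown away and recreated; only the mounted directory needs to survive.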

@karlcow

karlcow commented Oct 16, 2019

HEROKU

The current webcompat metrics dashboard is deployed on Heroku. One benefit of a Heroku deployment is that each PR gets its own URL for testing, which more or less removes the need for a staging instance: any PR is a staging instance.

This made me think that if we use Docker only for deployment, maybe we do not need Docker and could "just" shift fully to Heroku. I'm not a big fan of these solutions, maybe out of the habit of owning your own stuff and avoiding reliance on too many services.

Heroku has no persistent file system, which means we would need to change a couple of things.

  1. static assets hosting
  2. images hosting, see 2020Q1 - 0.4 Create image upload service (for eventual cloud migration) #108 (old and new) See also the S3 integration in Heroku.
  3. Sessions sqlite DB moved to Postgres
  4. topsites DB move to Postgres
  5. milestones.json which is created when the application starts as a caching mechanism. Maybe MongoDB solution.
  6. uwsgi seems usable on heroku. Or switch to gunicorn.
  7. Oh! and the domain name, but that seems easy to add custom domain name to heroku instances.
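
The two database items mostly come down to not hard-coding a SQLite path. A minimal sketch of picking the connection URL from the environment: Heroku Postgres exposes its URL in `DATABASE_URL`, while the function name and the SQLite fallback below are illustrative assumptions, not existing webcompat.com code. The scheme rewrite is there because Heroku has handed out `postgres://` URLs, which some SQLAlchemy versions do not accept.

```python
import os


def database_url(env=os.environ, fallback="sqlite:///topsites.db"):
    """Return the database URL to use in the current environment."""
    url = env.get("DATABASE_URL")
    if not url:
        # Local development: keep using a SQLite file.
        return fallback
    if url.startswith("postgres://"):
        # Normalize the scheme to the spelling SQLAlchemy expects.
        url = "postgresql://" + url[len("postgres://"):]
    return url
```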

See also memcache and performance for Flask on Heroku, and the notion of a dyno for web concurrency with gunicorn on Heroku.

That would do it.

@karlcow

karlcow commented Oct 16, 2019

SHIV

Another proposal for deploying a simple app is Shiv. As explained on the Shiv website:

Shiv is a command line utility for building fully self contained Python zipapps as outlined in PEP 441 but with all their dependencies included! Shiv’s primary goal is making distributing Python applications fast & easy.

Example of a deployment with a Django app.
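
For readers who have not met PEP 441 zipapps, the standard library's `zipapp` module can build a plain one. This is only a sketch of the underlying format, not of Shiv itself: unlike Shiv, it does not bundle third-party dependencies. The `hello` module and the file names are made up for the example.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_zipapp(workdir):
    """Package a tiny one-module app into a PEP 441 archive; return its path."""
    src = Path(workdir) / "src"
    src.mkdir()
    (src / "hello.py").write_text(
        "def main():\n    print('hello from a zipapp')\n"
    )
    target = Path(workdir) / "hello.pyz"
    # main="module:function" makes zipapp generate the __main__.py entry point.
    zipapp.create_archive(src, target=target, main="hello:main")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_zipapp(tmp)
        # The archive is run directly by the interpreter, like a script.
        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True, text=True, check=True,
        )
        print(result.stdout.strip())
```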

@karlcow

karlcow commented Oct 16, 2019

PIKU

piku project The tiniest PaaS you've ever seen. Piku allows you to do git push deployments to your own servers.

piku codebase

The tiniest Heroku/CloudFoundry-like PaaS you've ever seen.
piku, inspired by dokku, allows you to do git push deployments to your own servers.

This is very much in development right now.

@johngian

johngian commented Oct 24, 2019

Since we've already done the move for a lot of projects from bare-metal servers to containers here are a couple of thoughts:

PaaS

  • Heroku is amazing for teams without Ops/SRE people dedicated to the projects. It's very high level, but if you don't need a low-level approach for performance, security, or similar reasons, it's good enough. The other problem is the pricing model: it tends to be cheap for non-prod workloads but gets more expensive when you want to scale up.

  • For a self-hosted solution, I've heard that dokku has a good track record of being rock solid. It also follows the Heroku paradigm for deploying services.

  • Despite the hype around Kubernetes, as long as there are no dedicated people working on it, I think it's overkill. That said, if you can tap into an already maintained, running cluster, it might be a good way forward.

Migrating to docker

What I've found pretty useful in the past is

  • Migrating to a 12-factor-like setup for the configuration.
  • Understanding which part of your app can be stateless and where state lives
  • Understanding the notion of a container as a software packaging method (with some isolation, up to an extent)
  • Splitting the various components (web workers, background tasks, db, cache, SSL termination, load balancing) into different containers/services.
  • Docker Hub is not the most stable registry out there. It may be worth the effort to investigate other options.
  • Making sure that the envs (local dev-env, stage, prod) are as similar as possible.
  • For python projects specifically
    • I think whitenoise is great for decoupling the serving of static files from the actual frontend servers
    • The idea is that your app serves the files itself, but with built-in caching and compression features that work well behind a CDN.
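
As a rough illustration of the 12-factor-style configuration mentioned in the list above, here is a minimal sketch in which every setting comes from the environment and a missing secret fails at start-up. The variable names (`SECRET_KEY`, `APP_DEBUG`, `UPLOAD_DIR`) and the `Settings` class are assumptions for the example, not the project's actual configuration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    debug: bool
    secret_key: str
    upload_dir: str


def load_settings(env=os.environ):
    """Read every setting from the environment; names here are illustrative."""
    try:
        secret_key = env["SECRET_KEY"]
    except KeyError:
        # Fail fast instead of silently running with a default secret.
        raise RuntimeError("SECRET_KEY must be set in the environment") from None
    return Settings(
        debug=env.get("APP_DEBUG", "0") == "1",
        secret_key=secret_key,
        upload_dir=env.get("UPLOAD_DIR", "uploads"),
    )
```

The same image can then run locally, on stage, and in prod with only the environment differing, which is what keeps the environments similar.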

Current docker setup

For various reasons I've already dockerized webcompat.com for my dev env and for debugging staging wsgi. This might be useful for future reference:

I hope this is helpful for your migration to the ☁️ 💻 📦 ☁️

@karlcow

karlcow commented Oct 25, 2019

Thanks a lot @johngian for the information. Super useful.
If you have time, could you review some of these notes about Docker?

As for PaaS, we are using heroku already for http://webcompat-dashboard.herokuapp.com/
https://github.com/webcompat/webcompat-metrics-server/
https://github.com/webcompat/webcompat-metrics-client/

@karlcow

karlcow commented Oct 25, 2019

Also for people reading along and not familiar with 12 factor app

@karlcow

karlcow commented Oct 29, 2019

2019-10-29

The more I look at it, the more I wonder whether it's necessary in terms of cost/benefit. If we want to virtualize a bit more, a PaaS would probably bring us "benefits with a loss of control": doing what the webcompat dashboard does and pushing to Heroku.

@johngian

Nice post @karlcow. Some comments about it:

  • Dev setup + process can be significantly better if you build around it. In the end, Docker is "just resource isolation" in the Linux kernel. It won't work out for you if you just take your stack and run it in containers.
  • Container benefits not captured (I think): if built properly, you can have significantly fewer attack vectors (especially in a PaaS world), less maintenance overhead, and fewer dependency management issues. Also, from my experience, having environments so similar to each other (local, dev, stage, prod) brings much more confidence when releasing code and can make debugging prod issues much easier.

Other than that, yeah, there is a lot of hype out there about why all software needs to be containerized, which of course doesn't necessarily apply to every case.

@karlcow

karlcow commented Oct 30, 2019

@ksy36 ksy36 added the On Track label Nov 12, 2019
@karlcow

karlcow commented Nov 26, 2019

Docker is not necessarily the answer. Once image processing is separated out, we have a lot more freedom, and we can imagine hosting the application on Heroku, for example. The rest of the dockerization does not seem necessarily useful, but we could still do Docker. The blocker is really the independence of the images.

@miketaylr
Author

@karlcow can we write up a summary comment and call this one done?

@karlcow

karlcow commented Dec 11, 2019

So to be able to go to a docker or a Heroku solution, these are the friction points we need to address:

  1. Move from SQLite to a Postgres-like solution for all our DBs, because you do not want to lose this data between deployments.
     • session.db
     • topsites.db
  2. Remove the dependency on start up for milestones.json
  3. Rewrite the request/saving of assets (images and JSON) to use a storage server (AWS S3 or other)
  4. Move all the static assets (CSS, JS, images) to a storage server to make caching more effective. The Heroku filesystem is ephemeral.
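
One possible way to drop the start-up dependency on milestones.json is to fetch the data lazily and keep it in memory for a while. This is only a sketch under assumptions: `fetch` stands in for whatever currently produces milestones.json, and `LazyCache` is a hypothetical helper, not existing code.

```python
import time


class LazyCache:
    """Fetch a value on first use and refresh it after `ttl` seconds."""

    def __init__(self, fetch, ttl=3600.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been fetched yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # First call, or the cached copy is too old: fetch again.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

Nothing is written to disk, so it does not matter that the filesystem is ephemeral. The sketch is not thread-safe, and each worker process would hold its own copy, so a shared cache may still be preferable.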

It's not a light modification: probably a couple of quarters, externalizing the systems step by step.

There is also probably a bit of documentation rewriting when we move to these solutions.

A Docker solution would basically need to address the same issues. Coming up with a dev image as @johngian did is simple enough for a testing scenario, but it is not usable for prod as-is without moving these things outside of the image.

  • Digital Ocean offers hosting for Docker images.
  • Heroku: we have an account, which we already use for the webcompat dashboards.
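
To make "moving the things outside of the image" a bit more concrete, here is a minimal sketch of putting uploads behind a small storage interface, so the same application code can write to local disk in a dev image and to an object store in prod. All names (`Storage`, `LocalStorage`, `store_upload`) are hypothetical; a remote backend such as S3 would be a second class with the same `save` method and is deliberately not shown.

```python
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    def save(self, name: str, data: bytes) -> str: ...


class LocalStorage:
    """Writes uploads under a directory; suitable for development only."""

    def __init__(self, root):
        self.root = Path(root)

    def save(self, name, data):
        # Keep only the final path component so a name cannot escape root.
        path = self.root / Path(name).name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return str(path)


def store_upload(storage: Storage, name: str, data: bytes) -> str:
    """Application code depends on the interface, not on the filesystem."""
    return storage.save(name, data)
```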

@karlcow karlcow added Complete and removed On Track labels Dec 11, 2019
@karlcow karlcow moved this from In progress to Done in 2019 Q4 OKRs Dec 11, 2019
@karlcow

karlcow commented Dec 16, 2019

Deploying with Docker
