Setup new staging and prod environments #57

Open
jlewi opened this issue Jan 5, 2020 · 8 comments
@jlewi (Collaborator) commented Jan 5, 2020

Opening this issue to document the setup of new staging and prod environments.

We'd like to roll out some changes to the label bot frontend (see kubeflow/code-intelligence#90).
Figuring out how to roll them out safely is revealing some areas for improvement in the way
our staging and prod clusters are set up.

  • We recently created a new GitHub App, kf-label-bot-dev (see kubeflow/code-intelligence#84), to correspond to the dev/staging instance of the label bot.

  • We also cleaned up the deployment of the backend (see kubeflow/code-intelligence#70) so we have separate namespaces for dev vs. prod.

  • We'd like a similar setup for the label bot frontend, with the dev instance using the dev label bot and the prod instance using the prod label bot.

It looks like the prod instance is currently running in

  • project: github-probots
  • cluster: kf-ci-ml
  • namespace: mlapp

It looks like there are two separate ingresses in this namespace

kubectl get ingress 
NAME        HOSTS               ADDRESS         PORTS     AGE
ml-gh-app   predict.mlbot.net   35.190.23.225   80, 443   264d
mlbot-net   mlbot.net           34.95.77.230    80        264d
  • The predict.mlbot.net endpoint is handling the webhook for the issue-label-bot GitHub App
  • mlbot.net is handling the web app
  • They both point at the same Flask app / K8s service
    • Looks like there are two ingresses just to allow provisioning two SSL certificates corresponding to two domains

Here's my plan

  • Create namespace label-bot-dev with a dev instance of the label bot
  • Configure the domain label-bot-dev.mlbot.net to use this server
  • Configure namespace label-bot-prod with a prod instance
  • Configure the domain label-bot-prod.mlbot.net to use this server
  • Update the label bot webhook to use label-bot-prod.mlbot.net
  • Update mlbot.net to point to the service in label-bot-prod namespace

@issue-label-bot (bot) commented Jan 5, 2020

Issue-Label Bot is automatically applying the label enhancement to this issue, with a confidence of 0.78. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

jlewi added a commit to jlewi/Issue-Label-Bot that referenced this issue Jan 5, 2020
  * create_secrets.py creates secrets needed for dev instance

* Created a kustomize package for deploying the app.

* Need to add in the ingress resources

Related to: machine-learning-apps#57 setup a dev instance

@jlewi (Collaborator, Author) commented Jan 6, 2020

Created static IP resources in github-probots named

  • label-bot-dev
  • label-bot-prod

I created CNAME records corresponding to those addresses as well.
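
As a rough illustration of how the reserved addresses could be tied to the per-environment ingresses, here is a minimal Python sketch that builds an Ingress manifest for each environment. The namespace, host, and static IP names come from this thread; the Service name and port are hypothetical, and the sketch assumes the addresses were reserved as global static IPs for a GKE ingress. It uses the current `networking.k8s.io/v1` API, whereas a cluster from early 2020 would have used a v1beta1 version with a slightly different backend layout.

```python
import json


def build_ingress(env: str) -> dict:
    """Return an Ingress manifest (as a dict) for the given environment.

    env is "dev" or "prod". The namespace, host, and static IP names follow
    the thread; the Service name and port are hypothetical placeholders.
    """
    name = f"label-bot-{env}"
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "namespace": name,
            "annotations": {
                # GKE annotation that binds the ingress to a reserved
                # global static IP with this name.
                "kubernetes.io/ingress.global-static-ip-name": name,
            },
        },
        "spec": {
            "rules": [
                {
                    "host": f"{name}.mlbot.net",
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": "label-bot-frontend",  # hypothetical
                                        "port": {"number": 80},  # hypothetical
                                    }
                                },
                            }
                        ]
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML ones.
    print(json.dumps(build_ingress("dev"), indent=2))
```

Generating both manifests from one function keeps dev and prod identical apart from the name, which is the property the plan above is after.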

@jlewi (Collaborator, Author) commented Jan 6, 2020

@jlewi (Collaborator, Author) commented Jan 6, 2020

For the dev instance, the webhooks are failing with 405 Method Not Allowed:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>

Maybe I have the wrong URL for the webhook; maybe there's a missing path?
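
A 405 (rather than a 404) usually means the URL matched a route that does not accept the request's method, which fits the missing-path guess. The sketch below is a simplified model of that routing decision, not the app's actual code. The `/event_handler` path is taken from the webhook URLs that appear later in this thread; the route table and the allowed methods are assumptions.

```python
# Hypothetical route table: path -> set of allowed HTTP methods.
ROUTES = {
    "/": {"GET"},                # web app home page (assumed)
    "/event_handler": {"POST"},  # webhook receiver (path from this thread)
}


def status_for(method: str, path: str) -> int:
    """Mimic how a Flask-style router picks a status code for a request."""
    allowed = ROUTES.get(path)
    if allowed is None:
        return 404  # no route matches the path at all
    if method not in allowed:
        return 405  # the path matches, but not for this method
    return 200


if __name__ == "__main__":
    # A webhook POSTed to the bare domain hits the GET-only home page.
    print(status_for("POST", "/"))
    # The same POST with the handler path appended is accepted.
    print(status_for("POST", "/event_handler"))
```

Under these assumptions, a webhook configured with only the domain would be rejected with a 405, while the same webhook with the handler path appended would be accepted.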

jlewi added a commit to jlewi/Issue-Label-Bot that referenced this issue Jan 17, 2020
* machine-learning-apps#57 is tracking setting up new staging and prod environments

  * This PR sets up a new staging (or dev) environment
  * We create a kustomize manifest for deploying the front end into that
    namespace
  * The staging environment is configured to use the dev instance of the
    issue label bot backend microservice (i.e. the pubsub workers)
  * I created some Python scripts to make it easier to set up the secrets.
  * The motivation for doing this was to test the changes to the front end

* Front end now forwards all issues for the kubeflow org to the backend

  * This is needed because we want to use multiple models for all Kubeflow
    repos kubeflow/code-intelligence#70

  * The backend should also be configured with logging to measure the impact
    of the predictions.

kubeflow/code-intelligence#104 is a test issue showing that the bot is
working.

* Fix how keys are handled

  * For GOOGLE_APPLICATION_CREDENTIALS, depend on that environment variable
    being set and pointing to the file containing the private key.
    Don't get the private key from an environment variable and then write it
    to a file.

* For the GitHub App private key, use an environment variable to point to
  the file containing the PEM key.

* Create a script to create the secrets.

* Flask app is running in dev namespace

  * create_secrets.py creates secrets needed for dev instance
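
The key-handling change described in the commit message above (environment variables hold file paths, not key material) could look roughly like the following. This is a minimal sketch, not the code from the PR. `GOOGLE_APPLICATION_CREDENTIALS` is the standard variable read by Google client libraries; `GITHUB_APP_PEM_KEY` is a hypothetical name for the variable that points at the GitHub App PEM file.

```python
import os
from pathlib import Path


def read_key_file(env_var: str) -> str:
    """Return the contents of the file whose path is stored in env_var.

    The environment variable holds only a path; the key material itself
    stays in the file (for example, a mounted Kubernetes secret).
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    key_file = Path(path)
    if not key_file.is_file():
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    return key_file.read_text()


# Example usage (variable name is hypothetical):
#   pem_key = read_key_file("GITHUB_APP_PEM_KEY")
# Google client libraries read GOOGLE_APPLICATION_CREDENTIALS themselves,
# so the app only needs to make sure that variable is set.
```

Failing fast with a clear error when the variable or file is missing makes a misconfigured secret mount easier to spot than a later authentication failure.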

@jlewi (Collaborator, Author) commented Jan 18, 2020

Everything is deployed in prod. It looks like issues were labeled correctly when I manually sent a webhook.

Updating the GitHub App's webhook URL.

Old URL
http://predict.mlbot.net/event_handler

New URL
https://label-bot-prod.mlbot.net/event_handler
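
For anyone repeating the manual webhook test against the new URL, a request could be assembled along these lines. This is a sketch under assumptions, not the procedure used here: the payload is a hypothetical, heavily trimmed `issues` event, and the real handler may require fields (such as installation details) that are not shown. The optional signature follows GitHub's documented `X-Hub-Signature-256` scheme, which only matters if the app is configured with a webhook secret.

```python
import hashlib
import hmac
import json
import urllib.request
from typing import Optional

WEBHOOK_URL = "https://label-bot-prod.mlbot.net/event_handler"


def build_webhook_request(
    payload: dict, secret: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request that looks like an issues webhook."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-GitHub-Event": "issues",
    }
    if secret:
        # HMAC-SHA256 of the raw request body, hex encoded, prefixed with "sha256=".
        digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
        headers["X-Hub-Signature-256"] = f"sha256={digest}"
    return urllib.request.Request(
        WEBHOOK_URL, data=body, headers=headers, method="POST"
    )


# Hypothetical minimal payload; a real GitHub event carries many more fields.
sample_payload = {
    "action": "opened",
    "issue": {"number": 1, "title": "Example title", "body": "Example body"},
    "repository": {"full_name": "example-org/example-repo"},
}
request = build_webhook_request(sample_payload)
# To actually deliver it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the headers and body before pointing it at prod.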

@jlewi (Collaborator, Author) commented Jan 18, 2020

Delivery of webhooks is returning 502s but I don't see any errors in my weblogs.

@jlewi (Collaborator, Author) commented Jan 18, 2020

Success: an issue was labeled with the new bot; see
kubeflow/code-intelligence#108

@jlewi (Collaborator, Author) commented Jan 18, 2020

We need to tear down the old namespace, mlapp.
