
Ensemble repo specific and non-repo specific models #70

Closed
jlewi opened this issue Nov 27, 2019 · 32 comments

@jlewi
Contributor

jlewi commented Nov 27, 2019

Currently the label bot has two models:

  1. A generic model trained on all repositories that only handles labels bug, question, feature
  2. (Optionally) A repo specific model for repo specific labels

Currently the code uses one model or the other
https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/app.py#L143

In particular, if a repo specific model is available, it is used to predict the kind label. This is undesirable because the repo specific model usually does much worse at predicting the kind label than the generic model does.

What we'd prefer is to use the generic model to predict the kind label and the repo specific model to predict the other labels.

We can probably achieve this just by modifying the code here

self.load_yaml(repo_owner, repo_name)

to also use the generic model to compute the kind label, basically by copying this code:
https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/app.py#L164
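
In other words, run both models and merge their outputs by label type. A minimal sketch of that combination, assuming hypothetical model objects with a `predict` method that returns a label -> probability dict (this is not the actual app.py code):

```python
KIND_LABELS = {"bug", "question", "feature"}

def predict_labels(title, body, generic_model, repo_model=None):
    """Sketch: the generic model supplies the kind labels, the repo specific
    model supplies everything else."""
    predictions = {}

    # The generic model is trained on all repositories, so trust it for the
    # kind labels only.
    for label, prob in generic_model.predict(title, body).items():
        if label in KIND_LABELS:
            predictions[label] = prob

    # The repo specific model handles the repo's own labels; drop its kind
    # predictions since the generic model does better on those.
    if repo_model is not None:
        for label, prob in repo_model.predict(title, body).items():
            if label not in KIND_LABELS:
                predictions[label] = prob

    return predictions
```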

@issue-label-bot

Issue-Label Bot is automatically applying the label kind/feature to this issue, with a confidence of 0.70. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

jlewi added the help wanted (Extra attention is needed) label Nov 27, 2019
hamelsmu changed the title from "Combine repo specific and non-repo specific models" to "Ensemble repo specific and non-repo specific models" Dec 6, 2019
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Dec 23, 2019
… kind.

* To combine multiple models we define a base class IssueLabelModel
  that defines a common interface for all the models.

  * This way different models can just implement that interface and we
    can easily combine the results.

* UniversalKindLabelModel will be used for the generic model, trained on all
  repositories, to predict the kind label.

  * The UniversalKindLabelModel class is based on the IssueLabeler code
    https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/utils.py#L67

* kubeflow#70 Combine multiple models
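
The base class described in this commit suggests an interface along these lines; this is a rough sketch only, assuming a `predict_issue_labels(title, text)` signature, and the actual classes in the repo may differ:

```python
import abc


class IssueLabelModel(abc.ABC):
    """Common interface so different label models can be combined."""

    @abc.abstractmethod
    def predict_issue_labels(self, title, text):
        """Return a dict mapping label name -> probability for one issue."""


class UniversalKindLabelModel(IssueLabelModel):
    """Generic model trained on all repositories; predicts kind labels only."""

    def predict_issue_labels(self, title, text):
        raise NotImplementedError("run the shared bug/feature/question model here")


class RepoSpecificLabelModel(IssueLabelModel):
    """Model trained on a single repository's own labels."""

    def predict_issue_labels(self, title, text):
        raise NotImplementedError("run the per-repo model here")
```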
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Dec 27, 2019
…er models.

* This PR includes a number of changes that make it easier to spin up
  the issue embedding service as a microservice intended as an internal
  backend for other models/services.

* issue_embedding service should not be exposed externally or do auth
  based on a token

  * Fix kubeflow#76

  * Doing auth just complicates matters and is unnecessary; if we want to expose
    it externally and secure it (e.g. with IAP) we can do so by using Kubeflow
    and putting it behind the Kubeflow ingress

  * Doing auth makes it harder to use the embedding service as a backend because
    the client and server need to agree on the API key used for authorization.

  * As a result delete the K8s resources related to the ingress and certificate

* Create a kustomize package for deploying it.

  * Using kustomize will make it easier to define different versions of the
    service corresponding to dev, staging, and prod environments

* Create a skaffold configuration for the issue embedding service

  * Skaffold makes it easy to continually rebuild and deploy the service;
    this is very useful for development.

  * Related to: kubeflow#78

* Cleanup the Dockerfile for the embedding service
  * We don't install requirements.txt
  * Need to install flask session.

* Simplify the naming scheme for the issue embedding service.

* The embedding service flask app can't run with debug mode turned on
  so raise an exception if debug mode is on.

* Fixes to Embedding service Dockerfile

  * Don't cache the model inside the dockerfile
  * Make the docker context the root of the repo as opposed to Issue_Embeddings
    This makes it possible to include files in /py in the docker container

  * It doesn't look like we need to pip install requirements.txt because the
    Dockerfile is explicitly installing the dependencies we need.

* Use port 80 for the kubernetes service; it's more convenient to expose it on
  the common port rather than a unique service specific port.

Related to: kubeflow#70 combine universal label model
  and repo specific model.
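
For context, this is roughly what calling such an internal, unauthenticated embedding service could look like from a worker; the service name, port, and endpoint path below are assumptions, not the repo's actual API:

```python
import requests

# Hypothetical in-cluster address: plain HTTP on port 80 and no API key,
# since the service is only reachable from inside the cluster.
EMBEDDING_SERVICE_URL = "http://issue-embedding-server/text"


def get_issue_embedding(title, body):
    response = requests.post(
        EMBEDDING_SERVICE_URL,
        json={"title": title, "body": body},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```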
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Dec 27, 2019
…al model for issue kind.

* To combine multiple models we define a base class IssueLabelModel
  that defines a common interface for all the models.

  * This way different models can just implement that interface and we
    can easily combine the results.

* UniversalKindLabelModel will be used for the generic model, trained on all
  repositories, to predict the kind label.

  * The UniversalKindLabelModel class is based on the IssueLabeler code
    https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/utils.py#L67

* repo_specific_model.py contains a model class for predicting labels
  using a repo specific model.

  * Most of the code was previously inside worker.py
  * We wrap this code in the newly defined model interface so we can easily
    combine different models.

  * A subsequent PR will update worker.py to use the new code

* Copy unittest functions from test_worker into repo_specific_mode_test.py

* Change the signature of functions from (owner, repo, issue_num) to
  (title, text)

  * We want to be able to generate predictions using multiple models but we
    only want to make calls to the GitHub API to fetch issue information once.

  * So we should do the fetching from GitHub upstream of the prediction
    functions.

* Related to: kubeflow#70 Combine multiple models
@jlewi
Contributor Author

jlewi commented Dec 30, 2019

Now that we are moving to an asynchronous model based on pubsub, we could easily have multiple models operating independently, just by creating multiple workers with different pubsub subscriptions.

If the models are operating independently, we'd have to think about how label combination would be applied. E.g. suppose one model applies the label kind/bug but a different model predicts kind/question. How should these be combined? If the workers are operating independently, would we remove the previous model's labels?
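
As a sketch of what "multiple workers with different pubsub subscriptions" might look like (the project, subscription names, message payload fields, and model interface here are hypothetical):

```python
import json

from google.cloud import pubsub_v1

PROJECT = "issue-label-bot-dev"


def start_worker(subscription_name, model):
    """Start one worker that runs a single model on issues pulled from its
    own subscription; each model gets its own worker and subscription."""
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT, subscription_name)

    def callback(message):
        issue = json.loads(message.data.decode("utf-8"))
        labels = model.predict_issue_labels(issue["title"], issue["body"])
        print(f"{subscription_name} predicted: {labels}")
        message.ack()

    # Returns a streaming pull future that the caller keeps alive.
    return subscriber.subscribe(subscription_path, callback=callback)
```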

@kf-label-bot-dev

Issue-Label Bot is automatically applying the labels feature, kind/bug, kind/question to this issue, with confidences of 0.75, 0.01, 0.24.
Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

@jlewi
Contributor Author

jlewi commented Dec 30, 2019

One of the issues that @hamelsmu raised in the code review #82 is that the probabilities for different models won't be on the same scale. So simply taking the max probability from the two models for the same label may not make sense.

I think this also means that each model should have its own confidence thresholds. For the generic model the thresholds are set here
https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/app.py#L42

One way to combine the models would be for each model to apply its respective thresholds and then take the union of the labels.
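
A sketch of that combination strategy, with made-up names and a made-up default threshold (each model keeps only the labels that clear its own thresholds, and the final set is the union):

```python
def combine_predictions(model_predictions, model_thresholds, default_threshold=0.5):
    """model_predictions: {model_name: {label: probability}}
    model_thresholds:  {model_name: {label: threshold}}"""
    combined = {}
    for model_name, predictions in model_predictions.items():
        thresholds = model_thresholds.get(model_name, {})
        for label, prob in predictions.items():
            if prob < thresholds.get(label, default_threshold):
                continue  # this model isn't confident enough about this label
            # Union of labels. If both models keep the same label, report the
            # larger probability; the accept/reject decision was already made
            # per model, so we never compare the raw scales directly.
            combined[label] = max(prob, combined.get(label, 0.0))
    return combined
```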

k8s-ci-robot pushed a commit that referenced this issue Dec 31, 2019
…al model for issue kind (#82)

* Define classes to combine repository specific labels with the universal model for issue kind.

* To combine multiple models we define a base class IssueLabelModel
  that defines a common interface for all the models.

  * This way different models can just implement that interface and we
    can easily combine the results.

* UniversalKindLabelModel will be used for the generic model, trained on all
  repositories, to predict the kind label.

  * The UniversalKindLabelModel class is based on the IssueLabeler code
    https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/utils.py#L67

* repo_specific_model.py contains a model class for predicting labels
  using a repo specific model.

  * Most of the code was previously inside worker.py
  * We wrap this code in the newly defined model interface so we can easily
    combine different models.

  * A subsequent PR will update worker.py to use the new code

* Copy unittest functions from test_worker into repo_specific_mode_test.py

* Change the signature of functions from (owner, repo, issue_num) to
  (title, text)

  * We want to be able to generate predictions using multiple models but we
    only want to make calls to the GitHub API to fetch issue information once.

  * So we should do the fetching from GitHub upstream of the prediction
    functions.

* Related to: #70 Combine multiple models

* Address comments.

* Fix the tests; the code had been updated but the unittests had not been updated to match.

* Update the comment.

* Update comment.
k8s-ci-robot pushed a commit that referenced this issue Dec 31, 2019
…er models. (#80)

* This PR includes a number of changes that make it easier to spin up
  the issue embedding service as a microservice intended as an internal
  backend for other models/services.

* issue_embedding service should not be exposed externally or do auth
  based on a token

  * Fix #76

  * Doing auth just complicates matters and is unnecessary; if we want to expose
    it externally and secure it (e.g. with IAP) we can do so by using Kubeflow
    and putting it behind the Kubeflow ingress

  * Doing auth makes it harder to use the embedding service as a backend because
    the client and server need to agree on the API key used for authorization.

  * As a result delete the K8s resources related to the ingress and certificate

* Create a kustomize package for deploying it.

  * Using kustomize will make it easier to define different versions of the
    service corresponding to dev, staging, and prod environments

* Create a skaffold configuration for the issue embedding service

  * Skaffold makes it easy to continually rebuild and deploy the service;
    this is very useful for development.

  * Related to: #78

* Cleanup the Dockerfile for the embedding service
  * We don't install requirements.txt
  * Need to install flask session.

* Simplify the naming scheme for the issue embedding service.

* The embedding service flask app can't run with debug mode turned on
  so raise an exception if debug mode is on.

* Fixes to Embedding service Dockerfile

  * Don't cache the model inside the dockerfile
  * Make the docker context the root of the repo as opposed to Issue_Embeddings
    This makes it possible to include files in /py in the docker container

  * It doesn't look like we need to pip install requirements.txt because the
    Dockerfile is explicitly installing the dependencies we need.

* Use port 80 for the kubernetes service; it's more convenient to expose it on
  the common port rather than a unique service specific port.

Related to: #70 combine universal label model
  and repo specific model.
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Dec 31, 2019
… kind.

* To combine multiple models we define a base class IssueLabelModel
  that defines a common interface for all the models.

  * This way different models can just implement that interface and we
    can easily combine the results.

* UniversalKindLabelModel will be used for the generic model, trained on all
  repositories, to predict the kind label.

  * The UniversalKindLabelModel class is based on the IssueLabeler code
    https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/utils.py#L67

* kubeflow#70 Combine multiple models
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Dec 31, 2019
* Most of the inference logic in the Worker class can be deleted and
we can instead use the newly created Model classes.

* Use an in cluster deployment service
* Log the model and thresholds.

* cli.py is a simple CLI to send pubsub requests
  * It is intended for development and debugging.

* Create a utility to parse issue specs of the form {org}/{repo}#number

* We need to initialize IssueLabelPredictor() inside the pubsub thread;
otherwise we have threading issues when trying to access the model.

* The universal model needs to apply appropriate thresholds to its predictions.

* Worker no longer needs to apply thresholding of predictions because that is
  now handled by the individual model classes.

* Add a method to worker apply_repo_config that handles repo specific behavior
specified in the YAML file in the repo

  * Apply aliases to the labels
  * If the config specifies which labels to predict then filter out all other
    labels.

* When commenting on issues about the predictions, the bot should create a markdown table with the probabilities for the labels in the comment.

* Remove test_worker.py; the test is no longer relevant after the refactor.
  There is a TODO in worker.py to create an appropriate unittest for the new
  code.

* Change how the GitHub App Private Keys are handled.

  * The private key should be mounted as a secret and an environment variable
    should specify the path of the PEM key.

  * Stop storing the PEM key as an environment variable and then in python code writing it to a file.

  * Change the name of the environment variables to add the prefix "GITHUB_"
to make it more explanatory.

  * Move some of the helper methods for creating GitHubApp out of github_util
    and into the GitHubApp class

Related to: kubeflow#70 combine universal and repo specific models.
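
As an illustration of the issue-spec utility mentioned in this commit, here is a hedged sketch of a parser for specs of the form {org}/{repo}#number; the function name and error handling are assumptions, not the repo's actual utility:

```python
import re

# Matches specs like "kubeflow/code-intelligence#70".
_ISSUE_SPEC_RE = re.compile(r"^(?P<org>[^/]+)/(?P<repo>[^#]+)#(?P<number>\d+)$")


def parse_issue_spec(spec):
    """Parse "{org}/{repo}#number" into (org, repo, issue_number)."""
    match = _ISSUE_SPEC_RE.match(spec.strip())
    if not match:
        raise ValueError(f"Could not parse issue spec: {spec!r}")
    return match.group("org"), match.group("repo"), int(match.group("number"))
```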
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Dec 31, 2019
… kind.

* To combine multiple models we define a base class IssueLabelModel
  that defines a common interface for all the models.

  * This way different models can just implement that interface and we
    can easily combine the results.

* UniversalKindLabelModel will be used for the generic model, trained on all
  repositories, to predict the kind label.

  * The UniversalKindLabelModel class is based on the IssueLabeler code
    https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/utils.py#L67

* kubeflow#70 Combine multiple models
@kf-label-bot-dev

Issue-Label Bot is automatically applying the labels:

Label   | Probability
------- | -----------
feature | 0.75

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

@jlewi
Contributor Author

jlewi commented Dec 31, 2019

Now that the code is complete we need to deploy it.

I think we want to create a new Kubeflow cluster to deploy the updated microservices. Currently we have the different microservices each running in their own cluster in project issue-label-bot-dev.

  • github-api-cluster - This cluster appears to be running the issue embedding microservice.

    • After the code changes, this should now be a microservice in the same cluster as the pubsub
      workers.
  • workers - This cluster appears to be running the pubsub workers that are handling the production topic.

  • github-mlapp-test - This appears to be the staging cluster

    • Per setup_staging_cluster.md this appears to be a staging instance that is also running an instance of the flask app that handles the GitHub webhooks.
  • issue-label-bot-dev-kf - This is a KF cluster used for development.

I think we should do the following

@jlewi
Contributor Author

jlewi commented Dec 31, 2019

Created a 0.7 deployment.

  • project: issue-label-bot-dev
  • cluster: issue-label-bot
  • zone: us-east1-d

@jlewi
Contributor Author

jlewi commented Dec 31, 2019

Disable istio injection in this namespace

kubectl label namespace kaniko istio-injection=disabled --overwrite=true
  • I used the GKE UI to bump the node auto-provisioning settings; increased the number of CPUs to 48.

    • otherwise it couldn't provision new nodes to schedule my kaniko builds.
  • Per "[GCP] Node Auto provisioner pool is missing VM service account and oauth scopes" (kubeflow#4259), we need to update the node auto-provisioner node scopes

    gcloud --project=issue-label-bot-dev beta container clusters update issue-label-bot --enable-autoprovisioning --autoprovisioning-scopes=https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring,https://www.googleapis.com/auth/devstorage.read_only
    

@jlewi
Contributor Author

jlewi commented Dec 31, 2019

  • skaffold's kaniko jobs won't work with workload identity, so we need to copy the GCP secret to the kaniko namespace (see "kaniko builds should work with GKE workload identity" GoogleContainerTools/skaffold#3468)

    NAMESPACE=kaniko
    SOURCE=kubeflow
    NAME=user-gcp-sa
    SECRET=$(kubectl -n ${SOURCE} get secrets ${NAME} -o jsonpath="{.data.${NAME}\.json}" | base64 -d)
    kubectl create -n ${NAMESPACE} secret generic ${NAME} --from-literal="${NAME}.json=${SECRET}"
    

@jlewi
Contributor Author

jlewi commented Dec 31, 2019

  • Create a Kubeflow profile corresponding to namespace label-bot-dev

  • Load the github-app secret

    python3 scripts/create_secrets.py create-dev
    
  • Disable istio sidecar injection because istio seems to be interfering with the services

    kubectl label namespace label-bot-dev istio-injection=disabled --overwrite=true
    
    • The embedding service was failing its readiness probe; it was returning 503s.

@jlewi
Contributor Author

jlewi commented Dec 31, 2019

Grant pubsub permissions

PROJECT=issue-label-bot-dev
ACCOUNT=issue-label-bot-user@issue-label-bot-dev.iam.gserviceaccount.com
gcloud projects add-iam-policy-binding ${PROJECT} --role roles/pubsub.editor  --member "serviceAccount:${ACCOUNT}"

@kf-label-bot-dev

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@jlewi
Contributor Author

jlewi commented Jan 3, 2020

Deployed to prod; based on kubeflow/kubeflow#4618 it looks like it's working.

jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Jan 4, 2020
* Check in the Kubeflow configuration files for the cluster we are using.

Related to kubeflow#70 - ensemble predictions.
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Jan 4, 2020
* This PR provides a kustomize package to deploy the label microservice

* Also add a skaffold config for the Label_Microservice

* Remove the old YAML deployment files for the Label Microservice.

* Edit the worker Dockerfile

  * Use TensorFlow 1.15.0 rather than using the "latest" image
  * We can also use a regular TensorFlow image and not a GPU version
    since this is just for inference and so we shouldn't need GPUs

  * Create a new requirements.worker.txt to only include the libraries that
    are needed in the worker. This should be much smaller than the uber
    set of python libraries (e.g. we don't need Jupyter, fairing, etc...)

  * Create requirements.universal_model.txt to contain some of the required
    python dependencies for the universal model.

    * Universal model is using ktext and some other libraries.

* Add a prod overlay for the issue_embedding service.

* create_secrets.py is a helper script for creating the required secrets
  in the clusters based on files in GCS.

Related to kubeflow#70 ensemble models.
k8s-ci-robot pushed a commit that referenced this issue Jan 4, 2020
* This PR provides a kustomize package to deploy the label microservice

* Also add a skaffold config for the Label_Microservice

* Remove the old YAML deployment files for the Label Microservice.

* Edit the worker Dockerfile

  * Use TensorFlow 1.15.0 rather than using the "latest" image
  * We can also use a regular TensorFlow image and not a GPU version
    since this is just for inference and so we shouldn't need GPUs

  * Create a new requirements.worker.txt to only include the libraries that
    are needed in the worker. This should be much smaller than the uber
    set of python libraries (e.g. we don't need Jupyter, fairing, etc...)

  * Create requirements.universal_model.txt to contain some of the required
    python dependencies for the universal model.

    * Universal model is using ktext and some other libraries.

* Add a prod overlay for the issue_embedding service.

* create_secrets.py is a helper script for creating the required secrets
  in the clusters based on files in GCS.

Related to #70 ensemble models.
k8s-ci-robot pushed a commit that referenced this issue Jan 7, 2020
* Check in the Kubeflow config for the latest cluster.

* Check in the Kubeflow configuration files for the cluster we are using.

Related to #70 - ensemble predictions.

* Remove OWNERs file.
jlewi pushed a commit to jlewi/Issue-Label-Bot that referenced this issue Jan 17, 2020
* machine-learning-apps#57 is tracking setting up new staging and prod environments

  * This PR sets up a new staging (or dev environment)
  * We create a kustomize manifest for deploying the front end into that
    namespace
  * The staging environment is configured to use the dev instance of the
    issue label bot backend microservice (i.e. the pubsub workers)
  * I created some python scripts to make it easier to setup the secrets.
  * The motivation for doing this was to test the changes to the front end

* Front end now forwards all issues for the kubeflow org to the backend

  * This is needed because we want to use multiple models for all Kubeflow
    repos kubeflow/code-intelligence#70

  * The backend should also be configured with logging to measure the impact
    of the predictions.

kubeflow/code-intelligence#104 is a test issue showing that the bot is
working.

* Fix how keys are handled

  * For GOOGLE_APPLICATION_CREDENTIALS, depend on that environment variable
    being set and pointing to the file containing the private key;
    don't get the private key from an environment variable and then write it
    to a file.

* For the GitHub App private key; use an environment variable to point to
  the file containing the PEM key.

* Create a script to create the secrets.

* Flask app is running in dev namespace

  * create_secrets.py creates secrets needed for dev instance
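
The key-handling pattern described in this commit (mount the key as a secret file, point an environment variable at it) might look roughly like this; the environment variable name is a guess based on the GITHUB_ prefix convention, not the actual name used in the code:

```python
import os


def load_github_app_key():
    # The deployment mounts the PEM key as a secret file and sets an
    # environment variable (name assumed here) to its path; the key itself
    # is never stored in an environment variable.
    key_path = os.environ["GITHUB_APP_PEM_KEY"]
    with open(key_path, "r") as f:
        return f.read()
```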
jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Jan 18, 2020
… kind.

* To combine multiple models we define a base class IssueLabelModel
  that defines a common interface for all the models.

  * This way different models can just implement that interface and we
    can easily combine the results.

* UniversalKindLabelModel will be used for the generic model, trained on all
  repositories, to predict the kind label.

  * The UniversalKindLabelModel class is based on the IssueLabeler code
    https://github.com/machine-learning-apps/Issue-Label-Bot/blob/536e8bf4928b03d522dd021c0464587747e90a87/flask_app/utils.py#L67

* kubeflow#70 Combine multiple models
jlewi added a commit to jlewi/code-intelligence that referenced this issue Jan 18, 2020
* Attach extra metadata information in the log messages.

* Related to kubeflow#70 - use ensemble models
jlewi pushed a commit to jlewi/Issue-Label-Bot that referenced this issue Jan 18, 2020
* kubeflow/code-intelligence#70

* update create_secrets for prod.

* Delete old K8s manifests; they're now replaced by kustomize packages.
k8s-ci-robot pushed a commit that referenced this issue Jan 19, 2020
…hen exceptions occur (#77)

* Fix logging in the new worker images.

* Attach extra metadata information in the log messages.

* Related to #70 - use ensemble models

* Fix logging.

* Update prod image.

* Fix logging.

* Update the image.
jlewi removed the feature label Mar 20, 2020
@jlewi
Contributor Author

jlewi commented Apr 6, 2020

Here's an initial report of how we are doing:
https://storage.cloud.google.com/issue-label-bot-dev_reports/metrics_2020_0406.html

jlewi closed this as completed Apr 6, 2020