Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a K8s app to deploy cncf/devstats for Kubeflow. #91

Merged
merged 2 commits into from
Apr 20, 2018

Conversation

jlewi
Copy link
Contributor

@jlewi jlewi commented Apr 18, 2018

  • Create a Docker image with the cncf/devstats binaries
  • Use a ConfigMap to store the project config
  • Add stateful sets for influxdb and postgre
  • Add a stateful set to deploy devstats container so that we can do
    kubectl exec to run commands.

/assign @gaocegege
/cc @lukaszgryglicki

Related to #34


This change is Reviewable

@k8s-ci-robot
Copy link

@jlewi: GitHub didn't allow me to request PR reviews from the following users: lukaszgryglicki.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

  • Create a Docker image with the cncf/devstats binaries
  • Use a ConfigMap to store the project config
  • Add stateful sets for influxdb and postgre
  • Add a stateful set to deploy devstats container so that we can do
    kubectl exec to run commands.

/assign @gaocegege
/cc @lukaszgryglicki

Related to #34

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

@lukaszgryglicki lukaszgryglicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, added some suggestions/comments.
Please note that I have no ability to run/test this.
So I've just reviewed the code carefully - nothing more.

git clone https://github.com/cncf/devstats.git devstats

RUN cd ${GOPATH}/src/devstats && \
git checkout v0.4.0 && \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This release is old. I would go with master or let me create another v0.5.0 release tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


# Add /go/src/devstats to the path so that all the binaries will be on the path.
# This is needed by the devstats binary.
ENV PATH $PATH:/go/src/devstats

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other lgtm for me, but I cannot actually test this and confirm that it works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem; wasn't expecting you to. Thanks for the comments.

# this a ksonnet app and run locally on minikube.
run_postgre:
-docker rm -f postgres
docker run -d --name=postgres -p 5432:5432 -e POSTGRES_USER=gha_admin -e POSTGRES_PASSWORD=password -e POSTGRES_DB=gha postgres:10.1-alpine

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't test this, so cannot actually say if this os OK or not. Basically it looks OK for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

### Create the config map with the projects file

kubectl create -n devstats configmap projects --from-file=./projects.yaml

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

projects.yaml taken from cncf/devstats repo (it contains 22 or 23 CNCF projects) or just a new projects.yaml containing only kubeflow config?
Basically this file must be customized per your needs.

EDIT: I see that you have created custom projects.yaml: good, lgtm.


```
Error(time=2018-04-18 18:39:01.274140856 +0000 UTC):
Error: 'pq: database "devstats" does not exist'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

devstats database is a separate postgres database that only holds logs (gha_logs table).
Its setup is described in INSTALL_*.md files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; thank you.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I'm going to update auto deploy script - to check if that DB exists. If not (it is shared for all projects) assume that this is a "from zero" setup - create database, admin, ro_user, team role etc.

value: gha
- name: PG_PASS
value: password

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to refer logs database here somehow.
You need two Postgres databases minmally:

  • Your project's database - can have any name
  • Logs database named devstats containing single table gha_logs, see INSTALL_*.md files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two databases both stored in a single instance of postgre.

- name: POSTGRES_PASSWORD
value: password
- name: POSTGRES_DB
value: gha

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably logs database also needed here (or maybe logs database will be in a different pod/yaml/deployment)?

# TODO(jlewi): Move the start date back in time once we are done doing some initial testing
start_date: 2018-04-16T00:00:00Z
# Can we get rid of join date since we aren't part of CNCF yet?
join_date: 2018-04-16T00:00:00Z

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set it to null just like opencontainers project in CNCF repo:
See: https://github.com/cncf/devstats/blob/master/projects.yaml#L241

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM although I am not sure if it works.

Thanks :-)

go get gopkg.in/yaml.v2 && \
go get github.com/google/go-github/github && \
go get golang.org/x/oauth2 && \
go get github.com/mattn/go-sqlite3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, @lukaszgryglicki Could we add vendor in devstats side in the future thus we could avoid go get in the docker image.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure we can.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jlewi https://github.com/cncf/devstats/pull/99 devstas has vendor now and maybe we could not install so many deps in the docker image.

* This deploys postgres in a pod to store the data from github archive
  and influx db.

* There is a cron job to run devstats to regularly sync the data.

* There is also a K8s job that can be used to load data from a specific
  time range (useful for backfilling)

* I've verified that data can be loaded into postgres and queried.

* I don't know if any data is being dumped to influxdb.

* We create a Docker image with the cncf/devstats binaries following the cncf/devstats instructions for installing the binaries in ubuntu 17.

* Add stateful sets for influxdb and postgre
* Add a stateful set to deploy devstats container so that we can do
  kubectl exec to run commands.
@jlewi jlewi changed the title Create some K8s manifests to deploy cncf/devstats for Kubeflow. Create a K8s app to deploy cncf/devstats for Kubeflow. Apr 19, 2018
@jlewi
Copy link
Contributor Author

jlewi commented Apr 19, 2018

Thanks PTAL I made significant changes.

Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM although I am not sure if it works.

# /etc/projects-volume will be volume mounted into the pod.
ln -sf /etc/projects-volume/projects.yaml /etc/gha2db/projects.yaml

# TODO(jlewi): Per the instructions for devstats we should increase the number of default connections for postgres
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set the postgre-docker-entrypoint.sh as entrypoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker image is used to run other things; like the CLI stateful set. Which is why I didn't want to set the entrypoint here and instead do it in the pod spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM


RUN mkdir -p /home/
COPY postgre-docker-entrypoint.sh /usr/local/bin/
RUN chmod a+x /usr/local/bin/postgre-docker-entrypoint.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we set the permission in docker image? Could we set it outside and copy it into the image or just set bash /usr/local/bin/postgre-docker-entrypoint.sh as entrypoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think permissions are preserved when you copy the file. Usually I have to do chmod a+x to make scripts executable inside the container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on the number of CPU/cores reported by the go runtime. Our servers report 96 for example. And this is the number of threads that devstats spawn. Sometimes it is too much for postgres default setup. But when the system reports 8 or even 24 cores - you can skip this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@jlewi
Copy link
Contributor Author

jlewi commented Apr 19, 2018

@gaocegege LGTM?

@gaocegege
Copy link
Member

/lgtm

@jlewi
Copy link
Contributor Author

jlewi commented Apr 20, 2018

/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaocegege, jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit ad939bc into kubeflow:master Apr 20, 2018
woop pushed a commit to woop/community that referenced this pull request Nov 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants