
DevStats to track velocity for Kubeflow #34

Closed
jlewi opened this issue Mar 8, 2018 · 55 comments

Comments
@jlewi (Contributor) commented Mar 8, 2018

Kubernetes has a great set of dashboards for velocity related metrics
https://k8s.devstats.cncf.io/d/44/time-metrics?orgId=1&var-period=w&var-repogroup_name=Kubernetes&var-repogroup=kubernetes&var-apichange=All&var-size_name=All&var-size=all&var-full_name=Kubernetes

There are dashboards for metrics like:

  • PR time to LGTM
  • approvers / # reviewers: https://k8s.devstats.cncf.io/d/38/reviewers?orgId=1

It would be great to get the same metrics for Kubeflow.

@jlewi (Contributor, Author) commented Mar 22, 2018

Here's the code behind that:
https://github.com/cncf/devstats

@ScorpioCPH (Member)

@jlewi That's great!

@gaocegege (Member)

That is awesome!

@jlewi (Contributor, Author) commented Apr 18, 2018

@lukaszgryglicki I created a postgres database and ran gha2db to load the data from my repo into it.

Is it possible to run a simple query, e.g. dump a time series of the number of PRs per day? I tried using runq to run some of the SQL files in util_sql/top_unknowns, but I'm not sure what arguments to provide and I keep getting segfaults.

@lukaszgryglicki commented Apr 18, 2018

First you need to run the query in psql, for example via:

  • sudo -u postgres psql dbname
  • psql -h localhost -U ro_user dbname

To count commits per day you can count push events; see the gha_events table. Select rows with type = 'PushEvent' and group by created_at.

util_sql/top_unknowns.sql contains "macros":

  • {{ago}}
  • {{lim}}

So to run it, you must provide replacements. A correct command would be, for example:

PG_PASS=your_password PG_DB=dbname ./runq util_sql/top_unknowns.sql {{ago}} '1 month' {{lim}} 10

An actual run:
root@devstats:~/go/src/devstats# PG_PASS=<<redacted>> PG_DB=gha ./runq util_sql/top_unknowns.sql {{ago}} '1 month' {{lim}} 10
2018-04-18 06:54:38 /runq: /------------------+--------+---\
2018-04-18 06:54:38 /runq: |actor             |actor_id|cnt|
2018-04-18 06:54:38 /runq: +------------------+--------+---+
2018-04-18 06:54:38 /runq: |rdcastro          |241302  |216|
2018-04-18 06:54:38 /runq: |jonyhy96          |33408569|189|
2018-04-18 06:54:38 /runq: |k82               |18107181|172|
2018-04-18 06:54:38 /runq: |wking             |209920  |164|
2018-04-18 06:54:38 /runq: |theopenlab-ci[bot]|33714816|156|
2018-04-18 06:54:38 /runq: |ymqytw            |2589105 |151|
2018-04-18 06:54:38 /runq: |justaugustus      |567897  |141|
2018-04-18 06:54:38 /runq: |zhaoxpZTE         |15881573|140|
2018-04-18 06:54:38 /runq: |Steve53           |13792782|133|
2018-04-18 06:54:38 /runq: |grayluck          |4760200 |133|
2018-04-18 06:54:38 /runq: \------------------+--------+---/
2018-04-18 06:54:38 /runq: Rows: 10
2018-04-18 06:54:38 /runq: Time: 590.562158ms
root@devstats:~/go/src/devstats# 
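The commits-per-day approach described above (count PushEvent rows in gha_events, grouped by created_at) can be sketched as follows. The two-column schema here is a minimal stand-in for the real gha_events table, and sqlite is used only to make the sketch self-contained; on Postgres you would use date_trunc('day', created_at) instead of substr():

```python
import sqlite3

# In-memory stand-in for the gha_events table; column names are taken
# from the description above (an event `type` and a `created_at` timestamp).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gha_events (type TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO gha_events VALUES (?, ?)",
    [
        ("PushEvent", "2018-04-17 10:00:00"),
        ("PushEvent", "2018-04-17 12:30:00"),
        ("IssuesEvent", "2018-04-17 13:00:00"),  # filtered out by the WHERE clause
        ("PushEvent", "2018-04-18 09:00:00"),
    ],
)

# Count push events per day.
rows = conn.execute(
    """
    SELECT substr(created_at, 1, 10) AS day, count(*) AS pushes
    FROM gha_events
    WHERE type = 'PushEvent'
    GROUP BY day
    ORDER BY day
    """
).fetchall()
print(rows)  # [('2018-04-17', 2), ('2018-04-18', 1)]
```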

@lukaszgryglicki commented Apr 18, 2018

BTW: the computed time series are in the influx database.
The postgres database contains all the data that I can get from the GitHub archives (GHA).

If you have a server, you can implement something very similar for Kubeflow.
Follow the ADDING_NEW_PROJECT.md instructions for automatic deploys.
You can replace our projects.yaml with your own, containing only Kubeflow for instance, and then just deploy it.

I can help with any problems on the way. And maybe (I'm checking this) I can just set up everything for you on your own server - not sure yet.

@lukaszgryglicki

So I have a green light to set up a DevStats instance for Kubeflow (and to create good documentation while doing so).
All I need now is root access to some server where I can set everything up.
After I set everything up, you can change the root password and/or remove my SSH access keys so I can no longer use it.

@jlewi

@jlewi (Contributor, Author) commented Apr 18, 2018

That's fantastic. If you send me the email associated with your Google account, I can give you access to the GKE cluster and GCP project where we'd like to deploy this.

@lukaszgryglicki commented Apr 18, 2018

lgryglicki@cncf.io
I was thinking about a bare metal server.
I have no knowledge (yet) of how to deploy stuff on a cluster.
You can talk to me on slack or skype: "lukaszgryglicki"

@jlewi (Contributor, Author) commented Apr 18, 2018

We used to run postgres on K8s for Airflow. This is how we did it:

  1. Instructions for creating a PD
  2. A K8s Deployment to run postgres in a container backed by the PD on the cluster.
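For illustration, step 2 might be sketched as a manifest like the following; the disk name, image tag, and paths here are assumptions, not what we actually ran:

```yaml
# Sketch: postgres Deployment backed by a pre-created GCE persistent disk
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:10
        env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        gcePersistentDisk:
          pdName: devstats-postgres-disk  # hypothetical disk from step 1
          fsType: ext4
```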

@jlewi (Contributor, Author) commented Apr 18, 2018

Which slack channel? (We use kubeflow.slack.com)

@lukaszgryglicki commented Apr 18, 2018

Kubernetes (kubernetes.slack.com), then #devstats - you can PM me.
Or cloud-native.slack.com, then also PM me: "lukaszgryglicki".

DevStats hasn't been ported to K8s yet; I don't even know how to use gcloud.
It has only been running on bare metal.
@jberkus probably started some work to port DevStats to K8s clusters here: cncf/devstats#72

I can also create an account on kubeflow.slack.com, but it seems like I need an invitation?

@jlewi (Contributor, Author) commented Apr 18, 2018

So here's my guess of what we need to do:

  1. Create PDs for the postgres and influx DBs
  2. Create a Deployment for postgres
  3. Create a Deployment for influx DB
  4. Create a Docker image with the devstats code
  5. Create K8s jobs to run the devstats scripts
    • structure to set up the postgres DB
    • ./grafana/influxdb_setup.sh to set up the influx DB
    • gha2db to load the data into postgres
  6. A Deployment for grafana
  7. A cron job to run the sync regularly
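Step 7 could be sketched as a Kubernetes CronJob along these lines; the image name and environment values are assumptions for illustration, not a tested manifest:

```yaml
# Sketch: hourly devstats sync as a K8s CronJob
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: devstats-sync
spec:
  schedule: "8 * * * *"   # once an hour, a few minutes past the hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: devstats
            image: gcr.io/example/devstats:latest  # hypothetical image from step 4
            command: ["devstats"]
            env:
            - name: GHA2DB_PROJECT
              value: kubeflow
          restartPolicy: OnFailure
```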

@lukaszgryglicki commented Apr 18, 2018

I probably can't help with that :-(

We already discussed this on the DevStats repo and it was decided that I shouldn't work on porting DevStats to K8s yet.
@jberkus started some work on this; I can only help with bare metal deployment.

If you are interested, there are tiny Packet servers that would be able to handle that. Even the smallest one, at $0.40/hour, should be enough for such a small deployment.

Unless @dankohn wants me to dive into the K8s porting.

I think that porting to K8s will be a bit more complex than you described.

@jlewi (Contributor, Author) commented Apr 18, 2018

No problem. Thanks for all the great work on devstats; looks fantastic.

@lukaszgryglicki

For k8s you can ask @jberkus; for bare metal, I can create the instance for you; for me to help with a k8s version of devstats, I need a green light from @dankohn.

@jlewi (Contributor, Author) commented Apr 18, 2018

@lukaszgryglicki does the outline provided above make sense though? Was there anything obvious I missed?

Thanks for the offer. I think I'd prefer to deploy it on K8s, even if that means setting it up ourselves. I think that will be easier to manage in the longer term, and I don't think it will be that hard to convert to K8s.

@lukaszgryglicki commented Apr 18, 2018

My knowledge of K8s deployments is quite limited, but I can see the following components:

  • One grafana instance per project (N pods)
  • One postgres database per project + a logs instance (N + 1 pods) (it will also have cron jobs for DB backups)
  • One influx instance per project (N pods)
  • One or N instances of the devstats tools (why? we can either have one running all sync jobs one after another, just like bare metal does now, or [better imho] one instance per project) (N or 1 pod(s)) - this will include the cron job imho
  • Current DevStats also has a "webhook" tool that manages CI and CD (continuous integration/deployment) - not sure if we want the same tool in a k8s deployment; if so, it looks complex at first glance.

Obviously for a single project (Kubeflow), N=1.

BTW: I like the idea of deploying to k8s. It would be great if you upstreamed your code. I can help with any questions.

@jlewi (Contributor, Author) commented Apr 18, 2018

Do we need to run some binary regularly to execute the queries and load the data into influxdb? Is this handled by the devstats binary?

@lukaszgryglicki

The devstats binary calls all the other binaries in the correct order; actually, that is all it does.
All the other binaries exist to allow the user to execute parts of the workflow manually or in a special mode.
Most of the switches are described in context.go and all binaries are listed in the Makefile.

@lukaszgryglicki

Binaries that are called by devstats are:

  • structure
  • gha2db
  • db2influx
  • z2influx
  • gha2db_sync
  • annotations
  • idb_tags
  • get_repos
  • ghapi2db

Some other binaries that can be used manually, but are not called by devstats:

  • runq
  • import_affs
  • idb_backup
  • webhook
  • merge_pdbs
  • idb_vars
  • replacer
  • pdb_vars
  • idb_tst
  • sqlitedb

@jlewi (Contributor, Author) commented Apr 18, 2018

Will devstats create the databases if they don't already exist? If not, is there an option to automatically run the setup scripts that create the DBs, but only if they don't already exist?

Can we just start devstats and leave it running in order to periodically get the latest data? Or do we need to invoke it from a cron job?

@lukaszgryglicki commented Apr 18, 2018

  • Devstats will not create the databases; the initial setup is described in ADDING_NEW_PROJECT.md.

  • If there are no databases at all, the automatic deploy scripts will currently fail; they were added after we already had, say, 15-16 of the current 22 projects.

  • I've worked today on automatic deploy on ARMv8, but this work is not ready yet.

  • So currently the situation is: if we already have some projects, we can add new ones totally automatically by using CD (continuous deploy) and a "[deploy]" message in the commit on the production branch. The script executed is "devel/deploy_all.sh" - but again - it won't handle the initial state when there is nothing yet. Sorry :-( this is a bare metal binary, not a set of k8s pods yet. Deploy happens on a CI hook in cmd/webhook/webhook.go.

  • Devstats can be used to automatically sync everything, but it needs to run every hour from a cron job; see the examples and comments in the crontab.entry file.
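An hourly sync entry might look like the following sketch; the real flags and paths live in devstats' crontab.entry file, and everything here (paths, env vars, log destinations) is illustrative:

```
# Hypothetical hourly devstats sync; see crontab.entry in the repo for the real entry
8 * * * * PATH=$PATH:/usr/local/bin GHA2DB_PROJECT=kubeflow devstats >> /tmp/devstats.log 2>&1
```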

@lukaszgryglicki

Actually, after my last changes the devel/deploy_all.sh script has small changes to deal with there being no databases at all - but it has not been tested yet.

@jlewi (Contributor, Author) commented Apr 19, 2018

@lukaszgryglicki @jberkus so I have devstats running and I'm definitely getting data into the postgres database. I'm not sure, though, whether data is making it into influx. Any suggestions on how to test?

I'm a little confused about how to set up grafana. I looked at the instructions here, but it's not clear to me how we configure Grafana to access the postgres and influx databases.

@jlewi (Contributor, Author) commented Apr 19, 2018

So it looks like one problem is that gha_repos is empty.

I ran the following query to update that table

update gha_repos set repo_group = 'kubeflow', alias= 'kubeflow' where org_login in ('kubeflow');

I'll file a bug to run it periodically.

@jlewi (Contributor, Author) commented Apr 19, 2018

I reran devstats and now influx db shows the following time series:

repo_group_activity_d
repo_group_activity_d7
repo_group_activity_h24
repo_group_activity_m
repo_group_activity_q
repo_group_activity_y
repo_group_commits_d
repo_group_commits_d7
repo_group_commits_h24
repo_group_commits_m
repo_group_commits_q
repo_group_commits_y

@jlewi (Contributor, Author) commented Apr 19, 2018

Looks like a bunch of data is still missing

> SELECT kubeflow FROM "repo_group_activity_d"
name: repo_group_activity_d
time                 kubeflow
----                 --------
2018-04-19T00:00:00Z 169

Let's try rerunning shared/reinit.sh.

Note: Running reinit.sh complains: 'open ./metrics/shared/idb_vars.yaml: no such file or directory'

I just created an empty file and that seemed to work.

@jlewi (Contributor, Author) commented Apr 19, 2018

So the time series now looks to have all the data.

> SELECT kubeflow FROM "repo_group_activity_d"
name: repo_group_activity_d
time                 kubeflow
----                 --------
2018-01-29T00:00:00Z 21
...
2018-04-19T00:00:00Z 169

@jlewi (Contributor, Author) commented Apr 19, 2018

Grafana is complaining:

pq: relation "metric_table" does not exist

@jlewi (Contributor, Author) commented Apr 20, 2018

I manually edited the influxdb query in Grafana to be

SELECT kubeflow FROM "repo_group_activity_[[period]]" WHERE $timeFilter

That worked.

So the problem appears to be that Grafana isn't properly updating

jlewi added a commit to jlewi/community that referenced this issue Apr 20, 2018
* The dashboards aren't properly set up yet. We're having some problems getting
Grafana to handle repository groups correctly.

Related to kubeflow#34
@jlewi (Contributor, Author) commented Apr 20, 2018

So it looks like the problem was that the dashboards use Grafana variables whose values are supposed to be populated from influxdb. But when I imported the dashboards I must have set the data source incorrectly, because it was trying to read the data from the postgres database, which didn't work; as a result I was getting the error "template variables could not be initialized".

I edited the dashboard via the UI and that fixed things.

Now I just need to figure out how to load all the json files defining the dashboards into Grafana.

@jlewi (Contributor, Author) commented Apr 20, 2018

@jberkus @lukaszgryglicki Is there a script I can use to create dashboards from all the json files in a directory? If the dashboards already exist, I'd like them to be overwritten.

@jlewi (Contributor, Author) commented Apr 20, 2018

See grafana/grafana#10052

Looks like we can configure Grafana to load dashboards from a directory.

@jlewi (Contributor, Author) commented Apr 20, 2018

Found some helpful instructions about how to configure Grafana to load dashboards from files in conjunction with a config map.
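For reference, the file-based loading works via Grafana's dashboard provisioning config (Grafana 5.0+); a minimal provider file looks like the sketch below, where the path is an assumption about where the config map would be mounted:

```yaml
# e.g. /etc/grafana/provisioning/dashboards/devstats.yaml
apiVersion: 1

providers:
- name: 'devstats'
  orgId: 1
  type: file
  options:
    path: /var/lib/grafana/dashboards  # directory populated from the ConfigMap
```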

@lukaszgryglicki

In my case, I was copying the grafana.db file from the test server to another server.
Then I could update dashboards using the cmd/sqlitedb tool.
See the scripts in devel/*sqlite*.
Unfortunately, the sqlitedb tool cannot import new JSONs atm; it only supports updating.
I'll try to add such a feature.

Postgres variables are created by the pdb_vars tool (see devel/pdb_vars_all.sh), Influx variables are created by the idb_vars tool (see devel/idb_vars_all.sh); they both have yaml configuration in pdb.yaml/idb_vars.yaml.

jlewi added a commit to jlewi/community that referenced this issue Apr 20, 2018
* Dashboards are provided via a configmap.
* Grafana is configured via config maps to load dashboards from the directory.

* Some of the dashboards don't appear to be working correctly and/or showing
  data.

* The dashboards are based on the K8s dashboards and were copied from here:
https://github.com/cncf/devstats/tree/master/grafana/dashboards/kubernetes
    * We excluded the sig tables since we don't have any sigs.

* We added some shell scripts to convert them to use for Kubeflow.

Related to kubeflow#34
@jlewi (Contributor, Author) commented Apr 20, 2018

Thanks. I ended up configuring Grafana to load dashboards from files. I store the dashboards in a config map, so it's super easy to update them (just ks apply).

/cc @jberkus

@lukaszgryglicki

OK great.

k8s-ci-robot pushed a commit that referenced this issue Apr 26, 2018
* Deploy grafana to show the dashboards

* Dashboards are provided via a configmap.
* Grafana is configured via config maps to load dashboards from the directory.

* Some of the dashboards don't appear to be working correctly and/or showing
  data.

* The dashboards are based on the K8s dashboards and were copied from here:
https://github.com/cncf/devstats/tree/master/grafana/dashboards/kubernetes
    * We excluded the sig tables since we don't have any sigs.

* We added some shell scripts to convert them to use for Kubeflow.

Related to #34

* * Expose devstats dashboards publicly
   * Enable anonymous access
   * Make the admin password secure
   * Setup an ingress to allow http access

* Update command to print out password.
@ant31 commented Nov 13, 2018

@jlewi have you deployed devstats on k8s? Are there any resources/source code from your deployment that we could re-use?

@gaocegege (Member)

@ant31 I think you can find the artifacts here: https://github.com/kubeflow/community/tree/master/devstats

@ant31 commented Nov 13, 2018

@gaocegege great, thanks!

stale bot commented May 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the lifecycle/stale label May 3, 2020
stale bot closed this as completed May 10, 2020
@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.95

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.
