Contributors graph #847

Open
ShalokShalom opened this Issue Feb 5, 2017 · 19 comments

ShalokShalom commented Feb 5, 2017

Implement contributor graphs: https://github.com/go-gitea/gitea/graphs/contributors

screenshot_20170205_131515


@lunny lunny added this to the 1.x.x milestone Feb 5, 2017

@lunny lunny added the kind/feature label Feb 5, 2017

Member

bkcsoft commented Feb 6, 2017

Is there a good graph-lib? In my opinion this can be rendered and cached server-side

trahibidadido commented Dec 24, 2017

Any progress?

ghost commented Jun 2, 2018

would be nice to have 🎉

linusg commented Oct 2, 2018

I would like to start working on this feature, if no one is already on it (yeah @lafriks, I learned my lesson, +1 is not constructive 😉).

I would probably need some assistance every now and then, e.g. on how to decide about server or client side rendering, what charting library to use etc.
I basically don't know any Go but have good frontend knowledge, so it should work. There's a first time for everything, and I've been wanting to dive into hacking on Gitea for a while 😄

linusg commented Oct 2, 2018

Let's start by taking apart existing solutions to identify required data and possible data structure.

GitHub

The API endpoint for contributions data is https://github.com/<owner>/<repo>/graphs/contributors-data.

The returned JSON data is basically a list of objects (each representing one contributor) sorted least contributions first, most contributions last:

[
  { ... }, // User with least contributions
    ...
  { ... }, // User with second most contributions
  { ... }  // User with most contributions
]

The structure is roughly similar to the one documented here and looks like this:

{
  "author": {
    "id": 12345,
    "login": "octocat",
    "avatar": "https://avatars3.githubusercontent.com/u/12345?s=60&v=4",
    "path": "/octocat",
    "hovercard_url": "/hovercards?user_id=12345"
  },
  "total": 123,
  "weeks": [
    // First week in which the repo existed
    {
      "w": 1391904000,
      "a": 6898,
      "d": 77,
      "c": 10
    },
    // Second week in which the repo existed
    {
      "w": 1392508800,
      "a": 2437,
      "d": 439,
      "c": 6
    },
    ...
    // Current week
    {
      "w": 1538265600,
      "a": 0,
      "d": 0,
      "c": 0
    }
  ]
}

Each member of the "weeks" array has the following attributes:

  • w - Start of the week, given as a Unix timestamp.
  • a - Number of additions
  • d - Number of deletions
  • c - Number of commits

All that information is used to build these cards:

grafik

The big contributions graph can be built by summing, for each week n (0 <= n <= number of weeks since the repo was created), the stats of all users and plotting that total for each week.
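
As a rough illustration of that aggregation (not GitHub's actual implementation), the per-contributor "weeks" arrays can be summed per week timestamp. The function name `weekly_totals` and the second sample user "hubot" are made up for this sketch; the field names `w`, `a`, `d` and `c` are the ones described above.

```python
from collections import defaultdict


def weekly_totals(contributors):
    """Sum additions, deletions and commits of all contributors per week."""
    totals = defaultdict(lambda: {"a": 0, "d": 0, "c": 0})
    for contributor in contributors:
        for week in contributor["weeks"]:
            bucket = totals[week["w"]]
            for key in ("a", "d", "c"):
                bucket[key] += week[key]
    # One entry per week, in chronological order
    return [{"w": w, **totals[w]} for w in sorted(totals)]


# Hypothetical sample data in the shape shown above
sample = [
    {"author": {"login": "octocat"}, "total": 16,
     "weeks": [{"w": 1391904000, "a": 6898, "d": 77, "c": 10},
               {"w": 1392508800, "a": 2437, "d": 439, "c": 6}]},
    {"author": {"login": "hubot"}, "total": 3,
     "weeks": [{"w": 1391904000, "a": 10, "d": 2, "c": 1},
               {"w": 1392508800, "a": 5, "d": 5, "c": 2}]},
]

for week in weekly_totals(sample):
    print(week)
```

The resulting list has one point per week and can be handed to whatever charting code ends up being used.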

GitLab

GitLab CE is Open Source, so we have the relevant files:

API endpoint is https://gitlab.com/<owner>/<repo>/graphs/master?format=json.

The returned JSON data is much simpler:

[
  { ... }, // Latest commit
  { ... }, // Second latest commit
    ...
  { ... }, // First commit
]

Each member of the array represents one commit, sorted latest commit first, initial commit last. The structure looks as follows:

{
  "author_name": "Some User",
  "author_email": "some.user@example.com",
  "date": "2018-10-02"
}

If a user made multiple commits on the same day, there will simply be duplicate entries with the same user information and date, one for each commit.

The per-user tiles contain less information than on GitHub. Plotting uses the number of commits per day: the X-Axis is time, the Y-Axis is the number of commits. That is done both for the whole repo (ignoring the author) and for each user (taking all commit entries of a specific user on a specific day).
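
A minimal sketch of that counting, assuming the commit list has the shape shown above. The helper name `commits_per_day` and the sample addresses are invented for illustration and are not taken from GitLab's code.

```python
from collections import Counter


def commits_per_day(commits, author_email=None):
    """Count commits per date, for the whole repo or for a single author."""
    counts = Counter(
        commit["date"]
        for commit in commits
        if author_email is None or commit["author_email"] == author_email
    )
    # ISO dates sort chronologically as plain strings
    return sorted(counts.items())


# Hypothetical sample data: duplicate entries mean several commits on one day
commits = [
    {"author_name": "Some User", "author_email": "some.user@example.com", "date": "2018-10-02"},
    {"author_name": "Some User", "author_email": "some.user@example.com", "date": "2018-10-02"},
    {"author_name": "Other User", "author_email": "other.user@example.com", "date": "2018-10-01"},
]

print(commits_per_day(commits))
print(commits_per_day(commits, author_email="some.user@example.com"))
```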


In both cases rendering is done client side, which has the great advantage of being able to build dynamic charts with zooming.

linusg commented Oct 11, 2018

If it works with your general workflow here, I'd be fine with being assigned to this issue.


Some more thoughts on this. Constructive feedback is of course highly appreciated!

Placing the page link on the UI

image

That should work fine, no need to restructure anything for now.

Speaking of links, the page should probably live at https://git.example.com/<owner>/<repo>/contributors, that's how all the other links up there work.

Another idea, which I do not prefer, is putting the contributor graph(s) on the Activity page.

I did some DOM editing:

image

I chose octicon-organization as the icon, octicon-graph might work as well.

Now some quick CSS editing on the GitHub contributors chart for Gitea and merging the images:

image

That's a very rough idea of how it might look, not taking individual per-user charts into account.

Author

ShalokShalom commented Oct 11, 2018

Looks wonderful ^-^

Member

lunny commented Oct 12, 2018

@linusg great! Go ahead!

linusg commented Oct 12, 2018

@lunny I'm kinda confused right now: Who is @Morlinest and what role will he play in this issue?

Member

Morlinest commented Oct 12, 2018

It is probably a mistake or maybe he has some secret plans with me :D

Member

lunny commented Oct 12, 2018

@linusg @Morlinest :( sorry. A mistake like what @Morlinest said. I want to assign this issue to @linusg but I found it cannot be assigned to non-maintainers and issue poster.

linusg commented Oct 12, 2018

Ok, thanks for clarification 😄

Member

Morlinest commented Oct 12, 2018

Oh, so I will have to do it now :D

linusg commented Jan 6, 2019

Short heads-up for those interested: I wanted to work on this during the Christmas holidays, but couldn't find much time. I've created the basic stuff (page, routing etc.) and plan to continue working on it!

Author

ShalokShalom commented Jan 6, 2019

Thanks a lot ^-^

linusg commented Jan 7, 2019

Ok folks, yet another update. I managed to get it to this state:

image



Gitea vs GitHub (real-life example)

image

image

Dark

image


Details:

  • No data exposed over HTTP API, graphs are rendered to SVG (using https://github.com/wcharczuk/go-chart) on the server. This is really performant and keeps things simple.
  • Sorting by number of commits, additions and deletions
  • The UI is "slightly" based on GitHub 😄

Issues left:

  • Contributors who are not in the Gitea DB (e.g. because the repo was imported) do not show up.
  • Performance issues with bigger repositories. Edit: not performance, but HTTP 500 with the message "http: multiple response.WriteHeader calls" written to the console. Might be just me being a golang n00b.
  • Removing the AM/PM stuff from the X-Axis (can be easily done via custom formatter)
  • Fix the Y-Axis scale of user graphs: 1 commit should be half the height of 2 commits
  • Proper dark theme support (CSS for above was tweaked in the dev tools)

Possible enhancements:

  • Stats are for the master branch (hardcoded); this can easily be changed and exposed as a UI control

Ideas for changes and improvements welcome - I'm excited so far! I fear the upcoming code review though 😄

linusg commented Jan 8, 2019

Sooo... here we go! Now it's the time for some external input, so please see below the images.

image

(gitea repo taken from GitHub)

image

Let me explain:

  • Users who are not in the Gitea users DB are shown, but without a link to a profile, obviously. Stats are calculated per author name (only "name" and "email" are available for each commit), which is why there are "Unknown", "Unknwon" and "无闻" versus only "Unknwon" on GitHub: the information that these are all the same user is lost when cloning/importing the repo. I guess that's the best option available, thoughts?

  • GitHub compiles stats per week, I went with daily stats. Should this be changed?

    That's the reason why the Y-Axis on GitHub ends at ~150 [commits per week] and the Gitea one at 52 [commits per day]. It also makes the chart on Gitea look more "spiky". (Interpolation isn't available either.)

  • GitHub excludes merge commits from the stats, I didn't implement anything of this kind (and don't know how hard distinguishing one from a normal commit would be). Do we want this feature?

  • Would you like a separate color for the per-user charts?

  • What else do you think can be improved?
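
For the weekly-vs-daily and merge-commit questions above, here is a small sketch of how both could be handled, under stated assumptions. It assumes each commit record carries its date and a list of parent hashes (a hypothetical shape, not Gitea's actual data model); in Git, a commit with more than one parent is a merge commit. Weeks start on Sunday at 00:00 UTC here, which matches the sample "w" timestamps quoted earlier in this thread (1391904000 is Sunday 2014-02-09).

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone


def week_start(day: date) -> int:
    """Unix timestamp (UTC midnight) of the Sunday on or before `day`."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_since_sunday = (day.weekday() + 1) % 7
    sunday = day - timedelta(days=days_since_sunday)
    midnight = datetime(sunday.year, sunday.month, sunday.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def weekly_commit_counts(commits, skip_merges=True):
    """Bucket commits into weeks, optionally ignoring merge commits."""
    counts = Counter()
    for commit in commits:
        # A commit with more than one parent is a merge commit
        if skip_merges and len(commit["parents"]) > 1:
            continue
        counts[week_start(date.fromisoformat(commit["date"]))] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data; the parent hashes are placeholders
commits = [
    {"date": "2014-02-09", "parents": ["a1"]},
    {"date": "2014-02-12", "parents": ["b2"]},
    {"date": "2014-02-13", "parents": ["c3", "d4"]},  # merge commit
    {"date": "2014-02-16", "parents": ["e5"]},
]

print(weekly_commit_counts(commits))
print(weekly_commit_counts(commits, skip_merges=False))
```

Weekly buckets would also smooth out the spikes somewhat, since several days are summed into one data point.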

Performance:

I fixed all of the issues noted in my last post, and I'm back to some performance issues. All stats from my dev machine:

The contributors page of the Gitea blog repo takes 1.1s to load, which is probably fine (Page: 1090ms Template: 7ms)

The one for the gitea main repo took 1min 14s and reports Page: 74443ms Template: 47ms. It has nine years of history and almost 7k commits, though.

Possible improvements: the gitea repo contributor page ends up with 602 user cards, I believe GitHub cuts off at 100. See https://github.com/go-gitea/gitea/graphs/contributors.

What do you think about that? Since the charts become less useful for contributors with very few changes/commits because of the fixed Y-Axis, should we either show only the top X contributors or not generate charts for more than e.g. 100 contributors?
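
A sketch of such a cutoff, assuming the per-user stats are already computed. The record shape and the name `top_contributors` are hypothetical; the limit of 100 is only the value floated above.

```python
def top_contributors(stats, limit=100, key="commits"):
    """Return at most `limit` contributors, highest `key` value first."""
    return sorted(stats, key=lambda entry: entry[key], reverse=True)[:limit]


# Hypothetical per-user stats
stats = [
    {"name": "alice", "commits": 3, "additions": 40, "deletions": 5},
    {"name": "bob", "commits": 120, "additions": 9000, "deletions": 700},
    {"name": "carol", "commits": 17, "additions": 10, "deletions": 300},
]

print([entry["name"] for entry in top_contributors(stats, limit=2)])
print([entry["name"] for entry in top_contributors(stats, key="deletions")])
```

The same helper covers sorting by commits, additions or deletions, so charts would only need to be rendered for the entries it returns.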

image

Since the whole commit history is traversed each time the page is visited, we can probably also improve the situation by caching the stats. No clue whether that makes sense or what an implementation would look like.
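
One possible shape for such a cache, purely as a sketch: keep the computed stats per repo and branch, and recompute only when the branch head has moved. The class `StatsCache`, its in-memory dict and the `compute` callback are all assumptions made for illustration; a real implementation would more likely live in the database or an existing cache layer.

```python
class StatsCache:
    """In-memory cache of contributor stats, invalidated when the head changes."""

    def __init__(self, compute):
        self._compute = compute  # callable(repo, branch) -> stats
        self._entries = {}       # (repo, branch) -> (head_sha, stats)

    def get(self, repo, branch, head_sha):
        cached = self._entries.get((repo, branch))
        if cached is not None and cached[0] == head_sha:
            return cached[1]
        # Head moved (or nothing cached yet): traverse the history again
        stats = self._compute(repo, branch)
        self._entries[(repo, branch)] = (head_sha, stats)
        return stats


calls = []


def expensive_stats(repo, branch):
    """Stand-in for the full history traversal."""
    calls.append((repo, branch))
    return {"commits": len(calls)}


cache = StatsCache(expensive_stats)
first = cache.get("go-gitea/gitea", "master", "abc123")
second = cache.get("go-gitea/gitea", "master", "abc123")  # served from cache
third = cache.get("go-gitea/gitea", "master", "def456")   # head changed
print(len(calls))
```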

I had to clear the ServiceWorker's cache for the changed CSS files to show up (normal cache refresh wouldn't work). What do I have to do here so it works OOTB?

More screenshots:

image

image

image

image

Member

lunny commented Jan 9, 2019

@linusg Great job!!! How about running the work as a cron job when the repository is big (e.g. over 1000 commits)? It could run every one or more days, according to the configuration. I think the top 100 is enough; otherwise pagination would be better.

Contributor

yasuokav commented Jan 9, 2019

@linusg

  • Stats are for the master branch (hardcoded); this can easily be changed and exposed as a UI control

Maybe you can use the default branch option instead of creating another option.
