x/pkgsite: show no. of downloads per week #41805

Open
Delta456 opened this issue Oct 6, 2020 · 7 comments

@Delta456 commented Oct 6, 2020

NOTE: This proposal was opened as a separate issue, as suggested by @julieqiu. The feature was originally requested here.

Nowadays almost all package registries show the number of weekly downloads of packages installed through their CLI. I would like to propose the same for pkg.go.dev. Currently, authors have to check Insights/Git clones to see how many people have installed their package via go get. This feature would be really handy for both the registry and its users, since they would be able to keep track of it.

We can probably show it here. PS: I am not a designer.

image

Since we don't have to make accounts to submit packages to the registry, the download listing would be public, i.e. visible to all users by default.

Example of this feature on NPM (public by default).
For PyPI you would have to look it up in their public dataset on Google BigQuery, as answered here.

@gopherbot gopherbot added this to the Unreleased milestone Oct 6, 2020
@jba jba removed this from the Unreleased milestone Oct 6, 2020
@jba jba added this to the pkgsite/unplanned milestone Oct 6, 2020
@jba (Contributor) commented Oct 6, 2020

cc @heschik @katiehockman @hyangah for proxy support.

@mvdan (Member) commented Oct 6, 2020

I'm not sure how useful this would be; proxy.golang.org is the default for Go setups, but it's designed to not be centralized. At best, it would be "download stats for this proxy", and not "download stats for all proxies".

Also, how do you intend to prevent "gaming" the stats? If I set up some sort of botnet to download a module thousands of times per day, do I suddenly make it to the "most downloaded" lists? Note that other proxy implementations like GoCenter already have download stats, but I'm not sure what useful information you can actually gather from that.

@Delta456 (Author) commented Oct 6, 2020

@mvdan Hey! Thanks for reaching out on this proposal. I don't have any experience with how it would work on the proxy side (I am really serious). But as the author of a package (PS: I am not advertising my package, just linking it in case anyone doesn't believe me), I would like to know how popular the package is. I feel this is a must for package authors, so that they are motivated to add more features, fix bugs, and optimize code for their users. It helps users too: if a package is actually good (most people judge by downloads), they can use it in their projects.

I have checked GoCenter, but it seems to be broken and out of date compared with pkg.go.dev (I believe so because my package's README.md is broken there, whereas the same problem was recently fixed for me in #40203).

IMHO most people will also rely on the official Go registry.

@ingvarm-gr commented Oct 8, 2020

I would say that "number of packages/modules that have this as a (direct or indirect) dependency" is a better measure of popularity than "downloads".

Number of downloads is a proxy for "how frequently are things with this as a dependency built for the first time on a specific machine". Direct or indirect dependencies are a proxy for how many find your package/module useful. It will be a lower number, growing more slowly, but is probably a better indicator than just downloads.

Both will be skewed by a variety of factors. Only packages already indexed by pkg.go.dev will be accounted for in either a direct or indirect dependency count (this excludes a whole slew of possible packages). Downloads will be skewed by caches (on-disk module caches, Athens proxies in corporate environments, ...).

@MarvinJWendt commented May 20, 2021

Hi, I find a download count quite useful. Of course, this can be easily manipulated, which is why I would advise against creating a ranking list. It's true that the number of times a module is used as a dependency indicates how popular a module is. But I personally would be more motivated to write modules, if I could see directly how much my module contributes to the community, or how much it is used.
With the dependency graphs, it is quite difficult to find out how relevant your module is, because you can't see directly how often the modules that import your module are themselves used. A module could be used in 100 different projects, but still have fewer downloads than a module that is used in one extremely large project. It is currently not obvious what kind of impact one's own project has in the Go community.

For example, let's say that Project B is a low-level module, which is not used often, but it's used in popular modules, A for example.

| Project | Imported By | Actual Downloads |
|---------|-------------|------------------|
| A       | 1,000       | 1,000,000        |
| B       | 10          | 1,400,000        |

As you can see, B has many downloads because it's used in A, but it's not obvious that it has an impact in the community, as we only see that it's used 10 times.

I think the whole thing also has a psychological effect. It gives the creators of such modules direct feedback that they are making a significant contribution, which increases the chances that they will feel more comfortable in the community and possibly have more fun writing modules.

Of course, such statistics are to be taken with caution, because they can be easily manipulated and might not show every download (if another proxy is used, for example), but this applies to almost all such statistics (GitHub Release download count, Docker pulls, NPM downloads, GitHub clone count, even GitHub stars can be bought, just to name a few).

All in all, I think such statistics would make a great addition to motivate developers.

@mvdan what do you think about it?

@mvdan (Member) commented May 20, 2021

My opinion is largely unchanged - I don't think download stats should be prominently exposed. The moment they are, they'll start being misinterpreted and abused. Remember that, even when there isn't abuse, downloads will still be greatly skewed by caching, CIs set up without caching, indexing bots, etc.

I do think that some form of popularity stats could be useful, though, and I'm sure that the team have something planned for that. But they certainly cannot use download counts as a primary source.

@MarvinJWendt commented May 20, 2021

Yes I totally agree that "download count" is not equal to popularity. If there is a popularity score planned for the future that would be better.

Just to throw in an example: I have a project that has 1.3k stars on GitHub, and the GitHub insights tell me that it's cloned 100-300 times a day. pkg.go.dev tells me that it's imported by 26 projects. According to the GitHub traffic insights, I would guess that we have around 30k-50k total downloads.

In my case, I don't even want to know how "popular" my module is, I'd rather like to know if my effort is worth it (I wouldn't use my free time to work on it, if nobody would use it), and I think many people ask themselves the same question (a popularity score would also help).

Another idea that you might like more: In addition to "imported by" it could also show "deeply imported by", which would also increase if the project is used as a sub-module/indirect module (I don't know the exact term for it, but I think you know what I mean).

This would also solve my example from earlier:

| Project | Imported By | Deeply Imported By |
|---------|-------------|--------------------|
| A       | 1,000       | 1,100              |
| B       | 10          | 1,140              |

(B is imported by A, so Deeply Imported By of B includes Deeply Imported By of A)

I think this might fit better than a download count, and it would just extend the current calculation to include indirect imports, which are way more reliable and harder to manipulate.

After all, I don't really care how exactly it's going to be displayed (download count, deeply imported by, score, grades, etc.); I would just appreciate a better indication of how much impact a module has. I am sure that the Go team will find a good solution here. Do you think there will be a proposal or some other kind of public discussion? I would really like to follow such a thread.
