There has to be a way to automate this #18

Open
frigginglorious opened this issue Mar 1, 2016 · 50 comments
Labels: discussion, help wanted, pinned, status: in progress

Comments

@frigginglorious
Collaborator

For GitHub in particular, it could probably be automated via the API.

Or has anyone tried adding a shell script that parses pages like the contributors page (e.g. https://github.com/kentcdodds/all-contributors/graphs/contributors) and then updates or replaces a contributor table at the bottom of README.md?

Maybe not. Doing it that way would hardly be foolproof, and it would be difficult to organize.
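As a rough illustration of the API route suggested above, here is a minimal Python sketch. It assumes the GitHub REST endpoint `GET /repos/{owner}/{repo}/contributors` (which reports code contributors only) and a pair of hypothetical HTML comment markers in README.md. The marker names and helper functions are invented for this example and are not part of any existing tool.

```python
import json
import urllib.request

# Hypothetical markers delimiting the generated table inside README.md.
START = "<!-- CONTRIBUTORS:START -->"
END = "<!-- CONTRIBUTORS:END -->"


def fetch_contributors(owner, repo):
    """Fetch the first page of contributors from the GitHub REST API (network call)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors?per_page=100"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def render_table(contributors):
    """Render API contributor objects as a Markdown table."""
    lines = ["| Contributor | Commits |", "| --- | --- |"]
    for c in contributors:
        lines.append(f"| [{c['login']}]({c['html_url']}) | {c['contributions']} |")
    return "\n".join(lines)


def replace_section(readme, table):
    """Replace the text between the markers, or append a new section if they are missing."""
    if START not in readme or END not in readme:
        return f"{readme.rstrip()}\n\n{START}\n{table}\n{END}\n"
    before, _, rest = readme.partition(START)
    _, _, after = rest.partition(END)
    return f"{before}{START}\n{table}\n{END}{after}"
```

Calling `replace_section(readme_text, render_table(fetch_contributors(owner, repo)))` would produce the updated README text. This only captures commit authorship, so every other kind of contribution would still need to be added by hand.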

@kentcdodds
Collaborator

The challenge with fully automating it is that you'd have trouble automating when someone posts a blogpost, gives a talk, is constructively reviewing a PR, or answering questions on Stack Overflow (just to name a few examples, not to mention the difficulty with telling the difference between a code contribution and a design/test contribution).

However, I think that we can make it easier to maintain this list and @jfmengels is working on that 👍

@chrissimpkins
Collaborator

While I really hesitate to propose additional repository meta files, a JSON or YAML file spec could help here and tools could be built around it.

Ugh, did I just say that...

@kentcdodds
Collaborator

I agree that a config spec would be nice. Here's a start maybe: all-contributors/cli#1

@chrissimpkins
Collaborator

@kentcdodds @jfmengels looks like it is coming together already!

@jfmengels
Collaborator

Trying to :)

@glasnt

glasnt commented Mar 2, 2016

I think I might be able to help here :)

I've made a tool specifically for automating the collecting of contributors from the GitHub API, called octohatrack (article about the app).

The algorithm is fairly straightforward, but it takes a while to parse for larger repos.

I also have little apps that show how to get at least some of the contributors for Stack Overflow tags and Wikipedia posts under LABHR, if that helps :)

The visualisation method that @leesdolphin and I have been working on at https://labhr.github.io/js-hatrack/ doesn't have as much visual differentiation as this version, but works on a binary coding/non-coding split.

@kentcdodds
Collaborator

Very cool @glasnt! Thanks for chiming in!

I just ran it on my more popular repo angular-formly and it came up with hundreds of contributors. Super cool!

I think the real challenge with a fully automated solution is as I stated earlier:

> The challenge with fully automating it is that you'd have trouble automating when someone posts a blogpost, gives a talk, is constructively reviewing a PR, or answering questions on Stack Overflow (just to name a few examples, not to mention the difficulty with telling the difference between a code contribution and a design/test contribution).

I'm just not sure that we can go 100% automated. I think the best is to develop a tool as @jfmengels is working on to make it easier to generate the markdown, and then encourage contributors to add themselves (and their contributions) to the list.

@glasnt

glasnt commented Mar 2, 2016

@kentcdodds

So that's always going to be a problem. The major hubs can be automated, but it's not going to pick up everyone.

You can have 'rss feed planets' and such, but there's not a lot that beats the maintainers actively attributing help.

The future work I have planned for hatrack is to automate the GitHub hubs, and then have a manual addition process, specifically for acknowledgements outside of GitHub. I think having a CONTRIBUTORS page that feeds into that might be super useful, and your categorisation stuff on top makes it extra fancy :)

@chrissimpkins
Collaborator

@glasnt Very cool project, Katie. It seems that you've already given a good deal of thought to something that (to my knowledge) we haven't resolved: how to bridge high-contributor-volume, mature projects to this specification. This will prove to be a challenge.

Anyone have thoughts about this? From what I understand about @jfmengels's tool, it will be perfect for moderated, semi-automated list generation for low contributor volume projects at any stage of the project history, and for all projects from early stages in the project history where you have the opportunity to build the list prospectively. How do more mature, higher volume projects that already have numerous previous contributors reach into their history to support the spec?

kentcdodds added the help wanted label Mar 2, 2016

@jfmengels
Collaborator

@glasnt Cool project!
I'm working on all-contributors/cli#1 (soon on the shelves), so any tool that can fetch contributor data and write it in the configuration file will work with my tool. So if we build tools that find contributors well (I don't know how effective @glasnt's project currently is at that), I think we can build a nice toolchain.
Please file issues on my repo if you think of cool/useful things :)

@burodepeper

Hi there, just wanted to say hi! ; ) (hey Chris!)

Actually, I was looking into/for something like this, so yay! I have to look deeper into what has already been said and done, but thus far I find the suggested things rather (visually) verbose, though I recognize that that is fully my own opinion.

I'm curious what you all think about the order in which contributors appear. I might have skimmed over some previous remarks, but automation doesn't really allow for the subtle nuances of "intrinsic value added to a project". Even more so when it comes to sources that are non-public or non-trivial; random conversations via chat/email, social-media attention, perhaps even a seemingly unrelated thing a friend mentions during a bar-crawl. It's a little off-topic, but what are your views on adding those people in the first place?

@kentcdodds
Collaborator

@burodepeper, thanks for your interest in the spec! I think that as you read up on the spec and the comments here your feelings are shared :-) What you describe is basically the reason this spec was created.

@mbad0la
Contributor

mbad0la commented Mar 29, 2016

@kentcdodds @frigginglorious @chrissimpkins @jfmengels, please have a look at this Atom package I'm working on. It helps automate the all-contributors spec inside the Atom text editor.

@kentcdodds
Collaborator

Very cool! Do you wanna add it to the README?

@mbad0la
Contributor

mbad0la commented Mar 30, 2016

Sure! I'll send a PR.

@boneskull

> I'm curious what you all think about the order in which contributors appear.

alphabetical 😄

@kentcdodds
Collaborator

Actually, on the JavaScript Air website, I recently had (almost) all guest/panelist images appear in random order each time the site is generated. We could do the same for this project. Randomize the contributors each time the table is generated. This would obviously only be reasonable with the CLI.

@mbad0la
Contributor

mbad0la commented Jul 4, 2016

@kentcdodds @boneskull , having a randomised contributors' ordering seems like a good idea!
Btw, just a thought: would it make sense to keep the repository maintainers out of the randomisation process? We could always assign them the first blocks in our table.
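The ordering idea discussed here (shuffle contributors on each generation, but keep maintainers first) can be sketched in a few lines of Python. The function name and the idea of passing maintainers as a separate list are assumptions made for illustration; this is not how the CLI is known to work.

```python
import random


def order_contributors(contributors, maintainers, rng=None):
    """Return maintainers first (in the given order), then everyone else shuffled."""
    if rng is None:
        rng = random.Random()
    # Maintainers keep a fixed position at the front of the table.
    pinned = [m for m in maintainers if m in contributors]
    # Everyone else is shuffled each time the table is generated.
    others = [c for c in contributors if c not in maintainers]
    rng.shuffle(others)
    return pinned + others
```

Passing a seeded `random.Random` makes the order reproducible, which may help keep README diffs stable between runs.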

@boneskull

So, I'm interested in getting this to work for Mocha. I've explored writing a tool to get at this type of dataset. It works similarly to octohatrack--you make a bunch of API calls and aggregate things.

But that's really bending the GitHub API in a direction it doesn't want to go. Sure, you can grab all this data at once and throw it in a cache. How can you efficiently process new data that comes in? You can't really lean on the Events API because of its limitations. You basically have to pull down all of the issues, PRs, wiki pages and commits again, then check if anything is newer than last time you ran the pulls.

It took me three tries--running it once, waiting 5 minutes, running again, waiting 25 minutes, then running a third time--using octohatrack to pull down all of the Mocha data, because I exceeded my request limit with my OAuth token. Granted, octohatrack could be optimized in a couple places, but this is not a fast or cheap operation any way you look at it.

Having gone through this song-and-dance with the API, I'm thinking the only solution that scales properly is going to be hook-based. You must run a service, and when something happens, let GitHub tell you. You will still have to do the initial pull of data.

Of course, that only covers GitHub! We're not talking about blog posts, webcasts, SO answers, etc. You may be able to automate some of it, but you're not necessarily going to have access to hooks on any given service--especially if your project doesn't own the user account. Even if you do, you'd want to throw everything in a pile for manual review. Some sort of Bayesian classifier which could assist with the hooks or periodic searches or whathaveyou. Oh, and then there's BitBucket.

I feel like the spec, as written, simply won't scale to large projects; it's too ambitious. For large projects, this must be automated. If it must be automated, then it has to start with machines. The questions I'd ask then are:

  • What public APIs can we use?
  • Where can we use web hooks? If our service is event-based, we'll scale better.
  • Which types of contributions are too "fuzzy" to easily automate? These should wait until after we cover the "easy" types.

FWIW, there is demand for this type of data amongst projects. It's important to make it visible, not just for the OSS community, but also for companies who may sponsor projects, or want more insight into their own (and their employees') activities. I don't have a lot of bandwidth, but I'm more than happy to think further about how to build out a service that accomplishes this.
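To make the hook-based approach above more concrete, here is a minimal Python sketch of the core of such a service: given a webhook event name (GitHub sends it in the `X-GitHub-Event` header) and its JSON payload, record a contribution type for the sender. The mapping from events to contribution types is a rough assumption for illustration, and `record_event` is a hypothetical helper, not part of any existing tool.

```python
from collections import defaultdict

# Assumed mapping from GitHub webhook event names to contribution types.
# Real projects would likely need finer rules (e.g. not every issue is a bug report).
EVENT_TO_CONTRIBUTION = {
    "push": "code",
    "pull_request": "code",
    "pull_request_review": "review",
    "issues": "bug",
    "gollum": "doc",  # wiki page edits
}


def record_event(ledger, event_name, payload):
    """Add the sender's contribution to the ledger; return True if something was recorded."""
    contribution = EVENT_TO_CONTRIBUTION.get(event_name)
    login = payload.get("sender", {}).get("login")
    if contribution is None or login is None:
        return False  # unknown event type or missing sender: leave for manual review
    ledger[login].add(contribution)
    return True


# Usage: the ledger maps each login to a set of contribution types.
ledger = defaultdict(set)
record_event(ledger, "pull_request_review", {"sender": {"login": "alice"}})
```

Because events arrive one at a time, this avoids re-pulling the whole history, though the initial backfill still has to happen separately.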

@boneskull

boneskull commented Jul 4, 2016

I'd like to clarify this point: if the project doesn't own the data, it's going to be tough to get. If Joe Contributor writes a blog post about your project, how will you know?

UPDATE: It still isn't very clear. But I hope you understand what I'm trying to say. Understanding a user wrote a blog post about your project is a very different thing from a computer understanding the same. Maybe Palantir would be interested in lending us some tech. 😝

@glasnt

glasnt commented Jul 4, 2016

Hey @boneskull, I have a few things that might help you here:

  • the ability to add 'custom' contributors to octohatrack that I mentioned earlier up in this thread has been implemented, which allows for the use case of 'Someone has blogged about this'
  • There are a number of tools already in the LABHR suite for wikipedia and stackoverflow that leverage existing aggregations
  • There have been a number of changes in GitHub for user-profile contribution attribution, so there may or may not be more project-profile contribution attribution later. Specifically, adding comment-level contributions. I highly doubt this will ever extend natively to contributions outside GitHub
  • While webhooks might work for events from a certain point, backfilling will also be a problem
  • As much as we can automate the internet-layer stuff, there's always the meatsphere we can't automate, so there always needs to be a manual addition mechanism for a comprehensive list of contributors.

@boneskull

boneskull commented Jul 4, 2016

@glasnt Thanks.

I understand the "custom" contributor functionality, but this is not easily automated, which is my point.

> There have been a number of changes in GitHub for user-profile contribution attribution, so there may or may not be more project-profile contribution attribution later. Specifically, adding comment-level contributions. I highly doubt this will ever extend natively to contributions outside GitHub

I'm not sure I follow what this is about.

> While webhooks might work for events from a certain point, backfilling will also be a problem

Yes, the initial pull needs to happen, which I mentioned above. But thereafter, it's expensive to "poll".

> As much as we can automate the internet-layer stuff, there's always the meatsphere we can't automate, so there always needs to be a manual addition mechanism for a comprehensive list of contributors.

I suppose what I'm trying to ask then w/r/t the specification is "should we have one or more contribution types for which data cannot be automatically gathered"?

@glasnt

glasnt commented Jul 4, 2016

> I'm not sure I follow what this is about.

There's been a bunch of changes recently in the GitHub interface to allow for actual customisation of user profiles:

As far as I'm aware, there are no changes at this level for project profiles, to date.

> "should we have one or more contribution types for which data cannot be automatically gathered"?

Within the scope of all-contributors, I can't comment.

@boneskull

> There's been a bunch of changes recently in the GitHub interface to allow for actual customisation of user profiles:

I was under the impression that this was just a Facebook-like feature where you could present yourself to others the way you want to be seen. That's how I use it anyway. 😝 Did they alter the API for this?

So I think what you're saying is that GitHub may do some of this work for us in the future, based on the changes they made to the user profiles. But that's speculation, because GitHub doesn't tend to give advance notice of new features, and there's certainly not much of a dialogue going on. Which is 👎.

> "should we have one or more contribution types for which data cannot be automatically gathered"?
>
> Within the scope of all-contributors, I can't comment.

I meant this as a general question to anyone interested.

@kentcdodds
Collaborator

I think that the qualitative nature of this spec prevents it from being 100% automated. However, I think that adding people to the list of contributors is fairly simple. What I've done so far is ask that someone add themselves to the list. I have it as part of my CONTRIBUTING.md. It's worked fairly well so far. Of course, I don't have a huge project like Mocha. But I think that if people see that they can be recognized for writing a blogpost, they'll be eager to add themselves (despite the required effort).

And honestly, they don't even have to pull down the repo to add themselves with the script. One look at .all-contributorsrc and they'll see what's going on (though you may need to document how people can get the URL for their profile picture). Then people can simply add themselves in a PR from github.com
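For readers who haven't seen the file, a contributor entry in .all-contributorsrc is a small JSON object. The Python sketch below shows the shape as I understand it (`login`, `name`, `avatar_url`, `profile`, `contributions`); treat the exact field set as an assumption and check the CLI docs. The helper `add_contributor` is hypothetical, and using `https://github.com/<login>.png` as the avatar URL is a convenience assumption, not something this thread confirms.

```python
import json


def add_contributor(config, login, name, contributions):
    """Add a contributor to a parsed .all-contributorsrc dict, merging contribution types."""
    entries = config.setdefault("contributors", [])
    for entry in entries:
        if entry["login"].lower() == login.lower():
            # Existing contributor: append only the contribution types not yet listed.
            new_types = [c for c in contributions if c not in entry["contributions"]]
            entry["contributions"] = entry["contributions"] + new_types
            return config
    entries.append({
        "login": login,
        "name": name,
        "avatar_url": f"https://github.com/{login}.png",  # assumed avatar shortcut
        "profile": f"https://github.com/{login}",
        "contributions": list(contributions),
    })
    return config


# Usage: start from a minimal config and credit a blog post.
config = add_contributor({"projectName": "demo", "contributors": []}, "alice", "Alice", ["blog"])
print(json.dumps(config, indent=2))
```

A contributor editing the file by hand on github.com would add the same object to the `contributors` array.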

@patcon
Contributor

patcon commented Nov 16, 2017

Oh hey! This other project appeared on my radar c/o @jywarren @ebarry, and might be relevant (or worth collaborating with): https://github.com/mntnr/name-your-contributors

cc: @RichardLitt (related: mntnr/name-your-contributors#45)

@kentcdodds
Collaborator

That's awesome @patcon! I'll bet that we could use that for the bot (#58) if anyone ever gets around to it!

@RichardLitt

RichardLitt commented Nov 16, 2017

I've been meaning to post here for a while - I've been waiting for time to look over this closer. name-your-contributors could definitely help. It was almost tailor made for this purpose.

Happy to collaborate and work together! What would be the easiest way to figure out how?

Berkmann18 added the status: available label Jan 25, 2019

@RichardLitt

@Berkmann18 The bot probably covers this. I haven't had the time. :(

@Berkmann18
Member

@RichardLitt It doesn't, as it's not auto-fetching from the repo. It does automate the config/README change process, but it still requires a request for each contributor.

@RichardLitt

That is less than ideal. I think we can change that. Let's keep working in mntnr/name-your-contributors#45?

@Berkmann18
Member

@RichardLitt Yeah, ideally we could get the auto-fetching on the CLI which would then help the bot (which relies on the CLI) to then create a PR with all the existing contributors.

@mrchief

mrchief commented May 10, 2019

This sounds similar to all-contributors/cli#117. Is this being worked on? I was planning to pick up on 117 but don't want to duplicate efforts.

@Berkmann18
Member

@mrchief Yes it is and we're both working on it (cf. my comments on your draft PR).

@akhilmhdh

Hey guys. I came across this problem recently, so I thought of building something simpler. Here is a GitHub Action I made to automate the contributors list in the README:
https://github.com/marketplace/actions/contribute-list

@Berkmann18
Member

@akhilmhdh What does it do that the bot/cli doesn't do?

@akhilmhdh

akhilmhdh commented Apr 22, 2020

@Berkmann18 Hey, no offence, the bot is really amazing. I built the action in case someone wants a simpler way to automate the README, one where nothing has to be done later: the contributors list updates automatically as contributions come in. It isn't as feature-rich as the bot, though.

@Berkmann18
Member

@akhilmhdh That's nice, but it seems to lack an important thing that All Contributors has, which makes the action currently unsuitable for those who want to follow the specification.
You're more than welcome to provide and contribute with your GHA knowledge tho 😀.

@PGijsbers

Sorry if this isn't the place to ask, or if it's already answered elsewhere. We just adopted the bot over at openml-python! Like others before, we now face the task of adding each existing contributor to the list. As far as I can tell, automatically detecting and crediting contributors to an existing project is still a work in progress (even when limiting to e.g. code contributions)?

@Berkmann18
Member

@PGijsbers Yup, have you looked at all-contributors/cli#196 (comment)?

@PGijsbers

PGijsbers commented Nov 3, 2020

Thanks @Berkmann18! So the advice is to use that feature branch? (with its caveats)

@Berkmann18
Member

@PGijsbers Yes, and contribute (if you can and want to).

@JoshuaKGoldberg

In the meantime, I wrote:

Note that as has been mentioned, some contributions can't be auto-detected from Git history or even changelogs. Copying this warning from both tools' READMEs:

> Warning: This tool only sees contributions that can be detected from the last 500 events in GitHub's API. Don't forget to manually add in other forms of contributions!

@Berkmann18
Member

@JoshuaKGoldberg Does that handle all of the contribution types you can get off a repo?

@JoshuaKGoldberg

JoshuaKGoldberg commented May 19, 2023

@Berkmann18 please re-read my comment in full 😛 (typing something more useful now)

@JoshuaKGoldberg

JoshuaKGoldberg commented May 19, 2023

OH I just reparsed your comment, sorry if I was being too snarky @Berkmann18. I thought you were asking if it handles all possible contribution types in general. But now I think you were actually asking if it handles all the types one could glean from a repo?

If so: it's somewhat comprehensive but I'm sure there are points I missed. It hasn't (yet?) had any logic added around file extensions mapping to contribution types, e.g. .md -> docs.
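The file-extension idea mentioned above could look roughly like the Python sketch below. The extension table, the `tests` directory rule, and the function name are all hypothetical choices made for illustration. The type key `doc` is assumed to be the spec's key for documentation (the comment above writes "docs").

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to contribution type.
EXTENSION_TO_TYPE = {
    ".md": "doc",
    ".rst": "doc",
    ".py": "code",
    ".js": "code",
    ".ts": "code",
    ".png": "design",
    ".svg": "design",
}


def contribution_types(changed_files):
    """Guess contribution types from the paths a contributor has touched."""
    types = set()
    for path in changed_files:
        parsed = PurePosixPath(path)
        if "tests" in parsed.parts:
            # Anything under a tests/ directory counts as a test contribution.
            types.add("test")
            continue
        guessed = EXTENSION_TO_TYPE.get(parsed.suffix.lower())
        if guessed is not None:
            types.add(guessed)
    return sorted(types)
```

For example, `contribution_types(["README.md", "src/app.py"])` yields `["code", "doc"]`. Files with unknown or missing extensions are ignored and would still need a human decision.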

@Berkmann18
Member

@JoshuaKGoldberg Yup, that's what I meant (just realised I didn't reply to that comment).
