Measure quality on several organizations #17

Closed
3 tasks done
EgorBu opened this issue Jul 17, 2019 · 2 comments

Comments

@EgorBu

EgorBu commented Jul 17, 2019

To better understand the quality of the approach, let's measure performance in a near-real-life scenario:

  • collect a list of organizations of different scales
  • apply eee-identity-matching to collect identities based on the current approach
  • evaluate the result using ground-truth data from GHTorrent
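
For illustration, the evaluation step could be sketched as pairwise precision and recall over aliases. This is only an assumption about how one might score the matching, not necessarily the metric used in this project. The function names and the toy data are hypothetical, and both mappings are assumed to cover the same set of aliases.

```python
from itertools import combinations


def same_identity_pairs(assignment):
    """Return the set of unordered alias pairs that share an identity id."""
    clusters = {}
    for alias, identity in assignment.items():
        clusters.setdefault(identity, []).append(alias)
    pairs = set()
    for aliases in clusters.values():
        # Sort so that each unordered pair has one canonical representation.
        for a, b in combinations(sorted(aliases), 2):
            pairs.add((a, b))
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Score predicted identity clusters against ground-truth clusters.

    Both arguments map alias -> identity id (hypothetical format).
    Precision: share of predicted same-identity pairs that are correct.
    Recall: share of true same-identity pairs that were found.
    """
    pred_pairs = same_identity_pairs(predicted)
    true_pairs = same_identity_pairs(truth)
    hits = len(pred_pairs & true_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 1.0
    recall = hits / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Toy data: the matcher wrongly merges b@x.org with the first person
# and fails to link b@x.org with c@x.org.
predicted = {"a@x.org": 0, "a@y.org": 0, "b@x.org": 0, "c@x.org": 1}
truth = {"a@x.org": "alice", "a@y.org": "alice", "b@x.org": "bob", "c@x.org": "bob"}

precision, recall = pairwise_precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Counting pairs rather than whole clusters penalizes both over-merging (lower precision) and under-merging (lower recall), and the identity ids on the two sides do not need to match.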
@zurk zurk added the enhancement New feature or request label Jul 17, 2019
@warenlg
Contributor

warenlg commented Aug 27, 2019

I dealt with this issue while solving #30; here are the resulting visuals for 20 open-source stacks.

unique_contributors

idmatching_bubble_chart

The bubble size is proportional to the number of unique identities output by the identity matching algorithm.
I removed IBM and Intel from the chart because we performed badly on their codebases (~60% precision and recall), and including them squashed the rest of the chart. This is under investigation.
These results do not include the current work on bot detection.

@vmarkovtsev vmarkovtsev assigned warenlg and unassigned EgorBu Aug 27, 2019
@vmarkovtsev
Collaborator

I consider this task done. @warenlg, can you please create a follow-up for the investigation of IBM and Intel?
