StackOverflow metrics #7
Comments
Just updated the report - https://gist.github.com/tonkonogov/83b8f704aac266e53781398bfd89aea1 The worst thing about advanced search is that it requires every listed tag to be present on a question, despite API docs which state In this round I used a previous result for The first thing to admit: the Recall metric is slightly higher, 61.5% instead of 57%. It could be even better if not for that bug with 2 tags. With this search I would also propose treating the presence of questions as a crucial feature for class A & B projects, and a much less important one for other classes. Now we need to reach a consensus on how to fetch the data. I'm open to new proposals, but right now I advocate for advanced strict search with the only tag UPDATE: apparently the bug with tags won't be fixed soon, as it has been known since 2013.
Hi, guys. @tonkonogov @likeath I think the best way is to fetch data using advanced search with tag Number of matches on Stack Overflow carries a lot of weight for medium- and high-maturity projects. And there is nothing wrong with small and experimental projects getting a C, D or even E. Let's try to implement it, calculate it for the current projects (about 5000 top downloaded) and see what thresholds we arrive at (using the current algorithm).
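To make the threshold idea concrete, here is a minimal sketch of turning a question count into a grade. The cut points and the `grade_for_count` helper are hypothetical and picked only for illustration; they are not results from this thread and not Ossert's current algorithm.

```python
from bisect import bisect_right

# Hypothetical cut points (assumption, not measured values):
# fewer than 10 questions -> E, fewer than 100 -> D, and so on.
THRESHOLDS = [10, 100, 1000, 10000]
GRADES = "EDCBA"  # one more grade than there are thresholds


def grade_for_count(question_count):
    """Map a Stack Overflow question count to a grade from E to A."""
    # bisect_right counts how many thresholds the value has reached.
    return GRADES[bisect_right(THRESHOLDS, question_count)]


for count in (0, 42, 5000, 250000):
    print(count, grade_for_count(count))
```

Real cut points would have to come from the distribution over the roughly 5000 projects mentioned above.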
Exactly what I am working on now. I will add the ratio of resolved questions as well, to see whether it correlates with the grade or not.
Finally, I have some results - https://gist.github.com/tonkonogov/a514f75d571ec6a2dae4d3d447996f52
@tonkonogov great stuff! That is exactly the purpose of the project.
The next batch of stats from the SO API. This time in Google Spreadsheets - https://docs.google.com/spreadsheets/d/1yhLHLi8av24mMJ1TtIncik12pex9tKANsh8l-a0OU-M
After a long contemplation of the data I picked out several features which seem the most logical to me. The first thing to notice is that all features have an almost perfect breakdown by class. It could be much better, but there are too many anomalies. I think I need to devote one of the next phases to finding a way to avoid them. I assume that it doesn't make sense to test against several characteristics of the same metric (like average, median, sum), so I proceeded with only one in each group. Also the data is from
That's it. Any feedback, corrections & proposals are welcome.
Pulls are merged and I don't see any activity here, so I'll close it for now.
Hello!
Here I'd like to discuss which metrics should be used for the StackOverflow fetcher.
In my solution I used the total count of questions tagged by the project's name and the percentage of questions with at least one answer. Of everything I tried, this approach gives the most precise results (searching just by project name gives many false positives; see multi_json, for example).
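As a rough sketch, the two metrics can be computed from a list of question records like this. The `answer_count` field name follows the question objects returned by the Stack Exchange API, but the `question_metrics` helper and the sample data are made up for illustration and are not the fetcher's actual code.

```python
def question_metrics(questions):
    """Return the total question count and the percentage with >= 1 answer.

    `questions` is a list of dicts; each dict is assumed to carry an
    `answer_count` key (missing keys are treated as zero answers).
    """
    total = len(questions)
    answered = sum(1 for q in questions if q.get("answer_count", 0) > 0)
    # Guard against division by zero for projects without any questions.
    percent = 100.0 * answered / total if total else 0.0
    return {"total": total, "answered_percent": percent}


# Hypothetical sample: four questions, two of them with answers.
sample = [
    {"answer_count": 2},
    {"answer_count": 0},
    {"answer_count": 1},
    {"answer_count": 0},
]

print(question_metrics(sample))
```

A project with no tagged questions yields a total of 0 and a percentage of 0.0 rather than an error, which matters for the smaller libraries mentioned below.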
In general, this approach works well for big projects such as frameworks, but there are some troubles with smaller ones.
I think that this is OK, as this is a StackOverflow particularity: there are more general questions about languages and technologies than specific questions about a particular library. Maybe these metrics could be used for the classifier if the total count of questions were thresholded by some value.
One more thing: since Ossert currently deals only with Ruby projects, it should be safe to add the 'ruby' tag to all requests; that would help filter results for projects with overly general names. What do you think?
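A minimal sketch of how the extra tag could be added when building a request. It assumes the Stack Exchange API `search/advanced` endpoint with its semicolon-delimited `tagged` parameter and the built-in `total` filter; the API version in the URL and the `search_params` / `search_url` helpers are assumptions for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint; the API version segment is a guess.
API_URL = "https://api.stackexchange.com/2.2/search/advanced"


def search_params(project_name, language_tag="ruby"):
    """Build query parameters that combine the language tag and project tag."""
    tags = [project_name]
    # Avoid duplicating the tag when the project is the language itself.
    if language_tag and language_tag != project_name:
        tags.insert(0, language_tag)
    return {
        "site": "stackoverflow",
        "tagged": ";".join(tags),  # semicolon-delimited tag list
        "filter": "total",         # ask only for the number of matches
    }


def search_url(project_name, language_tag="ruby"):
    """Return the full request URL without performing the request."""
    return API_URL + "?" + urlencode(search_params(project_name, language_tag))


print(search_url("sidekiq"))
```

Passing `language_tag=None` falls back to searching by the project tag alone, which makes it easy to compare both variants on the same set of projects.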