StackOverflow metrics #7

Closed · likeath opened this issue Nov 15, 2016 · 8 comments

likeath (Contributor) commented Nov 15, 2016

Hello!
I'd like to discuss which metrics should be used for the StackOverflow fetcher.

In my solution I used the total count of questions tagged with the project's name and the percentage of questions with at least one answer. Of everything I tried, this approach gives the most precise results (searching just by project name yields many false positives; see multi_json, for example).
In general, this approach works well for big projects such as frameworks, but there is some trouble with smaller ones.

I think that's OK, as it is a StackOverflow particularity: there are more general questions about languages and technologies than specific questions about a given library. These metrics could still feed the classifier if the total question count is thresholded at some value.

One more thing: since for now Ossert deals only with Ruby projects, it should be safe to add the 'ruby' tag to all requests; that would filter out projects with overly generic names.
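For concreteness, here is a minimal sketch of that kind of fetch against the Stack Exchange `/2.2/search` endpoint (the helper name and the metric wiring are just illustrations, not Ossert's actual fetcher):

```ruby
require 'net/http'
require 'json'
require 'uri'

# Fetch questions tagged with both the project name and 'ruby'.
# Sketch only: no paging, no error handling, no API key.
def so_tagged_questions(project)
  uri = URI('https://api.stackexchange.com/2.2/search')
  uri.query = URI.encode_www_form(
    site: 'stackoverflow',
    tagged: "#{project};ruby", # ';' means ALL listed tags must be present
    pagesize: 100
  )
  JSON.parse(Net::HTTP.get(uri)).fetch('items', [])
end

questions = so_tagged_questions('multi_json')
answered  = questions.count { |q| q['answer_count'].to_i.positive? }
pct = questions.empty? ? 0 : (100.0 * answered / questions.size).round(1)
puts "questions: #{questions.size}, with at least one answer: #{pct}%"
```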

What do you think?

timtonk (Contributor) commented Nov 18, 2016

I've just updated the report: https://gist.github.com/tonkonogov/83b8f704aac266e53781398bfd89aea1

The worst thing about advanced search is that it requires all tags to be present on a question, despite the API docs stating that at least one of them will be present on all returned questions. I will create a question for them on SO later.

In this round I used the previous result of the /search call, plus two additional /search/advanced calls where (1) the project name is searched through the whole question, (2) one or two tags are specified, and (3) the search requires a strict match.
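For reference, calls matching (1)-(3) roughly correspond to something like this against `/2.2/search/advanced` (quoting the query string is what the docs describe for an exact-phrase match; the helper name is made up):

```ruby
require 'net/http'
require 'json'
require 'uri'

# /search/advanced variant: strict (exact-phrase) match of the project
# name over the whole question, constrained by tags. Sketch only.
def so_advanced_search(project, tags)
  uri = URI('https://api.stackexchange.com/2.2/search/advanced')
  uri.query = URI.encode_www_form(
    site: 'stackoverflow',
    q: %("#{project}"),     # double quotes force an exact-phrase match
    tagged: tags.join(';'), # per the docs: ALL listed tags must be present
    pagesize: 100
  )
  JSON.parse(Net::HTTP.get(uri)).fetch('items', [])
end

so_advanced_search('multi_json', %w[ruby])
```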

The first thing to note: recall is slightly higher, 61.5% instead of 57%. It could be even better if not for that bug with two tags. With this search I would also propose using the presence of questions as a crucial feature for class A & B projects, and a much less important one for the other classes.

Now we need to reach a consensus on how to fetch the data. I'm open to new proposals, but right now I advocate for the advanced strict search with the single tag ruby.

UPDATE: apparently the tags bug won't be fixed soon, as it has been known since 2013.

sclinede (Member) commented

Hi @tonkonogov, @likeath!
First of all: great work, thank you.

I think the best way is to fetch the data using advanced search with the ruby tag, as it is stable enough.
Now, about classification.
I don't actually see the metric name, but I suspect it was "Number of questions on StackOverflow".

The number of matches on StackOverflow carries a big weight for medium- and high-maturity projects, and there is nothing wrong with small, experimental projects getting a C, D, or even E.

Let's try to implement it, calculate it for the current projects (roughly the 5000 most downloaded), and see what thresholds we arrive at (using the current algorithm).
If the thresholds turn out strange, we can make them synthetic based on the real values.
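Just to illustrate what "see what thresholds we arrive at" could look like, a tiny sketch (the percentile breakdown is my assumption, not the current algorithm):

```ruby
GRADES = %w[A B C D E].freeze

# Derive grade lower bounds from real metric values by taking the
# 80th/60th/40th/20th/0th percentiles. Purely illustrative.
def derive_thresholds(values)
  sorted = values.sort
  GRADES.each_with_index.to_h do |grade, i|
    quantile = 1.0 - (i + 1) * 0.2 # 0.8, 0.6, 0.4, 0.2, 0.0
    [grade, sorted[(quantile * (sorted.size - 1)).round]]
  end
end

question_counts = [260, 120, 45, 7, 3, 1, 0, 0] # toy data
p derive_thresholds(question_counts)
# => {"A"=>120, "B"=>7, "C"=>3, "D"=>0, "E"=>0}
```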
WDYT?

timtonk (Contributor) commented Nov 21, 2016

That's exactly what I am working on now. I will add the ratio of resolved questions as well, to see whether it correlates with the grade or not.
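For clarity, by "ratio of resolved questions" I mean something like the following (treating a question with an accepted answer as resolved; that definition is an assumption):

```ruby
# Share of questions with an accepted answer, given the 'items' array
# from the Stack Exchange API. Sketch; "resolved" definition is assumed.
def resolved_ratio(questions)
  return 0.0 if questions.empty?
  questions.count { |q| q.key?('accepted_answer_id') }.fdiv(questions.size)
end

resolved_ratio([{ 'accepted_answer_id' => 42 }, {}]) # => 0.5
```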

timtonk (Contributor) commented Nov 27, 2016

Finally, I have some results: https://gist.github.com/tonkonogov/a514f75d571ec6a2dae4d3d447996f52
Unfortunately, it is not quite complete for class E (~400 entries are missing), as my SO quota has been depleted.
It seems synthetic values may be a more valid way to classify, as the low-grade project categories contain too many anomalies (gems with names like vagrant, firebase, text, t, osm, etc.). What is more surprising to me is the stable correlation of the percentage of resolved questions. I would get round to this feature without synthetics.

sclinede (Member) commented

@tonkonogov, great stuff! That is exactly the purpose of this project: we try to find interesting, stable correlations across the different kinds of activity around a project.
So I'm happy to hear about these results.

timtonk (Contributor) commented Dec 9, 2016

The next batch of stats from the SO API, this time in a Google Spreadsheet: https://docs.google.com/spreadsheets/d/1yhLHLi8av24mMJ1TtIncik12pex9tKANsh8l-a0OU-M
I will analyse it in detail tomorrow, but after a quick look the first six metrics seem more reliable than the others.

timtonk (Contributor) commented Dec 10, 2016

After long contemplation of the data, I picked out several features which seem the most logical to me. The first thing to notice is that all the features have an almost perfect breakdown by class. It could be even better, but there are too many anomalies; I think I need to devote one of the next phases to finding a way to avoid them.

I assume it doesn't make sense to test several characteristics of the same metric (like average, median, and sum), so I proceeded with only one in each group. Also, the data is from community_quarter, not community_last_year.

questioner reputation median - reliable, I guess. A peculiar thing is that this number is actually an average of medians. I tried to compute a median of medians instead and got 0, even within a single grade (without values from the adjacent grade). I will use the synthetic values 160 - 140 - 85 - 43 - 0 while anomalies are still the case.

question view sum - obviously reliable: the more people view a question, the more people get answers to questions about the project. Grade A is a clear winner, but the other grades require a little tuning; 300 - 140 - 100 - 50 - 0 should be OK.

num of answers average - only as an exception. The median of 0.23 is really funny, and the sum represents nothing interesting for analysis, I think. The values will be 0.25, 0.12, 0.8, 0.1, 0. With decision trees it would be nice to add a sum > 3 condition as well, but for now I think it should be skipped.

question score sum - to be honest, I can't see how this metric reflects usefulness for the project community. The closest idea is that the more people upvote a question, the more likely it is to get the most interesting answers from the community; the sum fits this idea best. The values will be 1.7, 0.7, 0.5, 0.1, 0.

num of questions - no comment needed, but I would floor and raise the numbers: 10 - 2 - 1 - 0 - 0.

num of questioners - it's obviously reliable, though I can't estimate reliable values; the currently specified numbers are ridiculous. Let's try 9 - 2 - 1 - 0 - 0.

percent of resolved questions - I feel it should use the same values as for the total (see the sketch below for how all these thresholds could fit together).
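A minimal sketch of how these per-metric thresholds could map a value to a grade (the numbers are copied verbatim from above, including the odd-looking 0.8; the lookup code itself is illustrative, not Ossert's real classifier):

```ruby
# Proposed per-grade lower bounds (A, B, C, D, E), verbatim from above.
THRESHOLDS = {
  'questioner reputation median' => [160, 140, 85, 43, 0],
  'question view sum'            => [300, 140, 100, 50, 0],
  'num of answers average'       => [0.25, 0.12, 0.8, 0.1, 0],
  'question score sum'           => [1.7, 0.7, 0.5, 0.1, 0],
  'num of questions'             => [10, 2, 1, 0, 0],
  'num of questioners'           => [9, 2, 1, 0, 0]
}.freeze

GRADES = %w[A B C D E].freeze

# First grade whose lower bound the value reaches, scanning A -> E.
def grade_for(metric, value)
  bounds = THRESHOLDS.fetch(metric)
  GRADES.zip(bounds).find { |_, min| value >= min }&.first || 'E'
end

grade_for('num of questions', 5) # => "B"
```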

That's it. Any feedback, corrections & proposals are welcome.

sclinede (Member) commented Aug 3, 2017

The pull requests are merged and I don't see any activity here, so I'll close this for now.

sclinede closed this as completed on Aug 3, 2017