Stack Overflow API Integration #211

Closed
patt0 opened this issue Mar 18, 2015 · 17 comments

Comments

@patt0
Collaborator

patt0 commented Mar 18, 2015

We need to add the SO user_id to the Accounts entity in the backend, and in the Accounts master spreadsheet to update the datastore. We should also identify the SO tags that are relevant for various product groups and teams.

This assumes that an aggregate monthly score is sufficient.
Using the SO API method https://api.stackexchange.com/docs/top-user-answers-in-tags together with the date range query operator, we should be able to automatically create a monthly (or other interval?) record for each GDE.

The activity record will not have an associated G+ activity post, and could be entitled
"Monthly SO Activity Record - April 2015"
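A minimal sketch of the monthly window described above, assuming the date range is passed to the Stack Exchange API as Unix epoch seconds in UTC (its `fromdate` / `todate` parameters). The helper names `month_window` and `record_title` are hypothetical and are not taken from the project code.

```python
import calendar


def month_window(year, month):
    """Return (fromdate, todate) in Unix epoch seconds (UTC) for one calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    fromdate = calendar.timegm((year, month, 1, 0, 0, 0))
    todate = calendar.timegm((year, month, last_day, 23, 59, 59))
    return fromdate, todate


def record_title(year, month):
    """Build a title in the style suggested in this issue (hypothetical helper)."""
    return "Monthly SO Activity Record - {} {}".format(calendar.month_name[month], year)


if __name__ == "__main__":
    print(month_window(2015, 4))
    print(record_title(2015, 4))
```

Any other interval (weekly, quarterly) would only change how the two boundaries are computed; the title format would need to carry the interval in the same way.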

Perhaps we can define the usage of the impacts for SO as follows:

What we now call Social Impact could hold the number of questions answered / accepted
What we call Total Impact could hold the number of views on the questions, with an understanding that this will grow and that impact is a rolling window.
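As a rough illustration of that mapping, the sketch below derives the proposed numbers from harvested data. It assumes each answer is a dict with `question_id` and `is_accepted` (field names used by the Stack Exchange API answer object) and that view counts have been looked up per question. The function name and the returned keys are hypothetical, and whether Social Impact should hold the answered count or the accepted count is left open, so both are returned.

```python
def so_impacts(answers, question_views):
    """Summarise one harvest period.

    answers: list of dicts with 'question_id' and 'is_accepted'.
    question_views: dict mapping question_id -> current view count.
    """
    answered = len(answers)
    accepted = sum(1 for a in answers if a.get("is_accepted"))
    # Count each question's views once, even if it was answered more than once.
    question_ids = {a["question_id"] for a in answers}
    total_views = sum(question_views.get(q, 0) for q in question_ids)
    return {"answered": answered, "accepted": accepted, "total_impact": total_views}


if __name__ == "__main__":
    sample = [
        {"question_id": 1, "is_accepted": True},
        {"question_id": 2, "is_accepted": False},
    ]
    print(so_impacts(sample, {1: 100, 2: 50}))
```

Because view counts keep growing, the `total_impact` value is only a snapshot at harvest time, which matches the rolling-window caveat above.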

It's probable that, as we move out of G+ into harvesting, we will need to refactor the way we name and collect the metrics we want, killing the one-size-fits-all social metric associated with Google+ posts.

@SmokyBob
Collaborator

+1 on Everything.

I was thinking that we might want to add an "Options" page so GDEs can update their SO user_id and, in the future, other fields as more sources get harvested. This way we avoid having to manage all these IDs in the master_list and leave each GDE the ability to choose which plug-ins to use.
What do you think?

P.S. Do you think it's a good idea to ask for feedback on how to integrate SO in the GDE community?

@patt0
Collaborator Author

patt0 commented Mar 19, 2015

Yes, an options page will be useful, and the field can serve as a marker for processing data extraction.

We will request feedback when we are a little further down the line. Our previous attempts have not been very fruitful, so I think we are better off doing this in consultation with the team at Google who want to get some measurements.

@patt0
Collaborator Author

patt0 commented Apr 12, 2015

@SmokyBob @Scarygami

Finally got this going. I changed the approach: we now get the answers for a period, find the tags from the associated questions, and create an activity record. The harvest interval is weekly, with the possibility of harvesting retroactively from a particular date. While this is going to duplicate some tasks added manually, it's probably worth asking individual GDEs to delete those. (This is done on Firefly to get an idea of what it looks like on the front end and in the raw data extraction.)

patt0@647653e
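For readers who do not want to open the commit, here is a hedged sketch of the two-step lookup described above (answers for a period, then tags from the associated questions). It is not the code from the commit. It assumes version 2.2 of the Stack Exchange API, its `/users/{ids}/answers` and `/questions/{ids}` methods, and the `answer_id`, `question_id` and `tags` fields of the returned objects; the function names are hypothetical, and the sketch only builds URLs and joins already-fetched data, without making any network calls.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"


def answers_url(user_id, fromdate, todate):
    """URL listing a user's answers created between two epoch timestamps."""
    params = urlencode({
        "site": "stackoverflow",
        "fromdate": fromdate,
        "todate": todate,
        "pagesize": 100,
    })
    return "{}/users/{}/answers?{}".format(API_ROOT, user_id, params)


def questions_url(question_ids):
    """URL fetching the questions (and their tags) for a set of question ids."""
    ids = ";".join(str(q) for q in sorted(set(question_ids)))
    return "{}/questions/{}?site=stackoverflow".format(API_ROOT, ids)


def tags_by_answer(answers, questions):
    """Map each answer_id to the tags of the question it answers."""
    tags = {q["question_id"]: q["tags"] for q in questions}
    return {a["answer_id"]: tags.get(a["question_id"], []) for a in answers}


if __name__ == "__main__":
    print(answers_url(1841839, 1427846400, 1430438399))
    print(questions_url([5, 3, 5]))
```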

Pushed the code to OMEGA in order to update the Product Group tags so harvesting can happen for those who have supplied their SO ID. We can get the GDEs to fill in our spreadsheet and push that number higher when we launch. Once the PGs were updated, I copied the data from OMEGA to FIREFLY and ran the harvest from 1st January 2015.

Check out your harvest tasks https://10-dot-gdetracking.appspot.com/#/

Raw export test available (242 activity records created from 75 GDEs with SO IDs) https://docs.google.com/spreadsheets/d/1p1goP2PKCjbd7XvqCKvDwGKpeeTfNON-SKILarc1oL0/edit#gid=729453407

@SmokyBob
Collaborator

Everything looks good to me

@LindaLawton
Collaborator

more tags

Google-api
Google-Analytics-api
Google-drive-sdk
Google-docs-api
Google-visualization
Google-oauth
Google-api-dotnet-client
google-api-php-client
google-calendar
google-drive-realtime-api
google-maps-api-3
google-maps-api-2
google-spreadsheet-api
gmail
gmail-imap
google-glass
google-mirror-api
youtube-api
google-search
google-adwords
google-gdk
google-compute-engine
google-apps-script

you could just do a search on Google in tags http://stackoverflow.com/tags?tab=name

@LindaLawton
Collaborator

Looks like you are only checking from Jan 1, and something is up with it. Even looking only at the tags you appear to be grabbing, I have answered 3 Google+ (#googleplus), 3 Apps Script (#googleappsscript) and an Android (#android) question. They aren't listed in your sheet. Nor are the older questions that get new +1's or accepts.

@patt0
Collaborator Author

patt0 commented Apr 15, 2015

@LindaLawton at this time we are only harvesting the questions for the ProductGroup of the particular GDE, which explains why the routine may not have picked up these answers. We will need to see with Program Management how they want to deal with impact measurement given that GDEs are polyvalent. In this harvest I did indeed start at Jan 1 2015; I need to check with Marie what she wants to do with data from previous periods, where we may have a duplicate entry issue.

I will be launching the feature over the weekend with a FAQ and a Survey and will solicit feedback. That is a good point to raise.

@LindaLawton
Collaborator

So I am locked into a single product group, Google-Analytics? At the very least please add Google-Analytics-api. I don't even think 5% of my answers are analytics related. Guess I am not a very productive GDE.

@patt0
Collaborator Author

patt0 commented Apr 15, 2015

I'll take it up for discussion with Ola and Gang in my next meeting.

@LindaLawton
Collaborator

I don't think it really matters. This is just for Google to track, right? It does make sense that the Analytics team would only be interested in what I do that is analytics related. Anything I do in the other tags probably isn't valid info for them.

That being said looks good :)

@Scarygami
Contributor

One thing I noticed looking at my data on staging: For historic data it would be good to have the post_date set to a date (first or last) of the month the activity happened instead of the date the job has run. At the moment my April looks like I have been really active :)

@patt0
Collaborator Author

patt0 commented Apr 20, 2015

Yes, that makes sense. I will make the record take the date of the last day of the period being harvested.
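A small sketch of what that could look like for a weekly harvest run retroactively from a given start date. It assumes fixed 7-day periods counted from the start date and that incomplete trailing periods are skipped; both are assumptions, and `weekly_periods` and `post_date_for` are hypothetical names.

```python
from datetime import date, timedelta


def weekly_periods(start, today):
    """Yield (first_day, last_day) for each complete 7-day period from start up to today."""
    first = start
    while first + timedelta(days=6) <= today:
        last = first + timedelta(days=6)
        yield first, last
        first = last + timedelta(days=1)


def post_date_for(period):
    """Use the last day of the harvested period, not the day the job ran."""
    return period[1]


if __name__ == "__main__":
    for period in weekly_periods(date(2015, 1, 1), date(2015, 1, 20)):
        print(period, "->", post_date_for(period))
```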

Patrick Martinent


@LindaLawton
Collaborator

Are you recording all the tags or just the first tag? I seem to be very Android active for a non-Android person.

What happens with a question tagged #android #google-analytics? What happens if it's the other way around?

@Scarygami
Contributor

Best to wait for @patt0 to answer, but looking at his source it will be counted for both product groups. So if you have a question like in your example you will have one SO activity in #android and one in #google-analytics no matter in what order they appear in the question.
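To make the order independence concrete, here is a minimal sketch of tag-to-product-group matching using set intersection. This is an illustration, not the project's source: the `PRODUCT_GROUP_TAGS` mapping and the function name are hypothetical, and the real tag lists live in the Product Group data.

```python
# Hypothetical mapping of product group id -> Stack Overflow tags it tracks.
PRODUCT_GROUP_TAGS = {
    "android": {"android"},
    "google-analytics": {"google-analytics", "google-analytics-api"},
}


def matching_product_groups(question_tags, product_group_tags=PRODUCT_GROUP_TAGS):
    """Return every product group that shares at least one tag with the question."""
    tags = set(question_tags)  # a set discards the order the tags appeared in
    return sorted(pg for pg, pg_tags in product_group_tags.items() if tags & pg_tags)


if __name__ == "__main__":
    print(matching_product_groups(["android", "google-analytics"]))
    print(matching_product_groups(["google-analytics", "android"]))
```

With this approach one answer contributes to each product group it matches, so restricting a GDE to a single product group becomes a filter applied afterwards rather than part of the matching.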

@Scarygami
Contributor

@patt0 could you create a PR for your pending changes (even if you still have some additional changes planned before merging), just so it's easier to find the way there :)

@patt0
Collaborator Author

patt0 commented Apr 20, 2015

Yeah, it's a little strange. I'm running some tests; as you said, it should create a record for both identified tags in their respective product groups.

I did a sanity check against this and got 11 answers for the period Jan 2014 to March 2015

http://stackoverflow.com/search?q=user:1841839+[android]

So it got the Android tags OK, but it did not get Analytics in some cases.

Been running a test on a month for Linda only and it does seem to work ...
I am cleaning up the database and running it again ... then we can have a
look.

I will also do a push of my fork and a PR if I don't find anything in the
next 30 minutes.

Thanks both.

Patrick Martinent


@patt0
Collaborator Author

patt0 commented Apr 21, 2015

Found the issue. It was a classic case of eventual consistency: when I moved to multiple product groups, I kept using the same URL for the link across some ARs. In a tight loop the query might not find the existing record, so a second one would be created; after a while, once the indexes were flushed, the query would find a record and treat it as existing even though it might belong to another PG. So I changed the query to use the title, which contains the date interval and the product group.

I have run the harvest from Jan 2014 on staging for Gerwin and Linda so we can do some sanity checks. It looks pretty good to me now.
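The sketch below only illustrates why the title is a better lookup key than the link once one link can belong to several product groups. It uses a plain in-memory dict, which is strongly consistent, so it does not reproduce the datastore timing problem itself. The title format, `activity_title` and `upsert` are hypothetical and are not taken from the project code.

```python
def activity_title(product_group, first_day, last_day):
    """Title carrying both the date interval and the product group (hypothetical format)."""
    return "SO Activity - {} - {} to {}".format(product_group, first_day, last_day)


def upsert(store, key, link, answer_count):
    """Create or update the record stored under key; return True if it was created."""
    created = key not in store
    store[key] = {"link": link, "answers": answer_count}
    return created


if __name__ == "__main__":
    link = "http://stackoverflow.com/search?q=user:1841839"
    by_link, by_title = {}, {}
    for pg, count in [("android", 2), ("google-analytics", 1)]:
        upsert(by_link, link, link, count)
        upsert(by_title, activity_title(pg, "2015-01-01", "2015-01-07"), link, count)
    # Keyed by link, the second product group overwrites the first.
    print(len(by_link), len(by_title))
```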

https://docs.google.com/spreadsheets/d/1p1goP2PKCjbd7XvqCKvDwGKpeeTfNON-SKILarc1oL0/edit#gid=729453407

Patrick Martinent

