New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Email updates? #83

Closed
therebelrobot opened this Issue Jan 9, 2015 · 25 comments

Comments

Projects
None yet
5 participants
@therebelrobot

therebelrobot commented Jan 9, 2015

Hey, I love your project! I've been using your daily digest emails to stay on top of what's trending, but I haven't gotten any emails since Dec 31st. Are you still running those? Cuz you should. They rock.

@igrigorik

This comment has been minimized.

Owner

igrigorik commented Jan 9, 2015

Indeed, I stopped sending those after we switched to the new API endpoint on Jan 1st. Given that GitHub now provides their own version [1] with similar coverage... I'm not sure it's worth it. Have you tried the GH version?

[1] https://github.com/explore/subscribe

@therebelrobot

This comment has been minimized.

therebelrobot commented Jan 9, 2015

Ooooo, I'll definitely take a look at that. Thanks a million, and keep up the good work!

@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 11, 2015

I've been missing the daily emails, too. I do get GitHub's Explore email as well, but it doesn't surface as much (or as interesting) repos.

Any chance we can get it sending again? im happy to help out however possible.

@igrigorik

This comment has been minimized.

Owner

igrigorik commented Jan 11, 2015

@jerodsanto what in particular are you missing in GH version? /cc @briandoll

@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 11, 2015

GH version just gives me 5 repos that are trending. Super trending. It does not surface repos early like your email did.

It also shows me stuff my friends have been starring, which is cool. I liked getting both emails.

Maybe I'm the only one. If so, is there an easy way I can set one up myself that works a lot like yours did?

-Jerod
(sent on the go)

On Jan 11, 2015, at 1:03 PM, Ilya Grigorik notifications@github.com wrote:

@jerodsanto what in particular are you missing in GH version? /cc @briandoll


Reply to this email directly or view it on GitHub.

@igrigorik

This comment has been minimized.

Owner

igrigorik commented Jan 11, 2015

The report itself is just a ruby script that ran each day, see: https://github.com/igrigorik/githubarchive.org/tree/master/reports

@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 12, 2015

@igrigorik awesome! If you are set on keeping this turned off, I can roll my own using yours as an example. 🍻

@therebelrobot

This comment has been minimized.

therebelrobot commented Jan 12, 2015

@jerodsanto, if you make it public, lemme know too :)

@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 16, 2015

@igrigorik I tried running the queries in your daily.rb in the BigQuery query editor, but they both return 0 results each time. Here is the exact query I ran, which always returns 0 results:

SELECT repository_name, repository_language, repository_description, COUNT(repository_name) as cnt, repository_url
FROM githubarchive:github.timeline
WHERE type="WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("2015-01-15 20:00:00")
        AND repository_url IN (
      SELECT repository_url
      FROM githubarchive:github.timeline
      WHERE type="CreateEvent"
              AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2015-01-15 20:00:00')
              AND repository_fork = "false"
              AND payload_ref_type = "repository"
      GROUP BY repository_url
    )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25

Has something changed or am I doing it wrong? Thanks!

@igrigorik

This comment has been minimized.

Owner

igrigorik commented Jan 16, 2015

@jerodsanto correct, because it's querying old table that's no longer being updated (see githubarchive.org). You'd have to update the queries to go against the new daily tables.

@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 16, 2015

@igrigorik that makes sense, thanks. I'll give it a go. First time using BigQuery so we'll see how I do. :)

@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 20, 2015

@igrigorik I see you can extract JSON fields w/ BigQuery, but can you query on them? If not, I'm not sure how to reproduce your previous queries. The payload_ref_type is now inside the payload and the repository_fork is missing all together.

@igrigorik

This comment has been minimized.

Owner

igrigorik commented Jan 20, 2015

Yes you can, you just need to wrap the subquery which does the extract.. For example:

SELECT event as issue_status, COUNT(*) as cnt FROM (
  SELECT type, repo.name, actor.login,
    JSON_EXTRACT(payload, '$.action') as event, 
  FROM (TABLE_DATE_RANGE(day.events_, 
    TIMESTAMP('2015-01-01'), 
    TIMESTAMP('2015-02-01')
  )) 
  WHERE event = '...'
)
GROUP by issue_status;

Note that the WHERE clause is applied in an outer select and is run against the extracted field.

@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 20, 2015

Hmm, maybe I'm just dense, but this appears to mirror your structure (where the WHERE clause is nested just outside the TABLE_DATE_RANGE function):

SELECT repo.name, repo.url, 
  JSON_EXTRACT(payload, '$.ref_type') as ref_type,
FROM (TABLE_DATE_RANGE(githubarchive:day.events_, 
  TIMESTAMP("2015-01-19"), 
  TIMESTAMP("2015-01-19")
))
WHERE type="CreateEvent"
AND ref_type="repository"

When I run that, I get:

Field 'ref_type' not found in table 'githubarchive:day.events_20150119'.

What I'm trying to say is: "give me the repositories that were created on this day". I will then use that result as a subquery to get the most "watch" events on that set of repos.

(Thanks for all your help, btw!)

@igrigorik

This comment has been minimized.

Owner

igrigorik commented Jan 20, 2015

You can't extract and filter within the same subquery. This should do the trick:

SELECT * FROM (
  SELECT repo.name, repo.url, 
    JSON_EXTRACT(payload, '$.ref_type') as ref_type,
  FROM (TABLE_DATE_RANGE(githubarchive:day.events_, 
    TIMESTAMP("2015-01-19"), 
    TIMESTAMP("2015-01-19")
  ))
  WHERE type="CreateEvent"
)
WHERE ref_type='"repository"'
@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 20, 2015

@igrigorik yup, that does it! Thanks so much for the help. I'll let you know when I have something worth talking about!

@hnq90

This comment has been minimized.

Contributor

hnq90 commented Jan 21, 2015

I created new repository here: https://github.com/hnq90/GitHubArchive
@jerodsanto Do you have any suggestion?

@hnq90

This comment has been minimized.

Contributor

hnq90 commented Jan 21, 2015

You can subscribe here: http://eepurl.com/bb4EFL

@jerodsanto

This comment has been minimized.

jerodsanto commented Jan 21, 2015

@hnq90 cool! I wish I'd known you were working on this, too. I have repurposed about 90% of this. I plan to rewrite the view layer and release it under The Changelog umbrella. Seemed like a good fit.

@hnq90

This comment has been minimized.

Contributor

hnq90 commented Jan 22, 2015

@jerodsanto Nice, please notice me if you release it. 👍

@therebelrobot

This comment has been minimized.

therebelrobot commented Jan 22, 2015

@jerodsanto +1 to that too! I'd love to take a look once released :)

@hnq90

This comment has been minimized.

Contributor

hnq90 commented Feb 2, 2015

@igrigorik
I tried to run this query but it's give error. Do you have any suggestion?

Error: (L2:1): JOIN (including semi-join) and UNION ALL (comma) may not be combined in a single SELECT statement. Either move the UNION ALL to an inner query or the JOIN to an outer query.
SELECT repo.id, repo.name, COUNT(repo.name) as starringCount
FROM (TABLE_DATE_RANGE(githubarchive:day.events_, 
    TIMESTAMP('2015-01-01'), 
    TIMESTAMP('2015-02-01')
  )) 
WHERE type = 'WatchEvent'
  AND repo.id IN (
    SELECT repo.id FROM (
      SELECT repo.id,
        JSON_EXTRACT(payload, '$.ref_type') as ref_type,
      FROM (TABLE_DATE_RANGE(githubarchive:day.events_, 
    TIMESTAMP('2015-01-01'), 
    TIMESTAMP('2015-02-01')
  )) 
      WHERE type='CreateEvent'
    )
    WHERE ref_type CONTAINS 'repository'
  )
GROUP BY repo.id, repo.name
HAVING starringCount >= 5
ORDER BY starringCount DESC
LIMIT 25;
@hnq90

This comment has been minimized.

Contributor

hnq90 commented Feb 2, 2015

Sorry, I resolved it by adding a new sub-query

SELECT repo.id, repo.name, type, COUNT( repo.name) as starringCount
FROM (
  SELECT repo.id, repo.name, type
  FROM 
  (TABLE_DATE_RANGE(githubarchive:day.events_, 
    TIMESTAMP('2015-01-01'), 
    TIMESTAMP('2015-02-01')
  )) 
  WHERE type = 'WatchEvent' 
)
WHERE repo.id IN (
    SELECT repo.id FROM (
      SELECT repo.id,
        JSON_EXTRACT(payload, '$.ref_type') as ref_type,
      FROM (TABLE_DATE_RANGE(githubarchive:day.events_, 
    TIMESTAMP('2015-01-01'), 
    TIMESTAMP('2015-02-01')
  )) 
      WHERE type='CreateEvent'
    )
    WHERE ref_type CONTAINS 'repository'
  )
GROUP BY repo.id, repo.name, type
HAVING starringCount >= 5
ORDER BY starringCount DESC
LIMIT 25;
@adamstac

This comment has been minimized.

Contributor

adamstac commented Feb 27, 2015

Howdy! Adam here from The Changelog.

Just wanted to give everyone a heads up that we launched Changelog Nightly in collaboration with @igrigorik and GitHub Archive. Listen to episode #144 of The Changelog for an in-depth interview with Ilya about the history of GitHub Archive, logging and archiving GitHub’s public event data with Google BigQuery, and all the details of The Changelog taking over the GitHub Archive daily email reports.

Changelog Nightly is the new and improved GitHub Archive daily email reports. It ships every night at 10pm CT -- and unearths the hottest new repos on GitHub before they blow up. It's nerd to the core and in your inbox each night.

We'll continue to work closely with @igrigorik as well on future email report ideas, so subscribe here to keep up. We also open sourced the code too. Contributions are welcome.

@igrigorik you can probably leave this issue open for a bit for anyone who comes here looking for an update on the email.

Thanks y'all!

(cc @hnq90)

@igrigorik

This comment has been minimized.

Owner

igrigorik commented Feb 27, 2015

\o/ ... and on that happy note, closing this issue :)

@igrigorik igrigorik closed this Feb 27, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment