update readme in bigquery section

igrigorik committed Apr 28, 2012
1 parent 5f10cc6 commit a9e1952e82086de35060d4bc80400807e4021156
Showing with 11 additions and 11 deletions.
  1. +3 −11 bigquery/README.md
  2. +8 −0 bigquery/queries/top_watches_by_language.sql
@@ -2,10 +2,12 @@
[Google BigQuery](https://developers.google.com/bigquery/) is a web service that lets you do interactive analysis of massive datasets—up to billions of rows.
-The GitHub Activity stream is automatically uploaded to the BigQuery service to enable interactive analysis.
+The GitHub Activity stream is automatically uploaded to the BigQuery service to enable interactive analysis. Follow the [instructions to access the dataset](http://www.githubarchive.org/).
## Sample Queries
+Have a clever query you would like to share? Fork the project, add it to the project under **queries/name.sql** and send a pull request!
+
```sql
/* distribution of different events on GitHub */
SELECT type, count(type) as cnt
@@ -58,13 +60,3 @@ ORDER BY date DESC
```
For the full schema of available fields to select, order, and group by, see schema.js.
-
-## Manually loading the data
-
-If you want to load the archive data into your own BigQuery project:
-
-```bash
-$> wget http://data.githubarchive.org/2012-03-11-15.json.gz
-$> ruby transform.rb -i 2012-03-11-15.json.gz
-$> python bq.py --apilog true load github.events 2012-03-11-15.json.gz-out.csv.gz schema.js
-```
@@ -0,0 +1,8 @@
+/* watches for a specific language + date range */
+SELECT repository_name, count(repository_name) as watches, repository_description, repository_url
+FROM github.events
+WHERE type="WatchEvent"
+ AND repository_language="Ruby"
+ AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
+GROUP BY repository_name, repository_description, repository_url
+ORDER BY watches DESC
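
The filter-then-aggregate shape of `top_watches_by_language.sql` can be tried locally without BigQuery access. The sketch below is an illustration only, not part of this commit: it uses Python's built-in `sqlite3` module, a hypothetical `events` table with made-up rows, and a plain string comparison on `created_at` in place of BigQuery's `PARSE_UTC_USEC` (this works only because the sample timestamps share the `YYYY-MM-DD HH:MM:SS` format). The description and URL columns are omitted for brevity.

```python
import sqlite3

# Hypothetical sample rows standing in for the github.events table:
# (type, repository_name, repository_language, created_at)
ROWS = [
    ("WatchEvent", "rails", "Ruby", "2012-04-02 10:00:00"),
    ("WatchEvent", "rails", "Ruby", "2012-04-03 11:30:00"),
    ("WatchEvent", "sinatra", "Ruby", "2012-04-05 09:15:00"),
    ("WatchEvent", "rails", "Ruby", "2012-03-30 08:00:00"),     # before the cutoff
    ("WatchEvent", "django", "Python", "2012-04-04 12:00:00"),  # other language
    ("PushEvent", "sinatra", "Ruby", "2012-04-06 14:00:00"),    # other event type
]


def top_watches(rows, language, since):
    """Count WatchEvents per repository for one language since a cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "type TEXT, repository_name TEXT, "
        "repository_language TEXT, created_at TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    # Same structure as the BigQuery query: filter, group, order by count.
    query = """
        SELECT repository_name, COUNT(repository_name) AS watches
        FROM events
        WHERE type = 'WatchEvent'
          AND repository_language = ?
          AND created_at >= ?
        GROUP BY repository_name
        ORDER BY watches DESC
    """
    result = conn.execute(query, (language, since)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name, watches in top_watches(ROWS, "Ruby", "2012-04-01 00:00:00"):
        print(name, watches)
```

Passing the language and cutoff as bound parameters rather than literals makes it easy to reuse the same query for other languages or date ranges.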