# Charting - Interactive Charts

This notebook demonstrates how you can use the %chart magic to quickly chart data in BigQuery tables, and other sources such as Python lists.

This notebook uses several data sets to show how to quickly draw different styles of chart.


Related Links:

* [BigQuery](https://cloud.google.com/bigquery/)
* BigQuery [SQL reference](https://cloud.google.com/bigquery/query-reference)
* [Google Charts](https://developers.google.com/chart/)


In [1]:
import gcp.bigquery as bq

All the chart magics take a data source argument. This can be a table name, or a reference to a table, query, view, or Python data such as a list or dataframe.

The magic uses Google Charts to easily chart data without having to write code. The first argument is the type of chart we want, and as you can see from the help below, numerous types are supported:

In [2]:
%chart --help

usage: %%chart [-h]
               {annotation,area,bars,bubbles,calendar,candlestick,columns,combo,gauge,geo,histogram,line,map,org,paged_table,pie,sankey,scatter,stepped_area,table,timeline,treemap}
               ...

Generate an inline chart using Google Charts using the data in a Table, Query,
dataframe, or list. Numerous types of charts are supported. Options for the
charts can be specified in the cell body using YAML or JSON.

positional arguments:
  {annotation,area,bars,bubbles,calendar,candlestick,columns,combo,gauge,geo,histogram,line,map,org,paged_table,pie,sankey,scatter,stepped_area,table,timeline,treemap}
                        commands
    annotation          Generate a annotation chart.
    area                Generate a area chart.
    bars                Generate a bars chart.
    bubbles             Generate a bubbles chart.
    calendar            Generate a calendar chart.
    candlestick         Generate a candlestick chart.
    columns             Generate a co

After the chart type, we specify the data to be charted. Google Charts requires the columns in the data to be in a specific order, so unless our data already happens to be in that order, we also need to say which fields to use and in what order. We do this with the --field parameter, which takes a comma-separated list of field names. Finally, when using the %%chart cell magic, the body of the cell can include additional options to be passed to the chart; these are chart-specific.
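To make the column-ordering idea concrete, here is a minimal sketch of what selecting and ordering fields with a comma-separated list amounts to when the data is a list of dicts. The helper `order_columns` is hypothetical and is not part of the %chart magic; it only illustrates the concept, and the sample records use made-up numbers.

```python
# Hypothetical helper (not part of the %chart magic): picks the fields named
# in a comma-separated spec, in that order, from a list of dicts.
def order_columns(records, field_spec):
    """Return (header, rows) with columns in the order given by field_spec."""
    fields = [f.strip() for f in field_spec.split(',')]
    rows = [[record[f] for f in fields] for record in records]
    return fields, rows


# Made-up records whose keys are not in the order a pie chart wants
# (label first, value second).
records = [
    {'pushes': 120, 'repository_language': 'Python'},
    {'pushes': 95, 'repository_language': 'Ruby'},
]
header, rows = order_columns(records, 'repository_language,pushes')
print(header)  # ['repository_language', 'pushes']
print(rows)    # [['Python', 120], ['Ruby', 95]]
```

A pie chart expects the label column first and the numeric column second, which is why the order in the spec matters rather than the order of keys in each record.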



For our first example, let's look at the popularity of programming languages. We can use the public github_timeline data for this. First, we should look at the data, which we can do with the paged_table chart magic (there is also a table chart magic, which shows the entire table; use it only with tables you know to be small):

In [3]:
%chart paged_table publicdata:samples.github_timeline

We will get a rough view by counting pushes grouped by the repository_language field. Unlike in our other notebook where we explored GitHub data, we won't limit the set of languages in the query. Instead, we will use the pie chart's sliceVisibilityThreshold option to group all languages whose slices would be smaller than 6 degrees (i.e. less than 1/60th of the pushes) into a single 'Other' slice. Chart options are specified in the %%chart cell body and can be expressed in JSON or YAML. We will use JSON in this chart but YAML in the remaining examples.
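The threshold arithmetic and the bucketing it triggers can be sketched in plain Python. This is an approximation written for illustration, assuming a slice is hidden when its share of the total is strictly below the threshold; it is not Google Charts' own code, and `bucket_small_slices` is a hypothetical name.

```python
# Sketch of slice bucketing: slices whose share of the total is below the
# threshold are merged into one 'Other' slice. An approximation of what
# sliceVisibilityThreshold does, not Google Charts' implementation.
def bucket_small_slices(counts, threshold, other_label='Other'):
    total = float(sum(value for _, value in counts))
    if total == 0:
        return list(counts)
    kept, other = [], 0
    for label, value in counts:
        if value / total < threshold:
            other += value  # too small to show on its own
        else:
            kept.append((label, value))
    if other:
        kept.append((other_label, other))
    return kept


# A 6-degree slice is 6/360 = 1/60 of the pie. Using 6.0 keeps the division
# floating point under Python 2 as well (6 / 360 would be 0 there).
threshold = 6.0 / 360

print(bucket_small_slices([('A', 50), ('B', 49), ('C', 1)], threshold))
# [('A', 50), ('B', 49), ('Other', 1)]
```

Here 'C' holds 1% of the total, which is below 1/60 (about 1.67%), so it is folded into 'Other'.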

Note that in the example below, our SELECT statement returns the fields in the order we want, so we don't really need the --field argument, but we include it for illustrative purposes. Also note that we can use variable replacement in the options: "$slice_threshold" is replaced by the value of the Python variable slice_threshold.
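The idea of `$name` replacement can be illustrated with Python's standard `string.Template`. This is an assumption made for illustration only: the %%chart magic may implement its replacement differently. In this sketch, because `$slice_threshold` sits inside quotes in the JSON, the substituted value ends up as a JSON string rather than a number.

```python
import json
from string import Template

# Illustration only: $-style substitution with string.Template.
# The %%chart magic's own replacement mechanism may differ.
options_text = '''{
  "title": "Language Popularity",
  "sliceVisibilityThreshold": "$slice_threshold"
}'''

slice_threshold = 6.0 / 360

# Replace $slice_threshold with str(slice_threshold), then parse the JSON.
expanded = Template(options_text).substitute(slice_threshold=slice_threshold)
options = json.loads(expanded)
print(options)
```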

In [4]:
%%sql --module github_events
SELECT repository_language, COUNT(repository_language) as pushes
FROM [publicdata:samples.github_timeline]
WHERE type = 'PushEvent'
  AND repository_language != ''
GROUP BY repository_language
ORDER BY pushes DESC

In [5]:
# Use a float so this is 1/60 rather than 0 under Python 2 integer division
slice_threshold = 6.0 / 360

In [6]:
%%chart pie --field repository_language,pushes github_events
{
  "title": "Language Popularity",
  "sliceVisibilityThreshold": "$slice_threshold"
}

For our next example, let's look at weather data. The gsod sample data has weather measurements from many stations around the world. We will look at weather station 471270, which is in South Korea, for the years after 2000 (the query below selects year > 2000). This example also shows that we can chart Python data, not just BigQuery tables and query results. For this station only max_temperature was recorded, not min_temperature.

In [7]:
%%sql --module skweather
SELECT year, month, day, max_temperature
FROM [publicdata:samples.gsod]
WHERE station_number = 471270 AND year > 2000

To plot the temperature over time, we need to combine the separate day, month and year fields into a single datetime. To do this, we will read the query results into a Python data structure and then chart that. We will use an annotation chart, which is useful for time series:

In [8]:
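import datetime

# Combine the separate year, month and day fields of each result row into a
# single datetime, keeping the maximum temperature alongside it.
weather = [{'date': datetime.datetime(year=row['year'], month=row['month'], day=row['day']),
            'max': row['max_temperature']} for row in bq.Query(skweather).results()]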

In [9]:
%chart annotation --field date,max weather

For our next example, let's look at a scatter plot. This time we will use the public natality data and plot 1,000 rows of it (an arbitrary subset, since LIMIT without ORDER BY is not a random sample), showing gestation weeks vs. birth weight. We'll add axis titles, a chart title and a trend line via chart options.

In [10]:
%%sql --module babies
SELECT gestation_weeks, weight_pounds
FROM [publicdata:samples.natality]
WHERE gestation_weeks < 99
LIMIT 1000

In [11]:
%%chart scatter babies
title: Birth Weight vs Gestation Weeks
hAxis:
  title: Gestation Weeks
vAxis:
  title: Weight (pounds)
legend: none
trendlines:
  0: {}
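The trend line comes from the `trendlines` option; as far as we know, Google Charts draws a linear trend line by default. A linear trend line is an ordinary least squares fit, which can be sketched in pure Python as below. This is a generic implementation of least squares for illustration, not the charting library's code, and `linear_fit` is a hypothetical name.

```python
# Ordinary least squares fit of y = slope * x + intercept, the kind of line
# a linear trend line draws. Pure Python; no BigQuery access needed.
def linear_fit(points):
    n = len(points)
    if n < 2:
        raise ValueError('need at least two points')
    mean_x = sum(x for x, _ in points) / float(n)
    mean_y = sum(y for _, y in points) / float(n)
    # Sum of squared x deviations and sum of x/y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError('all x values are equal; slope is undefined')
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1.
print(linear_fit([(0, 1), (1, 3), (2, 5)]))  # (2.0, 1.0)
```

Applied to (gestation_weeks, weight_pounds) pairs, the slope would be the average change in birth weight per additional week of gestation across the plotted rows.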

For our next example, we will use a map to draw the locations of the London Underground stations on the Bakerloo line. This data isn't in BigQuery; we'll just create it in the notebook. It comes from the public-domain station list at http://wiki.openstreetmap.org/wiki/List_of_London_Underground_stations

In [12]:
stations = [
{'name': 'Baker Street', 'latitude': 51.52265, 'longitude': -0.15704},
{'name': 'Charing Cross', 'latitude': 51.507108, 'longitude': -0.122963},
{'name': 'Edgware Road (Bakerloo Line)', 'latitude': 51.519560, 'longitude': -0.169068},
{'name': 'Elephant & Castle', 'latitude': 51.49467, 'longitude': -0.10047},
{'name': 'Embankment', 'latitude': 51.50717, 'longitude': -0.12195},
{'name': 'Harlesden', 'latitude': 51.53628278, 'longitude': -0.257622488},
{'name': 'Harrow & Wealdstone', 'latitude': 51.59205973, 'longitude': -0.334725352},
{'name': 'Kensal Green', 'latitude': 51.53060655, 'longitude': -0.224253545},
{'name': 'Kenton', 'latitude': 51.58173809, 'longitude': -0.316870809},
{'name': 'Kilburn Park', 'latitude': 51.53495818, 'longitude': -0.193963023},
{'name': 'Lambeth North', 'latitude': 51.49894, 'longitude': -0.11216},
{'name': 'Maida Vale', 'latitude': 51.52989409, 'longitude': -0.185888819},
{'name': 'Marylebone', 'latitude': 51.522660, 'longitude': -0.162996},
{'name': 'North Wembley', 'latitude': 51.56258091, 'longitude': -0.304072648},
{'name': 'Oxford Circus', 'latitude': 51.51517, 'longitude': -0.14119},
{'name': 'Paddington', 'latitude': 51.5151846554, 'longitude': -0.17553880792},
{'name': 'Piccadilly Circus', 'latitude': 51.51022, 'longitude': -0.13392},
{'name': 'Queen\'s Park', 'latitude': 51.534179, 'longitude': -0.205257721},
{'name': 'Regent\'s Park', 'latitude': 51.52344, 'longitude': -0.14713},
{'name': 'South Kenton', 'latitude': 51.57044666, 'longitude': -0.308566354},
{'name': 'Stonebridge Park', 'latitude': 51.54402388, 'longitude': -0.275978856},
{'name': 'Warwick Avenue', 'latitude': 51.52329728, 'longitude': -0.183777837},
{'name': 'Waterloo', 'latitude': 51.50322, 'longitude': -0.11328},
{'name': 'Wembley Central', 'latitude': 51.55122817, 'longitude': -0.29577538},
{'name': 'Willesden Junction', 'latitude': 51.53181, 'longitude': -0.242350}]


For our map we specify the showTip option, which shows the third column (the station name) when hovering over a map pushpin. The useMapTypeControl option lets us switch between map types such as regular and satellite, and mapType: normal starts with the regular map. As in the scatter chart, the options are written in YAML.



In [13]:
%%chart map --field latitude,longitude,name stations
showTip: true
mapType: normal
useMapTypeControl: true

Finally, let's look at a different type of map chart. We will use a 'geo' chart to show the number of births in each US state during the 1980s, again using the public natality data set. Let's first define a query to get the data we want to plot:

In [14]:
%%sql --module births
SELECT state, COUNT(*) count_babies
FROM [publicdata:samples.natality]
WHERE year >= 1980 AND year < 1990
GROUP BY state
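The query's filter and aggregation can be mirrored in plain Python to make the logic explicit: `year >= 1980 AND year < 1990` is a half-open range that includes 1980 and excludes 1990, which is exactly "the 1980s", and `GROUP BY state` with `COUNT(*)` counts rows per state. The helper `births_by_state` and the rows it is applied to are hypothetical stand-ins for natality records.

```python
from collections import Counter


# Hypothetical helper mirroring the query: keep rows with
# start_year <= year < end_year, then count rows per state.
def births_by_state(rows, start_year=1980, end_year=1990):
    return Counter(row['state'] for row in rows
                   if start_year <= row['year'] < end_year)


# Made-up rows: 1979 and 1990 fall outside the half-open range.
rows = [
    {'year': 1979, 'state': 'CA'},
    {'year': 1980, 'state': 'CA'},
    {'year': 1985, 'state': 'CA'},
    {'year': 1989, 'state': 'TX'},
    {'year': 1990, 'state': 'TX'},
]
print(dict(births_by_state(rows)))  # {'CA': 2, 'TX': 1}
```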

Now we can draw the chart. We add some YAML options to specify that we want a US map at state (province) resolution (for more options, see the GeoChart documentation: https://developers.google.com/chart/interactive/docs/gallery/geochart).

In [15]:
%%chart geo births
region: US
resolution: provinces