# Visualizing Data 

This notebook demonstrates how to feed the results of SQL queries into data visualization tools written in Python. Visualization is one of the most useful features of the Jupyter Notebook because it combines Python, SQL, and the browser in a way that's easy to understand. 

## Setup 

The code cells in a notebook are meant to run in order from top to bottom. Actions taken in a code cell are available to other code cells. The setup cell below must run first so that all features are enabled in subsequent code cells. 


In [85]:
# Import these libraries into the notebook.
import pathlib
import subprocess 
import pandas as pd 
import folium
import folium.plugins
import re 

# Download the datasets if they are not already present.
if not pathlib.Path('flights.sqlite3').exists():
    subprocess.run('wget http://www.lifealgorithmic.com/_static/databases/flights.sqlite3', shell=True)
if not pathlib.Path('population.sqlite3').exists():
    subprocess.run('wget http://www.lifealgorithmic.com/_static/databases/population.sqlite3', shell=True)
if not pathlib.Path('59d92c1bf84a438d83f78465dce02c61_0.geojson').exists():
    subprocess.run('wget https://opendata.arcgis.com/datasets/59d92c1bf84a438d83f78465dce02c61_0.geojson', shell=True)

# Enable SQL in the notebook.
%load_ext sql
%config SqlMagic.autolimit=500

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


## Python Code 

The code in the next cell is written in Python. It defines four functions that you can use. Each function takes an SQL query and graphs or maps the query's results. Each function's documentation explains what the query result should look like. 


In [99]:
def lineplot(query):
  """
  The query should produce a table with three columns. The first column
  is the X-axis, the second column holds the series name, and the third
  column holds the series data. Each distinct series name is plotted as
  its own line.

    1. x-axis 
    2. Series name 
    3. Series data

  Examples:

    select year, 'California', sum(pop_total) from population group by year; 

    select year, county, sum(pop_total) 
      from population group by year, county; 

  """
  m = re.search(r'%%sql\s+(\S+)', query)
  url = m.group(1)
  query = re.sub(r'%%sql\s+(\S+)', '', query)
  df = pd.read_sql_query(query, url)
  df.pivot(index=df.columns[0], columns=df.columns[1]).plot(figsize=(16,9))


def barplot(query):
  """
  Make a bar plot. The column definition is the same as for the line plot.

  Example: 

    select age, county, pop_total 
      from population
      where county = 'SANTA CRUZ'
        and year = 2020

  """
  m = re.search(r'%%sql\s+(\S+)', query)
  url = m.group(1)
  query = re.sub(r'%%sql\s+(\S+)', '', query)
  df = pd.read_sql_query(query, url)
  df.pivot(index=df.columns[0], columns=df.columns[1]).plot(kind='bar', figsize=(16,9))


def worldplot(query):
  """
  Draw a world map with pins at specified locations. The argument should be a 
  query that produces a table with the following columns: 

    1. label - The label to be put in the pin. 
    2. latitude - The latitude of the pin.
    3. longitude - The longitude of the pin.

  Example:

    select icao as label, latitude, longitude from airports where country = 'United States'

  """
  m = re.search(r'%%sql\s+(\S+)', query)
  url = m.group(1)
  query = re.sub(r'%%sql\s+(\S+)', '', query)
  df = pd.read_sql_query(query, url)
  m = folium.Map(zoom_start=3,)
  markers = folium.plugins.MarkerCluster(
          options={
              'disableClusteringAtZoom': 6,
              'showCoverageOnHover': True,
          }
      ).add_to(m)
  for index, row in df.iterrows():
      folium.Marker(
          location=(row['latitude'], row['longitude']),
          popup=row['label'],
      ).add_to(markers)

  display(m)

def ca_county_map(query):
  """
  Plot California county data as a Choropleth. The function takes a query 
  that provides the following columns: 

    1. County (The name of the county)
    2. <Any Name> - The data to plot. 

  Example:

    select county as County, sum(pop_total) as Population 
      from population where year = 2020 
      group by county;

  """
  m = re.search(r'%%sql\s+(\S+)', query)
  url = m.group(1)
  query = re.sub(r'%%sql\s+(\S+)', '', query)
  m = folium.Map(location=[36.9758708,-122.11752], zoom_start=7,)
  # Use the connection URL parsed from the cell instead of a hardcoded one.
  df = pd.read_sql_query(query, url)
  df['County'] = df['County'].str.title()
  folium.Choropleth(
      geo_data='59d92c1bf84a438d83f78465dce02c61_0.geojson',
      name="choropleth",
      data=df,
      columns=["County", df.columns[1]],
      key_on="feature.properties.CountyName",
      fill_color="YlGn",
      fill_opacity=0.7,
      line_opacity=0.2,
      legend_name=df.columns[1],
  ).add_to(m)
  folium.LayerControl().add_to(m)
  display(m)

## Question 1: California Population

Run the example query for the line plot and plot it.

In [None]:
%%sql sqlite:///population.sqlite3


In [None]:
lineplot(_i)
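
In IPython, `_i` holds the text of the previously executed cell, so the plotting function receives the whole `%%sql` cell, including its first line. The sketch below shows how that text can be split into a connection URL and a query using the same regular expression as the helper functions. The name `split_sql_cell` is hypothetical and is not defined in this notebook, and the sketch strips surrounding whitespace, which the helpers do not.

```python
import re

# The same pattern the plotting helpers use to find the connection URL.
SQL_MAGIC = re.compile(r'%%sql\s+(\S+)')


def split_sql_cell(cell_text):
    """Return (url, query) from the text of a %%sql cell (hypothetical helper)."""
    match = SQL_MAGIC.search(cell_text)
    if match is None:
        raise ValueError('cell text does not contain "%%sql <url>"')
    url = match.group(1)
    # Remove the magic line so that only SQL remains.
    query = SQL_MAGIC.sub('', cell_text).strip()
    return url, query


# Stand-in for what `_i` contains after running the Question 1 cell.
cell = (
    "%%sql sqlite:///population.sqlite3\n"
    "select year, 'California', sum(pop_total) from population group by year;"
)
url, query = split_sql_cell(cell)
print(url)
print(query)
```

Because the helpers read `_i`, the plotting cell has to run immediately after the `%%sql` cell it is meant to plot.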

## Question 2: Multiple Lines 

Plot the population of Santa Cruz County and Monterey County by year using separate lines. 

In [None]:
%%sql sqlite:///population.sqlite3


In [None]:
lineplot(_i)
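
To see why one query can produce several lines, it helps to look at what `lineplot` does with the three columns: it pivots the result so that each distinct series name becomes its own column. The sketch below reproduces that step on a small hand-made table. The column names and numbers are made up for illustration and are not taken from the database.

```python
import pandas as pd

# A "long" table shaped like a lineplot query result:
# x-axis, series name, series data. The values are invented.
long_df = pd.DataFrame(
    [
        (2019, 'MONTEREY', 100),
        (2019, 'SANTA CRUZ', 50),
        (2020, 'MONTEREY', 110),
        (2020, 'SANTA CRUZ', 55),
    ],
    columns=['year', 'county', 'pop'],
)

# The same pivot call the helper uses: the first column becomes the index
# and each distinct value in the second column becomes a separate column.
wide_df = long_df.pivot(index=long_df.columns[0], columns=long_df.columns[1])
print(wide_df)
```

Calling `.plot()` on the pivoted table draws one line per column, which is why the series name belongs in the second column of the query.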

## Question 3: Bar Plot

Use the query example to show the population of Santa Cruz County by age in 2020. 

In [None]:
%%sql sqlite:///population.sqlite3


In [None]:
barplot(_i)

## Question 4: Demographics 

Update the previous plot to show Santa Cruz County's population by age where the population number is represented as a fraction of the total population. 

When you have the query working, use the `union` operator to overlay the same plot for all of California.
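
The two techniques involved here, dividing by a total and combining two result sets, can be tried on a toy table first. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table `people` and its columns are hypothetical and do not match the `population` schema. Multiplying by `1.0` matters because SQLite performs integer division when both operands are integers.

```python
import sqlite3

# Toy in-memory table; names and numbers are made up for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('create table people (region text, age_group text, n integer)')
conn.executemany(
    'insert into people values (?, ?, ?)',
    [
        ('A', 'young', 30), ('A', 'old', 10),
        ('B', 'young', 20), ('B', 'old', 40),
    ],
)

query = """
select age_group, 'A' as series,
       1.0 * sum(n) / (select sum(n) from people where region = 'A') as fraction
  from people where region = 'A' group by age_group
union
select age_group, 'All' as series,
       1.0 * sum(n) / (select sum(n) from people) as fraction
  from people group by age_group
order by series, age_group
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both halves of the `union` return the same three columns in the same order, so the combined result still has the x-axis, series name, and series data layout that `barplot` expects.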

In [None]:
%%sql sqlite:///population.sqlite3


In [None]:
barplot(_i)

## Question 5: U.S. Airports 

Use the example query to show all of the airports in the United States.

In [None]:
%%sql sqlite:///flights.sqlite3


In [None]:
worldplot(_i)

## Question 6: Improved Labels 

Update the `label` column in the last query to include the airport's code as well as its city so that the pin labels show more useful information. 

**Notice anything weird??** 
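
In SQLite, text values are joined with the `||` operator. The sketch below shows the idea on a toy in-memory table using Python's built-in `sqlite3` module. The table `places` and its columns are hypothetical and are not the `airports` schema.

```python
import sqlite3

# Toy in-memory table; names and coordinates are made up for illustration.
conn = sqlite3.connect(':memory:')
conn.execute(
    'create table places (code text, city text, latitude real, longitude real)'
)
conn.executemany(
    'insert into places values (?, ?, ?, ?)',
    [
        ('AAA', 'First City', 10.0, 20.0),
        ('BBB', 'Second City', 30.0, 40.0),
    ],
)

# Build one label from two text columns with the || operator.
rows = conn.execute(
    "select code || ' - ' || city as label, latitude, longitude "
    "from places order by code"
).fetchall()
for row in rows:
    print(row)
```

The combined expression is still aliased as `label`, which is the column name `worldplot` reads when it creates each pin.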

In [None]:
%%sql sqlite:///flights.sqlite3


In [None]:
worldplot(_i)

## Question 7: California Visualized 

Use the example query to show a county map of California population in 2020.

In [None]:
%%sql sqlite:///population.sqlite3


In [None]:
ca_county_map(_i)

## Question 8: Oldest Counties 

Update the last query to color code counties by the percentage of the population that's age 65 and older.
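
One common way to compute a percentage like this is conditional aggregation: sum only the rows that match a condition and divide by the sum of all rows. The sketch below tries the pattern on a toy in-memory table with Python's built-in `sqlite3` module. The table `people` and its columns are hypothetical, so the query needs adapting to the `population` table.

```python
import sqlite3

# Toy in-memory table; names and numbers are made up for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('create table people (region text, age integer, n integer)')
conn.executemany(
    'insert into people values (?, ?, ?)',
    [
        ('A', 30, 60), ('A', 70, 40),
        ('B', 30, 90), ('B', 80, 10),
    ],
)

# The case expression counts a row's people only when age is 65 or more.
# Using 100.0 forces floating-point division.
rows = conn.execute("""
select region,
       100.0 * sum(case when age >= 65 then n else 0 end) / sum(n) as pct_65_plus
  from people
 group by region
 order by region
""").fetchall()
for row in rows:
    print(row)
```

The result has one row per region and a single data column, which matches the two-column layout that `ca_county_map` expects.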

In [None]:
%%sql sqlite:///population.sqlite3


In [None]:
ca_county_map(_i)