Altair is a visualisation library for Python built on the JavaScript library Vega-Lite. Its output is HTML/JavaScript, which can be viewed directly in a Jupyter Notebook. Spyder doesn't have an HTML viewer, so the output is sent to your web browser instead. For this reason, it's best to use JupyterLab.

Altair tutorials can be found here:
<ul>
    <li>Basic tutorials: <a href="https://altair-viz.github.io/getting_started/starting.html">tutorial</a> (the Altair documentation can also be found there).</li>
    <li>Web Tutorials: <a href="https://nextjournal.com/sdanisch/data-types-graphical-marks-and-visual-encoding-channels"> Data Types, Graphical Marks, and Visual Encoding Channels</a></li>
</ul>
 <b>Below</b> we have imported the libraries and data.
 
 Since this is my first notebook with Markdown cells, I have also added a <a href="https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html">link</a> to a cheat sheet for Jupyter notebooks.

In [209]:
#Datasets 
from vega_datasets import data
import altair as alt

cars=data.cars()
weather=data.seattle_weather()
iris=data.iris()

In [391]:
# Preview of the first rows of the cars dataset.
# Note that it is a pandas DataFrame.
cars.head()

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA


In [392]:
print(len(cars))  # Number of records

406



Altair's Chart class creates chart objects in a declarative way: you chain methods such as a mark and encode() onto the chart to add modifications and build up personalised settings.
E.g. a scatter plot with Horsepower as the x variable, Miles_per_Gallon as the y variable, and Cylinders as the colour variable (encoded as quantitative, :Q, in the code; :O would treat it as ordinal). The data has also been split into one panel per Origin by passing Origin to the "column" channel of the encode function. Note that Origin is a NOMINAL data type.

<b>Shown below:</b>

In [135]:
scatter = alt.Chart(cars).mark_circle(filled=True).encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color=alt.Color('Cylinders:Q', scale=alt.Scale(range=['yellow', 'red', 'purple']), legend=alt.Legend(orient="left")),  # alt.Color is a class, so the scale and legend can be customised.
    column='Origin:N'
)
scatter

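<h3>Faceting Data</h3>
<p>Charts can be faceted or repeated across a grid of panels. Faceting splits the rows by a categorical variable, whereas the <font color="deeppink">.repeat()</font> function repeats the same chart for different fields. An example of <font color="deeppink">.repeat()</font> is shown below.</p>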

In [134]:
# Repeating the chart over a grid of fields (a scatter-plot matrix)
alt.Chart(cars).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='Origin:N'
).properties(
    width=150,
    height=150
).repeat(
    row=['Horsepower', 'Acceleration', 'Miles_per_Gallon'],
    column=['Miles_per_Gallon', 'Acceleration', 'Horsepower']
)

In [183]:
scatter = alt.Chart(cars, width= 250).mark_square(filled=True).encode(
    alt.X('Horsepower:Q', scale=alt.Scale(zero=False),title='Horse Power'),   #Q is quantitative
    alt.Y('Miles_per_Gallon', scale= alt.Scale(zero=False)),
    color='Cylinders:Q',
    column='Origin:N'
).properties(title='Comparison of Horse Power with respect to Miles per Gallon') #Setting a title for the graph is done using properties.

scatter

In [181]:
# Pandas is used here to preprocess the data: keep only the cars whose Origin is USA
USA_cars = cars.loc[cars['Origin']=='USA']

scatter_USA = alt.Chart(USA_cars, width= 250).mark_circle(filled=True).encode(
    alt.X('Horsepower:Q', scale=alt.Scale(zero=False),title='Horse Power'),   #Q is quantitative
    alt.Y('Miles_per_Gallon', scale= alt.Scale(zero=False)),
    color='Cylinders:O'
).properties(title='USA')

scatter_USA

In [131]:
hist = alt.Chart(cars).mark_bar().encode(
        alt.X('Year'),
        alt.Y('Displacement'),
        color='Cylinders:Q')

alt.vconcat(scatter,hist)

In [154]:
scatter1 = alt.Chart(cars).mark_point(filled=True).encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Cylinders:Q',
    column='Origin:N'    
).interactive()

scatter1


In [150]:
#Binned data example
alt.Chart(cars).mark_circle().encode(
    alt.X('Horsepower', bin=True),
    alt.Y('Miles_per_Gallon', bin=True),
    size='count()',
    color='average(Acceleration):Q'
).interactive()



In [192]:
# Capitalising the first letter of each value in the weather column, to make it look neater.
weather["weather"] = weather["weather"].str.capitalize()

#making a strip plot
alt.Chart(weather).mark_circle().encode(
    x=alt.X("weather:N", title="Weather"),
    y=alt.Y("precipitation:Q", title="Precipitation (mm)"),
    color=alt.Color(
        "weather:N",
        legend=None,
        scale=alt.Scale(range=["lightblue", "grey", "blue", "lightgrey", "orange"]),
    ),
).properties(width=400)


In [262]:
#Now trying to create a histogram
alt.Chart(weather).mark_bar().encode(
    alt.X("precipitation", bin=alt.Bin(maxbins=100), scale=alt.Scale(zero=False), title="Precipitation (mm)"),  # maxbins sets the maximum number of bins the values are split into.
    y=alt.Y("count()",title="Frequency"))



In [266]:
alt.Chart(weather).mark_circle().encode(
    x = "date:T",
    y = "precipitation")

In [53]:
weather_scatter = alt.Chart(weather).mark_point().encode(
    x='date',
    y='precipitation',
    column='weather')
weather_scatter

In [323]:
import altair as alt
from vega_datasets import data

# Since the data is more than 5,000 rows we'll import it from a URL
source = data.zipcodes.url
states = alt.topo_feature(data.us_10m.url, feature='states')

# US states background
USA_map = alt.Chart(states).mark_geoshape(
    fill='darkgrey',
    stroke='white'
).properties(
    width=650,
    height=400
).project('albersUsa')

Zip_codes = alt.Chart(source).transform_calculate(
    "First Two Leading digits", alt.expr.substring(alt.datum.zip_code, 0, 2)
).mark_circle(size=3).encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    #Adding a colour by a field, and then you can set a color scheme, whether it is ordinal, nominal or quantitative
    color= alt.Color('First Two Leading digits:Q', scale = alt.Scale(scheme="yelloworangered")),
    #You can essentially rename the tooltip text like so
    tooltip=[alt.Tooltip('state:O',title='State'),alt.Tooltip('zip_code:N',title="Zip-Code"),alt.Tooltip('county:N',title="County")]
).project(
    type='albersUsa'
).properties(
    width=650,
    height=400,
    title="The First Two Leading Digits of USA Zip-Codes"
)
# The + operator layers the two charts on top of each other.
USA_map + Zip_codes


<h2>Part 3 - The Bikes Dataset</h2>

In [228]:
#Data found at this URL
import pandas as pd
bikes_data = pd.read_csv("http://staff.city.ac.uk/~sbbb717/tfl_bikes/last24h")
station_data = pd.read_csv("http://staff.city.ac.uk/~sbbb717/tfl_bikes/latest")

In [302]:
#SQL like inner join to get a new dataframe
bike_joined = bikes_data.merge(station_data, left_on="stationId", right_on="id",how="inner")


In [407]:
bike_joined.head()

Unnamed: 0,stationId,availableBikes,availableDocks,t,id,name,lat,long,updatedDate,numBikes,numEmptyDocks,installed,locked,installedDate
0,1,9,9,2020-10-05 19:30:01,1,"River Street , Clerkenwell",51.529163,-0.109971,2020-10-06 19:25:01,5,13,True,False,2010-07-12 16:08:00
1,1,9,9,2020-10-05 19:40:01,1,"River Street , Clerkenwell",51.529163,-0.109971,2020-10-06 19:25:01,5,13,True,False,2010-07-12 16:08:00
2,1,10,8,2020-10-05 19:50:02,1,"River Street , Clerkenwell",51.529163,-0.109971,2020-10-06 19:25:01,5,13,True,False,2010-07-12 16:08:00
3,1,10,8,2020-10-05 20:00:01,1,"River Street , Clerkenwell",51.529163,-0.109971,2020-10-06 19:25:01,5,13,True,False,2010-07-12 16:08:00
4,1,10,8,2020-10-05 20:10:01,1,"River Street , Clerkenwell",51.529163,-0.109971,2020-10-06 19:25:01,5,13,True,False,2010-07-12 16:08:00


In [409]:
# The number of records is the same as in bikes_data
len(bike_joined)

113184
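The row count stays the same because every reading matches exactly one station. A minimal sketch of that idea follows, using two tiny made-up DataFrames (`readings` and `stations` are hypothetical stand-ins, not the TfL data). It uses the `validate` argument of `merge`, which raises an error if a station id appears more than once on the right-hand side.

```python
import pandas as pd

# Hypothetical stand-ins for bikes_data and station_data (made-up values)
readings = pd.DataFrame({
    "stationId": [1, 1, 2],
    "availableBikes": [9, 10, 4],
})
stations = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Station A", "Station B", "Station C"],
})

# many_to_one: many readings per station, but each station id must be unique
joined = readings.merge(
    stations,
    left_on="stationId",
    right_on="id",
    how="inner",
    validate="many_to_one",
)

# Every reading found exactly one station, so no rows were lost or duplicated
print(len(joined) == len(readings))
```

Station C has no readings, so it simply does not appear in the inner join.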

In [396]:
# We want the bike data at the latest time, i.e. the rows where t is greatest.
bike_tmax = bike_joined[bike_joined.t == bike_joined.t.max()]
bike_tmax

bike_tmax.head()
len(bike_tmax)
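Because `pd.read_csv` was called without `parse_dates`, the `t` column is read as text. Timestamps in this year-first format still sort correctly as strings, but converting them makes the "latest time" comparison explicit. A minimal sketch, assuming a small made-up DataFrame called `snapshots` in place of the joined bike data:

```python
import pandas as pd

# Made-up readings for two stations at two times (illustrative values only)
snapshots = pd.DataFrame({
    "stationId": [1, 1, 2, 2],
    "t": [
        "2020-10-05 19:30:01",
        "2020-10-05 19:40:01",
        "2020-10-05 19:30:01",
        "2020-10-05 19:40:01",
    ],
    "availableBikes": [9, 9, 4, 5],
})

# Convert the text timestamps into real datetime values
snapshots["t"] = pd.to_datetime(snapshots["t"])

# Keep only the rows recorded at the most recent timestamp
latest = snapshots[snapshots["t"] == snapshots["t"].max()]
print(latest)
```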


In [397]:
boroughs = alt.topo_feature(data.londonBoroughs.url, 'boroughs')

background = alt.Chart(boroughs).mark_geoshape(
    fill = "darkgrey",
    stroke='white',
    strokeWidth=2
).properties(  # No need to call encode() when there is nothing to encode.
    width=700,
    height=500
)
background

In [414]:
# A plot of all the bike stations in London, sized by the number of available bikes
bikes_London = alt.Chart(bike_tmax).mark_circle().encode(
    longitude='long:Q',
    latitude='lat:Q',
    tooltip="name:N",
    size = alt.Size("availableBikes", bin = alt.Bin(extent=[0,25],step=5), title="Num of Bikes Available" ) #scale=alt.Scale(bins=[1,5,10,15,20,25]))
    ).properties(width=700,
    height=500)
bikes_London

In [415]:
 background + bikes_London 

In [416]:
# Trying to add my location
dict_myloc = {"latitude": 51.508033, "longitude": -0.106824}
# Wrap the dict in a list so it becomes one row with latitude and longitude columns
df_myloc = pd.DataFrame([dict_myloc])

my_location = alt.Chart(df_myloc).mark_circle(size=50).encode(
            latitude="latitude:Q",
            longitude="longitude:Q",
            color = alt.value("red")).properties(width=700, height=500)
