# Welcome to the Altair party! 🎈

We're going to learn how to do quick viz while working through an analysis in a Jupyter notebook.

To demonstrate viz with Python in Altair we're using...

## bee colony losses 🐝💔

#savethebees

In [12]:
# import libraries

import pandas as pd
import altair as alt

Up first, we're going to read in our data. I've chosen bee colony losses from the [USDA](https://usda.library.cornell.edu/concern/publications/rn301137d?locale=en) that I found while exploring datasets in the [TidyTuesday GitHub](https://github.com/rfordatascience/tidytuesday/tree/master/data).

Usually in a notebook we'd load data that we've downloaded to our local machine, or pull it from an API. Then we'd clean it and do some analysis. This dataset is already clean and summarized, which keeps things simple.

In [90]:
# bee colony losses from the USDA via TidyTuesday challenge

bees = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/colony.csv")

Handy tool: [Bee colony loss data dictionary](https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-01-11/readme.md)

In [91]:
bees.head()

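|   | year | months | state | colony_n | colony_max | colony_lost | colony_lost_pct | colony_added | colony_reno | colony_reno_pct |
|---|------|--------|-------|----------|------------|-------------|-----------------|--------------|-------------|-----------------|
| 0 | 2015 | January-March | Alabama | 7000.0 | 7000.0 | 1800.0 | 26.0 | 2800.0 | 250.0 | 4.0 |
| 1 | 2015 | January-March | Arizona | 35000.0 | 35000.0 | 4600.0 | 13.0 | 3400.0 | 2100.0 | 6.0 |
| 2 | 2015 | January-March | Arkansas | 13000.0 | 14000.0 | 1500.0 | 11.0 | 1200.0 | 90.0 | 1.0 |
| 3 | 2015 | January-March | California | 1440000.0 | 1690000.0 | 255000.0 | 15.0 | 250000.0 | 124000.0 | 7.0 |
| 4 | 2015 | January-March | Colorado | 3500.0 | 12500.0 | 1500.0 | 12.0 | 200.0 | 140.0 | 1.0 |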


Let's start off with a line chart to picture trends in our data. We just want to see what we can expect the data to tell us, overall.

In [92]:
# group the original data by year and total the colonies lost,
# keeping only the year and colony_lost columns
grouped_bees = bees.groupby('year')['colony_lost'].sum().reset_index()

grouped_bees.head()

|   | year | colony_lost |
|---|------|-------------|
| 0 | 2015 | 3444720.0 |
| 1 | 2016 | 3316520.0 |
| 2 | 2017 | 2814400.0 |
| 3 | 2018 | 3034140.0 |
| 4 | 2019 | 2483820.0 |
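
One thing to watch for: the data has rows where `state` is `United States` (the national total), so summing every row by year likely counts each loss twice. Here is a minimal sketch of filtering those rows out before grouping. It uses a toy DataFrame with made-up numbers, not the real USDA file, and the names `toy`, `states_only`, and `yearly` are just for illustration.

```python
import pandas as pd

# toy stand-in for the bees data (made-up numbers, not from the USDA file)
toy = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016, 2016],
    "state": ["Alabama", "Arizona", "United States"] * 2,
    "colony_lost": [100.0, 200.0, 300.0, 150.0, 250.0, 400.0],
})

# keep only the individual states so the national total is not counted twice
states_only = toy[toy["state"] != "United States"]

# total the colonies lost per year
yearly = states_only.groupby("year", as_index=False)["colony_lost"].sum()
```

With the real data, the same filter would go on `bees` before the `groupby`. Whether that is the right call depends on how you want to treat any other aggregate rows in the file, so check the data dictionary first.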


In [93]:
alt.Chart(grouped_bees).mark_line(color='green').encode(
    alt.X('year:Q', title='Year'),
    alt.Y('colony_lost:Q', title='Number of colonies lost')
)

Let's bring back the original dataset.

In [94]:
bees.head()

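|   | year | months | state | colony_n | colony_max | colony_lost | colony_lost_pct | colony_added | colony_reno | colony_reno_pct |
|---|------|--------|-------|----------|------------|-------------|-----------------|--------------|-------------|-----------------|
| 0 | 2015 | January-March | Alabama | 7000.0 | 7000.0 | 1800.0 | 26.0 | 2800.0 | 250.0 | 4.0 |
| 1 | 2015 | January-March | Arizona | 35000.0 | 35000.0 | 4600.0 | 13.0 | 3400.0 | 2100.0 | 6.0 |
| 2 | 2015 | January-March | Arkansas | 13000.0 | 14000.0 | 1500.0 | 11.0 | 1200.0 | 90.0 | 1.0 |
| 3 | 2015 | January-March | California | 1440000.0 | 1690000.0 | 255000.0 | 15.0 | 250000.0 | 124000.0 | 7.0 |
| 4 | 2015 | January-March | Colorado | 3500.0 | 12500.0 | 1500.0 | 12.0 | 200.0 | 140.0 | 1.0 |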


What if we group by state now, and plot the results as a bar chart?

In [95]:
# group original dataset by state and total the colonies lost,
# keeping only the state and colony_lost columns
state_bees = bees.groupby('state')['colony_lost'].sum().reset_index()

# drop the row holding the national total for the US
state_bees = state_bees[state_bees.state != 'United States']

state_bees.head()

|   | state | colony_lost |
|---|-------|-------------|
| 0 | Alabama | 32710.0 |
| 1 | Arizona | 132300.0 |
| 2 | Arkansas | 84300.0 |
| 3 | California | 3456000.0 |
| 4 | Colorado | 94080.0 |


In [96]:
alt.Chart(state_bees).mark_bar(color='gold').encode(
    alt.Y('state:O', title='State'),
    alt.X('colony_lost:Q', title='Number of colonies lost')
)

# tip: add sort='-x' to alt.Y(...) to sort the bars by colonies lost, largest first