## Part 1: Altair interactive plots

Gapminder is a non-profit organization that promotes global sustainable development and seeks to bridge the gap between misconceptions and data-driven understanding. We are going to explore a small subset of its data, which records the average income, health score, and population of each country in the world, along with the region each country belongs to.


In [1]:
import altair as alt
import pandas as pd

gapminder = pd.read_csv('gapminder-health-income.csv')
gapminder.head()

| Unnamed: 0 | country | income | health | population | region |
|---|---|---|---|---|---|
| 0 | Afghanistan | 1925 | 57.63 | 32526562 | south_asia |
| 1 | Albania | 10620 | 76.0 | 2896679 | europe_central_asia |
| 2 | Algeria | 13434 | 76.5 | 39666519 | middle_east_north_africa |
| 3 | Andorra | 46577 | 84.1 | 70473 | europe_central_asia |
| 4 | Angola | 7615 | 61.0 | 25021974 | sub_saharan_africa |


### Part 1.1 Add selection (5 points)

Make a scatter plot showing the relationship between average personal income and average health score. Add a tooltip that shows the country name and population. Also, allow the user to select a single country to highlight, while all the other countries turn light grey.

In [2]:
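# Point selection: clicking a circle selects that country.
selection = alt.selection_point()

alt.Chart(gapminder).mark_circle().encode(
    x = 'income:Q',
    y = 'health:Q',
    # Literal colors must be wrapped in alt.value(); a bare string is read as a field name.
    color = alt.condition(selection, alt.value('blue'), alt.value('lightgrey')),
    opacity = alt.condition(selection, alt.value(0.5), alt.value(0.1)),
    tooltip = ['country', 'population']
).add_params(
    selection
)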



### Part 1.2 Customize the color (10 points)

Now choose a customized color map for the plot from the previous question. Explain how you chose the color map and apply it to the plot.


In [3]:
selection = alt.selection_point()

alt.Chart(gapminder).mark_circle().encode(
    x = 'income:Q',
    y = 'health:Q',
    # Selected points are colored by region; everything else turns light grey.
    color = alt.condition(selection, 'region:N', alt.value('lightgrey')),
    opacity = alt.condition(selection, alt.value(0.5), alt.value(0.1)),
    tooltip = ['country', 'population']
).add_params(
    selection
)

I mapped color to the region each country belongs to. Country was my first thought, but there are too many countries for a nominal color scale to stay readable. Region works well: every country belongs to one of six regions, so the plot splits into six easily distinguished groups. It can also reveal trends that go beyond individual countries and may relate to their geographical location.

### Part 1.3 Select across multiple panels (5 points)

Now add an interval selection so that the user can select any income range, and generate a second plot that shows the relationship between income and population for the countries in the selected range.

In [4]:
# Interval selection restricted to the x channel, so the brush picks an income range.
selection = alt.selection_interval(encodings = ['x'])

chart1 = alt.Chart(gapminder).mark_circle().encode(
    x = 'income:Q',
    y = 'health:Q',
    tooltip = ['country', 'population'],
    color = alt.condition(selection, 'region:N', alt.value('grey')),
    opacity = alt.condition(selection, alt.value(0.5), alt.value(0.1))
).add_params(
    selection
)

# Second panel: income vs. population for the countries inside the brushed range.
chart2 = alt.Chart(gapminder).mark_circle().encode(
    x = 'income:Q',
    y = 'population:Q',
    color = 'region:N'
).transform_filter(
    selection
)

chart1 & chart2

### Part 1.4 Data binding (10 points)

Instead of using the legend, now include radio buttons for the region so that each choice highlights only one region and makes the other points grey.

In [5]:
options = [None, 'america', 'east_asia_pacific', 'europe_central_asia', 'middle_east_north_africa', 'south_asia', 'sub_saharan_africa']
labels = ['All', 'America', 'East Asia Pacific', 'Europe Central Asia', 'Middle East North Africa', 'South Asia', 'Sub Saharan Africa']

# Radio buttons, one per region; the None option is labelled 'All'.
input_radio = alt.binding_radio(options = options,
                                labels = labels,
                                name = 'Region: ')

selection = alt.selection_point(fields = ['region'], bind = input_radio)

alt.Chart(gapminder).mark_circle().encode(
    x = 'income:Q',
    y = 'health:Q',
    color = alt.condition(selection, alt.Color('region:N').legend(None), alt.value('lightgrey')),
    tooltip = ['country', 'population'],
    opacity = alt.condition(selection, alt.value(0.5), alt.value(0.1))
).add_params(
    selection
)

### Part 1.5 Add filter with bars (10 points)

Add a slider so that, for a given slider value, the chart shows only the countries whose population is less than that value.

In [6]:
max_population = 380000000

slider = alt.binding_range(min=50000, max=max_population, step=10000, name='Population')

# Value parameter bound to the slider; it starts at the slider's maximum
# so the chart is not blank on first render.
cutoff = alt.param(value=max_population, bind=slider)

alt.Chart(gapminder).transform_filter(
    alt.datum.population < cutoff
).mark_circle().encode(
    x = 'income:Q',
    y = 'health:Q',
    color = 'region:N',
    tooltip = ['country', 'population']
).add_params(
    cutoff
)



## Part 2: D3 basic plots

In this part, we provide a CSV file with the penguin data. You need to make two plots by filling in the given template. When you submit the homework, please submit both the .html and .js files.

### Part 2.1 Scatter plot with groups (30 points)

Using the penguin data, make a scatter plot to show the relationship between flipper length and bill length. Use a different color for each species and add a legend.

Here is a general approach:

1. In the template, we provide a way to read the CSV file. Once the data is read into D3, all the inputs are treated as strings. Therefore, the first thing we need to do is convert the data to a numeric type. The code for this is also provided.
2. Define the dimensions and margins for the SVG and create the SVG canvas. (5 points)
3. Set up scales for the x and y axes. Set the domains of the x and y scales to the ranges of bill length and flipper length, extended by 5 on each side. One example of the .min function is provided. The color scale is also provided. (5 points)
4. Add the scales to the plot as axes. (5 points)
5. Add a circle for each data point. (5 points)
6. Add x-axis and y-axis labels. (5 points)
7. Add a legend. The legend has two parts: the circle and the text. First, set up a layout for the legend, and then add the circles and text to it. (5 points)
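
The conversion and padding steps above can be tried out without D3. The following is a minimal sketch in plain JavaScript. The column names `bill_length_mm` and `flipper_length_mm`, the helper `paddedExtent`, and the three sample rows are assumptions made for illustration; the template's actual names and data may differ.

```javascript
// Hypothetical rows as they would arrive from a CSV parser: every value is a string.
const rows = [
  { species: "Adelie", bill_length_mm: "39.1", flipper_length_mm: "181" },
  { species: "Gentoo", bill_length_mm: "46.1", flipper_length_mm: "211" },
  { species: "Chinstrap", bill_length_mm: "46.5", flipper_length_mm: "192" },
];

// Convert the measurement columns to numbers (unary plus parses a numeric string).
const data = rows.map(d => ({
  species: d.species,
  bill: +d.bill_length_mm,
  flipper: +d.flipper_length_mm,
}));

// Hypothetical helper: return [min - pad, max + pad] for one numeric field.
function paddedExtent(items, accessor, pad) {
  const values = items.map(accessor);
  return [Math.min(...values) - pad, Math.max(...values) + pad];
}

// Extend each range by 5 on both sides, as the scale step asks.
const billDomain = paddedExtent(data, d => d.bill, 5);
const flipperDomain = paddedExtent(data, d => d.flipper, 5);

console.log(billDomain, flipperDomain);
```

Each resulting two-element array could then be passed to the `.domain(...)` of a `d3.scaleLinear()` scale.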

### Part 2.2 Side-by-side boxplot (30 points)

Using the penguin data, make a side-by-side boxplot to show the distribution of flipper length across the three species. To make things easier, we can ignore the outliers for now.

Here is a general approach:

1. First convert the strings into numeric data as we did in the previous question. Set up the SVG canvas and the scales, add the scales to the canvas, and add labels for the scales. (5 points)
2. In order to make a boxplot, we need to calculate some basic metrics for the data. For each species, we need to calculate q1, the median, and q3. We first define a function called `rollupFunction` to list all the values we need to calculate. Follow the example for q1 to set up the median and q3, or any other values you need. (5 points)
3. Add comments (in the .js file) to the following lines to explain what the code is doing. (5 points)
    
    ```js
    const quartilesBySpecies = d3.rollup(data, rollupFunction, d => d.species);

    quartilesBySpecies.forEach((quartiles, species) => {
        const x = xScale(species);
        const boxWidth = xScale.bandwidth();
    ```
4. Inside the `.forEach` function, draw the boxes. There are three things you need to draw for the box plot:
    - The vertical line in the middle, from q1 - 1.5 * IQR to q3 + 1.5 * IQR (5 points)
    - The rectangle from q1 to q3. You can fill it with a color to hide the vertical line behind it. (5 points)
    - The horizontal line for the median (5 points)
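
The quantities needed for each box can be checked on a small example before drawing anything. The following is a minimal sketch in plain JavaScript, without D3. The helpers `quantileSorted` and `boxStats`, the field name `flipper`, and the sample values are all assumptions made for illustration, and the linear-interpolation quantile used here is one common definition that may not match the template's exactly.

```javascript
// Linear-interpolation quantile of an ascending-sorted array (p between 0 and 1).
function quantileSorted(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Summary statistics needed to draw one box, including the ends of the vertical line.
function boxStats(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantileSorted(sorted, 0.25);
  const median = quantileSorted(sorted, 0.5);
  const q3 = quantileSorted(sorted, 0.75);
  const iqr = q3 - q1;
  return { q1, median, q3, iqr, lower: q1 - 1.5 * iqr, upper: q3 + 1.5 * iqr };
}

// Hypothetical flipper lengths for two species.
const sample = [
  { species: "Adelie", flipper: 188 }, { species: "Adelie", flipper: 180 },
  { species: "Adelie", flipper: 196 }, { species: "Adelie", flipper: 184 },
  { species: "Adelie", flipper: 192 },
  { species: "Gentoo", flipper: 218 }, { species: "Gentoo", flipper: 210 },
  { species: "Gentoo", flipper: 226 }, { species: "Gentoo", flipper: 214 },
  { species: "Gentoo", flipper: 222 },
];

// Group the values by species, then reduce each group to its statistics.
const groups = new Map();
for (const d of sample) {
  if (!groups.has(d.species)) groups.set(d.species, []);
  groups.get(d.species).push(d.flipper);
}
const statsBySpecies = new Map(
  [...groups].map(([species, values]) => [species, boxStats(values)])
);

statsBySpecies.forEach((stats, species) => console.log(species, stats));
```

The result is a `Map` keyed by species, so it can be iterated with `.forEach((stats, species) => ...)` in the same way as the grouped result in step 3.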


## Submission

Once you finish all the questions, submit the Jupyter notebook file for the Altair part, as well as the .html and .js files for the D3 part, to Gradescope.