![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Choosing the right type of visualization for your data

Now that we have learned about a few different kinds of visualizations, it is time to apply what we have learned to a dataset. 

We will use a few Python commands to work with the pets dataset. This dataset describes 31 pets: each pet’s name, species, age, gender, and the time it took to be adopted, among other information. 

Let’s take a look at the first few rows.

Run the cell below to get the data.

In [None]:
# load the "pandas" library under the short name "pd"
import pandas as pd
# CSV file of data on hypothetical pets for adoption, from https://www.bootstrapworld.org/materials/data-science/
url = "https://tinyurl.com/y9l6axtz/pets.csv"
# read the CSV file from the URL and store it as a DataFrame
pets = pd.read_csv(url)
# display the first five rows
pets.head()
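
Before grouping or plotting, it can help to confirm what was loaded. The sketch below is only an illustration: it reads a tiny made-up table (three hypothetical pets, not the real dataset) from an in-memory string, so it runs without a network connection. The column names are assumed to match the ones used later in this notebook.

```python
import io
import pandas as pd

# Hypothetical sample rows, NOT the real pets data
csv_text = """Name,Species,Gender,Age (years),Time to Adoption (weeks)
Rex,dog,male,3,5
Mia,cat,female,1,2
Bo,dog,male,7,9
"""

# read_csv accepts a file-like object as well as a URL
sample = pd.read_csv(io.StringIO(csv_text))

# shape is (number of rows, number of columns)
print(sample.shape)
# column names, useful for spelling them correctly in later cells
print(list(sample.columns))
```

Running `pets.shape` and `pets.columns` on the real DataFrame gives the same kind of summary; if the download worked, the row count should agree with the 31 pets mentioned above.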

Let’s manipulate the DataFrame to get counts for specific fields. We will use the pandas method `groupby`, which lets us split the rows into groups and count how many pets fall into each one. We will work with three different categories:

1. Gender
2. Species
3. Age (in years)

Run the code below to group by each category.

In [None]:
# Group by different Categories: Gender, Species, Age (years)
gender = pets.groupby("Gender").size().reset_index(name="Count")
species = pets.groupby("Species").size().reset_index(name="Count")
age = pets.groupby("Age (years)").size().reset_index(name="Count")
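
To see what each step of that chain does, here is a minimal sketch on a made-up table. The four pets and the names `sample`, `group_sizes`, and `species_counts` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical rows, NOT the real pets data
sample = pd.DataFrame({
    "Name": ["Rex", "Mia", "Bo", "Kiwi"],
    "Species": ["dog", "cat", "dog", "bird"],
})

# groupby collects rows with the same Species; size() counts the rows in each group
group_sizes = sample.groupby("Species").size()

# reset_index turns the result back into a DataFrame
# with two columns: Species and Count
species_counts = group_sizes.reset_index(name="Count")
print(species_counts)
```

The result has one row per species, which is the shape that the plotting functions used later expect.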

Once that is done, let’s see what the groupings look like. Run the cells below to see each table. 

In [None]:
gender

In [None]:
species

In [None]:
age

We can then visualize the data in multiple ways using Plotly Express.

In [None]:
import plotly.express as px

In [None]:
# Display the data in multiple ways

# Visualizing the Species table
fig1 = px.scatter(species,x="Species", y="Count",title='Species Scatter plot')
fig1.show()

fig2 = px.bar(species,x="Species",y="Count",title="Species Bar chart")
fig2.show()

fig3 = px.pie(species,values='Count', names='Species', title="Species Pie chart")
fig3.show()

### Exercise

Change the code below to visualize the `gender` and `age` data.

In [None]:
table_to_visualize = gender
x_value = "Gender"

fig1 = px.scatter(table_to_visualize,x=x_value, y="Count",title=str(x_value)+' Scatter plot')
fig1.show()

fig2 = px.bar(table_to_visualize,x=x_value,y="Count",title=str(x_value)+" Bar chart")
fig2.show()

fig3 = px.pie(table_to_visualize,values='Count', names=x_value, title=str(x_value)+" Pie chart")
fig3.show()

We see that some visualizations communicate the data better than others: they make it easier to see patterns and draw conclusions from the plot or chart. Note that sometimes there is no single correct answer. 


Let’s take a look at two examples, one representing the data more successfully than the other. Suppose you want to use a bar plot to see the relationship between age and how long it took for a pet to be adopted, and that you want to see this relationship for all species. The plot below is generated with the pets data we downloaded earlier. 

Run the code.

In [None]:
# Create bar plot
bar_pet = px.bar(pets,
                 y="Time to Adoption (weeks)",
                 x="Age (years)",
                 title="Age (in years) and Time to Adoption (weeks) for each pet")
bar_pet.show()

From the plot above it is not clear what the height of each bar represents or what the categories are. Let’s add the missing information by colouring the bars by species and labelling each one with the pet’s name. Run the code below. 

In [None]:
# Create coloured bar chart
bar_pet = px.bar(pets,
                 y="Time to Adoption (weeks)",
                 x="Age (years)",
                 title="Age (in years) and Time to Adoption (weeks) for each pet",
                 color="Species",
                 text="Name")

bar_pet.show()
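
One detail that helps when reading these bar charts: Plotly Express draws one rectangle per row, and by default rectangles that share the same x value are stacked. The total height at a given age is therefore the sum of the adoption times of all pets of that age. The sketch below checks this on a made-up table; the four pets and the name `bar_heights` are hypothetical.

```python
import pandas as pd

# Hypothetical rows, NOT the real pets data
sample = pd.DataFrame({
    "Name": ["Rex", "Mia", "Bo", "Kiwi"],
    "Age (years)": [2, 2, 5, 5],
    "Time to Adoption (weeks)": [3, 4, 6, 2],
})

# Sum the adoption times of all pets that share an age;
# this matches the total height of the stacked bar at that age
bar_heights = sample.groupby("Age (years)")["Time to Adoption (weeks)"].sum()
print(bar_heights)
```

Here the two 2-year-old pets contribute 3 and 4 weeks, so the stacked bar at age 2 would reach 7, even though neither pet waited that long.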

After adding the relevant information, it is easier to see what is going on. While the bar chart lets us see how long each pet took to be adopted, it is not the best plot for clearly displaying a relationship between two variables. 
Let’s try a scatter plot, and categorize by species.


In [None]:
# Create scatter plot
scatter_pet = px.scatter(pets,
                         y="Time to Adoption (weeks)",
                         x="Age (years)",
                         title="Age (in years) and Time to Adoption (weeks) for each pet",
                         color="Species",
                         size="Age (years)")

scatter_pet.show()

The plot above is a scatter plot showing the relationship between age (x axis) and how long it took for the pet to be adopted (y axis). This plot conveys the relationship more effectively, as we can see that younger pets tend to be adopted sooner than older ones. 
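
A trend read off a scatter plot can also be checked with a number. The sketch below is one possible check, not part of the original lesson: it computes the Pearson correlation between the two columns using `Series.corr`. The five data points are made up for illustration and are not taken from the pets dataset.

```python
import pandas as pd

# Hypothetical rows, NOT the real pets data
sample = pd.DataFrame({
    "Age (years)": [1, 2, 4, 6, 9],
    "Time to Adoption (weeks)": [2, 3, 3, 6, 8],
})

# Pearson correlation: near +1 means older pets tend to wait longer,
# near 0 means no linear pattern, and near -1 means the opposite trend
r = sample["Age (years)"].corr(sample["Time to Adoption (weeks)"])
print(round(r, 2))
```

The same call on the `pets` DataFrame would show how strong the linear trend is in the real data.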


The main takeaway is this: just because we can create a plot does not mean that it is a good plot. When we work with visual representations of data, it is good to keep in mind the question we are answering and whether the visualization gets across an idea that is relevant to that question. 

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)