## Introduction
In workspaces like this one, you will be able to practice visualization techniques you've seen in the course materials. In this particular Jupyter Notebook, you'll practice creating single-variable plots for categorical data.

The cells where you are expected to contribute are highlighted with **TO DO** markdown.

In [None]:
# prerequisite package imports
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

# `solutions_univ.py` is a Python file on the Notebook server that contains solutions to the TO DO tasks.
# The solution to each task is in a separate function in `solutions_univ.py`.
# Do not refer to the file until you have attempted to write the code yourself.
from solutions_univ import *

## About the Dataset
In this workspace, you'll be working with a dataset comprising attributes of almost 54,000 diamonds. Characteristics such as weight (`carat`), cut, color, and physical size (`x`, `y`, and `z` dimensions in mm) are present.

This is a classic dataset that can be found online, for example on Kaggle (https://www.kaggle.com/datasets/shivam2503/diamonds).

In [None]:
df = pd.read_csv('data/diamonds.csv')
df.shape

In [None]:
df.head(4)

## Exercise 1: Bar charts

Create a count plot showing the distribution of diamond `cut`. Pick a color (e.g., blue) and make sure all bars show that color. Order the bars from lowest quality to highest quality.
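
As a hint on ordering, here is a minimal sketch on made-up data (the `sizes` values and `size_order` list are hypothetical and not part of the diamonds dataset). It shows one way to tell pandas that a text column has a meaningful order, so that counts come out sorted by level rather than by frequency or alphabet.

```python
import pandas as pd

# Hypothetical toy data: shirt sizes with a natural order S < M < L.
sizes = pd.Series(["M", "S", "L", "M", "S", "M"])
size_order = ["S", "M", "L"]

# An ordered categorical dtype records the level order explicitly.
size_dtype = pd.CategoricalDtype(categories=size_order, ordered=True)
sizes_cat = sizes.astype(size_dtype)

# Count each level, then sort by the categorical order (not by count).
counts = sizes_cat.value_counts().sort_index()
print(counts)
```

Plotting functions can then follow that order. Seaborn's `countplot`, for instance, also accepts an `order` list and a single `color` argument.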

In [None]:
# YOUR CODE HERE

### Expected Output: Exercise #1

Once you've created your chart, run the cell below to check the output from our solution. **Your visualization does not need to be exactly the same as ours, but it should support the same conclusions.**

In [None]:
bar_chart_solution_1()

## Exercise 2: Bar charts

Visualize a _relative frequency plot_ of diamond `color`, and plot the results using either `matplotlib` or `seaborn`. The order should be from highest quality color grade to lowest (diamond color is graded on an alphabetical scale, with `D` the highest grade and `J` the lowest). Make sure the bar plot is plotted using a horizontal orientation.
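
A relative frequency is a count divided by the total number of observations. The sketch below illustrates the arithmetic on made-up data (the `grades` series and `grade_order` list are hypothetical), using `value_counts(normalize=True)` as one possible route.

```python
import pandas as pd

# Hypothetical toy data: six letter grades.
grades = pd.Series(["A", "B", "B", "C", "C", "C"])
grade_order = ["A", "B", "C"]

# normalize=True returns proportions (count / total) instead of raw counts.
rel_freq = grades.value_counts(normalize=True)

# Reindex to put the levels in the desired display order.
rel_freq = rel_freq.reindex(grade_order)
print(rel_freq)
```

The resulting Series can be passed to a horizontal bar function such as `plt.barh(rel_freq.index, rel_freq.values)`. Note that matplotlib draws the first bar at the bottom of a horizontal chart, so the order may need reversing to read top to bottom.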

In [None]:
# YOUR CODE HERE

### Expected Output: Exercise #2

In [None]:
bar_chart_solution_2()

## Exercise 3: Histograms

Using matplotlib, plot the distribution of the diamond `x` size. Use `0.25` mm bins, from 0 to 11 mm. Feel free to experiment and see if you can find a better bin size!
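
Fixed-width bins are usually specified as an array of bin edges. The sketch below builds such edges with `np.arange` for a made-up width of 0.5 over the range 0 to 3 (both numbers and the `values` array are hypothetical, chosen only for illustration). Because `np.arange` excludes its stop value, one bin width is added to the upper limit so the last edge is included.

```python
import numpy as np

# Hypothetical settings: bins of width 0.5 covering 0 to 3.
bin_width = 0.5
upper = 3.0

# np.arange stops before its end value, so extend it by one bin width.
bin_edges = np.arange(0, upper + bin_width, bin_width)

# Hypothetical toy measurements to count into those bins.
values = np.array([0.2, 0.7, 0.8, 1.4, 2.9, 3.0])
counts, _ = np.histogram(values, bins=bin_edges)

print(bin_edges)
print(counts)
```

The same edge array can be passed as the `bins` argument of `plt.hist`.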

In [None]:
# YOUR CODE HERE

### Expected Output: Exercise #3

In [None]:
histogram_solution_1()

## Exercise 4: Histograms

Now, using seaborn, plot the diamond `x`, `y`, and `z` sizes. Use the same binning as before.

### Tip: move from "wide" to "long" format
Seaborn was built to work with data in long, "tidy" format, so it is often easier to convert data into long form first. The pandas function `melt` does this conversion. For this exercise, you can use either format, but if you convert to long format, you only have to make one call to seaborn instead of three.
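
To see what the conversion does, here is a small sketch on a made-up two-row table (the `wide` DataFrame is hypothetical). Each selected column is stacked into a pair of columns: `variable` holds the old column name and `value` holds the measurement.

```python
import pandas as pd

# Hypothetical wide-format table: one row per item, one column per measurement.
wide = pd.DataFrame({"x": [1.0, 2.0], "y": [1.5, 2.5], "z": [0.5, 1.0]})

# melt stacks the listed columns into "variable" / "value" pairs.
long = pd.melt(wide, value_vars=["x", "y", "z"])

print(long)
```

The two-row, three-column table becomes six rows with two columns, which is the shape that lets a single plotting call distinguish the measurements by the `variable` column.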

In [None]:
# convert the data to long ("tidy") format
df_tidy = pd.melt(df, value_vars=['x', 'y', 'z'])

In [None]:
# YOUR CODE HERE

### Expected Output: Exercise #4

In [None]:
histogram_solution_2_long_format()

In [None]:
histogram_solution_2_wide_format()

## Exercise 5: Histogram normalization

Using seaborn, plot the diamond price distribution. Using approximately 50 bins, what rough percentage of diamonds cost `~$2500`?
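
Normalizing a histogram means rescaling the bar heights, for example so that they sum to 100 percent. The sketch below shows that arithmetic with `np.histogram` instead of seaborn, on made-up values (the `amounts` array and the bin edges are hypothetical), purely to make the calculation visible.

```python
import numpy as np

# Hypothetical toy data and bin edges.
amounts = np.array([500, 700, 900, 1200, 2500, 2600, 4000, 9000])
edges = [0, 2500, 5000, 7500, 10000]

# Raw counts per bin.
counts, _ = np.histogram(amounts, bins=edges)

# Convert counts to percentages of all observations.
percent = counts / counts.sum() * 100

print(counts)
print(percent)
```

Recent seaborn versions expose a similar rescaling through the `stat` argument of `sns.histplot` (for example `stat="percent"`), so the percentage can be read directly off the y-axis.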

In [None]:
# YOUR CODE HERE

### Expected Output: Exercise #5

In [None]:
histogram_solution_3()

If you're interested in seeing the code used to generate the solution plots, you can find it in the `solutions_univ.py` script in the workspace folder. You can navigate there by clicking on the Jupyter icon in the upper left corner of the workspace. Spoiler warning: the script contains solutions for all of the workspace exercises in this lesson, so take care not to spoil your practice!