In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# special imports for computing mds and dendrograms
from representations import mds, plot_dendrogram

---

## Part A (1 point)

Louis Reasoner thinks that Alyssa's representations of the kinship data using MDS and the dendrogram are pretty cool. He decides to try modeling some different data that he has: similarity judgments of colors. Loading in the data, Louis sees that it has three keys:

In [None]:
color_data = np.load("data/color.npz")
list(color_data.keys())

The array `rgba` corresponds to RGBA (red-green-blue-alpha) color values:

In [None]:
rgba = color_data['rgba']
rgba

The array `wavelengths` contains the wavelengths, in nanometers, corresponding to the RGBA values:

In [None]:
wavelengths = color_data['wavelengths']
wavelengths

And finally, the array `similarities` contains the similarity judgments:

In [None]:
similarities = color_data['similarities']
similarities
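Before modeling the data, it can help to sanity-check the arrays. The sketch below is not part of the assignment: it builds a small made-up archive in memory (the `toy_*` values are invented for illustration and are not the contents of `data/color.npz`) and checks the properties you would typically expect of a similarity matrix, namely that it is square, symmetric, and has one row per color.

```python
import io
import numpy as np

# made-up stand-ins for the real arrays (assumption: three colors only)
toy_rgba = np.array([[0.0, 0.0, 1.0, 1.0],
                     [0.0, 1.0, 0.0, 1.0],
                     [1.0, 0.0, 0.0, 1.0]])
toy_wavelengths = np.array([450, 550, 650])
toy_similarities = np.array([[1.0, 0.4, 0.1],
                             [0.4, 1.0, 0.3],
                             [0.1, 0.3, 1.0]])

# write the arrays to an in-memory .npz archive, then read them back
buffer = io.BytesIO()
np.savez(buffer, rgba=toy_rgba, wavelengths=toy_wavelengths,
         similarities=toy_similarities)
buffer.seek(0)
toy_data = np.load(buffer)
toy_keys = sorted(toy_data.keys())

# basic checks on the similarity matrix
loaded = toy_data["similarities"]
n_colors = len(toy_data["wavelengths"])
is_square = loaded.shape == (n_colors, n_colors)
is_symmetric = np.allclose(loaded, loaded.T)
```

The same two checks can be run on the real `similarities` array; whether the real data passes them is something to verify rather than assume.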

<div class="alert alert-success">Now that you know how MDS and hierarchical clustering work, try to think about how you would expect them to behave on similarity judgments of color. Do you think MDS would be better, or the dendrogram? Justify your answer, and make sure to describe what you think the better representation would look like.</div>

YOUR ANSWER HERE

---

## Part B (1 point)

Louis goes ahead and uses the `mds` function to compute the 2D spatial representation of the data:

In [None]:
color_points = mds(similarities)
color_points

<div class="alert alert-success">However, Louis wasn't paying very close attention to how Alyssa used the MDS algorithm, and can't remember how to plot the data. Help Louis visualize the results of running MDS on the color data by plotting the 2D points `color_points`. Label the points with the wavelengths they correspond to using the `axis.text` function (like what you did for the music data in the previous problem). The points should be colored according to the color they correspond to, and should be set to size 100 so they are large enough to see easily. Also, don't forget to give your plot a title and scale your axes equally! Your solution can be done in 4 lines of code.</div>

_Important Note:_ You should use `axis.scatter` to create your plot, and you should only call it <em>once</em>. However, you can call `axis.text` multiple times. Do not use any other plotting functions.
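As a reminder of the general pattern (one `scatter` call with per-point colors, then one `text` call per label), here is a minimal sketch on made-up data. The `toy_*` names and values are hypothetical and chosen so they do not overwrite `axis`, `color_points`, or `rgba`; this is an illustration of the matplotlib calls, not the solution to this part.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up 2D points, RGBA colors, and labels (assumptions for illustration)
toy_points = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]])
toy_rgba = np.array([[0.0, 0.0, 1.0, 1.0],
                     [0.0, 1.0, 0.0, 1.0],
                     [1.0, 0.0, 0.0, 1.0]])
toy_labels = [450, 550, 650]

toy_fig, toy_axis = plt.subplots()

# a single scatter call: `c` takes one RGBA row per point, `s` sets marker size
toy_axis.scatter(toy_points[:, 0], toy_points[:, 1], c=toy_rgba, s=100)

# one text call per point
for (x, y), label in zip(toy_points, toy_labels):
    toy_axis.text(x, y, str(label))

toy_axis.set_title("Toy scatter plot with per-point colors")
toy_axis.set_aspect("equal")  # one unit on x spans the same length as one unit on y
```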

In [None]:
# get a handle to an axis object, then close the plot
axis = plt.gca()
plt.close()

# look up documentation on axis.scatter
axis.scatter?

In [None]:
# create the figure
fig, axis = plt.subplots()

# YOUR CODE HERE
raise NotImplementedError()

<div class="alert alert-info">Note: If the colors you're seeing after plotting the MDS results don't seem quite right, don't despair! Read on for a hint on why this might be the case.</div>

In [None]:
"""Check that the color MDS data was correctly plotted."""
from numpy.testing import assert_array_equal
from nose.tools import assert_equal, assert_not_equal
from plotchecker import ScatterPlotChecker

# check that data hasn't changed
cd = np.load("data/color.npz")
assert_array_equal(rgba, cd['rgba'])
assert_array_equal(wavelengths, cd['wavelengths'])
assert_array_equal(similarities, cd['similarities'])
assert_array_equal(color_points, mds(similarities))

# check the correct data was plotted
pc = ScatterPlotChecker(axis)
pc.assert_x_data_equal(color_points[:, 0])
pc.assert_y_data_equal(color_points[:, 1])

# check that the correct size and colors were used
pc.assert_colors_equal(rgba)
pc.assert_sizes_equal(100)

# check that the wavelength labels are correct
pc.assert_textlabels_equal([str(x) for x in wavelengths])
pc.assert_textpoints_equal(color_points)

# check that a title was included
pc.assert_title_exists()

# check that dimensions are not distorted
# (newer matplotlib versions report an equal aspect as 1.0 rather than 'equal')
assert axis.get_aspect() in ('equal', 1.0)

print("Success!")

---

## Part C (0.5 points)

<div class="alert alert-success">After getting your help to write his code, Louis looks at the resulting plot, and is dismayed to see that the MDS representation of colors doesn't seem to make any sense. Louis goes to Alyssa for help. She takes a look at his code, and tells him that he ran the MDS algorithm on the wrong data. What is wrong with Louis' solution?</div>

YOUR ANSWER HERE

<div class="alert alert-success">Once you have figured out what is wrong with Louis' solution, copy and paste your plotting code from above into the following cell. Add a new line that re-runs the MDS algorithm, and plots those new points instead of the old points. Store the new points into a variable called `new_color_points`, and make sure you replace all instances of `color_points` with `new_color_points`. Your solution can be done in 5 lines of code.</div>

_Important Note:_ You should use `axis.scatter` to create your plot, and you should only call it <em>once</em>. However, you can call `axis.text` multiple times. Do not use any other plotting functions.

In [None]:
# create the figure
fig, axis = plt.subplots()

# YOUR CODE HERE
raise NotImplementedError()

After fixing the plot, does it more closely match your intuitions from Part A? (You do not need to write a response to this question, but you should think about the answer.)
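To build intuition for why the input to MDS matters, here is a minimal sketch of classical (Torgerson) MDS in NumPy. This is an assumption-laden illustration: the `mds` function imported from `representations` may use a different algorithm, and `classical_mds` and the toy data are hypothetical names introduced here. The point is that the method treats its input as distances, so items with small entries end up close together.

```python
import numpy as np

def classical_mds(dissimilarities, n_dims=2):
    """Embed items in n_dims dimensions from a matrix of pairwise distances."""
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    # double-center the squared distances to get a Gram (inner product) matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # keep the eigenvectors with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# toy data: the four corners of a unit square
true_points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
toy_dissimilarities = pairwise_distances(true_points)

embedded = classical_mds(toy_dissimilarities)
recovered = pairwise_distances(embedded)
```

The embedded points may be rotated or reflected relative to `true_points`, but the pairwise distances in `recovered` match `toy_dissimilarities` up to floating-point error.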

In [None]:
"""Check that the color MDS data was correctly plotted."""
from numpy.testing import assert_array_equal, assert_allclose
from nose.tools import assert_equal, assert_not_equal
from plotchecker import ScatterPlotChecker

# check that data hasn't changed
cd = np.load("data/color.npz")
assert_array_equal(rgba, cd['rgba'])
assert_array_equal(wavelengths, cd['wavelengths'])
assert_array_equal(similarities, cd['similarities'])
assert_equal(new_color_points.shape, (14, 2))
assert_allclose(new_color_points[0], np.array([-0.42851154, -0.17449531]))
assert_allclose(new_color_points[-1], np.array([-0.26443264,  0.48598972]))
# assert_allclose(new_color_points[0], np.array([-0.39802631, -0.18197825]))
# assert_allclose(new_color_points[-1], np.array([-0.29100923,  0.49663493]))

# check the correct data was plotted
pc = ScatterPlotChecker(axis)
pc.assert_x_data_equal(new_color_points[:, 0])
pc.assert_y_data_equal(new_color_points[:, 1])

# check that the correct size and colors were used
pc.assert_colors_equal(rgba)
pc.assert_sizes_equal(100)

# check that the wavelength labels are correct
pc.assert_textlabels_equal([str(x) for x in wavelengths])
pc.assert_textpoints_equal(new_color_points)

# check that a title was included
pc.assert_title_exists()

# check that dimensions are not distorted
# (newer matplotlib versions report an equal aspect as 1.0 rather than 'equal')
assert axis.get_aspect() in ('equal', 1.0)

print("Success!")

---

## Part D (0.5 points)

<div class="alert alert-success">Louis also wants to try looking at the data with a dendrogram, but again wasn't paying very close attention to what Alyssa did earlier. Help him produce the correct dendrogram plot. Use the same `plot_dendrogram` function that you used in the previous problem, and don't forget to set a title. You should use the wavelengths as your x-axis tick labels. Note that for this exercise, you should use the optional `colors` keyword argument to pass the RGBA values to the dendrogram function (this is helpful for visualization). The test cell below also checks a variable named `dissimilarities`, so make sure it is defined. Your answer can be done in 2 lines of code.</div>

In [None]:
# create the figure
fig, axis = plt.subplots()

# YOUR CODE HERE
raise NotImplementedError()

Is this a good representation for the color data? Again, you do not need to write a response to this question, but you should think about the answer.
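If you want to see what a dendrogram encodes without going through `plot_dendrogram`, the sketch below runs average-linkage hierarchical clustering on made-up data using SciPy. This is only an assumption about how such a tree could be built; `plot_dendrogram` from `representations` may work differently, and the `toy_*` values are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# made-up similarities for four colors (assumption: two tight pairs)
toy_labels = [450, 460, 640, 650]
toy_similarities = np.array([[1.0, 0.9, 0.1, 0.2],
                             [0.9, 1.0, 0.2, 0.1],
                             [0.1, 0.2, 1.0, 0.8],
                             [0.2, 0.1, 0.8, 1.0]])
toy_dissimilarities = 1 - toy_similarities

# linkage expects the condensed (upper-triangle) form of the distance matrix
condensed = squareform(toy_dissimilarities)
merges = linkage(condensed, method="average")

# no_plot=True returns the tree layout without drawing anything
tree = dendrogram(merges, labels=toy_labels, no_plot=True)
leaf_order = tree["ivl"]
```

Each row of `merges` records one merge, in order of increasing distance. Note that the tree forces every item into nested groups, which is worth keeping in mind when you judge whether it suits the color data.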

In [None]:
"""Check that the dendrogram function was correctly used for the color data."""
from numpy.testing import assert_array_equal
from nose.tools import assert_equal, assert_not_equal
from plotchecker import PlotChecker

# check that data hasn't changed
cd = np.load("data/color.npz")
assert_array_equal(rgba, cd['rgba'])
assert_array_equal(wavelengths, cd['wavelengths'])
assert_array_equal(similarities, cd['similarities'])

# check that a title was included
pc = PlotChecker(axis)
pc.assert_title_exists()

# check that the labels are correct
labels = sorted([int(x.get_text()) for x in axis.get_xticklabels()])
assert_array_equal(labels, wavelengths, "color labels are incorrect")

# check that the dissimilarities are correct
assert_array_equal(dissimilarities, 1 - similarities, "color dissimilarities are incorrect")

print("Success!")