In [2]:
from dstaster import *

Run the cell below to load our dataset and to compute each painting's ratio.

In [3]:
# Load data
collection = pd.read_csv("../tate/paintings.csv", index_col=0)
collection['ratio'] =  collection['height'] / collection['width']
collection

Unnamed: 0,artist,title,year,groundtruth,height,width,ratio
T13896,John Constable,Salisbury Cathedral from the Meadows,1831,L,1537,1920,0.800521
T05010,Pablo Picasso,Weeping Woman,1937,O,608,500,1.216000
N05915,Pablo Picasso,Bust of a Woman,1909,P,727,600,1.211667
N00530,Joseph Mallord William Turner,Snow Storm - Steam-Boat off a Harbour’s Mouth,1842,L,914,1219,0.749795
T00598,Richard Dadd,The Fairy Feller’s Master-Stroke,1855,O,540,394,1.370558
...,...,...,...,...,...,...,...
N05609,Maurice Sterne,Mexican Church Interior,1934,O,1283,1022,1.255382
T14823,Unknown artist,Leon Trotsky,1980,P,510,480,1.062500
AL00397,Louise Bourgeois,Untitled,1946,O,660,1116,0.591398
T14824,Unknown artist,Leon Trotsky,1980,P,638,511,1.248532


Recall that our <b>ratio model</b> works by comparing a painting's height to its width. If the ratio is larger than one, the painting is taller than it is wide and we suspect it might be a portrait. If the ratio is smaller than one, the painting is wider than it is tall and we suspect it might be a landscape.

To avoid mislabelling paintings whose ratio is very close to one, we decided on two <b>decision boundaries</b>. If the ratio is above the upper threshold (1.2), we label the painting as portrait (P). If the ratio is below the lower threshold (0.8), we label it as landscape (L). We label all paintings with ratios between 0.8 and 1.2 as other (O). As we saw, these thresholds work reasonably well, but we may be able to find better ones.
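The labelling rule can be sketched in a few lines. This is a minimal illustration, not the course's own implementation: `label_by_ratio` and `sample` are hypothetical names, the three rows are copied from the table above, and the inclusive comparisons (`>=`, `<=`) are an assumption chosen to match the plotting cell further down in this notebook.

```python
import pandas as pd

# Three rows copied from the table above (height and width in mm).
sample = pd.DataFrame(
    {"height": [1537, 608, 914], "width": [1920, 500, 1219]},
    index=["T13896", "T05010", "N00530"],
)
sample["ratio"] = sample["height"] / sample["width"]


def label_by_ratio(ratio, lower=0.8, upper=1.2):
    """Hypothetical helper: map one height/width ratio to 'P', 'L' or 'O'."""
    if ratio >= upper:
        return "P"  # clearly taller than wide: portrait
    if ratio <= lower:
        return "L"  # clearly wider than tall: landscape
    return "O"      # too close to square to decide: other


sample["ratio_model"] = sample["ratio"].apply(label_by_ratio)
print(sample)
```

Note that the Constable painting (T13896) has a ratio of about 0.8005, so it lands just inside the "other" band even though its ground truth is L. Cases like this are why the exact position of the thresholds matters.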


<div class="task">
    <div class="no">1</div>
    <div class="text">
        The cell below lets you choose the upper and lower decision boundaries and outputs a plot on the left and the corresponding confusion matrix on the right. What are, in your view, the best thresholds? Go back to FutureLearn and discuss with your peers. You can save the plot by right-clicking on it and share it with them.
    </div>
</div>
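If you would like to compare threshold pairs numerically as well as visually, the sketch below may help. It is an illustration under stated assumptions rather than the notebook's own method: `evaluate` and `rows` are hypothetical names, the five rows are copied from the top of the table above, and the confusion matrix is built with `pd.crosstab` instead of the course's `plot_confusion_matrix` helper.

```python
import pandas as pd

# First five rows of the table above: ground truth plus size in mm.
rows = pd.DataFrame(
    {
        "groundtruth": ["L", "O", "P", "L", "O"],
        "height": [1537, 608, 727, 914, 540],
        "width": [1920, 500, 600, 1219, 394],
    },
    index=["T13896", "T05010", "N05915", "N00530", "T00598"],
)
ratio = rows["height"] / rows["width"]


def evaluate(lower, upper):
    """Hypothetical helper: confusion matrix and accuracy for one threshold pair."""
    # Start with 'O' everywhere, then overwrite the two outer bands.
    pred = pd.Series("O", index=rows.index)
    pred[ratio <= lower] = "L"
    pred[ratio >= upper] = "P"

    labels = list("LPO")
    matrix = pd.crosstab(
        rows["groundtruth"], pred, rownames=["truth"], colnames=["predicted"]
    ).reindex(index=labels, columns=labels, fill_value=0)  # keep empty classes

    accuracy = (rows["groundtruth"] == pred).mean()
    return matrix, accuracy


matrix, accuracy = evaluate(lower=0.8, upper=1.2)
print(matrix)
print(f"accuracy: {accuracy:.2f}")
```

Rows of the matrix are the true labels and columns are the model's output, so correct predictions sit on the diagonal. With only five paintings the numbers mean little; the point is the mechanics, which carry over to the full `collection`.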

In [4]:
from ipywidgets import interact
import ipywidgets as widgets

try:
    # Tests a) that the variable is defined and b) that it's not None
    if collection is None: 
        raise NameError
except NameError:
    error("<code>collection</code> undefined.",
          "Did you run the code cells above?")

if 'ratio' not in collection:
    error("'ratio' column not found", 
          "Did you run the code cells above?")
    

def plot_thresh(lower, upper):
    iport = collection['ratio'] >= upper
    iland = collection['ratio'] <= lower
    irest = (~iland) & (~iport)

    collection['ratio_model'] = 'O'
    collection.loc[iland,'ratio_model'] = 'L'
    collection.loc[iport,'ratio_model'] = 'P'

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8,4), dpi=120)

    ax1.set_xlabel('Width (mm)')
    ax1.set_ylabel('Height (mm)')

    ax1.set_xlim(200, 5000)
    ax1.set_ylim(200, 5000)

    ax1.set_xscale('log')
    ax1.set_yscale('log')

    for ix, name, col in zip([irest,iland,iport],['Other','Landscape','Portrait'],['lightgray','green','blue']):
        ax1.scatter(collection[ix]['width'], collection[ix]['height'], 
                   label=name, marker='+', c=colors[col], alpha=.75)

    ax1.plot([200,5000], [200*upper,5000*upper], color=colors['pink'], ls='--', lw=2 )
    ax1.plot([200,5000], [200*lower,5000*lower], color=colors['pink'], ls='--', lw=2 )

    # ax1.get_xaxis().get_major_formatter().labelOnlyBase = False
    ax1.get_xaxis().set_major_formatter(mpl.ticker.ScalarFormatter())
    ax1.get_yaxis().set_major_formatter(mpl.ticker.ScalarFormatter())
    ax1.set_xticks([250, 500,1000,2000,4000])
    ax1.set_yticks([250, 500,1000,2000,4000])

    ax1.spines['top'].set_visible(False)
    ax1.spines['right'].set_visible(False)

    # Reverse legend
    handles, labels = ax1.get_legend_handles_labels()
    ax1.legend(handles[::-1], labels[::-1], title='Model output')


    # Plot confusion matrix
    truth = collection['groundtruth']
    pred = collection['ratio_model']

    plot_confusion_matrix(truth, pred, 'LPO', ax2)

#     return fig, (ax1, ax2)

layout = {
    'width': '70%',
}

upper_slider = widgets.FloatSlider(
    value=1.2,
    min=1,
    max=1.3,
    step=0.025,
    description='Upper threshold:',
    continuous_update=False,
    readout_format='.3f',
    layout=layout
)

lower_slider = widgets.FloatSlider(
    value=.8,
    min=.7,
    max=1,
    step=0.025,
    description='Lower threshold:',
    continuous_update=False,
    readout_format='.3f',
    layout=layout
)

interact(plot_thresh, lower=lower_slider, upper=upper_slider)
pass

