# PSYC 193: Perception and Computation 
## Lab 1: Working with image data and analyzing typicality ratings

In this lab, we will be working with an image dataset used in a recent computer vision paper by [Sangkloy et al.](https://dl.acm.org/doi/abs/10.1145/2897824.2925954). 

**Learning objectives**
* Learn the basics of working with image data
* Analyze human typicality rating data 

**Submission instructions**
1. Please rename the notebook by replacing `YOURUSERNAME` in the filename with your actual UCSD AD username. 
2. Before submitting your assignment, make sure that your notebook can run from "top to bottom," executing the code in every code cell without returning fatal errors. An easy way to verify this is to click "Kernel" in the toolbar above and select "Restart & Run All."
3. Once you have verified that your notebook can run "top to bottom" without issues, click "File" in the toolbar above, then "Download as," then "PDF via LaTeX" to download a PDF version of your notebook. 
4. Upload this PDF version of your notebook to Canvas before 5pm on the day of the next class period. 

#### Getting started with jupyter notebooks
If you are relatively new to writing Python code in Jupyter notebooks, it's recommended that you check out the User Interface Tour, which you can find by clicking "Help" in the toolbar. 

### setup

In [None]:
## load generally useful python modules
import os
import numpy as np
import pandas as pd
from PIL import Image
import requests
from io import BytesIO
import seaborn as sns

import matplotlib.pyplot as plt
%matplotlib inline

### load in datasets

In [None]:
## import image metadata (from Sangkloy et al. (2016))
from photodraw32_metadata import metadata
M = pd.DataFrame(metadata)

In [None]:
## inspect what image metadata looks like using the pandas `head` function:
## see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html

### INSERT YOUR CODE HERE ####

**What do you think each row of this dataframe represents?**

*INSERT YOUR OWN RESPONSE HERE*

### explore dataset

**How many different images are there in this dataframe?**

*INSERT YOUR OWN RESPONSE HERE*

hint: try using the `shape` attribute (note that it is accessed without parentheses)
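
To illustrate how `head` and `shape` behave, here is a minimal sketch on a small made-up dataframe. The `category` column name and the example URLs are placeholders invented for this illustration and may not match what is actually in `M`.

```python
import pandas as pd

# Toy stand-in for a metadata dataframe; all values are invented for illustration.
toy = pd.DataFrame({
    "category": ["airplane", "airplane", "lion", "lion", "lion"],
    "s3_url": ["https://example.com/img{}.png".format(i) for i in range(5)],
})

print(toy.head(3))          # first three rows of the dataframe
n_rows, n_cols = toy.shape  # shape is a (rows, columns) tuple, not a function call
print("rows: {}, columns: {}".format(n_rows, n_cols))
```

If each row describes one image, then the number of rows is also the number of images, provided no image appears more than once.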

In [None]:
### INSERT YOUR CODE HERE ####


**How many different object categories are represented in the image dataset (i.e., the `M` dataframe)?**

*INSERT YOUR OWN RESPONSE HERE*

hint: try using the `nunique` function

In [None]:
### INSERT YOUR CODE HERE ####


**How many different images per category are in this dataset?**

*INSERT YOUR OWN RESPONSE HERE*

hint: try using `groupby` and `count` or `value_counts`
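
As a rough sketch of what these pandas methods return, the example below counts categories and images per category in a toy dataframe. The column names `category` and `image_id` are assumptions made for this example; check the actual column names in `M` before adapting it.

```python
import pandas as pd

# Toy dataframe with invented values: one row per image.
toy = pd.DataFrame({
    "category": ["airplane", "airplane", "lion", "lion", "lion"],
    "image_id": ["a1", "a2", "l1", "l2", "l3"],
})

# Number of distinct category labels
n_categories = toy["category"].nunique()

# Two equivalent ways to count rows (images) per category
per_category = toy["category"].value_counts()
per_category_gb = toy.groupby("category")["image_id"].count()

print(n_categories)
print(per_category)
```

`value_counts` sorts the result from most to least frequent, whereas `groupby(...).count()` sorts by the group label, so the two outputs can appear in a different order even though the counts agree.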

In [None]:
### INSERT YOUR CODE HERE ####


### load and display a single image

**Here is sample code to display one of the "airplane" images in the dataset**

In [None]:
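## grab the URL stored in the first row of the metadata and download the image
url = M['s3_url'].values[0]
print('Example Image URL: {}'.format(url))
response = requests.get(url)
response.raise_for_status()  ## stop here if the download failed
img = Image.open(BytesIO(response.content))  ## decode the downloaded bytes into a PIL image
img  ## the last expression in a cell is displayed automatically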

**Now display any one of the "lion" images in the dataset**

In [None]:
## INSERT YOUR CODE HERE ##

### Practice with image processing

**What are the dimensions of the example airplane image from above (i.e., width x height x num_channels)?**

hint: Try looking up how to get sizes of images using the Python Imaging Library (PIL)
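
As a small illustration of the relevant PIL attributes, the sketch below builds a synthetic solid-color image (so it does not depend on the dataset or a network connection) and reads off its dimensions. The image size and color are arbitrary choices for this example.

```python
from PIL import Image

# Synthetic 320x240 RGB image, standing in for a downloaded photo.
toy_img = Image.new("RGB", (320, 240), color=(200, 50, 50))

width, height = toy_img.size          # PIL reports size as (width, height)
n_channels = len(toy_img.getbands())  # one band per channel, e.g. ('R', 'G', 'B')
print(width, height, n_channels, toy_img.mode)
```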

In [None]:
## INSERT YOUR CODE HERE ##

**Convert image data to a NumPy array. What are its dimensions?**

*INSERT YOUR OWN RESPONSE HERE*

In [None]:
## INSERT YOUR CODE HERE ##

**Inspect the values in the array. What do these values represent? What is the largest value, and what is the smallest value in the image array?**

Is the range of values what you expected? Check out [this page on 8-bit color graphics](https://en.wikipedia.org/wiki/8-bit_color) to learn more about this way of storing information about images. 
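
The sketch below shows one way to convert a PIL image to a NumPy array and inspect its values, again using a synthetic image with an arbitrary color rather than one from the dataset. Note that the array shape is ordered (height, width, channels), which is the reverse of PIL's (width, height) convention, and that an RGB image loaded this way typically stores 8 bits per channel.

```python
import numpy as np
from PIL import Image

# Synthetic RGB image whose color uses both extremes of the 8-bit range.
toy_img = Image.new("RGB", (320, 240), color=(255, 128, 0))

arr = np.asarray(toy_img)
print(arr.shape)             # (height, width, channels)
print(arr.dtype)             # unsigned 8-bit integers
print(arr.min(), arr.max())  # smallest and largest channel intensity in this image
```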

In [None]:
## INSERT YOUR CODE HERE ##

**Crop the middle 100x100 pixels from the image and display it.**

hint: Try using the `crop` function from PIL. Use the information you extracted earlier about the width and height of the image to determine where the middle 100x100 pixels are.
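
Here is a minimal sketch of the arithmetic behind a center crop, using a synthetic image of arbitrary size. `crop` takes a `(left, upper, right, lower)` box in pixel coordinates, with the origin at the top-left corner of the image.

```python
from PIL import Image

# Synthetic 320x240 image standing in for a real photo.
toy_img = Image.new("RGB", (320, 240), color=(30, 120, 200))
width, height = toy_img.size

crop_size = 100
left = (width - crop_size) // 2    # equal margin on the left and right
upper = (height - crop_size) // 2  # equal margin on the top and bottom
box = (left, upper, left + crop_size, upper + crop_size)

center = toy_img.crop(box)
print(box, center.size)
```

Computing the box from `width` and `height` rather than hard-coding it means the same code works for images of any size that are at least 100 pixels on each side.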

In [None]:
## INSERT YOUR CODE HERE ##

### analyze distribution of ratings 

In [None]:
## import image typicality ratings (from an unpublished dataset)
T = pd.read_csv('photodraw32_ratings.csv')

In [None]:
## inspect the dataframe using the `head` function
## INSERT YOUR CODE HERE ##

#### Here is what the columns mean
* prolificID: anonymized participant identifier
* img_id: URL of image shown to participant
* category: category this image belongs to
* ratings: rating given by participants on 5-point scale from "Not typical at all" to "Extremely" typical
* enumerated_ratings: ratings converted to numeric scale ranging between -2 and +2

**How many ratings do we have per image?**

In [None]:
## INSERT YOUR CODE HERE ##

**What does the distribution of ratings look like overall, across all images and categories?**

hint: for a basic histogram, try using matplotlib `plt.hist()`. For another option, try using plotting functions from [seaborn](https://seaborn.pydata.org/tutorial/distributions.html).
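
As one possible starting point, the sketch below draws a histogram of randomly generated ratings on the -2 to +2 scale. The data are simulated for illustration only and say nothing about the real ratings; the bin edges are placed at half-integers so that each bar is centered on one rating value.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulated ratings on the -2..+2 scale (NOT the real data).
rng = np.random.default_rng(0)
toy_T = pd.DataFrame({"enumerated_ratings": rng.integers(-2, 3, size=200)})

# Edges at -2.5, -1.5, ..., 2.5 give one bin per integer rating.
bins = np.arange(-2.5, 3.0, 1.0)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(toy_T["enumerated_ratings"], bins=bins, edgecolor="white")
ax.set_xlabel("rating")
ax.set_ylabel("number of ratings")
```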

In [None]:
## INSERT YOUR CODE HERE ##

**What does the distribution of average ratings look like for images within each category?**

hint: try using `FacetGrid` from [seaborn](https://seaborn.pydata.org/generated/seaborn.FacetGrid.html).

In [None]:
## INSERT YOUR CODE HERE ##

**How well do participants agree with one another on what rating to give to an image? Or, relatedly, how variable are the ratings given by different participants to the same image?**

hint: There isn't a single way to get at this question. Try to think about some metrics that might be useful, taking into account how often participants agree and/or how large the differences are between ratings given by different participants. 
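
Two candidate metrics, among many possible ones, are sketched below on a tiny invented dataset: the standard deviation of ratings for each image (how spread out the ratings are) and the proportion of raters who gave the most common rating (how often raters agree). These are illustrative choices, not the required answer.

```python
import pandas as pd

# Invented ratings: image "a" gets consistent ratings, image "b" does not.
toy_T = pd.DataFrame({
    "img_id": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "enumerated_ratings": [1, 1, 1, 2, -2, 0, 2, 1],
})

by_image = toy_T.groupby("img_id")["enumerated_ratings"]

# Spread of ratings for each image (sample standard deviation)
rating_sd = by_image.std()

# Share of raters who gave the single most common rating for each image
modal_agreement = by_image.agg(lambda s: s.value_counts(normalize=True).iloc[0])

print(rating_sd)
print(modal_agreement)
```

A low standard deviation and a high modal agreement both point toward consensus, but they can disagree: ratings split evenly between two adjacent values have low spread yet only 50% modal agreement.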

In [None]:
## INSERT YOUR CODE HERE ##

**How much variation is there in typicality ratings between images within each category?**

hint: What are some metrics you know about to quantify variation?
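
As a sketch of how such metrics could be computed, the example below summarizes the spread of per-image mean ratings within each category using the standard deviation and the range, and then applies the same idea to category means. The data are invented so that the `lion` images differ a lot from one another while the `cup` images do not.

```python
import pandas as pd

# Invented ratings: 2 categories x 3 images x 2 ratings each.
toy_T = pd.DataFrame({
    "category": ["lion"] * 6 + ["cup"] * 6,
    "img_id": ["l1", "l1", "l2", "l2", "l3", "l3",
               "c1", "c1", "c2", "c2", "c3", "c3"],
    "enumerated_ratings": [2, 2, 0, 0, -2, -2, 1, 1, 1, 1, 1, 1],
})

# Mean rating for each image
image_means = toy_T.groupby(["category", "img_id"])["enumerated_ratings"].mean()

# Within-category variation: spread of image means inside each category
within = image_means.groupby(level="category").agg(["std", "min", "max"])
within["range"] = within["max"] - within["min"]

# Between-category variation: spread of the category-level means
between_sd = image_means.groupby(level="category").mean().std()

print(within)
print(between_sd)
```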

In [None]:
## INSERT YOUR CODE HERE ##

**How much variation is there in typicality ratings between categories?**

In [None]:
## INSERT YOUR CODE HERE ##

**What are some examples of `lion` images that are rated as being highly typical? What about the least typical? Display them below.**
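
One way to find the extremes, sketched on invented data, is to average the ratings for each image in a category and then use `nlargest` and `nsmallest`. The short identifiers below stand in for the image URLs stored in `img_id`.

```python
import pandas as pd

# Invented ratings for four images in one category.
toy_T = pd.DataFrame({
    "category": ["lion"] * 8,
    "img_id": ["l1", "l1", "l2", "l2", "l3", "l3", "l4", "l4"],
    "enumerated_ratings": [2, 2, 1, 0, -1, -2, 0, 0],
})

category = "lion"  # change this label to look at another category
means = (toy_T[toy_T["category"] == category]
         .groupby("img_id")["enumerated_ratings"]
         .mean())

most_typical = means.nlargest(2)    # images with the highest mean rating
least_typical = means.nsmallest(2)  # images with the lowest mean rating

print(most_typical)
print(least_typical)
```

Because `img_id` holds each image's URL in the real ratings data, the index of `most_typical` or `least_typical` can be passed to the same download-and-open steps used for the sample airplane image.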

In [None]:
## INSERT YOUR CODE HERE ##

**Try to construct a 4 row x 8 column "image gallery" that displays all of the `lion` images, from the most typical ones in the top left to the least typical ones in the bottom right, where each image appears with the average rating it earned above it. Do the results make sense to you? Try to write your visualization code so that you can easily substitute a category label other than `lion`.**
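
The layout part of this task can be sketched as follows, using random-noise arrays in place of real images and random numbers in place of real mean ratings. The helper name `plot_gallery` and its parameters are hypothetical; the function only handles sorting and plotting, so filtering by category and downloading the images would happen before calling it.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_gallery(images, mean_ratings, n_rows=4, n_cols=8):
    """Show images sorted from highest to lowest mean rating, rating as title."""
    order = np.argsort(mean_ratings)[::-1]  # indices from highest to lowest rating
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(2 * n_cols, 2.2 * n_rows))
    for ax in axes.flat:
        ax.axis("off")  # hide ticks and frames, including any unused panels
    # axes.flat runs left to right, then top to bottom
    for ax, idx in zip(axes.flat, order):
        ax.imshow(images[idx])
        ax.set_title("{:.2f}".format(mean_ratings[idx]))
    return fig, order

# Placeholder inputs: 32 random-noise "images" and 32 random "mean ratings".
rng = np.random.default_rng(2)
toy_images = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(32)]
toy_means = rng.uniform(-2, 2, size=32)

fig, order = plot_gallery(toy_images, toy_means)
```

Keeping the category out of the plotting function means that switching from `lion` to another label only changes how the `images` and `mean_ratings` inputs are prepared.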

In [None]:
## INSERT YOUR CODE HERE ##

#### Now that you're a little bit more familiar with working with image data, how do you think you would evaluate the similarity between two different images? 

*INSERT YOUR OWN RESPONSE HERE*