DSC160 Data Science and the Arts - Twomey - Spring 2020 - [dsc160.roberttwomey.com](http://dsc160.roberttwomey.com)

# Exercise 1: Reading Image Archives (Web-Scraping and Basic Features)

This exercise takes you through a coarse, image-feature-based analysis of a famous Abstract Expressionist painter, [Mark Rothko](https://www.biography.com/artist/mark-rothko). Technically, you will build a full workflow: image retrieval from an online archive -> calculation of image features -> visualization of results. Finally, it asks you to reproduce a similar result using a small image data set of your choice.

The exercise is broken down into two parts:
- [Part 1](#Part-1:-Plotting-Rothko). A replication of an analysis by the Software Studies Initiative/Lev Manovich of Mark Rothko's paintings. 
- [Part 2](#Part-2:-Extension). The second section asks you to extend this work, applying the same methods to analyze an image set (n <= 100) of your choosing. 

Once you have completed both parts, submit your finished notebook as a PDF to Gradescope for grading.

## Part 1: Plotting Rothko
(30 pts total)

[Mark Rothko](https://www.biography.com/artist/mark-rothko) is a celebrated Abstract Expressionist painter known for his large color field abstractions. Some historians describe a progression towards darker, less colorful compositions over the course of his life. Here, we will recreate plots similar to those below from the Software Studies Initiative, showing the distribution of color and brightness within his body of work.

![Rothko and Mondrian in mean brightness vs. saturation Style Space](https://live.staticflickr.com/6070/6074400716_c809d2d7a3_c_d.jpg)
*Data: 128 paintings by Piet Mondrian (1905-1917); 123 paintings by Mark Rothko (1938-1953).
Mapping: The two image plots are placed side by side. In each plot: X-axis: brightness mean; Y-axis: saturation mean.*

From [Mondrian vs Rothko: footprints and evolution in style space](http://lab.softwarestudies.com/2011/06/mondrian-vs-rothko-footprints-and.html)

### 1A. Retrieving Data from a Visual Archive
(5 points)

First you need to retrieve images of Rothko's paintings from an online cultural archive. WikiArt has 163 of Rothko's paintings: [https://www.wikiart.org/en/mark-rothko](https://www.wikiart.org/en/mark-rothko). We will retrieve all of these images and store them locally.

You can model your code on our example notebook for scraping images from WikiArt: [../examples/scrape-wikiart.ipynb](../examples/scrape-wikiart.ipynb)
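
The sketch below covers only the download step, assuming you have already collected a list of direct image URLs from the archive. The function name `download_images` and its parameters are illustrative, not taken from the example notebook, and it uses only the Python standard library.

```python
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def download_images(image_urls, out_dir):
    """Save each URL in image_urls into out_dir and return the saved paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in image_urls:
        # Use the last path component of the URL as the local filename.
        name = Path(unquote(urlparse(url).path)).name
        target = out_dir / name
        if not target.exists():  # skip files fetched on an earlier run
            with urllib.request.urlopen(url) as response:
                target.write_bytes(response.read())
        saved.append(target)
    return saved
```

Skipping files that already exist means you can re-run the cell after an interruption without downloading everything again.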

In [None]:
# your code here

### 1B. Calculating Basic Image Features
(10 points)

This section presumes you have already scraped/downloaded your set of images (n of approx. 160). In this section you will iterate over your downloaded images and calculate a number of image statistics, saving the results in a pandas dataframe.

First, write a function `calc_stats()` that takes filename as an input and returns a list of image stats, including: 
  - image width (pixels)
  - image height (pixels)
  - mean hue
  - mean saturation
  - mean value (brightness)
  
(for examples of how to calculate basic image statistics, see [../examples/basic-image-stats.ipynb](../examples/basic-image-stats.ipynb))
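
One possible shape for `calc_stats()` is sketched here, assuming Pillow and NumPy are installed. It relies on Pillow's HSV mode, which stores hue, saturation, and value as 8-bit integers (0-255), so the means come out on that scale. Note that hue is an angle, so its arithmetic mean is only a coarse summary.

```python
import numpy as np
from PIL import Image


def calc_stats(filename):
    """Return [width, height, mean hue, mean saturation, mean value]."""
    with Image.open(filename) as img:
        width, height = img.size
        # Convert to RGB first so grayscale or palette images also work,
        # then to Pillow's HSV mode (each channel in the range 0-255).
        hsv = np.asarray(img.convert("RGB").convert("HSV"), dtype=float)
    mean_hue = hsv[..., 0].mean()
    mean_sat = hsv[..., 1].mean()
    mean_val = hsv[..., 2].mean()
    return [width, height, mean_hue, mean_sat, mean_val]
```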

In [None]:
# your code here

We want to calculate these stats for each of Rothko's paintings and store them in a pandas dataframe for plotting and analysis. Write code (using `calc_stats()` from above) to: 
- Iterate over Rothko's paintings
- Compute these values for each image
- Add to a dataframe
- And write to disk as a csv (`mark-rothko.csv`).
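
The steps above can be sketched as follows, assuming pandas is installed and the images were saved as `.jpg` files in a single folder. The helper name `build_stats_table`, the column names, and the `stats_fn` parameter (where you would pass your own `calc_stats`) are illustrative choices.

```python
from pathlib import Path

import pandas as pd

COLUMNS = ["width", "height", "mean_hue", "mean_saturation", "mean_value"]


def build_stats_table(image_dir, stats_fn, csv_path):
    """Apply stats_fn to every .jpg in image_dir and save the table as CSV."""
    rows = []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        # One row per painting: the filename followed by its five stats.
        rows.append([path.name] + list(stats_fn(path)))
    df = pd.DataFrame(rows, columns=["filename"] + COLUMNS)
    df.to_csv(csv_path, index=False)
    return df
```

A call might look like `build_stats_table("../data/mark-rothko", calc_stats, "mark-rothko.csv")`, where the image folder is an assumed location.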

In [None]:
# your code here

### 1C. Plotting Results
(15 points)

For this section we will create some simple plots with matplotlib showing distributions of image stats (mean value, hue, saturation, and resolution). Then we will produce large bitmap plots similar to Manovich's work above.

(see example notebooks for plotting)

In [None]:
%matplotlib inline

__P1. Distribution of sizes__

First plot a histogram of image resolution using matplotlib and display inline.

In [None]:
# your code here

__P2-P4. Distribution of Mean Hue, Saturation, Value__

Next plot histograms of mean hue, saturation, and value, and display them inline below.
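
A small reusable helper keeps the three histograms consistent. This is a minimal sketch; the function name `plot_histogram`, the bin count, and the axis labels are assumptions you can change freely.

```python
import matplotlib.pyplot as plt


def plot_histogram(values, label, bins=20):
    """Draw a histogram of values and return the matplotlib Figure."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(values, bins=bins)
    ax.set_xlabel(label)
    ax.set_ylabel("number of paintings")
    ax.set_title(f"Distribution of {label}")
    return fig
```

For example, `plot_histogram(df["mean_hue"], "mean hue")` would plot one column of a dataframe `df` that holds your stats (the column name here is hypothetical).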

In [None]:
# your code for mean hue histogram

In [None]:
# your code for mean saturation histogram

In [None]:
# your code for mean value histogram

__P5. Scatterplot with matplotlib (mean value vs. mean hue)__

Now produce a simple scatter plot of mean value against mean hue.

(see example notebook on plotting)
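
A scatter plot of two feature columns might be sketched like this. The helper name `plot_scatter` and the marker settings are illustrative.

```python
import matplotlib.pyplot as plt


def plot_scatter(x_values, y_values, x_label, y_label):
    """Scatter y_values against x_values and return the matplotlib Figure."""
    fig, ax = plt.subplots(figsize=(6, 6))
    # Small, slightly transparent markers keep overlapping points readable.
    ax.scatter(x_values, y_values, s=12, alpha=0.7)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig
```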

In [None]:
# your code for scatter plot of mean_value (X) against mean_hue (Y)

#### P6-P7. Produce Large Bitmap Figures illustrating your results

(see example notebook on producing large tiled image figures: [../examples/large_figures.ipynb](../examples/large_figures.ipynb))

In [None]:
# from skimage import io
from PIL import Image
import matplotlib.pyplot as plt

##### Step 1: Generate thumbnails from full-resolution scraped images

Write a `make_thumbnail()` function that takes a filename, an image path, and a thumbnail path as arguments.
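
One way to write such a function with Pillow is sketched below. It assumes the thumbnail folder already exists, and the `max_size` parameter with its default of 128 pixels is an added, hypothetical argument.

```python
from pathlib import Path

from PIL import Image


def make_thumbnail(filename, image_path, thumb_path, max_size=(128, 128)):
    """Write a thumbnail of image_path/filename into thumb_path."""
    source = Path(image_path) / filename
    target = Path(thumb_path) / filename
    with Image.open(source) as img:
        img = img.convert("RGB")  # JPEG cannot store alpha or palette modes
        img.thumbnail(max_size)   # resizes in place, keeping the aspect ratio
        img.save(target)
    return target
```

`Image.thumbnail()` never enlarges an image, so paintings smaller than `max_size` are copied at their original size.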

In [None]:
# your code here

Create a folder to store your thumbnails

In [None]:
# your code here

Iterate over your Rothko paintings and write thumbnails to disk

In [None]:
# your code here

##### Step 2: Create large plots on an empty bitmap canvas, placing thumbnails based on feature coordinates.

Make a folder to save your results (`../data/mark-rothko/results`).

In [None]:
# your code here

Plot mean value vs. mean hue with image thumbnails on large bitmap
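
The core of the tiled figure is pasting each thumbnail at a position derived from its feature values. The sketch below assumes you have already normalized both features to the range 0-1 (for example by dividing by 255). The function name `plot_thumbnails` and the default canvas size are illustrative.

```python
from PIL import Image


def plot_thumbnails(points, canvas_size=(4000, 4000), background="white"):
    """Paste thumbnails onto a blank canvas and return the canvas.

    points is an iterable of (thumbnail_path, x, y), with x and y
    already normalized to the range 0-1.
    """
    canvas = Image.new("RGB", canvas_size, background)
    width, height = canvas_size
    for thumb_path, x, y in points:
        with Image.open(thumb_path) as thumb:
            thumb = thumb.convert("RGB")
            # Reserve room for the thumbnail so it never leaves the canvas.
            left = int(x * (width - thumb.width))
            # Image rows grow downward, so flip y to put large values on top.
            top = int((1 - y) * (height - thumb.height))
            canvas.paste(thumb, (left, top))
    return canvas
```

The returned canvas can be written to disk with `canvas.save(...)`. Thumbnails pasted later cover earlier ones where they overlap.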

In [None]:
# your code here

Produce a second plot: mean value vs mean saturation

In [None]:
# your code here

Display the figures inline in this notebook

In [None]:
# your code here

## Part 2: Extension
(70 points total)

For this part, you will repeat the above image feature summary analysis (mean brightness, mean hue) using a dataset of your choice. Your dataset should contain no more than about 100 images (n <= 100). Your output should be a tiled image similar to the ones produced in the previous section, along with a short description of your results and why they are interesting.

### 2A. Scraping/downloading your new imagery
(10 points)

In [None]:
# your code here

### 2B. Calculating image features
(10 points)

Model your features on the above exercise, or incorporate other stats (variance, edge count, etc.)
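
As one example of extra features, the sketch below computes brightness variance and a crude edge measure from grayscale gradients. The function name `extra_stats`, the gradient-based edge definition, and the threshold of 30 are all assumptions. A dedicated edge detector would be a reasonable substitute.

```python
import numpy as np
from PIL import Image


def extra_stats(filename, edge_threshold=30.0):
    """Return (brightness variance, fraction of pixels on a strong edge)."""
    with Image.open(filename) as img:
        gray = np.asarray(img.convert("L"), dtype=float)
    # Finite-difference gradients along rows (dy) and columns (dx).
    dy, dx = np.gradient(gray)
    magnitude = np.hypot(dx, dy)
    # Share of pixels whose gradient magnitude exceeds the threshold.
    edge_fraction = float((magnitude > edge_threshold).mean())
    return float(gray.var()), edge_fraction
```

Reporting edges as a fraction rather than a raw count keeps the value comparable across images of different sizes.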

In [None]:
# your code here

### 2C. Produce and Display output plots (results)
(25 points)

Produce high resolution results images, and display them inline in the notebook

In [None]:
# your code here

### 2D. Describe your Results
(25 points)

Replace the contents of the markdown cell below with a two-paragraph summary of your extension work.

```
REPLACE HERE
Write one paragraph about where your images are from, how you gathered them, and why they are worth studying.
Write a second paragraph describing your results, how they relate to the Rothko analysis, and how you could build on this work.
TO HERE.
```

## References

### Additional Cultural Archives:
* [The Getty](https://www.getty.edu/art/collection/) (The J. Paul Getty Museum, LA)
* [The Met Collection](https://www.metmuseum.org/art/collection) (Metropolitan Museum of Art, NYC)
* MoMA (Museum of Modern Art) online collection: [https://www.moma.org/collection/](https://www.moma.org/collection/)
  * Our evolving collection contains almost 200,000 works of modern and contemporary art. More than 85,000 works are currently available online.
* Metropolitan Museum of Art collection on Archive.org: [https://archive.org/details/metropolitanmuseumofart-gallery](https://archive.org/details/metropolitanmuseumofart-gallery?&sort=-downloads&page=2)
* [MoMA exhibition images](https://www.moma.org/collection/) (showing how paintings were installed)
   * read about it here [You Can Now Explore Every MoMA Exhibit Since 1929 for Free Online](https://mymodernmet.com/museum-of-modern-art-exhibition-history/?fbclid=IwAR3LkAPAXmDJ4C9zJn6ujfmhh2zNp6GJL9ysHTMgoKPS5ARp8jx3EklaIUk)
* [Paul Klee notebooks](http://www.kleegestaltungslehre.zpk.org/ee/ZPK/BF/2012/01/01/001/)
  - read about it [here](http://www.openculture.com/2016/03/3900-pages-of-paul-klees-personal-notebooks-are-now-online.html?fbclid=IwAR1_dGLxqy0YAiGuxJD2uTVUiyS0sSJuoX8iKuy_k01LWHbAYcbprNp4hd4)
