# Section 1: Understanding an Image

A computer represents an image as a 2D grid of **pixels** where each pixel is a single color.  Here is a tiny image **zoomed in at 50x** that shows the nine pixels:

![Color](sample-50x.png)

At its actual size, this nine-pixel image is tiny (here it is): ![sample image](sample.png).  We will need to use Python to help us process large images!


## Section 1.1: Image Coordinate System

We can refer to any pixel in the image by its coordinates -- the `x` and `y` position of each pixel.  The **top-left** pixel is (0, 0) and we increase `x` by going "right" and increase `y` by going "down".

- `(0, 0)` is the BLUE pixel in the top-left corner,
- `(1, 0)` is the middle WHITE pixel in the top row (since `x=1` moves to the right one),
- `(2, 0)` is the RED pixel in the top-right corner.

Moving down, the middle row has a `y` value of `1` for all its pixels:

- `(0, 1)` is the GREEN pixel -- it's "over zero and down one",
- `(1, 1)` is the BLACK pixel in the center, and
- `(2, 1)` is the WHITE pixel in the middle-right of the image.

Finally, the last row:

- `(0, 2)` is the WHITE pixel in the bottom left,
- `(1, 2)` is the ILLINI ORANGE pixel,
- `(2, 2)` is the ILLINI BLUE pixel
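
The coordinate convention can be sketched in plain Python. The dictionary below is a hypothetical stand-in for `sample.png` (it is not how the image is actually stored); it maps each `(x, y)` pair to the color name listed above.

```python
# Hypothetical stand-in for sample.png: maps (x, y) -> color name.
colors = {
    (0, 0): "BLUE",  (1, 0): "WHITE",         (2, 0): "RED",
    (0, 1): "GREEN", (1, 1): "BLACK",         (2, 1): "WHITE",
    (0, 2): "WHITE", (1, 2): "ILLINI ORANGE", (2, 2): "ILLINI BLUE",
}

# Walk the grid the way you read it: top row first, left to right.
for y in range(3):                            # y increases going down
    row = [colors[(x, y)] for x in range(3)]  # x increases going right
    print(f"y={y}:", row)

top_right = colors[(2, 0)]
```

Note that the first number is always `x` (how far right) and the second is always `y` (how far down).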


## Section 1.2: The `DISCOVERY` library!

In DISCOVERY, we developed a library to help you with image data.  To use it:

- You must `import DISCOVERY` (all caps, because DISCOVERY!)
- Use `image = DISCOVERY.loadImage("sample.png")` to load the `sample.png` image into `image`

Try that out in the cell below:

In [145]:
import DISCOVERY
image = DISCOVERY.loadImage("sample.png")

To access any specific pixel, you need to "index into" the `image` with the `x` and `y` coordinate as `image[x][y]`.  For example:

- `image[0][0]` accesses the top-left pixel; in the `sample.png` image, it's the BLUE pixel
- `image[2][0]` accesses `x=2, y=0`; in the `sample.png` image, it's the RED pixel
- ...etc...

In the cell below, access the GREEN pixel and store it in the variable `green_pixel`:

In [146]:
green_pixel = image[0][1]
green_pixel

array([  0, 255,   0])

In [147]:
# == TEST CASE for Section 1.2 ==
# - Checks to see if `green_pixel` is storing a green pixel
DISCOVERY.run_test_case_1b(green_pixel)

✅ `green_pixel` looks like a pixel!
✅ `green_pixel` is a green pixel!
🎉 All tests passed! 🎉


<hr>

# Section 2: Accessing Color Data

Every color visible on a computer screen is made up of the three **primary colors of light** -- red, green, and blue.  Your monitor displays color by varying the intensity of the red light, green light, and blue light emitted for every pixel on your screen.  Since images are primarily displayed on computer screens, the **default color space is to represent colors as red, green, and blue**.

When you have a pixel, you will always see the three components.  For example, the contents of `green_pixel` are `[0, 255, 0]`:
- The first `0` means we have 0 / 255 (0%) **red** light
- The second `255` means we have 255 / 255 (100%) **green** light
- The final `0` means we have 0 / 255 (0%) **blue** light
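
As a small sketch of the arithmetic above, each component can be converted to a percentage of full intensity by dividing by 255. The helper name `to_percent` is made up for this illustration, and the pixel is written as a plain list rather than loaded from the image.

```python
def to_percent(component):
    """Convert a 0-255 color component to a percentage of full intensity."""
    return component / 255 * 100

green_pixel_values = [0, 255, 0]  # same values as `green_pixel` above
red_pct, green_pct, blue_pct = [to_percent(c) for c in green_pixel_values]
print(red_pct, green_pct, blue_pct)
```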

## Section 2.1: Accessing Illini Orange

Using the `image` you loaded in Section 1, load the Illini Orange pixel into `illini_orange_pixel`:

In [148]:
illini_orange_pixel = image[1][2]
illini_orange_pixel

array([255,  85,  46])

Now we can access the red, green, and blue components by their position in the list.  Once you have a pixel:

- `pixel[0]` is the red color,
- `pixel[1]` is the green color, and
- `pixel[2]` is the blue color.

Find the red, green, and blue values of the `illini_orange_pixel`:

In [149]:
red = illini_orange_pixel[0]
red

255

In [150]:
green = illini_orange_pixel[1]
green

85

In [151]:
blue = illini_orange_pixel[2]
blue

46

## Section 2.2: Average Pixel Strategy

To create the mosaic, we will need to find the **average** pixel color of every one of your tile images.  You can do this either of two ways:
1. Using Python to find the average of all the individual pixels,
2. Creating a DataFrame to store all of the individual pixels and using `pandas` to find the average

Since you are an expert in working with DataFrames, we will go for Option #2!  However, you can try #1 if you are up for a challenge!


## Section 2.3: Finding the Height and Width

All of our images will be much larger than the 3x3 `sample.png` image -- and we don't know the size in advance -- so we need to find the `width` and `height` of an image.  Luckily, this is easy:

- `len(image)` will tell us how many columns are in the image (since the first index is the `x` value) -- this is our `width`
- `len(image[0])` will tell us how many pixels are in the first column (since images are always a rectangle) -- this is our `height`
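
Because `sample.png` is square, it cannot show which length is which. The sketch below uses a hypothetical 4-wide by 2-tall image, built as a plain nested list indexed `[x][y]`. That layout is an assumption that mirrors the indexing described above; it is not the object `DISCOVERY.loadImage` actually returns.

```python
# Hypothetical image that is 4 pixels wide and 2 pixels tall.
# The outer list holds columns (x); each column holds its pixels (y).
black = [0, 0, 0]
wide_image = [[black, black] for x in range(4)]

width = len(wide_image)       # number of columns
height = len(wide_image[0])   # number of pixels in the first column
print(width, height)
```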

Find the `height` and `width` of your `sample.png` image stored in `image` from Section 1:

In [152]:
width = len(image)
width

3

In [153]:
height = len(image[0])
height

3

## Section 2.4: Converting Pixel Data to a DataFrame - First Row

Once you have the height and width, you must **visit every pixel** to add it to a DataFrame.  This is **VERY SIMILAR** to doing a simulation, except that we want to visit different pixels to add them to the DataFrame instead of randomly generating data.

To start this, write a simulation except include three key changes:

1. Instead of simulating something 10,000 times, run your loop for only however many pixels are in the first row (is that `width` or `height`?).

2. Instead of using `for i in ...`, since we're changing the value of `x`, use `for x in ...` to be more descriptive.

3. Instead of simulating a random number, the five real world variables should be:
    * the **red pixel value as `r`**,
    * the **green pixel value as `g`**,
    * the **blue pixel value as `b`**,
    * the **`x` value**, and
    * the **`y` value** (always `0` for now, but we will update it later)  
(You will probably have six lines of code in the top of your simulation; one to get the pixel followed by five to get `r`, `g`, `b`, `x`, and `y`).

Write your modified simulation code that will store **all of the pixels in the first row** of the image in `image`.
- Each row must have a value for `r`, `g`, `b`, `x`, and `y`.
- Store your DataFrame in the variable `df`.

In [176]:
import pandas as pd

data = []
for x in range(width):
    y = 0                  # first row only
    pixel = image[x][y]    # the pixel at (x, 0)
    r = pixel[0]
    g = pixel[1]
    b = pixel[2]
    d = {"r": r, "g": g, "b": b, "x": x, "y": y}
    data.append(d)
df = pd.DataFrame(data)


In [177]:
df

|   | r | g | b | x | y |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 255 | 0 | 0 |
| 1 | 255 | 255 | 255 | 1 | 0 |
| 2 | 255 | 0 | 0 | 2 | 0 |


In [178]:
# == TEST CASE for Section 2.4 ==
# - Checks to ensure the DataFrame contains the correct variables and pixel data
DISCOVERY.run_test_case_2d(df)

✅ `df` contains the correct number of observations.
✅ `df` contains the expected data!
🎉 All tests passed! 🎉


## Section 2.5: Converting Pixel Data to a DataFrame - All Rows

Awesome -- you've done it for one row, now just do it for all rows!

- Your existing code completely works for the first row!
- Now, you need to go through every row one-by-one.  You can add a second `for`-loop **inside** of your first `for`-loop for each row. This will visit all of the rows for `x=0`, then all of the rows for `x=1`, and so on...


### HINT: Double For-Loop

You should have two for-loops right next to each other -- one `x` loop and one `y` loop -- and **ALL** of the functionality for the loop should be inside BOTH of them.  (No code needs to be indented into just one for-loop, every line of code in your for-loop should be double-indented to be inside of the rows **and** columns loops.)
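
To see the order in which the double for-loop visits pixels, here is a minimal sketch that records only the coordinates. The `width` and `height` values are hard-coded to match the 3x3 `sample.png`; nothing is read from an image.

```python
width = 3
height = 3

visited = []
for x in range(width):          # outer loop: one column at a time
    for y in range(height):     # inner loop: every pixel in that column
        visited.append((x, y))  # this line is inside BOTH loops

print(visited)
```

The `x` and `y` columns of your DataFrame should follow this same order: (0, 0), (0, 1), (0, 2), then (1, 0), and so on.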


### Nerd Out!

Write your modified simulation code that will store **all of the pixels** of the image in `image`.
- Each row must have a value for `r`, `g`, `b`, `x`, and `y`.
- Store your DataFrame in the variable `df`.
- (This code will be very similar to Section 2.4 code.)

In [None]:
data = []
for x in range(width):
    for y in range(height):
        pixel = image[x][y]   # visit the pixel at (x, y)
        r = pixel[0]
        g = pixel[1]
        b = pixel[2]
        d = {"r": r, "g": g, "b": b, "x": x, "y": y}
        data.append(d)
df = pd.DataFrame(data)

In [None]:
df

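|   | r | g | b | x | y |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 255 | 0 | 0 |
| 1 | 0 | 255 | 0 | 0 | 1 |
| 2 | 255 | 255 | 255 | 0 | 2 |
| 3 | 255 | 255 | 255 | 1 | 0 |
| 4 | 0 | 0 | 0 | 1 | 1 |
| 5 | 255 | 85 | 46 | 1 | 2 |
| 6 | 255 | 0 | 0 | 2 | 0 |
| 7 | 255 | 255 | 255 | 2 | 1 |
| 8 | 19 | 41 | 75 | 2 | 2 |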


In [None]:
# == TEST CASE for Section 2.5 ==
# - Checks to ensure the DataFrame contains the correct variables and pixel data
DISCOVERY.run_test_case_2e(df)

✅ `df` contains the correct number of observations.
✅ `df` contains the expected data!
🎉 All tests passed! 🎉


<hr>

# Section 3: Creating an ImageToDataFrame Function

Now, put everything you've done together into a **function**.

- The `loadImageToDataFrame` function takes the name of a file as `fileName`.
- You must return a DataFrame that contains the image data.

*(You've already done this in the previous sections, you just need to put it inside of a function and return the DataFrame.)*

In [None]:
def loadImageToDataFrame(fileName):
  image = DISCOVERY.loadImage(fileName)
  data = []
  width = len(image)
  height = len(image[0])
  for x in range (0, width):
    for y in range (0, height):
      pixel = image[x][y]
      r = pixel[0]
      g = pixel[1]
      b = pixel[2]
      d = {"r":r, "g":g, "b":b, "x":x, "y":y}
      data.append(d)
  df = pd.DataFrame(data)
  return df

In [None]:
# == TEST CASE for Section 3 ==
# - Checks to ensure the DataFrame contains the correct variables and pixel data
DISCOVERY.run_test_case_3(loadImageToDataFrame)

✅ `df` looks good with "sample.png"!
✅ `df` looks good with "sample-50x.png"!
🎉 All tests passed! 🎉


<hr>

# Section 4: Find the Average Color of an Image

Given an image (see Sections 1 and 2 to remind yourself of the format), find the **overall average color of the image** by finding the average red color, average green color, and average blue color.

Write this in the function `findAverageImageColor` below.  Since this is an image and not a DataFrame, you will need to sum up all the red values, green values, and blue values.  You must return the **average color as a dictionary** with the values:
- `avg_r`, for the average red color,
- `avg_g`, for the average green color,
- `avg_b`, for the average blue color

In [None]:
def findAverageImageColor(image):
  # Total number of pixels: columns times pixels per column.
  count = len(image) * len(image[0])
  r_sum = 0
  g_sum = 0
  b_sum = 0
  for x in range(len(image)):
    for y in range(len(image[0])):
      pixel = image[x][y]
      r_sum += pixel[0]
      g_sum += pixel[1]
      b_sum += pixel[2]
  return {"avg_r": r_sum / count, "avg_g": g_sum / count, "avg_b": b_sum / count}

In [None]:
# == TEST CASE for Section 4 ==
# - Checks to ensure the DataFrame contains the correct variables and pixel data
DISCOVERY.run_test_case_4(findAverageImageColor)

✅ Dictionary contain the key `avg_r`.
✅ Dictionary contain the key `avg_g`.
✅ Dictionary contain the key `avg_b`.
✅ Looks good!
🎉 All tests passed! 🎉
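
As a hand check, the averages for `sample.png` can be worked out from the nine pixel values shown in the Section 2.5 DataFrame. This sketch copies those values into a plain list instead of loading the image, so it is independent of `findAverageImageColor`.

```python
# The nine (r, g, b) pixels of sample.png, copied from the Section 2.5 output.
pixels = [
    (0, 0, 255), (0, 255, 0), (255, 255, 255),
    (255, 255, 255), (0, 0, 0), (255, 85, 46),
    (255, 0, 0), (255, 255, 255), (19, 41, 75),
]

count = len(pixels)
# zip(*pixels) groups all reds together, all greens together, all blues together.
r_values, g_values, b_values = zip(*pixels)
expected = {
    "avg_r": sum(r_values) / count,
    "avg_g": sum(g_values) / count,
    "avg_b": sum(b_values) / count,
}
print(expected)
```

The sums are 1294, 1146, and 1141, so the averages are roughly 143.78, 127.33, and 126.78.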


<hr>

# Section 5: Finding the Average Color of Your Tile Images

Before beginning the programming, you should have set up a directory called `tiles` that contains all of your tile images.  If you haven't done that, you need to do that now.

To create an image mosaic, we need to find the average pixel color of every one of our tile images so that we can know the best tile image to use when we begin to mosaic our image.  The code below is already complete and does the following:

- Goes through each image file in your `tiles` directory,
- Finds the average pixel color of each image by loading it with `DISCOVERY.loadImage` and calling your `findAverageImageColor` function from Section 4,
- Finally, creates a new DataFrame `df_tiles` with the average color of each image and returns that DataFrame.

*Make sure to run this code -- we'll need it later!*

In [None]:
def createTilesDataFrame(path):
  data = []

  # Loop through all images in the `path` directory:
  for tileImageFileName in DISCOVERY.listTileImagesInPath(path):
    # Load the image and find its average color:
    image = DISCOVERY.loadImage(tileImageFileName)
    averageColor = findAverageImageColor(image)

    # Store the fileName and average colors in a dictionary:
    d = { "fileName": tileImageFileName, "r": averageColor["avg_r"], "g": averageColor["avg_g"], "b": averageColor["avg_b"] }
    data.append(d)

  # Create the `df_tiles` DataFrame:
  df_tiles = pd.DataFrame(data)
  return df_tiles


<hr>

# Section 6: Splitting Up Your Base Image

To mosaic an image, we must split the base image into small regions to be replaced with the tile images.  To accomplish this, we need a function that will **find the average color of a small region of an image**.

- In our `sample.png` (from Section 1), we might need the color of just a 2x2 square from the 3x3 image.
- In `findAverageImageColor` you already found the average color of the full image.
- Now you just need to find the average image color of the subset of the image!


Create a function `findAverageImageColorInBox` that finds the average image color of a box region of the image starting at (`box_x`, `box_y`) and spanning `box_width` pixels wide and `box_height` pixels tall, and returns the average image color of that box.

- Example: `findAverageImageColorInBox(image, box_x=0, box_y=0, box_width=3, box_height=3)` must return the average colors of the pixels that have (`x=0` to `x=2`) and (`y=0` to `y=2`) since we start at `box_x=0`/`box_y=0` and need three pixels.

- Example: `findAverageImageColorInBox(image, box_x=5, box_y=5, box_width=5, box_height=5)` must return the average colors of the pixels that have (`x=5` to `x=9`) and (`y=5` to `y=9`) since we start at `box_x=5`/`box_y=5` and need five pixels.

- Example: `findAverageImageColorInBox(image, box_x=5, box_y=0, box_width=5, box_height=5)` must return the average colors of the pixels that have (`x=5` to `x=9`) and (`y=0` to `y=4`) since we start at `box_x=5` but `box_y=0`.
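
The ranges in the examples above follow directly from `range(start, start + size)`. Here is a minimal sketch, using a made-up helper `box_coordinates` that only lists the coordinates a box covers and does not touch any image:

```python
def box_coordinates(box_x, box_y, box_width, box_height):
    """Return the x values and y values covered by a box (illustration only)."""
    xs = list(range(box_x, box_x + box_width))
    ys = list(range(box_y, box_y + box_height))
    return xs, ys

xs, ys = box_coordinates(box_x=5, box_y=0, box_width=5, box_height=5)
print(xs)  # x values covered by the box
print(ys)  # y values covered by the box
```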

**HINT**: This can be a small modification on `findAverageImageColor` and there are many ways to approach it!  Just like `findAverageImageColor`, you must return the **average color as a dictionary** with the values:
- `avg_r`, for the average red color,
- `avg_g`, for the average green color,
- `avg_b`, for the average blue color

In [None]:
def findAverageImageColorInBox(image, box_x, box_y, box_width, box_height):
  count = box_width * box_height
  r_sum = 0
  g_sum = 0
  b_sum = 0
  for x in range (box_x, box_x + box_width):
    for y in range (box_y, box_y + box_height):
      r_sum += image[x][y][0]
      g_sum += image[x][y][1]
      b_sum += image[x][y][2]
  return { "avg_r": r_sum / count, "avg_g": g_sum / count, "avg_b": b_sum / count}


In [None]:
# == TEST CASE for Section 6 ==
# - Checks to ensure the DataFrame contains the correct variables and pixel data
DISCOVERY.run_test_case_6(findAverageImageColorInBox)

✅ Test case with box_x = 0, box_y = 0, box_width = 2, box_height = 2 found returned the correct average color.
✅ Test case with box_x = 2, box_y = 0, box_width = 2, box_height = 2 found returned the correct average color.
✅ Test case with box_x = 2, box_y = 2, box_width = 2, box_height = 2 found returned the correct average color.
✅ Test case with box_x = 5, box_y = 1, box_width = 2, box_height = 2 found returned the correct average color.
✅ Test case with box_x = 5, box_y = 1, box_width = 4, box_height = 2 found returned the correct average color.
✅ Test case with box_x = 5, box_y = 1, box_width = 4, box_height = 3 found returned the correct average color.
🎉 All tests passed! 🎉


<hr>

# Section 7: Finding the Best Match

There's just **one last function**:
- You have a DataFrame of all your tile image averages (`df_tiles`), **AND**
- You can find `avg_r`, `avg_g`, and `avg_b` for any region of your image with your `findAverageImageColorInBox` function.

Your final function, `findBestTile`, needs to **find the best tile image from `df_tiles` for a given average color**.  You should do this by finding the row in `df_tiles` that has the smallest distance from the average color.

## Example 

Consider three simple tile images:

1. A red tile image with a color of (255, 0, 0)
2. A green tile image with a color of (0, 255, 0)
3. A blue tile image with a color of (0, 0, 255)

If your image region has `avg_r` = 10, `avg_g` = 200, and `avg_b` = 20, the distance formula tells us:

1. For the red tile (255, 0, 0), the distance away is $d = \sqrt{(255 - 10)^2 + (0 - 200)^2  + (0 - 20)^2} = 316.8990375$
2. For the green tile (0, 255, 0), the distance away is $d = \sqrt{(0 - 10)^2 + (255 - 200)^2  + (0 - 20)^2} = 59.37171044$
3. For the blue tile (0, 0, 255), the distance away is $d = \sqrt{(0 - 10)^2 + (0 - 200)^2  + (255 - 20)^2} = 308.7474696$

We find that the green tile is the closest since it has the minimum distance.  The green tile should be returned.
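
The three distances above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the tile names and the `distance` helper are illustrative, and it does not use `df_tiles`.

```python
import math

def distance(color_a, color_b):
    """Euclidean distance between two (r, g, b) colors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(color_a, color_b)))

region = (10, 200, 20)
tiles = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}

distances = {name: distance(color, region) for name, color in tiles.items()}
best = min(distances, key=distances.get)  # name with the smallest distance
print(distances)
print(best)
```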


## Hints

- If you need to remember the format of `df_tiles`, you can find that in Section 5.
- You should add an extra value `df["dist"] = ...` to your DataFrame.
- Once you have the distance calculated, how do you return just the smallest row?

In [None]:
# Returns a single row for the tile that is the best match given an r_avg, g_avg, and b_avg
def findBestTile(df_tiles, r_avg, g_avg, b_avg):
  # Euclidean distance between each tile's average color and the target color:
  df_tiles["dist"] = ((df_tiles["r"] - r_avg)**2 + (df_tiles["g"] - g_avg)**2 + (df_tiles["b"] - b_avg)**2) ** 0.5
  min_distance = df_tiles.nsmallest(1, "dist")

  return min_distance

In [None]:
import DISCOVERY
# == TEST CASE for Section 7 ==
# - Checks to ensure the DataFrame contains the correct variables and pixel data
DISCOVERY.run_test_case_7(findBestTile)

✅ Test case #1 (r=0, g=0, b=0) passed!
✅ Test case #1 (r=47, g=49, b=38) passed!
✅ Test case #1 (r=54, g=49, b=38) passed!
✅ Test case #1 (r=54, g=49, b=52) passed!
✅ Test case #1 (r=-100, g=-100, b=-100) passed!
🎉 All tests passed! 🎉


<hr>

# Section 8: Your Mosaic!

Time to put everything together!

First, let's define some variables that you can configure to make your mosaic uniquely yours:

In [None]:
# What is your base image file name?
baseImageFileName = "base.jpg"

# What folder contains your tile images?
# - You can change this so you can have multiple different folders of tile images.
tileImageFolder = "tiles"

# What is the maximum number of tiles your mosaic should use across?
# - More tiles across will increase the quality of the final image.
# - More tiles across will cause your program to run slower.
# ...if you have bugs, start this value low (it won't look great, but it will make it run fast!)
# ...a value around 200 usually looks quite good, but play around with this number!
maximumTilesX = 700

# What height should your tiles be in your mosaic?
# - A larger tile image will result in a larger output file.
# - A larger tile image will result in your program running slower.
# - A larger tile image will result in more detail in the output file.
tileHeight = 32

## Now create your mosaic!

Run the code to create your mosaic.

- This **WILL** take a bit of time (even more on slower/older laptops).
- This will run fastest if your laptop is plugged in (when it's unplugged, your laptop will try and save power and may not run at full speed).

In [None]:
print(f"Creating `df_tiles` from tile images in folder `{tileImageFolder}`...")
df_tiles = createTilesDataFrame(tileImageFolder)
print(f"...found {len(df_tiles)} tile images!")
df_tiles

Creating `df_tiles` from tile images in folder `tiles`...
...found 1805 tile images!


|   | fileName | r | g | b |
|---|---|---|---|---|
| 0 | tiles/10009374_1035844769820800_1616234408_n.png | 121.041250 | 136.222344 | 139.622344 |
| 1 | tiles/10349723_1514716235503681_1430105255_n.png | 196.766234 | 203.713611 | 209.111823 |
| 2 | tiles/10454218_890581817629033_1336316294_n.png | 129.087813 | 102.122344 | 99.024375 |
| 3 | tiles/10553974_1427040460927415_768194062_n.png | 173.727812 | 162.035000 | 159.472813 |
| 4 | tiles/10569992_836719873077634_1324728268_n.png | 95.450312 | 74.631094 | 77.099844 |
| ... | ... | ... | ... | ... |
| 1800 | tiles/929215_352643904900366_812094193_n-9-995... | 125.584594 | 124.698015 | 117.237240 |
| 1801 | tiles/929215_352643904900366_812094193_n-9-996... | 135.564404 | 125.313885 | 118.863920 |
| 1802 | tiles/929215_352643904900366_812094193_n-9-997... | 140.856821 | 104.608380 | 101.328428 |
| 1803 | tiles/929215_352643904900366_812094193_n-9-998... | 98.680156 | 67.394688 | 52.479062 |


In [None]:
print(f"Loading your base image `{baseImageFileName}`...")
baseImage = DISCOVERY.loadImage(baseImageFileName)
width = len(baseImage)
height = len(baseImage[0])


print(f"Finding best replacement image for each tile...")
# Find the pixelsPerTile to know the pixels used in the base image per mosaic tile:
import math

pixelsPerTile = int(math.ceil(width / maximumTilesX))
width = int(math.floor(width / pixelsPerTile) * pixelsPerTile)
height = int(math.floor(height / pixelsPerTile) * pixelsPerTile)
tilesX = int(width / pixelsPerTile)
tilesY = int(height / pixelsPerTile)

# Create the mosaic:
from PIL import Image
import sys
mosaic = Image.new('RGB', (int(tilesX * tileHeight), int(tilesY * tileHeight)))
for x in range(0, width, pixelsPerTile):
  for y in range(0, height, pixelsPerTile):
    avg_color = findAverageImageColorInBox(baseImage, x, y, pixelsPerTile, pixelsPerTile)
    replacement = findBestTile(df_tiles, avg_color["avg_r"], avg_color["avg_g"], avg_color["avg_b"])

    tile = DISCOVERY.getTileImage(replacement["fileName"].values[0], tileHeight)
    mosaic.paste(tile, (int(x / pixelsPerTile) * tileHeight, int(y / pixelsPerTile) * tileHeight))

  # Print out a progress message:
  curRow = int((x / pixelsPerTile) + 1)
  pct = (curRow / tilesX) * 100
  sys.stdout.write(f'\r  ...progress: {curRow * tilesY} / {tilesX * tilesY} ({pct:.2f}%)')

# Save it
mosaic.save('mosaic-hd.jpg')

# Save a smaller one (for posting):
import PIL
d = max(width, height)
factor = d / 4000
if factor <= 1: factor = 1

small_w = width / factor
small_h = height / factor    
baseImage = mosaic.resize( (int(small_w), int(small_h)), resample=PIL.Image.LANCZOS )
baseImage.save('mosaic-web.jpg')

# Print a message:
tada = "\N{PARTY POPPER}"
print("")
print("")
print(f"{tada} MOSAIC COMPLETE! {tada}")
print("- See `mosaic-hd.jpg` to see your HQ mosaic! (The file may be HUGE.)")
print("- See `mosaic-web.jpg` to see a mosaic best suited for the web (still big, but not HUGE)!")

Loading your base image `base.jpg`...
Finding best replacement image for each tile...
  ...progress: 338688 / 338688 (100.00%)

🎉 MOSAIC COMPLETE! 🎉
- See `mosaic-hd.jpg` to see your HQ mosaic! (The file may be HUGE.)
- See `mosaic-web.jpg` to see a mosaic best suited for the web (still big, but not HUGE)!
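
The tile-grid arithmetic in the cell above can be traced by hand. The sketch below assumes a hypothetical 4032x3024 base image (the real dimensions of `base.jpg` are not stated here) together with the `maximumTilesX = 700` and `tileHeight = 32` settings from the configuration cell.

```python
import math

# Hypothetical base image size; the settings come from the configuration cell.
width, height = 4032, 3024
maximumTilesX = 700
tileHeight = 32

# Each mosaic tile replaces a square of this many base-image pixels.
pixelsPerTile = int(math.ceil(width / maximumTilesX))

# Crop the working area so it divides evenly into tiles.
width = int(math.floor(width / pixelsPerTile) * pixelsPerTile)
height = int(math.floor(height / pixelsPerTile) * pixelsPerTile)
tilesX = int(width / pixelsPerTile)
tilesY = int(height / pixelsPerTile)

mosaic_size = (tilesX * tileHeight, tilesY * tileHeight)
print(pixelsPerTile, tilesX, tilesY, tilesX * tilesY, mosaic_size)
```

Under these assumed dimensions the grid is 672 by 504 tiles (338,688 in total) and the full-size mosaic is 21504 by 16128 pixels, which is why the output file can be huge.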


<hr>

# Section 9: Extra Credit

So your mosaic is fantastic -- but I think you can make it **even MORE fantastic**!  If you have ideas of how to improve your mosaic, use the following cells to re-program a function or otherwise change your logic and then re-create your mosaic.

If you're not sure and want some inspiration, visit **#project-extra-credit** on the DISCOVERY Discord.  We'll work together on ideas and look at the **pinned messages** on the channel for suggestions that we find to be really good ways of improving the mosaic!  (We have a few in mind as we write this, but there's probably even more!  You should aim to have fun!)

In [None]:
# Use these cells to add your extra credit code.
# ...and feel free to add more cells! :)

<hr>


# Section 10: Submission

Make sure to turn in your project:

```
git add -A
git commit -m "project1"
git push
```