## Mining Sherwin-Williams' Color Wall

Using Selenium, BeautifulSoup and a little bit of patience!

In [0]:
!pip install selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

Collecting selenium
  Downloading https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)
Installing collected packages: selenium
Successfully installed selenium-3.141.0

## Prepare Selenium

After installing chromedriver for Chromium, we configure Selenium to drive a headless Chrome session. We need a real browser because the Sherwin-Williams color wall page is generated dynamically by JavaScript: a simple `GET` request would return only the page's JavaScript code, not the rendered color patches.

In [0]:
from selenium import webdriver
from bs4 import BeautifulSoup
import sys
import re

In [0]:
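# Common Colab recipe. Note that sys.path is Python's module search path;
# Selenium finds the driver binary through the shell PATH (/usr/bin, see above).
sys.path.insert(0, '/usr/lib/chromium-browser/chromedriver')

options = webdriver.ChromeOptions()
options.add_argument('--headless')               # no visible browser window
options.add_argument('--no-sandbox')             # Chrome needs this when running as root (as on Colab)
options.add_argument('--disable-dev-shm-usage')  # use /tmp instead of the small /dev/shm in containers

# Selenium 3 signature: the first positional argument is the driver path.
# Selenium 4 takes it as service=Service('chromedriver') instead
# (Service lives in selenium.webdriver.chrome.service).
chrome = webdriver.Chrome(
    'chromedriver',
    options=options
)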

## Extract HTML Code

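When Selenium navigates to the page, the browser executes the page's JavaScript. We can then retrieve the generated HTML with `execute_script`, which runs a JavaScript snippet of our own (here, `return document.body.innerHTML`) inside the page and returns the result to Python.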

In [0]:
color_wall_webpage = (
    'https://www.sherwin-williams.com/visualizer#/active/color-wall'
)

chrome.get(color_wall_webpage)
color_wall_webpage_html = chrome.execute_script(
    "return document.body.innerHTML"
)

# initialize BS to run on color_wall_webpage_html:
soup = BeautifulSoup(color_wall_webpage_html, 'html.parser')
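This is where the patience comes in. `chrome.get` waits for the initial page load, but a JavaScript app can keep rendering after that, so the HTML grabbed above may not contain any color chips yet. One option is to poll until it does. Below is a minimal standard-library sketch of such a polling loop; `wait_until` is a hypothetical helper, not part of Selenium or of the original notebook.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate() until it returns a truthy value or `timeout` seconds pass.

    Returns that truthy value; raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(interval)  # give the page's JavaScript time to render
```

For example, calling `wait_until(lambda: 'sw-color-chip' in chrome.execute_script('return document.body.innerHTML'))` before grabbing the HTML would block until at least one chip is present. Selenium also ships `WebDriverWait` (in `selenium.webdriver.support.ui`), whose `until` method polls a condition in the same way.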

## Retrieve Color Patches

Luckily, all color patches in the Color Wall have a `sw-color-chip` attribute set to `::color`.
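To make the selection rule concrete, here is a minimal standard-library sketch of the same attribute match using `html.parser`, run on a hypothetical two-element snippet. The markup, and in particular the `background-color: rgb(...)` style format, is an assumption; the notebook only tells us that each chip's `style` contains a parenthesized RGB triple.

```python
from html.parser import HTMLParser


class ColorChipParser(HTMLParser):
    """Collect the attributes of every <div sw-color-chip="::color"> element."""

    def __init__(self):
        super().__init__()
        self.chips = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Exact-value match, like the soup.find_all call in the next cell
        if tag == 'div' and attributes.get('sw-color-chip') == '::color':
            self.chips.append(attributes)


# Hypothetical markup: the real page's structure is assumed, not copied
sample_html = (
    '<div sw-color-chip="::color" title="SW 6840 Exuberant Pink" '
    'style="background-color: rgb(181, 77, 127);"></div>'
    '<div sw-color-chip="::other" title="Not a chip"></div>'
)

parser = ColorChipParser()
parser.feed(sample_html)
print([chip['title'] for chip in parser.chips])  # ['SW 6840 Exuberant Pink']
```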

In [0]:
color_patches = soup.find_all('div', {'sw-color-chip': "::color"})

colors = {}
for color_patch in color_patches:
    title = color_patch.get('title')
    # retrieve exactly the "(r, g, b)" part of the inline style
    rgb = re.search(r'[(][^)]+[)]', color_patch.get('style') or '')

    # filter any potential noise (patch without a color title or rgb value)
    if title and rgb:
        colors[title] = rgb.group(0)
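The pattern `[(][^)]+[)]` keeps the parentheses and returns the triple as a string, which is why the cleanup step below has to slice and split it. As a sketch of an alternative, assuming the style holds a plain `rgb(r, g, b)` value (not `rgba`), a hypothetical `parse_rgb` helper can capture the three channels as integers in one go.

```python
import re

# Hypothetical inline style; the real attribute may contain other declarations
style = 'background-color: rgb(181, 77, 127);'

# The notebook's pattern keeps the parentheses
print(re.search(r'[(][^)]+[)]', style).group(0))  # (181, 77, 127)


def parse_rgb(style):
    """Return the channels of the first rgb(r, g, b) in a style string as ints, or None."""
    match = re.search(r'rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)', style)
    return tuple(int(channel) for channel in match.groups()) if match else None


print(parse_rgb(style))  # (181, 77, 127)
```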

In [0]:
colors

{'SW 6840 Exuberant Pink': '(181, 77, 127)',
 'SW 6855 Dragon Fruit': '(204, 97, 127)',
 'SW 6862 Cherries Jubilee': '(171, 60, 81)',
 'SW 6861 Radish': '(164, 46, 65)',
 'SW 6866 Heartthrob': '(168, 46, 51)',
 'SW 6868 Real Red': '(191, 45, 50)',
 'SW 6871 Positive Red': '(173, 44, 52)',
 'SW 6561 Teaberry': '(235, 209, 219)',
 'SW 6568 Lighthearted Pink': '(237, 213, 221)',
 'SW 6575 Priscilla': '(241, 211, 218)',
 'SW 6582 Impatiens Petal': '(241, 210, 215)',
 'SW 6589 Alyssum': '(242, 213, 215)',
 'SW 6596 Bella Pink': '(241, 198, 196)',
 'SW 6603 Oleander': '(242, 204, 197)',
 'SW 6562 Irresistible': '(227, 192, 207)',
 'SW 6569 Childlike': '(232, 192, 207)',
 'SW 6576 Azalea Flower': '(239, 192, 203)',
 'SW 6583 In the Pink': '(240, 188, 201)',
 'SW 6590 Loveable': '(240, 193, 198)',
 'SW 6597 Hopeful': '(240, 179, 178)',
 'SW 6604 Youthful Coral': '(240, 175, 168)',
 'SW 6563 Rosebay': '(203, 154, 173)',
 'SW 6570 Haute Pink': '(216, 153, 177)',
 'SW 6577 Jaipur Pink': '(227, 14

## Cleaning Up

As you can see, each color name is prefixed with the Sherwin-Williams initials (SW) and a color number. We also want to convert the RGB values to hexadecimal for convenience.

In [0]:
def rgb_to_hex(rgb):
  r, g, b = [int(_) for _ in rgb]
  # :02x renders each channel as two lowercase hex digits, zero-padded
  return f'#{r:02x}{g:02x}{b:02x}'
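To sanity-check the conversion, here is a sketch of a hypothetical inverse, `hex_to_rgb`, together with a round trip on the first color from the output above (SW 6840 Exuberant Pink). The block repeats `rgb_to_hex` so that it runs on its own.

```python
def rgb_to_hex(rgb):
    # Copy of the notebook's converter so this block is self-contained
    r, g, b = (int(c) for c in rgb)
    return f'#{r:02x}{g:02x}{b:02x}'


def hex_to_rgb(hexcode):
    """Hypothetical inverse of rgb_to_hex: '#b54d7f' -> (181, 77, 127)."""
    digits = hexcode.lstrip('#')
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 6, 2))


# Split values keep their leading spaces; int() ignores them
assert rgb_to_hex(['181', ' 77', ' 127']) == '#b54d7f'
assert hex_to_rgb('#b54d7f') == (181, 77, 127)
```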

In [0]:
# clean the color names:
clean_colors = {
    re.sub(r'^SW[\d\s]+', r'', color_name):
      # strip the parentheses and split into ['r', ' g', ' b']:
      rgb_to_hex(color_value[1:-1].split(','))

    for color_name, color_value in colors.items()
}
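The cell above discards the SW number. If you would rather keep it (for example, as an extra CSV column), a sketch with named groups can split it off instead; `SW_NAME` and `split_sw_name` are hypothetical names, and the format `SW <number> <name>` is inferred from the output above.

```python
import re

# Hypothetical variant: capture the SW number instead of discarding it
SW_NAME = re.compile(r'^SW\s+(?P<number>\d+)\s+(?P<name>.+)$')


def split_sw_name(title):
    """Split 'SW 6840 Exuberant Pink' into ('6840', 'Exuberant Pink')."""
    match = SW_NAME.match(title)
    if match is None:
        return None, title  # no SW prefix: keep the whole title as the name
    return match.group('number'), match.group('name')


print(split_sw_name('SW 6840 Exuberant Pink'))  # ('6840', 'Exuberant Pink')
```

Because the number and the name are matched separately, a name that itself began with a digit would stay intact, whereas `[\d\s]+` would also consume those leading digits.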

In [0]:
clean_colors

## Convert to CSV for portability

In [0]:
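import csv

# csv.writer quotes any field that contains a comma or a quote character
with open('sherwin-williams.csv', 'w', newline='', encoding='utf-8') as f:
  writer = csv.writer(f)
  writer.writerow(['name', 'hex'])
  for name, hexcode in clean_colors.items():
    writer.writerow([name, hexcode])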
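To check that the file round-trips, here is a sketch that reads it back into a `{name: hex}` dictionary with `csv.DictReader`. An in-memory `io.StringIO` stands in for `open('sherwin-williams.csv', newline='', encoding='utf-8')`, and its second row is hypothetical, included to show that a quoted field containing a comma comes back as a single value.

```python
import csv
import io

# Stand-in for the real file; the second row is a made-up example
sample_csv = io.StringIO(
    'name,hex\r\n'
    'Exuberant Pink,#b54d7f\r\n'
    '"Hypothetical, With Comma",#000fff\r\n'
)

# DictReader uses the header row as keys for each record
palette = {row['name']: row['hex'] for row in csv.DictReader(sample_csv)}
print(palette['Exuberant Pink'])  # #b54d7f
```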