## Mining Color Names and their Hex-Codes

Using BeautifulSoup!

In [0]:
import re
import requests
from bs4 import BeautifulSoup

## Extract HTML Code

We download the page with `requests` and hand the returned HTML to BeautifulSoup for parsing. No browser is involved, so only markup present in the server's response is available.

In [0]:
colornames = requests.get('http://mkweb.bcgsc.ca/colornames/').content

# initialize BeautifulSoup on the downloaded HTML:
soup = BeautifulSoup(colornames, 'html.parser')

## Mining Color Patches with BeautifulSoup

Luckily, all patches are tagged with `class="patch"`. The color name, however, sits in the parent `div`, which has the class `swatch`.

In [0]:
# take only swatches with exactly one child (they contain the colors):
color_swatches = filter(
    lambda s: len(s.findChildren()) == 1,
    soup.find_all('div', class_='swatch')
)

colors = {}
for swatch in color_swatches:
  # remove the child text from the parent:
  child = swatch.findChildren()[0]
  color_name = swatch.text[len(child.text):]
  # get the inline style (e.g. 'background:#FEE799;') from the child:
  color_code = child.get('style')
  
  if not color_code:
    continue

  colors[color_name] = color_code

In [0]:
colors

{'beer srm 01 rgb=254,231,153 #FEE799 lch=92,41,94 vis[8881][249,228,150](1.0)': 'background:#FEE799;',
 'beer srm 02 rgb=253,217,121 #FDD979 lch=88,51,89 la_luna[4899][255,216,122](1.0)': 'background:#FDD979;',
 'beer srm 03 rgb=253,203,90 #FDCB5A lch=84,62,84 golden_tainoi[3927][255,204,92](1.0)': 'background:#FDCB5A;',
 'beer srm 04 rgb=252,193,67 #FCC143 lch=81,69,82 PMS136[47][252,191,73](3.2)': 'background:#FCC143;',
 'beer srm 05 rgb=247,179,36 #F7B324 lch=77,76,80 filmpro_golden_yellow[3612][247,181,45](2.4)': 'background:#F7B324;',
 'beer srm 06 rgb=246,168,3 #F6A803 lch=75,80,77 sun[8121][251,172,19](1.4)': 'background:#F6A803;',
 'beer srm 07 rgb=238,158,3 #EE9E03 lch=71,78,75 orange[6199][238,154,0](3.2)': 'background:#EE9E03;',
 'beer srm 08 rgb=230,146,3 #E69203 lch=68,76,72 PMS130_2X[39][226,145,0](1.4)': 'background:#E69203;',
 'beer srm 09 rgb=227,138,4 #E38A04 lch=65,75,69 PMS144[60][226,140,5](1.7)': 'background:#E38A04;',
 'beer srm 10 rgb=217,126,3 #D97E03 lch=61,7

## Time To Clean!

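Strip the `background:` prefix from the color values and lowercase them. In the names, drop everything from the first `#` onward and remove square brackets. Entries whose name starts with `PMS` or still contains `rgb=` are skipped altogether.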

In [0]:
clean_colors = {}
for name, code in colors.items():
  name = name[:name.find('#')].strip()
  name = re.sub(r'[\[\]]', r'', name)

  if name.startswith('PMS'):
    continue  # skip PMS entries

  if 'rgb=' in name:
    continue  # skip names that still carry raw rgb= data

  code = re.sub(r'background:\s?', '', code.lower())

  if code.strip().startswith('#'):
    clean_colors[name] = code.strip().strip(';')
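The cleaning steps can also be sketched as a single function that handles one entry at a time, which makes them easy to try on individual strings. This is a variation, not the cell above: `clean_entry` is a hypothetical helper, it uses `str.partition` so that a name without a `#` is kept whole, and it assumes every usable style contains a six-digit hex code. The sample inputs in the usage lines are illustrative.

```python
import re

def clean_entry(raw_name, raw_style):
    """Return (name, hexcode), or None if the entry should be skipped."""
    # keep only the text before the first '#'; without a '#', the name stays intact
    name = raw_name.partition('#')[0].strip()
    # drop square brackets, e.g. 'black [3]' -> 'black 3'
    name = re.sub(r'[\[\]]', '', name)

    # same skip rules as the loop above
    if name.startswith('PMS') or 'rgb=' in name:
        return None

    # assumption: the style holds a six-digit hex code such as '#fee799'
    match = re.search(r'#[0-9a-f]{6}\b', raw_style.lower())
    if match is None:
        return None
    return name, match.group(0)

# illustrative inputs, not taken from the scraped page
example_kept = clean_entry('black [3]', 'background: #1E1E1E;')
example_skipped = clean_entry('beer srm 01 rgb=254,231,153 #FEE799', 'background:#FEE799;')
```

Here `example_kept` holds a cleaned pair, while `example_skipped` is `None` because the name still contains `rgb=`.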

In [0]:
clean_colors

{'black 3': '#1e1e1e',
 'grey 105': '#e3e3e3',
 'merlot 3': '#73343a',
 'tundora 2': '#585452',
 'horoscope': '#43373a',
 'wine 2': '#722f37',
 'loyal': '#28191d',
 'arthouse': '#4d202e',
 'verve': '#5d2e3c',
 'extrovert': '#752642',
 'encore': '#54162c',
 'siren 2': '#7a013a',
 'black sheep': '#3e393a',
 'spitfire': '#381921',
 'black rose 2': '#532934',
 'tempo': '#462b31',
 'wine berry 2': '#591d35',
 'dark crimson 2': '#592734',
 'velocity': '#4b4042',
 'double caffeine': '#332729',
 'castro 2': '#44232f',
 'temptress 2': '#3b000b',
 'aubergine 3': '#3d0734',
 'red earth': '#462028',
 'ravishing': '#401a22',
 'deep raspberry': '#590021',
 'rendezvous': '#561426',
 'afficionado': '#453638',
 'double sidewinder': '#473f40',
 'jumpstart': '#3e2025',
 'chocolate 9': '#cd661d',
 'persian red 3': '#cc3333',
 'attitude': '#532b32',
 'jon 2': '#3b1f1f',
 'hot wired': '#6f2637',
 'mayhem': '#4d1b25',
 'burgundy 5': '#900020',
 'livid brown 2': '#312a29',
 'platypus': '#3d3334',
 'cab sav 2'

## Convert to CSV for portability

In [0]:
import csv

with open('mkweb-colornames.csv', 'w', newline='') as f:
  writer = csv.writer(f)  # quotes any name that contains a comma
  writer.writerow(['name', 'hex'])
  writer.writerows(clean_colors.items())
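To check that such a file reads back correctly, a round trip through the standard `csv` module can be sketched as follows. It writes to an in-memory buffer instead of `mkweb-colornames.csv`, and the `sample` dictionary is made up for illustration; its second name is hypothetical and contains a comma on purpose.

```python
import csv
import io

# illustrative data; 'red, dark' is a made-up name with a comma in it
sample = {'black 3': '#1e1e1e', 'red, dark': '#8b0000'}

# write header and rows to an in-memory buffer
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['name', 'hex'])
writer.writerows(sample.items())

# read the rows back, keyed by the header names
buffer.seek(0)
restored = {row['name']: row['hex'] for row in csv.DictReader(buffer)}
```

Because `csv.writer` quotes fields that contain the delimiter, `restored` should equal `sample` even for the name with a comma.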