<a href="https://colab.research.google.com/github/harshithareddy0306/Smart-Tutor-AI-AI-Driven-Personalized-Teaching-Support/blob/main/Colors_by_Allison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Colors By Allison

Thank you, Allison Parrish.

https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469

## Terms

### Word Embeddings

A way of representing words as vectors in a multi-dimensional space, where the distance and direction between vectors reflect the similarity and relationships among the corresponding words.

https://www.ibm.com/topics/word-embeddings
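
To make "distance and direction between vectors" concrete, here is a minimal sketch that compares made-up three-dimensional vectors with cosine similarity. The vectors in `toy_vectors` are invented for illustration and do not come from a trained model; `cosine_similarity` is a helper name chosen here.

```python
import math

# Toy 3-dimensional "embeddings"; the numbers are invented for illustration
# and do not come from any trained model.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_king_apple = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])
print(sim_king_queen, sim_king_apple)
```

With these toy numbers, "king" scores closer to "queen" than to "apple", which is the kind of relationship a real embedding is meant to capture.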

### Word2Vec

Word2vec is a technique in natural language processing for obtaining vector representations of words. These vectors capture information about the meaning of the word based on the surrounding words. The word2vec algorithm estimates these representations by modeling text in a large corpus.

https://en.wikipedia.org/wiki/Word2vec

https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469

* Turning words into numbers that can be measured and compared
* Word embeddings
* Neural Networks, Backpropagation, ArgMax, SoftMax, and Cross Entropy

### Neural Network

A machine learning method that uses interconnected nodes, or neurons, arranged in layers to process data in a way loosely inspired by the human brain.

### Backpropagation

Computes how much each weight contributed to the network's error by propagating that error backward from the output layer (using the chain rule), then adjusts the weights to reduce the error, e.g. with gradient descent.
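
As a rough illustration, the loop below trains the smallest possible "network", a single weight `w` with prediction `w * x`, on one made-up example. The learning rate and data are assumptions chosen so the arithmetic is easy to follow; real networks repeat the same forward and backward steps across many weights and layers.

```python
# One-weight "network": prediction = w * x, loss = (prediction - target) ** 2
x, target = 2.0, 6.0      # a single made-up training example (the ideal w is 3.0)
w = 0.0                   # initial weight
learning_rate = 0.1       # assumed step size

for step in range(20):
    prediction = w * x            # forward pass
    error = prediction - target
    loss = error ** 2
    grad_w = 2 * error * x        # backward pass: d(loss)/dw by the chain rule
    w -= learning_rate * grad_w   # gradient descent update

print(w)
```

Each pass measures the error, works out how the weight contributed to it, and nudges the weight in the direction that lowers the loss, so `w` moves toward 3.0.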

### Argmax

Returns the indices of the max value along an axis
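
A short example using NumPy's `np.argmax` on a small hand-written array of scores; the variable names are chosen here for illustration.

```python
import numpy as np

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.9]])

overall = np.argmax(scores)            # index into the flattened array
per_row = np.argmax(scores, axis=1)    # index of the largest value in each row
per_col = np.argmax(scores, axis=0)    # index of the largest value in each column
print(overall, per_row, per_col)
```

Without `axis`, the array is flattened first, so the result is a single position; with `axis`, there is one index per row or per column.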

### Softmax

A function that converts a vector of raw scores into a vector of probabilities that sum to 1, one for each possible outcome, such as each class in a classification task.
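
A minimal sketch of softmax in NumPy, assuming a one-dimensional vector of scores. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()   # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax([2.0, 1.0, 0.1])
print(probs, probs.sum())
```

The largest score receives the largest probability, so applying argmax to `probs` picks the same outcome as applying it to the raw scores.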

### Cross Entropy

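Measures the difference between two probability distributions, typically the true labels and the predicted probabilities. Entropy is the average number of bits required to transmit a randomly selected event from a probability distribution; cross entropy is the average number of bits required when the events come from one distribution but the code is built for another. (The similarly named cross-entropy method, a Monte Carlo method for importance sampling, is a separate technique.)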
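
As a rough illustration, entropy and cross entropy can be computed in bits for small hand-written distributions. The function names and the example distributions below are assumptions made for this sketch.

```python
import math

def entropy(p):
    """Average number of bits needed to encode events drawn from p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Average bits needed when events follow p but the code is built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

true_dist = [1.0, 0.0, 0.0]        # one-hot label: the first class is correct
good_guess = [0.8, 0.1, 0.1]
bad_guess = [0.2, 0.4, 0.4]

print(entropy([0.5, 0.5]))          # a fair coin
print(cross_entropy(true_dist, good_guess))
print(cross_entropy(true_dist, bad_guess))
```

A prediction that puts more probability on the correct class gives a lower cross entropy, which is why it is commonly used as a loss for classifiers.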





### Similarity

* A neural network can be used to assign different values to the same word used in different contexts
* Activation functions and associated weights are used for each value assigned to a word
* Depending on the context, the word with the largest output value is chosen (see Argmax)
* Since a word has multiple values (a vector), words can be 'plotted' as points, which shows graphically how similar each word is to the others

### CBOW

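Continuous Bag of Words (CBOW) uses the surrounding context words to predict a target word. It is a type of unsupervised learning: it learns from unlabeled text, with the target words in the text itself serving as the labels.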

### Skip Gram

Predicts context words given a target word
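
CBOW and skip-gram train on the same sliding windows of text, with input and label swapped. A minimal sketch of how the training examples could be built from a sentence, assuming a window of two words on each side (`context_windows` is a hypothetical helper, not part of any library):

```python
def context_windows(tokens, window=2):
    """Yield (target, context_words) for each position in tokens."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right

tokens = "the quick brown fox jumps".split()

# CBOW: the context words are the input, the target word is the label.
cbow_examples = [(context, target) for target, context in context_windows(tokens)]

# Skip-gram: the target word is the input, each context word is a label.
skipgram_pairs = [(target, word)
                  for target, context in context_windows(tokens)
                  for word in context]

print(cbow_examples[2])
print(skipgram_pairs[:3])
```

CBOW produces one example per position, while skip-gram produces one pair per (target, context word) combination.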

### Negative Sampling

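A training shortcut for word2vec: instead of updating the weights for every word in the vocabulary, the model is shown each real (target, context) pair alongside a few randomly sampled 'negative' words. It aims to maximize the similarity of words that appear in the same context and minimize it for words that do not.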
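
A sketch of the loss for one training pair under negative sampling, assuming the usual skip-gram formulation (a sigmoid over dot products). The two-dimensional vectors are hand-picked for illustration rather than learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_loss(target_vec, context_vec, negative_vecs):
    """Loss for one real (target, context) pair and k sampled negative words."""
    positive_term = -np.log(sigmoid(context_vec @ target_vec))
    negative_term = -np.sum(np.log(sigmoid(-(negative_vecs @ target_vec))))
    return positive_term + negative_term

# Hand-picked 2-dimensional vectors, purely illustrative.
target = np.array([1.0, 0.5])
context = np.array([0.9, 0.6])             # points the same way as the target
negatives = np.array([[-1.0, -0.4],
                      [-0.8, -0.7]])       # point away from the target

aligned_loss = negative_sampling_loss(target, context, negatives)
swapped_loss = negative_sampling_loss(target, negatives[0], np.array([context, context]))
print(aligned_loss, swapped_loss)
```

The loss is low when the real context vector points the same way as the target and the sampled negatives point away, and it rises when those roles are swapped.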

## The Data

In [None]:
# source = https://gist.github.com/aparrish/2f562e3737544cf29aaf1af30362f469 by Allison Parrish
import urllib.request, json

url = 'https://raw.githubusercontent.com/gitmystuff/Datasets/main/xkcd.json'
with urllib.request.urlopen(url) as response:  # avoid shadowing the url string
    color_data = json.load(response)

type(color_data)

dict

In [None]:
color_data.keys()

dict_keys(['description', 'colors'])

In [None]:
print(color_data['description'])
print(color_data['colors'][0:5])

The 954 most common RGB monitor colors, as defined by several hundred thousand participants in the xkcd color name survey.
[{'color': 'darker blue', 'hex': '#011288'}, {'color': 'darker green', 'hex': '#087804'}, {'color': 'green again', 'hex': '#16d43f'}, {'color': 'darker purple', 'hex': '#5f1b6b'}, {'color': 'darker pink', 'hex': '#c4387f'}]


In [None]:
import math

def hex_to_int(s):
    # '#e50000' -> (229, 0, 0, '#e50000'); the original hex string is kept as the 4th item
    h = s
    s = s.lstrip("#")
    return int(s[:2], 16), int(s[2:4], 16), int(s[4:6], 16), h

def distance(coord1, coord2):
    # note, this is VERY SLOW, don't use for actual code
    coord1 = coord1[:3]
    coord2 = coord2[:3]
    return math.sqrt(sum([(i - j)**2 for i, j in zip(coord1, coord2)]))

def subtractv(coord1, coord2):
    coord1 = coord1[:3]
    coord2 = coord2[:3]
    return [c1 - c2 for c1, c2 in zip(coord1, coord2)]

def addv(coord1, coord2):
    coord1 = coord1[:3]
    coord2 = coord2[:3]
    return [c1 + c2 for c1, c2 in zip(coord1, coord2)]

def meanv(coords):
    # assumes every item in coords has same length as item 0
    sumv = [0] * len(coords[0])
    for item in coords:
        for i in range(len(item)):
            sumv[i] += item[i]
    mean = [0] * len(sumv)
    for i in range(len(sumv)):
        mean[i] = float(sumv[i]) / len(coords)
    return mean

def closest(space, coord, n=5):
    # names of the n colors in space nearest to coord (r, g, b)
    coord = coord[:3]
    return sorted(space.keys(), key=lambda x: distance(coord, space[x]))[:n]

In [None]:
colors = dict()
for item in color_data['colors']:
    colors[item['color']] = hex_to_int(item['hex'])

print('olive', colors['olive'])
print('red', colors['red'])
print('black', colors['black'])

olive (110, 117, 14, '#6e750e')
red (229, 0, 0, '#e50000')
black (0, 0, 0, '#000000')


In [None]:
print('darker purple', colors['darker purple'])

darker purple (95, 27, 107, '#5f1b6b')


In [None]:
closest(colors, colors['red'])

['red', 'fire engine red', 'bright red', 'tomato red', 'cherry red']
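
The `distance` helper above is flagged as very slow. One possible faster variant, sketched here with NumPy, computes all distances in a single vectorized step. `closest_np` and `sample_colors` are hypothetical names; `sample_colors` is a tiny stand-in for the full `colors` dict so the block runs on its own.

```python
import numpy as np

def closest_np(space, coord, n=5):
    """Return the n names in space whose RGB values are nearest to coord."""
    names = list(space.keys())
    # Keep only the first three entries (r, g, b) of each value.
    rgb = np.array([space[name][:3] for name in names], dtype=float)
    target = np.array(coord[:3], dtype=float)
    dists = np.linalg.norm(rgb - target, axis=1)   # Euclidean distance per color
    order = np.argsort(dists, kind="stable")[:n]   # stable keeps dict order on ties
    return [names[i] for i in order]

# A tiny stand-in for the full `colors` dict, using values printed above.
sample_colors = {
    "olive": (110, 117, 14, "#6e750e"),
    "red": (229, 0, 0, "#e50000"),
    "black": (0, 0, 0, "#000000"),
    "darker purple": (95, 27, 107, "#5f1b6b"),
}

print(closest_np(sample_colors, sample_colors["red"], n=2))
```

Passing the full `colors` dict instead of `sample_colors` should give the same ranking as `closest`, since both sort by Euclidean distance in RGB space.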

In [None]:
# check version
import plotly
plotly.__version__

'5.15.0'

In [None]:
import numpy as np
import pandas as pd

def rgb_max(row):
    # name of the channel ('red', 'green', or 'blue') with the largest value
    return row.index[np.argmax(row)]

df = pd.DataFrame(colors).transpose()
df.columns = ['red', 'green', 'blue', 'hex']
# df['group'] = df.apply(lambda x: df.columns[x.argmax()], axis = 1)
df['group'] = df[['red', 'green', 'blue']].apply(rgb_max, axis = 1)
df['color'] = df.index
df.head()

|  | red | green | blue | hex | group | color |
|---|---|---|---|---|---|---|
| darker blue | 1 | 18 | 136 | #011288 | blue | darker blue |
| darker green | 8 | 120 | 4 | #087804 | green | darker green |
| green again | 22 | 212 | 63 | #16d43f | green | green again |
| darker purple | 95 | 27 | 107 | #5f1b6b | blue | darker purple |
| darker pink | 196 | 56 | 127 | #c4387f | red | darker pink |


In [None]:
# plot the colors in 3 dimensions (red, green, blue); the legend is hidden for now
import plotly.express as px

fig = px.scatter_3d(df, x = 'red',
                    y = 'green',
                    z = 'blue',
                    hover_data = ['color', 'group'])

fig.update_traces(
    marker=dict(color=df['group']),  # group names ('red', 'green', 'blue') double as CSS color names
    selector=dict(mode='markers')
)
fig.update_layout(
    showlegend=False,
    scene = dict(
        xaxis = dict(title = ''),
        yaxis = dict(title = ''),
        zaxis = dict(title = '')
    )
)
fig.show()

In [None]:
# color each point with its actual hex value to compare color with position
import plotly.express as px

# color_map = {'red': 'red', 'green': 'green', 'blue': 'blue'}
fig = px.scatter_3d(df, x = 'red',
                    y = 'green',
                    z = 'blue',
                    hover_data = ['color', 'group'],
                    )

fig.update_traces(
    marker=dict(color=df['hex']),  # each point is drawn in its own color
    selector=dict(mode='markers')
)
fig.update_layout(
    showlegend=False,
    scene = dict(
        xaxis = dict(title = ''),
        yaxis = dict(title = ''),
        zaxis = dict(title = '')
    )
)
fig.show()

In [None]:
# subtract colors
closest(colors, subtractv(colors['purple'], colors['red']))

['darker blue', 'cobalt blue', 'royal blue', 'darkish blue', 'true blue']

In [None]:
# add colors
closest(colors, addv(colors['blue'], colors['green']))

['bright turquoise', 'bright light blue', 'bright aqua', 'cyan', 'neon blue']

In [None]:
# the average of black and white: medium grey
closest(colors, meanv([colors['black'][:3], colors['white'][:3]]))

['medium grey', 'purple grey', 'steel grey', 'battleship grey', 'grey purple']

In [None]:
# an analogy: pink is to red as X is to blue
pink_to_red = subtractv(colors['pink'], colors['red'])
closest(colors, addv(pink_to_red, colors['blue']))

['neon blue', 'bright sky blue', 'bright light blue', 'cyan', 'bright cyan']

In [None]:
# another analogy: navy is to blue as X is to green
navy_to_blue = subtractv(colors['navy'], colors['blue'])
closest(colors, addv(navy_to_blue, colors['green']))

['darker green',
 'true green',
 'dark grass green',
 'grassy green',
 'racing green']