<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Hello-splatters" data-toc-modified-id="Hello-splatters-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Hello splatters</a></span><ul class="toc-item"><li><span><a href="#splatter_raw" data-toc-modified-id="splatter_raw-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>splatter_raw</a></span></li></ul></li><li><span><a href="#splatter:-An-interface-that-does-more-for-you" data-toc-modified-id="splatter:-An-interface-that-does-more-for-you-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>splatter: An interface that does more for you</a></span><ul class="toc-item"><li><span><a href="#More-data-input-formats" data-toc-modified-id="More-data-input-formats-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>More data input formats</a></span></li><li><span><a href="#Lists-of-lists" data-toc-modified-id="Lists-of-lists-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Lists of lists</a></span></li><li><span><a href="#tagged-lists" data-toc-modified-id="tagged-lists-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>tagged lists</a></span></li><li><span><a href="#Figsize" data-toc-modified-id="Figsize-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Figsize</a></span></li></ul></li><li><span><a href="#nodeSize" data-toc-modified-id="nodeSize-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>nodeSize</a></span></li><li><span><a href="#alpha" data-toc-modified-id="alpha-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>alpha</a></span></li><li><span><a href="#Color" data-toc-modified-id="Color-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Color</a></span><ul class="toc-item"><li><span><a href="#Specifying-color" data-toc-modified-id="Specifying-color-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Specifying color</a></span></li><li><span><a href="#Giving-tags-color" data-toc-modified-id="Giving-tags-color-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Giving tags 
color</a></span></li></ul></li><li><span><a href="#More" data-toc-modified-id="More-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>More</a></span><ul class="toc-item"><li><span><a href="#Great-color-splatters" data-toc-modified-id="Great-color-splatters-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Great color splatters</a></span></li><li><span><a href="#Splatter-args" data-toc-modified-id="Splatter-args-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Splatter args</a></span></li></ul></li></ul></div>

# Hello splatters

A splatter is a way to visualize and interact with multi-dimensional, possibly tagged, data.

For those who like fancy-pants terms, know that it's a t-distributed stochastic neighbor embedding (t-SNE) happening in front of your eyes. 

Highlights:

You can splatter:
- a list of dicts (must have an 'fv', and optionally a 'tag')
- a `[fv,...]` list of fvs (themselves lists)
- a `{tag: fv_list, ...}` mapping

You can specify color
- with hex codes
- with color names and short-hands
- as a list of colors to pick from
- as a `{tag: color, ...}` mapping

You can specify size
- `figsize`, as `(height, width)` or just one number (meaning `height == width`). The unit is pixels.
- `nodeSize`, either in pixels (the radius of each circle) or as the proportion of the figure that all the points together should cover

You can also specify parameters of the t-SNE algorithm itself. 
For help and experimentation on how to do that, here's a 
[nice resource](https://distill.pub/2016/misread-tsne/).
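The t-SNE knob people tune most is `perplexity` (30 by default in `splatter_raw`'s signature). In standard t-SNE, it is 2 raised to the Shannon entropy (in bits) of each point's neighbor distribution, so it behaves like an effective number of neighbors. Here's a minimal sketch of that definition, independent of `oui`:

```python
import math

def perplexity(probs):
    """Perplexity 2**H(P) of a discrete distribution, with H measured in bits."""
    # Zero-probability entries contribute nothing to the entropy.
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy_bits

# A uniform distribution over k neighbors has perplexity k.
print(perplexity([0.25] * 4))  # 4.0
# A peaked distribution behaves like fewer effective neighbors.
print(perplexity([0.9, 0.05, 0.05]))
```

So `perplexity=30` roughly asks each point to pay attention to about 30 neighbors, which is why, as a rule of thumb, it should stay below the number of points you splatter.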

## splatter_raw

Let's have a look at a minimal example containing five tagged points:

In [1]:
from oui.splatter import splatter_raw

pts = [
    {'fv': [1, 2, 3], 'tag': 'foo'},
    {'fv': [1, 2, 4], 'tag': 'foo'},
    {'fv': [1, 3, 3], 'tag': 'foo'},
    {'fv': [2, 10, 11], 'tag': 'bar'},
    {'fv': [2, 10, 12], 'tag': 'bar'}]
splatter_raw(pts)

<IPython.core.display.Javascript object>


`fv` stands for "feature vector", which usually encodes some characteristics of items of interest.
When we're lazy, we'll refer to the set of `fv`s as "the data", or "the points".

Note that though the points (see?) lie in a 3-dimensional space, we've managed to squeeze them into 2 dimensions for your visual enjoyment.

When you do such an ungodly thing, you're bound to lose some of the original relationships, but the splatter tries to keep similar items close to each other: if two items were close (i.e. similar) in the original space, the algorithm will do its best to show them close together in the splatter. It makes no such effort for points that are far apart.
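To make "close by" concrete, here's a plain-Python check on the five points from the first example (no `oui` needed): points sharing a tag are much nearer to each other than to the other group, and those are the relationships the splatter tries to preserve.

```python
import math
from itertools import combinations

pts = [
    {'fv': [1, 2, 3], 'tag': 'foo'},
    {'fv': [1, 2, 4], 'tag': 'foo'},
    {'fv': [1, 3, 3], 'tag': 'foo'},
    {'fv': [2, 10, 11], 'tag': 'bar'},
    {'fv': [2, 10, 12], 'tag': 'bar'},
]

# Euclidean distance for every pair, split by whether the tags match.
same_tag, cross_tag = [], []
for a, b in combinations(pts, 2):
    d = math.dist(a['fv'], b['fv'])
    (same_tag if a['tag'] == b['tag'] else cross_tag).append(d)

print(f"largest same-tag distance:   {max(same_tag):.2f}")
print(f"smallest cross-tag distance: {min(cross_tag):.2f}")
```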

The `tag` of a point is optional. Note, though, that points with the same tag get the same color, and a point without a `tag` takes on the `untaggedColor` (by default, `'#444'`).

In [2]:
pts = [
    {'fv': [1, 2, 3], 'tag': 'foo'},
    {'fv': [1, 2, 4]},
    {'fv': [1, 3, 3]},
    {'fv': [2, 10, 11], 'tag': 'bar'},
    {'fv': [2, 10, 12]}]
splatter_raw(pts, untaggedColor='#111', nodeSize=3)

<IPython.core.display.Javascript object>

`splatter_raw` is the Python layer around the JavaScript code, and it doesn't do much more than forward the work. See its signature below for the controls you have.

In [3]:
from inspect import signature
signature(splatter_raw)

<Signature (pts, nodeSize=1, height=200, width=200, untaggedColor='#444', maxIterations=240, fps=60, fillColors=['#ff0000', '#00ffe6', '#ffc300', '#8c00ff', '#ff5500', '#0048ff', '#3acc00', '#ff00c8', '#fc8383', '#1fad8c', '#bbf53d', '#b96ef7', '#bf6a40', '#0d7cf2', '#6ef777', '#ff6699', '#a30000', '#004d45', '#a5750d', '#460080', '#802b00', '#000680', '#1d6600', '#660050'], dim=2, epsilon=50, perplexity=30, spread=10)>

Note that you can make the figure box and the nodes (points) bigger. Units are pixels.

In [5]:
import numpy as np

pts = [{'fv': fv.tolist()} for fv in np.random.rand(100, 3)]
splatter_raw(pts, nodeSize=2, height=300, width=200)

<IPython.core.display.Javascript object>

Know, in case it ever matters, that even `splatter_raw` is a thin layer over `_splatter(pts, options)`, which is the function that actually forwards to JS. We put `splatter_raw` on top so that we could document the `options` argument and do some validation.
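The notebook doesn't show what validation `splatter_raw` actually performs. As a hypothetical sketch of that kind of layer (the helper name `build_options` and its checks are assumptions, not `oui`'s code), one could verify that every point has an `fv` of the same length and pack the keyword arguments into an `options` dict:

```python
def build_options(pts, **options):
    """Hypothetical validation step: check pts, then return (pts, options)."""
    if not pts:
        raise ValueError("pts must be a non-empty list of dicts")
    dims = set()
    for i, pt in enumerate(pts):
        if 'fv' not in pt:
            raise ValueError(f"pts[{i}] has no 'fv' field")
        dims.add(len(pt['fv']))
    # t-SNE needs every feature vector to live in the same space.
    if len(dims) != 1:
        raise ValueError(f"all fvs must have the same length, got {sorted(dims)}")
    return pts, dict(options)

pts, options = build_options(
    [{'fv': [1, 2, 3]}, {'fv': [1, 2, 4]}], nodeSize=2, height=300, width=200
)
print(options)  # {'nodeSize': 2, 'height': 300, 'width': 200}
```

The resulting pair would then go to a call like the `_splatter(pts=pts, options=...)` in the next cell.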

In [16]:
from oui.splatter import _splatter
_splatter(pts=pts, options=dict(nodeSize=2, height=300, width=200))

<IPython.core.display.Javascript object>

See [issue](https://github.com/otosense/oui/issues/7) about bounding boxes.

# splatter: An interface that does more for you

On top of `splatter_raw` sits a convenience function called `splatter`. It's the one you'll use most of the time, since it does more for you: it handles different data formats and prepares the data.

In [19]:
import numpy as np
from inspect import signature
from oui.splatter import splatter

signature(splatter)

<Signature (pts, nodeSize=0.02, figsize=(200, 200), fillColors=None, untaggedColor='#444', alpha=1, process_pts=<function process_pts at 0x117674940>, process_viz_args=<function process_viz_args at 0x117674a60>, **extra_splatter_kwargs)>

Let's try some stuff out. 

## More data input formats

## Lists of lists

You know how you had to do this:
```python
pts = [{'fv': fv.tolist()} for fv in np.random.rand(100, 3)]
```
to get `pts` in the format accepted by `splatter_raw`. 

Well, now you don't have to.

In [20]:
pts = np.random.rand(100, 3)
splatter(pts)

<IPython.core.display.Javascript object>

## tagged lists

Sometimes you have your data already grouped by tag. It's okay, keep it that way, we'll unravel it before we give it to splatter.

In [21]:
pts = {
    'foo': [[1, 1, 1]] * 10,
    'bar': [[1, 10, 1]] * 20,
    '': [[5, 10, 10]] * 15  # just include an empty tag to denote "untagged"
}
splatter(pts)

<IPython.core.display.Javascript object>
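Here's a minimal plain-Python sketch of how the three input formats could be normalized into the list-of-dicts form that `splatter_raw` expects. The name `to_pts` is hypothetical and `oui`'s own `process_pts` may work differently; the sketch assumes, as above, that an empty tag (or `None`) means "untagged".

```python
def to_pts(data):
    """Hypothetical normalizer: return a list of {'fv': [...], 'tag'?: ...} dicts."""
    if hasattr(data, 'tolist'):  # e.g. a numpy array: convert to nested lists
        data = data.tolist()
    if isinstance(data, dict):  # {tag: fv_list, ...}: unravel it
        pts = []
        for tag, fvs in data.items():
            for fv in fvs:
                pt = {'fv': list(fv)}
                if tag not in ('', None):  # '' (or None) denotes "untagged"
                    pt['tag'] = tag
                pts.append(pt)
        return pts
    data = list(data)
    if data and isinstance(data[0], dict):  # already a list of dicts
        return data
    return [{'fv': list(fv)} for fv in data]  # [fv, ...]: a list of lists

print(to_pts({'foo': [[1, 1, 1]] * 2, '': [[5, 10, 10]]}))
# [{'fv': [1, 1, 1], 'tag': 'foo'}, {'fv': [1, 1, 1], 'tag': 'foo'}, {'fv': [5, 10, 10]}]
```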

## Figsize

Check the following three splatters out.

In [22]:
pts = np.random.rand(100, 3)

In [23]:
splatter(pts, figsize=100)

<IPython.core.display.Javascript object>

In [24]:
splatter(pts, figsize=250)

<IPython.core.display.Javascript object>

In [25]:
splatter(pts, figsize=(150, 80))

<IPython.core.display.Javascript object>

You'll notice two things here:
- You can specify a single number for `figsize`, which is interpreted as the side (in pixels) of a square. You can still use the `(height, width)` way of expressing the figsize, just as one pair (tuple or list) instead of two separate arguments.
- The nodes are bigger when the bounding box is bigger. 
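A minimal sketch of turning a `figsize` given as one number or as a pair into the `height` and `width` that `splatter_raw` takes. The helper name is hypothetical, and the sketch assumes the pair is `(height, width)`, as described above.

```python
def figsize_to_height_width(figsize):
    """Hypothetical helper: map figsize to splatter_raw's height and width (pixels)."""
    if isinstance(figsize, (int, float)):
        return {'height': figsize, 'width': figsize}  # one number: a square
    height, width = figsize  # a (height, width) tuple or list
    return {'height': height, 'width': width}

print(figsize_to_height_width(250))        # {'height': 250, 'width': 250}
print(figsize_to_height_width((150, 80)))  # {'height': 150, 'width': 80}
```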

# nodeSize

About that last point: what's conserved is the proportion of the box's surface that the nodes occupy, and `nodeSize` specifies that ratio. See what happens when we make `nodeSize` bigger.

In [26]:
splatter(pts, nodeSize=0.06, figsize=250)

<IPython.core.display.Javascript object>

You can still give `nodeSize` in pixels, as the JS layer does: `splatter` interprets your `nodeSize` as pixels instead of "proportion of the box's surface" as soon as `nodeSize > 0.2`.
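The notebook doesn't spell out the conversion formula, so here's one plausible reading, as a sketch: if `nodeSize` is the fraction of the box's area that all `n` circles should cover together, each circle gets area `nodeSize * height * width / n`, and the radius follows from `pi * r**2`. Values above the `0.2` threshold are passed through as pixels. The function name and the formula are assumptions, not `oui`'s code.

```python
import math

def node_radius_px(node_size, height, width, n_points, threshold=0.2):
    """Hypothetical: convert splatter's nodeSize to a circle radius in pixels."""
    if node_size > threshold:
        return node_size  # already in pixels: forward as-is
    # Area each circle gets if all n circles together cover node_size of the box.
    area_per_node = node_size * height * width / n_points
    return math.sqrt(area_per_node / math.pi)

# Same nodeSize, bigger box: the radius grows with the box side.
for side in (100, 250):
    print(side, round(node_radius_px(0.02, side, side, n_points=100), 2))
```

Under this reading, the radius scales linearly with the side of a square box, which matches the observation that nodes get bigger with the bounding box.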

In [27]:
splatter(pts, nodeSize=0.19, figsize=150)

<IPython.core.display.Javascript object>

In [28]:
splatter(pts, nodeSize=0.2, figsize=150)

<IPython.core.display.Javascript object>

# alpha

You can specify the alpha (think "inverse of transparency"). 
An alpha can be expressed as a number between `0` (invisible) 
and `1` (not transparent at all). 

In [29]:
splatter(pts, nodeSize=0.1, figsize=150, alpha=0.2) 

<IPython.core.display.Javascript object>

You usually want to apply an alpha when you have a lot of points, so that you can 
see density where they overlap. 
You can also specify the alpha of individual colors directly in their hex code, 
but we'll leave that for you to figure out. 
The useful thing here is being able to apply an alpha globally. 

Note: It will only take effect for those colors that don't already have an explicit 
alpha in their hex code.
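As a sketch of the rule in the note above (not `oui`'s implementation): hex colors with 3 or 6 digits carry no alpha, so a global alpha can be appended as an extra byte, while 4- or 8-digit codes (`#RGBA`, `#RRGGBBAA`) already have one and are left alone. The function name is hypothetical.

```python
def apply_alpha(hex_code, alpha):
    """Hypothetical: add a global alpha to a hex color that doesn't already have one."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    digits = hex_code.lstrip('#')
    if len(digits) in (4, 8):  # explicit alpha already present: keep it
        return hex_code
    if len(digits) == 3:  # expand the #RGB shorthand to #RRGGBB
        digits = ''.join(c * 2 for c in digits)
    if len(digits) != 6:
        raise ValueError(f"not a hex color: {hex_code!r}")
    alpha_byte = round(alpha * 255)  # 0 (invisible) .. 255 (opaque)
    return f"#{digits}{alpha_byte:02X}"

print(apply_alpha('#444', 0.2))       # #44444433
print(apply_alpha('#0000FF', 1))      # #0000FFFF
print(apply_alpha('#0000FF80', 0.2))  # #0000FF80 (unchanged)
```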

# Color

## Specifying color

First, know that color has to be specified in hex form. 
Like... `'#ADD8E6'` is light blue. 
There's a plethora of online tools to get the hex of your color of choice. 
The first one I found googling is: https://htmlcolorcodes.com/color-picker/

Alternatively, you can use our little `hex_color` tool:

In [30]:
from oui.color_util import hex_color

`hex_color` is a collection (meaning you can do things like `list(hex_color)`):

In [31]:
print(*hex_color, sep='\t')

b	g	r	c	m	y	k	w	f	t	i	s	o	p	l	a	d	n	v	h	maroon	dark_red	brown	firebrick	crimson	red	tomato	coral	indian_red	light_coral	dark_salmon	salmon	light_salmon	orange_red	dark_orange	orange	gold	dark_golden_rod	golden_rod	pale_golden_rod	dark_khaki	khaki	olive	yellow	yellow_green	dark_olive_green	olive_drab	lawn_green	chart_reuse	green_yellow	dark_green	green	forest_green	lime	lime_green	light_green	pale_green	dark_sea_green	medium_spring_green	spring_green	sea_green	medium_aqua_marine	medium_sea_green	light_sea_green	dark_slate_gray	teal	dark_cyan	aqua	cyan	light_cyan	dark_turquoise	turquoise	medium_turquoise	pale_turquoise	aqua_marine	powder_blue	cadet_blue	steel_blue	corn_flower_blue	deep_sky_blue	dodger_blue	light_blue	sky_blue	light_sky_blue	midnight_blue	navy	dark_blue	medium_blue	blue	royal_blue	blue_violet	indigo	dark_slate_blue	slate_blue	medium_slate_blue	medium_purple	dark_magenta	dark_violet	dark_orchid	medium_orchid	purple	thistle	plum	violet	magenta	fuchsia	orchid	medium_violet_r

`hex_color`'s attribute names are the color names above (and shortcuts thereof), and the corresponding attribute value is... the hex for that color. Lucky you.

In [32]:
hex_color.blue

'#0000FF'

In [33]:
hex_color.b

'#0000FF'

In [34]:
hex_color.light_blue

'#ADD8E6'
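If you're curious how a collection with attribute access like `hex_color` can work, here is a small hypothetical stand-in (`HexColors` is not `oui`'s class, and its three entries are just a sample taken from the outputs above): names are attributes, `list(...)` yields the names, and a short-hand is simply a second name for the same hex.

```python
class HexColors:
    """Hypothetical stand-in for hex_color: attribute access plus iteration."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for color names.
        mapping = self.__dict__.get('_mapping', {})
        if name in mapping:
            return mapping[name]
        raise AttributeError(name)

    def __iter__(self):
        return iter(self._mapping)

colors = HexColors({'blue': '#0000FF', 'b': '#0000FF', 'light_blue': '#ADD8E6'})
print(colors.b, colors.light_blue)  # #0000FF #ADD8E6
print(list(colors))                 # ['blue', 'b', 'light_blue']
```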

## Giving tags color

`fillColors` is where you specify a list of colors used to color the points according to their tag. Colors are taken from the list in order, one for each new unique tag encountered in `pts`.

Of course, `fillColors` falls back on a default:

In [35]:
pts = {
    'use': [[1, 1, 1], [1, 1, 2], [1, 2, 1]],
    'the': [[5, 5, 5], [6, 6, 6], [7, 7, 7]],
    'force': [[10, 11, 12]],
    '': [[1, 5, 9], [9, 1, 5]]  # just include an empty tag to denote "untagged"
}
hc = hex_color

In [36]:
splatter(pts)

<IPython.core.display.Javascript object>

In [37]:
splatter(pts, fillColors=[hc.bisque, hc.blue_violet, hc.dark_khaki])

<IPython.core.display.Javascript object>

In [39]:
splatter(pts, fillColors=[hc.bisque, hc.blue_violet, hc.dark_khaki], untaggedColor=hc.crimson)

<IPython.core.display.Javascript object>
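Here's a sketch of that list-based assignment rule. It assumes (the notebook doesn't say) that the list wraps around when there are more tags than colors; `assign_colors` is a hypothetical name, not `oui`'s code.

```python
def assign_colors(tags, fill_colors, untagged_color='#444'):
    """Hypothetical: map each tag to a color, in order of first appearance."""
    tag_to_color = {}
    for tag in tags:
        if tag in ('', None) or tag in tag_to_color:
            continue
        # The i-th new tag gets the i-th color (wrapping around is an assumption).
        tag_to_color[tag] = fill_colors[len(tag_to_color) % len(fill_colors)]
    return [tag_to_color.get(tag, untagged_color) for tag in tags]

tags = ['use', 'use', 'the', 'force', '', 'the']
# CSS hex values for bisque, blueviolet, darkkhaki, and crimson.
print(assign_colors(tags, ['#FFE4C4', '#8A2BE2', '#BDB76B'], untagged_color='#DC143C'))
```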

If you want to map specific tags to specific colors, you can do that by specifying a `{tag: color,...}` map.

In [40]:
splatter(pts, fillColors={'use': hc.pink, 'the': hc.orchid, 'force': hc.gainsboro}, untaggedColor=hc.crimson)

<IPython.core.display.Javascript object>

And that map doesn't have to be completely specified. We'll fill in the gaps with the aforementioned default `fillColors`. 

In [41]:
splatter(pts, fillColors={'use': hc.pink}, untaggedColor=hc.crimson)

<IPython.core.display.Javascript object>

You can also specify `untaggedColor` directly in `fillColors` by specifying a color for `''` or `None`. These take precedence over the `untaggedColor` argument.

In [45]:
splatter(pts, fillColors={
    'use': hc.pink,
    'the': hc.orchid,
    'force': hc.gainsboro,
    '': hc.crimson})

<IPython.core.display.Javascript object>
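And here's a sketch of the `{tag: color}` behavior: explicit entries win, tags missing from the map fall back on a default palette, and a color given for `''` or `None` overrides `untaggedColor`. The notebook doesn't specify how defaults are picked for the gaps; this version assumes they're taken in order of first appearance. `resolve_colors` is a hypothetical name.

```python
def resolve_colors(tags, color_map, default_palette, untagged_color='#444'):
    """Hypothetical: per-point colors from a possibly partial {tag: color} map."""
    # A color given for '' or None in the map takes precedence over untagged_color.
    untagged = color_map.get('', color_map.get(None, untagged_color))
    resolved = dict(color_map)
    next_default = 0
    for tag in tags:
        if tag in ('', None) or tag in resolved:
            continue
        # Fill the gap with the next default color (wrapping around is an assumption).
        resolved[tag] = default_palette[next_default % len(default_palette)]
        next_default += 1
    return [untagged if tag in ('', None) else resolved[tag] for tag in tags]

tags = ['use', 'the', 'force', '']
palette = ['#ff0000', '#00ffe6', '#ffc300']  # first three of splatter_raw's default fillColors
print(resolve_colors(tags, {'use': '#FFC0CB'}, palette, untagged_color='#DC143C'))
print(resolve_colors(tags, {'use': '#FFC0CB', '': '#000000'}, palette))
```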

# More

## Great color splatters

In [6]:
from oui.splatter import splatter
from oui.color_util import color_names_and_codes

`color_names_and_codes` is a list of dicts, each holding a color name (`color`), its hex code (`hex`), and its decimal RGB code (`dec`).

Note: names are unique, but codes are not (some colors go by several names).

In [8]:
print(f"We have {len(color_names_and_codes)} such items")
color_names_and_codes[35:38]


We have 164 such items


[{'color': 'orange', 'hex': '#FFA500', 'dec': [255, 165, 0]},
 {'color': 'gold', 'hex': '#FFD700', 'dec': [255, 215, 0]},
 {'color': 'dark_golden_rod', 'hex': '#B8860B', 'dec': [184, 134, 11]}]
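To see which names share a code, you can group entries by their `hex` field. A plain-Python sketch follows; the three sample entries are illustrative stand-ins with the same keys as `color_names_and_codes` (in CSS, `aqua` and `cyan` are indeed both `#00FFFF`), so run it on the real list to find the actual duplicates.

```python
from collections import defaultdict

# Illustrative sample with the same keys as color_names_and_codes.
sample = [
    {'color': 'aqua', 'hex': '#00FFFF', 'dec': [0, 255, 255]},
    {'color': 'cyan', 'hex': '#00FFFF', 'dec': [0, 255, 255]},
    {'color': 'orange', 'hex': '#FFA500', 'dec': [255, 165, 0]},
]

names_by_hex = defaultdict(list)
for item in sample:
    names_by_hex[item['hex'].upper()].append(item['color'])

# Keep only the codes that go by more than one name.
shared = {h: names for h, names in names_by_hex.items() if len(names) > 1}
print(shared)  # {'#00FFFF': ['aqua', 'cyan']}
```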

We will use the RGB values (the `dec` field) as our fvs and the `color` as our tags. Makes sense, right?

You know what else makes sense? To assign the `hex` field (or the tag itself) to be the color for that tag. 

Let's do it!

In [11]:
pts = [{'fv': x['dec'], 'tag': x['color']} for x in color_names_and_codes]
fillColors = {x['color']: x['hex'] for x in color_names_and_codes}
splatter(pts, figsize=500, fillColors=fillColors)

<IPython.core.display.Javascript object>

A good way to get a sense of how our splatter squeezes many dimensions into 2 is to splatter 2-dimensional data. You might expect it to just leave the points at their 2D locations.

That would be true if we initialized the points at their 2D locations and didn't make the t-SNE parameters too extreme. But since the splatter, by default, initializes positions randomly, it will not converge to its most stable configuration, but to some other local minimum instead.

Try splattering only the RG, RB, or GB components of our RGB points. The cell below keeps G and B (`x['dec'][1:]`).

In [13]:
pts = [{'fv': x['dec'][1:], 'tag': x['color']} for x in color_names_and_codes]
fillColors = {x['color']: x['hex'] for x in color_names_and_codes}
splatter(pts, figsize=500, fillColors=fillColors)

<IPython.core.display.Javascript object>
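To try all three pairs without retyping the cell, you could build one `pts` list per pair of channels. This sketch only prepares the data (each list could then be passed to `splatter(pts, figsize=500, fillColors=fillColors)` as above); the two entries are copied from the `color_names_and_codes` output shown earlier and stand in for the full list.

```python
from itertools import combinations

channels = 'RGB'
# Two entries copied from the color_names_and_codes output above.
colors = [
    {'color': 'orange', 'hex': '#FFA500', 'dec': [255, 165, 0]},
    {'color': 'gold', 'hex': '#FFD700', 'dec': [255, 215, 0]},
]

pts_by_pair = {}
for i, j in combinations(range(3), 2):  # (R, G), (R, B), (G, B)
    pair = channels[i] + channels[j]
    pts_by_pair[pair] = [
        {'fv': [c['dec'][i], c['dec'][j]], 'tag': c['color']} for c in colors
    ]

print(list(pts_by_pair))  # ['RG', 'RB', 'GB']
print(pts_by_pair['GB'])  # same fvs as x['dec'][1:] in the cell above
```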

## Splatter args

In [265]:
print(splatter_raw.__doc__)


    Splatter multidimensional pts (that is, see a TSNE iteration happen in front of your eyes,
    squishing your multidimensional pts into two dimensions.

    The `pts` input is a list of dicts, where every dict must have, at a minimum, an `'fv'` field whose value
    is a list of fixed size for all pt of pts.

    Optionally, you can include:
    - 'tag': Will be use to categorize and color the point

    :param pts: Your pts, in the form of a list of dicts, list of lists, or dict of lists.
        All forms of data will be converted to a list of dicts where these dicts have four fields
        (other fields are possible, but are ignored by splatter): `fv` (required), `tag`, `source`, and `bt`.
        - `fv`: the "feature vector" that is used to computer node simularity/distance
        - `tag`: used to denote a group/category and map to a color
        - `source` and `bt`: which together denote the reference of the node element --
            `source` being an identification of t