# Exploring Unicode
*Possibly pointless*

Occasionally, I find myself googling something like:

> unicode arrow symbol

So I can use a:

→

instead of:

`==>` or `--->`

Python's standard library ships with a database of Unicode characters, so why Google it?

## Unicodedata

If you're lucky enough to know the *exact* name of your desired symbol, getting the character is straightforward.

```python
>>> import unicodedata
>>> unicodedata.lookup("RIGHTWARDS ARROW")
'→'
```
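The lookup also works in reverse. `unicodedata.name` returns the name for a character, and Python string literals accept the same names through the `\N{...}` escape. Here is a short sketch. The bogus name in the `try` block is just a deliberately invalid example:

```python
import unicodedata

# Name -> character
arrow = unicodedata.lookup("RIGHTWARDS ARROW")

# Character -> name (the reverse of lookup)
name = unicodedata.name(arrow)

# The \N{...} escape in a string literal resolves the same names
same_arrow = "\N{RIGHTWARDS ARROW}"

print(arrow, name, arrow == same_arrow)  # → RIGHTWARDS ARROW True

# An unknown name raises KeyError rather than returning None
try:
    unicodedata.lookup("NOT A REAL CHARACTER NAME")
except KeyError as exc:
    print("not found:", exc)
```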

How can we find characters with 'ARROW' in their name?

## Building a Mapping

What we'd like to do is build a mapping of:

`CHARACTER NAME: CHARACTER`

So we can run a query along the lines of:

`{k: v for k, v in mapping.items() if 'ARROW' in k}`
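Before building the full table, you can try the query on a tiny hand-built mapping. The sample characters are an arbitrary choice for illustration, and `unicodedata.name` supplies their names:

```python
import unicodedata

# A few hand-picked characters, purely for illustration
sample = "→←↑↓AB"

# CHARACTER NAME: CHARACTER
mapping = {unicodedata.name(ch): ch for ch in sample}

# Keep only the entries whose name mentions ARROW
arrows = {k: v for k, v in mapping.items() if "ARROW" in k}
print(arrows)
```

Only the four arrows survive the filter. The two Latin letters are dropped.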

### Getting Ready

Roughly speaking, for each possible character:

- we generate the character itself
- `unicodedata.name` provides its canonical name

We know that `ord` returns the integer code point for a given character. Conversely, `chr` returns the one-character string for a given code point.
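A quick round trip with the right arrow from earlier shows the two functions undoing each other. `hex` is only there to show the code point in its usual hexadecimal form:

```python
# ord: character -> integer code point
code_point = ord("→")
print(code_point)       # 8594
print(hex(code_point))  # 0x2192

# chr: integer code point -> one-character string
print(chr(8594))        # →

# The two functions are inverses of each other
assert chr(ord("→")) == "→"
```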

According to the documentation, `chr` accepts values in the range `0 <= i <= 0x10ffff`.
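The upper bound is inclusive. `chr(0x10ffff)` works, but one past it raises `ValueError`. Here is a minimal check, assuming Python 3, where `chr` covers the full Unicode range:

```python
limit = 0x10ffff

# The largest code point chr accepts is still a single character
last = chr(limit)
assert len(last) == 1 and ord(last) == limit

# One past the limit is rejected with a ValueError
try:
    chr(limit + 1)
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```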


`0x10ffff` is written in hexadecimal (base 16). To see it in the more familiar decimal form (base 10):

```python
>>> int('0x10ffff', base=16)
1114111
```

There's no need to convert it, though. The `0x` prefix marks a hexadecimal integer literal, so Python treats it as an ordinary `int`:

```python
>>> 0x10
16
>>> 0x10 + 16
32
>>> range(0x1000)
range(0, 4096)
```
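Going the other way, `hex` turns an integer back into a hexadecimal string. A format spec gives the `U+XXXX` notation used in Unicode charts. `u_notation` below is a hypothetical helper, not something from the standard library:

```python
def u_notation(character: str) -> str:
    """Format a character's code point as U+XXXX (at least four uppercase hex digits)."""
    return f"U+{ord(character):04X}"

print(hex(1114111))       # 0x10ffff
print(int("10ffff", 16))  # 1114111
print(u_notation("→"))    # U+2192
print(u_notation("A"))    # U+0041
```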

### Building

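```python
import unicodedata

char_mapping = {}

# chr accepts 0 <= i <= 0x10ffff, so the range must stop one past that value
for i in range(0x10ffff + 1):
    character = chr(i)
    try:
        character_name = unicodedata.name(character)
    except ValueError:
        # Unassigned code points, control characters, surrogates, etc. have no name
        continue

    char_mapping[character_name] = character
```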
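`unicodedata.name` also accepts a default as its second argument. It returns that default for unnamed code points instead of raising `ValueError`. Below is a sketch of the same build using that option. It assumes Python 3.3+, where `sys.maxunicode` is `0x10ffff`. The number of entries depends on the Unicode version bundled with your Python (`unicodedata.unidata_version`), so no count is shown here:

```python
import sys
import unicodedata

# Every code point chr accepts: 0 <= i <= sys.maxunicode (0x10ffff)
ALL_CODE_POINTS = range(sys.maxunicode + 1)

# name(ch, None) returns None for unnamed code points instead of raising
pairs = ((unicodedata.name(ch, None), ch) for ch in map(chr, ALL_CODE_POINTS))
char_mapping = {name: ch for name, ch in pairs if name is not None}

print(unicodedata.unidata_version, len(char_mapping))
```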

### Querying


```python
having_tokens = ["ARROW", "RIGHT"]
not_having_token = ["LEFT"]

def token_filter(char_name, has=having_tokens, has_not=not_having_token):
    """True if char_name contains every token in `has` and none of those in `has_not`."""
    return (
        all(token in char_name for token in has)
        and not any(token in char_name for token in has_not)
    )


arrows = {k: v for k, v in char_mapping.items() if token_filter(k)}

for k, v in arrows.items():
    print(f"{k} : {v}")
```

MODIFIER LETTER RIGHT ARROWHEAD : ˃
MODIFIER LETTER LOW RIGHT ARROWHEAD : ˲
COMBINING RIGHT ARROWHEAD ABOVE : ͐
COMBINING RIGHT ARROWHEAD BELOW : ͕
COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW : ͖
COMBINING DOUBLE RIGHTWARDS ARROW BELOW : ͢
ARABIC RIGHT ARROWHEAD ABOVE : ࣸ
ARABIC RIGHT ARROWHEAD BELOW : ࣺ
ARABIC DOUBLE RIGHT ARROWHEAD ABOVE : ࣻ
ARABIC DOUBLE RIGHT ARROWHEAD ABOVE WITH DOT : ࣼ
ARABIC RIGHT ARROWHEAD ABOVE WITH DOT : ࣽ
COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW : ᷿
COMBINING RIGHT ARROW ABOVE : ⃗
COMBINING RIGHT ARROW BELOW : ⃯
RIGHTWARDS ARROW : →
RIGHTWARDS ARROW WITH STROKE : ↛
RIGHTWARDS WAVE ARROW : ↝
RIGHTWARDS TWO HEADED ARROW : ↠
RIGHTWARDS ARROW WITH TAIL : ↣
RIGHTWARDS ARROW FROM BAR : ↦
RIGHTWARDS ARROW WITH HOOK : ↪
RIGHTWARDS ARROW WITH LOOP : ↬
UPWARDS ARROW WITH TIP RIGHTWARDS : ↱
DOWNWARDS ARROW WITH TIP RIGHTWARDS : ↳
RIGHTWARDS ARROW WITH CORNER DOWNWARDS : ↴
RIGHTWARDS PAIRED ARROWS : ⇉
RIGHTWARDS DOUBLE ARROW WITH STROKE : ⇏
RIGHTWARDS DOU