## Unicode strings and writing Python code in unicode

This notebook is about writing unicode in Python.  The main point is that not everything that can be
part of a unicode string can occur freely elsewhere in your code, for example in variable names.

First, you can enter unicode characters directly in Jupyter Notebook.  The α in the next cell was entered by
typing \alpha and then hitting the Tab key.  Try it yourself in the following code cell (it must be a code
cell, not a Markdown cell).  This won't work in Google Colab: as far as I know, you can't enter unicode characters
into a Colab notebook with your keyboard.  You should still be able to paste unicode characters into your notebook from your clipboard, though, and the principles discussed below about what can occur in a Python code file are the same.

```python
α
```
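If Tab completion isn't available (in Colab, say), a *string* can still contain α without your typing it. A minimal sketch: Python string literals accept named escapes (`\N{...}`) and code point escapes (`\u03b1`), and `chr` and `unicodedata.lookup` build the same character at run time. These escapes only work inside string literals; they can't be used to spell a name.

```python
import unicodedata

# Four ways to produce the string "α" (U+03B1) without typing α in the code.
by_name = "\N{GREEK SMALL LETTER ALPHA}"  # named escape inside a string literal
by_escape = "\u03b1"                       # 4-hex-digit code point escape
by_chr = chr(0x3B1)                        # chr() on the integer code point
by_lookup = unicodedata.lookup("GREEK SMALL LETTER ALPHA")

# All four are the same one-character string.
assert by_name == by_escape == by_chr == by_lookup
print(by_name, hex(ord(by_name)), unicodedata.name(by_name))
# α 0x3b1 GREEK SMALL LETTER ALPHA
```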

The name П in the following code cell (the Cyrillic capital letter Pe) was entered by typing \CYRILLIC, hitting Tab, selecting the letter from the menu that appears, and then hitting Tab again.

```python
П = 55
```
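Cyrillic П looks almost exactly like Greek capital Π, so it can be useful to check which character a name really contains. A short sketch using the standard-library `unicodedata` module to report each character's code point and official name:

```python
import unicodedata

П = 55

# Inspect each character of the name: code point and official unicode name.
for ch in "П":
    print(hex(ord(ch)), unicodedata.name(ch))   # 0x41f CYRILLIC CAPITAL LETTER PE

# It is a different character from the look-alike Greek capital pi.
print("П" == "\N{GREEK CAPITAL LETTER PI}")      # False

# The binding is an ordinary global, keyed by the (non-ASCII) name string.
print(globals()["П"])                            # 55
```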

There is a restriction: only characters that are legal parts of Python names can be entered using the \character_name TAB convention.  So, for example, keyboard entry for \NABLA (∇), a perfectly fine unicode character
found among the math symbols, does not work.  Nor do emojis. The rule about what can be entered by keyboard seems to be the same as the rule about what can be part of a Python name.  And that rule is: "Python 3 restricts variable names to unicode characters that represent characters in written languages" ([ref](https://python-3-for-scientists.readthedocs.io/en/latest/python3_features.html)).  More precisely, PEP 3131 requires a name to start with a letter-like character (unicode property XID_Start) or an underscore, followed by characters with property XID_Continue, which adds digits, connector punctuation such as the underscore, and combining marks.

But this still leaves a pretty wide range of possible names.
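`str.isidentifier()` checks a string against the language definition's rules for names, so it's a quick way to test a candidate before using it in code. A small sketch; the candidate list is just for illustration:

```python
# str.isidentifier() applies Python's rules for names: the first character
# must be letter-like (XID_Start) or "_", and the rest must be XID_Continue
# (letters, digits, "_" and other connectors, combining marks).
candidates = ["α", "αβ", "П", "φ_2", "_γ", "2φ", "∇", "🂤"]
for s in candidates:
    print(s, s.isidentifier())
# α, αβ, П, φ_2 and _γ are valid names; 2φ (starts with a digit),
# ∇ (a math symbol) and 🂤 (a pictographic symbol) are not.
```

Note that `isidentifier()` does not reject keywords such as `class`; `keyword.iskeyword` covers that.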

Let's illustrate what this means by looking at some Python code that uses characters not found in ASCII.
We emphasize: it's not just that unicode **strings** are fine; names that use the right kind of unicode characters are fine too.  Consider a Python file `unicode_code.py` that contains the following:

```python
alphabet = 'αβγδεζηθικλμνξοπρςστυφχψ'
φ = 3

def αβ(γ):
    # Return every other character in string
    return γ[::2]
```

We see three names that aren't ASCII: φ, a variable name; αβ, a function name; and
γ, a function parameter.


```python
# Note: this won't work unless you have a file named `unicode_code.py` with the contents above
# in a directory where Python knows to look for imports (for example, the directory your Jupyter
# notebook is running in)
import unicode_code
```
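If you don't have the file yet, one option is to write it from inside the notebook. A minimal sketch, assuming the notebook's working directory is writable and is where you want the module to live (any existing `unicode_code.py` there is overwritten). The file is written explicitly as UTF-8, the default encoding Python 3 assumes for source files (PEP 3120):

```python
import importlib
import sys
from pathlib import Path

# Contents of unicode_code.py, as shown above.
source = """\
alphabet = 'αβγδεζηθικλμνξοπρςστυφχψ'
φ = 3

def αβ(γ):
    # Return every other character in string
    return γ[::2]
"""

# Assumption: the current working directory is writable and is where the
# module should live. Any existing unicode_code.py there is overwritten.
module_dir = Path.cwd()
# Python 3 reads source files as UTF-8 by default (PEP 3120), so write UTF-8.
(module_dir / "unicode_code.py").write_text(source, encoding="utf-8")

# Make sure the directory is on the import path and that the import
# system notices the newly created file.
if str(module_dir) not in sys.path:
    sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

import unicode_code
print(unicode_code.αβ("zyxwvut"))   # zxvt
```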

First, we have imported a unicode string.  Let's verify that it's intact:

```python
unicode_code.alphabet
```

```
'αβγδεζηθικλμνξοπρςστυφχψ'
```

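It is a Python string of length 24, the number of letters in the Greek alphabet.  (Strictly speaking, it isn't quite the standard alphabet: it contains the final-sigma form ς alongside σ and leaves out ω.)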

```python
len(unicode_code.alphabet)
```

```
24
```
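A length of 24 doesn't by itself show that the string holds the standard 24 letters. A minimal check builds the lowercase Greek letters from their code points (U+03B1 α through U+03C9 ω, skipping the final-sigma form ς at U+03C2) and compares the two sets of characters:

```python
alphabet = 'αβγδεζηθικλμνξοπρςστυφχψ'  # same string as unicode_code.alphabet

# Lowercase Greek letters run from U+03B1 to U+03C9; U+03C2 is the
# final-sigma form, which we skip to get the standard 24 letters.
standard = ''.join(chr(cp) for cp in range(0x3B1, 0x3CA) if cp != 0x3C2)

print(len(standard))                   # 24
print(set(standard) - set(alphabet))   # {'ω'}: missing from alphabet
print(set(alphabet) - set(standard))   # {'ς'}: extra in alphabet
```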

String operations work as expected.  Here we peel off the last 11 characters and reverse them:

```python
unicode_code.alphabet[23:12:-1]
```

```
'ψχφυτσςρποξ'
```

```python
last_greek = unicode_code.alphabet[23:12:-1]
last_greek
```

```
'ψχφυτσςρποξ'
```

```python
len(last_greek)
```

```
11
```
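The slice `[23:12:-1]` hard-codes the string's length: it starts at index 23 (the last character) and steps backwards, stopping before index 12. A small sketch of two equivalent forms that take the last 11 characters of a string of any length:

```python
alphabet = 'αβγδεζηθικλμνξοπρςστυφχψ'  # same string as unicode_code.alphabet

a = alphabet[23:12:-1]     # start at index 23, step back, stop before index 12
b = alphabet[-11:][::-1]   # take the last 11 characters, then reverse them
c = alphabet[:-12:-1]      # from the end, step back, stop before index -12

assert a == b == c
print(a)                   # ψχφυτσςρποξ
```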

Now let's use the name `φ` (entered as above, by typing \phi and hitting Tab):

```python
unicode_code.φ
```

```
3
```

Check out calling the function `αβ`:

```python
unicode_code.αβ('zyxwvut')
```

```
'zxvt'
```

Calling it on a string of Greek letters:

```python
unicode_code.αβ(last_greek)
```

```
'ψφτςπξ'
```

Prepping for the grand finale:

```python
from unicode_code import αβ
```

And the next cell is all Greek to me:

```python
αβ('ψχφυτσςρποξ')
```

```
'ψφτςπξ'
```

#### Characters from the extended unicode set

The restriction on names doesn't apply to strings, however.
Any character in the vast unicode character set can be part of a string in Python 3.

For example, having looked up the code points in a unicode chart,
let's print some strings containing characters from the unicode card deck (the Playing Cards block, U+1F0A0 to U+1F0FF),
using `chr`, a function that produces a unicode character (a string of length one) from
a unicode code point.
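To see what `chr` does on its own, here is a single card, together with `ord` (the inverse of `chr`) and `unicodedata.name`. A small sketch; it also shows that `len` counts code points rather than bytes, even for characters like these that lie outside the Basic Multilingual Plane:

```python
import unicodedata

ace = chr(0x1F0A1)              # code point -> one-character string
print(ace, len(ace))            # 🂡 1
print(hex(ord(ace)))            # ord() goes the other way: 0x1f0a1
print(unicodedata.name(ace))    # PLAYING CARD ACE OF SPADES

# len() counts code points, not bytes: α takes 2 bytes in UTF-8, the card 4.
s = "α" + ace
print(len(s), len(s.encode("utf-8")))   # 2 6
```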

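```python
# Printing the unicode card deck, one row per suit.
# Each suit (spades, hearts, diamonds, clubs) occupies a row of 16 code points;
# adding the rank to the row's base gives the card.
suits = 0x1f0a0, 0x1f0b0, 0x1f0c0, 0x1f0d0
# Ace (1) through 10, jack (11), queen (13), king (14).
# 12 is skipped: that code point is the knight, which standard decks lack.
ranks = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14
for s in suits:
    for r in ranks:
        print(chr(s + r), end='  ')
    print()
print()
```

```
🂡  🂢  🂣  🂤  🂥  🂦  🂧  🂨  🂩  🂪  🂫  🂭  🂮
🂱  🂲  🂳  🂴  🂵  🂶  🂷  🂸  🂹  🂺  🂻  🂽  🂾
🃁  🃂  🃃  🃄  🃅  🃆  🃇  🃈  🃉  🃊  🃋  🃍  🃎
🃑  🃒  🃓  🃔  🃕  🃖  🃗  🃘  🃙  🃚  🃛  🃝  🃞
```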
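Why does the rank list skip 12? Asking `unicodedata` for the names of the spade court cards shows that the unicode deck has an extra card at that position. A short sketch:

```python
import unicodedata

# Court cards of spades: offsets 11 through 14 from the row base 0x1F0A0.
for r in range(11, 15):
    ch = chr(0x1F0A0 + r)
    print(r, ch, unicodedata.name(ch))
# 11 🂫 PLAYING CARD JACK OF SPADES
# 12 🂬 PLAYING CARD KNIGHT OF SPADES
# 13 🂭 PLAYING CARD QUEEN OF SPADES
# 14 🂮 PLAYING CARD KING OF SPADES
```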



However, as mentioned above, we can't use characters that don't come from the writing system
of a language in variable names.  For example, let's try a playing card copied and pasted from the output above:

```python
🂤 = 3
```

```
SyntaxError: invalid character '🂤' (U+1F0A4) (1070479639.py, line 1)
```
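The reason shows up in the card's unicode category: it is a symbol ("So", Symbol, other), not a letter. A minimal sketch that uses `compile()` to trigger the error without stopping the notebook, and then shows that the same character is perfectly usable inside a string, here as a dictionary key. The exact error message varies between Python versions, so the sketch doesn't rely on it:

```python
import unicodedata

card = chr(0x1F0A4)                  # the card pasted above: PLAYING CARD FOUR OF SPADES
print(unicodedata.category(card))    # So: "Symbol, other", not a letter category

# compile() rejects the same source with a SyntaxError, which we can catch.
raised = False
try:
    compile(card + " = 3", "<example>", "exec")
except SyntaxError as err:
    raised = True
    print("SyntaxError:", err.msg)

# Strings have no such restriction, so the card works fine as a dictionary key.
hand = {card: 3}
print(hand[card])                    # 3
```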