# **CHARACTER ENCODING**

* **code point**: a number that identifies a character (for example, 32 is a space)

## **ASCII standard**

* The ASCII standard defines 128 code points (0 to 127)
* An 8-bit byte can hold 256 values, so standard ASCII occupies only the lower 128; the upper 128 values (128 to 255) remain free for other characters
* A code page is a standard for using the upper 128 code points to store specific national characters
  * For example, there are different code pages for Western Europe and Eastern Europe, Cyrillic and Greek alphabets, Arabic and Hebrew languages, and so on.
  * This means that the same code point can represent different characters in different code pages
  * Consequently, to determine the meaning of a specific code point, you have to know which code page is in use.

=> **ambiguous!**
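The ambiguity can be reproduced in Python by decoding one and the same byte with several code pages. This is only a minimal sketch: the byte value `0xE4` and the three code pages are arbitrary choices made for illustration.

```python
import unicodedata

# One byte from the upper half of the 8-bit range (above ASCII's 0-127).
raw = bytes([0xE4])

# The same byte decodes to a different character in each code page.
western = raw.decode("latin-1")      # ISO/IEC 8859-1, Western Europe
cyrillic = raw.decode("cp1251")      # Windows-1251, Cyrillic
greek = raw.decode("iso8859_7")      # ISO/IEC 8859-7, Greek

for ch in (western, cyrillic, greek):
    # Print the Unicode code point and official name of each result.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
# U+0434 CYRILLIC SMALL LETTER DE
# U+03B4 GREEK SMALL LETTER DELTA
```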

## **Unicode**

* assigns every character (letters, hyphens, ideograms, etc.) a unique, **unambiguous** code point; the code space has room for more than a million code points (1,114,112), not all of which are assigned
* The first 128 Unicode code points are identical to ASCII
  * and the first 256 Unicode code points are identical to the ISO/IEC 8859-1 code page (a code page designed for Western European languages)

* **UTF-8**
  * UTF stands for Unicode Transformation Format
  * uses only as many bits for each code point as it needs to represent it, for example:
    * all standard ASCII characters (including the unaccented Latin letters) occupy 8 bits;
    * accented Latin letters and most other alphabets (Greek, Cyrillic, Hebrew, Arabic) occupy 16 bits;
    * most CJK (China-Japan-Korea) ideographs occupy 24 bits;
    * code points beyond the Basic Multilingual Plane (for example, most emoji) occupy 32 bits
  * Because UTF-8 stores code points as a sequence of single bytes, byte order is not an issue and no BOM (Byte Order Mark) is needed; still, some tools look for one when reading a file, and many editors write one when saving
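The variable width of UTF-8 can be checked by encoding a few characters and counting the resulting bytes. The sample characters below are arbitrary picks, one per size class; `utf-8-sig` is the standard-library codec name for UTF-8 with a BOM.

```python
# One sample per size class: ASCII, accented Latin, Cyrillic, CJK, emoji.
samples = ["a", "\u00e9", "\u0434", "\u4e2d", "\U0001F600"]

# Map each code point to the number of bytes UTF-8 needs for it.
sizes = {ord(ch): len(ch.encode("utf-8")) for ch in samples}
for code_point, n_bytes in sizes.items():
    print(f"U+{code_point:04X}: {n_bytes} byte(s)")
# U+0061: 1 byte(s)
# U+00E9: 2 byte(s)
# U+0434: 2 byte(s)
# U+4E2D: 3 byte(s)
# U+1F600: 4 byte(s)

# Plain "utf-8" writes no BOM; "utf-8-sig" prepends one when encoding
# and silently drops it when decoding.
with_bom = "a".encode("utf-8-sig")
print(with_bom)                       # b'\xef\xbb\xbfa'
print(with_bom.decode("utf-8-sig"))   # a
```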

* Python 3 fully supports Unicode and UTF-8:
  * you can use Unicode/UTF-8 encoded characters to name variables and other entities;
  * you can use them during all input and output.
  * This means that Python 3 is fully internationalized (I18N).
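As a small illustration of the first point, non-ASCII letters are accepted in identifiers. The names below are invented for this sketch; whether such names are a good idea in shared code is a separate style question.

```python
# Identifiers may contain Unicode letters, not just ASCII ones.
größe = 42            # German for "size"
π = 3.14159           # Greek letter pi as a variable name


def площадь(r):       # Russian for "area"
    """Area of a circle of radius r, using the variable named π."""
    return π * r ** 2


print(größe)          # 42
print(площадь(1))     # 3.14159
```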

# **STRINGS IN PYTHON**

## **`ord()`** :
  * takes a one-character string
  * returns its code point
  * raises a `TypeError` exception if the argument is not a string of length one

In [6]:
# ord() = 'ordinal'. Show ASCII/UNICODE code point value.

ch1 = 'a' 
ch2 = ' ' # space

print(ord(ch1))
print(ord(ch2))


97
32


## **`chr()`** :
  * takes a code point (an integer)
  * returns the corresponding character
  * raises `ValueError` if the code point is out of range (e.g., negative) or `TypeError` if the argument is not an integer
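A short sketch of `chr()` as the inverse of `ord()`, together with the error cases listed above. The specific invalid arguments are chosen only for illustration.

```python
print(chr(97))                     # a
print(chr(945) == "\u03b1")        # True: 945 is Greek small letter alpha
print(ord(chr(8364)) == 8364)      # True: chr() and ord() are inverses

# Collect the exception type raised for each invalid chr() argument.
chr_errors = {}
for bad in (-1, 0x110000, "a"):    # negative, past the last code point, not an int
    try:
        chr(bad)
    except (ValueError, TypeError) as exc:
        chr_errors[bad] = type(exc).__name__
print(chr_errors)   # {-1: 'ValueError', 1114112: 'ValueError', 'a': 'TypeError'}

# ord() rejects strings that are not exactly one character long.
try:
    ord("ab")
except TypeError as exc:
    ord_error = type(exc).__name__
print(ord_error)    # TypeError
```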


String operations:
* indexing
* iterating
* slicing
* `in` and `not in` operators


In [8]:
alpha = "abdefg"

print(alpha[1:3])
print(alpha[3:])
print(alpha[:3])
print(alpha[3:-2])
print(alpha[-3:4])
print(alpha[::2])
print(alpha[1::2])

alphabet = "abcdefghijklmnopqrstuvwxyz"

print("f" in alphabet)
print("1" in alphabet)
print("ghi" in alphabet)
print("Xyz" in alphabet)


bd
efg
abd
e
e
adf
beg
True
False
True
False
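The cell above covers slicing and the `in` operators; indexing and iterating can be sketched the same way. The variable names here are illustrative only.

```python
word = "abdefg"

# Indexing: positive indices count from the start, negative from the end.
first = word[0]      # 'a'
last = word[-1]      # 'g'

# Iterating: a for loop (or comprehension) visits one character at a time.
codes = [ord(ch) for ch in word]
print(codes)         # [97, 98, 100, 101, 102, 103]

# An out-of-range index raises IndexError, but an out-of-range slice does not.
try:
    word[10]
except IndexError as exc:
    index_error = type(exc).__name__
empty = word[10:]    # '' (empty string, no exception)
print(index_error, repr(empty))   # IndexError ''
```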



### **Immutable**

* Different from lists, the following are not allowed:
  * `del` instruction
  * `append` method
  * `insert` method
* Same as lists:
  * operators: `+`, `*`, `+=`, `*=`
  * `min()` and `max()` allowed
  * `index()`
  * `list()`
  * `count()`
* Specific to strings:
  * `capitalize()`
  * `center()`
  * `startswith()` and `endswith()`
  * `find()`
  * `isalnum()` (digits or alphabetical characters only)
  * `isalpha()` (alphabetical characters only)
  * `isdigit()` (digits only)
  * `isspace()`
  * `islower()` and `isupper()` (lower or upper case only)
  * `lower()` and `upper()`
  * `swapcase()` and `title()`
  * `join()`
  * `lstrip()` and `rstrip()` (remove leading/trailing whitespace or the specified characters)
  * `strip()` (`lstrip()` + `rstrip()` combined)
  * `replace()`
  * `rfind()` (search from the end)
  * `split()` (with no argument, splits on runs of one or more whitespace characters)

[Python String methods](https://docs.python.org/3.4/library/stdtypes.html#string-methods)
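A few of the methods listed above in action. This is a quick sketch with arbitrary sample strings, not a complete tour.

```python
# Case-changing methods return new strings; the original is untouched.
title = "hello world".title()            # 'Hello World'
capital = "hello world".capitalize()     # 'Hello world'
swapped = "Hello".swapcase()             # 'hELLO'

# Classification methods return booleans.
print("abc123".isalnum(), "abc 123".isalnum())                # True False
print("abc".isalpha(), "2024".isdigit(), " \t\n".isspace())   # True True True

# find() returns -1 when nothing matches; rfind() searches from the end.
print("theta".find("eta"), "theta".find("x"), "tau tau tau".rfind("ta"))  # 2 -1 8

# split() with no argument drops all surrounding and repeated whitespace.
words = "  alpha   beta gamma ".split()  # ['alpha', 'beta', 'gamma']
joined = ",".join(words)                 # 'alpha,beta,gamma'
replaced = joined.replace(",", " + ")    # 'alpha + beta + gamma'
print(words, joined, replaced)
```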

In [None]:
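x = "abcdefghijklmnopqrstuvwxyz"
try:
    del x[0]   # not allowed: strings are immutable, unlike lists
except TypeError as e:
    print(e)   # 'str' object doesn't support item deletion
del x          # allowed, but removes the whole variable x, not a character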
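The other list-style modifications fail as well. The sketch below records which exception each attempt raises and shows the usual workaround of building a new string; the variable names are illustrative.

```python
s = "python"
outcomes = []

# Item assignment is rejected because strings are immutable.
try:
    s[0] = "P"
except TypeError as exc:
    outcomes.append(type(exc).__name__)

# Strings simply have no append() method.
try:
    s.append("!")
except AttributeError as exc:
    outcomes.append(type(exc).__name__)

print(outcomes)                 # ['TypeError', 'AttributeError']

# To "change" a string, build a new one from pieces of the old one.
capitalized = "P" + s[1:]
print(s, capitalized)           # python Python
```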


In [None]:
str1 = 'a'
str2 = 'b'

print(str1 + str2)  # ab
print(str2 + str1)  # ba
print(5 * 'a')      # aaaaa
print('b' * 4)      # bbbb
str1 += str2
print(str1)         # ab


In [None]:
print(min("aAbByYzZ"))   # A (upper is before lower)
print(max("aAbByYzZ"))   # z
print(list("aAbBy"))     # ['a', 'A', 'b', 'B', 'y']


In [15]:
print('[' + 'alpha'.center(10) + ']')
print('[' + 'gamma'.center(20, '*') + ']')
print("www.cisco.com".lstrip("w."))
print("cisco.com".rstrip(".com"))


[  alpha   ]
[*******gamma********]
cisco.com
cis
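The `rstrip(".com")` call in the cell above is a common trap: the argument is treated as a set of characters to remove, not as a suffix. The sketch below contrasts it with an explicit suffix check; the helper name `drop_suffix` is hypothetical.

```python
url = "cisco.com"

# rstrip() keeps removing trailing characters while they are in {'.', 'c', 'o', 'm'}.
stripped = url.rstrip(".com")
print(stripped)                      # cis


def drop_suffix(text, suffix):
    """Remove suffix from the end of text only if it is really there."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(drop_suffix(url, ".com"))      # cisco
print(drop_suffix(url, ".org"))      # cisco.com
```

On Python 3.9 and later, the built-in `str.removesuffix()` and `str.removeprefix()` methods do the same job.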


## **Comparing strings** 
* Python strings can be compared using the same operators that are used for numbers:
  * `==`
  * `!=`
  * `>`
  * `>=`
  * `<`
  * `<=`

* Python just compares code point values, character by character:
  * The final relation between strings is determined by **comparing the first different character in both strings** (keep ASCII/UNICODE code points in mind at all times.)
  * When you compare two strings of different lengths and the shorter one is identical to the longer one's beginning, the **longer string is considered greater**
  * String comparison is always case-sensitive: **upper-case letters compare as less than lower-case letters**
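The three rules above can be checked with a handful of comparisons. The sample strings are arbitrary.

```python
# Longer string wins when the shorter one is a prefix of it.
print("alpha" < "alphabet")    # True

# The first differing character decides: 'b' (98) > 'B' (66).
print("beta" > "Beta")         # True

# Every upper-case ASCII letter sorts before every lower-case one: 'Z' (90) < 'a' (97).
print("Zebra" < "apple")       # True

# Digits in strings are compared as characters, not as numbers: '1' (49) < '8' (56).
print("10" < "8")              # True
```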

* Comparing strings against numbers is generally a bad idea.
  * `==` always gives `False`
  * `!=` always gives `True`
  * all other comparison operators raise a `TypeError` exception


In [None]:
'10' == 10  # False
'10' != 10  # True
'10' > 10   # TypeError exception

### **Sorting strings**

* the `sorted()` function takes a list and returns a new, sorted list, leaving the original unchanged
* the `sort()` list method sorts the list in place

In [None]:
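firstGreek = ['omega', 'alpha', 'pi', 'gamma']
firstGreek2 = sorted(firstGreek)
print(firstGreek)    # ['omega', 'alpha', 'pi', 'gamma'] (unchanged)
print(firstGreek2)   # ['alpha', 'gamma', 'omega', 'pi']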

secondGreek = ['omega', 'alpha', 'pi', 'gamma']
secondGreek.sort()
print(secondGreek)
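One practical consequence of the difference is the return value: `sorted()` returns the new list, while `sort()` returns `None`. A minimal sketch, with illustrative variable names:

```python
greek = ['omega', 'alpha', 'pi', 'gamma']

# sorted() leaves its argument alone and returns a new list.
ordered = sorted(greek)
print(greek)      # ['omega', 'alpha', 'pi', 'gamma']
print(ordered)    # ['alpha', 'gamma', 'omega', 'pi']

# sort() reorders the list itself and returns None,
# so writing greek = greek.sort() would lose the list.
result = greek.sort()
print(result)     # None
print(greek)      # ['alpha', 'gamma', 'omega', 'pi']
```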


## **Convert numbers to strings with `str()` function**

In [None]:
itg = 13
flt = 1.3
si = str(itg)
sf = str(flt)

print(si + ' ' + sf)

## **Convert strings to numbers with `int()` and `float()` functions**

* Possible only when the string represents a valid number
* Otherwise, expect a **`ValueError`** exception

In [None]:
si = '13'
sf = '1.3'
itg = int(si)
flt = float(sf)

print(itg + flt)
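When the input may not be a valid number, the `ValueError` can be caught. The helper `to_number` below is a hypothetical name introduced for this sketch; it tries `int()` first and falls back to `float()`.

```python
def to_number(text):
    """Return text as an int or float, or None if it is not a valid number."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None


print(to_number('13'))     # 13
print(to_number('1.3'))    # 1.3
print(to_number(' 42 '))   # 42 (surrounding whitespace is accepted)
print(to_number('abc'))    # None

# int() does not accept a string with a decimal point.
try:
    int('1.3')
except ValueError as exc:
    int_error = type(exc).__name__
print(int_error)           # ValueError
```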