# Exam Block 2: Data Aggregates
***

# I. Strings in Detail

### ASCII, Unicode, UTF-8
1. **ASCII (American Standard Code for Information Interchange): a 7-bit code that maps 128 characters (English letters, digits, punctuation, and control codes) to the numbers 0-127**
    
![ASCII Code](images/ASCII_full.svg "ASCII Code")


In [1]:
# ascii() returns a printable representation, escaping non-ASCII characters as \x, \u or \U sequences
ascii_str = "La grande fête!"

In [4]:
print(ascii(ascii_str))

'La grande f\xeate!'


In [8]:
print(ascii("¥"))

'\xa5'


In [11]:
print(ascii("Straße"))

'Stra\xdfe'


In [9]:
print(ascii("Россия"))

'\u0420\u043e\u0441\u0441\u0438\u044f'


In [7]:
type(ascii("Россия"))

str
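The `ascii()` examples above escape every character outside the 0-127 range. Below is a minimal sketch, using only built-ins, of how to find those characters and what happens when you try to encode them as ASCII bytes. It assumes Python 3.7 or newer for `str.isascii()`.

```python
text = "La grande fête!"

# Characters whose code point is above 127 are outside ASCII
non_ascii = [ch for ch in text if ord(ch) > 127]
print(non_ascii)

# str.isascii() (Python 3.7+) checks the whole string at once
print(text.isascii())

# Strict ASCII encoding fails on the first non-ASCII character
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("cannot encode:", err.reason)

# errors="backslashreplace" escapes the offending characters, similar to ascii()
escaped = text.encode("ascii", errors="backslashreplace")
print(escaped)
```

Note the difference in result types: `ascii()` returns a `str`, while `encode()` returns `bytes`.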

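### Unicode, UTF-8 (8-bit code units), UTF-16 (16-bit code units: used internally by Java and JavaScript)
2. **Unicode Standard: assigns a unique code point to the characters of most writing systems, including non-Latin scripts, plus symbols from specialized fields (mathematics, currency, emoji); encodings such as UTF-8 and UTF-16 store those code points as bytes**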

![Escape Sequence](images/escaping_sequence.png "Escape Sequence")

In [28]:
unicode_str = "街道"  # Chinese for 'street'

In [29]:
print(unicode_str)

街道


In [19]:
print(ascii(unicode_str))

'\u8857\u9053'


In [55]:
print("\u8857\u9053")

街道
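The heading distinguishes UTF-8 from UTF-16 by code-unit size. This sketch makes that concrete by counting bytes: UTF-8 uses 1 to 4 bytes per code point, UTF-16 uses 2 or 4. The `"utf-16-le"` codec is chosen here only so that no byte-order mark is added to the count.

```python
# Compare how many bytes UTF-8 and UTF-16 need per character
samples = ["A", "\u00f1", "\u8857", "\U0001F600"]  # A, ñ, 街, 😀

sizes = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-le"))  # little-endian, no byte-order mark
    sizes[ch] = (utf8_len, utf16_len)
    print(f"U+{ord(ch):04X}: utf-8 {utf8_len} bytes, utf-16 {utf16_len} bytes")

# Decoding with the same encoding restores the original text
street = "街道"
assert street.encode("utf-8").decode("utf-8") == street
```

The emoji lies outside the 16-bit range, so UTF-16 needs a surrogate pair (two 16-bit units, 4 bytes) for it.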


## Normalizing Unicode Text to a Standard Representation

In [72]:
import unicodedata

In [82]:
s1 = "Spicy Jalape\u00f1o"
print(s1)
print(ascii(s1))

Spicy Jalapeño
'Spicy Jalape\xf1o'


In [83]:
s2 = "Spicy Jalapen\u0303o"
print(s2)
print(ascii(s2))

Spicy Jalapeño
'Spicy Jalapen\u0303o'


In [81]:
"""
Normalizing Unicode text
NFD: characters are fully decomposed into base characters plus combining characters
(e.g. "ñ" becomes "n" followed by the combining tilde U+0303).
"""

t1 = unicodedata.normalize("NFD", s1)
print(t1)
print(ascii(t1))

t2 = unicodedata.normalize("NFD", s2)
print("\n", t2)
print(ascii(t2))

Spicy Jalapeño
'Spicy Jalapen\u0303o'

 Spicy Jalapeño
'Spicy Jalapen\u0303o'
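`s1` and `s2` print identically but are not equal as strings. This is a minimal sketch of comparing them safely by normalizing first. `nfc_equal` is a hypothetical helper name, and the accent-stripping step is a common technique rather than part of the original notes.

```python
import unicodedata

s1 = "Spicy Jalape\u00f1o"   # precomposed ñ (U+00F1)
s2 = "Spicy Jalapen\u0303o"  # n followed by combining tilde (U+0303)

# They look identical but are different code point sequences
print(s1 == s2, len(s1), len(s2))


def nfc_equal(a, b):
    """Hypothetical helper: compare two strings after NFC (composed) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(nfc_equal(s1, s2))

# Decompose, then drop combining marks to get a plain-ASCII approximation
decomposed = unicodedata.normalize("NFD", s1)
stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
print(stripped)
```

NFC composes characters where possible and NFD decomposes them. Either form works for comparison, provided both strings use the same one.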


## Escaping with the \ character
![Escape Characters](images/escape_chars.png "Escape Characters")

In [39]:
print("One line, \nThis is another line")

One line, 
This is another line


In [40]:
print("to tab use \\t")
print("This is an example:\n\ton a new line.")

to tab use \t
This is an example:
	on a new line.


In [44]:
print('This is an example of a \'single\' quote string.')

This is an example of a 'single' quote string.


In [43]:
print("Double quote string with \"double\" quote!")

Double quote string with "double" quote!


In [46]:
print("Hello \bWorld!")  # \b (backspace) moves the cursor back one position, so "World!" overwrites the space

HelloWorld!


In [48]:
print("you see me not \rnow you see me!")  # \r (carriage return) returns the cursor to the start of the line, so the new text overwrites the old

now you see me!


In [54]:
print("\110\145\154\154\157")  # \ooo (octal)

Hello
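The octal escape above is one of several ways to spell a character. The following sketch shows the same letter written as octal, hex, and Unicode escapes, plus named escapes, raw strings, and `repr()`. The path string is only an illustrative value.

```python
# The same letter written with different escape styles
assert "\110" == "\x48" == "\u0048" == "H"   # octal, hex, 16-bit Unicode

# \N{...} looks a character up by its Unicode name
alpha = "\N{GREEK SMALL LETTER ALPHA}"
print(alpha)

# A raw string (r"...") keeps backslashes literally, handy for Windows paths and regexes
path = r"C:\new\table"
print(path)
print(len("\t"), len(r"\t"))

# repr() shows escape sequences instead of interpreting them
print(repr("tab\there"))
```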


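## String Copying

Strings are immutable, so Python never needs to copy them. In CPython, each technique below returns a reference to the same object, as the matching `id()` values show.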

In [56]:
str1 = "This is string 1"

In [125]:
str2 = str1  # 1. Assignment: a second name for the same object
str1==str2

True

In [124]:
print(id(str1))
print(id(str2))

140686408582688
140686408582688


In [122]:
str3 = str(str2)  # 2. str() on a str returns the same object in CPython
id(str3)

140686408582688

In [121]:
str4 = str3[:]  # 3. A full slice of a str also returns the same object in CPython
print(str4)

This is string 1


In [120]:
str5 = "" + str4  # 4. Concatenating with "" works but is the least readable option :(
print(str5)
id(str5)

This is string 1


140686408582688
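Since strings cannot change, "modifying" one always builds a new object, and other names keep seeing the old value. This sketch demonstrates that behavior with plain built-ins.

```python
s = "This is string 1"
t = s            # t and s refer to the same object

s += "!"         # creates a NEW string and rebinds s; t is unaffected
print(t)
print(s)
print(s is t)

# In-place modification is not allowed
try:
    s[0] = "t"
except TypeError as err:
    print("TypeError:", err)

# To "edit" a string, build a new one from pieces
fixed = "t" + s[1:]
print(fixed)
```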

## String slicing & indexing

In [106]:
palindrome1 = "racecar"
print(palindrome1[::-1])  # Read backwards reads the same ;)

racecar


In [115]:
import numpy as np

In [118]:
arr1 = np.arange(11)
print(arr1)
print(arr1[::-1])

[ 0  1  2  3  4  5  6  7  8  9 10]
[10  9  8  7  6  5  4  3  2  1  0]
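Slices take the form `[start:stop:step]`, where `stop` is exclusive and a negative `step` walks backwards. This sketch adds a few slices of `"racecar"` and a palindrome check built on `[::-1]`. `is_palindrome` is a hypothetical helper that, as an assumption, ignores case and punctuation.

```python
def is_palindrome(text):
    """Hypothetical helper: ignore case and non-alphanumeric characters, then compare."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]


word = "racecar"
print(word[0], word[-1])   # first and last characters
print(word[1:4])           # indices 1, 2, 3 (stop is exclusive)
print(word[::2])           # every other character

print(is_palindrome("A man, a plan, a canal: Panama"))
```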


## Some string manipulation methods

**upper()** and **lower()**

In [127]:
str1 = "this is a string".upper()
print(str1)

THIS IS A STRING


In [128]:
print(str1.lower())

this is a string
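For case-insensitive comparisons, `lower()` is not always enough. This sketch reuses "Straße" from the ASCII section to show where `casefold()` differs, and adds `swapcase()` for completeness.

```python
a = "Straße"
b = "STRASSE"

# lower() keeps ß, so the comparison fails
print(a.lower(), b.lower(), a.lower() == b.lower())

# casefold() applies more aggressive Unicode case folding (ß -> ss)
print(a.casefold(), b.casefold(), a.casefold() == b.casefold())

# swapcase() inverts the case of every cased character
print("Hello World".swapcase())
```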


## isxxx() methods
1. `isupper()` Returns True if the string has at least one letter and all the letters are uppercase.
1. `islower()` Returns True if the string has at least one letter and all the letters are lowercase.
1. `isalpha()` Returns True if the string consists only of letters and is not blank.
1. `isalnum()` Returns True if the string consists only of letters and numbers and is not blank.
1. `isdecimal()` Returns True if the string consists only of decimal characters (such as 0-9) and is not blank.
1. `isspace()` Returns True if the string consists only of whitespace characters (spaces, tabs, new-lines, etc.) and is not blank.
1. `istitle()` Returns True if the string is not blank and every word begins with an uppercase letter followed only by lowercase letters.

In [177]:
msg = "Hello world!"
print(f"islower() only boolean: {msg.islower()}")

msg2 = msg.upper()
print(f"isupper() only boolean: {msg2.isupper()}")


name = "M234onica"
print(name.isalnum())

# contains whitespace
name = "M3onica Gell22er "
print(name.isalnum())

name = "Mo3nicaGell22er"
print(name.isalnum())

name = "133"
print(name.isalnum())

islower() only boolean: False
isupper() only boolean: True
True
False
True
True
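`isdecimal()` is the strictest of three related checks; `isdigit()` and `isnumeric()` accept progressively more characters. This sketch uses a superscript two and a vulgar fraction to show the difference, along with a few edge cases for the other methods in the list.

```python
samples = ["123", "\u00b2", "\u00bd", "12.5"]  # "123", "²" (superscript two), "½", "12.5"

for s in samples:
    print(repr(s), s.isdecimal(), s.isdigit(), s.isnumeric())

print("".isalpha())           # a blank string returns False for every method listed above
print(" \t\n".isspace())      # tabs and new-lines count as whitespace
print("Hello World".istitle(), "Hello world".istitle())
print("ABC123".isupper())     # digits are ignored; all letters are uppercase
```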


**capitalize()**

In [179]:
title = "why the rite?"
print(title.capitalize())

Why the rite?
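`capitalize()` is often confused with `title()`. The following sketch contrasts the two on the same input and shows two behaviors that can surprise.

```python
title = "why the rite?"
print(title.capitalize())        # only the first character is uppercased
print(title.title())             # every word is capitalized

# capitalize() also lowercases everything after the first character
print("hELLO wORLD".capitalize())

# title() treats any non-letter as a word boundary, so apostrophes can surprise
print("they're here".title())
```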


**split()**

In [136]:
str_lst = str1.split(" ")
print(str_lst)

list_reversed = str_lst[::-1]
list_reversed2 = list(reversed(str_lst))  # Another reversing approach
print(list_reversed2)
str_reversed = " ".join(list_reversed)
print(str_reversed)

['THIS', 'IS', 'A', 'STRING']
['STRING', 'A', 'IS', 'THIS']
STRING A IS THIS


In [143]:
str2 = "The Gee is in the house and not over there!"
lst = str2.split(" ", 3)  # split(sep, maxsplit): maxsplit is optional and defaults to -1 (no limit)
print(lst)

['The', 'Gee', 'is', 'in the house and not over there!']
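Passing `" "` explicitly behaves differently from calling `split()` with no argument when spaces repeat. This sketch shows that difference, plus `rsplit()` and `join()`; the sample text is illustrative.

```python
text = "The  Gee is   here"   # note the repeated spaces

# With an explicit " " separator, consecutive spaces produce empty strings
print(text.split(" "))

# With no argument, split() treats any run of whitespace as one separator
print(text.split())

# rsplit() applies maxsplit from the right-hand end
print("a b c d".rsplit(" ", 1))

# join() is the inverse of split()
print("-".join(text.split()))
```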


### len(), chr(), ord()

#### chr(i) returns the character whose Unicode code point is the integer i
#### Valid range: 0 to 1,114,111 (0x10FFFF)

In [146]:
print(len(str2))

dict1 = dict()
print(len(dict1))

43
0


In [153]:
char1 = chr(36)
print(char1)
type(char1)

char2 = chr(1_114_111)
print(char2)


$
􏿿


In [154]:
char = "P"
# Find the Unicode code point of P
u_code = ord(char)
print(u_code)

80
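`ord()` and `chr()` convert in opposite directions. This sketch checks the round trip, shows the range limit from the heading, and contrasts `len()` (code points) with the encoded byte count.

```python
# ord() and chr() are inverses of each other
for ch in "P$ñ街":
    assert chr(ord(ch)) == ch

print(ord("A"), ord("a"))          # uppercase and lowercase ASCII letters differ by 32
print(chr(ord("a") - 32))

# Code points are often written in hexadecimal
print(hex(ord("街")))

# Values outside 0..1_114_111 are rejected
try:
    chr(1_114_112)
except ValueError as err:
    print("ValueError:", err)

# len() counts code points, not bytes
print(len("街道"), len("街道".encode("utf-8")))
```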


# II. Lists in Detail


## Indexing: the first index is 0; negative indices count from the end (the last index is -1)

In [222]:

lst1 = ["Hello", [], "there", 100, "my", False, "people!"]
print(lst1[2])

"""
Item index finding
    parameters:
        element - the element to be searched,
        start(optional) - start searching from this index,
        end(optional) - search the element up to this index.
"""
print(lst1.index("my", 0, -2))

# Hop every other item starting from index zero
print(lst1[::2])


# List index drilling
print(lst1[2][0])

"""
Assignment does not copy: it binds a second name (an alias) to the same list,
so a change made through either name is visible through both.
"""
alias_list = lst1
print(f"Original lst1 id: {id(lst1)}")
print(f"Alias list id: {id(alias_list)}")

# Slicing creates a new list object: a shallow copy (the items themselves are shared)
lst2 = lst1[:]
print(id(lst1))
print(id(lst2))

there
4
['Hello', 'there', 'my', 'people!']
t
Original lst1 id: 140686435172672
Alias list id: 140686435172672
140686435172672
140686435188096
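`lst1` contains a nested empty list at index 1, which makes it a good test case for the three kinds of "copy". This sketch uses `copy.deepcopy` from the standard library to show which changes each copy sees. It also shows that `index()` treats `end` as exclusive.

```python
import copy

lst1 = ["Hello", [], "there", 100, "my", False, "people!"]
alias = lst1                 # same object
shallow = lst1[:]            # new outer list, same inner objects (like lst1.copy())
deep = copy.deepcopy(lst1)   # new outer list AND new inner lists

alias.append("!")            # visible through lst1, not through the copies
lst1[1].append("nested")     # the inner list is shared with the shallow copy

print(lst1)
print(shallow)
print(deep)

# end is exclusive: searching indices 0..3 does not reach "my" at index 4
try:
    lst1.index("my", 0, 4)
except ValueError as err:
    print("ValueError:", err)
```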


## Numeric arrays

![Array methods](images/array_methods.png "Array methods")
![Type Codes](images/array_type_code.png "Type code")

In [240]:
from array import array
import sys


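# Compare memory use: list vs. array.array vs. NumPy array.
# The array type code is chosen so the largest item fits:
# "b" = signed char (1 byte, -128..127), "h" = signed short (typically 2 bytes, -32768..32767).
# Note: sys.getsizeof() of a list counts only its item pointers, not the int objects they reference.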
num_lst = [0, 1, 2, 3, 4, 5, 6]
print(type(num_lst), sys.getsizeof(num_lst))
num_lst2 = [x for x in range(1000+1)]
print(sys.getsizeof(num_lst2))

arr = array("b", [0, 1, 2, 3, 4, 5, 6])
print(type(arr), sys.getsizeof(arr))
arr2 = array("h", [x for x in range(1000+1)])
print(sys.getsizeof(arr2))

arr3 = np.arange(1001)
print(type(arr3))
print(sys.getsizeof(arr3))

<class 'list'> 120
8856
<class 'array.array'> 87
2082
<class 'numpy.ndarray'>
8112
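The cell above picks `"b"` for values up to 6 and `"h"` for values up to 1000. Below is a minimal sketch of automating that choice. `smallest_signed_typecode` is a hypothetical helper, and it assumes the signed codes `"b"`, `"h"`, `"i"`, `"l"`, `"q"`, reading each one's width from `array(code).itemsize` rather than hard-coding it.

```python
from array import array


def smallest_signed_typecode(values):
    """Hypothetical helper: return the narrowest signed array type code that fits all values."""
    lo, hi = min(values), max(values)
    for code in ("b", "h", "i", "l", "q"):
        bits = array(code).itemsize * 8
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return code
    raise OverflowError("values do not fit in a signed 64-bit integer")


small = array(smallest_signed_typecode(range(7)), range(7))
wide = array(smallest_signed_typecode(range(1001)), range(1001))
print(small.typecode, small.itemsize)
print(wide.typecode, wide.itemsize, len(wide.tobytes()))

# Values that do not fit the type code are rejected
try:
    small.append(1000)
except OverflowError as err:
    print("OverflowError:", err)
```

Unlike a list, an `array.array` stores raw machine values. The trade-off is that out-of-range values raise `OverflowError` instead of being stored.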
