# Exam Block 2: Data Aggregates
***

# I. Strings in Detail

### ASCII, UNICODE, UTF-8
1. **ASCII (American Standard Code for Information Interchange); ASCII codes represent text in computers**
    
![ASCII Code](images/ASCII_full.svg "ASCII Code")


In [1]:
# illustrate ascii()
ascii_str = "La grande fête!"

In [4]:
print(ascii(ascii_str))

'La grande f\xeate!'


In [8]:
print(ascii("¥"))

'\xa5'


In [11]:
print(ascii("Straße"))

'Stra\xdfe'


In [9]:
print(ascii("Россия"))

'\u0420\u043e\u0441\u0441\u0438\u044f'


In [7]:
type(ascii("Россия"))

str

### UNICODE, UTF-8 (8-bit code units), UTF-16 (16-bit code units: Java, JavaScript)
2. **Unicode Standard: a character encoding standard covering most writing systems (including non-Latin scripts and symbols used in technical disciplines)**

![Escape Sequence](images/escaping_sequence.png "Escape Sequence")

In [114]:
unicode_str = "街道"  # Chinese meaning 'street'

In [29]:
print(unicode_str)

街道


In [19]:
print(ascii(unicode_str))

'\u8857\u9053'


In [55]:
print("\u8857\u9053")

街道
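
The heading above names UTF-8 and UTF-16; the following is a minimal sketch of how the same text takes a different number of bytes in each encoding. The variable names are illustrative and not part of the original notebook.

```python
# Encode the same text with UTF-8 and UTF-16 and compare byte counts.
text = "街道"

utf8_bytes = text.encode("utf-8")       # variable length: 1 to 4 bytes per code point
utf16_bytes = text.encode("utf-16-le")  # 2 or 4 bytes per code point, no byte-order mark

print(len(text))         # number of code points
print(len(utf8_bytes))   # bytes needed in UTF-8
print(len(utf16_bytes))  # bytes needed in UTF-16

# Decoding with the matching codec restores the original string.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)
```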


## Normalizing Unicode Text to a Standard Representation

In [72]:
import unicodedata

In [82]:
s1 = "Spicy Jalape\u00f1o"
print(s1)
print(ascii(s1))

Spicy Jalapeño
'Spicy Jalape\xf1o'


In [83]:
s2 = "Spicy Jalapen\u0303o"
print(s2)
print(ascii(s2))

Spicy Jalapeño
'Spicy Jalapen\u0303o'


In [338]:
"""
Normalizing unicode text
NFD: Characters should be fully decomposed with the use of combining characters.
"""

t1 = unicodedata.normalize("NFD", s1)
print(t1)
print(ascii(t1))

t2 = unicodedata.normalize("NFD", s2)
print("\n", t2)
print(ascii(t2))

Spicy Jalapeño
'Spicy Jalapen\u0303o'

 Spicy Jalapeño
'Spicy Jalapen\u0303o'
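
Normalization matters mostly when comparing text. As a small sketch, the two spellings above compare unequal as raw strings but equal once both are normalized to the same form (the names `nfc_equal` and `nfd_equal` are illustrative):

```python
import unicodedata

s1 = "Spicy Jalape\u00f1o"   # precomposed ñ (one code point)
s2 = "Spicy Jalapen\u0303o"  # n followed by a combining tilde (two code points)

print(s1 == s2)              # the raw strings differ
print(len(s1), len(s2))

# NFC composes characters where possible; NFD decomposes them.
nfc_equal = unicodedata.normalize("NFC", s1) == unicodedata.normalize("NFC", s2)
nfd_equal = unicodedata.normalize("NFD", s1) == unicodedata.normalize("NFD", s2)
print(nfc_equal, nfd_equal)
```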


## Escaping using \ character
![Escape Characters](images/escape_chars.png "Escape Characters")

In [39]:
print("One line, \nThis is another line")

One line, 
This is another line


In [40]:
print("to tab use \\t")
print("This is an example:\n\ton a new line.")

to tab use \t
This is an example:
	on a new line.


In [44]:
print('This is an example of a \'single\' quote string.')

This is an example of a 'single' quote string.


In [43]:
print("Double quote string with \"double\" quote!")

Double quote string with "double" quote!


In [46]:
print("Hello \bWorld!")  # \b is a backspace; whether it erases the preceding space depends on where the output is displayed

Hello World!


In [48]:
print("you see me not \rnow you see me!")

you see me not now you see me!


In [54]:
print("\110\145\154\154\157")  # \ooo (octal)

Hello
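
When a string holds many backslashes, a raw string literal (prefix `r`) turns escape processing off. A short sketch, using a made-up Windows-style path purely for illustration:

```python
escaped = "C:\\new\\table.txt"   # each backslash escaped by hand
raw = r"C:\new\table.txt"        # raw string: backslashes are kept literally

print(escaped == raw)
print(len("\n"), len(r"\n"))     # one newline character vs. a backslash plus 'n'
```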


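## String Copying

Strings are immutable, so none of the approaches below needs to create a second object; in CPython each one returns the original string, which is why every `id()` shown is identical.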

In [117]:
str1 = "This is string 1"

In [118]:
str2 = str1  # One way
str1==str2

True

In [119]:
print(id(str1))
print(id(str2))

140642574677488
140642574677488


In [121]:
str3 = str(str2)  # A second way
id(str3)

140642574677488

In [123]:
str4 = str3[:]  # A third way
print(str4)

This is string 1


In [124]:
str5 = "" + str4  # A fourth way
print(str5)
id(str5)

This is string 1


140642574677488
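
The identical ids are harmless because a string can never be changed in place. A minimal sketch of that immutability (variable names are illustrative):

```python
original = "This is string 1"
shouted = original.upper()   # returns a new string

print(original)              # unchanged
print(shouted)

try:
    original[0] = "t"        # item assignment is not supported for str
except TypeError as err:
    error_message = str(err)
    print(error_message)
```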

## String slicing & indexing

In [134]:
palindrome1 = "racecar"
print(palindrome1[::-1])  # Read backwards reads the same ;)
print(palindrome1[1:])

racecar
acecar


In [133]:

arr1 = [n for n in range(11)]
print(arr1)
print(arr1[::-1])

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
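
A few more slicing forms, sketched on the same word. The helper `is_palindrome` is a hypothetical name introduced here for illustration:

```python
word = "racecar"

print(word[0], word[-1])   # first and last characters
print(word[1:4])           # indices 1, 2, 3 (stop is exclusive)
print(word[:3], word[3:])  # omitted start/stop default to the ends
print(word[::2])           # every second character

def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    return text == text[::-1]

print(is_palindrome(word), is_palindrome("python"))
```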


## Some string manipulation methods

**upper()** and **lower()**

In [127]:
str1 = "this is a string".upper()
print(str1)

THIS IS A STRING


In [128]:
print(str1.lower())

this is a string


## isxxx() methods
1. `isupper()` Returns True if the string has at least one letter and all the letters are uppercase.
1. `islower()` Returns True if the string has at least one letter and all the letters are lowercase.
1. `isalpha()` Returns True if the string consists only of letters and is not empty.
1. `isalnum()` Returns True if the string consists only of letters and digits and is not empty.
1. `isdecimal()` Returns True if the string consists only of decimal digit characters and is not empty.
1. `isspace()` Returns True if the string consists only of whitespace (spaces, tabs, newlines) and is not empty.
1. `istitle()` Returns True if the string consists only of words that begin with an uppercase letter followed by only lowercase letters.
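
The methods in the list can be checked side by side. This is a small sketch; the `checks` dictionary and its labels are illustrative only:

```python
checks = {
    "abc.isalpha()": "abc".isalpha(),
    "abc1.isalpha()": "abc1".isalpha(),
    "123.isdecimal()": "123".isdecimal(),
    "12.3.isdecimal()": "12.3".isdecimal(),
    "whitespace.isspace()": " \t\n".isspace(),
    "empty.isspace()": "".isspace(),
    "Hello World.istitle()": "Hello World".istitle(),
    "Hello world.istitle()": "Hello world".istitle(),
}
for label, result in checks.items():
    print(f"{label:24} {result}")
```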

In [177]:
msg = "Hello world!"
print(f"islower() only boolean: {msg.islower()}")

msg2 = msg.upper()
print(f"isupper() only boolean: {msg2.isupper()}")


# It's alphanumeric
name = "M234onica"
print(name.isalnum())

# contains whitespace
name = "M3onica Gell22er "
print(name.isalnum())

name = "Mo3nicaGell22er"
print(name.isalnum())

name = "133"
print(name.isalnum())

islower() only boolean: False
isupper() only boolean: True
True
False
True
True


**capitalize()**

In [179]:
title = "why the rite?"
print(title.capitalize())

Why the rite?


**split() and .join()**

In [136]:
str_lst = str1.split(" ")
print(str_lst)

list_reversed = str_lst[::-1]
list_reversed2 = list(reversed(str_lst))  # Another reversing approach
print(list_reversed2)
str_reversed = " ".join(list_reversed)
print(str_reversed)

['THIS', 'IS', 'A', 'STRING']
['STRING', 'A', 'IS', 'THIS']
STRING A IS THIS


In [143]:
str2 = "The Gee is in the house and not over there!"
lst = str2.split(" ", 3)  # split(separator, maxsplit); maxsplit is optional and defaults to -1 (no limit)
print(lst)

['The', 'Gee', 'is', 'in the house and not over there!']
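
One detail worth seeing in code: `split(" ")` and `split()` behave differently when the text contains runs of spaces. A minimal sketch with an illustrative string:

```python
spaced = "a  b   c"   # runs of two and three spaces

by_space = spaced.split(" ")    # explicit separator keeps empty strings
by_whitespace = spaced.split()  # no separator: runs of whitespace collapse

print(by_space)
print(by_whitespace)
print("-".join(by_whitespace))
```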


### len(), chr(), ord()

#### chr(i) returns the character whose Unicode code point is the integer i
#### Valid range: 0 to 1,114,111 (0x10FFFF)

In [146]:
print(len(str2))

dict1 = dict()
print(len(dict1))

43
0


In [136]:
char1 = chr(36)
print(char1)
print(type(char1))

char2 = chr(1_114_111)
print(char2)


$
<class 'str'>
􏿿


In [137]:
char = "P"
# Find the Unicode code point of P
u_code = ord(char)
print(u_code)
print(type(u_code))

80
<class 'int'>
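
`ord()` and `chr()` are inverses of each other, and `chr()` rejects integers outside the valid range. A short sketch (the sample characters are chosen for illustration):

```python
for ch in "Pé街":
    code_point = ord(ch)
    print(ch, code_point, hex(code_point), chr(code_point) == ch)

try:
    chr(1_114_112)   # one past the largest valid code point
except ValueError as err:
    range_error = str(err)
    print(range_error)
```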


### Type Conversion
***
#### Function &emsp; Description
***
`ascii()`&emsp; Returns a string containing a printable representation of an object, with non-ASCII characters escaped.
***
`bin()`&emsp; Converts an integer to a binary string.
***
`bool()`&emsp; Converts an argument to a Boolean value.
***
`chr()`&emsp; Returns string representation of character given by integer argument.
***
`complex()`&emsp; Returns a complex number constructed from arguments.
***
`float()`&emsp; Returns a floating-point object constructed from a number or string.
***
`hex()`&emsp; Converts an integer to a hexadecimal string.
***
`int()`&emsp; Returns an integer object constructed from a number or string.
***
`oct()`&emsp; Converts an integer to an octal string.
***
`ord()`&emsp; Returns integer representation of a character.
***
`repr()`&emsp; Returns a string containing a printable representation of an object.
***
`str()`&emsp; Returns a string version of an object.
***
`type()`&emsp; Returns the type of an object or creates a new type object.
***
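
A quick tour of several functions from the table above, as a sketch with arbitrary sample values:

```python
print(bin(10), oct(8), hex(255))         # integer to binary, octal, hexadecimal strings
print(int("ff", 16), int("42"))          # string to integer, with an optional base
print(float("3.5"), complex(1, 2))
print(bool(""), bool("text"))            # empty string is falsy, non-empty is truthy
print(str(3.5), repr("hi"), ascii("é"))  # repr keeps the quotes; ascii also escapes non-ASCII
print(type(3.5))
```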

# II. Lists in Detail


## Indexing: 
`lst[start:stop:step]`: `start` defaults to 0, `stop` (exclusive) defaults to the end of the list, and `step` defaults to 1; index `-1` refers to the last item.

In [337]:

lst1 = ["Hello", [], "there", 100, "my", False, "people!"]
print(lst1[2])

"""
Item index finding
    parameters:
        element - the element to be searched,
        start(optional) - start searching from this index,
        end(optional) - search the element up to this index.
"""
print(lst1.index("my", 0, -2))

# Hop every other item starting from index zero
print(lst1[::2])


# List index drilling
print(lst1[2][0])

"""
List referencing: assignment creates a second name (an alias) for the same list, not a copy,
so changes made through either name are visible through both.
"""
alias_list = lst1
print(f"Original lst1 id: {id(lst1)}")
print(f"   Alias list id: {id(alias_list)}")

# Shallow copy via slicing: a new list object whose items are shared with the original.
lst2 = lst1[:]
print(id(lst1))
print(id(lst2))

there
4
['Hello', 'there', 'my', 'people!']
t
Original lst1 id: 140686439579328
   Alias list id: 140686439579328
140686439579328
140686439565888
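
The difference between a reference, a slice copy, and a deep copy shows up once the list contains another list. A minimal sketch, with illustrative names and a shortened version of the list above:

```python
import copy

original = ["Hello", [], "there"]

alias = original                 # same object, no copy at all
shallow = original[:]            # new outer list, nested list still shared
deep = copy.deepcopy(original)   # nested objects are copied too

original[1].append("shared")     # mutate the nested list
original.append("tail")          # mutate the outer list

print(alias)
print(shallow)
print(deep)
```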


## Numeric arrays

![Array methods](images/array_methods.png "Array methods")
![Type Codes](images/array_type_code.png "Type code")

In [9]:
from array import array
import numpy as np
import sys


# Creating a numeric array

num_lst = [0, 1, 2, 3, 4, 5, 6]
print(f"Type for num_lst: {type(num_lst)},        Size: {sys.getsizeof(num_lst)}")


arr = array("b", [0, 1, 2, 3, 4, 5, 6])
print(f"    Type for arr: {type(arr)}, Size: {sys.getsizeof(arr)}")
print()
arr_bin = array("b", [x for x in range(127+1)])  # 127 is the largest value of a signed byte ("b")
print(f"Max size for arr_bin: {sys.getsizeof(arr_bin)}")


print()


num_lst2 = [x for x in range(1000+1)]
arr2 = array("h", [x for x in range(1000+1)])
arr_short = array("h", [x for x in range(32_767+1)])  # 32,767 is the largest value of a signed short ("h")
print(f"Max size for arr_short: {sys.getsizeof(arr_short)}")
arr3 = np.arange(1001)
print(type(arr3))

print()
print("***Checking a list and numeric arrays with int items up to 1000***")
print(f"num_lst2 type: {type(num_lst2)},          Size: {sys.getsizeof(num_lst2)}")
print(f"    arr2 type: {type(arr2)},   Size: {sys.getsizeof(arr2)}")
print(f"    arr3 type: {type(arr3)}, Size: {sys.getsizeof(arr3)}")

Type for num_lst: <class 'list'>,        Size: 120
    Type for arr: <class 'array.array'>, Size: 87

Max size for arr_bin: 208

Max size for arr_short: 65616
<class 'numpy.ndarray'>

***Checking a list and numeric arrays with int items up to 1000***
num_lst2 type: <class 'list'>,          Size: 8856
    arr2 type: <class 'array.array'>,   Size: 2082
    arr3 type: <class 'numpy.ndarray'>, Size: 8112
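
The compact size of `array` comes with constraints: every item must fit the declared type code. A small sketch using only the standard library (variable names are illustrative):

```python
from array import array

signed_bytes = array("b", [0, 1, 127])
shorts = array("h", [0, 1, 32_767])

print(signed_bytes.itemsize, shorts.itemsize)   # bytes per item

try:
    signed_bytes.append(128)     # does not fit in a signed byte
except OverflowError as err:
    print("OverflowError:", err)

try:
    signed_bytes.append("x")     # only integers are accepted for type code "b"
except TypeError as err:
    print("TypeError:", err)

print(signed_bytes.tolist())
```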


### Integer types in Python
![Integer Binary](images/int_binary.png "Integer Binary representation")
<br>
##### Note:
The first thing to notice about these binary representations is that their lengths differ. The integer 6 needs only three bits, but the integer 999 needs ten bits. Fixed-width integer types, such as the items of an `array` or a NumPy array, reserve the same number of bytes for every value (in the output above, one byte per item for type code `"b"` and two bytes for `"h"`), and values whose binary representations need fewer bits are padded to the left with 0s. Python's built-in `int` is different: it has arbitrary precision, so it grows to as many bits as the value requires rather than occupying a fixed four bytes.
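
The bit counts mentioned in the note can be checked directly. This sketch uses `int.bit_length()` and a 32-character binary format to show the left padding with 0s; the value `big` is an arbitrary example:

```python
for n in (6, 999):
    print(n, bin(n), n.bit_length(), format(n, "032b"))

# Python's own int grows as needed, so large values simply use more bits.
big = 2 ** 100
print(big.bit_length())
```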
<br><br>

## List Basic Methods

#### append()

In [342]:
"""
append() adds a single new item to the end of the list
    parameters:
        elmnt - Required. Any type (string, number, list, etc.). The element to append.
"""
lst = [1, 2, 3, 7]
lst.append(5)
print(lst)

[1, 2, 3, 7, 5]


In [343]:
"""
insert()
    parameters: 
        pos - Required. A number specifying in which position to insert the value.
        elmnt - Required. An element of any type (string, number, object, etc.)
"""
lst.insert(3, 4)
lst.insert(0, "first")
print(lst)

['first', 1, 2, 3, 4, 7, 5]
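
Because `append()` adds exactly one item, appending a list nests it; `extend()` is the method that adds several items. A minimal sketch with illustrative values:

```python
numbers = [1, 2, 3]
numbers.append([4, 5])    # the list itself becomes one new item
print(numbers)

numbers = [1, 2, 3]
numbers.extend([4, 5])    # each item of the iterable is added
print(numbers)

numbers.insert(len(numbers), 6)   # inserting at len() behaves like append
numbers.insert(-1, "x")           # a negative position counts from the end
print(numbers)
```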


In [346]:
"""
len()
    Returns the number of items in an object.
    parameters:
        obj - a sequence (string, bytes, tuple, list, range) or a collection (dictionary, set)
"""
var1 = bin(100)
print(len(var1))

print([b for b in var1])

9
['0', 'b', '1', '1', '0', '0', '1', '0', '0']


#### sorted() and .sort()

In [374]:
"""
sorted()
    Definition and usage: returns a new sorted list built from the items of an iterable.
    NOTE: the items must be comparable with each other (e.g. all numbers or all strings).
    
.sort()
    Definition: sorts the list in place and returns None.
    
    parameters:
        iterable - Required for sorted() only. The iterable to sort: list, tuple, dictionary (keys), etc.
        key - Optional. A function called on each item to compute its sort key. Default is None.
        reverse - Optional. A boolean. False sorts ascending (default), True sorts descending.
"""
# num_lst = [2, False, "two"]
# srt_lst = sorted(num_lst)  # TypeError: numbers and strings cannot be compared
num_lst2 = ["a", "c", "d", "b"]
srt_lst2 = sorted(num_lst2)  # new list; num_lst2 is left unchanged
# print(srt_lst)
print(id(num_lst2), num_lst2)  # still in the original order
num_lst2.sort()  # sorts num_lst2 in place
print(id(srt_lst2), srt_lst2)  # the separate list returned by sorted()

print()

# key= and reverse=
def len_func(e):
    return len(e)
cars = ['Ford', 'Mitsubishi', 'BMW', 'VW']
cars.sort(key=len_func, reverse=True)
print(cars)

140686459165184 ['a', 'c', 'd', 'b']
140686459776960 ['a', 'b', 'c', 'd']

['Mitsubishi', 'Ford', 'BMW', 'VW']
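
A compact sketch of the practical difference: `sorted()` hands back a new list, while `.sort()` changes the list and returns `None`. The sample letters and variable names are illustrative:

```python
letters = ["b", "C", "a", "D"]

new_list = sorted(letters)                         # new list; uppercase sorts before lowercase
case_insensitive = sorted(letters, key=str.lower)  # key function ignores case
result = letters.sort(reverse=True)                # sorts in place, returns None

print(new_list)
print(case_insensitive)
print(letters, result)
```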


#### del()
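
The notebook gives no cell for this heading. As a hedged sketch, `del` is a statement that removes an item, a slice, or a dictionary key; the sample data below is made up for illustration:

```python
items = ["a", "b", "c", "d", "e"]

del items[0]       # remove one item by index
print(items)

del items[1:3]     # remove a slice
print(items)

scores = {"x": 1, "y": 2}
del scores["x"]    # remove a dictionary key
print(scores)
```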

### ***in, not in*** operators

In [7]:
# Check whether a value is part of a sequence
lst_is = [1, 2, 3, 4, 5]
print(3 in lst_is)
print(6 in lst_is)

print()

msg = "Hi there mah' people!"
print("o" in msg)
print("O" in msg)
print("x" in msg) 

True
False

True
False
False
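
The cell above only exercises `in`. A short sketch of `not in`, plus how membership works for substrings and dictionaries (sample data is illustrative):

```python
vowels = "aeiou"
print("y" not in vowels)     # negated membership test
print("eio" in vowels)       # for strings, `in` matches substrings

ages = {"Tom": 25, "Krish": 30}
print("Tom" in ages)         # for dictionaries, `in` checks the keys
print(25 in ages)            # values are not searched
print(25 in ages.values())   # search the values explicitly
```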


### Lists in lists - Matrices and cubes

In [23]:
import pandas as pd

# Two-dimensional arrays
two_dim_arr = np.array([
    ['1001A', 'Ray', 'Technical Head'],
    ['2004B', 'Karlos', 'Manager'],
    ['3100A', 'Alex', 'Lead Developer'],
])
pd.DataFrame(two_dim_arr, columns=["Employee Id", "Employee Name", "Employee Dept."])


   Employee Id  Employee Name  Employee Dept.
0        1001A            Ray  Technical Head
1        2004B         Karlos         Manager
2        3100A           Alex  Lead Developer


In [24]:
# Two-dimensional list
lst = [['Tom', 'Reacher', 25], ['Krish', 'Pete', 30], 
       ['Nick', 'Wilson', 26], ['Juli', 'Williams', 22]] 
      
df = pd.DataFrame(lst, columns =['FName', 'LName', 'Age']) 
print(df)

   FName     LName  Age
0    Tom   Reacher   25
1  Krish      Pete   30
2   Nick    Wilson   26
3   Juli  Williams   22


In [29]:
# Multi-dimensional array/list
multi_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in multi_arr:
    print(x)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]


In [32]:
# Accessing item: 5
multi_arr[0][1][1]

5
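
The same ideas work with plain nested lists, without NumPy. A minimal sketch in which `matrix` and `cube` are illustrative names:

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

print(matrix[1][1])                                # row 1, column 1
column = [row[0] for row in matrix]                # first column
transposed = [list(col) for col in zip(*matrix)]   # rows become columns
print(column)
print(transposed)

cube = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
print(cube[0][1][1])                               # block 0, row 1, column 1
```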

### Tuples ###
---
`count()` Returns the number of times a specified value occurs in a tuple.
***
`index()` Searches the tuple for a specified value and returns the position where it was first found.
***
`len()` Returns the number of items in an object.
***

In [57]:
# Tuples are immutable: once created, their items can be read but not reassigned.
from pprint import PrettyPrinter
tup1 = (1, "Hello", False, [3, 3, 4])
print(tup1)

tup2 = tuple(x for x in range(11))
print(tup2)

dim_tup = (((21, "age"), (10, "age"), (76, "age")), ((89, "grade"), (100, "grade"), (90, "grade")))
pp = PrettyPrinter(indent=3, depth=5)
pp.pprint(dim_tup)
dim_tup[1][0][0]

(1, 'Hello', False, [3, 3, 4])
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
(  ((21, 'age'), (10, 'age'), (76, 'age')),
   ((89, 'grade'), (100, 'grade'), (90, 'grade')))


89

In [35]:
tup1
tup1[0]

1

In [37]:
tup1[1] = "Hi"

TypeError: 'tuple' object does not support item assignment
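
The methods listed under the Tuples heading, sketched on sample data, together with unpacking and one caveat: a mutable object stored inside a tuple can still change.

```python
grades = (90, 85, 90, 70)

print(grades.count(90))    # how many times 90 occurs
print(grades.index(85))    # position of the first 85
print(len(grades))

first, *middle, last = grades   # unpack into names
print(first, middle, last)

mixed = (1, [3, 3, 4])
mixed[1].append(5)         # the tuple is fixed, but the list inside is still mutable
print(mixed)
```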

## Dictionaries
---
`clear()` Removes all the elements from the dictionary.
***
`copy()` Returns a copy of the dictionary.
***
`fromkeys()` Returns a dictionary with the specified keys and value.
***
`get()` Returns the value of the specified key.
***
`items()` Returns a view object containing a `(key, value)` tuple for each key-value pair.
***
`keys()` Returns a view object containing the dictionary's keys.
***
`pop()` Removes the element with the specified key and returns its value.
***
`popitem()` Removes and returns the last inserted key-value pair.
***
`setdefault()` Returns the value of the specified key. If the key does not exist, inserts the key with the specified value.
***
`update()` Updates the dictionary with the specified key-value pairs.
***
`values()` Returns a view object containing all the values in the dictionary.
***


In [59]:
"""
Create a new dictionary: {key: value}
"""
dict1 = {}
dict2 = dict()
print(type(dict1), type(dict2))

<class 'dict'> <class 'dict'>


In [62]:
"""
Adding new items
"""
dict1["name"] = "Panchito"
dict1["age"] = 30
print(dict1)

{'name': 'Panchito', 'age': 30}


In [71]:
"""
Accessing dictionary values
"""
# Way 1, by indexing item key
print(dict1["name"])
# print(dict1["school"])  # Error if key doesn't exist

# Way 2 .get(key), validation if item exists else provide a default value
print(dict1.get("school", "nothing here!"))


Panchito
nothing here!


### fromkeys()

In [73]:
# Create a new dictionary from variables.
x = ("key1", "key2", "key3")
default_val = 0
dict3 = dict.fromkeys(x, default_val)
dict4 = {}.fromkeys(x, default_val)
print(dict3)
print(dict4)

{'key1': 0, 'key2': 0, 'key3': 0}
{'key1': 0, 'key2': 0, 'key3': 0}
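
One caveat with `fromkeys()`: the default value is a single object shared by every key, which matters when it is mutable. A short sketch with illustrative names:

```python
keys = ("key1", "key2")

shared = dict.fromkeys(keys, [])        # every key refers to the same list
shared["key1"].append("x")
print(shared)

separate = {key: [] for key in keys}    # a comprehension builds a new list per key
separate["key1"].append("x")
print(separate)
```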


### .copy()

In [74]:
cp_dict = dict3.copy()
print(id(dict3), id(cp_dict))

140642574564928 140642547660608


### .items()

In [75]:
print(cp_dict.items())

dict_items([('key1', 0), ('key2', 0), ('key3', 0)])


In [80]:
print(dict1.keys())
print(dict1.values())

dict_keys(['name', 'age'])
dict_values(['Panchito', 30])


### update()
If a key doesn't exist, `update()` adds it; if it already exists, its value is overwritten.

In [112]:
dict1.update({"country": "France"})
dict1.update({"countri": "China"})
# dict1.update({"country:": "Mexico"})
print(dict1)

{'name': 'Panchito', 'age': 30, 'country': 'France', 'countri': 'China'}


### .popitem()

In [111]:
dict1.popitem()

('countri', 'China')

### .pop(key)

In [113]:
dict1.pop("countri")
print(dict1)

{'name': 'Panchito', 'age': 30, 'country': 'France'}
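
`pop()` also accepts a default that is returned when the key is absent; without one, a missing key raises `KeyError`. A minimal sketch on a separate, illustrative dictionary:

```python
person = {"name": "Panchito", "age": 30}

age = person.pop("age")                # removes the key and returns its value
missing = person.pop("school", None)   # the default avoids KeyError for absent keys
print(age, missing, person)

try:
    person.pop("school")               # no default: raises KeyError
except KeyError as err:
    print("KeyError:", err)
```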


### .setdefault(key, value)

In [98]:
car_dict2 = {}
car_dict2.setdefault("model", "BMW")

print(car_dict2.setdefault("model", "Vocho"))
print(car_dict2)

BMW
{'model': 'BMW'}
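
A common use of `setdefault()` is grouping: create the container the first time a key is seen, then append to it. The car names and the `by_initial` dictionary are illustrative:

```python
cars = ["Ford", "Fiat", "BMW", "Bentley", "VW"]

by_initial = {}
for car in cars:
    # Create the list on first sight of a letter, then append to it.
    by_initial.setdefault(car[0], []).append(car)

print(by_initial)
```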


### .clear()

In [99]:
print(car_dict2.clear())
print(len(car_dict2))

None
0


### Iterating through dictionaries

In [101]:
for key in dict1.keys():
    print(key)

name
age
country
countri


In [102]:
for value in dict1.values():
    print(value)

Panchito
30
France
China


In [110]:
for key, val in dict1.items():
    print(key, val)

name Panchito
age 30
country France
countri China
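
Two further iteration patterns, sketched on an illustrative dictionary: visiting keys in sorted order, and building a new dictionary with a comprehension.

```python
person = {"name": "Panchito", "age": 30, "country": "France"}

for key in sorted(person):   # keys in alphabetical order
    print(key, person[key])

for position, (key, value) in enumerate(person.items(), start=1):
    print(position, key, value)

upper_keys = {key.upper(): value for key, value in person.items()}
print(upper_keys)
```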
