# Structured Data

Data types describe the kinds of values a problem may require. But what happens when you have to repeat the same piece of code several times? First, the code grows and loses readability, one of the most important principles of Python. Second, the repetition strongly suggests that the data itself needs to be handled differently. For instance:

```python
"""
The following code has variables holding ASCII values.
"""
var_a = 97
var_b = 98
var_c = 99
var_d = 100
var_e = 101
var_f = 102
var_g = 103
# ... 
var_y = 121
var_z = 122

# 32 is the distance from lower case chars to become upper case.
def to_upper(char):
    return char - 32

var_A = to_upper(var_a)
var_B = to_upper(var_b)
var_C = to_upper(var_c)
var_D = to_upper(var_d)
var_E = to_upper(var_e)
var_F = to_upper(var_f)
var_G = to_upper(var_g)
# ...
var_Y = to_upper(var_y)
var_Z = to_upper(var_z)
```

There are several ways to avoid duplicated code and to improve how your data is treated, and most of them rely on choosing the right data structure. These are the data structures we will review as part of this exercise:

1. Lists
    * Arrays
2. Tuples
    * Triples
3. Dictionaries
4. Sets

## Lists

Lists are one of the most used data structures in Python. They provide so many functions and interactions that they have become an essential part of the language, and knowing them well can solve many problems.

### Related functions

The language itself gives us some key methods to work with lists. This section describes how they are used:

| Method  | Description |
| ------- | ----------- |
| append  | Inserts at the end.|
| insert  | Inserts at the specified position.|
| pop     | Removes the last element, or the one you specify. This also returns the element.|
| remove  | Removes the first found item that is equal to the specified value.|
| reverse | Reverses the list.|
| sort    | Sorts the list.|

Lists offer other methods that are not mentioned here. The language also provides functions that work over collections, such as `len`, which returns the size of the list. For the full reference, visit https://docs.python.org/3/tutorial/datastructures.html.
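As a quick illustration of the methods in the table above, here is a minimal sketch. The `letters` list and its values are made up for this example; the comments show the state of the list after each call.

```python
# A small list to exercise the methods from the table above.
letters = ["b", "d", "a"]

letters.append("c")      # insert at the end      -> ['b', 'd', 'a', 'c']
letters.insert(0, "e")   # insert at position 0   -> ['e', 'b', 'd', 'a', 'c']
last = letters.pop()     # removes and returns the last element ('c')
first = letters.pop(0)   # removes and returns the element at index 0 ('e')
letters.remove("d")      # removes the first item equal to 'd' -> ['b', 'a']
letters.sort()           # sorts in place         -> ['a', 'b']
letters.reverse()        # reverses in place      -> ['b', 'a']

print(f"letters={letters}, last={last}, first={first}, size={len(letters)}")
```

Note that `sort` and `reverse` modify the list itself and return `None`, so their result should not be assigned to a variable.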

### Arrays
Python does not provide separate linked-list and array types for everyday use; the built-in list offers everything needed to be treated as an array. Items can be addressed with accessors:

```python
# This creates an empty array. If we try to assign a value to any position, it will flag an error.
array = []
# Now let's play a little bit with the language to populate the array.
array = [0] * 5 # This creates a list of 5 elements, all of them 0.
print(f"array={array}")
# Once the elements are reserved in memory, then we can play with the accessors.
array[0] = 1
# The following line uses the accessor ('[', ']') to obtain the value of the first element.
print(f"array[0]={array[0]}")
```

As you may have noticed, lists are mutable: the values they contain can be modified after the list has been created. This feature is worth highlighting because not every data structure allows it.
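One consequence of mutability that is easy to miss is that two names can refer to the same list. The sketch below uses made-up names (`scores`, `alias`, `snapshot`) to show the difference between sharing a list and copying it.

```python
scores = [10, 20, 30]
alias = scores            # both names refer to the same list object
alias[1] = 99             # modifying through one name...
print(scores)             # ...is visible through the other: [10, 99, 30]

snapshot = scores.copy()  # a shallow copy is an independent list
snapshot[0] = -1
print(scores)             # unchanged by the copy's edit: [10, 99, 30]
```

When a function should not alter the caller's list, passing or making a copy is a simple way to stay safe.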

### Example

Let's rewrite our first code snippet with a list to see how much shorter and more reusable the code becomes.

```python
var_abc = list()

# Let's populate the list. Instead of creating a variable for each letter, the
# values can be contained in a list.
for idx in range(97, 123): # range works as the half-open interval [97, 123)
    var_abc.append(idx)

print(f"var_abc{var_abc}")

# To uppercase
def to_upper(char):
    return char - 32

# lists can also be defined with square brackets
var_ABC = []

# Let's traverse var_abc and populate var_ABC.
# In this case, instead of having 26 statements converting each letter, we have 3 lines, that could be 2.
for value in var_abc:
    value_uppercase = to_upper(value)
    var_ABC.append(value_uppercase)

print(f"var_abc{var_abc}")
print(f"var_ABC{var_ABC}")

# now let's convert them into alpha values
alpha_abc = list()
for idx in range(len(var_abc)): # [0, var_abc's length)
    # In this case I'm not using an extra variable to hold the char's value before it is
    # inserted into the list. chr(x) returns the character whose Unicode code point is x
    # (for these values, the same as ASCII).
    # See https://docs.python.org/3/library/functions.html#chr
    alpha_abc.append(chr(var_abc[idx]))

# lists/arrays can also be initialized with values.
alpha_ABC = ['A', 'B', 'C']
# Given that A, B, and C are already part of the list/array, let's skip them by
# starting the traversal at index 3, the size of alpha_ABC.
start = len(alpha_ABC)
for idx in range(start, len(var_ABC)):
    alpha_ABC.append(chr(var_ABC[idx]))

print(f"alpha_abc{alpha_abc}")
print(f"alpha_ABC{alpha_ABC}")

# reverse() works in place and returns None, so there is nothing useful to assign.
alpha_ABC.reverse()

print(f"reversed alpha_ABC{alpha_ABC}")

# This will reverse the list back since the order is alphabetical.
# sort() also works in place and returns None.
alpha_ABC.sort()

print(f"sorted alpha_ABC{alpha_ABC}")
```


```
var_abc[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
var_abc[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
var_ABC[65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
alpha_abc['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
alpha_ABC['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
reversed alpha_ABC['Z', 'Y', 'X', 'W', 'V', 'U', 'T', 'S', 'R', 'Q', 'P', 'O', 'N', 'M', 'L', 'K', 'J', 'I', 'H', 'G', 'F', 'E', 'D', 'C', 'B', 'A']
sorted alpha_ABC['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
```
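The example's remark that the conversion loop "could be 2" lines can be pushed further with list comprehensions. This is only a sketch of one possible shortening, reusing the variable names from the example; it is not the only idiomatic way to write it.

```python
# Same data as the example above, built with comprehensions.
var_abc = list(range(97, 123))               # code points for 'a'..'z'
var_ABC = [code - 32 for code in var_abc]    # shift each one to upper case
alpha_ABC = [chr(code) for code in var_ABC]  # convert code points to characters

print("".join(alpha_ABC))  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```

A comprehension builds the new list in a single expression, which removes the need for an empty list plus repeated `append` calls.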


## Tuples

Depending on the problem you are tackling, it is sometimes helpful to protect your data by not allowing the contents of a data structure to be modified; such values are called immutable. A Python tuple works like a vector, but once you create it, you cannot change what it holds. Tuples are also ordered and allow duplicated values.

```python
# Parentheses alone do not make a tuple: it is the comma between values that does.
not_a_tuple = (1)   # this is just the integer 1
my_tuple = (1, "alphanumeric value")

# tuple is also a built-in type (not a reserved keyword); calling it with an
# iterable builds a tuple the same way '()' does. Avoid naming a variable
# 'tuple', since that would shadow the built-in.
another_tuple = tuple([1, 2])
```

### Related functions

Given that a tuple is also a sequence, it shares with lists most of the functionality that does not mutate the sequence. Tuples provide only two methods of their own, and both are also available on lists:

| Method | Description |
| ------ | ----------- |
| count  | Returns the number of times a specified value occurs in a tuple. |
| index  | Searches the tuple for a specified value and returns the position of where it was found. |

### (Tu|Tri|Quadru|Multi)-ples

Even though the name tuple may suggest that it holds 2 values, that is not the case here. You can create immutable structures that hold anything from one value (with a small trick) to as many values as you prefer.

```python
# The trailing comma is what makes this a tuple; without it, the parentheses
# would simply wrap a string. That is the trick to hold just one value.
single_value = ("one value", )

pair = (1, 2)
triple = ("one", "two", "three")
quadruple = (1.0, 2.0, 3.0, 4.0)

# see the fifth value holding a list
quintuple = (1, "two", 3.0, '4', [5])
```

To get a single value from a tuple, use accessors the same way you would with lists/arrays. Just remember that if you try to modify any of the values, the interpreter will raise an error. The interesting part is how to get all the values at once:

```python
pair = (1, 2)

# this will unpack the values into 'a' and 'b'
a, b = pair

# 'a' can change its reference, but if you take a look at 'pair', it is still the same.
a = 5

triple = ("one", "two", "three")

x, y, z = triple
```
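To make the error mentioned above concrete, here is a minimal sketch with made-up names (`point`, `record`). It also shows that immutability is shallow: a tuple cannot be reassigned item by item, but a mutable object stored inside it can still change.

```python
point = (3, 4)
print(point[0])    # reading with an accessor works like it does with a list

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as err:
    print(f"TypeError: {err}")

# Immutability is shallow: a list stored inside a tuple can still change.
record = ("ids", [1, 2])
record[1].append(3)
print(record)      # ('ids', [1, 2, 3])
```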

### Example

Let's say you are only interested in one of the values a tuple is holding. A small trick lets you unpack just the ones you care about.

```python
# Hypothetical HTTP response

# HTTP responses usually hold a lot of metadata such as status code, content, and
# headers, so this tuple will hold those values.
response = (200, {"json_values": [0, 2, 3]}, {"Content-Type": "application/json"})

# The thing is, you just want to validate that the request you made was executed successfully.
# By convention, '_' names the values we do not care about, so we keep only the status code.
status_code, _, _ = response

if status_code == 200:
    print("Success")
else:
    print("Failure")

# If you try to unpack a tuple into fewer variables than the number of values it holds,
# an error is raised. Remember you can always use accessors in case you don't want
# to create a lot of variables, if you don't know the length of the tuple, or if its
# size may vary from one call to another.
#
# _, content = response # ValueError: too many values to unpack (expected 2)

# status_code refers to the same integer object stored in the tuple. You can rebind
# status_code to another value, but the tuple will maintain its values.

print(f"tuple before being modified: {response}")

status_code = 500

print(f"status_code modified: {status_code}")
print(f"tuple after being modified: {response}")

# and remember you can always pack a tuple whenever you want.
new_response = (status_code, {"failure": "just an example"}, {"Content-Type": "application/json"})

print(f"new_response type:{type(new_response)}")

# but also remember that parentheses around a single value do not create a tuple
not_a_tuple = ("not")
print(f"not_a_tuple type:{type(not_a_tuple)}")

# the trailing-comma trick is always available
a_tuple = ("tuple", )
print(f"a_tuple type:{type(a_tuple)}")
```

```
Success
tuple before being modified: (200, {'json_values': [0, 2, 3]}, {'Content-Type': 'application/json'})
status_code modified: 500
tuple after being modified: (200, {'json_values': [0, 2, 3]}, {'Content-Type': 'application/json'})
new_response type:<class 'tuple'>
not_a_tuple type:<class 'str'>
a_tuple type:<class 'tuple'>
```


## Dictionaries

Dictionaries are mutable collections of key-value pairs that preserve insertion order (guaranteed since Python 3.7). All keys must be different, and you use the key, instead of an index, to access a value. They are an essential structure because they translate almost directly to and from JSON, a very common way to present data. As with all the structures we've seen so far, the language provides ways to work with them easily.
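The word "almost" matters when converting to JSON. The sketch below uses the standard `json` module with a made-up `person` dictionary to show two conversions that do not survive a round trip: tuples come back as lists, and non-string keys come back as strings.

```python
import json

person = {"name": "Jane", "scores": (1, 2), 7: "lucky"}

text = json.dumps(person)      # dict -> JSON text
print(text)                    # {"name": "Jane", "scores": [1, 2], "7": "lucky"}

restored = json.loads(text)    # JSON text -> dict
print(restored)                # {'name': 'Jane', 'scores': [1, 2], '7': 'lucky'}
print(restored == person)      # False: the tuple and the integer key changed type
```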

### Related functions

Dictionaries are collections too, but they are mappings rather than sequences, so positional list methods such as `append` or `insert` do not apply. Here is a list of built-in methods for `dict`:

| Method     | Description |
| ---------- | ----------- |
| clear      | Removes all elements of the dictionary. |
| copy       | Returns a shallow copy of the dictionary. |
| fromkeys   | Creates a new dictionary with keys from a sequence and all values set to the given value. |
| get        | Returns the value for the given key, or a default if the key is not in the dictionary. |
| has_key    | Python 2 only; it was removed in Python 3. Use `key in dictionary` instead. |
| items      | Returns a view of the dictionary's `(key, value)` tuple pairs. |
| keys       | Returns a view of the dictionary's keys. |
| setdefault | Similar to `get()`, but sets `dict[key] = default` if the key is not already in the dictionary. |
| update     | Adds the incoming dictionary's key-value pairs to the dictionary, overwriting existing keys. |
| values     | Returns a view of the dictionary's values. |

```python
# empty dict def
dictionary = dict()

# as in lists and tuples, you can also define a new dict with '{' and '}'.
dictionary_2 = {}

dictionary_3 = dict(first = "1st", second = "2nd", third = "3rd")

dictionary_4 = {"fourth": "4th", "fifth": "5th", "sixth": "6th"}
```
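Several of the methods from the table can be seen working together in the following sketch. The `stock` and `blank` dictionaries are made up for illustration.

```python
stock = {"apples": 3, "pears": 0}

print(stock.get("kiwis"))         # None: the key is missing
print(stock.get("kiwis", 0))      # 0: a default replaces None

stock.setdefault("kiwis", 5)      # key missing, so it is added with value 5
stock.setdefault("apples", 100)   # key present, so the value stays 3
stock.update({"pears": 7, "plums": 2})   # overwrites pears, adds plums

print("apples" in stock)          # True; 'in' replaces Python 2's has_key
print(list(stock.items()))        # [('apples', 3), ('pears', 7), ('kiwis', 5), ('plums', 2)]

blank = dict.fromkeys(["x", "y"], 0)
print(blank)                      # {'x': 0, 'y': 0}
```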

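The keys of a dict can be any hashable value, such as _strings_, _integers_, _floats_, or tuples, and you can mix them. (Python has no separate char or double type: a character is a string of length one, and `float` is double precision.) The values, on the other hand, can be anything you can think of, from simple data types to any kind of complex class or structure.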

```python
from math import pi as PI

class Example:
    def __init__(self):
        self.description = "I'm just an example"
        self.value = [
            {
                "fib": (1, 1, 2, 3, 5, 8, 13),
                "sq": {1, 4, 9, 16, 25, 36, 49}
            }
        ]

example = Example()

dictionary_5 = {
    1: "integer",
    2.0: ["double"],
    'c': 99,
    "alphanumeric value - as long as you need": PI,
    "5": example,
    6.2: True
}
```
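What makes a value usable as a key is that it is hashable. The following sketch, with made-up names (`grid`, `mixed`), shows a tuple working as a key, a list being rejected, and a subtle case where an integer and a float are treated as the same key.

```python
grid = {(0, 0): "origin", (1, 2): "a point"}   # tuples are hashable
print(grid[(1, 2)])

try:
    bad = {[0, 0]: "origin"}   # lists are mutable, hence unhashable
except TypeError as err:
    print(f"TypeError: {err}")

# 1 and 1.0 compare equal and hash the same, so they are the same key.
mixed = {1: "integer"}
mixed[1.0] = "float"
print(mixed)    # {1: 'float'}
```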

Traversing a dictionary works similarly to traversing a list or a tuple; just remember to use the `key` instead of the `index`.

```python
from math import pi as PI

class Example:
    def __init__(self):
        self.description = "I'm just an example"
        self.value = [
            {
                "fib": (1, 1, 2, 3, 5, 8, 13),
                "sq": {1, 4, 9, 16, 25, 36, 49}
            }
        ]

example = Example()

dictionary_5 = {
    1: "integer",
    2.0: ["double"],
    'c': 99,
    "alphanumeric value - as long as you need": PI,
    "5": example,
    6.2: True
}

# As you can tell in the following statements, accessors ('[', ']') are required
# to get data from a dictionary.
for key in dictionary_5:
    print(f"dict[{key}]:{dictionary_5[key]}")

# You can get a view of the keys or a view of the values if needed;
# wrap them in list() when you need an actual list.
keys = dictionary_5.keys()
values = dictionary_5.values()

# you can also get both paired, let's use a loop along with it

for key, value in dictionary_5.items():
    # we don't have to access the dict, since value already holds what we need
    print(f"dict[{key}]:{value}")
```

This structure is very important because some of its properties, e.g. unique keys, simplify much of the validation that certain programs require. Let's take a look at some of these usages.

### Example



```python
# the following imports allow us to handle json input/output easily
from json import load as json_load
from json import dump as json_dump

# Boilerplate needed to showcase dicts; feel free to skim it.
filename = "./sample.json"
file_content = {
    "firstName": "Jane",
    "lastName": "Doe",
    "hobbies": ["running", "sky diving", "singing"],
    "age": 35,
    "children": [
        {
            "firstName": "Alice",
            "age": 6
        },
        {
            "firstName": "Bob",
            "age": 8
        }
    ]
}

# the following code handles JSON file writing
with open(filename, "w") as json_file:
    json_dump(file_content, json_file)

# the following code handles JSON file reading
with open(filename, "r") as json_file:
    data = json_load(json_file)

# in case the following statements fail, it will stop the program
assert "firstName" in data
assert "nickname" not in data

print(f"firstName: {data['firstName']}")

# in case of a missing key, an exception will be triggered
try:
    print(f"nickname: {data['nickname']}")
except KeyError:
    print("Nickname's key is missing")

# we can also use the function get
print(f"hobbies: {data.get('hobbies')}")

# get returns None in case of a missing key
print(f"nickname: {data.get('nickname')}")

# default values can be set, for get not to return None in case of a missing key
print(f"nickname: {data.get('nickname', 'default_nickname')}")

# Now let's modify our dict to be dumped into our file by adding new children.
# Using append given that children is a list.
data["children"].append({"firstName": "TwinABC", "age": 0})
data["children"].append(dict(firstName = "TwinXYZ", age = 0))

print(f"Children: {data['children']}")

# Dumping the data to the json file. Please check and compare sample.json initial
# data with the latest modifications.
with open(filename, "w") as json_file:
    json_dump(data, json_file)
```


```
firstName: Jane
Nickname's key is missing
hobbies: ['running', 'sky diving', 'singing']
nickname: None
nickname: default_nickname
Children: [{'firstName': 'Alice', 'age': 6}, {'firstName': 'Bob', 'age': 8}, {'firstName': 'TwinABC', 'age': 0}, {'firstName': 'TwinXYZ', 'age': 0}]
```


## Sets

Sets in Python work much as they do in mathematics: they don't allow duplicated values as part of the collection. They also have no defined order. A set itself is mutable, so items can be added or removed, but an item cannot be changed in place, and every item must be hashable. The math that supports this structure allows us to play around with it.

### Related functions

| Method                      | Description |
| --------------------------- | ----------- |
| add                         | Adds an element to the set. |
| clear                       | Removes all the elements from the set. |
| copy                        | Returns a shallow copy of the set. |
| difference                  | Returns a set containing the difference between two or more sets. |
| difference_update           | Removes the items in this set that are also included in another, specified set. |
| discard                     | Removes the specified item if it is present. |
| intersection                | Returns a set that is the intersection of two or more sets. |
| intersection_update         | Removes the items in this set that are not present in the other, specified set(s). |
| isdisjoint                  | Returns whether two sets have no elements in common. |
| issubset                    | Returns whether another set contains this set or not. |
| issuperset                  | Returns whether this set contains another set or not. |
| symmetric_difference        | Returns a set with the elements found in exactly one of the two sets. |
| symmetric_difference_update | Updates this set with the symmetric difference of itself and another set. |
| union                       | Returns a set containing the union of sets. |
| update                      | Updates the set with another set, or any other iterable. |

```python
# use curly braces ('{', '}') or set() with an iterable to create a new one.
set1 = {1, 8, 27, 64, 125}
set2 = set([1, 1, 2, 3, 5]) # 1 will only be added once
```
You can use `len` to get its size and `add` to insert new items. The method `add` won't trigger any exception if you try to add a duplicated value; in fact, it will do nothing.
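A minimal sketch of adding and removing items follows. Sets have no `append` method; `add` is the method that inserts one item. The `cubes` set is made up for illustration, and the sketch also contrasts `discard` with `remove`, a related method that is not listed in the table.

```python
cubes = {1, 8, 27}

cubes.add(8)             # already present: nothing happens, no error
cubes.add(64)            # new value: the set grows
print(len(cubes))        # 4

cubes.discard(1000)      # discard silently ignores missing items
try:
    cubes.remove(1000)   # remove raises KeyError for missing items
except KeyError:
    print("1000 is not in the set")

print(sorted(cubes))     # [1, 8, 27, 64]
```

Sorting before printing gives a stable output, since a set does not guarantee any particular order.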

### Example

As mentioned in this section, the math behind sets is very helpful. Let's take a look at how to take advantage of it.

```python
# Set Math:
#     A U B = {x in A or x in B}
#     A ∩ B = {x in A and x in B}
#     A \ B = {x in A and x not in B}

# Let's say you have a local database (your whatsapp contacts) and the server's data (whatsapp worldwide users).

my_whatsapp_contacts = ["Carlos", "Juan", "Pedro"]
# ChatGPT query "15 random first names in a python list"
fictional_whatsapp_db = [
    "Ethan", "Olivia", "Liam", "Ava", "Noah", "Emma",
    "William", "Sophia", "James", "Isabella", "Benjamin",
    "Mia", "Samuel", "Charlotte", "Alexander"
] + my_whatsapp_contacts # Do not forget about our friends ;)

# As you can notice, these are lists, but we need sets
A = set(my_whatsapp_contacts)
B = set(fictional_whatsapp_db)

U = A.union(B)
# Since A is contained in B, the union equals B and the difference is the empty set (len must be 0)
assert len(U - B) == 0

I = A.intersection(B)
# this will be my contacts
print(f"My contacts are part of the whatsapp data base:{I}")

# Snooping the whatsapp db
diff = B - A
print(f'These "are the contacts" the whatsapp DB has and I don\'t: {diff}')

# Adding Carlos even though he is already part of my contacts
if "Carlos" in A: # this is obvious, but is just to make a point.
    A.add("Carlos")
    print(f"Just to make a point, Carlos won't be added twice: {A}")
```

```
My contacts are part of the whatsapp data base:{'Juan', 'Carlos', 'Pedro'}
These "are the contacts" the whatsapp DB has and I don't: {'Samuel', 'Isabella', 'Olivia', 'Benjamin', 'Noah', 'Sophia', 'Alexander', 'Emma', 'Mia', 'Liam', 'Ava', 'Charlotte', 'Ethan', 'James', 'William'}
Just to make a point, Carlos won't be added twice: {'Juan', 'Carlos', 'Pedro'}
```
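The same set math is also available through operators, which some readers find closer to the mathematical notation. The sketch below uses two small made-up sets and sorts each result so the printed order is predictable.

```python
A = {"Carlos", "Juan", "Pedro"}
B = {"Juan", "Olivia", "Liam"}

print(sorted(A | B))   # union:                ['Carlos', 'Juan', 'Liam', 'Olivia', 'Pedro']
print(sorted(A & B))   # intersection:         ['Juan']
print(sorted(A - B))   # difference:           ['Carlos', 'Pedro']
print(sorted(A ^ B))   # symmetric difference: ['Carlos', 'Liam', 'Olivia', 'Pedro']

print(A.isdisjoint({"Emma"}))   # True: no elements in common
print({"Juan"}.issubset(A))     # True: every element is also in A
```

One difference worth knowing: the operators require both operands to be sets, while the methods (`union`, `intersection`, and so on) accept any iterable.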


____
## Equivalent Perl Code ##

#### Lists ####

&emsp;Python:
``` python
var_abc = list()
var_ABC = []
alpha_ABC = ['A', 'B', 'C'] 

var_abc.append(idx)

alpha_ABC.reverse()  # reverses in place and returns None
print(f"reversed alpha_ABC{alpha_ABC}")
alpha_ABC.sort()  # sorts in place and returns None
print(f"sorted alpha_ABC{alpha_ABC}")
```

&emsp;Perl:
``` perl
@var_abc = ( );
@var_ABC = ( );
@alpha_ABC = ('A', 'B', 'C');

push(@var_abc, $idx);

# reverse and sort return new lists, so assign the result back to mirror
# Python's in-place methods
@alpha_ABC = reverse(@alpha_ABC);
print("reversed alpha_ABC @alpha_ABC\n");
@alpha_ABC = sort(@alpha_ABC);
print("sorted alpha_ABC @alpha_ABC\n");
```

#### Dictionaries ####

&emsp;Python:
``` python
# Definition
data = {
  "firstName": "Henry",
  "nickname": "ford",
  "year": 1964,
  "children": ["Rodrigo", "Marcela", "Mariana"]
}

# Asserts
assert "firstName" in data
assert "nickname" in data

# Access
print(f"firstName: {data['firstName']}")
print(f"nickname: {data.get('nickname')}")
print(f"nickname: {data.get('nickname', 'default_nickname')}")

# Adding
data["children"].append("Elizabeth")
data["children"].append("Jose")

```

&emsp;Perl:
``` perl
# Definition
%data = (
  firstName => "Henry",
  nickname => "ford",
  year => 1964,
  children => ["Rodrigo", "Marcela", "Mariana"]
);

# Exists (Not an assertion)
exists $data{firstName} or die;
exists $data{nickname} or die;

# Access
print("firstName: $data{firstName}\n");
print("nickname: $data{nickname}\n");

# Adding 
push(@{$data{children}}, "Elizabeth");
push(@{$data{children}}, "Jose");
```

____
## Equivalent TCL Code ##

#### Lists ####

&emsp;Python:
``` python
var_abc = list()
var_ABC = []
alpha_ABC = ['A', 'B', 'C'] 

var_abc.append('a')

alpha_ABC.reverse()  # reverses in place and returns None
print(f"reversed alpha_ABC{alpha_ABC}")
alpha_ABC.sort()  # sorts in place and returns None
print(f"sorted alpha_ABC{alpha_ABC}")
```

&emsp;TCL:
``` tcl
set var_abc {}
set var_ABC {}
set alpha_ABC {A B C}

lappend var_abc "a"

# lreverse and lsort return new lists, so store the result back to mirror
# Python's in-place methods
set alpha_ABC [lreverse $alpha_ABC]
puts "reversed alpha_ABC $alpha_ABC"
set alpha_ABC [lsort $alpha_ABC]
puts "sorted alpha_ABC $alpha_ABC"
```

#### Dictionaries ####

&emsp;Python:
``` python
# Definition
data = {
  "firstName": "Henry",
  "nickname": "ford",
  "year": 1964,
  "children": ["Rodrigo", "Marcela", "Mariana"]
}

# Asserts
assert "firstName" in data
assert "nickname" in data

# Access
print(f"firstName: {data['firstName']}")
print(f"nickname: {data.get('nickname')}")
print(f"nickname: {data.get('nickname', 'default_nickname')}")

# Adding
data["children"].append("Elizabeth")
data["children"].append("Jose")

```

&emsp;TCL:
``` tcl
# Definition
set data [dict create firstName "Henry" nickname "ford" year 1964 children [list "Rodrigo" "Marcela" "Mariana"]]

# Exists (Tcl has no built-in assert, so raise an error by hand)
if {![dict exists $data firstName]} { error "firstName is missing" }
if {![dict exists $data nickname]} { error "nickname is missing" }

# Access
set value1 [dict get $data firstName]
puts "firstName: $value1"
set value2 [dict get $data nickname]
puts "nickname: $value2"

# Adding
dict lappend data children Elizabeth
dict lappend data children Jose
```