# Complex Data Types

Now that we've covered the basic building blocks (boolean, string, int, and float), we'll learn about structures that hold and relate many of those building blocks in ways that are useful for data analysis, visualization, and machine learning.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wgsLsEmt-YFbl5Qd58YSGzPEC2eWqEK_?usp=sharing)

## A: Lists

#### **Constructing Lists**

Lists are used to store any number of items (including none at all). The items can all share one datatype or mix several.

They are constructed with square brackets, using commas to separate the individual values, as shown below:

In [1]:
emptyList = []
emptyList

[]

In [2]:
myList = [1, 2.0, "3.0", True]
myList

[1, 2.0, '3.0', True]

In [3]:
list_of_strings = ["hello", "world", "how's", "it", "going"]

list_of_ints = [1, 2, 3, 4, 5]

list_of_bools = [True, False, False, True, True]

In [4]:
print(list_of_strings)

['hello', 'world', "how's", 'it', 'going']


As we mentioned, a list can hold data of multiple different types, as shown below.

In real-life applications, however, this can get messy and confusing. Most of the time, your list will contain data of only one type.

In [5]:
multi_type_list = ["it", "is", True, "that", "there", "are", 3, "datatypes",
                   "in", "this", "list"]

In [6]:
print(multi_type_list)

['it', 'is', True, 'that', 'there', 'are', 3, 'datatypes', 'in', 'this', 'list']


#### **Concatenating Lists**

You can easily combine lists into one larger list by using the regular old ``` + ``` operator between them.

In [7]:
first = ["one", "two", "three"]
second = ["four", "five", "six"]
third = ["seven", "eight", "nine"]

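# Avoid naming this "all", which would hide Python's built-in all() function
all_numbers = first + third + second  # items appear in the order the lists are added
all_numbers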

['one', 'two', 'three', 'seven', 'eight', 'nine', 'four', 'five', 'six']

In [8]:
## Subtraction is not supported
list1 = ["One", "Two"]
list2 = ["One"]

list1 - list2

TypeError: unsupported operand type(s) for -: 'list' and 'list'
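If you do need something like list "subtraction," one common approach (a sketch, not a built-in operator) is a list comprehension that keeps only the items of the first list that don't appear in the second. List comprehensions haven't been introduced yet, so treat this as a preview:

```python
list1 = ["One", "Two"]
list2 = ["One"]

# Keep every item of list1 that does not appear in list2
difference = [item for item in list1 if item not in list2]
print(difference)  # ['Two']
```

This keeps the original order of list1, along with any repeated items that aren't in list2.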

#### **Accessing List Elements**

Lists are INDEXED, meaning that each list item is associated with (and uniquely accessible by) a unique integer value.

Python lists are 0-indexed, which means that the first item is accessible via 0, the second item via 1, and so on.

R users: this will take some adjusting, since R is 1-indexed!


You can access individual list elements by their index using square bracket notation, such as


```
list[index]
```



In [9]:
myList = [1,2,3,4,5]

In [10]:
strings_list = ['first', 'second', 'third', 'fourth', 'fifth']

#Accessing the item at index 1 (the second item):
print(strings_list[1])

#Accessing the true first item (index 0):
print(strings_list[0])

second
first
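Asking for an index past the end of a list raises an IndexError. A small sketch (the letters list is just for illustration; the try/except wrapper simply catches the error so we can print it):

```python
letters = ["a", "b", "c"]

# The last valid index is len(letters) - 1, which is 2
print(letters[len(letters) - 1])  # c

try:
    letters[3]  # one past the end of the list
except IndexError as err:
    message = str(err)
    print("IndexError:", message)  # IndexError: list index out of range
```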


In addition to accessing _single_ list elements, lists can also be **sliced** in order to return multiple, sequential list elements.

The syntax for this is:

```
list[start_index:end_index]
```

One quirk to note here: the start index **is** included in the items returned, but the end index is NOT. As such, list[1:4] returns the items at indices 1, 2, and 3, but not the item at index 4.

In [11]:
#Getting the items at indices 0 through 3. Notice that index 4 isn't included!
strings_list = ['first', 'second', 'third', 'fourth', 'fifth']

print(strings_list[0:4])

['first', 'second', 'third', 'fourth']
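One way to remember the rule: a slice list[start:end] contains end - start items (as long as both indices fall within the list). A quick check:

```python
strings_list = ['first', 'second', 'third', 'fourth', 'fifth']

middle = strings_list[1:4]  # indices 1, 2, 3 (index 4 is excluded)
print(middle)               # ['second', 'third', 'fourth']
print(len(middle))          # 3, which is 4 - 1
```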


By omitting the number **before** the colon, we can slice from the _start_ of the list up until (but not including) the index after the colon.

By omitting the number **after** the colon, we can slice from the index before the colon through the _last_ item.

In [12]:
#Slicing from the beginning through (not including) index 2
strings_list[:2]

['first', 'second']

In [13]:
#Slicing from index 2 through the last item
strings_list[2:]

['third', 'fourth', 'fifth']
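Because the end index is excluded, splitting a list at any index k with [:k] and [k:] gives two pieces that fit back together exactly, with no item lost or duplicated. A sketch, where k is an arbitrary split point chosen for illustration:

```python
strings_list = ['first', 'second', 'third', 'fourth', 'fifth']
k = 2  # any split point works

front = strings_list[:k]  # ['first', 'second']
back = strings_list[k:]   # ['third', 'fourth', 'fifth']

print(front + back == strings_list)  # True
```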

Finally, we can also use _negative_ numbers as indices, which will start counting from the back of the list and move towards the front.

In [14]:
#Last item
strings_list = ['first', 'second', 'third', 'fourth', 'fifth']

strings_list[-1]

'fifth'

In [15]:
strings_list = ['first', 'second', 'third', 'fourth', 'fifth']

strings_list[-1:]

['fifth']
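A negative index -n is shorthand for len(list) - n, so -1 is the last item and -2 is the second-to-last. A short sketch:

```python
strings_list = ['first', 'second', 'third', 'fourth', 'fifth']

# -1 and len - 1 point at the same item
print(strings_list[-1] == strings_list[len(strings_list) - 1])  # True

# Negative indices also work in slices: the last two items
print(strings_list[-2:])  # ['fourth', 'fifth']
```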

In addition to just **accessing** list items this way, we can also **change** them.

In [16]:
change_list = ["first", "second", "third"]
change_list[1] = "2nd"
change_list

['first', '2nd', 'third']

In [17]:
#Changing via slice:
change_list_2 = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
change_list_2[1:3] = ["2nd", "3rd"]
change_list_2

['first', '2nd', '3rd', 'fourth', 'fifth', 'sixth']
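The replacement doesn't have to be the same length as the slice it replaces; the list grows or shrinks to fit. A short sketch with a made-up list:

```python
letters = ['a', 'b', 'c', 'd']

# Replace two items (indices 1 and 2) with three new ones
letters[1:3] = ['X', 'Y', 'Z']
print(letters)  # ['a', 'X', 'Y', 'Z', 'd']

# Assigning an empty list to a slice deletes those items
letters[1:4] = []
print(letters)  # ['a', 'd']
```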

#### **List Methods**

There are several functions and methods built into Python that allow us to operate on (and return useful information about) lists.

We can **obtain the length** (number of items) of a list with the len() function.

Note that this is the same function we used on strings!

In [18]:
#Using len()
long_list = [1, 3, 5, 7, 9, 123, 243, 98, 143, 5, 2, 34, 123]

len(long_list)

13

In [19]:
short_list = [1, 3]

len(short_list)

2

We can also **append** items to lists with the .append() method, which always adds the new item to the end of the list.

In [20]:
unfinished_list = ['the', 'quick', 'brown', 'fox', 'jumped', 'over']
unfinished_list.append('the')
unfinished_list.append('lazy')
unfinished_list.append('dog')

print(unfinished_list)

['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']


**COMMON ERROR ALERT!!**

Notice that the .append(item) method _actually changes_ the list it operates on.

This is DIFFERENT from how we make a string uppercase, for example. When we do this:

```
your_string.upper()
```

...the original string (your_string) isn't changed. If we want to change your_string, we must **overwrite** it, with

```
your_string = your_string.upper()
```

For .append(), all we need to do to _actually change_ our list is:

```
your_list.append('new_item')
```

If we instead try to (mistakenly) do

```
your_list = your_list.append('new_item')
```
your_list will be overwritten with None (the value that .append() returns), and you will lose your list.

In [21]:
#Example:
your_list = [1, 2, 3, 4]

#Correctly adding an item:
your_list.append(5)
print(your_list)

#Incorrectly adding an item:
your_list = your_list.append(6)
print(your_list)

[1, 2, 3, 4, 5]
None


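**Removing** items works in a similar way. The .remove() method deletes the first occurrence of a given value, while the .pop() method and the del keyword remove the item at a given index.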

In [22]:
your_list = [1,2,"Three",4,5,6,7,"Three"]
your_list.remove("Three")

your_list #Notice: .remove() changes the list in place, and only the first "Three" is removed!

[1, 2, 4, 5, 6, 7, 'Three']

In [23]:
your_list = [1,2,"Three",4,5,6,7,"Three"]

your_list.pop(3)
your_list

[1, 2, 'Three', 5, 6, 7, 'Three']

In [24]:
your_list = [1,2,"Three",4,5,6,7,"Three"]

del your_list[3]
your_list

[1, 2, 'Three', 5, 6, 7, 'Three']
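The three approaches differ in small but useful ways: .pop() hands back the item it removed, del returns nothing, and .remove() raises a ValueError if the value isn't in the list. A quick sketch:

```python
your_list = [1, 2, "Three", 4, 5, 6, 7, "Three"]

removed = your_list.pop(3)  # removes the item at index 3 and returns it
print(removed)              # 4

last = your_list.pop()      # with no index, pop() removes the last item
print(last)                 # Three

try:
    your_list.remove("Four")  # "Four" is not in the list
except ValueError:
    print("Can't remove a value that isn't there")
```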

We can also generate numerical lists with the **range** function.

(This will become especially useful later in helping us loop through datasets!)

With the ``` range() ``` function, we can generate a sequence of numbers, which can then be converted to a list with the ```list()``` function.

The syntax for range is ```range(start, stop, step)```: the stop value is excluded, and the optional step sets the gap between consecutive numbers (it defaults to 1), as seen below.


In [25]:
#This will give us a range of the numbers 0 through 9, not including 10
print(range(0,10))
type(range(0, 10))

range(0, 10)


range

In [26]:
list(range(0, 10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [27]:
list(range(0, 50, 7))

[0, 7, 14, 21, 28, 35, 42, 49]

In [28]:
#We can cast it to a list!
print(list(range(0,10)))
print(type(list(range(0,10))))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'list'>


We can also use the third (optional) "step" parameter to choose the gap between consecutive numbers in the range.

In [29]:
list(range(0, 30, 3))

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
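The step can also be negative, which counts downward. The stop value is still excluded:

```python
# Count down from 10 to 1 (stop=0 is excluded)
countdown = list(range(10, 0, -1))
print(countdown)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# Every other number, counting down
evens_down = list(range(10, 0, -2))
print(evens_down)  # [10, 8, 6, 4, 2]
```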

## B: Dictionaries

Dictionaries are another hugely important structure for effectively storing and analyzing data in Python.

They are **similar** to lists in a few key ways.
- Both are MUTABLE, meaning that once you've created them, you can change individual values later. We did this with lists by writing ```list[0] = 'new value'```.
- Both are **dynamic** - elements can be added and removed after creation. We showed this with list.remove() and list.append().

But dictionaries have a few key **differences** from lists, as well.
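- They are not indexed by position like lists! It doesn't make sense to ask for the "first" item of a dictionary with dict[0]. (Since Python 3.7, dictionaries do remember the order in which keys were added, but you still look items up by key, never by position.)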
- Instead, dictionaries are a collection of associated **key-value pairs**.

Let's explore this below.

#### **Creating a Dictionary**

Like lists, dictionaries have their own special syntax that tells the Python interpreter that a certain variable should be stored as a dictionary type.

For dictionaries, this syntax is a series of **key:value pairs**, separated by commas, within curly braces {}.

The following is a dictionary mapping UVA departmental abbreviations to their full names (at least, according to Lou's List).

In [30]:
my_first_dict = {
    'AAS':'African-American and African Studies',
    'AMST':'American Studies',
    'ANTH':'Anthropology',
    'ARTH':'History of Art',
    'ASTR':'Astronomy',
    'BIOL':'Biology'
}

type(my_first_dict)

dict

As discussed, dictionaries aren't indexed by position. Instead, we access a value by putting its key (in quotes, since these keys are strings) inside square brackets:

In [31]:
my_first_dict['AAS']

'African-American and African Studies'

But unless the dictionary actually has a key of 0 (or 3, and so on), we can't fetch the "first" item (or any other position) with dict[0] or dict[3].

In [32]:
#This doesn't work!
my_first_dict[3]

KeyError: 3
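To avoid a KeyError, you can check whether a key exists with the in operator, or use the dictionary's .get() method, which returns a fallback value (None unless you supply one) instead of raising an error. A sketch using a trimmed-down version of the dictionary above:

```python
depts = {'AAS': 'African-American and African Studies', 'BIOL': 'Biology'}

print('BIOL' in depts)     # True: `in` checks keys...
print('Biology' in depts)  # False: ...not values

print(depts.get('BIOL'))             # Biology
print(depts.get('MATH'))             # None
print(depts.get('MATH', 'Unknown'))  # Unknown
```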

We can add entries (key:value pairs) to an already-existing dictionary with the following syntax:

```
dict['new_key'] = 'new_value'
```

In [33]:
#Adding an additional department:
my_first_dict['CHEM'] = 'Chemistry'

#Seeing that our changes worked:
my_first_dict

{'AAS': 'African-American and African Studies',
 'AMST': 'American Studies',
 'ANTH': 'Anthropology',
 'ARTH': 'History of Art',
 'ASTR': 'Astronomy',
 'BIOL': 'Biology',
 'CHEM': 'Chemistry'}

Likewise, we can **remove** an item with the "del" keyword.

In [34]:
#Delete art history entry:
del my_first_dict['ARTH']

my_first_dict

{'AAS': 'African-American and African Studies',
 'AMST': 'American Studies',
 'ANTH': 'Anthropology',
 'ASTR': 'Astronomy',
 'BIOL': 'Biology',
 'CHEM': 'Chemistry'}

Dictionaries serve as a sort of "lookup table," or a "function," storing an informative value for each key.

In this spirit, dictionary **keys** must be unique, but their **values** don't have to be.



In [35]:
my_duplicate_dict = {
    'AAS':'African-American and African Studies',
    'AMST':'American Studies',
    'ANTH':'Anthropology',
    'ARTH':'History of Art',
    'ASTR':'Astronomy',
    'BIOL':'Biololioliology',
    'BIOL':'Biology'

}

In [36]:
#Notice: only the last 'BIOL' value is kept. 'Biololioliology' is silently lost!
my_duplicate_dict['BIOL']

'Biology'

If you wanted to store two values (Biology and Biololioliology) with the key 'BIOL', perhaps to indicate that 'BIOL' corresponds to two departments, you could use a **list** as the value for 'BIOL'.

In [37]:
more_informative_dict = {
    'AAS':'African-American and African Studies',
    'AMST':'American Studies',
    'ANTH':'Anthropology',
    'ARTH':'History of Art',
    'ASTR':'Astronomy',
    'BIOL':['Biology','Biololioliology'] # A tiny list here!
}

In [38]:
#This allows us to retain information on both definitions of BIOL.
more_informative_dict['BIOL'][1]

'Biololioliology'

**Dict keys can be any of the primitive datatypes we talked about**.
- Bool
- Float
- Int
- String

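...but they can't be the more complex, _mutable_ datatypes we're discussing now, like lists or dictionaries. Keys must be *hashable*, and mutable objects aren't, which is why Python reports an "unhashable type" error.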

In [39]:
#This is all fair game:
varied = {
    'string':135,
    0:True,
    True:False
}

In [40]:
varied[0]

True

In [41]:
#But this is not:
varied_bad = {
    ["list", "as", "a", "key?"]:"NOPE",
    {"dict":"key?", "noooope":"no"}:"STOP"
}

TypeError: unhashable type: 'list'
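Because lists can't be keys, a common workaround is to convert the list into a tuple (an immutable sequence we'll cover in a future lesson) first. A minimal sketch; the coordinates and distance value are made up for illustration:

```python
coords = [3, 4]  # a list: mutable, so it can't be a key

distances = {}
distances[tuple(coords)] = 5.0  # tuple(coords) is (3, 4), which is hashable

print(distances)          # {(3, 4): 5.0}
print(distances[(3, 4)])  # 5.0
```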

#### **Lists from dictionaries**

The dictionary type in Python gives us several methods that return certain interesting parts of a dictionary, which we can then turn into lists.

``` your_dict.keys() ``` returns all the keys as type "dict_keys", which can easily be made into a list through ```list()```

In [42]:
print(my_first_dict.keys())
print(type(my_first_dict.keys()))
print("With list() wrapper:",list(my_first_dict.keys()))

dict_keys(['AAS', 'AMST', 'ANTH', 'ASTR', 'BIOL', 'CHEM'])
<class 'dict_keys'>
With list() wrapper: ['AAS', 'AMST', 'ANTH', 'ASTR', 'BIOL', 'CHEM']


A similar method (```dict.values()```) extracts only the values.

In [43]:
print(my_first_dict.values())
print("With list() wrapper:", list(my_first_dict.values()))

dict_values(['African-American and African Studies', 'American Studies', 'Anthropology', 'Astronomy', 'Biology', 'Chemistry'])
With list() wrapper: ['African-American and African Studies', 'American Studies', 'Anthropology', 'Astronomy', 'Biology', 'Chemistry']


And if you want both, you can use ```dict.items()```, which gives you the key-value pairs (as a "dict_items" object, which list() can again turn into a list).

The key-value pairs are returned as ```tuples```, which we'll learn about in a future lesson.

In [44]:
list(my_first_dict.items())[0]

('AAS', 'African-American and African Studies')
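The objects returned by .keys(), .values(), and .items() are *views*: they stay connected to the dictionary, so later changes show up in them automatically. Calling list() takes a snapshot that does not update. A small sketch:

```python
depts = {'AAS': 'African-American and African Studies'}

keys_view = depts.keys()            # live view of the keys
keys_snapshot = list(depts.keys())  # frozen copy taken right now

depts['BIOL'] = 'Biology'

print(keys_view)      # dict_keys(['AAS', 'BIOL'])
print(keys_snapshot)  # ['AAS']
```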

## C: Sets
Sets are unordered collections of unique elements. They are ideal for membership testing and eliminating duplicates.

In [45]:
# Creating a set
example_set = {1, 2, 3, 4, 5, 5}
print("Original Set (duplicates not shown):", example_set)

Original Set (duplicates not shown): {1, 2, 3, 4, 5}


In [46]:
# Adding elements
example_set.add(6)
print("After adding 6:", example_set)

After adding 6: {1, 2, 3, 4, 5, 6}


In [47]:
# Removing elements
example_set.discard(4)  # Removes 4 if present (unlike .remove(), no error if it's missing)
print("After discarding 4:", example_set)

After discarding 4: {1, 2, 3, 5, 6}
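Sets also support the familiar operations from math: union, intersection, and difference. Set difference is another way to get the "subtraction" that lists don't support, at the cost of losing order and duplicates. A short sketch:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)  # union: {1, 2, 3, 4, 5}
print(a & b)  # intersection: {3, 4}
print(a - b)  # difference: {1, 2}

# Membership testing is a common reason to use a set
print(3 in a)  # True

# Removing duplicates from a list (sets don't guarantee any order)
unique = set([1, 1, 2, 3, 3])
print(len(unique))  # 3
```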


## D: NumPy Arrays
NumPy arrays are similar to Python lists but are optimized for numerical operations and scientific computing.

In [3]:
import numpy as np

In [49]:
np_array = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", np_array)

NumPy Array: [1 2 3 4 5]
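Unlike a list, a NumPy array stores one data type for all of its elements, reported by its .dtype attribute, and its dimensions are reported by .shape. If you mix ints and floats, NumPy converts everything to a common type. A sketch (the exact integer dtype, such as int64, can vary by platform):

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5])
print(np_array.dtype)  # an integer type, e.g. int64
print(np_array.shape)  # (5,)

# Mixing ints and floats: every element becomes a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64
```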


### Elementwise Operations

In [4]:
normal_list = [1, 2, 3, 4, 5]
np_array = np.array(normal_list)

In [5]:
print("Array multiplied by 2:", np_array * 2)

Array multiplied by 2: [ 2  4  6  8 10]


In [None]:
print("List multiplied by 2:", normal_list * 2)
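For a list, * 2 repeats the list rather than doubling each number, so the cell above prints a ten-item list. A side-by-side sketch:

```python
import numpy as np

normal_list = [1, 2, 3, 4, 5]
np_array = np.array(normal_list)

repeated = normal_list * 2  # list: the items are repeated
doubled = np_array * 2      # array: each element is multiplied

print(repeated)  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(doubled)   # [ 2  4  6  8 10]
```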

### Array Operations

In [6]:
list1 = ["Strings" , "More Strings" , "Even more strings"]
list2 = [1, 2, 3, 4, 5]
list3 = [6, 7, 8, 9, 0]

In [7]:
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([6, 7, 8, 9, 0])
array3 = np.array([-1, -2, -3])
array4 = np.array(["string1" , "string2" , "string3" , "string4" , "string5"])

In [52]:
list1 + list2

['Strings', 'More Strings', 'Even more strings', 1, 2, 3, 4, 5]

In [53]:
array1 + array2

array([ 7,  9, 11, 13,  5])

In [8]:
# Shapes don't match!
array1 + array3

ValueError: operands could not be broadcast together with shapes (5,) (3,) 
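Elementwise math needs compatible shapes. One case that always works is combining an array with a single number (a scalar), which NumPy "broadcasts" to every element. Checking .shape is a quick way to see why two arrays won't combine:

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
array3 = np.array([-1, -2, -3])

# A single number is broadcast to every element
print(array1 + 10)  # [11 12 13 14 15]

# Comparing shapes shows why array1 + array3 fails
print(array1.shape, array3.shape)  # (5,) (3,)
```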

In [55]:
# Types don't match!
array1 + array4

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int64'), dtype('<U7')) -> None

In [56]:
# Adding lists doesn't numerically add their elements; it just concatenates them!
list1 + list1

['Strings',
 'More Strings',
 'Even more strings',
 'Strings',
 'More Strings',
 'Even more strings']

In [57]:
# Strings can't "numerically" be added!
array4 + array4

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U7'), dtype('<U7')) -> None

In [58]:
np.multiply(array1, array2)

array([ 6, 14, 24, 36,  0])

In [59]:
np.prod(array1)

120

In [60]:
array1*array2

array([ 6, 14, 24, 36,  0])
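np.multiply (or *) works elementwise and returns an array, while np.prod reduces an array to one number by multiplying all of its elements together. Multiplying elementwise and then summing gives the dot product, which np.dot computes directly:

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([6, 7, 8, 9, 0])

elementwise = array1 * array2  # [ 6 14 24 36  0]
manual_dot = np.sum(elementwise)  # 6 + 14 + 24 + 36 + 0 = 80
print(manual_dot, np.dot(array1, array2))  # 80 80
```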

### NumPy Functions

In [61]:
print("Mean of the array:", np.mean(np_array))

Mean of the array: 3.0


In [62]:
print("Standard deviation of the array:", np.std(np_array))

Standard deviation of the array: 1.4142135623730951


In [63]:
print("Array sum:", np.sum(np_array))

Array sum: 15


In [64]:
print("Cumulative sum of the array:", np.cumsum(np_array))

Cumulative sum of the array: [ 1  3  6 10 15]


In [65]:
print("Element-wise square of the array:", np.square(np_array))

Element-wise square of the array: [ 1  4  9 16 25]
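One detail worth knowing: np.std computes the *population* standard deviation by default (it divides by n). For the *sample* standard deviation (dividing by n - 1), pass ddof=1. For [1, 2, 3, 4, 5] the squared deviations from the mean of 3 sum to 10:

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5])

population_std = np.std(np_array)      # sqrt(10 / 5), divides by n
sample_std = np.std(np_array, ddof=1)  # sqrt(10 / 4), divides by n - 1

print(population_std)  # about 1.414
print(sample_std)      # about 1.581
```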
