# Introduction to Advanced Python Data Types

### Learning Goals

  - Objects and data containers
  - Specific Python containers
    - Strings
    - Lists
    - Tuples
    - Dictionaries
    - Sets

In the previous tutorial, we focused on Python's simple data types, like `int`, `float`, and `bool`. In general, these simple object types store individual values (e.g., `a = 10`). Today, we will talk about objects that can contain multiple values, such as `list`s and `tuple`s and, yes, `str`s, which we've already met.

All of these, whether they hold single values or multiple things, have one important aspect in common: they are all "**objects**" in Python. When we do an assignment, like `py = 3.14`, Python creates an object in your computer's memory, in this case of type `float` and then tags it with the label `py`. The object contains the value (3.14 in this case), and some other goodies we'll learn about later. The tag is not the object and vice versa. The tag, what we'll often call the "variable name", is the way we grab the value stored in the object in order to use it.
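
To make the tag-versus-object idea concrete, here is a minimal sketch. The name `also_py` is just an illustrative second tag (it is not used elsewhere in this tutorial), and `is` is Python's built-in check for "are these two tags attached to the very same object?"

```python
# Create one float object and attach the tag "py" to it
py = 3.14

# Attach a second tag to the very same object (no copy is made)
also_py = py

# "is" checks whether two tags point at the same object in memory
same_object = also_py is py
print(same_object)   # True

# Moving the tag "py" to a new object does not touch the old object,
# which "also_py" is still attached to
py = 2.71
print(also_py)       # 3.14
```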

### Python Strings

Even though we covered them in the last tutorial, a `str` is a little bit different from, say, an `int`. While an `int` contains a single value, a `str` actually contains a bunch of characters, or strings of length one. For example, consider this string:

In [5]:
mystring = 'This string contains letters, spaces, etc., and 68 total characters!'
print(mystring)
print("mystring is a ", type(mystring))

This string contains letters, spaces, etc., and 68 total characters!
mystring is a  <class 'str'>


While normal people care about the words and the punctuation, computers don't, and often neither do programmers. So another way to look at this object is to see how many things it contains, and we can do this with the `len()` (length) function:

In [6]:
len(mystring)

68

So `mystring` actually contains 68 things: each letter, space, numerical digit, and punctuation mark, generally called "characters".

We can even look inside our string using "**indexing**" (much more on this both later in this tutorial and in the weeks and months to follow). We index using square brackets `[ ]` so, for example, try this:

In [6]:
mystring[3]

's'

Here, `mystring[3]` gave us one of the *elements* of `mystring`.

Python strings, unlike `int`s or `float`s, are a type of [*container*](https://en.wikipedia.org/wiki/Container_(abstract_data_type)). Containers are data types that can contain multiple objects. Whereas a simple variable has a name, like `a`, attached to a single value, like `2`, a container holds multiple values, and many containers (as we will see later) can hold values of different types (`int`, `str`, etc.) at once.

In fact, the very reason `mystring` has a *length*, which we got with the `len()` function, is that it is a container. An `int`, not being a container, doesn't have a length. Let's verify this:

In [7]:
myint = 42
len(myint)

TypeError: object of type 'int' has no len()

Whoops, we get an error! Because an `int` is always a *single* object, Python sees no need to keep track of its length.

Containers are one of the major ways Python offers to work with data. Later on, we will learn about specialized containers for data science, like the [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

First, however, we will learn about the built-in Python containers: *strings*, *lists*, *dictionaries*, *tuples*, and *sets*. Of these, the first 3 are used most often, so we'll focus on those.

#### Simple indexing

As we saw above, a string is a sequence of characters, generally a human-readable chunk of text.

In [9]:
mystring = 'A sequence of characters!'

Because it's a *sequence* (that is, because the characters have an *order*, unlike the letters in a Scrabble bag), we can pluck out individual characters or "elements" by indexing, as we saw briefly above.

In [10]:
mystring[6]

'e'

A single element of a string is still a string:

In [11]:
a_char = mystring[6]
type(a_char)

str

And since it's still a string, it still has a length:

In [12]:
len(a_char)

1

So any string of length > 1 is really a container of other strings of length 1 – how meta!

---

In the cell below, try `mystring[1]`:

' '

Was that what you expected?

---

Indexing in Python is *zero-based*. So what you might expect to be the first element of something is actually the "zeroth" element. The way to think about it is that the "index" is really an *offset* from the beginning of the container, i.e., from the first element. So `mystring[1]` is saying "Whatever is 1 element over from the beginning of the string", which is a space (`' '`).
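
Here is a small sketch of the offset idea, using the same `mystring` as above. The names `first_char`, `second_char`, and `last_char` are just illustrative. Note that, because counting starts at zero, the last valid offset is the length minus one.

```python
mystring = 'A sequence of characters!'

first_char = mystring[0]                  # offset 0: the very first character
second_char = mystring[1]                 # offset 1: one step over from the start
last_char = mystring[len(mystring) - 1]   # the last valid offset is length - 1

# repr() shows the quotes, which makes the space character visible
print(repr(first_char), repr(second_char), repr(last_char))   # 'A' ' ' '!'
```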

As we mentioned before, we've got *much* more indexing to go – it's a huge part of data science!

#### Strings are immutable

Like some other data containers we'll meet below, strings are *immutable* – so once you create one, you cannot change it. For example, based on what we just learned, you might reasonably think that, if we can get a specific value using indexing with `[index]`, we should be able to set a value the same way. Let's try:

In [32]:
mystring[0] = '1'

TypeError: 'str' object does not support item assignment

That raises an error, and Python tells us that strings do not support item assignment (directly setting values).

Instead, if you want to "change" your string, you need to make a new string, which is a new immutable container object.

In [13]:
new_str = '1 sequence of characters'
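
Rather than retyping the whole string, you can also build the new string out of pieces of the old one. This is only a sketch: it borrows the `:` slicing syntax, which is explained later in this tutorial, and uses `+` to glue two strings together.

```python
mystring = 'A sequence of characters!'

# Build a brand-new string: the character '1' followed by everything
# in mystring from offset 1 onward
new_str = '1' + mystring[1:]

print(new_str)    # 1 sequence of characters!
print(mystring)   # A sequence of characters!  (the original is untouched)
```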

### Python Lists

#### Lists are containers of indexed, ordered and *mutable* data. 

Lists can be used to contain multiple elements of *any* kind. Lists are created by writing a set of values inside square brackets:

In [14]:
mylist = [2, 3, 4, 5] # this list contains 4 numbers
print(mylist)

[2, 3, 4, 5]


#### Lists can contain elements of any type

In [15]:
list_of_int = [10, 4, 2, 5]                                  # This list contains integers
print(list_of_int)
type(list_of_int)

[10, 4, 2, 5]


list

In [16]:
list_of_float = [2.3, 4.3, 5.5, 6.1]                         # This list contains floating numbers
print(list_of_float)
type(list_of_float)

[2.3, 4.3, 5.5, 6.1]


list

In [19]:
list_of_boolean = [True, False, True, True]                  # This list contains boolean values 
print(list_of_boolean)
type(list_of_boolean)

[True, False, True, True]


list

In [20]:
list_of_strings = ['this', 'is', 'a', 'list','of','strings'] # This list contains strings
print(list_of_strings)
type(list_of_strings)

['this', 'is', 'a', 'list', 'of', 'strings']


list

Notice that, because strings are containers, `list_of_strings` is actually a *container of containers*!

---

In the cell below, see if Python will let you create a list with elements of *different types*:

---

#### Lists are mutable

As mentioned in passing above, unlike strings, a list is *mutable*, meaning that we can change its elements if we want.

Let's remind ourselves what our list of `floats` was:

In [21]:
list_of_float

[2.3, 4.3, 5.5, 6.1]

Now let's try to change one of the elements by indexing it:

In [22]:
list_of_float[3] = 10.1
list_of_float

[2.3, 4.3, 5.5, 10.1]

So, unlike a `str`, a `list` will allow you to change its elements!

Also, remember the distinction between objects themselves and their tags. We gave the above `list` the tag `list_of_float`, but that tag is just a description for human readers. To Python, that's just an arbitrary tag. So, if we wished, we could change one of the elements of `list_of_float` to a `str` like this:

In [23]:
list_of_float[3] = "I'm not a float"
list_of_float

[2.3, 4.3, 5.5, "I'm not a float"]

Now our "`list_of_float`" contains a `str`, which we can verify:

In [24]:
print(type(list_of_float[3]))

<class 'str'>


#### Lists can contain lists

Finally (for now), one crucial thing about lists is that *lists can contain lists!* So if we do the following:

In [25]:
multi_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
multi_list

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Now `multi_list` is a list of lists; we can get the first list like this:

In [26]:
multi_list[0]

[1, 2, 3]

Now, what if we wanted the 2nd element of the first list? We could do it using a "temporary variable" like this:

In [27]:
tmp = multi_list[0]  # get the first list in multi_list, and store it in "tmp"
tmp[1]               # get the second element of the list in "tmp"

2

But Python lets us do this in one go like this:

In [28]:
multi_list[0][1]

2

What's happening here is that Python first evaluates `multi_list[0]` to give you the first list in `multi_list`, and then the `[1]` gives you the second element of that list.

---

In the cell below, use this compact indexing technique (`list[index][index]`) to get the "i" in "list" from `list_of_strings`.

---

#### A list of lists can be thought of as a table or matrix

Now, here's a really cool thing: we can think of the values in `multi_list` as being laid out in a table or *matrix* (like data in a spreadsheet) like this:

|  multi_list | Column 1  | Column 2  | Column 3  |
|:----------|:----------|:----------|:----------|
| **Row 1** |   1    |   2    |   3    |
| **Row 2** |   4    |   5    |   6    |
| **Row 3** |   7    |   8    |   9    |

in which every ***row*** is one of the lists in `multi_list`!

So now we can think of, say, this:

In [54]:
multi_list[1][2]

6

As specifying the *row* and *column indexes*, like  
`multi_list[row index = 1][column index = 2]`
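
As a sketch of the row and column view: pulling out a whole row takes a single index, but a column has to be collected one row at a time. The `for` loop and the `append()` method used here have not been introduced yet in this tutorial, so treat this as a preview; the names `row_index`, `column_index`, `row`, and `column` are just illustrative.

```python
multi_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

row_index = 1
column_index = 2

# A whole row is just one of the inner lists
row = multi_list[row_index]

# A column has to be collected one row at a time
column = []
for each_row in multi_list:
    column.append(each_row[column_index])

print(row)      # [4, 5, 6]
print(column)   # [3, 6, 9]
```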

In fact, if you want to think of a list of lists like a table or matrix, you can make this more obvious when you first make your object:

In [29]:
like_a_matrix = [[1, 1, 2],
                 [3, 5, 8],
                 [13, 21, 34]]
like_a_matrix

[[1, 1, 2], [3, 5, 8], [13, 21, 34]]

To print it like a matrix though, you'd have to get a little more cute with `print()`.

In [30]:
print(like_a_matrix[0], "\n", like_a_matrix[1], "\n", like_a_matrix[2], "\n")

[1, 1, 2] 
 [3, 5, 8] 
 [13, 21, 34] 



Here, the "\n" is how you tell `print()` to "hit return", that is, to start a "**n**"ew line.
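
One tidier alternative, offered as a sketch rather than the only way to do it: `print()` accepts a `sep` argument that controls what goes *between* the items it prints, and a leading `*` hands each row of the list to `print()` as a separate item. This avoids the stray leading spaces in the output above.

```python
like_a_matrix = [[1, 1, 2],
                 [3, 5, 8],
                 [13, 21, 34]]

# sep="\n" puts a newline (instead of the default space) between items;
# the * passes each inner list to print() as its own item
print(*like_a_matrix, sep="\n")
```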

### Python Tuple

Tuples are ordered collections of data. They are similar to lists, but they are *immutable*. Whereas you can add elements to, or change elements of, a previously defined list, you cannot do that with tuples. Tuples are thus great for data that you don't want anyone to mess with. So, for example, raw data from an experiment, being sacred, would go in a tuple. The results of calculations done on the data, however, would go in a list because you might want to change the calculations without having to make a new list every time.

Tuples are defined with parentheses:

In [31]:
mytuple = (9,4,5)
print(mytuple)
type(mytuple)

(9, 4, 5)


tuple

#### Tuples are just like lists in some key ways

##### *Tuples can hold any other object (just like lists)*

In [34]:
mytuple2 = (4, 'four', 'IV', [1, 0, 0])
mytuple2

(4, 'four', 'IV', [1, 0, 0])

##### *Tuples are indexed just like lists*

In [35]:
mytuple[0]

9

In [37]:
mytuple2[3][0]

1

#### Tuples differ from lists in one key way

##### *Tuples are immutable*

Because tuples are immutable, we can't just change a value in a tuple if we wish.

So this will work just fine:

In [38]:
mylist = [10, 'ten', 'X', 1010]
mylist

[10, 'ten', 'X', 1010]

In [40]:
mylist[3] = "ten is 1010 in binary"
mylist[3]

'ten is 1010 in binary'

But this will not:

In [41]:
mytup = (10, 'ten', 'X', 1010)
mytup

(10, 'ten', 'X', 1010)

In [42]:
mytup[3] = "ten is 1010 in binary"  # indexing uses square brackets, even for tuples
mytup[3]

TypeError: 'tuple' object does not support item assignment
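
Here is a minimal sketch of what happens when you try item assignment on a tuple with square brackets, and of the usual workaround: build a *new* tuple from pieces of the old one. The `try`/`except` construct (which catches the error so the code keeps running) and the `:` slicing have not been covered yet, and the names `assignment_worked` and `new_tup` are just illustrative.

```python
mytup = (10, 'ten', 'X', 1010)

# Trying to assign to an element raises a TypeError, which we catch here
try:
    mytup[3] = "ten is 1010 in binary"
    assignment_worked = True
except TypeError:
    assignment_worked = False

# To "change" a tuple, build a new one from pieces of the old one.
# Note the trailing comma: ("text",) is a one-element tuple.
new_tup = mytup[:3] + ("ten is 1010 in binary",)

print(assignment_worked)   # False
print(new_tup)             # (10, 'ten', 'X', 'ten is 1010 in binary')
print(mytup)               # (10, 'ten', 'X', 1010)
```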

### Python Set

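A set is an unordered, unindexed collection of unique items. The items themselves must be immutable objects (such as numbers, strings, or tuples), but the set as a whole can grow and shrink. Whereas lists are defined by `[]`, and tuples are defined by `()`, sets are defined by `{}`.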

In [55]:
myset = {"A", "B", "C", "D"}
print(myset)
type(myset)

{'B', 'D', 'C', 'A'}


set

Unordered means that items in the set do not have an assigned order, so they cannot be indexed. Back to our Scrabble analogy, it doesn't make any sense to talk about the "third" tile in a Scrabble bag of letters; the letters in the bag have no order, they're just jumbled in a bag. Notice above that the elements of the set did *not* print in the order used to make the set.

We can make the lack of order in sets clear by testing two sets for equality:

In [58]:
{"A", "B", "C", "D"} == {"D", "C", "A", "B"}

True

So two sets are the same *as long as they contain the same elements*. For lists (and tuples) to be the same, however, the elements also have to be in the same order. So this is `False`: 

In [57]:
["A", "B", "C", "D"] == ["D", "C", "A", "B"]

False

Here's an interesting riddle about sets: What will the following give you? (Think about it before you run it.)

In [62]:
{"A", "B", "C", "D"} == {"D", "C", "A", "B", "D"}

True

What's going on here? By definition, each item in a set is *unique*. If you try to specify duplicates in a set, they will be ignored:

In [63]:
myset2 = {"C", "K", "E", "D", "D"}
print(myset2)

{'D', 'C', 'E', 'K'}


So `myset2` contains only one "D".

Because sets are unordered, it doesn't make any sense to try to index them. If you try to ask for the element of a set at offset 1, you'll get an error:

In [52]:
myset[1]

TypeError: 'set' object is not subscriptable

For the same reason, you cannot replace an element of a set by its position:

In [53]:
myset[2] = "F"

TypeError: 'set' object does not support item assignment
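
Although a set has no positions to assign to, its contents can still be changed through methods. The sketch below uses the built-in set methods `add()` and `discard()` on a separate, illustrative set called `tile_bag`, so that `myset` stays as it is for the cells that follow.

```python
tile_bag = {"A", "B", "C", "D"}

tile_bag.add("F")        # put a new element into the set
tile_bag.discard("B")    # take an element out (no error if it is absent)
tile_bag.add("A")        # adding an element that is already there changes nothing

# Compare against the set we expect (order does not matter for sets)
print(tile_bag == {"A", "C", "D", "F"})   # True
print(len(tile_bag))                      # 4
```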

You can make a set from a list. Let's say, for example, people signed up for something on your organization's website, and their names were automatically stored in a list. But, for various reasons, some people signed up twice.

In [70]:
# A real list would be much longer, but...
name_list = ["John", "Xie", "Julia", "Kat", "Ahmed", "John"]
name_list

['John', 'Xie', 'Julia', 'Kat', 'Ahmed', 'John']

In [71]:
# Convert list to set
name_set = set(name_list)
print(name_set) 

{'Xie', 'Julia', 'John', 'Ahmed', 'Kat'}


The duplicates are now removed! Also, compare the order of the list with the (arbitrary) order in which the members of the set were printed.
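
If you need the de-duplicated names back in a predictable order, one option (a sketch, and not the only way) is to pass the set to the built-in `sorted()` function, which returns a new list in alphabetical order. The name `unique_names` is just illustrative.

```python
name_list = ["John", "Xie", "Julia", "Kat", "Ahmed", "John"]

# set() removes the duplicates; sorted() returns a list in alphabetical order
unique_names = sorted(set(name_list))

print(unique_names)   # ['Ahmed', 'John', 'Julia', 'Kat', 'Xie']
print(len(name_list), "sign-ups,", len(unique_names), "unique people")
```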

Sets have some cool properties; they are the Python implementation of the "sets" you learned about in high school or college – remember Venn diagrams, the overlapping circles with stuff in them?

#### Set operations

Python has special operators for comparing two sets: 

| operator  | action  |
|:----------:|:----------|
|   `\|`   |   union   |
|   `&`     |   intersection   |
|   `-`     |   difference   |
|   `^`     |   symmetric difference   |

Let's remind ourselves what `myset` and `myset2` contain:

In [69]:
print(myset, "\n", myset2)


{'B', 'D', 'C', 'A'} 
 {'D', 'C', 'E', 'K'}


Now we can check out (or remember from high school) what each of the set operators does.

##### *Union* - all elements in either set

In [73]:
myset | myset2

{'A', 'B', 'C', 'D', 'E', 'K'}

##### *Intersection* - only elements in *both* sets

In [72]:
myset & myset2

{'C', 'D'}

##### *Difference* - only elements in the first set but *not* the second set

In [74]:
myset - myset2

{'A', 'B'}

##### *Symmetric Difference* - only elements that are in one set but *not* the other

In [76]:
myset ^ myset2

{'A', 'B', 'E', 'K'}

You might not use `sets` that frequently, but don't forget about them! Many people have fallen into the trap of writing code to compare `lists` when the task could have easily been accomplished by just comparing `sets`.
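
As a sketch of that idea, suppose we have two sign-up lists and want to know who appears on both. The lists `event_a` and `event_b` are made up for illustration. Converting each list to a set lets the set operators do the comparison for us.

```python
# Hypothetical sign-up lists for two events
event_a = ["John", "Xie", "Julia", "Kat"]
event_b = ["Kat", "Ahmed", "John", "Maria"]

both_events = set(event_a) & set(event_b)    # signed up for both events
only_event_a = set(event_a) - set(event_b)   # signed up for A but not B
same_people = set(event_a) == set(event_b)   # exactly the same people?

print(both_events == {"John", "Kat"})     # True
print(only_event_a == {"Xie", "Julia"})   # True
print(same_people)                        # False
```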

### Python Dictionaries

A dictionary is an ordered (since Python 3.7, it remembers the order in which items were inserted), changeable collection of items that does not allow duplicate labels. Dictionaries are defined by pairs of objects, where the first object in each pair is generally thought of as a label (the "key") and the second as its value. Whereas lists are defined by `[]` and tuples are defined by `()`, dictionaries, just like sets, are defined by `{}`.

The content of a dictionary is organized differently from that of lists, tuples, and sets:

In [None]:
mydictionary = {
    "apples": 3,
    "oranges": 5,
    "bananas": 2,
    "pineapples": 1}
print(mydictionary)
type(mydictionary)

Dictionaries allow efficient and human-friendly addressing of items. For example, we can look up how many oranges we have by using the `"oranges"` label as the index, and the corresponding value in the pair will be returned:

In [None]:
mydictionary["oranges"]
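
Because dictionaries are changeable, the same square-bracket syntax also works for setting values. The sketch below uses a separate, made-up dictionary called `fruit_counts`. Assigning to an existing label replaces its value (which is why duplicate labels cannot exist), assigning to a new label adds a new pair, and `in` checks whether a label is present.

```python
fruit_counts = {"apples": 3, "oranges": 5}

fruit_counts["oranges"] = 6      # an existing label: the value is replaced
fruit_counts["kiwis"] = 4        # a new label: a new pair is added

has_kiwis = "kiwis" in fruit_counts   # "in" checks the labels, not the values
n_labels = len(fruit_counts)          # number of label/value pairs

print(fruit_counts)   # {'apples': 3, 'oranges': 6, 'kiwis': 4}
print(has_kiwis)      # True
print(n_labels)       # 3
```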

$\color{blue}{\text{Answer the following questions.}}$

  - Use the cell below to write code that lets you find out how many bananas we have in our container of fruits, `mydictionary`.
  [Report your answer here]

### A few exercises as a reminder

$\color{blue}{\text{Answer the following questions.}}$

  - Build a `list` of pets called `home`, where each pet name (a string) is followed by the number of pets of that kind I have at home (say, cats, 2, dogs, 1, etc.).
  
  [Show your list in the cell below]

  - Build a dictionary called `home_dict`, similar to the `home` list but using the features of the dictionary, where a pet name can be associated with its corresponding number.
  
    [Show your dictionary in the cell below]

  - Find the number of dogs in `home` and in `home_dict`.
  
    [Show your code in the cell below]

 [Use this cell to explain in your own words how you found the dog and the corresponding number of dogs in `home` first and in `home_dict` after that]
 
 

### More practice using Python lists

Although strings, tuples, sets, and dictionaries are all interesting and important data types in Python, we will primarily use lists in this class. Below is some additional material for practicing making lists and accessing their elements.

Lists in Python are just what they sound like, lists of things. We make them using `[square brackets]`.

In [None]:
mylist = [1, 3, 5, 7, 11, 13]

A list is an extension of a regular (or "scalar") `variable`, which can only hold one thing at a time.

In [None]:
notalist = 3.14

In [None]:
print(notalist)

In [None]:
print(mylist)

Aside: If we're working with only numbers, then you can think of a regular variable as a "scalar" and a list as a "vector".

Lists can, however, hold things besides numbers. For example, they can hold 'text'.

In [None]:
mylist2 = ['this', 'is', 'a', 'list', 'of', 'words']

In [None]:
mylist2

(Some people, even us, might casually call this a vector but that's technically not true.)

In reality, lists can hold all sort of things, say numbers (scalars), 'text' and even other lists, and all at once.

In [None]:
mylist3 = [1, 'one', [2, 3, 4]]

In [None]:
mylist3

Note that this last list holds a list at index=2.

We can get elements of a list by using `index` values in square brackets.

In [None]:
mylist

In [None]:
mylist[5]

In [None]:
mylist3[2]

**Python uses 0-based indexing, not 1-based indexing (the first value in a container is indexed with the number 0). This means that the first value in a list is at index=0, not index=1. This is different from many other languages, including R and MATLAB!**

We can address more than one element in a list by using the `:` (colon) operator.

In [None]:
mylist[0:3]

We can read this as "Give me all the elements from index 0 **inclusive** to index 3 **exclusive**."

I know this is weird. But at least for any two valid indexes `a` and `b` (with `a` no larger than `b`), the number of elements you get back from `mylist[a:b]` is always equal to `b` minus `a`, so I guess that's good!
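
Here is a quick check of that "b minus a" rule, as a sketch. It assumes both indexes fall within the list and that `a` is not larger than `b`; the name `chunk` is just illustrative.

```python
mylist = [1, 3, 5, 7, 11, 13]

a = 1
b = 4
chunk = mylist[a:b]   # elements at indexes 1, 2, and 3 (index 4 is excluded)

print(chunk)                 # [3, 5, 7]
print(len(chunk) == b - a)   # True
```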

We can get any consecutive hunk of elements using `:`.

In [None]:
mylist[2:5]

If you omit the indexes, Python will assume you want everything.

In [None]:
mylist[:]

That doesn't seem very useful... But, actually, it will turn out to be **really** useful later on, when we start using NumPy arrays!

If you give just one index, the missing one is assumed to mean "from the beginning" or "to the end". Like this:

In [None]:
mylist[:3] # from the beginning to 3

And this:

In [None]:
mylist[3:] # from 3 to the end

In addition to the `list[start:stop]` syntax, you can add a step after a second colon, as in `list[start:stop:step]`. This asks for the elements between `start` and `stop`, but in steps of `step`, so not necessarily consecutive elements. For example, every other element:

In [None]:
mylist[0:5:2] # get every other element

As you've probably figured out, all our outputs above have been lists. So if we assign the output a name, it will be another list.

In [None]:
every_other_one = mylist[0:-1:2] # gives the same result here as mylist[0::2]

In [None]:
every_other_one

See!
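
One caution about stopping at `-1`, shown as a sketch with two made-up lists: a stop of `-1` means "up to, but not including, the last element". With an even number of elements that makes no difference when stepping by 2, but with an odd number of elements the last element is lost.

```python
even_length = [1, 3, 5, 7, 11, 13]   # six elements
odd_length = [1, 3, 5, 7, 11]        # five elements

# Six elements: both forms pick indexes 0, 2, 4
even_a = even_length[0:-1:2]
even_b = even_length[0::2]

# Five elements: stopping before the last index drops the final element
odd_a = odd_length[0:-1:2]
odd_b = odd_length[0::2]

print(even_a, even_b)   # [1, 5, 11] [1, 5, 11]
print(odd_a, odd_b)     # [1, 5] [1, 5, 11]
```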

If we want a group of elements that aren't evenly spaced, we'll need to specify the indexes "by hand".

In [None]:
anothernewlist = [mylist[1],mylist[2],mylist[4]]

In [None]:
anothernewlist

So those are the basics of lists. They:

* store a list of things (duh)
* start at index zero
* can be accessed using three things together:
    - square brackets `[]`
    - integer indexes (including negative "start from the end" indexes)
    - a colon `:` (or two if you want a step value other than 1)
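
Negative indexes appear in the summary above but were only used in passing, so here is a brief sketch. A negative index counts back from the end of the list, with `-1` being the last element; the variable names are just illustrative.

```python
mylist = [1, 3, 5, 7, 11, 13]

last = mylist[-1]             # the last element
second_to_last = mylist[-2]   # one step back from the end
last_three = mylist[-3:]      # from three-from-the-end to the end
all_but_last = mylist[:-1]    # everything except the last element

print(last, second_to_last)   # 13 11
print(last_three)             # [7, 11, 13]
print(all_but_last)           # [1, 3, 5, 7, 11]
```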
    


In [None]:
letters = [2, 4,  3, 6, 5, 8, 11, 10, 9, 14, 13, 12, 7]
# 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

string_letters = str(letters)
lists_letters = list(letters)
tuples_letters = tuple(letters)
sets_letters = set(letters)


print("String: ", string_letters)
print("Lists: ", lists_letters)
print("Tuples: ", tuples_letters)
print("Sets: ", sets_letters)