# Introduction to Advanced Python Data Types

### Learning Goals

  - Advanced Python data types
  - The concept of a data container
  - Python Containers
    - Lists
    - Strings
    - Tuples
    - Dictionaries
    - Sets

In the previous tutorial we focused on the simple data types in Python: `int`, `float`, `complex`, `bool`, and `str`. In general, these simple types store a single value (e.g., `a = 10`). The complex number type (`b = 2 + 2j`) is something of an exception, as it stores two values (a point on the "complex plane"), but it's still considered a single "number".

If you look back at the previous tutorial, `str` (strings) are a little bit different. They can store multiple characters all at once. For example:

In [5]:
mystring = 'this string contains five words (and 7 spaces)'
print(mystring)

this string contains five words (and 7 spaces)


We can even look inside our string using "indexing" (much more on this in the weeks and months to follow). We index using square brackets `[ ]` so, for example, try this:

In [6]:
mystring[3]

's'

Python strings are a type of [*container*](https://en.wikipedia.org/wiki/Container_(abstract_data_type)). Containers are data types that can hold multiple objects. Whereas a variable is a name, like `a`, attached to a single value, like `2`, a container holds multiple values, and some containers (as we will see later) can hold values of different types (`int` and `str`, etc.).

Because containers hold multiple things, they have *length*, which we can get with the `len()` function.

In [None]:
len(mystring)
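`len()` is not specific to strings: it works on every built-in container, returning the number of elements (for a dictionary, the number of key-value pairs). A minimal sketch, with example values made up for illustration:

```python
# Example containers (values chosen only for illustration)
a_string = "abc"
a_list = [10, 20, 30, 40]
a_tuple = (1, 2)
a_set = {"x", "y", "z"}
a_dict = {"apples": 3, "oranges": 5}  # len() counts key-value pairs

print(len(a_string))  # 3 characters
print(len(a_list))    # 4 elements
print(len(a_tuple))   # 2 elements
print(len(a_set))     # 3 elements
print(len(a_dict))    # 2 key-value pairs
```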

Containers are one of the major ways Python offers to work with data. Later on, we will learn about specialized containers for data science, like the [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

First, however, we will learn about the built-in Python containers: *strings*, *lists*, *dictionaries*, *tuples*, and *sets*. Of these, the first 3 are used most often, so we'll focus on those.

### Python Strings

As we saw above, a string is a sequence of characters, generally a human-readable chunk of text.

In [7]:
mystring = 'a sequence of characters'

Because it's a *sequence* (that is, because the characters have an *order*, unlike the letters in a Scrabble bag), we can pluck out individual characters, or "elements", by indexing, as we saw briefly above.

In [8]:
mystring[6]

'e'

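Python has no separate "character" type (unlike languages such as C, which have `char`). The built-in `chr()` is a function that turns an integer code point into a one-character string, not a type. So a single element of a string is still a string: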

In [18]:
a_char = mystring[6]
type(a_char)

str

And since it's still a string, it still has a length:

In [19]:
len(a_char)

1

---

In the cell below, try `mystring[1]`:

' '

Was that what you expected?

Indexing in Python is *zero-based*. So what you might expect to be the first element of something is actually the "zeroth" element. The way to think about it is that the index is really an *offset* from the beginning of the container, i.e. from the first element. So `mystring[1]` means "whatever is 1 element over from the beginning of the string", which is a space (`' '`).

You can index from the end too. Try a `mystring[-1]`:

In [11]:
mystring[-1]

's'

Wait, what happened there? Is indexing from the end one-based while indexing from the beginning is zero-based? Well, functionally "yes", but it actually makes sense and is consistent.

First, `-0` is the same as `0`:

In [20]:
-0 == 0

True

So if we do a `mystring[-0]`...

In [21]:
mystring[-0]

'a'

We get the first element, `a`, which makes sense; it's the same as `mystring[0]`. 

Now let's do a `len(mystring) - 1`:

In [15]:
len(mystring) - 1

23

Which corresponds to the *offset* of the last element. So now let's get the last element the "long" way:

In [16]:
mystring[len(mystring) - 1]

's'

Which works just fine. But why do that when we can just do `mystring[-1]`?!

In [22]:
mystring[-1]

's'
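The rule behind negative indexes is that, for any `k` from 1 up to `len(mystring)`, `mystring[-k]` is shorthand for `mystring[len(mystring) - k]`. A small sketch that checks this for every character of the example string:

```python
mystring = 'a sequence of characters'

# Check the rule for k = 1 (last character) up to k = len(mystring) (first character)
for k in range(1, len(mystring) + 1):
    assert mystring[-k] == mystring[len(mystring) - k]

print(mystring[-1], mystring[-len(mystring)])  # prints: s a
```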

We'll do much more indexing as we go but, briefly, we can grab multiple elements by "slicing" using the `:` (colon) operator. Like this:

In [24]:
sub_str = mystring[0:10]
sub_str

'a sequence'

Which can be read as "start at offset 0 and stop just before offset 10", which gives us 10 elements.

Notice that means that we do *not* get the element at offset 10, so

In [27]:
mystring[10]

' '

is a space. But if we look at the last element of `sub_str`:

In [28]:
sub_str[-1]

'e'

it's an "e", and if we try this:

In [26]:
sub_str[10]

IndexError: string index out of range

We get an error telling us that there isn't anything at offset 10! But we *did* get 10 elements, which we can confirm by:

In [30]:
len(sub_str)

10

As we mentioned before, we've got *much* more indexing to go – it's a huge part of data science!

One final thing about strings is that they are *immutable* – so once you create one, you cannot change it. So, for example, if you wanted a string that said "1 sequence of characters", you cannot do this:

In [32]:
mystring[0] = '1'

TypeError: 'str' object does not support item assignment

Instead, you need to make a new string, which you could do with minimal typing like this:

In [39]:
new_str = '1' + mystring[1:]
new_str

'1 sequence of characters'

Note that if you leave out the index after the `:`, Python interprets it as "to the end". Again, we're going to get lots of indexing practice as we go!
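Because strings are immutable, string methods such as `replace()` and `upper()` never change the original string; they return a new one. A short sketch using the same example string:

```python
mystring = 'a sequence of characters'

# replace(old, new, count): swap only the first 'a' for '1'
new_str = mystring.replace('a', '1', 1)
shouted = mystring.upper()

print(new_str)    # 1 sequence of characters
print(shouted)    # A SEQUENCE OF CHARACTERS
print(mystring)   # a sequence of characters (unchanged)
```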

### Python Lists

#### Lists are containers of indexed, ordered and *mutable* data. 

They can be used to hold multiple elements of *any* kind. Lists are created by writing comma-separated values inside square brackets:

In [40]:
mylist = [2, 3, 4, 5] # this list contains 4 numbers
print(mylist)

[2, 3, 4, 5]


#### Lists can contain elements of any type

In [41]:
list_of_int = [10, 4, 2, 5]                                  # This list contains integers
print(list_of_int)
type(list_of_int)

[10, 4, 2, 5]


list

In [55]:
list_of_float = [2.3, 4.3, 5.5, 6.1]                         # This list contains floating-point numbers
print(list_of_float)
type(list_of_float)

[2.3, 4.3, 5.5, 6.1]


list

In [56]:
list_of_boolean = [True, False, True, True]                  # This list contains boolean values 
print(list_of_boolean)
type(list_of_boolean)

[True, False, True, True]


list

In [45]:
list_of_strings = ['this', 'is', 'a', 'list','of','strings'] # This list contains strings
print(list_of_strings)
type(list_of_strings)

['this', 'is', 'a', 'list', 'of', 'strings']


list

---

In the cell below, see if Python will let you create a list with elements of *different types*:

---

#### Lists are mutable

As mentioned in passing above, unlike a string, a list is *mutable*, meaning that we can change its elements:

In [59]:
list_of_boolean[3] = False
list_of_boolean

[True, False, True, False]
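To see the contrast with strings side by side, here is a small sketch (the string `'cats'` is just an example) in which the list assignment succeeds while the string assignment raises the `TypeError` we saw earlier, caught with `try`/`except` so the cell keeps running:

```python
list_of_boolean = [True, False, True, True]
list_of_boolean[3] = False          # allowed: lists are mutable
print(list_of_boolean)              # [True, False, True, False]

word = 'cats'
try:
    word[0] = 'b'                   # not allowed: strings are immutable
except TypeError as err:
    print('TypeError:', err)        # 'str' object does not support item assignment

word = 'b' + word[1:]               # build a new string instead
print(word)                         # bats
```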

#### Lists can contain lists

Finally (for now), one crucial thing about lists is that *lists can contain lists!* So if we do the following:

In [49]:
multi_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
multi_list

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Now `multi_list` is a list of lists; we can get the first list like this:

In [51]:
multi_list[0]

[1, 2, 3]

Now, what if we wanted the 2nd element of the first list? We could do it using a "temporary variable" like this:

In [52]:
tmp = multi_list[0]
tmp[1]

2

But Python lets us do this in one go like this:

In [53]:
multi_list[0][1]

2

#### A list of lists can be thought of as a table or matrix

Now, here's a really cool thing: we could think of our list of lists as being laid out in a table (or like a spreadsheet) like this:

|  multi_list | Column index 0 | Column index 1 | Column index 2 |
|:----------|:----------|:----------|:----------|
| **Row index 0** |   1    |   2    |   3    |
| **Row index 1** |   4    |   5    |   6    |
| **Row index 2** |   7    |   8    |   9    |


in which every ***row*** is one of the lists in `multi_list`!

So now we can think of, say, this:

In [54]:
multi_list[1][2]

6

as specifying a row index (`1`) and then a column index (`2`), in the form  
`multi_list[row_index][column_index]`
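Thinking of `multi_list` as a table, the first index picks a row and the second picks a position within that row. A sketch that prints the table one row at a time and then collects one column (the one at column index 2) with a loop; the name `column_2` is just illustrative:

```python
multi_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Each element of multi_list is one row
for row in multi_list:
    print(row)

# Collect the element at column index 2 from every row
column_2 = []
for row in multi_list:
    column_2.append(row[2])

print(column_2)           # [3, 6, 9]
print(multi_list[1][2])   # 6: row index 1, column index 2
```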

### Python Tuples

Tuples are ordered collections of data. They are similar to lists but immutable: whereas you can add new elements to a previously defined list, you cannot do that with a tuple.

Tuples are defined with parentheses:

In [None]:
mytuple = (9,4,5)
print(mytuple)
type(mytuple)

$\color{blue}{\text{Answer the following questions.}}$

  - Can you create a tuple with different types of values in it?
  [Use the cell below to create a tuple containing different types of values, and the following one to describe the result of the experiment in your own words]

[Describe the results of the experiment above here:]



#### Indexing tuples

Because tuples are ordered, they can be indexed and sliced with square brackets, exactly like strings and lists.
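A short sketch of indexing and slicing a tuple, and of what happens if you try to assign to one of its elements (the error is caught so the cell keeps running):

```python
mytuple = (9, 4, 5)

print(mytuple[0])     # 9  (first element)
print(mytuple[-1])    # 5  (last element)
print(mytuple[0:2])   # (9, 4): slicing a tuple gives a tuple

try:
    mytuple[0] = 1    # tuples are immutable
except TypeError as err:
    print('TypeError:', err)  # 'tuple' object does not support item assignment
```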

Lists and tuples differ in a variety of ways. The most noticeable one is that lists are mutable and tuples are immutable. This means that we can add elements to a list but not to a tuple. In the example below we will use the `.append()` method to attempt to add an element to a list and to a tuple:

In [None]:
mylist = [1,2,3] # this list has three elements
print(mylist)
type(mylist)

Now, we attempt to add a fourth element to the list (using `.append()`):

In [None]:
mylist.append(4) 
print(mylist)
type(mylist)

In [None]:
mytuple = (1,2,3)
print(mytuple)
type(mytuple)

Now, we attempt to add a fourth element to the tuple:

In [None]:
mytuple.append(4) # raises AttributeError: 'tuple' object has no attribute 'append'
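Since a tuple can't grow, the usual workaround is to build a *new* tuple, for example by concatenating with `+`. Note the trailing comma in `(4,)`: without it, `(4)` is just the integer 4 in parentheses. A sketch (the name `bigger` is just illustrative):

```python
mytuple = (1, 2, 3)

bigger = mytuple + (4,)   # creates a new tuple; mytuple is unchanged
print(bigger)             # (1, 2, 3, 4)
print(mytuple)            # (1, 2, 3)

print(type((4,)))         # <class 'tuple'>
print(type((4)))          # <class 'int'>
```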

*Note: you can explore all the other methods of a container (or of any variable in Python) by typing `<containerName>.` and then pressing the `TAB` key on your keyboard.* Don't forget the "**.**" after the name of your container!
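Outside Jupyter, for example in a plain Python script, the built-in `dir()` function gives a similar listing. Names that start with an underscore are Python's internal "special" methods, so the sketch below filters them out. It uses a string so that the tuple and list questions below are still yours to answer:

```python
# dir() returns a sorted list of the attribute names of an object
all_names = dir('some text')

# Keep only the "public" methods (skip names that start with '_')
public_methods = [name for name in all_names if not name.startswith('_')]

print(len(public_methods))
print(public_methods[:5])
```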

$\color{blue}{\text{Answer the following questions.}}$

  - How many methods does a `tuple` have?
  [Report your answer here]
  
  - Which methods does a `list` have?
  [Report your answer here]
  

### Python Sets

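A set is an unordered, unindexed collection of unique items. The set itself is mutable (items can be added and removed), but the items it contains must be immutable values such as numbers and strings. Whereas lists are defined by `[]` and tuples by `()`, sets are defined by `{}`.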

In [62]:
myset = {"A", "B", "C", "D"}
print(myset)
type(myset)

{'B', 'D', 'C', 'A'}


set

Unordered means that items in the set do not have an assigned order, so they cannot be indexed: 

In [63]:
myset[1]

TypeError: 'set' object is not subscriptable

This also means that elements cannot be replaced by position, because a set has no positions to refer to:

In [None]:
myset[2] = "F"  # TypeError: 'set' object does not support item assignment

This is different from lists, which can be both indexed and changed (tuples can be indexed, but not changed):

In [None]:
mylist = ["A","B","C","D"]
print(mylist[2])

In [None]:
mylist[3] = "F"
print(mylist)

$\color{blue}{\text{Answer the following questions.}}$

  - In the cell above, I changed the element of `mylist` at index `3`. Why did that operation change the last element of the list?
  [Report your answer here]
  

Sets have some cool properties, similar to the "sets" you might have studied in high school or college. For example, we can compute the *union* of two sets.

OK, so let's remind ourselves what `myset` contained:

In [None]:
print(myset)

Now let's create a new set (`myset2`) with a few more letters in it, and then construct the union of `myset` and `myset2`. We will save the union in a variable called `myset3`:

In [None]:
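myset2 = {'E', 'F', 'G'}
print(myset2)
# No need to create myset3 beforehand (and {} on its own is an empty dict, not a set)
myset3 = myset.union(myset2)  # a new set with every item from both sets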
print(myset3)
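Union is only one of the classic set operations. Sets also provide `intersection()` (items in both sets) and `difference()` (items in the first set but not the second), and the operators `|`, `&`, and `-` are shorthands for union, intersection, and difference. A sketch with two small example sets (`set_a` and `set_b` are made up for illustration):

```python
set_a = {"A", "B", "C", "D"}
set_b = {"C", "D", "E"}

both = set_a.intersection(set_b)     # {'C', 'D'} (printed order may vary)
only_a = set_a.difference(set_b)     # {'A', 'B'}
everything = set_a.union(set_b)      # {'A', 'B', 'C', 'D', 'E'}

# The operator forms give the same results
print(both == (set_a & set_b))        # True
print(only_a == (set_a - set_b))      # True
print(everything == (set_a | set_b))  # True
```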

$\color{blue}{\text{Answer the following questions.}}$

  - Can you repeat the same union experiment with lists? For example, can you take the list `mylist`, create a second one called `mylist2` containing other elements (your pick), and create a third list `mylist3` that is the union of `mylist` and `mylist2`?
  
  [Use the cell below to try the experiment]
  
  [Report your verbal description of the results of the experiment here.]
  

$\color{blue}{\text{Answer the following questions.}}$

  - Can lists or sets contain a mixture of different data types, say strings, floating-point numbers and integers? Try to make a list containing strings, floats and integers, call it `myWonderList`, and show it using `print()`. Then make a set containing strings, floats and integers, call it `myWonderSet`, and show it using `print()`:

  [Use the cell below to implement the experiment]
  
  [Report your verbal description of the results of the experiment here.]
  

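#### Creating a set from a list

Passing a list to `set()` builds a set from the list's elements. Because a set holds only unique items, any duplicates are dropped: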

In [64]:
# List with duplicates
my_list = [1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 8, 8]

# Convert list to set
my_set = set(my_list)

print(my_set)  # Output: {1, 2, 3, 4, 5, 6, 7, 8}


{1, 2, 3, 4, 5, 6, 7, 8}
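Because a set has no order, if you need the unique values back as an ordered list, one common approach is to pass the set to `sorted()`, which always returns a new list in ascending order. A sketch reusing the list above:

```python
my_list = [1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 8, 8]

unique_sorted = sorted(set(my_list))     # a list, in ascending order
print(unique_sorted)                     # [1, 2, 3, 4, 5, 6, 7, 8]
print(len(my_list), len(unique_sorted))  # 13 8
```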


### Python Dictionaries

A dictionary is an ordered (since Python 3.7, dictionaries remember the order in which items were inserted), changeable collection of items that does not allow duplicate keys. Each item is a pair: a *key*, generally thought of as a label, and the *value* associated with it. Whereas lists are defined by `[]` and tuples by `()`, dictionaries, just like sets, are defined by `{}`.

The contents of a dictionary are organized differently from those of lists, tuples and sets:

In [None]:
mydictionary = {
    "apples": 3,
    "oranges": 5,
    "bananas": 2,
    "pineapples": 1}
print(mydictionary)
type(mydictionary)

Dictionaries allow efficient and human-friendly access to items. For example, we can look up how many oranges we have by using the `"oranges"` key, and the value paired with it will be returned:

In [None]:
mydictionary["oranges"]
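Dictionaries are mutable: assigning to a key updates its value, or adds a new pair if the key isn't there yet. Looking up a missing key with square brackets raises a `KeyError`, while the `.get()` method returns a default value instead. A sketch using a separate example dictionary, `basket`, so the exercise below is still yours to solve:

```python
basket = {"apples": 3, "oranges": 5}

basket["oranges"] = 6           # update an existing key
basket["kiwis"] = 4             # add a new key-value pair
print(basket)                   # {'apples': 3, 'oranges': 6, 'kiwis': 4}

print("kiwis" in basket)        # True: `in` checks the keys
print(basket.get("mangos", 0))  # 0: missing key, so the default is returned

try:
    basket["mangos"]
except KeyError as err:
    print("KeyError:", err)     # KeyError: 'mangos'
```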

$\color{blue}{\text{Answer the following questions.}}$

  - Use the cell below to write code that lets you find out how many bananas we have in our `mydictionary` container of fruits.
  [Report your answer here]

### A few exercises as a reminder

$\color{blue}{\text{Answer the following questions.}}$

  - Build a list called `home` in which each pet name (a string) is followed by the number of those pets I have at home (say `'cats', 2, 'dogs', 1`, etc.).
  
  [Show your list in the cell below]

  - Build a dictionary called `home_dict`, similar to the `home` list but using the features of a dictionary, so that each pet name is associated with its corresponding number.
  
    [Show your dictionary in the cell below]

  - Find the number of dogs in `home` and in `home_dict`.
  
    [Show your code in the cell below]

 [Use this cell to explain in your own words how you found the number of dogs, first in `home` and then in `home_dict`]
 
 

### More Practice with Python Lists

Although strings, tuples, sets, and dictionaries are all interesting and important data types in Python, we will primarily use lists in this class. Below is some additional material for practicing making lists and accessing them.

Lists in Python are just what they sound like, lists of things. We make them using `[square brackets]`.

In [None]:
mylist = [1, 3, 5, 7, 11, 13]

A list is an extension of a regular (or "scalar") variable, which can only hold one thing at a time.

In [None]:
notalist = 3.14

In [None]:
print(notalist)

In [None]:
print(mylist)

Aside: If we're working with only numbers, then you can think of a regular variable as a "scalar" and a list as a "vector".

Lists can, however, hold things besides numbers. For example, they can hold 'text'.

In [None]:
mylist2 = ['this', 'is', 'a', 'list', 'of', 'words']

In [None]:
mylist2

(Some people, even us, might casually call this a vector but that's technically not true.)

In reality, lists can hold all sorts of things (numbers, 'text', and even other lists), all at once.

In [None]:
mylist3 = [1, 'one', [2, 3, 4]]

In [None]:
mylist3

Note that this last list holds a list at index 2.

We can get elements of a list by using `index` values in square brackets.

In [None]:
mylist

In [None]:
mylist[5]

In [None]:
mylist3[2]
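Because `mylist3[2]` is itself a list, you can index into it by adding a second pair of square brackets, just as we did with `multi_list` earlier. Strings inside a list can be indexed the same way. A short sketch:

```python
mylist3 = [1, 'one', [2, 3, 4]]

inner = mylist3[2]       # the list stored at index 2
print(inner)             # [2, 3, 4]
print(mylist3[2][0])     # 2: first element of the inner list
print(mylist3[1][0])     # o: strings can be indexed too
```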

**Python uses 0-based indexing, not 1-based indexing: the first value in a list is at index 0, not index 1. This is different from some other languages, including R and MATLAB!**

We can address more than one element in a list by using the `:` (colon) operator.

In [None]:
mylist[0:3]

We can read this as "Give me all the elements in the interval between 0 **inclusive** to 3 **exclusive**."

I know this is weird. But at least for any two indexes `a` and `b` with `0 <= a <= b <= len(mylist)`, the number of elements you get back from `mylist[a:b]` is always equal to `b` minus `a`, so I guess that's good!
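The `b - a` rule is easy to check. A sketch that tests it for every valid pair of non-negative indexes, and also shows that a stop index past the end of the list is simply clipped rather than raising an error:

```python
mylist = [1, 3, 5, 7, 11, 13]

# For 0 <= a <= b <= len(mylist), the slice mylist[a:b] has b - a elements
for a in range(len(mylist) + 1):
    for b in range(a, len(mylist) + 1):
        assert len(mylist[a:b]) == b - a

# A stop index beyond the end is clipped to the end (no IndexError)
print(mylist[4:100])   # [11, 13]
```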

We can get any consecutive hunk of elements using `:`.

In [None]:
mylist[2:5]

If you omit the indexes, Python will assume you want everything.

In [None]:
mylist[:]

That doesn't seem very useful... But, actually, it will turn out to be **really** useful later on, when we start using NumPy arrays!
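One use you can put to work right away: `mylist[:]` makes a *copy*. The slice is a new list, so changing the copy leaves the original alone, whereas plain assignment only gives the same list a second name. A sketch (the names `same_list` and `copy_list` are illustrative):

```python
mylist = [1, 3, 5, 7, 11, 13]

same_list = mylist      # a second name for the SAME list
copy_list = mylist[:]   # a NEW list with the same elements

copy_list[0] = 100
print(mylist)           # [1, 3, 5, 7, 11, 13]  (unchanged by editing the copy)

same_list[0] = 99
print(mylist)           # [99, 3, 5, 7, 11, 13] (changed through the other name)
print(copy_list is mylist, same_list is mylist)  # False True
```

This is a *shallow* copy: for a list of lists, the inner lists are still shared between the original and the copy.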

If you leave out just one of the indexes, a missing start means "from the beginning" and a missing stop means "to the end". Like this:

In [None]:
mylist[:3] # from the beginning up to (but not including) index 3

And this:

In [None]:
mylist[3:] # from index 3 to the end

In addition to the `list[start:stop]` syntax, you can add a step after a second colon, as in `list[start:stop:step]`. This asks for the elements between `start` and `stop`, taken in steps of `step` rather than one after another. For example, every other element:

In [None]:
mylist[0:5:2] # get every other element
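The step can also be negative, in which case Python walks through the list backwards. Leaving out `start` and `stop` with a step of `-1` is a common idiom for getting a reversed copy of a list. A short sketch:

```python
mylist = [1, 3, 5, 7, 11, 13]

print(mylist[::2])     # [1, 5, 11]             every other element
print(mylist[::-1])    # [13, 11, 7, 5, 3, 1]   reversed copy
print(mylist[4:1:-1])  # [11, 7, 5]             index 4 down to (not including) index 1
```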

As you've probably figured out, all our outputs above have been lists. So if we assign the output a name, it will be another list.

In [None]:
every_other_one = mylist[0::2] # same as mylist[::2]; mylist[0:-1:2] would skip the last element of an odd-length list

In [None]:
every_other_one

See!

If we want a group of elements that aren't evenly spaced, we'll need to specify the indexes "by hand".

In [None]:
anothernewlist = [mylist[1],mylist[2],mylist[4]]

In [None]:
anothernewlist

So those are the basics of lists. They:

* store a list of things (duh)
* start at index zero
* can be accessed using three things together:
    - square brackets `[]`
    - integer indexes (including negative "start from the end" indexes)
    - a colon `:` (or two if you want a step value other than 1)
    


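The built-in functions `str()`, `list()`, `tuple()`, and `set()` convert one container into another type. Run the cell below and compare the four outputs:

In [None]:
numbers = [2, 4, 3, 6, 5, 8, 11, 10, 9, 14, 13, 12, 7]

string_numbers = str(numbers)    # the printed form of the list, as one string
list_numbers = list(numbers)     # a new list with the same elements
tuple_numbers = tuple(numbers)   # same elements and order, but immutable
set_numbers = set(numbers)       # unique elements, no guaranteed order

print("String: ", string_numbers)
print("List:   ", list_numbers)
print("Tuple:  ", tuple_numbers)
print("Set:    ", set_numbers)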
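Note that `str()` applied to a list gives the list's printed form (brackets, commas and spaces included) rather than a string built from its elements. Going the other way, `list()`, `tuple()`, and `set()` split a string into its individual characters, and `''.join()` glues a list of strings back together. A sketch with a short example string:

```python
text = 'abca'

chars = list(text)        # ['a', 'b', 'c', 'a']
as_tuple = tuple(text)    # ('a', 'b', 'c', 'a')
unique = set(text)        # {'a', 'b', 'c'}: duplicates dropped, order not guaranteed
rebuilt = ''.join(chars)  # 'abca'

print(chars, as_tuple, unique, rebuilt)
print(str([1, 2]))        # [1, 2]  (a 6-character string)
```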