# Python Fundamentals: Data Types and Structures

* * * 

<div class="alert alert-success">  
    
### Learning Objectives 
    
* Recognize that data comes in different types.
* Use functions to manipulate variables.
* Use a search engine to look up how functions work.
* Understand when to use a list, and when to use a dictionary.
</div>


### Icons Used in This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
⚠️ **Warning:** Heads-up about tricky stuff or common mistakes.<br>
📝 **Poll:** A Zoom poll to help you learn!<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br>


### Sections
1. [Data Types in Python](#dtypes) 
2. [Functions and Methods](#func)
3. [Lists: Ordered Data Structures](#lists)
4. [Dictionaries: Key-Value Pairs](#dicts)

<a id='dtypes'></a>

# Data Types in Python

**Data types** are classifications of data. Programming languages need separate data types, because you can do different kinds of things with different kinds of data. For instance, think of "rounding off" – this makes sense with numbers, but not so much with text.

There are a lot of data types in Python. Today, we'll introduce **integers, floats, and strings**. We will also cover **lists and dictionaries**, which offer ways of organizing data and are sometimes called data structures.

We can use the `type()` **function** to identify the type of a variable. Functions are signified by parentheses following them, which contain any inputs to the function.

🔔 **Question:** Let's check the type of two variables below. You don't know the names of these types yet, but broadly speaking, what do you think each variable could be classified as?

In [1]:
life_exp = 28
type(life_exp)

int

In [2]:
continent = 'Asia'
type(continent)

str

Here are some of the most common data types you'll encounter while using Python (and programming languages in general):

* **int**: Integers (e.g., `a = 2`).
* **float**: Decimal numbers (e.g., `a = 2.01`).
* **str**: Strings, which denote text (e.g., `a = "2"` or `a = '2'`).
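To make the three types concrete, here is a minimal sketch that checks each one with `type()`. The variable names and values (`year`, `gdp_per_capita`, `country_name`) are illustrative assumptions, not part of the workshop data.

```python
# Hypothetical example values, one per type
year = 1952               # int: a whole number
gdp_per_capita = 779.45   # float: a number with a decimal point
country_name = "Greece"   # str: text wrapped in quotes

print(type(year))             # <class 'int'>
print(type(gdp_per_capita))   # <class 'float'>
print(type(country_name))     # <class 'str'>

# Quotes make the difference: "2" is text, while 2 is a number
print(type("2") == type(2))   # False
```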

Operations and functions work differently for different types. For example, subtraction works with numeric types like floats, but not with strings.

In [3]:
# Subtraction with numbers (life_exp is an int)
life_exp - 2

26

In [4]:
# Subtraction with strings?
continent - 2

TypeError: unsupported operand type(s) for -: 'str' and 'int'

In contrast, addition works for both strings and numbers:

In [5]:
# Addition with numbers (life_exp is an int)
life_exp + 2

30

In [7]:
# Addition with strings = concatenation
'South-' + continent
# Concatenation only works between strings; 'South-' + 2 raises
# TypeError: can only concatenate str (not "int") to str

# Assigning to a built-in name (e.g., print = 5) hides the built-in.
# Running `del print` or restarting the kernel restores it.

'South-Asia'

### 💡 Tip: Guidelines for Variable Names

- Python is case-sensitive (`life_exp` and `Life_exp` are two separate variables).
- Use meaningful variable names (e.g. `continent` is more informative than `a_variable`). Ideally, you should be able to tell what is going on in the code and variables without having to run it.
- Don't use variable names that refer to existing variables and functions in Python (e.g., `print`, `sum`, `str`).
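The guidelines above can be sketched in a few lines. The names `numbers` and `total` are hypothetical, chosen only for illustration.

```python
# Case sensitivity: these are two separate variables
life_exp = 28
Life_exp = 76
print(life_exp == Life_exp)   # False

# A descriptive name that does not clash with the built-in `sum` function
numbers = [1, 2, 3]
total = sum(numbers)
print(total)                  # 6

# Risky alternative (left commented out on purpose):
# sum = sum(numbers)
# After that line, `sum` would hold the number 6, and a later call
# such as sum(numbers) would raise a TypeError.
```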

## Type Conversion

Types can get confusing. For instance, we can write a number as either an integer or a string. Python treats these differently, even if to us the value is the same:

In [8]:
a = '3'
b = 3

b - a

TypeError: unsupported operand type(s) for -: 'int' and 'str'

Even though our intention is to do numeric subtraction, the type of `a` is a string, which results in an error.

If we convert `a` to an integer, the operation will work.

We can do this with **type conversion**. The `int()` function will convert the input to an integer:

In [9]:
int(a)

3

In [10]:
type(int(a))

int

In [11]:
b - int(a)

0

There are other type conversion functions.

- `str()` converts a variable to a string.
- `float()` converts a variable to a float. 
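As a short sketch of these two functions (the values below are illustrative assumptions):

```python
# str() turns a number into text, so it can be concatenated with other text
year = 1952
year_text = str(year)
print('Year: ' + year_text)   # Year: 1952

# float() turns numeric text into a decimal number
half_text = '28.5'
half_number = float(half_text)
print(half_number * 2)        # 57.0

# int() applied to a float drops the decimal part instead of rounding
print(int(3.9))               # 3
```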

⚠️ **Warning:** If the value cannot be converted to that type, the function will raise a `ValueError`. Run the cell below.

In [12]:
int('Netherlands')
# a string with non-numeric characters can't be converted into an integer

ValueError: invalid literal for int() with base 10: 'Netherlands'

In the above case, the error means that **non-numeric characters** cannot be interpreted as a number.

## 🥊 Challenge 1: What's Your Type?

What is the type of the following expressions? Run these cells and wrap them in a `type()` function to see!

💡 **Tip:** If you don't understand what `round()` does, use a search engine to look up `python round` and see if you can find answers. 

In [15]:
# 1
b - int(a)

type(b - int(a))

int

In [22]:
# 2
round(2.44)

type(round(2.44)) # int
type(2.44) # float
type(float("2.44")) # this works!

float

In [20]:
# 3
['Afghanistan','Canada','Zimbabwe']
type(['Afghanistan','Canada','Zimbabwe'])

list

📝 **Poll PyFun 2-1:** What is the type of the expressions?

<a id='func'></a>

# Functions and Methods

You've been using **functions** like `print()` and `type()` to carry out common tasks with data and variables.

Functions can be recognized by their trailing parentheses `()`. The data you want to apply the function to goes inside those parentheses.

You can even wrap functions into one another. This is called **nesting**. The output of the inner function will become the input of the outer function. Like this:

In [23]:
type(round(3.5))
# nesting!

int

A **method** is a special type of function: one that belongs to a **particular type of object**, like a string or an integer. Methods allow you to do different things with different objects.

For instance, we can use a method to turn a string variable into lowercase or uppercase. These lowercase and uppercase methods don't exist for integers, though. That's why we call them methods instead of functions – and why we access them in a different way.

You can access (and recognize) methods through **dot notation**. It looks like this: `variable.method()`

Let's look at the built-in method `upper()`, which can be applied to a string-type variable:

In [27]:
country = 'Greece'
country.upper()

'GREECE'

🔔 **Question**: What do you think the below cell does?

In [None]:
country.lower()
# returns the string in lowercase: 'greece'

# methods are called with dot notation:
# country.lower(), country.upper()

Note that you can run methods on variables that hold a data value, or on the data values directly!
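A minimal sketch of both options, reusing the `'Greece'` value from above:

```python
country = 'Greece'

# Method called on a variable that holds a string
print(country.upper())    # GREECE

# The same method called directly on the string value
print('Greece'.upper())   # GREECE

# String methods return a new string; the variable itself is unchanged
print(country)            # Greece
```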

## 🥊 Challenge 2: Chaining Methods

Methods can be **chained** in a single line. This is fine, as long as the output of one method directly feeds into the input of the next. These lines can be read sequentially left to right. 

Don't run the next code cell yet! Use your search engine to look up the two methods that `country` goes through: `lower()` and `startswith()`.

After reading up about what these methods do, what do you think the final output of this cell will be?

In [28]:
country.lower().startswith('g')
# methods can be chained: lower() returns 'greece', then startswith('g') checks its first letter

True

💡 **Tip**: Recall that when you're doing the same kind of thing with functions, it looks a bit different. You read these **nested functions** from the inside out.

In [35]:
"A".lower().capitalize().upper().lower()

'a'

In [29]:
print(type(int('3')))

<class 'int'>


## Adding Arguments

As you can see, functions typically take their input between the parentheses (e.g. `type()` takes the variable you want to know the type of). Methods, on the other hand, don't always take values in between their parentheses.

For instance, in the cell above, the `lower()` method doesn't take any values in between the parentheses, but the `startswith()` method does.

Methods (and functions too) can often take additional values that alter their behavior. These values, that go in between the parentheses, are called **arguments**. Let's try them out.
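As a small illustration, here is a sketch of how an argument changes the result of a function and of a method. The number used is just an example value.

```python
# round() with one argument rounds to the nearest whole number
print(round(28.801))      # 29

# A second argument sets the number of decimal places to keep
print(round(28.801, 1))   # 28.8

# startswith() needs an argument: the text to look for at the start
country = 'Greece'
print(country.startswith('G'))   # True
print(country.startswith('g'))   # False (the check is case-sensitive)
```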

## 🥊 Challenge 3: Time to Split

First, let's save a string in a variable.

In [30]:
sentence = 'The capital of Brazil is Brasília. It has a tropical savanna climate.'

We can use the `split()` method on `sentence`. Try it out below.

In [38]:
# YOUR CODE HERE

sentence.split() # object.method()

['The',
 'capital',
 'of',
 'Brazil',
 'is',
 'Brasília.',
 'It',
 'has',
 'a',
 'tropical',
 'savanna',
 'climate.']

Finally, look up the `split()` method using your search engine. For instance, look for `python split() documentation`.

You will find that you can use `sep='.'` in between the parentheses of `split()` when applying it to `sentence`. What does this argument seem to do? What is the output?

In [39]:
# YOUR CODE HERE
sentence.split(sep=".") # splits at each period instead of at whitespace

# 💡 In Jupyter, pressing Shift+Tab inside a method's parentheses shows its documentation

['The capital of Brazil is Brasília', ' It has a tropical savanna climate', '']

<a id='lists'></a>

# Lists: Ordered Data Structures

**Data structures** allow us to organize data. A list is one such data structure. It is a collection of ordered items. Use a list when you want to keep a bunch of items in one spot.

We specify a list with square brackets (`[]`), with commas separating the entries in the list.

In [40]:
country_list = ['Ethiopia', 'Canada', 'Thailand', 'Denmark', 'Japan']
type(country_list)

list

🔔 **Question:** `len()` gives us the number of items in a list. What is the output of the line below?

In [41]:
len(country_list)

5

## Indexing Lists

If we want to retrieve an item of a list, we do so by telling Python which **index** of the list we want (e.g., the first, second, or third item). This is called **indexing** the list.

To index, we use **square brackets**.

🔔 **Question:** Look at the index we create for `country_list` below. What do you think will be printed?

In [42]:
country_list[1]
 # it will print the second item (Canada) because python is zero indexed

'Canada'

Note that Python is **zero**-indexed, meaning the first item has index zero, not one! 
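A short sketch of what zero-indexing means for the first and last items. The negative index in the last line is standard Python shorthand that counts from the end of the list.

```python
country_list = ['Ethiopia', 'Canada', 'Thailand', 'Denmark', 'Japan']

# The first item sits at index 0
print(country_list[0])                       # Ethiopia

# With 5 items, the last valid index is 4 (the length minus one)
print(country_list[len(country_list) - 1])   # Japan

# Index -1 is a shortcut for the last item
print(country_list[-1])                      # Japan

# country_list[5] would raise an IndexError, because there is no sixth item
```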

We can also get multiple items from a list. We specify the start index and the end index, separated by a colon `[start:stop]`. 

The colon indicates that you want to access the items between the two endpoints. If one side of the colon is empty, the slice runs from the start of the list or to the end of the list, respectively.

🔔 **Question:** Can you guess what the output of these statements will be?

In [43]:
country_list[1:3]
# includes the item at index 1, but excludes the item at index 3

['Canada', 'Thailand']

In [44]:
country_list[2:5]

['Thailand', 'Denmark', 'Japan']

In [45]:
country_list[3:]
# from index 3 to the end of the list

['Denmark', 'Japan']

⚠️ **Warning:** Note that Python will include the item at the start index, but **exclude** the item at the stop index. Here's what happens if you slice `country_list[1:4]`:

<img src="../img/list-index.png" alt="List Indexing in Python" width="500"/>
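The slice from the warning above can be sketched in code as follows. One handy consequence of excluding the stop index is that a slice contains `stop - start` items.

```python
country_list = ['Ethiopia', 'Canada', 'Thailand', 'Denmark', 'Japan']

# Items at indices 1, 2, and 3 (index 4 is excluded)
print(country_list[1:4])        # ['Canada', 'Thailand', 'Denmark']
print(len(country_list[1:4]))   # 3, which is 4 - 1

# An empty start means "from the beginning of the list"
print(country_list[:2])         # ['Ethiopia', 'Canada']

# Because the stop index is excluded, [:2] and [2:] split the list cleanly
print(country_list[:2] + country_list[2:] == country_list)   # True
```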

## 🥊 Challenge 4: Indexing

Index the following list to get rid of the values `1`, `3`, and `5`. There are different ways to do this!

In [50]:
numbers = [1, 3, 5, 7, 10, 13]

# YOUR CODE HERE
numbers[3:]
# numbers[-3:] # alternative: start at the third item from the end

[7, 10, 13]

📝 **Poll PyFun 2-2:** How can we index the list to get rid of the values 1, 3, and 5?

Note that lists can contain different data types, such as integers, floats, strings, and even other lists!
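For instance, here is a minimal sketch of a list that mixes types. The list `mixed_list` and its values are hypothetical.

```python
# One list holding a string, an integer, a float, and another list
mixed_list = ['Greece', 1952, 76.67, ['Athens', 'Thessaloniki']]

print(type(mixed_list[0]))   # <class 'str'>
print(type(mixed_list[1]))   # <class 'int'>
print(type(mixed_list[2]))   # <class 'float'>
print(type(mixed_list[3]))   # <class 'list'>

# Index twice to reach an item inside the inner list
print(mixed_list[3][0])      # Athens
```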

## List Methods

Recall that methods are functions that operate specifically on objects with a particular data type. They are accessed via dot notation: `object.method()`. 

Lists have their own methods that perform operations specific to lists. The most common method is the `append()` method, which adds an item to the end of a list. 

The code below adds a country to `country_list` using `append()`:

In [51]:
print(country_list)

['Ethiopia', 'Canada', 'Thailand', 'Denmark', 'Japan']


In [54]:
country_list.append('USA')
  # adds 'USA' to the end of the list
  # append() changes the list in place, without using assignment (=),
  # so running this cell again appends 'USA' a second time

In [55]:
print(country_list)

['Ethiopia', 'Canada', 'Thailand', 'Denmark', 'Japan', 'USA']


🔔 **Question:** Is there anything noteworthy about the way `.append()` seems to work? (Tip: it has to do with assigning variables!)
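One way to explore this question is the sketch below, which uses a hypothetical list called `cities`. It contrasts `append()`, which changes the list itself, with a string method, which leaves the original value alone.

```python
cities = ['Athens', 'Lagos']

# append() changes the list in place and returns None
result = cities.append('Osaka')
print(cities)   # ['Athens', 'Lagos', 'Osaka']
print(result)   # None

# A string method returns a new value and leaves the variable unchanged
name = 'osaka'
name.upper()
print(name)     # osaka
```

Because `append()` returns `None`, writing `cities = cities.append('Osaka')` would replace the list with `None`.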

<a id='dicts'></a>

# Dictionaries: Key-Value Pairs

Dictionaries are organized in pairs of keys and values. The **keys** can be used to access the **values**. Use a dictionary when you have data organized in pairs. In our context, dictionaries can be used to create tabular data (a so-called **data frame**). We'll show you how to do this later. 

Dictionaries are specified in Python using curly braces. **Colons separate the keys and values**. 

Let's take a look at an example dictionary:

In [56]:
# An example dictionary
example_dict = {
    'country': 'Afghanistan',
    'year': 1952,
    'population': 8425333}

We can access the items of a dictionary by referring to its key name, inside square brackets. 

In [57]:
example_dict['year']

1952

🔔 **Question**: What do you think the following cell will do?

In [58]:
example_dict['population']

8425333

## List or Dict?

When would you use a list, and when a dictionary?

Take our Gapminder dataset with countries, continents, and life expectancies as an example. A list could contain the data of one of the columns – for instance, the country names.

A dictionary could contain a bunch of columns, with the key being the column name, and the values being lists of items for each column! 

Let's see that in action.

## 🎬 Demo: Creating a Dictionary

Here, we will create a dictionary called `country_dict` that uses lists of items as its values.

In [59]:
country = ['Afghanistan', 'Greece', 'Liberia']
continent = ['Asia', 'Europe', 'Africa']
life_exp = [28.801, 76.670, 46.027]

# Creating a dict from lists
country_dict = {'country':country, 'continent':continent, 'life_exp':life_exp}
country_dict


{'country': ['Afghanistan', 'Greece', 'Liberia'],
 'continent': ['Asia', 'Europe', 'Africa'],
 'life_exp': [28.801, 76.67, 46.027]}
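Since each value in this dictionary is a list, key access and list indexing can be combined. A minimal sketch, rebuilding the same dictionary so the example runs on its own:

```python
country_dict = {
    'country': ['Afghanistan', 'Greece', 'Liberia'],
    'continent': ['Asia', 'Europe', 'Africa'],
    'life_exp': [28.801, 76.670, 46.027],
}

# A key returns the whole list stored under it (one "column")
print(country_dict['country'])       # ['Afghanistan', 'Greece', 'Liberia']

# Adding a list index picks a single item from that column
print(country_dict['country'][1])    # Greece
print(country_dict['life_exp'][1])   # 76.67
```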

Dictionaries have another advantage – it's easy to turn them into a **data frame**! You'll learn more about data frames in the next workshop.

In [60]:
import pandas as pd

pd.DataFrame(country_dict)
  # very nice and ready to publish

| | country | continent | life_exp |
|---|---|---|---|
| 0 | Afghanistan | Asia | 28.801 |
| 1 | Greece | Europe | 76.67 |
| 2 | Liberia | Africa | 46.027 |


<div class="alert alert-success">

## ❗ Key Points

* Methods are functions that only work on certain data types.
* Lists are a collection of ordered items, which can contain different data types.
* List indices start at 0, not 1.
* The `.append()` method adds an item to a list.
* Lists can be indexed using square brackets - e.g. `some_list[0]` indexes the first item of `some_list`. 
* Dictionaries are mappings of key-value pairs. 
* Dictionary values can be accessed using square brackets – e.g. `some_dict['name']` accesses the value corresponding to the 'name' key.
    
</div>