<img src="https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png" style="float: left; margin: 10px;"> 
#  Intro to Python: Data Types

Author: Tim Book

---

Week 1 | Lesson 1.1

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Use basic Jupyter Notebook features
- Define integers, strings, tuples, lists, and dictionaries
- Demonstrate arithmetic operations and string operations

### STUDENT PRE-WORK
*Before this lesson, you should already be able to:*
- Describe/define Python data types


## First and Foremost: Python is a Calculator
_(...just like every other programming language)_

Let's learn some common mathematical operations:

In [1]:
# Addition
2 + 2

4

In [2]:
# Subtraction (note we can have negative numbers!)
3 - 7

-4

In [3]:
# Multiplication
5 * 2

10

In [4]:
# Division
5 / 2

2.5

In [5]:
# Exponentiation (do NOT use ^)
5**2

25

In [6]:
# Modular division ("mod" for short)
5 % 2

1

In [7]:
# Floor division (ie "round down" division)
5 // 2

2
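
Floor division and mod are two halves of the same division problem: one gives the whole-number quotient, the other gives the remainder. Here is a minimal sketch of how they fit together (the names `dividend` and `divisor` are just illustrative):

```python
dividend = 17
divisor = 5

quotient = dividend // divisor   # whole-number part of the division
remainder = dividend % divisor   # what is left over

print(quotient, remainder)                         # 3 2
print(quotient * divisor + remainder == dividend)  # True

# "Round down" means toward negative infinity, not toward zero
print(-5 // 2)  # -3
print(-5 % 2)   # 1
```

Note the negative case: `-5 / 2` is `-2.5`, and rounding _down_ gives `-3`, not `-2`.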

In [8]:
# /poll "What is `5 + 2 * 3`?" "21" "11" "idk" anonymous limit 1
5 + 2 * 3

11

In [9]:
# /poll "What is `(5 + 2) * 3`?" "21" "11" "idk" anonymous limit 1
(5 + 2) * 3

21

## Variables
Great - Python is just a fancy calculator. It's also important for us to be able to save numbers as **variables** so we can reference them later without memorizing their value.

In [10]:
x = 3
y = 4
z = 2

In [11]:
(x + y) / z

3.5

## Naming Rules

> _There are only two hard things in Computer Science: cache invalidation and naming things._ - Phil Karlton

You can _pretty much_ name variables whatever you want. But, there are a few rules we should follow. Some are strict, some are just good manners.

### Variable naming rules (mandatory)
- Names can only consist of numbers, letters and underscores.
- Names can't begin with numbers.
- You can't name a variable after a built-in Python keyword (eg `if`).

### Variable naming rules (good manners)
- Names should _**always**_ be descriptive (ie, don't name variables `x` and `df`)
- No capital letters!
- Variables should not begin with an underscore (this means something special)
- Multi-word variables should be in `snake_case`. All lower case separated by underscores.
- Technically, you _can_ name variables after built-in Python _functions_ (like `print`), but it's an _extremely_ bad idea to do so.
    - Rule of thumb: If a variable name turns green, don't use it!
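
If you're ever unsure whether a name is legal, you can ask Python. The sketch below uses two tools from the standard library; the example names are made up for illustration. Keep in mind that `.isidentifier()` only covers the "letters, numbers, underscores" rules (roughly), so keywords need their own check.

```python
import keyword

# .isidentifier() checks the character rules for names
print("snake_case_name".isidentifier())  # True
print("2fast".isidentifier())            # False: begins with a number
print("my-name".isidentifier())          # False: hyphens are not allowed

# keyword.iskeyword() checks whether a name is reserved by Python
print(keyword.iskeyword("if"))     # True: can't be a variable name
print(keyword.iskeyword("print"))  # False: legal to reuse, but still a bad idea
```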
    
### Math exercise (sorry):
Recall the quadratic formula for solving a quadratic equation $ax^2 + bx + c = 0$ with coefficients $a$, $b$, $c$:

$$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

In [12]:
a = 1
b = -8
c = 15

In [13]:
discrim = b**2 - 4*a*c

In [14]:
# Slack thread: Give me the code to produce one of the two roots!
(-b + discrim**0.5) / (2*a)

5.0
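
The $\pm$ in the formula means there are two roots. As a quick sketch, here are both of them for the same coefficients, plus a sanity check that plugs each root back into the equation. The names `root_plus` and `root_minus` are just illustrative, and this assumes the discriminant is not negative.

```python
a = 1
b = -8
c = 15

discrim = b**2 - 4*a*c   # 64 - 60 = 4

root_plus = (-b + discrim**0.5) / (2*a)
root_minus = (-b - discrim**0.5) / (2*a)
print(root_plus, root_minus)  # 5.0 3.0

# Plugging each root back into a*x**2 + b*x + c should give zero
print(a*root_plus**2 + b*root_plus + c)    # 0.0
print(a*root_minus**2 + b*root_minus + c)  # 0.0
```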

## So, what is a "data type"?
When you hear the word "data", you probably think of a spreadsheet. Actually, **data is a synonym for information!** Anything that represents "information" is data, including any and all Python variables. If I run `x = 3`, then `x` is data!

Data can come in various **types.** We've already seen two types!

1. The `int` type: Integers with no decimal part (eg `2`, `-30`, `14`)
1. The `float` type: Numbers with a decimal part, even if that part is zero (eg `2.5`, `3.141`, `2**0.5`, `-3.0`)

Curious about what an object's data type is? Simply use the `type()` function to ask!

```python
type(3) # int
type(4.2) # float
```

In [15]:
type(3)

int

In [16]:
type(4.2)

float

## Strings

---

Strings are how we store text data in Python. Strings are _strings of characters_ between either double quotes (`"`) or single quotes (`'`). Python doesn't care which as long as they match.


In [17]:
"The pen is mightier than the sword!"

'The pen is mightier than the sword!'

In [18]:
'Single quotes work just fine too.'

'Single quotes work just fine too.'

In [19]:
# Multi-line string
multi_line_string = """If you have three
quotes in a row,
you can even have a string
that spans multiple lines!"""

In [20]:
print(multi_line_string)

If you have three
quotes in a row,
you can even have a string
that spans multiple lines!


In [21]:
# Escape characters
"Backslashes allow you to have \"quotes\" inside your quotes!"

'Backslashes allow you to have "quotes" inside your quotes!'

The `print()` function displays a value on the screen.

`print()` drops the surrounding quotation marks, whereas running a Jupyter cell with a string (or a variable holding one) on its last line leaves the quotation marks in.

You can use 'single' or "double" quotation marks to create a string.
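
To see the difference between printing a string and displaying it, here is a small sketch. It uses the built-in `repr()` function, which returns the same quoted form Jupyter shows for the last line of a cell. The variable name `greeting` is just for illustration.

```python
greeting = 'She said "hi"'

# print() shows the text itself, with no surrounding quotes
print(greeting)        # She said "hi"

# repr() gives the quoted form Jupyter displays for a cell's last line
print(repr(greeting))  # 'She said "hi"'
```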

## String Math!
Besides simply storing text, we can also operate on strings. Everything in Python has a **type**, and types can be operated on with their respective **methods**. Methods are actions we can perform on a type using the following syntax:

```python
variable.method(parameters)
```

In [22]:
s1 = "Be quiet"
s2 = "this is a library!"

In [23]:
s1 + s2

'Be quietthis is a library!'

In [24]:
reprimand = s1 + ", " + s2

In [25]:
str(reprimand)

'Be quiet, this is a library!'

In [26]:
# Uppercasing is a method in Python
reprimand.upper()

'BE QUIET, THIS IS A LIBRARY!'

In [27]:
# Also lowercase
reprimand.lower()

'be quiet, this is a library!'

In [28]:
# There are plenty of methods. Let's try out Jupyter's autocomplete
# feature to see what we can do!
# reprimand.

In [29]:
# Let's have some fun with .replace()!
reprimand.replace("quiet", "loud").replace("library", "party").upper()

'BE LOUD, THIS IS A PARTY!'

In [30]:
# Also: An extremely useful method is .split()
reprimand.split(' ')

['Be', 'quiet,', 'this', 'is', 'a', 'library!']
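
One subtlety worth a quick sketch: `.split(' ')` and `.split()` are not quite the same thing. The example string `spaced` below is made up to show the difference.

```python
spaced = "Be  quiet"   # note the TWO spaces in the middle

# Splitting on a single space leaves an empty string between the two spaces
print(spaced.split(' '))  # ['Be', '', 'quiet']

# With no argument, split() treats any run of whitespace as one separator
print(spaced.split())     # ['Be', 'quiet']
```

If your text might contain extra spaces, tabs, or newlines, the no-argument version is usually the safer choice.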

## Slicin' Strings
We may also want to pick apart our strings. We can do this by **indexing** or **slicing**. In fact, you can index or slice several different types in Python. For example:

- Strings
- Lists
- Tuples

---

All of the above types can be accessed using brackets in the following ways:

- **`s[0]`** References the first element.
- **`s[0:4]`** References the first **4** elements, from index **`0`** up to (but not including) index **`4`**.
- **`s[-1]`** References the _first_ item in reverse order (the last item).
- **`s[-2]`** References the _second_ item in reverse order (the second-to-last item).
- **`s[0:-3]`** References everything _except the last 3_ elements.
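
The bracket patterns above work the same way on a list as on a string. Here is a minimal sketch using a made-up list called `letters`:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[0])     # a
print(letters[0:4])   # ['a', 'b', 'c', 'd']
print(letters[-1])    # f
print(letters[-2])    # e
print(letters[0:-3])  # ['a', 'b', 'c']

# The end index is excluded, so two adjacent slices rebuild the original
print(letters[0:4] + letters[4:] == letters)  # True
```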


In [31]:
s = "Python programming is really fun"

In [32]:
len(s)

32

In [33]:
# First letter
s[0]

'P'

In [34]:
# Second letter
s[1]

'y'

In [35]:
# Second through fourth letter
s[1:4]

'yth'

In [36]:
# First 5 letters
s[:5]

'Pytho'

In [37]:
# Last letter
s[-1]

'n'

In [38]:
# Last 5 letters
s[-5:]

'y fun'

In [39]:
# THREAD: Get me the word "programming" from the string s.
# I want it two ways: Using slicing and using .split()

In [40]:
s[7:18]

'programming'

In [41]:
s.split(' ')[1]

'programming'

## Collection Types!

![](imgs/skittles.jpg)

We often want to store many values in one variable. A _collection_. There are several collection types in Python. The first and most common is...

### Lists
Lists are mutable, heterogeneous collections.

- **Mutable** = They can be changed
- **Heterogeneous** = They can hold values of different data types
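
As a quick sketch of both properties at once (the list `mixed_bag` is made up for illustration):

```python
# Heterogeneous: a string, an int, and a float living in one list
mixed_bag = ['Albert', 42, 3.14]

# Mutable: replace an element in place
mixed_bag[1] = 'forty-two'

print(mixed_bag)           # ['Albert', 'forty-two', 3.14]
print(type(mixed_bag[2]))  # <class 'float'>
```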

In [42]:
names = ['Albert', 'Brenda', 'Carlos', 'Daenerys', 'Elon', 'Farnsworth']
type(names)

list

In [43]:
# Reference 1st item
names[0]

'Albert'

In [44]:
# Reference last item
names[-1]

'Farnsworth'

In [45]:
# Every other name, starting with the third
names[2::2]

['Carlos', 'Elon']

In [46]:
# Backwards!
names[::-1]

['Farnsworth', 'Elon', 'Daenerys', 'Carlos', 'Brenda', 'Albert']

### List Operations

In [47]:
# Append
names.append('Gary')

In [48]:
names

['Albert', 'Brenda', 'Carlos', 'Daenerys', 'Elon', 'Farnsworth', 'Gary']

In [49]:
# Remove
names.remove('Daenerys')

In [50]:
names

['Albert', 'Brenda', 'Carlos', 'Elon', 'Farnsworth', 'Gary']

In [51]:
# Join: a string method that glues the list items together with a separator
'_'.join(names)

'Albert_Brenda_Carlos_Elon_Farnsworth_Gary'

### Tuples
Tuples are less used than lists, but very similar. They are immutable and heterogeneous.

- **Immutable** = Once made, they can never be changed.
- **Heterogeneous** = They can contain values of different types

For our purposes, you can just think of tuples as immutable lists. Their existence is partly legacy from a time when they were more useful. Traditionally they're only used to hold short sequences of variables.

In [52]:
family = ('Ken', 'Tina', 'Jeremy')

In [53]:
# Can slice and index like normal
family[0]

'Ken'

In [54]:
# Bzzzt! Illegal. Tuples are immutable.
# family.append('Chloe')
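
Since a tuple can't be changed, the usual workaround is to build a _new_ tuple. A minimal sketch (the name `bigger_family` is just illustrative):

```python
family = ('Ken', 'Tina', 'Jeremy')

# + builds a brand-new tuple; the trailing comma makes ('Chloe',) a one-element tuple
bigger_family = family + ('Chloe',)

print(bigger_family)  # ('Ken', 'Tina', 'Jeremy', 'Chloe')
print(family)         # ('Ken', 'Tina', 'Jeremy')
```

The original `family` tuple is untouched; we just made a second, longer one.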

### Slight aside: Tuple unpacking
Tuples can be "unpacked". So can lists, but this is most common with tuples. This means that you can assign a tuple's elements to separate variables by listing the variable names separated by commas, like this:

In [55]:
instructor = ("Tim", "Book")
first, last = instructor

In [56]:
first

'Tim'

In [57]:
last

'Book'

We'll see tuple unpacking a few times throughout the course.
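
Two places unpacking tends to show up, sketched with made-up variable names: swapping two variables, and catching the results of a function that returns more than one value (here the built-in `divmod()`, which returns a quotient and a remainder as a tuple).

```python
# Swap two variables without needing a temporary one
left, right = "tea", "coffee"
left, right = right, left
print(left, right)  # coffee tea

# divmod() returns a tuple, which unpacks neatly into two names
hours, minutes = divmod(135, 60)
print(hours, minutes)  # 2 15
```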

## Sets
We'll see sets pretty much never, but they're worth mentioning very briefly. They're unordered collections of unique values, just like traditional sets in a math class.

In [58]:
my_grades = {'A', 'B+', 'A', 'C+', 'B-', 'B+'}
my_grades

{'A', 'B+', 'B-', 'C+'}

In [59]:
my_grades.add('A-')
my_grades

{'A', 'A-', 'B+', 'B-', 'C+'}

In [60]:
my_grades.remove('A')
my_grades

{'A-', 'B+', 'B-', 'C+'}

In [61]:
'B+' in my_grades

True

In [62]:
your_grades = {'B+', 'B-', 'F-'}

In [63]:
my_grades.intersection(your_grades)

{'B+', 'B-'}

In [64]:
my_grades.union(your_grades)

{'A-', 'B+', 'B-', 'C+', 'F-'}
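
Probably the most common everyday use of a set is removing duplicates from a list. Here is a small sketch with made-up variable names. Because sets are unordered, it uses `sorted()` to get a predictable order for printing.

```python
raw_grades = ['A', 'B+', 'A', 'C+', 'B-', 'B+']

# Converting to a set throws away the duplicates
unique_grades = set(raw_grades)
print(len(raw_grades))        # 6
print(sorted(unique_grades))  # ['A', 'B+', 'B-', 'C+']

# .difference() keeps what is in the first set but not the second
mine = {'A-', 'B+', 'B-', 'C+'}
yours = {'B+', 'B-', 'F-'}
print(sorted(mine.difference(yours)))  # ['A-', 'C+']
```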

## Dictionaries!

![](imgs/phonebook.jpeg)

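Dictionaries are very common. They're mutable collections of key-value pairs, looked up by key rather than by position. (Since Python 3.7 they remember the order in which keys were added, but you still can't ask for the "first" element with `[0]`.) Think of them like an actual dictionary. The key is the "word" and the value is the "definition".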

In [65]:
music = {'doe': 'A deer, a female deer', 'ray': 'A drop of golden sun'}

In [66]:
# Indexing
music['doe']

'A deer, a female deer'

In [67]:
# Bzzt! Dictionaries are indexed by key, not by position.
# There is no key 0 here, so this raises a KeyError.
# music[0]

In [68]:
music['me'] = 'A name I call myself'
music

{'doe': 'A deer, a female deer',
 'ray': 'A drop of golden sun',
 'me': 'A name I call myself'}

In [69]:
# This is how you can delete a key. But keep in mind, if you need to do this, you're
# better off with a different data type. Perhaps a custom class.
# (We'll learn about classes and OOP in a few weeks).
del music['doe']

In [70]:
music

{'ray': 'A drop of golden sun', 'me': 'A name I call myself'}

In [71]:
# What happens when we attempt to access a missing entry?
# music['doe']

In [72]:
# You often want to have a "default" value for keys that don't exist.
# We can do this with the .get() method.
# Fun fact: some people ONLY access dictionary keys with the .get().
# This is starting to gain some traction and is thought to be a pretty good idea.
music.get('doe', 'MISSING SOLFEGE!')

'MISSING SOLFEGE!'
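
A couple of related details, sketched on a fresh copy of the `music` dictionary: the `in` operator checks whether a key exists, and `.get()` with no default quietly returns `None` instead of raising an error.

```python
music = {'ray': 'A drop of golden sun', 'me': 'A name I call myself'}

# "in" checks the keys, not the values
print('ray' in music)  # True
print('doe' in music)  # False

# Without a default, .get() returns None for a missing key
print(music.get('doe'))  # None

# .get() only looks; it never adds the missing key
print(len(music))  # 2
```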

## Dictionaries are a big deal!

Dictionaries can get really big and really complicated, like the one below. You might think this is excessive, but it's very common. This is a very efficient way to store complicated data that don't fit neatly in a spreadsheet. In fact, most web APIs send back data as JSON, which Python reads as nested dictionaries and lists! We'll need to parse big dictionaries to get data from the internet!

In [73]:
authors = {
    "J.R.R. Tolkien": {
        "genre": "fantasy",
        "books": [
            "The Fellowship of the Ring",
            "The Two Towers",
            "The Return of the King"
        ],
        "active": False
    },
    "Brandon Sanderson": {
        "genre": "fantasy",
        "books": [
            "The Way of Kings",
            "Words of Radiance",
            "Oathbringer"
        ],
        "active": True,
        "phone": {
            "home": "(281) 330-8004",
            "work": "(877) CASH-NOW"
        }
    },
    "Frank Herbert": {
        "genre": "science fiction",
        "books": ["Dune"],
        "phone": None,
        "active": False
    }
}
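
Data from the internet often arrives as JSON text, which Python's built-in `json` module turns into dictionaries and lists. Here is a minimal sketch of that, plus how to dig into nested data by chaining brackets. The `response_text` string is a made-up stand-in for what an API might send back, not a real response.

```python
import json

# Hypothetical API response, as raw text
response_text = '{"Frank Herbert": {"genre": "science fiction", "books": ["Dune"], "phone": null, "active": false}}'

# json.loads() parses the text into Python dictionaries and lists
parsed = json.loads(response_text)
print(type(parsed))  # <class 'dict'>

# Peel off one layer at a time...
herbert = parsed["Frank Herbert"]
print(herbert["genre"])  # science fiction

# ...or chain the brackets: dictionary key, then key, then list index
print(parsed["Frank Herbert"]["books"][0])  # Dune

# JSON's null and false become Python's None and False
print(herbert["phone"])   # None
print(herbert["active"])  # False
```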

In [74]:
# THREAD: Get me Tolkien's second book

In [75]:
# THREAD: I need to call Brandon Sanderson about an idea for a screenplay.
# Can you get me his work number?

## Booleans

![](imgs/boole.jpg)

Booleans are values that can only be `True` or `False`. They're named after **George Boole**, the mathematician who developed the logic behind them, and will come in real handy when we discuss control flow this afternoon.

Booleans really only have three operations you can perform on them: `not`, `and`, and `or`.

In [76]:
# not: Simply gives the opposite
not True

False

In [77]:
not False

True

In [78]:
# and: A and B only yields True if both A and B are true
sky_blue = True
grass_green = True
pigs_fly = False

In [79]:
sky_blue and pigs_fly

False

In [80]:
sky_blue and grass_green

True

In [81]:
# or: A or B only yields False if both A and B are false
matt_cool = False

In [82]:
sky_blue or pigs_fly

True

In [83]:
pigs_fly or matt_cool

False
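
Here is the full set of `and` / `or` combinations in one place, as a quick sketch. Just like `*` goes before `+` in arithmetic, Python applies `not` first, then `and`, then `or`, so parentheses matter here too.

```python
# Every combination of "and"
print(True and True, True and False, False and True, False and False)
# True False False False

# Every combination of "or"
print(True or True, True or False, False or True, False or False)
# True True True False

# "and" is evaluated before "or"
print(True or False and False)    # True, read as: True or (False and False)
print((True or False) and False)  # False

# "not" is evaluated before "and"
print(not False and False)        # False, read as: (not False) and False
```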

## Cool story, Boole
So what? We rarely actually define variables to be `True` or `False`. More often, we get them from asking Python math problems.

In [84]:
# Greater than
5 > 3

True

In [85]:
# Less than
5 < 3

False

In [86]:
# Greater than or equal to
3 >= 3

True

In [87]:
# THREAD: Fun stuff
(3 > 2) and ((5 <= 5) or (10 < 3))

True

In [88]:
# Not equal to
3 != 4

True

In [89]:
# Equal to
5 == 4

False

## Food for thought
- Why does `0.2 + 0.1 == 0.3` yield the answer it does?
- Why does `True == 1` yield the answer it does?
- Why does `"3" + 3` yield an error?
- What happens when you add two lists?
- What happens when you multiply a list (or a string!) by an integer? Why does this happen?
    - e.g. `"*" * 20` or `[1, 2, 3, 4] * 2`

## Today we covered...
- Basic Jupyter Notebook use
- Basic math in Python
- String manipulation in Python
- Collection data types in Python
- Booleans in Python