# CYPLAN255
### Urban Informatics and Visualization

# Lecture 04 -- The Python Standard Library
## Built-in Data Types, Functions, & Modules
*******
January 29, 2022

## Agenda
1. Announcements
2. The Python Standard Library
3. For next time
4. Questions


# 1. Announcements

## Announcements
- 2 Guest speakers locked in!
    - Mike Alston (Kittleson)
    - Kuan Butts (Mapbox)
- HW1 released

# 2. The Python Standard Library

In this session, we explore basic data types and the built-in functions, methods, and modules that operate on them. In this and subsequent notebooks, we draw on material from various sources, including Jean Mark Gawron's book "Python for Social Science", available [here](http://www-rohan.sdsu.edu/~gawron/python_for_ss/course_core/book_draft/index.html).

Today we will be covering material in sections 3.3 - 3.4.2

## 2.1 The Python Standard Library
The Python Standard Library refers to a collection of **modules** and standalone **functions** that are included with every Python installation. Because there is no need to install these separately, we say they are **built-in**. You can find a full list of built-in Python functions [here](https://docs.python.org/3/library/functions.html).

Let's review some of this terminology:

- **Function**: a Python object that stores a generic, reusable _statement_ or series of statements that define an operation to perform on some future input in order to generate some future output. Functions are usually defined with the `def` keyword, as in `def do_something():`, and typically end with a `return` statement that hands the output back (a function without one returns `None`). For example:

In [None]:
def add_2(x):
    return x + 2

- **Method**: a function that belongs to an object and is called with a dot after that object, as in `'cp255'.upper()`. It's a bit more complicated than that in reality, but it's OK to think of it that way for now.
- **Module**: a .py file that is _not_ a standalone script. Instead it stores a combination of statements, functions, and variable definitions that can be referenced by other scripts or interactive sessions.
- **Package/Library**: A collection of modules. A package and its modules must be _imported_ before you can use them. Some come pre-installed with Python, while others must be installed manually with a package manager like `pip` or `conda`.
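A quick sketch of the function/method distinction in action. It reuses `add_2` from above; `add_2_no_return` is a hypothetical helper made up only to show what happens when a function has no `return` statement.

```python
def add_2(x):
    return x + 2


def add_2_no_return(x):
    # Hypothetical helper: computes x + 2 but never returns it
    x + 2


result = add_2(3)             # call a function by its name
missing = add_2_no_return(3)  # no return statement, so Python hands back None
shout = "cp255".upper()       # call a method with a dot after the object it belongs to

print(result)   # 5
print(missing)  # None
print(shout)    # CP255
```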

## 2.2 Working with numeric data types

We have already seen some of the basic interactions with numbers in Python. The two main numeric types are `int` and `float`. In Python 2 there were two kinds of integers (`int` and `long`), but these have been unified into a single `int` type in Python 3.

In [None]:
# Integers are the simplest numeric type
type(12)

In [None]:
# Float or Floating Point numbers enable more precision
type(12.0000000000001)

Why not just use floats all the time? They can represent fractional values, after all. There are a couple of reasons. One is that some operations get more complicated with floats, like comparing two numbers to see if they are equal.

In [None]:
x = 12.00000000000000001
y = 12

In [None]:
# Test whether x is equal to y.  
x == y

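**Try changing the number of decimal places in `x` above and running the comparison again...**

A float can only store about 15 to 17 significant digits, so the literal `12.00000000000000001` is rounded to exactly `12.0` when Python stores it. The `==` operator itself checks for exact equality and does not apply any tolerance. When you do want to compare floats within a tolerance, use `math.isclose()`, covered below.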

A second reason is storage. Data files and databases typically store each numeric value in a fixed number of bytes, and integer columns can often use a smaller type (for example, a 16- or 32-bit integer) than a standard 64-bit float. This is not a problem for a single value, but if you were working with really large databases, it adds up, and using floats for all of your numeric data could cause you to run out of memory or disk.

You can **cast** the type of a number to convert it to a specified type, like converting from float to int:


In [None]:
type(x)

In [None]:
x

In [None]:
print(x)
y = int(x)
print(y)

In [None]:
float(y)
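Casting with `int()` does not round: it simply drops the fractional part, truncating toward zero. A small sketch comparing it with `round()` and `math.floor()`; the value `3.99` is just an arbitrary example.

```python
import math

value = 3.99

print(int(value))          # 3: int() drops the fractional part
print(int(-value))         # -3: truncation goes toward zero, not down
print(round(value))        # 4: round() goes to the nearest integer
print(math.floor(-value))  # -4: floor() always goes down
```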

### 2.2.1 Built-in operations for numerics

Reviewing some of the built-in operators in Python that apply to numeric data types:

In [None]:
x = 200
y = 12

In [None]:
# addition and subtraction
print(x + y)
print(x - y)

In [None]:
# multiplication
x * y

In [None]:
# division
x / y

In [None]:
# integer division ("floored quotient")
x // y

In [None]:
# remainder (modulus) of x / y
x % y

### Question
How could you use one of these built-in operators to test whether or not a variable is an even number?

In [None]:
78 % 2
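The remainder check above can be wrapped into a reusable function. This is a minimal sketch; the name `is_even` is a hypothetical helper, not something built into Python.

```python
def is_even(n):
    """Return True if n divides by 2 with no remainder."""
    return n % 2 == 0


print(is_even(78))  # True
print(is_even(7))   # False
print(is_even(-4))  # True
```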

More numeric operators and functions:

In [None]:
# Flipping the sign
y = -x
y

In [None]:
# Works in the other direction as well
-y

In [None]:
# Raising x to the power of y
x = 10
y = 5
x ** y

In [None]:
pi = 3.141592653589793
round(pi,4)
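One detail of `round()` that often surprises people: when a value is exactly halfway between two integers, Python 3 rounds to the nearest *even* integer (sometimes called "banker's rounding"). Without a second argument, `round()` also returns an `int` rather than a `float`.

```python
pi_approx = 3.141592653589793

print(round(pi_approx, 2))  # 3.14
print(round(pi_approx))     # 3, an int because no digits were requested
print(round(2.5))           # 2: the tie goes to the even neighbor
print(round(3.5))           # 4: here the even neighbor is above
```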

### 2.2.2 Importing Additional Functions from the Math Module

In addition to the **functions** and **operators** above, many more are available from the `math` module, which is always available because it is part of the Python Standard Library. But `math` is its own **module**, so you must explicitly _import_ it before you have access to its **functions**. Like so:

In [None]:
import math
print(x)
math.sqrt(x)

You can see the full list of functions available in the math module by typing the module name and a dot, then pressing Tab:

In [None]:
# Put the cursor after the dot and press Tab (running this cell as-is raises a SyntaxError)
math.

And you can get more documentation on a specific function by asking for it:

In [None]:
math.log?

In [None]:
# Calling log() with no argument raises a TypeError: it needs at least one input
math.log()

In [None]:
math.log(x)

What happens if we take the log of a number with a value of 0?

In [None]:
x = 0
math.log(x)

We could add a 1 to x to avoid this problem

In [None]:
math.log(x + 1)

Or we could use another built-in `math` function, `log1p()`, which computes log(1 + x) for us and avoids the error:

In [None]:
math.log1p(x)
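Mathematically `log1p(x)` equals `log(1 + x)`, but it is computed more carefully. For values very close to zero, adding 1 first can throw away the small value entirely because of float precision, as this sketch shows (`1e-20` is an arbitrary tiny example):

```python
import math

tiny = 1e-20

print(1 + tiny == 1)       # True: 1 + 1e-20 rounds back to exactly 1.0
print(math.log(1 + tiny))  # 0.0: the information in tiny is lost
print(math.log1p(tiny))    # about 1e-20, computed without forming 1 + tiny
```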

### Question?
Obviously Python wouldn't come with `math.log1p()` built-in if it weren't useful. But why would we want to do that? Doesn't it give us a wrong answer? Or even worse, FAKE DATA?

### 2.2.3 Even more math!

A common problem is division where the denominator has a value of zero:

In [None]:
y / x

In [None]:
y / (x + 1)
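Adding 1 to the denominator changes the answer, so another option is to check for zero explicitly and return a placeholder instead. The sketch below uses a hypothetical helper, `safe_divide`, and assumes NaN ("not a number") is an acceptable marker for "undefined"; returning `None` would be an equally reasonable design choice.

```python
import math


def safe_divide(numerator, denominator, default=math.nan):
    """Divide, but return `default` instead of raising ZeroDivisionError."""
    if denominator == 0:
        return default
    return numerator / denominator


print(safe_divide(5, 2))     # 2.5
print(safe_divide(5, 0))     # nan
print(safe_divide(5, 0, 0))  # 0
```

Note that NaN is never equal to anything, even itself, so test for it with `math.isnan()` rather than `==`.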

Comparing two values to see if they are approximately the same, within some tolerance:

In [None]:
x = 12.1
z = 12.2
math.isclose(x, z, rel_tol=.01)
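A classic case where exact comparison fails is `0.1 + 0.2`, because neither value can be stored exactly as a float. `math.isclose()` uses a relative tolerance of `1e-09` by default; for comparisons near zero, a relative tolerance alone is not enough and you also need `abs_tol`:

```python
import math

total = 0.1 + 0.2

print(total)                                    # 0.30000000000000004
print(total == 0.3)                             # False: exact comparison
print(math.isclose(total, 0.3))                 # True with the default tolerance
print(math.isclose(0.0, 1e-12))                 # False: relative tolerance fails near zero
print(math.isclose(0.0, 1e-12, abs_tol=1e-9))   # True once an absolute tolerance is given
```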

Of course you can put several operations together to compute things, like evaluating a quadratic expression.

In [None]:
a = 2
b = 3
c = 4
y = a + b * x + c * x**2 
y

We will see how to do this on a set of numbers a bit later.

#### Order of Operations
Note that the order of operations matters: evaluation is not simply left to right. The acronym PEMDAS can help you remember the order: Parentheses, Exponents, Multiplication and Division, Addition and Subtraction. Multiplication and division share the same priority and are evaluated left to right, as are addition and subtraction. It is often helpful to use parentheses to group operations just for readability, but they can also make the difference between getting the result right or wrong.

In [None]:
a + b * x

In [None]:
(a + b) * x
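A few more precedence cases worth knowing, using small literal numbers so the results are easy to check by hand. Two of them are not obvious from PEMDAS alone: `**` binds more tightly than a leading minus sign, and chained `**` groups from right to left.

```python
basic = 2 + 3 * 4        # 14: multiplication happens before addition
grouped = (2 + 3) * 4    # 20: parentheses are evaluated first
same_level = 8 / 4 * 2   # 4.0: / and * share a level, so left to right
neg_power = -2 ** 2      # -4: ** binds tighter than the leading minus sign
neg_base = (-2) ** 2     # 4
stacked = 2 ** 3 ** 2    # 512: ** groups right to left, i.e. 2 ** 9

print(basic, grouped, same_level, neg_power, neg_base, stacked)
```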

## 2.3 🧵Strings 🧶

### 2.3.1 Overview

Strings are just text, like in the introductory "Hello World!" example.  

Let's review quickly what we already know about strings.  We can assign any string to a variable like we would assign an integer or a float to a variable:

In [None]:
# Try this cell first: without quotes, Python treats CP255 as a variable name and raises a NameError
a = CP255

In [None]:
# The string needs to be in quotes for this variable assignment to work
a = "CP255"
type(a)

In [None]:
# The quotes can be single or double, but have to match
a = 'CP255"

Let's explore some methods that operate on strings, and explain an important distinction between data types.

What if you need to create a string that has multiple lines?  There are two ways to create such a string.  The first uses triple quotes.

In [None]:
X = """
  The Zen of Programming:
  
  (Re)Set your expectations.
  Try it yourself (T.I.Y.).
  Take a break.
  Try it yourself again (T.I.Y.A.).
  Not-knowing is what learning feels like.
  Learn to distinguish between knowledge you need, knowledge you don't, and trivia.
"""

print(X)

The second way uses `\n` to insert the line endings

In [None]:
X = "The Zen of Programming:\n\n(Re)Set your expectations.\nTry it yourself (T.I.Y.).\nTake a break.\nTry it yourself again (T.I.Y.A.).\nNot-knowing is what learning feels like.\nLearn to distinguish between knowledge you need, knowledge you don't, and trivia."
print(X)

### Question
What will happen if I run a cell with just `X`?

In [None]:
X

### 2.3.2 Indexing and Slicing Strings

We can get individual elements of a string (characters) by using indexes, which point to positions within the string.

_Note that counting in Python starts from zero rather than 1. This can take a bit of getting used to -- think of it like the way building floors in Europe generally start with zero. The first floor in Europe would be a second floor in the U.S._

In [None]:
X[0]

We can use slicing to extract a specific section of a string, beginning from any position and ending at any position.

Python separates the starting and ending index positions with a colon. The slice includes the character at the start index but stops just before the end index. If we leave out the start, we get everything from the beginning up to (but not including) the end index; if we leave out the end, we get everything from the start index to the end of the string. Some examples should make this clearer:

In [None]:
X[5:7]

In [None]:
X[:5] == X[0:5]

In [None]:
X[8:]
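A few more slicing patterns on a short, made-up string, including negative indexes and an optional third number for the step size:

```python
course = "CYPLAN255"

print(course[0:6])        # CYPLAN: indexes 0 through 5; the end index 6 is left out
print(course[6:])         # 255: from index 6 to the end
print(course[-3:])        # 255: negative indexes count back from the end
print(course[::2])        # CPA25: an optional third number sets the step
print(len(course[2:5]))   # 3: a slice [start:end] holds end - start characters
```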

### Question?
How would we get a slice of `X` that contains the first two elements?

### 2.3.3 More Strings!

In [None]:
a = 'This is CP255!'

A variable containing a string is still an object. If we evaluate it on the last line of a cell, the notebook displays its value, quotes and all:

In [None]:
a

`print()` works with strings the same way as with numbers: it displays the contents without the surrounding quotes, and it turns escape sequences like `\n` into actual line breaks instead of showing them:

In [None]:
print(a)

We can find the length of a string using the built-in len function

In [None]:
len(a)


Related to indexing, here is a built-in string method that looks up a substring within a string and returns the index (position) where it first appears:

In [None]:
str.find(a, 'C')

Bonus: who can tell me what `str` is above?

Let's see what other string functions are available, using tab completion after str.:

In [None]:
# Put the cursor after the dot and press Tab (running this cell as-is raises a SyntaxError)
str.

Some of these function names are pretty self-explanatory, like `str.capitalize`, but others are less so.  As usual, you can look up some quick help on any of those functions:

In [None]:
str.expandtabs?

Note that since we assigned a string to a variable, a, that variable is now an object of type string, and it has access to the string methods directly:

In [None]:
print(a)
a.find('T')
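Two details of `find()` worth knowing: it reports only the *first* match, and it returns `-1` when nothing matches. The closely related `index()` method behaves the same way on a match but raises a `ValueError` when the substring is missing. A short sketch on a copy of the course string:

```python
course = "This is CP255!"

print(course.find("CP"))  # 8: position where "CP" starts
print(course.find("is"))  # 2: only the FIRST match counts, here inside "This"
print(course.find("Z"))   # -1: find() signals "not found" with -1

# index() works like find() but raises a ValueError when nothing matches
try:
    course.index("Z")
except ValueError:
    print("'Z' is not in the string")
```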

We can check whether a string contains a character or substring:

In [None]:
'R' in a

We can remove specific characters from the beginning and end of a string with the `strip` method (it does not touch characters in the middle):

In [None]:
a.strip('!')

To remove leading and trailing whitespace (spaces, tabs, and newlines) from a string, use `strip` with no argument:

In [None]:
b = ' ' + a
print(b)
print(b.strip())
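Some related behavior, sketched on made-up strings: `lstrip()` and `rstrip()` clean only one side, and the argument to `strip()` is treated as a *set of characters* to remove from the ends, not as a word. `repr()` is used so the tabs and newlines stay visible in the output.

```python
messy = "\t  CP255 \n"

print(repr(messy.strip()))    # 'CP255': whitespace removed from both ends
print(repr(messy.lstrip()))   # 'CP255 \n': left side only
print(repr(messy.rstrip()))   # '\t  CP255': right side only

url = "www.example.com"
print(url.strip("cmowz."))    # example: any of c, m, o, w, z, or . is peeled off the ends
```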

It is often helpful to chain several operations together on one line. Going from left to right, we first take the characters from index 8 to the end of the string, then strip the '!' from that result, and then convert the result to lowercase:

In [None]:
a[8:].strip('!').lower()

Another handy function lets you capitalize each word:

In [None]:
a.title()

Note that we cannot assign a new letter to part of the string by its index location.  This is because in Python, strings are an **immutable** data type.  As we will see shortly, other data types like lists are **mutable**.

In [None]:
a[0] = 't'

There is a method that will let you replace parts of a string, however. Rather than changing the original, `replace` returns a new string:

In [None]:
print(a)
print(a.replace('!', '?'))
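`replace()` also accepts an optional third argument that limits how many matches are replaced. In this sketch (a date string chosen only as an example), notice that the original string never changes, because strings are immutable:

```python
original = "2022-01-29"

dotted = original.replace("-", ".")         # replace every match
first_only = original.replace("-", "/", 1)  # replace only the first match

print(original)    # 2022-01-29 (unchanged)
print(dotted)      # 2022.01.29
print(first_only)  # 2022/01-29
```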

We can also convert strings to lists of strings by splitting on a **delimiter**:

In [None]:
c = 'lastname,firstname,streetnumber,streetname,city'
c.split(',')

In [None]:
c.split(',')[0]

## 2.4 Converting between string and numeric types

In [None]:
rent = '2500'
type(rent)

Let's say we have a string object that contains numeric values and we want to do mathematical operations on it.  What happens?

In [None]:
rent * 2

In [None]:
rent * 1.5

If we need to do mathematical operations, we really need to convert this string object to a numeric type -- either an integer or a float.

In [None]:
rent_int = int(rent)
type(rent_int)

In [None]:
rent_int * 2

In [None]:
rent_float = float(rent)
rent_float

Recall that you can also convert an integer to a float by a mathematical operation that involves a floating point component so that the result is forced to type float:

In [None]:
rent_flt = rent_int * 1.5
rent_flt

But notice that `int()` won't convert a string that looks like a floating point number; it raises a `ValueError`:

In [None]:
rent_i = int('2500.0')

But you can do this if you first convert to float and then convert to int:

In [None]:
rent_i = int(float('2500.0'))
print(rent_i)
type(rent_i)

Of course, you sometimes may need to convert data from numeric to string type.  It works the same way:

In [None]:
rent_str = str(rent_int)
rent_str
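Putting splitting and conversion together: a sketch of pulling numbers out of a line of text. The row below is made up and formatted like one line of a CSV file. It also shows two common pitfalls: converting a ZIP code to `int` drops its leading zero, and text like `"$2,500"` must be cleaned before `int()` can read it.

```python
# A made-up line of data, formatted like one row of a CSV file
row = "Cambridge,02139,2500.00"
fields = row.split(",")              # ['Cambridge', '02139', '2500.00']

rent_value = float(fields[2])        # 2500.0
rent_whole = int(float(fields[2]))   # 2500: int() needs the float step first
zip_as_int = int(fields[1])          # 2139: the leading zero is lost!

# Text like "$2,500" needs cleaning before int() can read it
listed = "$2,500"
cleaned = listed.replace("$", "").replace(",", "")

print(rent_value, rent_whole, zip_as_int, int(cleaned))  # 2500.0 2500 2139 2500
```

Because of the leading-zero problem, identifiers like ZIP codes or census tract IDs are usually safer kept as strings.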

## 2.5 Lists

You can think of strings as an ordered sequence of characters. In Python, **lists** are another basic data type. Lists can contain any kind of object: strings, integers, floats, and others, in any combination. To write a list, separate the items with commas and enclose them in square brackets, like `[1, 'two', 3.0]`.

### 2.5.1 Creating Lists

We can create an empty list, and add elements to it:

In [None]:
mylist = []
mylist.append('this')

In [None]:
mylist

Notice that we can add lists like we can add strings. This is called **concatenation**:

In [None]:
mylist = mylist + ['that']
mylist

We can also insert items in a specified location in a list

In [None]:
mylist.insert(1, 'and')
mylist
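A common mix-up when building lists is `append()` versus `extend()`. `append()` always adds exactly one item, even if that item is itself a list, while `extend()` adds each element separately. Here is a sketch on fresh lists so `mylist` is left alone:

```python
nested = ["this"]
nested.append(["and", "that"])   # append adds ONE item: here, a whole list
print(nested)                    # ['this', ['and', 'that']]

flat = ["this"]
flat.extend(["and", "that"])     # extend adds each element of the list
print(flat)                      # ['this', 'and', 'that']

joined = ["this"] + ["and", "that"]  # + builds a new list instead of changing one
print(joined)                        # ['this', 'and', 'that']
```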

We can also convert a string that might be a sentence, or a line of data, to a list, so we can work with its elements more easily:

In [None]:
a = 'This,is,CP255!'
print('a = ', a)
b = str.split(a, ",")
print('b = ', b)

And recalling that `a` is not just text, but a string object, we can call the `split()` method directly on it:

In [None]:
b = a.split(",")
print(b)

### Questions
1. What's the difference between `str.split(a)` and `a.split()`?
2. Why doesn't `split()` need an argument?
3. Why doesn't `a.__class__` need a `()`?

### 2.5.2 Indexing Lists

Note that indexing works for lists like it does for strings.  And if you have a list of strings, you can index into both in a nested way.

What is the content of the first item in the list?

In [None]:
mylist[0]

What if I want the last item in a list?

In [None]:
mylist[-1]

What if I want the whole list but backwards?

In [None]:
mylist[::-1]

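Note that indexing a list with a single position returns the item itself (here, a string), while slicing returns a new list:

In [None]:
print(type(mylist[0]))
print(type(mylist[0:1]))

Because each indexing or slicing operation returns an object that can itself be indexed, you can chain as many indices together as you want!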

Knowing this, another way to get the last item in a list would be:

In [None]:
mylist[::-1][0]

To get a range of values from a list, use a **slice** of the index values: `[0:2]` would get the first through the 2nd entry, since the range goes up to, but does not include, the value of the index after the colon.

In [None]:
mylist[0:2]

How would we find the first character of the second word in our list?  We can 'nest' the indexing like this:

In [None]:
mylist[1][0]

### 2.5.3 More lists!

What functions are available for list objects?

In [None]:
# Put the cursor after the dot and press Tab (running this cell as-is raises a SyntaxError)
list.

Find out the length of a list using len

In [None]:
len(mylist)

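The `count` method works on both strings and lists. On a string it counts occurrences of a substring; on a list it counts the items equal to its argument:

In [None]:
print(a.count('5'))
print(mylist.count('this'))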

You can check whether a list contains an item, just as we did with strings.

In [None]:
'this' in mylist

In [None]:
mylist

Delete the 3rd item in the list (remember it is indexed from 0).
But first let's make a copy of the list since `del` operates **in-place**

In [None]:
shortlist = mylist
del shortlist[2]
shortlist

Fortunately we only changed a copy of the list so everything should still be in the original list:

In [None]:
mylist

![](https://i.kym-cdn.com/photos/images/original/001/485/927/d74.jpg)

What just happened? Didn't we delete the item from the _copy_ of `mylist`? Why did `mylist` change too?


In [None]:
print(hex(id(mylist)))

In [None]:
print(hex(id(shortlist)))

In [None]:
newlist = mylist.copy()
print(hex(id(newlist)))

### 2.5.4 A brief lesson on copying objects in Python
- Assignment
   - `a = b`
- Shallow copy
   - `a = b.copy()` or `a = copy.copy(b)`
- Deep copy
   - `a = copy.deepcopy(b)`
   
<center><img src="https://miro.medium.com/max/1400/1*Vg2WLNOW_XKe4WjDt4kNYw.png" alt="Drawing" style="align: center; width: 50%;"/></center>

Image source: https://towardsdatascience.com/assignment-shallow-or-deep-a-story-about-pythons-memory-management-b8fad87bfa6c
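The difference between the three only shows up when a list contains other mutable objects. A minimal sketch with a made-up list of lists: after one of the inner lists is changed, the alias and the shallow copy both see the change, while the deep copy does not.

```python
import copy

original = [["Berkeley", "Oakland"], ["Richmond"]]  # a list that contains lists

alias = original                 # assignment: a second name for the SAME list
shallow = original.copy()        # new outer list, but the inner lists are shared
deep = copy.deepcopy(original)   # new outer list AND new inner lists

original[0].append("Alameda")    # change one of the inner lists

print(alias[0])    # ['Berkeley', 'Oakland', 'Alameda']: same object
print(shallow[0])  # ['Berkeley', 'Oakland', 'Alameda']: inner list is shared
print(deep[0])     # ['Berkeley', 'Oakland']: fully independent
print(alias is original, shallow is original)  # True False
```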

### 2.5.5 OK back to lists!
Remember that strings are immutable and we were unable to directly substitute a value of a character based on its index position?  Well, **lists are mutable**, and it does work to replace a value directly by its index value:

In [None]:
b[2] = 'mutable!'
b

and we can put the list of strings together again to make a string from a list, inserting a space between each element:

In [None]:
c = str.join(' ', b)  # same result as the more common ' '.join(b)
c

We already saw how we could reverse a list using indexing, but in programming there is often more than one way to skin a cat. 

In this case, the second way is to use the built-in `reverse()` method of `list` objects. Notice that this is another **in-place** operation. Try running the next cell twice. Then try saving the result as a new variable and see what that does.

In [None]:
b.reverse()  # reverses b in place
b

In [None]:
d = b.reverse()
print(d)

### Question
How can you tell when a function operates in-place?

### 2.5.6 More with lists!
We can use the `sort` method to order the list. Let's try it with a list of numbers first.

In [None]:
nums = [1, 3, 4, 5, 8, 6]
nums.sort()  # in-place
nums

And now with a list of words.

In [None]:
words = ['A', 'big', 'apple', 'pie']
words.sort()
print(words)
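Two related tools, sketched on a made-up list of names: the built-in `sorted()` function returns a *new* sorted list and leaves the original alone, and both `sorted()` and `sort()` accept a `key` function. Plain string sorting compares character codes, so uppercase letters come before all lowercase ones; `key=str.lower` sorts without regard to case.

```python
names = ["apple", "Banana", "cherry"]

by_code = sorted(names)                  # ['Banana', 'apple', 'cherry']: uppercase first
by_lower = sorted(names, key=str.lower)  # ['apple', 'Banana', 'cherry']: case ignored
print(by_code, by_lower)
print(names)                             # ['apple', 'Banana', 'cherry']: unchanged so far

returned = names.sort(reverse=True)      # sort() changes names in place...
print(returned)                          # None: ...and returns nothing
print(names)                             # ['cherry', 'apple', 'Banana']
```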

Note that -1 indexes the last item in a list

In [None]:
words[-1]

and that adding a second index (or slice) reaches into the string stored at that position of the list:

In [None]:
words[-1][:-1]

We've already seen how the `range()` function can be used to create a list<sup>**</sup> of integers. With one argument, it produces the integers from 0 up to (but not including) that number. With two or three arguments, you specify the start, the stop (which is excluded), and an optional step size.

In [None]:
a = list(range(10))
print(a)

In [None]:
b = list(range(1, 5))
print(b)

In [None]:
c = list(range(10, 100, 5))
print(c)

<sup>**</sup>Does it really? Why did we have to wrap the range in a call to `list()`?

Let's see what `range()` does by itself:

In [None]:
range(10, 100, 5)

It creates a special `range` object!

In [None]:
a = range(10)
type(a)
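A `range` object stores only its start, stop, and step and works out each number when it is asked for, so even a huge range takes very little memory. It still supports many list-like operations, as this sketch shows:

```python
steps = range(10, 100, 5)

print(len(steps))        # 18: 10, 15, ..., 95
print(steps[2])          # 20: ranges support indexing like lists
print(95 in steps)       # True
print(100 in steps)      # False: the stop value is excluded
print(list(steps)[:3])   # [10, 15, 20]
```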

The Python `range()` function is very helpful in a wide range of contexts, including these simple examples used to create a list.

### 2.5.7 PRACTICE

#### Creating and Sorting a List
Write code that creates a list with even numbers from 0 to 100 (including 100) and print the result in reverse order. 


#### Inserting Elements into a List
Write code that adds the name "Ophiuchus" to the following list between Scorpio and Sagittarius

In [None]:
zodiac = [
    'Pisces', 'Aries', 'Taurus', 'Gemini', 'Cancer', 'Leo', 'Virgo',
    'Libra', 'Scorpio', 'Sagittarius', 'Capricorn', 'Aquarius',
]

#### List Indexing

Let's say we have a list called `thing` containing the integers from 1 to 7, and that we also have variables `low` which equals 2 and `high` which equals 5.

For each operation below, first think about what you think the answer will be, then write it as code in the cell below and confirm that it does what you expected. For readability add one cell at a time and execute it, starting by creating a list called `thing` with the integer values 1...7, and variables `low` and `high` with values 2 and 5. Then answer each question below.

1. What does `thing[low:high]` do?

2. What does `thing[low:]` (without a value after the colon) do?

3. What does `thing[:high]` (without a value before the colon) do?

4. What does `thing[-1]` (a negative index) do?

5. What does `thing[:-1]` (a negative value after the colon) do?

6. What does `thing[:]` (just a colon) do?

7. How long is the list `thing[low:high]`?


In [None]:
# Create your variables (thing, low, and high) here


In [None]:
# 1.

In [None]:
# 2.

In [None]:
# 3.

In [None]:
# 4.

In [None]:
# 5.

In [None]:
# 6.

In [None]:
# 7.

## 2.6 Tuples

Continuing from numeric types, strings, and lists, we now cover two more powerful data types in Python: **tuples** and **dictionaries**. We will cover how to create them, what they are used for, and how to use some of their methods.

Tuples are like lists, but are **immutable**.  The syntax is similar except tuples use parentheses instead of square brackets.

In [None]:
d = ('a', 'b', 'c')
print(d)

In [None]:
d[2] = 'z'

See? It really is immutable: you'll just get a traceback if you try. Use an immutable type when you want to make sure the data can't be modified.

In [None]:
del d[2]

If you want to remove an element or update it, you could convert the tuple to a list first.

In [None]:
print(d)
e = list(d)
e.remove('c')
print(e)

But notice that `e` is a list, not a tuple.  If we want the result to be a tuple, we have to convert it back from a list.

In [None]:
f = tuple(e)
print(f)
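Two more tuple details, sketched below. It is actually the comma, not the parentheses, that makes a tuple, so a one-item tuple needs a trailing comma. And a tuple can be "unpacked" into separate variables in one line. The coordinates are just an illustrative (latitude, longitude) pair.

```python
not_a_tuple = ("a")   # parentheses alone only group: this is a str
one_item = ("a",)     # the trailing comma makes a one-item tuple
print(type(not_a_tuple), type(one_item))  # <class 'str'> <class 'tuple'>

point = (37.87, -122.27)   # an illustrative (latitude, longitude) pair
lat, lon = point           # unpacking assigns each element to its own name
print(lat, lon)            # 37.87 -122.27
```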

## 2.7 Dictionaries

A **dictionary** (`dict`) is another built-in Python data structure designed to store...data.

You can think of a `dict` as a slightly more complicated, more useful version of a `list`. The main difference is how we access the items that each data structure contains. You already saw that with a `list` we had to use **positional indexing** to look up the items. With a dictionary, however, you define your own indexes/look-ups called **keys**.

### 2.7.1 Tiny databases
A `dict` is a kind of **key-value store**, which means it is comprised of **key-value pairs**:
- **key**: the index, used to describe the item in the container
- **value**: the item itself

Some widely used databases, known as key-value stores, are built around essentially this same structure.

A few details to keep in mind:
- The **keys** have to be **unique** and **immutable**. The usual suspects are `str` and `int` objects.
- The **values** can be anything, including lists, and even other dictionaries (nested dictionaries).

- ~~key/value pairs are **unordered**. Even though they print in a particular way, this doesn't mean that one comes before the other.~~ Since Python 3.7, dictionaries preserve **insertion order**: keys come back in the order they were added.

### 2.7.2 Creating Dictionaries


There are a few different ways to create dictionaries.  The first two create an empty dictionary.

In [None]:
new_dict = {}

In [None]:
next_dict = dict()

Another way to create a dictionary is to write comma-separated **key: value** pairs inside curly braces:

In [None]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
print(antonyms)

### 2.7.3 Storing data in a `dict`
We can either use assignment or the built-in `update()` method:

In [None]:
new_dict.update({'new': 'item'})

In [None]:
new_dict["next"] = "thing"

In [None]:
new_dict

Another way to create a dictionary is by combining a `list` of **keys** with a `list` of **values**. This is done with the `zip()` function. We'll take a closer look at `zip()` when we talk about loops and iterables. For now, just be glad to know that it exists.

In [None]:
keys = ['hot', 'fast', 'good']
values = ['cold', 'slow', 'bad']
antonyms2 = dict(zip(keys, values))
print(antonyms2)

### 2.7.4 Getting data from a `dict`
We can retrieve the value of any dictionary entry by its key:

In [None]:
antonyms['hot']

Or with a built-in **method**

In [None]:
antonyms.get('hot')

What's the difference?

In [None]:
antonyms['wavy']

In [None]:
antonyms.get('wavy', 'unwavy')

We can get the length, keys, and values of a dictionary:

In [None]:
len(antonyms)

To see all the keys in a dictionary, use the `keys()` method:

In [None]:
print(antonyms.keys())

The same thing works to get the values:

In [None]:
print(antonyms.values())
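One detail that trips people up: the `in` operator on a dictionary checks its **keys**, not its values. To search the values, ask `values()` explicitly; `items()` gives the key-value pairs together. A sketch on a separate copy of the antonyms data:

```python
opposites = {"hot": "cold", "fast": "slow", "good": "bad"}

print("hot" in opposites)            # True: `in` checks the keys
print("cold" in opposites)           # False: values are not checked
print("cold" in opposites.values())  # True
print(list(opposites.items()))       # [('hot', 'cold'), ('fast', 'slow'), ('good', 'bad')]
```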

### 2.7.5 Dictionaries are mutable

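We already saw that we can add elements to a dictionary. Assigning to a key uses the same syntax either way: if the key is new, the pair is added; if the key already exists, its value is replaced:

In [None]:
antonyms['big'] = 'small'      # new key: adds a pair
antonyms['hot'] = 'freezing'   # existing key: replaces 'cold'
antonyms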

If you want to delete a dictionary entry, use del:

In [None]:
del new_dict['new']
new_dict
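`del` raises a `KeyError` if the key is missing. The `pop()` method is an alternative: it removes a key and hands back its value, and it accepts a default to return when the key is not there. A sketch on a separate dictionary:

```python
inventory = {"new": "item", "next": "thing"}

removed = inventory.pop("new")        # remove a key and get its value back
print(removed)                        # item
print(inventory)                      # {'next': 'thing'}

missing = inventory.pop("red", None)  # the default avoids a KeyError
print(missing)                        # None
```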

### 2.7.6 Dictionaries vs. lists

In general, if you need to access items by their position, or you have only simple data that doesn't need to be looked up by name, use a list.

If the data is complex or hierarchical, the dictionary's `key` / `value` structure can be very helpful. If you need to check membership or look items up by key, a dictionary is usually much faster than a list for large collections: it uses a hash table to jump straight to the key, whereas checking whether an item is `in` a list means scanning the items one at a time. And to make a hierarchical or nested data structure, you can put a list (or even another dictionary!) inside a dictionary as the `value`.

**Looking Ahead**: when you begin looking at data embedded in websites, it is generally going to be in JSON format, which is comprised of, guess what? Nested Dictionaries!
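As a preview, the standard library's `json` module converts JSON text into these same built-in types: JSON objects become dictionaries and JSON arrays become lists. The JSON string below is a small made-up example; notice how dictionary keys and list indexes are chained to reach a nested value.

```python
import json

# A small, made-up JSON document, stored as text
raw = '{"city": "Berkeley", "neighborhoods": [{"name": "Downtown"}, {"name": "Elmwood"}]}'

data = json.loads(raw)                    # parse the text into Python objects
print(type(data))                         # <class 'dict'>
print(data["city"])                       # Berkeley
print(data["neighborhoods"][1]["name"])   # Elmwood: a key, then an index, then a key
```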

# Sources

This notebook was heavily adapted from previous course material by Prof. Paul Waddell and Samuel Maurer.