# Unit 3: Algorithms in Python




In [1]:
from shared import display_unit_toc
display_unit_toc('notebook.ipynb')

# Table of Contents

* [Unit 3: Algorithms in Python](#Unit-3:-Algorithms-in-Python)
 * [Functions](#Functions)
 * [`if/elif/else` Blocks](#`if/elif/else`-Blocks)
 * [`with` ... `as` ...](#`with`-...-`as`-...)
 * [Loops: `for` and `while`](#Loops:-`for`-and-`while`)
 * [List Comprehensions](#List-Comprehensions)
 * [`try`/`except` blocks](#`try`/`except`-blocks)
 * [Debugging](#Debugging)

## Functions

Functions are defined with `def` as we saw in Unit 2.



In [2]:
import requests
def get_html_from_url(url):
    return requests.get(url).text

bbref_url = 'https://www.basketball-reference.com/leagues/NBA_2018.html'
html = get_html_from_url(bbref_url)

print(html[:500])
print('\n--\n')
print(len(html.split('<table')))



<!DOCTYPE html>
<html data-version="klecko-" data-root="/home/bbr/build" itemscope itemtype="https://schema.org/WebSite" lang="en" class="no-js" >
<head>
    <meta charset="utf-8">
    <meta http-equiv="x-ua-compatible" content="ie=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=2.0" />
    <link rel="dns-prefetch" href="https://d2p3bygnnzw9w3.cloudfront.net/req/201808291" />

<!-- no:cookie fast load the css.           -->
<link rel="preconnect" h

--

90


## `if/elif/else` Blocks

`if/elif/else` blocks work as described in Unit 2 and require proper indentation.
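As a quick refresher, here is a minimal sketch of an `if/elif/else` block. The function name `describe_record` and its arguments are made up for illustration; the point is only that each branch is indented one level under its condition.

```python
def describe_record(wins, losses):
    """Label a season record (hypothetical helper for illustration)."""
    if wins > losses:
        # Runs only when the first condition is True
        return 'winning'
    elif wins == losses:
        # Checked only when the first condition is False
        return 'even'
    else:
        # Runs when none of the conditions above are True
        return 'losing'

print(describe_record(62, 20))
print(describe_record(41, 41))
print(describe_record(34, 48))
```

Only one branch ever runs: Python checks the conditions top to bottom and stops at the first one that is `True`.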


## `with` ... `as` ...

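We can open external files in Python for reading or writing by using the `open()` function. Wrapping the call in a `with` ... `as` ... block binds the file object to a name (here `f`) and closes the file automatically when the block ends, even if an error occurs inside it.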


In [3]:
with open('nba_season_2018.html', 'w') as f:
    f.write(html)
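The same pattern works for reading. The sketch below writes a small file and reads it back; the file name `with_example.txt` and its contents are placeholders chosen for illustration, and the file is placed in the system's temporary directory to avoid cluttering the working folder.

```python
import os
import tempfile

# Placeholder path in the system temp directory (illustrative only)
path = os.path.join(tempfile.gettempdir(), 'with_example.txt')

# 'w' opens the file for writing, replacing any existing contents
with open(path, 'w') as f:
    f.write('first line\nsecond line\n')

# Once the block ends, the file has been closed for us
print(f.closed)

# The default mode is 'r' (read)
with open(path) as f:
    lines = f.read().splitlines()

print(lines)
```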


## Loops: `for` and `while`

When we want to repeat a sequence of steps multiple times, we can use loops.

A `for` loop runs once for each item in a sequence, so the number of repetitions is known up front. The sequence is often produced by `range()`.
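A small sketch of `range()` in a `for` loop follows. The variable names are arbitrary. Note that `range(start, stop)` includes `start` but stops just before `stop`.

```python
# Loop over three seasons: 2009, 2010, 2011 (2012 is excluded)
for season in range(2009, 2012):
    print(season)

# To cover 2009 through 2018 inclusive, the stop value must be 2019
seasons = list(range(2009, 2019))
print(len(seasons), seasons[0], seasons[-1])
```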

It's useful to sketch the code you want to run in a loop using **pseudocode**, a hybrid of English and Python. Pseudocode typically takes the form of comments listing the steps we plan to take. It may take several passes to translate an abstract idea completely into code.

Suppose, for instance, we want to evaluate the performance of Eastern Conference NBA teams by analyzing their average points per game over the past 10 years. We find a site that has the data, such as [Basketball Reference](https://www.basketball-reference.com/leagues/NBA_2018.html). Our pseudocode might look like this:

```
For every season between 2009-2018:
* Load the season's data
* Save the season's data
```

In [4]:
# Starting to write in comments now....
# As we're able we'll convert the lines into line/s of code:

# For every season between 2009-2018
# Figure out the season's URL
# Get the data from the season URL
# Add the new data as rows into a data frame

In [5]:
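import pandas as pd
from io import StringIO

df_all = pd.DataFrame()

# For every season between 2009-2018
for season in range(2009, 2019):
    print('Processing season ' + str(season))
    # Figure out the season's URL
    url = 'https://www.basketball-reference.com/leagues/NBA_{}.html'.format(season)
    # Get the data from the season URL (requests was imported earlier)
    html = requests.get(url).text
    # Parse the first table; StringIO avoids the literal-string deprecation in newer pandas
    df = pd.read_html(StringIO(html))[0]
    # Add the new data as rows into a data frame
    df_all = pd.concat([df_all, df])

df_all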

Processing season 2009
Processing season 2010
Processing season 2011
Processing season 2012
Processing season 2013
Processing season 2014
Processing season 2015
Processing season 2016
Processing season 2017
Processing season 2018


| | Eastern Conference | W | L | W/L% | GB | PS/G | PA/G | SRS |
|---|---|---|---|---|---|---|---|---|
| 0 | Atlantic Division | | | | | | | |
| 1 | Boston Celtics* (2) | 62.0 | 20.0 | 0.756 | — | 100.9 | 93.4 | 7.44 |
| 2 | Philadelphia 76ers* (6) | 41.0 | 41.0 | 0.500 | 21.0 | 97.4 | 97.3 | 0.16 |
| 3 | New Jersey Nets (11) | 34.0 | 48.0 | 0.415 | 28.0 | 98.1 | 100.5 | -2.31 |
| 4 | Toronto Raptors (13) | 33.0 | 49.0 | 0.402 | 29.0 | 99.0 | 101.9 | -2.54 |
| 5 | New York Knicks (14) | 32.0 | 50.0 | 0.390 | 30.0 | 105.2 | 107.8 | -2.33 |
| 6 | Central Division | | | | | | | |
| 7 | Cleveland Cavaliers* (1) | 66.0 | 16.0 | 0.805 | — | 100.3 | 91.4 | 8.68 |
| 8 | Chicago Bulls* (7) | 41.0 | 41.0 | 0.500 | 25.0 | 102.2 | 102.5 | -0.16 |
| 9 | Detroit Pistons* (8) | 39.0 | 43.0 | 0.476 | 27.0 | 94.2 | 94.7 | -0.36 |


We see that this mostly works as intended but needs a few tweaks. As we discuss `DataFrame` objects further, try adding these features to the code:

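* Rename the column `Eastern Conference` to `Team` using `rename()` and a dictionary.
* Add a `Season` field to every `df` as we're passing through the loop.
* Remove any rows where the name contains the word "Division" (either by filtering with `~df.Team.str.contains('Division')` or by keeping only rows where a numeric column such as `W` passes `pd.notnull()`). Note that `'Division' not in df.Team` does not work here: on a pandas `Series`, `in` checks the index labels, not the values.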
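One possible way to apply the three tweaks is sketched below on a tiny hand-built data frame, so it runs without a network connection. The sample rows and the fixed `season` value are stand-ins for what the loop would supply; treat this as one approach under those assumptions, not the only solution.

```python
import pandas as pd

# Hypothetical stand-in for one season's table from pd.read_html()
df = pd.DataFrame({
    'Eastern Conference': ['Atlantic Division', 'Boston Celtics* (2)',
                           'Central Division', 'Cleveland Cavaliers* (1)'],
    'W': [None, 62.0, None, 66.0],
})
season = 2009  # inside the loop, this would be the loop variable

# 1. Rename the column using a dictionary of {old name: new name}
df = df.rename(columns={'Eastern Conference': 'Team'})

# 2. Tag every row with the season it came from
df['Season'] = season

# 3. Drop the division header rows (~ negates the boolean mask)
df = df[~df.Team.str.contains('Division')]
# For this sample, df[pd.notnull(df.W)] would keep the same rows

print(df)
```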

`while` loops run until their condition becomes `False`. As a result, we have to ensure ourselves that the condition eventually becomes `False`, or else we start an infinite loop. As a last resort, we can add a `break` statement inside the loop that gets triggered if, for example, the loop runs too many times. In general, we'll try to avoid `while` loops here.
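The safeguard described above might look like the following sketch. The starting value, the halving step, and the `max_steps` cap are all arbitrary choices for illustration.

```python
x = 100.0
steps = 0
max_steps = 50  # arbitrary cap that guards against an infinite loop

while x > 1:
    x = x / 2
    steps += 1
    if steps >= max_steps:
        # Last resort: leave the loop even though x > 1 still holds
        print('Stopping early')
        break

print(steps, x)
```

Here the condition `x > 1` becomes `False` on its own after a few halvings, so the `break` never fires. It matters only if the loop body fails to make progress.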

## List Comprehensions


One of the most useful features in Python is the **list comprehension**, which builds a list using `for` loop style syntax inside square brackets.


In [6]:
east_standings = pd.read_html('https://www.basketball-reference.com/leagues/NBA_2018.html')[0]
east_standings.rename(columns={'Eastern Conference': 'Team'}, inplace=True)

cities = [t.Team.split(' ')[0] for i, t in east_standings.iterrows()]
cities


['Toronto',
 'Boston',
 'Philadelphia',
 'Cleveland',
 'Indiana',
 'Miami',
 'Milwaukee',
 'Washington',
 'Detroit',
 'Charlotte',
 'New',
 'Brooklyn',
 'Chicago',
 'Orlando',
 'Atlanta']

The interpreter evaluates the leading expression, `t.Team.split(' ')[0]`, once for each iteration of the `for` clause written after it, i.e. once for each row of the data frame, and collects the results into a new list. This lets us apply an operation to every element of a list or table in a single line.

Still, note our rule didn't quite work. For `New York Knicks` we only captured `New`. So we need to adjust slightly:

In [7]:
cities = [' '.join(t.Team.split(' ')[:-1]) for i, t in east_standings.iterrows()]
cities

['Toronto',
 'Boston',
 'Philadelphia',
 'Cleveland',
 'Indiana',
 'Miami',
 'Milwaukee',
 'Washington',
 'Detroit',
 'Charlotte',
 'New York',
 'Brooklyn',
 'Chicago',
 'Orlando',
 'Atlanta']
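To make the correspondence with an ordinary `for` loop concrete, the sketch below builds the same list both ways. It uses a short hand-typed list of team names as sample data instead of the downloaded standings.

```python
# Sample data for illustration (not fetched from the site)
teams = ['Toronto Raptors', 'New York Knicks', 'Boston Celtics']

# Explicit loop: start with an empty list and append one item per step
cities_loop = []
for team in teams:
    cities_loop.append(' '.join(team.split(' ')[:-1]))

# List comprehension: the same expression and loop in a single line
cities_comp = [' '.join(team.split(' ')[:-1]) for team in teams]

print(cities_comp)
print(cities_loop == cities_comp)
```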

## `try`/`except` blocks

If we know our code might encounter an error, we can provide an alternative path with a `try/except` block.

The `try/except` block has two sub-blocks:
* The `try` block runs first, until it encounters an error
* The `except` block runs if and only if an error happens inside the `try` block


In [8]:
try:
    2 / 0
except ZeroDivisionError:
    # Naming the exception avoids silently hiding unrelated errors
    print('Shame!')

Shame!
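A slightly more practical sketch: using `try`/`except` to fall back to a default when a value cannot be converted. The helper name `parse_wins` is hypothetical, and returning `None` on failure is just one possible choice.

```python
def parse_wins(value):
    """Convert text to a float, or return None if it isn't numeric."""
    try:
        return float(value)
    except ValueError:
        # float() raises ValueError for text such as a division header
        return None

print(parse_wins('62.0'))
print(parse_wins('Atlantic Division'))
```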


## Debugging

The `pdb` module, or Python debugger, allows us to set breakpoints in code using `pdb.set_trace()`. It's another way to find and diagnose errors.

To use the debugger, import `pdb` and call `pdb.set_trace()` anywhere you want execution to pause. While paused at a breakpoint, you can inspect variables as they exist at that moment. This is useful when testing our code, or when it exhibits unexpected behavior. We add breakpoints at several points around the error, then print variables that may be contributing to it. Watching the code run step by step often gives us enough information to diagnose the behavior.

At each breakpoint, type `continue` to resume execution.


In [9]:
import pdb
pdb.set_trace()

--Call--
> //anaconda/envs/itec696/lib/python3.6/site-packages/IPython/core/displayhook.py(247)__call__()
-> def __call__(self, result=None):
(Pdb) continue


The line above reads `(Pdb) continue` only because we typed it in as the user. When the cell runs, focus moves to the IPython console window and execution pauses, waiting for user input.

## Scratchpad

Try adding `pdb.set_trace()` at different points below and check `x`.

In [10]:
x = 3.8

x = x - 1.5

x = x * 10

for n in range(5):
    x = x + 2.5
    
