<span class='note'>*Make me look good.* Click on the cell below and press <kbd>Ctrl</kbd>-<kbd>Enter</kbd>.</span>

In [None]:
from IPython.display import HTML
def css_styling():
    with open('css/custom.css', 'r') as f:
        styles = f.read()
    return HTML(styles)
css_styling()

<h5 class='prehead'>SA367 &middot; Mathematical Models for Decision Making &middot; Spring 2017 &middot; Uhan</h5>

<h5 class='lesson'>Lesson 11.</h5>

<h1 class='lesson_title'>Drafting a fantasy basketball team</h1>

## The problem

You're preparing for your upcoming fantasy basketball draft. You wonder: what is the best possible team you can draft?

You have the following data:

* Projected __auction prices__ for each player in the NBA.
* The __z-score__ for each player: the sum of the number of standard deviations above the mean in the following 9 categories:
    1. points per 36 minutes
    2. 3-point field goals made per 36 minutes
    3. number of rebounds per 36 minutes
    4. number of assists per 36 minutes
    5. number of steals per 36 minutes
    6. number of blocks per 36 minutes
    7. _negative_ of the number of turnovers per 36 minutes
    8. field goal percentage
    9. free throw percentage
    
Your roster must have exactly 12 players, and you have a budget of \$50. You want to maximize the total z-score of your team.
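
To make the z-score definition concrete, here is a minimal sketch that computes it for four made-up players over just three of the nine categories. The stat values are hypothetical, and the use of the sample standard deviation is an assumption, since the data description doesn't say which one was used.

```python
from statistics import mean, stdev

# Hypothetical per-36-minute stats for four made-up players
stats = {
    "PTS": [25.0, 18.0, 12.0, 9.0],
    "AST": [7.0, 3.0, 5.0, 1.0],
    "TOV": [4.0, 2.0, 3.0, 1.0],
}

# Turnovers hurt a team, so negate them before standardizing
stats["TOV"] = [-x for x in stats["TOV"]]

# A player's z-score is the sum, over categories, of the number of
# standard deviations above the mean
# (assumption: sample standard deviation)
n_toy_players = 4
toy_zscores = [0.0] * n_toy_players
for values in stats.values():
    mu = mean(values)
    sigma = stdev(values)
    for i in range(n_toy_players):
        toy_zscores[i] += (values[i] - mu) / sigma

print(toy_zscores)
```

Because each category is centered at its mean, the z-scores across all players sum to (approximately) zero: an above-average player has a positive z-score, and a below-average player has a negative one.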

Formulate this problem as a dynamic program by giving its shortest/longest path representation.

## Solving the DP

* <span class="rred">_Warning._</span> The code we're about to write isn't the most "Pythonic." However, it matches well with the mathematical notation we've been using in class.

* In the same folder as this notebook, there is a file called `fantasy_basketball_nba2017.csv` with the data described above.
    - The z-scores were computed using projected stats from [Basketball Reference](http://www.basketball-reference.com/friv/projections.cgi).
    - Actual average auction prices were taken from [Yahoo! Fantasy Sports](https://basketball.fantasysports.yahoo.com/nba/draftanalysis?tab=AD&pos=ALL&sort=DA_AP), normalized to a budget of \$50.
    

*  Let's take a look using pandas. First, let's import pandas:

In [None]:
# Import pandas
import pandas as pd

* Now we can read the csv file into a pandas DataFrame and inspect the first few rows:

In [None]:
# Read csv file with data
df = pd.read_csv('fantasy_basketball_nba2017.csv')

# Print the first 5 rows of df
df.head()

* As we can see, we even have some other data: each player's team and the positions each player plays.

* Let's use this data to create the shortest/longest path representation of our DP in networkx. 

* As usual, let's import networkx and bellmanford first:

In [None]:
# Import networkx and bellman ford
import networkx as nx
import bellmanford as bf

* There are two important constants in our problem: the budget, and the roster size. 

* Let's create variables to hold these constants.

* This way, we can easily adapt our code to accommodate similar DPs with different budgets and roster sizes.

In [None]:
# Create variables to hold constants: budget, roster size
BUDGET = 50
ROSTER_SIZE = 12

* Next, let's create some lists that correspond to the relevant columns of the dataset.

* Recall that we can grab a column from a DataFrame like this:

```python
df['COLUMN_NAME']
```

* The `list()` function turns any list-like object (such as a column of a pandas DataFrame) into a Python list.

* We can apply the `.str.split(",")` method to convert a comma-delimited string into a list. This will be helpful in parsing the positions that a player can play, since many players can play multiple positions.
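
Applied to a column, `.str.split(",")` does to each entry what Python's built-in `str.split` does to a single string. Here's a small sketch of that per-entry behavior in plain Python; the raw entries are made up for illustration and are only assumed to resemble the `POSITIONS` column.

```python
# Hypothetical raw entries, in a comma-delimited style
raw_positions = ["PF,C", "PG", "SG,SF"]

# Split each string on commas to get one list of positions per player
split_positions = [entry.split(",") for entry in raw_positions]

print(split_positions)
```

Note that a player with a single position still ends up with a list, just one of length 1.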

In [None]:
# Create a list of players
players = list(df["PLAYER"])

# Create a list of zscores
zscores = list(df["ZSCORE"])

# Create a list of prices
prices = list(df["PRICE"])

# Create a list of positions
positions = list(df["POSITIONS"].str.split(","))

* Now we can look at player $t$ and his associated data like this: 

In [None]:
# Print out information about player 3 - Anthony Davis
print(players[3])
print(zscores[3])
print(prices[3])
print(positions[3])

* Let's also create a variable that holds the number of players:

In [None]:
# Create a variable for the number of players
n_players = len(players)

* Now we can use these lists and variables to construct the graph for the dynamic program.

* As usual, we start with an empty graph:

In [None]:
# Create empty digraph
G = nx.DiGraph()

* Next, let's add the nodes:

In [None]:
# Add stage-state nodes (t, n1, n2)
for t in range(0, n_players + 1):
    for n1 in range(0, BUDGET + 1):
        for n2 in range(0, ROSTER_SIZE + 1):
            G.add_node((t, n1, n2))

# Add the end node
G.add_node("end")

* How many nodes do we have in our graph?

In [None]:
# Print number of nodes in digraph
print(G.number_of_nodes())
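
As a sanity check, the node count can be predicted from the loop bounds: one node for every combination of stage, remaining budget and remaining roster spots, plus the end node. Here's a sketch on a deliberately tiny instance; the sizes below are hypothetical, not the ones from the dataset.

```python
# Hypothetical problem size, for illustration only
n_players_toy = 3
budget_toy = 4
roster_size_toy = 2

# One node per (t, n1, n2) combination, mirroring the three nested loops
stage_state_nodes = [
    (t, n1, n2)
    for t in range(0, n_players_toy + 1)
    for n1 in range(0, budget_toy + 1)
    for n2 in range(0, roster_size_toy + 1)
]

# Add 1 for the end node
n_nodes_toy = len(stage_state_nodes) + 1

# Closed-form count from the loop bounds
expected = (n_players_toy + 1) * (budget_toy + 1) * (roster_size_toy + 1) + 1

print(n_nodes_toy, expected)
```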

* Now it's time to add the edges.

* Let's start with the edges corresponding to the decision of whether to take a player or not:

In [None]:
# Add edges corresponding to the decision of whether to take a player or not
for t in range(0, n_players):
    for n1 in range(0, BUDGET + 1):
        for n2 in range(0, ROSTER_SIZE + 1):
            
            # Don't take the player
            G.add_edge((t, n1, n2), (t + 1, n1, n2), length=0)

            # Take the player if there's enough left in the budget
            # and at least one roster spot remains
            if n1 - prices[t] >= 0 and n2 >= 1:
                G.add_edge((t, n1, n2), (t + 1, n1 - prices[t], n2 - 1), length=-zscores[t])

* Now we can add the edges from the last stage to the end node. Remember to only add edges from the last stage if the number of remaining roster spots $n_2$ is equal to 0!

In [None]:
# Add edges from last stage to end, 
# only when number of remaining roster spots is 0
for n1 in range(0, BUDGET + 1):
    G.add_edge((n_players, n1, 0), "end", length=0)

* How many edges do we have in our graph?

In [None]:
# Print number of edges
print(G.number_of_edges())

* Finally, let's solve the shortest path problem we've constructed using the Bellman-Ford algorithm:

In [None]:
# Solve the shortest path problem using the Bellman-Ford algorithm
length, nodes, negative_cycle = bf.bellman_ford(G, source=(0, BUDGET, ROSTER_SIZE), target="end", weight="length")

print("Negative cycle? {0}".format(negative_cycle))
print("Shortest path length: {0}".format(length))
print("Shortest path: {0}".format(nodes))
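
The graph above encodes a recursion, and on a tiny instance that recursion can be evaluated directly as a cross-check. The sketch below does not use networkx or bellmanford: it is a memoized recursion on made-up data, and it maximizes total z-score directly instead of minimizing negated edge lengths. All names and numbers in it are hypothetical.

```python
from functools import lru_cache

# Made-up data, small enough to check by hand
toy_players = ["A", "B", "C", "D"]
toy_prices = [3, 2, 2, 1]
toy_zscores = [4.0, 3.0, 2.5, 1.0]
TOY_BUDGET = 5
TOY_ROSTER_SIZE = 2


@lru_cache(maxsize=None)
def best(t, n1, n2):
    """Max total z-score from players t, t+1, ... given n1 dollars and n2 roster spots left."""
    if t == len(toy_players):
        # Mirrors the edges into "end": only a full roster is feasible
        return 0.0 if n2 == 0 else float("-inf")

    # Don't take player t
    value = best(t + 1, n1, n2)

    # Take player t if the budget and the roster allow it
    if n1 - toy_prices[t] >= 0 and n2 >= 1:
        value = max(value, toy_zscores[t] + best(t + 1, n1 - toy_prices[t], n2 - 1))

    return value


max_zscore = best(0, TOY_BUDGET, TOY_ROSTER_SIZE)
print(max_zscore)
```

With a budget of 5 and a roster of 2, the best pair here is A and B (total price 5, total z-score 7.0). A shortest path formulation of the same toy instance should report a length of -7.0.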

* Since each edge length is the negative of a z-score, the maximum possible total z-score is the negative of the shortest path length. But which players should we select to get this maximum total z-score?

* Instead of reading through the path of 400+ nodes to figure out which players to select, let's write some code to do this for us.

* We know that we select a player whenever the number of remaining roster spots $n_2$ goes down by 1 from stage to stage. So...

In [None]:
# Print selected players in a more user-friendly format
# Get number of nodes in shortest path
n_nodes = len(nodes)

# Go through each node in the shortest path
for i in range(n_nodes - 2):
    
    # Node in current stage
    (t, n1, n2) = nodes[i]
    
    # Node in next stage
    (next_t, next_n1, next_n2) = nodes[i + 1]
    
    # If n2 isn't the same from one stage to the next, print the player's info
    if n2 != next_n2:
        print("Node: {0}  Player: {1}  Positions: {2}  Price: {3}  Z-Score: {4}".format(nodes[i], players[t], positions[t], prices[t], zscores[t]))

## Incorporating other roster constraints

* Fantasy basketball leagues usually have some roster constraints &mdash; in particular, on player positions.

* For example, suppose our roster must have exactly 2 players that can play center (C).

* How can we modify our dynamic program to accommodate this? Write a new dynamic program on paper.

* How do we need to modify the code above to solve the new dynamic program?

* A hint:

    - To check if player $t$ can play center, we can write:

    ```python
    if "C" in positions[t]:
        ...
    ```

    - This code does what it looks like: it checks if `"C"` is in the list of positions `positions[t]` that player $t$ can play.
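
Here's a quick sketch of that membership check on made-up position lists. The data is hypothetical, and the caveat about spaces is an assumption about how a raw file *might* be formatted, not a statement about `fantasy_basketball_nba2017.csv`.

```python
# Hypothetical position lists, shaped like the entries of `positions`
toy_positions = [["PF", "C"], ["PG", "SG"], ["C"]]

# List membership looks for an exact match against each element
can_play_center = ["C" in p for p in toy_positions]
print(can_play_center)

# Caveat (assumption: raw entries might have a space after the comma):
# splitting "PF, C" on "," leaves " C", which is not equal to "C"
spaced_match = "C" in "PF, C".split(",")
print(spaced_match)
```

If the check ever misses a player who clearly plays center, stray whitespace in the split strings is one thing worth looking for.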

In [None]:
# Create empty digraph
H = nx.DiGraph()

# Add stage-state nodes (t, n1, n2, n3)
# t = player
# n1 = remaining budget
# n2 = remaining roster spots
# n3 = remaining C roster spots
for t in range(0, n_players + 1):
    for n1 in range(0, BUDGET + 1):
        for n2 in range(0, ROSTER_SIZE + 1):
            for n3 in range(0, 3):
                H.add_node((t, n1, n2, n3))

# Add the end node
H.add_node("end")

# Add edges corresponding to the decision of whether to take a player or not
for t in range(0, n_players):
    for n1 in range(0, BUDGET + 1):
        for n2 in range(0, ROSTER_SIZE + 1):
            for n3 in range(0, 3):
            
                # Don't take the player
                H.add_edge((t, n1, n2, n3), (t + 1, n1, n2, n3), length=0)

                # Take the player if there's enough left in the budget
                # and at least one roster spot remains
                if n1 - prices[t] >= 0 and n2 >= 1:
                    
                    # If the player is a center, we can only add this edge if
                    # there are enough remaining C roster spots
                    if "C" in positions[t]:
                        if n3 > 0:
                            H.add_edge((t, n1, n2, n3), (t + 1, n1 - prices[t], n2 - 1, n3 - 1), length=-zscores[t])
                            
                    # Otherwise, the number of remaining C roster spots stays the same
                    else:
                        H.add_edge((t, n1, n2, n3), (t + 1, n1 - prices[t], n2 - 1, n3), length=-zscores[t])

# Add edges from last stage to end, 
# only when number of remaining roster spots is 0 and
# the number of remaining C roster spots is 0
for n1 in range(0, BUDGET + 1):
    H.add_edge((n_players, n1, 0, 0), "end", length=0)


# Solve the shortest path problem using the Bellman-Ford algorithm    
length, nodes, negative_cycle = bf.bellman_ford(H, source=(0, BUDGET, ROSTER_SIZE, 2), target="end", weight="length")

print("Negative cycle? {0}".format(negative_cycle))
print("Shortest path length: {0}".format(length))
print("Shortest path: {0}".format(nodes))

# Print selected players in a more user-friendly format
# Get number of nodes in shortest path
n_nodes = len(nodes)

# Go through each node in the shortest path
for i in range(n_nodes - 2):
    
    # Node in current stage
    (t, n1, n2, n3) = nodes[i]
    
    # Node in next stage
    (next_t, next_n1, next_n2, next_n3) = nodes[i + 1]
    
    # If n2 isn't the same from one stage to the next, print the player's info
    if n2 != next_n2:
        print("Node: {0}  Player: {1}  Positions: {2}  Price: {3}  Z-Score: {4}".format(nodes[i], players[t], positions[t], prices[t], zscores[t]))

## Food for thought

* Can the dynamic programs we solved above help with an actual fantasy basketball draft? Why or why not?

_Write your notes here. Double-click to edit._