An example of how the brisk solution finds the group of ids with the most points whose combined cost is less than or equal to `test_salary`.

This is the second part of the optimization. The individual positions have already been merged, which is why each row below holds more than one id for the position we're combining.

The ids identify which players these values belong to. In the code, each id is the player's index in the Pandas DataFrame that the csv file was read into. Once we have the winning ids, we can look them up in that DataFrame to get the names of the players.

In [1]:
import numpy as np
import pandas as pd
import ast  # used to convert the id lists, which pandas reads in as strings
df = pd.read_csv('vals.csv', converters={0:lambda x: ast.literal_eval(x)})
df

Unnamed: 0,ids,sal,pts,pos
0,"[0, 4]",5,3,PG
1,"[1, 5]",7,5,PG
2,"[2, 6]",8,6,PG
3,"[3, 7]",9,7,PG
4,"[8, 13]",2,2,SG
5,"[9, 14]",3,4,SG
6,"[10, 15]",5,6,SG
7,"[11, 16]",6,6,SG
8,"[12, 17]",8,7,SG
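
To make the `ast.literal_eval` converter a little more concrete, here is a minimal sketch of what it changes. The in-memory `csv_text` is a hypothetical two-row stand-in for `vals.csv`, and I key the converter by the column name `ids` rather than by position, which assumes the file has that header.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for vals.csv (assumed layout, first two rows only).
csv_text = 'ids,sal,pts,pos\n"[0, 4]",5,3,PG\n"[1, 5]",7,5,PG\n'

# Without a converter, the ids column comes back as plain strings.
raw = pd.read_csv(io.StringIO(csv_text))

# With ast.literal_eval, each cell is parsed into a real Python list.
parsed = pd.read_csv(io.StringIO(csv_text), converters={"ids": ast.literal_eval})

print(type(raw.ids[0]).__name__, type(parsed.ids[0]).__name__)
```

Having real lists here is what lets `np.array(pgdf.ids.tolist())` produce a 2d array of ids later on.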


In [2]:
pgdf = df[df.pos=='PG']
pgdf

Unnamed: 0,ids,sal,pts,pos
0,"[0, 4]",5,3,PG
1,"[1, 5]",7,5,PG
2,"[2, 6]",8,6,PG
3,"[3, 7]",9,7,PG


In [3]:
sgdf = df[df.pos=='SG']
sgdf

Unnamed: 0,ids,sal,pts,pos
4,"[8, 13]",2,2,SG
5,"[9, 14]",3,4,SG
6,"[10, 15]",5,6,SG
7,"[11, 16]",6,6,SG
8,"[12, 17]",8,7,SG


In [4]:
pgids = np.array(pgdf.ids.tolist())
pgsals = np.array(pgdf.sal.tolist())
pgpts = np.array(pgdf.pts.tolist())
print("Ids:", pgids)
print("Salaries:", pgsals)
print("Points:", pgpts)

Ids: [[0 4]
 [1 5]
 [2 6]
 [3 7]]
Salaries: [5 7 8 9]
Points: [3 5 6 7]


Here I zip the values up again, the same as in the basic solution. I do this because that's the form the data is in when the brisk solution starts merging the positions.

In [5]:
pgids_list = pgdf.ids.tolist()
pgsals_list = pgdf.sal.tolist()
pgpts_list = pgdf.pts.tolist()
pgs_list = list(zip(pgsals_list, pgpts_list, pgids_list))
print("PG Ids:", pgids_list)
print("PG Salaries:", pgsals_list)
print("PG Points:", pgpts_list)
print("PG Zipped:", pgs_list)

PG Ids: [[0, 4], [1, 5], [2, 6], [3, 7]]
PG Salaries: [5, 7, 8, 9]
PG Points: [3, 5, 6, 7]
PG Zipped: [(5, 3, [0, 4]), (7, 5, [1, 5]), (8, 6, [2, 6]), (9, 7, [3, 7])]


In [6]:
sgids_list = sgdf.ids.tolist()
sgsals_list = sgdf.sal.tolist()
sgpts_list = sgdf.pts.tolist()
sgs_list = list(zip(sgsals_list, sgpts_list, sgids_list))
print("SG Ids:", sgids_list)
print("SG Salaries:", sgsals_list)
print("SG Points:", sgpts_list)
print("SG Zipped:", sgs_list)

SG Ids: [[8, 13], [9, 14], [10, 15], [11, 16], [12, 17]]
SG Salaries: [2, 3, 5, 6, 8]
SG Points: [2, 4, 6, 6, 7]
SG Zipped: [(2, 2, [8, 13]), (3, 4, [9, 14]), (5, 6, [10, 15]), (6, 6, [11, 16]), (8, 7, [12, 17])]


In the next two cells, we take the values out of the lists and turn them into `np.array`s, because that's what the brisk_solution needs to do. If you read the post, you'll see that this is a huge slowdown for brisk.

I feel it's important to keep in mind that when you take a solution, like the basic one, and make it faster, it won't happen all at once. When modifying basic into brisk, I knew this conversion was going to be slow and that I should change how the data is stored. But before that, there was more important work to do, which was removing the for loops. In the fast solution, which you'll see next, you'll note that I don't do this conversion.

In [7]:
pgsals = np.array([sal for sal,_,_ in pgs_list])
pgpts = np.array([pts for _,pts,_ in pgs_list])
pgids = np.array([ind for _,_,ind in pgs_list])
print("PG Ids:", pgids)
print("PG Salaries:", pgsals)
print("PG Points:", pgpts)

PG Ids: [[0 4]
 [1 5]
 [2 6]
 [3 7]]
PG Salaries: [5 7 8 9]
PG Points: [3 5 6 7]


In [8]:
sgsals = np.array([sal for sal,_,_ in sgs_list])
sgpts = np.array([pts for _,pts,_ in sgs_list])
sgids = np.array([ind for _,_,ind in sgs_list])

print("SG Ids:", sgids)
print("SG Salaries:", sgsals)
print("SG Points:", sgpts)

SG Ids: [[ 8 13]
 [ 9 14]
 [10 15]
 [11 16]
 [12 17]]
SG Salaries: [2 3 5 6 8]
SG Points: [2 4 6 6 7]


Now we're going to combine the salary and points arrays into 2d matrices that hold the sums for every PG and SG combination. If you pick an `(x,y)` index, such as `(0,3)`, the combined salary is 11 and the combined points are 9.

In [9]:
full_sals = pgsals[:,np.newaxis] + sgsals
full_sals

array([[ 7,  8, 10, 11, 13],
       [ 9, 10, 12, 13, 15],
       [10, 11, 13, 14, 16],
       [11, 12, 14, 15, 17]])

In [10]:
full_points = pgpts[:,np.newaxis] + sgpts
full_points

array([[ 5,  7,  9,  9, 10],
       [ 7,  9, 11, 11, 12],
       [ 8, 10, 12, 12, 13],
       [ 9, 11, 13, 13, 14]])
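
Since removing the for loops is the point of brisk, it may help to see the loop that the broadcast sum replaces. This is only a sketch for comparison, not code from the basic solution; `looped` and `broadcast` are names I made up for it.

```python
import numpy as np

pgsals = np.array([5, 7, 8, 9])
sgsals = np.array([2, 3, 5, 6, 8])

# Loop version: one entry per (PG group, SG group) pair.
looped = np.zeros((len(pgsals), len(sgsals)), dtype=int)
for i, pg_sal in enumerate(pgsals):
    for j, sg_sal in enumerate(sgsals):
        looped[i, j] = pg_sal + sg_sal

# Broadcast version: a (4, 1) column plus a (5,) row gives a (4, 5) matrix.
broadcast = pgsals[:, np.newaxis] + sgsals

print(np.array_equal(looped, broadcast))
```

Rows follow the PG groups and columns follow the SG groups, which is why the `(x, y)` index found later maps back to `pgids[x]` and `sgids[y]`.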

In [11]:
test_salary = 11

In [12]:
valids = full_sals <= test_salary  # True where the combined salary is within test_salary
valids

array([[ True,  True,  True,  True, False],
       [ True,  True, False, False, False],
       [ True,  True, False, False, False],
       [ True, False, False, False, False]])

With numpy, a True / False array also acts as a 1 / 0 array, so when we multiply the two together, the points of every invalid grouping become zero.

In [13]:
possibilities = full_points * valids
possibilities

array([[ 5,  7,  9,  9,  0],
       [ 7,  9,  0,  0,  0],
       [ 8, 10,  0,  0,  0],
       [ 9,  0,  0,  0,  0]])
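
As a cross-check on the multiply trick, the sketch below shows it gives the same matrix as an explicit `np.where`. The `-1` sentinel variant at the end is my own assumption, not part of the brisk solution. It is one possible way to notice the case where no pairing fits under the salary, since an all-zero matrix would still hand `argmax` an index.

```python
import numpy as np

full_points = np.array([[5, 7, 9, 9, 10],
                        [7, 9, 11, 11, 12],
                        [8, 10, 12, 12, 13],
                        [9, 11, 13, 13, 14]])
full_sals = np.array([[7, 8, 10, 11, 13],
                      [9, 10, 12, 13, 15],
                      [10, 11, 13, 14, 16],
                      [11, 12, 14, 15, 17]])

valids = full_sals <= 11

# Multiplying by the boolean mask zeroes out the invalid pairs...
by_product = full_points * valids
# ...which matches an explicit np.where with 0 as the fill value.
by_where = np.where(valids, full_points, 0)
print(np.array_equal(by_product, by_where))

# Sentinel variant (an assumption, not the notebook's code): -1 marks invalid pairs,
# so a maximum below 0 would mean nothing fits under the salary.
masked = np.where(valids, full_points, -1)
has_valid_pair = masked.max() >= 0
print(has_valid_pair)
```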

In [14]:
x, y = np.unravel_index(possibilities.argmax(), possibilities.shape)
x, y  # x is the row index into pgids, y is the column index into sgids

(2, 1)
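
If `np.unravel_index` looks like magic, this sketch shows what it does for the 2d case: `argmax` returns a position in the flattened, row-major array, and dividing by the number of columns recovers the row and column. The `divmod` line is just my illustration of the idea, not how numpy implements it.

```python
import numpy as np

possibilities = np.array([[5, 7, 9, 9, 0],
                          [7, 9, 0, 0, 0],
                          [8, 10, 0, 0, 0],
                          [9, 0, 0, 0, 0]])

flat_index = possibilities.argmax()  # position in the flattened (row-major) array
n_cols = possibilities.shape[1]

# For a 2-D row-major array, divmod by the column count recovers (row, col).
row, col = divmod(int(flat_index), n_cols)
x, y = np.unravel_index(flat_index, possibilities.shape)

print(flat_index, (row, col), (int(x), int(y)))
```

Note that `argmax` returns the first maximum it finds, so if two groupings tied for the top points, the one earlier in row-major order would be picked.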

From the possibilities matrix, we can see that the highest point total that is valid for the salary is 10. Because of how the `full_sals` and `full_points` matrices were created, the ids for this value can be looked up with the `top_players` line below.

In [15]:
top_points = full_points[x, y]
top_sals = full_sals[x, y]
top_players = np.concatenate([pgids[x], sgids[y]])

In [16]:
print("Top players:", top_players)
print("Top points:", top_points)
print("Combined salaries:", top_sals)

Top players: [ 2  6  9 14]
Top points: 10
Combined salaries: 11
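
To tie the cells together, here is a rough sketch that wraps the steps above into one function. The name `best_pair` and its signature are hypothetical and only for illustration. The actual brisk solution may be organized differently.

```python
import numpy as np


def best_pair(ids_a, sals_a, pts_a, ids_b, sals_b, pts_b, max_salary):
    """Hypothetical helper: best-scoring pairing of one group from a and one from b."""
    # Sums for every (a, b) combination via broadcasting.
    full_sals = sals_a[:, np.newaxis] + sals_b
    full_points = pts_a[:, np.newaxis] + pts_b
    # Zero out the points of pairings that cost more than max_salary.
    possibilities = full_points * (full_sals <= max_salary)
    # Row and column of the highest remaining point total.
    x, y = np.unravel_index(possibilities.argmax(), possibilities.shape)
    players = np.concatenate([ids_a[x], ids_b[y]])
    return players, full_points[x, y], full_sals[x, y]


pgids = np.array([[0, 4], [1, 5], [2, 6], [3, 7]])
pgsals = np.array([5, 7, 8, 9])
pgpts = np.array([3, 5, 6, 7])
sgids = np.array([[8, 13], [9, 14], [10, 15], [11, 16], [12, 17]])
sgsals = np.array([2, 3, 5, 6, 8])
sgpts = np.array([2, 4, 6, 6, 7])

players, points, salary = best_pair(pgids, pgsals, pgpts, sgids, sgsals, sgpts, 11)
print(players, points, salary)
```

Because the function returns the merged ids, its result could in principle be fed back in as one side of the next merge with another position.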
