# NumPy - Boolean indexing

In this unit, you are going to learn how to create subsets of arrays with boolean indexing.

You will learn about:
- creating subsets using boolean indexing
- doing calculations with arrays
- calculating statistics using NumPy functions and methods

### Examples from the introduction to boolean indexing

In [1]:
import numpy as np

my_array = np.array([1,2,3,4,5])

# creating a boolean array using a comparison operator
print(my_array <= 3)

[ True  True  True False False]


In [10]:
# using a boolean array to subset an array

my_array = np.array([1,2,3,4,5])

bool_array = my_array >= 3

new_array = my_array[bool_array]

print(new_array)


[3 4 5]


In [13]:
# without creating a variable for the boolean array

my_array = np.array([1,2,3,4,5])

new_array = my_array[my_array >= 3]

print(new_array)


[3 4 5]


In [15]:
# Using boolean indexing to select rows from a 2D array

my_array = np.array([[1, 1.88, 90], 
                     [2, 1.75, 65], 
                     [3, 1.62, 55],
                     [4, 1.78, 80]])

bool_array = my_array[:,1] > 1.75

tall = my_array[bool_array]

print(tall)


[[ 1.    1.88 90.  ]
 [ 4.    1.78 80.  ]]


### Task 1: Convert the lakers data to an array

1. Import numpy.
2. Run the cell below to create the lists `lakers_data`, `column_names` and `player_names`.
3. Convert the `lakers_data` list to a NumPy array and assign it to `lakers_np`.
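
As a reminder of how a nested list becomes a 2D array, here is a minimal sketch on made-up data. The names `rows` and `table` are hypothetical and are not part of the exercise.

```python
import numpy as np

# hypothetical nested list: two rows, three columns (not the Lakers data)
rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

# each inner list becomes one row of the 2D array
table = np.array(rows)

print(table.shape)   # (2, 3)

# select every row of the last column
print(table[:, 2])   # [3. 6.]
```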

In [2]:
# run this cell to load the lists into Python

lakers_data = [[1.0, 1.0, 67.0, 34.6, 9.6, 19.4, 3.9, 5.7, 7.8, 10.2, 1.2, 0.5, 25.3],
               [2.0, 4.0, 62.0, 34.4, 8.9, 17.7, 7.2, 8.5, 9.3, 3.2, 1.5, 2.3, 26.1],
               [3.0, 2.0, 69.0, 25.5, 3.4, 7.3, 1.1, 1.5, 2.1, 1.6, 0.8, 0.2, 9.3],
               [4.0, 4.0, 61.0, 25.0, 4.8, 11.0, 1.9, 2.5, 4.5, 1.3, 0.5, 0.4, 12.8],
               [5.0, 2.0, 68.0, 24.8, 2.9, 7.0, 0.5, 0.7, 3.3, 1.3, 1.3, 0.5, 8.0],
               [6.0, 2.0, 49.0, 24.2, 3.5, 7.8, 0.4, 0.5, 2.3, 1.3, 0.9, 0.1, 8.6],
               [7.0, 2.0, 7.0, 23.6, 4.4, 10.4, 2.0, 2.3, 1.9, 2.4, 0.6, 0.6, 11.9],
               [8.0, 1.0, 48.0, 20.5, 2.9, 6.8, 0.6, 0.9, 3.0, 5.0, 0.8, 0.0, 7.1],
               [9.0, 5.0, 69.0, 18.9, 2.9, 4.0, 1.6, 3.1, 7.3, 0.7, 0.4, 1.1, 7.5],
               [10.0, 2.0, 64.0, 18.4, 1.9, 4.5, 1.1, 1.5, 1.9, 1.9, 1.1, 0.3, 5.5],
               [11.0, 5.0, 68.0, 16.6, 2.9, 4.5, 0.8, 1.2, 5.7, 0.5, 0.5, 1.4, 6.6],
               [12.0, 4.0, 14.0, 14.2, 2.0, 4.9, 0.4, 0.4, 3.2, 0.6, 0.4, 0.4, 5.3],
               [13.0, 3.0, 6.0, 13.5, 2.3, 5.0, 0.3, 0.7, 1.2, 1.0, 1.3, 0.2, 5.7],
               [14.0, 2.0, 6.0, 13.2, 1.2, 3.7, 0.3, 0.3, 0.8, 0.5, 0.2, 0.0, 2.8],
               [15.0, 1.0, 44.0, 11.5, 2.1, 4.9, 0.3, 0.3, 1.2, 1.1, 0.3, 0.0, 5.1],
               [16.0, 2.0, 41.0, 11.1, 1.5, 3.9, 0.2, 0.4, 1.1, 0.3, 0.2, 0.1, 4.2],
               [17.0, 5.0, 1.0, 9.0, 3.0, 6.0, 0.0, 0.0, 5.0, 1.0, 0.0, 0.0, 6.0],
               [18.0, 4.0, 45.0, 8.1, 0.5, 1.3, 0.0, 0.0, 1.2, 0.6, 0.3, 0.1, 1.5],
               [19.0, 4.0, 5.0, 4.0, 0.6, 0.6, 0.2, 0.4, 0.6, 0.4, 0.0, 0.0, 1.4],
               [20.0, 2.0, 2.0, 2.5, 0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]]

column_names = ['Rank',
                'Position',
                'Games',
                'Minutes',
                'Shots made',
                'Shots attempted',
                'Free throws',
                'Free throws attempted',
                'Rebounds',
                'Assists',
                'Steals',
                'Blocks',
                'Points']

player_names = ['LeBron James',
                'Anthony Davis',
                'Kentavious Caldwell-Pope',
                'Kyle Kuzma',
                'Danny Green',
                'Avery Bradley',
                'Dion Waiters',
                'Rajon Rondo',
                'Dwight Howard',
                'Alex Caruso',
                'JaVale McGee',
                'Markieff Morris',
                'Talen Horton-Tucker',
                'J.R. Smith',
                'Quinn Cook',
                'Troy Daniels',
                'Devontae Cacok',
                'Jared Dudley',
                'Kostas Antetokounmpo',
                'Zach Norvell']

In [3]:
# import numpy 


# convert lakers_data to lakers_np




#### Printing columns and indices

In [18]:
for i in range(len(column_names)):
    print(str(i), column_names[i])

0 Rank
1 Position
2 Games
3 Minutes
4 Shots made
5 Shots attempted
6 Free throws
7 Free throws attempted
8 Rebounds
9 Assists
10 Steals
11 Blocks
12 Points


### Task 2: High scorers

In this task you will use boolean indexing to create a subset of the players with the highest points per game average.

1. Assign the 'Points' column to the variable `ppg`.
2. Print `ppg`.
3. Create a subset of the `ppg` array for values above 20 and assign to `high_ppg`.
4. Print `high_ppg`.
5. Use the `.shape` attribute on `high_ppg` to get the number of elements in the array and assign it to `n_high_scorers`. Note that `.shape` returns a tuple, so index it with `[0]` to get the count as an integer.
6. Print `n_high_scorers`.
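
The sketch below shows what `.shape` returns for a 1D subset, using made-up values. The names `scores`, `high_scores` and `n_high` are hypothetical.

```python
import numpy as np

# hypothetical toy data, not the Lakers statistics
scores = np.array([12.5, 30.1, 8.4, 22.0, 19.9])

# boolean indexing keeps only the values above 20
high_scores = scores[scores > 20]

# .shape is a tuple with one entry per dimension
print(high_scores.shape)   # (2,)

# index the tuple to get the count as a plain integer
n_high = high_scores.shape[0]
print(n_high)              # 2
```

For a 1D array, `high_scores.size` and `len(high_scores)` give the same count.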

In [None]:
# assign points column to variable ppg


# Print ppg


# Create a subset of the ppg array for values above 20 and assign to high_ppg


# Print high_ppg.


# Use the .shape attribute on high_ppg to get the number of elements in the array and assign to n_high_scorers.


# Print n_high_scorers.



### Task 3: Player efficiency

Use boolean indexing and array methods or NumPy functions to explore player efficiency, as indicated by points per minute.

1. Create a new array for the points per minute and assign it to the variable `ppm`. You will have to divide the points per game column by the minutes per game column.
2. Print `ppm`.
3. Calculate a lower bound for the points per minute, by subtracting one standard deviation from the mean of `ppm`. Assign the result to the variable `lower_bound`.
4. Calculate an upper bound for the points per minute, by adding one standard deviation to the mean of `ppm`. Assign the result to the variable `upper_bound`.
5. Create a subset called `high_ppm` for the points per minute values that are higher than the `upper_bound`.
6. Print `high_ppm`.
7. Create a subset called `low_ppm` for the points per minute values that are lower than the `lower_bound`.
8. Print `low_ppm`.
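
The same pattern on unrelated toy data might look like the sketch below. All names (`distance_km`, `time_h`, `speed`, `lower`, `upper`, `fast`, `slow`) are hypothetical. Note that `.std()` computes the population standard deviation by default (`ddof=0`).

```python
import numpy as np

# hypothetical toy data: distance and duration of four trips
distance_km = np.array([10.0, 20.0, 30.0, 40.0])
time_h = np.array([1.0, 2.0, 2.0, 8.0])

# arithmetic on arrays works element by element
speed = distance_km / time_h          # values: 10, 10, 15, 5

# bounds at one standard deviation below and above the mean
lower = speed.mean() - speed.std()
upper = speed.mean() + speed.std()

# boolean indexing against each bound
fast = speed[speed > upper]           # only the value 15
slow = speed[speed < lower]           # only the value 5

print(fast)
print(slow)
```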

In [None]:
# Create a new array for the points per minute and assign to the variable ppm.


# Print ppm.


# Calculate a lower bound for the points per minute. Assign the result to the variable lower_bound.


# Calculate an upper bound for the points per minute. Assign the result to the variable upper_bound.


# Create a subset called high_ppm for the points per minute values that are higher than the upper_bound.


# Print high_ppm.


# Create a subset called low_ppm for the points per minute values that are lower than the lower_bound.


# Print low_ppm.

### Task 4: Comparing short and tall players

In this task you will compare short players (positions 1 and 2) and tall players (positions 3, 4 and 5) on their overall production (rebounds + assists + points).

1. Create a subset of the short players (positions 1 and 2) and assign it to the variable `short`.
2. Print `short`.
3. Create a subset of the tall players (positions 3, 4 and 5) and assign it to the variable `tall`.
4. Print `tall`.
5. Calculate the average points, rebounds and assists for short players and assign to `short_pra`. You will have to set the `axis` argument to calculate the mean for each column. Otherwise NumPy will calculate a single mean for all values in the subset.
6. Print `short_pra`.
7. Calculate the average points, rebounds and assists for tall players and assign to `tall_pra`.
8. Print `tall_pra`.
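
Two techniques are useful here: combining comparisons with `|` (or) or `&` (and), where each comparison needs its own parentheses, and passing `axis=0` to get one mean per column. A minimal sketch on made-up data follows. The array `data` and its column layout are hypothetical.

```python
import numpy as np

# hypothetical columns: group id, value a, value b
data = np.array([[1, 10.0, 1.0],
                 [2, 20.0, 3.0],
                 [3, 30.0, 5.0],
                 [1, 40.0, 7.0]])

group = data[:, 0]

# combine two conditions with | ; the parentheses are required
mask = (group == 1) | (group == 2)
subset = data[mask]                          # keeps rows 0, 1 and 3

# axis=0 averages down the rows, giving one mean per selected column
col_means = subset[:, [1, 2]].mean(axis=0)

# without axis, NumPy returns a single mean over all selected values
overall = subset[:, [1, 2]].mean()           # 13.5

print(col_means)
print(overall)
```

`np.isin(group, [1, 2])` builds the same mask and can be easier to read when there are many values to match.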

In [None]:
# Create a subset of the short players (positions 1 and 2) and assign to the variable short.


# Print short.


# Create a subset of the tall players (positions 3, 4 and 5) and assign to the variable tall.


# Print tall.


# Calculate the average points, rebounds and assists for short players and assign to short_pra.


# Print short_pra.


# Calculate the average points, rebounds and assists for tall players and assign to tall_pra.


# Print tall_pra.