##### <img src="../SDSS-Logo.png" style="display:inline; width:500px" />


## Learning Objectives
- Understand Boolean variables
- Combinations of Boolean variables
- Operations that lead to Boolean values
- Arrays of Booleans as masks or selectors


### Boolean data types and using arrays of Booleans as selectors or masks
- A Boolean variable can take on only one of two values, `True` or `False`
- Operations on Boolean variables lead to other Boolean variables
- Boolean variables, especially arrays of Boolean variables, are used as selectors or masks to subset data in NumPy arrays
 
### A good reference for this lesson is [Vanderplas's Data Science book](https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html).


In [13]:

# imports needed for setup
import comp116
import numpy as np
import pickle
import matplotlib.pyplot as plt

with open('Unit-5-4-Numpy.data.pickle', 'rb') as fid:
    (state_names, state_co2, state_co2_categories) = pickle.load(fid)

## Boolean values result from a number of important operations, for example whenever we compare things. Arrays of Boolean values are useful as selectors for choosing values from NumPy arrays

* We've touched on booleans on and off.  Today we'll cover:
    * Boolean _logical_ operators
       * **and**, **or**, and **not**
    * Booleans resulting from comparisons
       * `==`, `!=`, `>`, `<`, `>=`, `<=`
    * Boolean _element by element_ operations
       * `&`, `|`, and `~`
    * Booleans for array indexing
        * This is a very powerful and sometimes confusing way to slice an array.
        * Instead of using `start:stop:step` you use an array of booleans that is the size of the dimension being indexed.
   
### The goal for today's class is to understand booleans enough to use them as <a href="https://www.python-course.eu/numpy_masking.php" target="_blank">boolean selectors</a> in an array instead of using `start:stop:step`.

### Essentially a boolean selector (or mask) is an array of Boolean values that, for each row or column, answers the question 'Do you want this row/column? True or False?'
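Here is a minimal sketch with a small made-up array. `values` and `mask` are illustrative names and are not part of the lesson data.

```python
import numpy as np

values = np.array([10, 20, 30, 40])
# One Boolean per element: True means "keep this element"
mask = np.array([True, False, True, False])

selected = values[mask]   # keeps only the positions where mask is True
print(selected)           # [10 30]
```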

### Some tutorials are also [here](https://thomas-cokelaer.info/tutorials/python/boolean.html)  

## Recap booleans, integers and arrays as booleans

* Booleans are either `True` or `False`.
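    * (Remember, that is not the same as `true` or `false`. The capitalized words `True` and `False` are reserved keywords in Python!)


* Booleans _may be_ treated as zeros and ones (integers).
    * Looking ahead, this means you can use `np.count_nonzero` or `np.sum` on NumPy arrays of Boolean values.

* The reverse is that integers and other objects _may be_ treated as Booleans.
    * Zero is treated as `False`, while any non-zero value is treated as `True`.
    * Floats, lists, arrays, and other objects have their own mapping to `True` and `False` (for example, `0.0` is `False`).
    * A Python list with any elements is considered `True`, and an empty list is considered `False`.
    * NumPy arrays are stricter. A single-element array takes the truth value of its element, but asking for the truth value of an array with more than one element raises a `ValueError`. Testing an empty NumPy array is deprecated (recent NumPy versions raise an error), so check `arr.size > 0` instead.

* You can use the built-in function `bool()` to cast any variable or value to a Boolean.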
    * This is similar to using `int()` to cast a value to an integer.
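The sketch below shows these rules in action on plain Python values and small NumPy arrays. None of it depends on the lesson data.

```python
import numpy as np

# Booleans behave like the integers 1 and 0 in arithmetic
print(True + True)        # 2
print(int(False))         # 0

# bool() maps other values to True/False
print(bool(0), bool(3), bool(-1))   # False True True
print(bool(0.0), bool(0.5))         # False True
print(bool([]), bool([0]))          # False True  (plain Python lists)

# A NumPy array with exactly one element takes that element's truth value
print(bool(np.array([0])), bool(np.array([7])))   # False True
```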


## What is treated as `True` and `False` in python?

In [3]:
# What is the Boolean value of 0?
print('The boolean value of an integer 0 is', bool(0))

# What is the Boolean value of -1?
print('The boolean value of an integer -1 is', bool(-1))

# What is the Boolean value of an array of integers?
print('The Boolean value of an array of integers is', bool(np.array([0, 22, -45, 0])))


The boolean value of an integer 0 is False
The boolean value of an integer -1 is True


ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

### Oops!! What happened in the last statement there? Think about it.
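Here is a hint, as a minimal sketch. With more than one element, NumPy cannot tell whether you mean "is any element true?" or "is every element true?", so it refuses and asks you to choose. The array is the same one used above, and `values` is just an illustrative name.

```python
import numpy as np

values = np.array([0, 22, -45, 0])

try:
    bool(values)                      # ambiguous: which element should decide?
except ValueError as err:
    print('ValueError:', err)

# State the intent explicitly instead
print(values.any())     # True  -> at least one element is non-zero
print(values.all())     # False -> not every element is non-zero
print(values.size > 0)  # True  -> the array is non-empty
```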

## A [*truth table* ](https://en.wikipedia.org/wiki/Truth_table) can be used to explain boolean operations


In [4]:
bool_table = np.array([[0, 0, 0, 0, 1, 1],
                      [0, 1, 0, 1, 1, 0],
                      [1, 0, 0, 1, 0, 1],
                      [1, 1, 1, 1, 0, 0]], dtype=bool)
row_names = [''] * 4
comp116.array_to_html(bool_table, row_names=row_names, col_names=['a', 'b', 'a and b', 'a or b', 'not a', 'not b'])

| a | b | a and b | a or b | not a | not b |
|---|---|---|---|---|---|
| False | False | False | False | True | True |
| False | True | False | True | True | False |
| True | False | False | True | False | True |
| True | True | True | True | False | False |
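The same table can be generated with Python's scalar `and`, `or` and `not`. This sketch uses `itertools.product` to list the four `(a, b)` combinations in the same order as the rows above. It also shows a detail the table hides: `and` and `or` return one of their operands, so the result is only a Boolean when the operands are Booleans.

```python
from itertools import product

# One tuple per row: (a, b, a and b, a or b, not a, not b)
rows = []
for a, b in product([False, True], repeat=2):
    rows.append((a, b, a and b, a or b, not a, not b))

for row in rows:
    print(row)

# With non-Boolean operands, `or` and `and` hand back an operand, not True/False
print(0 or 'fallback')   # fallback
print(3 and 0)           # 0
```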


## Boolean comparison operators

* In many situations, you want to compare two things, with the result being a Boolean value: `True` or `False`.
* Common examples are checking whether something is less than, greater than, or equal to something else.

* In python, the operators to do this are `==`, `!=`, `>`, `<`, `>=`, `<=`.

### Single element comparison operators
* Boolean comparisons can be done with a single element.

* The cell below applies each comparison operator to a single value.

In [5]:
# Boolean comparison of a single element
x = 5

print('x equals 5 is', x == 5)
print('x is not equal to 5 is', x != 5)
print('x is greater than 5 is', x > 5)
print('x is less than 5 is', x < 5)
print('x is greater than or equal to 5 is', x >= 5)
print('x is less than or equal to 5 is', x <= 5)

x equals 5 is True
x is not equal to 5 is False
x is greater than 5 is False
x is less than 5 is False
x is greater than or equal to 5 is True
x is less than or equal to 5 is True
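Python also lets you chain comparisons on a single value, so `3 < x <= 5` means `(3 < x) and (x <= 5)`. The sketch below shows why that shortcut does not carry over to arrays. The `&` operator in the last line is the element-by-element operator covered later in this lesson.

```python
import numpy as np

x = 5
# Chained comparison on a single value
print(3 < x <= 5)     # True

arr = np.arange(10)
try:
    # The hidden `and` needs one truth value for the array, which it cannot give
    print(3 < arr <= 5)
except ValueError as err:
    print('ValueError:', err)

# Element-by-element version: combine two Boolean arrays with &
print((3 < arr) & (arr <= 5))
```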


### Comparing an array of values to a single element 

* You can also compare an array of elements to a single value.

* In the cell below, do the same operations as before, but use the variable `arr` for comparison.  

In [6]:

# Create the array [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
arr = np.arange(10)
print('arr is', arr)

# Boolean array from comparing an array to a scalar

print('arr elements are equal to 5 is', arr == 5)

print('arr elements are greater than 5 is', arr > 5)

print('arr elements are less than 5 is', arr < 5)

print('arr elements are greater than or equal to 5 is', arr >= 5)

print('arr elements are less than or equal to 5 is', arr <= 5)


arr is [0 1 2 3 4 5 6 7 8 9]
arr elements are equal to 5 is [False False False False False  True False False False False]
arr elements are greater than 5 is [False False False False False False  True  True  True  True]
arr elements are less than 5 is [ True  True  True  True  True False False False False False]
arr elements are greater than or equal to 5 is [False False False False False  True  True  True  True  True]
arr elements are less than or equal to 5 is [ True  True  True  True  True  True False False False False]


### So comparing an array of values to a single element results in an array of Booleans of the same size

* In the cell below, let us confirm this

In [9]:
print('len(arr) is', len(arr), 'and it has shape', arr.shape)

print('The number of elements in arr == 5 is', len(arr == 5), 'and it has shape', (arr == 5).shape)
print('The number of elements in arr != 5 ', len(arr != 5), 'and it has shape', (arr != 5).shape)

print('The number of elements in arr > 5  ', len(arr > 5), 'and it has shape', (arr > 5).shape)

print('The number of elements in arr < 5  ', len(arr < 5), 'and it has shape', (arr < 5).shape)

print('The number of elements in arr >= 5 ', len(arr >= 5), 'and it has shape', (arr >= 5).shape)

print('The number of elements in arr <= 5 ', len(arr <= 5), 'and it has shape', (arr <= 5).shape)

len(arr) is 10 and it has shape (10,)
The number of elements in arr == 5 is 10 and it has shape (10,)
The number of elements in arr != 5  10 and it has shape (10,)
The number of elements in arr > 5   10 and it has shape (10,)
The number of elements in arr < 5   10 and it has shape (10,)
The number of elements in arr >= 5  10 and it has shape (10,)
The number of elements in arr <= 5  10 and it has shape (10,)


### NumPy has functions that work on an array of Boolean values
* `np.any()` returns `True` if any value in the array is `True` and `False` otherwise
* `np.all()` returns `True` if every value in the array is `True` and `False` otherwise

In [10]:
# Let us check out some examples of `np.any` and `np.all`
arr1 = np.arange(10)
print('arr1 is', arr1)
print('Is any value in arr1 > 5?', np.any(arr1 > 5))
print('Are all values in arr1 > 5?', np.all(arr1 > 5))
print('Are all values in arr1 >= 0?', np.all(arr1 >= 0))

arr1 is [0 1 2 3 4 5 6 7 8 9]
Is any value in arr1 > 5? True
Are all values in arr1 > 5? False
Are all values in arr1 >= 0? True
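Because `True` counts as `1` and `False` as `0`, you can also count or average a Boolean array. The minimal sketch below reuses the `arr1 > 5` comparison from above.

```python
import numpy as np

arr1 = np.arange(10)
above_five = arr1 > 5

# True counts as 1 and False as 0, so these reductions count or average the Trues
print(np.count_nonzero(above_five))  # 4
print(np.sum(above_five))            # 4
print(np.mean(above_five))           # 0.4 (fraction of elements that are True)
```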


## Revisit the data on CO2 emissions in different categories by US states in 2016

These are the state $CO_2$ datasets we will be using:
* `state_names` is an array of state names.
Each state name corresponds to a row of `state_co2`.
* `state_co2_categories` is an array of category names corresponding to the $CO_2$ source.
Each category name corresponds to a column of `state_co2`.
* `state_co2` is an $n \times m$ array whose `n` rows correspond to `state_names` and whose `m` columns correspond to `state_co2_categories`. Each entry is the $CO_2$ produced by that state in that category, in million metric tons.
* For example, the state `state_names[5]` emitted the amounts in `state_co2[5, :]` in 2016, one value per category in `state_co2_categories`.

* This means that `len(state_names) == state_co2.shape[0]` and `len(state_co2_categories) == state_co2.shape[1]`
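You can check the shape relationship on a small stand-in dataset. The sketch below copies three states and the three "Total" categories from the table in this lesson. `mock_names`, `mock_categories` and `mock_co2` are hypothetical names and are not the pickled data.

```python
import numpy as np

mock_names = np.array(['Florida', 'Idaho', 'Kansas'])
mock_categories = np.array(['Total coal', 'Total petro', 'Total nat gas'])
# One row per state, one column per category (million metric tons of CO2, 2016)
mock_co2 = np.array([[40.24242, 114.793297, 75.027273],
                     [0.22857, 12.298536, 5.854803],
                     [23.903218, 23.402334, 14.757128]])

# Rows line up with names, columns line up with categories
assert len(mock_names) == mock_co2.shape[0]
assert len(mock_categories) == mock_co2.shape[1]

# Row 1 holds every category value for mock_names[1]
print(mock_names[1], mock_co2[1, :])
```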

In [14]:
with open('Unit-5-4-Numpy.data.pickle', 'rb') as fid:
    (state_names, state_co2, state_co2_categories) = pickle.load(fid)

In [11]:
comp116.array_to_html(state_co2, row_names=state_names, col_names=state_co2_categories,
                      title='2016 State CO2 emissions by source type')

| State | Total coal | Total petro | Total nat gas | Transportation coal | Transportation petro | Transportation nat gas | Electrical coal | Electrical petro | Electrical nat gas | Industrial coal | Industrial petro | Industrial nat gas | Commercial coal | Commercial petro | Commercial nat gas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Florida | 40.24242 | 114.793297 | 75.027273 | 0.0 | 102.558839 | 1.037895 | 39.011506 | 2.806961 | 64.085503 | 1.230914 | 5.141591 | 5.661359 | 0.0 | 3.935123 | 3.406185 |
| Georgia | 37.702088 | 59.912014 | 38.627747 | 0.0 | 53.573344 | 0.465678 | 36.658015 | 0.076222 | 20.743087 | 1.044073 | 3.944018 | 8.302466 | 0.0 | 1.79649 | 2.797678 |
| Hawaii | 1.551589 | 16.884229 | 0.008648 | 0.0 | 10.166408 | 0.000106 | 1.526119 | 5.050966 | 0.0 | 0.02547 | 1.301315 | 0.001539 | 0.0 | 0.322886 | 0.005412 |
| Iowa | 28.120345 | 28.183284 | 16.828779 | 0.0 | 20.855202 | 0.503826 | 23.57353 | 0.069273 | 1.062884 | 4.267646 | 5.564446 | 9.637584 | 0.279169 | 0.695859 | 2.511543 |
| Idaho | 0.22857 | 12.298536 | 5.854803 | 0.0 | 10.443904 | 0.315952 | 0.0 | 0.0 | 1.253994 | 0.22857 | 1.210882 | 1.929671 | 0.0 | 0.390809 | 0.9755 |
| Illinois | 66.214247 | 82.398941 | 55.453767 | 0.0 | 66.971641 | 1.324931 | 58.533361 | 0.056472 | 7.813125 | 7.459538 | 12.628601 | 13.842272 | 0.221348 | 1.574014 | 11.51786 |
| Indiana | 89.142017 | 51.322376 | 41.38871 | 0.0 | 42.854033 | 0.506691 | 73.458255 | 0.54409 | 9.650106 | 15.541507 | 6.116061 | 20.328662 | 0.142255 | 1.036828 | 4.058198 |
| Kansas | 23.903218 | 23.402334 | 14.757128 | 0.0 | 17.592105 | 1.079915 | 23.688839 | 0.027943 | 1.122042 | 0.214379 | 4.89786 | 7.683401 | 0.0 | 0.484509 | 1.906644 |
| Kentucky | 69.55201 | 39.296103 | 15.072179 | 0.0 | 31.725487 | 0.504993 | 67.595258 | 1.370545 | 3.634753 | 1.922734 | 4.925973 | 6.618554 | 0.034018 | 0.863352 | 1.829871 |
| Alaska | 1.563846 | 15.789255 | 17.556345 | 0.0 | 11.82811 | 0.024884 | 0.91076 | 0.340367 | 1.497896 | 0.001598 | 2.535799 | 14.241524 | 0.651488 | 0.533472 | 0.847315 |


## Using Boolean arrays for selection
### Using Boolean arrays as selectors or masks is common and useful - see [this](https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html) from the book by Jake Vanderplas

### For example, we can use `==` to create a Boolean array that selects a single row (in this case a state)

#### Which element of `state_names` is `Idaho`?

* Set the variable `idaho_boolean_array` to an array of Booleans that is the same size as `state_names` and has `True` at the index of `Idaho` and `False` elsewhere.

In [15]:
print('state_names=', state_names)

idaho_boolean_array = (state_names == 'Idaho')

print('Boolean array for selecting "Idaho" is', idaho_boolean_array)

state_names= ['Florida' 'Georgia' 'Hawaii' 'Iowa' 'Idaho' 'Illinois' 'Indiana' 'Kansas'
 'Kentucky' 'Alaska']
Boolean array for selecting "Idaho" is [False False False False  True False False False False False]


### What is the offset (index) of `Idaho` in `state_names`?

* Set the variable `idaho_offset` to the offset in `idaho_boolean_array` that corresponds to `'Idaho'` in `state_names`.

* **Note:** Use the fact that `True` is `1` in Python and `False` is `0`. `np.argmax` returns the index of the first maximum value, which in a Boolean array is the first `True`.

In [27]:

idaho_offset = np.argmax(idaho_boolean_array)

print('Idaho is at offset', idaho_offset, 'within', state_names)

Idaho is at offset 4 within ['Florida' 'Georgia' 'Hawaii' 'Iowa' 'Idaho' 'Illinois' 'Indiana' 'Kansas'
 'Kentucky' 'Alaska']
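Be careful with `np.argmax`, as the sketch below shows on a short hypothetical `names` array. If the name is missing, the mask is all `False`, every value ties at `0`, and `np.argmax` returns `0` without complaint. `np.flatnonzero` is a standard NumPy function, used here as an alternative to the lesson's approach. It returns every matching index, so a missing name shows up as an empty result.

```python
import numpy as np

names = np.array(['Florida', 'Georgia', 'Idaho'])

# np.argmax returns the index of the first True ...
print(np.argmax(names == 'Idaho'))      # 2
# ... but if nothing matches, every value is False and it returns 0
print(np.argmax(names == 'Texas'))      # 0, which silently points at 'Florida'

# np.flatnonzero lists every index where the mask is True, so a miss is visible
print(np.flatnonzero(names == 'Idaho')) # [2]
print(np.flatnonzero(names == 'Texas')) # []
```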


### What are the $CO_2$ emissions from different categories for the state of Idaho?
#### In other words, get the row that corresponds to `Idaho` in `state_co2`


In [23]:

idaho_co2 = state_co2[idaho_offset, :]

# Don't worry about this reshape; it only turns the row into a one-row table for printing.
comp116.array_to_html(np.reshape(idaho_co2, (1, len(idaho_co2))),
                      row_names=['Idaho'], col_names=state_co2_categories)

# And compare to the table to make sure we got it right
comp116.array_to_html(state_co2, row_names=state_names, col_names=state_co2_categories,
                      title='2016 State CO2 emissions by source type')

| State | Total coal | Total petro | Total nat gas | Transportation coal | Transportation petro | Transportation nat gas | Electrical coal | Electrical petro | Electrical nat gas | Industrial coal | Industrial petro | Industrial nat gas | Commercial coal | Commercial petro | Commercial nat gas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Idaho | 0.22857 | 12.298536 | 5.854803 | 0.0 | 10.443904 | 0.315952 | 0.0 | 0.0 | 1.253994 | 0.22857 | 1.210882 | 1.929671 | 0.0 | 0.390809 | 0.9755 |


| State | Total coal | Total petro | Total nat gas | Transportation coal | Transportation petro | Transportation nat gas | Electrical coal | Electrical petro | Electrical nat gas | Industrial coal | Industrial petro | Industrial nat gas | Commercial coal | Commercial petro | Commercial nat gas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Florida | 40.24242 | 114.793297 | 75.027273 | 0.0 | 102.558839 | 1.037895 | 39.011506 | 2.806961 | 64.085503 | 1.230914 | 5.141591 | 5.661359 | 0.0 | 3.935123 | 3.406185 |
| Georgia | 37.702088 | 59.912014 | 38.627747 | 0.0 | 53.573344 | 0.465678 | 36.658015 | 0.076222 | 20.743087 | 1.044073 | 3.944018 | 8.302466 | 0.0 | 1.79649 | 2.797678 |
| Hawaii | 1.551589 | 16.884229 | 0.008648 | 0.0 | 10.166408 | 0.000106 | 1.526119 | 5.050966 | 0.0 | 0.02547 | 1.301315 | 0.001539 | 0.0 | 0.322886 | 0.005412 |
| Iowa | 28.120345 | 28.183284 | 16.828779 | 0.0 | 20.855202 | 0.503826 | 23.57353 | 0.069273 | 1.062884 | 4.267646 | 5.564446 | 9.637584 | 0.279169 | 0.695859 | 2.511543 |
| Idaho | 0.22857 | 12.298536 | 5.854803 | 0.0 | 10.443904 | 0.315952 | 0.0 | 0.0 | 1.253994 | 0.22857 | 1.210882 | 1.929671 | 0.0 | 0.390809 | 0.9755 |
| Illinois | 66.214247 | 82.398941 | 55.453767 | 0.0 | 66.971641 | 1.324931 | 58.533361 | 0.056472 | 7.813125 | 7.459538 | 12.628601 | 13.842272 | 0.221348 | 1.574014 | 11.51786 |
| Indiana | 89.142017 | 51.322376 | 41.38871 | 0.0 | 42.854033 | 0.506691 | 73.458255 | 0.54409 | 9.650106 | 15.541507 | 6.116061 | 20.328662 | 0.142255 | 1.036828 | 4.058198 |
| Kansas | 23.903218 | 23.402334 | 14.757128 | 0.0 | 17.592105 | 1.079915 | 23.688839 | 0.027943 | 1.122042 | 0.214379 | 4.89786 | 7.683401 | 0.0 | 0.484509 | 1.906644 |
| Kentucky | 69.55201 | 39.296103 | 15.072179 | 0.0 | 31.725487 | 0.504993 | 67.595258 | 1.370545 | 3.634753 | 1.922734 | 4.925973 | 6.618554 | 0.034018 | 0.863352 | 1.829871 |
| Alaska | 1.563846 | 15.789255 | 17.556345 | 0.0 | 11.82811 | 0.024884 | 0.91076 | 0.340367 | 1.497896 | 0.001598 | 2.535799 | 14.241524 | 0.651488 | 0.533472 | 0.847315 |
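The `np.argmax` step is not strictly necessary, because a Boolean mask can index the rows directly. The sketch below uses stand-in data copied from two columns of the table above; `names`, `co2` and `idaho_mask` are illustrative names. It shows one practical difference: the mask keeps the row axis, so the result is already a one-row table and needs no reshape.

```python
import numpy as np

names = np.array(['Florida', 'Idaho', 'Kansas'])
co2 = np.array([[40.24242, 114.793297],
                [0.22857, 12.298536],
                [23.903218, 23.402334]])

idaho_mask = (names == 'Idaho')

row_by_offset = co2[np.argmax(idaho_mask), :]  # integer index drops the row axis
row_by_mask = co2[idaho_mask, :]               # Boolean mask keeps it

print(row_by_offset.shape)  # (2,)
print(row_by_mask.shape)    # (1, 2)
```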


### Design patterns:
* In software development, certain sequences of operations recur over and over again.
* In these cases, it is worthwhile to capture the sequence as what is called a design pattern.
* The sequence of steps we used to isolate the data in `state_co2` for `Idaho` can be viewed as a design pattern.

<br>

* Now, from looking at the table, we can see that `Idaho` is at index (offset) 4
* So why not just use that to isolate the row corresponding to `Idaho`?


In [21]:
# Why is this not a good idea in general?
idaho_co2 = state_co2[4, :]
comp116.array_to_html(np.reshape(idaho_co2, (1,len(idaho_co2))), 
                      row_names=['Idaho'], col_names=state_co2_categories)

| State | Total coal | Total petro | Total nat gas | Transportation coal | Transportation petro | Transportation nat gas | Electrical coal | Electrical petro | Electrical nat gas | Industrial coal | Industrial petro | Industrial nat gas | Commercial coal | Commercial petro | Commercial nat gas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Idaho | 0.22857 | 12.298536 | 5.854803 | 0.0 | 10.443904 | 0.315952 | 0.0 | 0.0 | 1.253994 | 0.22857 | 1.210882 | 1.929671 | 0.0 | 0.390809 | 0.9755 |
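Here is one answer to the question in the cell above, as a sketch: a hard-coded offset depends on the current row order. The sketch takes the first five states and their `Total coal` values (copied from the table above) and sorts them alphabetically. The sort is an assumed processing step chosen for illustration. Afterwards, offset 4 no longer points at Idaho, but the mask-based lookup still does.

```python
import numpy as np

names = np.array(['Florida', 'Georgia', 'Hawaii', 'Iowa', 'Idaho'])
totals = np.array([40.24242, 37.702088, 1.551589, 28.120345, 0.22857])  # Total coal

# Suppose the rows get sorted alphabetically by some later processing step
order = np.argsort(names)
names, totals = names[order], totals[order]

print(names[4])                 # 'Iowa' now sits at offset 4, not 'Idaho'
print(totals[names == 'Idaho']) # the mask still finds the right row
```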


### In the next cell, write the code to get the row in `state_co2` that corresponds to `Alaska`

In [22]:
# Get the state_co2 row for Alaska

alaska_boolean_array = (state_names == 'Alaska')
alaska_offset = np.argmax(alaska_boolean_array)
alaska_co2 = state_co2[alaska_offset, :]


comp116.array_to_html(np.reshape(alaska_co2, (1,len(alaska_co2))), 
                      row_names=['Alaska'], col_names=state_co2_categories)


| State | Total coal | Total petro | Total nat gas | Transportation coal | Transportation petro | Transportation nat gas | Electrical coal | Electrical petro | Electrical nat gas | Industrial coal | Industrial petro | Industrial nat gas | Commercial coal | Commercial petro | Commercial nat gas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alaska | 1.563846 | 15.789255 | 17.556345 | 0.0 | 11.82811 | 0.024884 | 0.91076 | 0.340367 | 1.497896 | 0.001598 | 2.535799 | 14.241524 | 0.651488 | 0.533472 | 0.847315 |


### So we have seen the design pattern for selecting a row from a NumPy array.
### But you can also use the pattern to select a column

* In the cell below, get the data that corresponds to the total $CO_2$ emitted by the different states from the use of natural gas, i.e. isolate the column corresponding to `Total nat gas` in `state_co2`.

In [28]:
# Getting the column that corresponds to 'Total nat gas'
print(state_co2_categories)


totalNG_boolean_array = (state_co2_categories == 'Total nat gas')
totalNG_offset = np.argmax(totalNG_boolean_array)
totalNG_states = state_co2[:, totalNG_offset]
comp116.array_to_html(totalNG_states, 
                      row_names=state_names, col_names=['Total nat gas'])


['Total coal' 'Total petro' 'Total nat gas' 'Transportation coal'
 'Transportation petro' 'Transportation nat gas' 'Electrical coal'
 'Electrical petro' 'Electrical nat gas' 'Industrial coal'
 'Industrial petro' 'Industrial nat gas' 'Commercial coal'
 'Commercial petro' 'Commercial nat gas']


| State | Total nat gas |
|---|---|
| Florida | 75.027273 |
| Georgia | 38.627747 |
| Hawaii | 0.008648 |
| Iowa | 16.828779 |
| Idaho | 5.854803 |
| Illinois | 55.453767 |
| Indiana | 41.38871 |
| Kansas | 14.757128 |
| Kentucky | 15.072179 |
| Alaska | 17.556345 |
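Because the same steps work for rows and columns, you can wrap the design pattern in a function. The sketch below defines a hypothetical helper, `select_by_label`, which is not part of `comp116`. It uses `np.take` to pick along either axis. When the label is missing or repeated, it raises an error rather than silently returning offset 0.

```python
import numpy as np

def select_by_label(data, labels, label, axis):
    """Hypothetical helper (not part of comp116): return the row (axis=0)
    or column (axis=1) of `data` whose label in `labels` equals `label`."""
    matches = np.flatnonzero(labels == label)   # every index with a matching label
    if matches.size != 1:
        raise KeyError(f'expected exactly one {label!r}, found {matches.size}')
    return np.take(data, matches[0], axis=axis)

# Stand-in data: two states and two categories copied from the table above
names = np.array(['Florida', 'Idaho'])
categories = np.array(['Total coal', 'Total nat gas'])
co2 = np.array([[40.24242, 75.027273],
                [0.22857, 5.854803]])

print(select_by_label(co2, names, 'Idaho', axis=0))               # Idaho row
print(select_by_label(co2, categories, 'Total nat gas', axis=1))  # nat gas column
```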


## Element by Element boolean operators

* We see that Boolean selectors or masks are important for data selection.
* These are arrays of Boolean (`True` or `False`) values.
* Can we do Boolean operations between these selectors?
* Let us try it in the cell below.

The element-by-element operations are formally known as _bitwise_ operations. The name comes from their use on integers, where the logical operation is applied to each bit within a byte.

In [47]:
# Create Boolean selectors called florida_mask and kansas_mask that can be used to select the rows for Florida
# and Kansas, respectively

florida_mask = (state_names == 'Florida')
kansas_mask = (state_names == 'Kansas')


# Now take the logical or of florida_mask and kansas_mask to select either Florida or Kansas

florida_or_kansas_mask = florida_mask or kansas_mask


ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

### What happened there?

* The problem is that `and`, `or`, and `not` need a single `True` or `False` for each whole operand, and NumPy will not reduce an array with more than one element to one truth value.
* What we want here are element-by-element operations

### Element-by-element Boolean operations between Boolean arrays do exist
### These are `&`, `|` and `~`


* The operation is applied element by element.
* These are also called bitwise logical operations because they can be used to do logical operations on the bits in a byte.
* Look at the examples in the cell below.

In [29]:
a = np.array([ False, True, False, True ]) 
b = np.array([False, False, True, True])
print('a is', a)
print('b is', b)
print('element by element a & b', a & b)
print('element by element a | b', a | b)
print('element by element ~a', ~a)

a is [False  True False  True]
b is [False False  True  True]
element by element a & b [False False False  True]
element by element a | b [False  True  True  True]
element by element ~a [ True False  True False]
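Two details matter when you write these expressions yourself; the sketch below shows both on `np.arange(10)`. First, `&` and `|` bind more tightly than comparisons, so each comparison needs its own parentheses. Without them, `arr > 2 & arr < 7` is parsed as the chained comparison `arr > (2 & arr) < 7`, which fails for the same reason `or` failed above. Second, NumPy provides the functions `np.logical_and`, `np.logical_or` and `np.logical_not`, which give the same results on Boolean arrays.

```python
import numpy as np

arr = np.arange(10)

# Without parentheses this is read as arr > (2 & arr) < 7, a chained comparison
try:
    arr > 2 & arr < 7
except ValueError as err:
    print('ValueError:', err)

# With parentheses, each comparison produces a Boolean array before & combines them
between = (arr > 2) & (arr < 7)
print(between)

# The function forms give the same element-by-element results
print(np.array_equal(between, np.logical_and(arr > 2, arr < 7)))              # True
print(np.array_equal(between | (arr == 0), np.logical_or(between, arr == 0)))  # True
print(np.array_equal(~between, np.logical_not(between)))                     # True
```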


### So let us revisit creating a Boolean mask that will pick the Florida and Kansas rows

In [36]:
# Create Boolean selectors called florida_mask and kansas_mask that can be used to select the rows for Florida
# and Kansas, respectively

florida_mask = (state_names == 'Florida')
kansas_mask = (state_names == 'Kansas')

# Now take the logical or of florida_mask and kansas_mask to select either Florida or Kansas

florida_or_kansas_mask = florida_mask | kansas_mask


print('State names are', state_names)
print("Florida: ", florida_mask)
# Experiment on a copy: when a mask has two True entries,
# np.argmax reports only the first one
florida_copy = florida_mask.copy()
florida_copy[3] = True
print("Florida: ", florida_copy)
print("Florida: ", np.argmax(florida_copy))
print("Kansas: ", kansas_mask)
print("Kansas: ", np.argmax(kansas_mask))
print('Florida or Kansas mask is', florida_or_kansas_mask)

State names are ['Florida' 'Georgia' 'Hawaii' 'Iowa' 'Idaho' 'Illinois' 'Indiana' 'Kansas'
 'Kentucky' 'Alaska']
Florida:  [ True False False False False False False False False False]
Florida:  [ True False False  True False False False False False False]
Florida:  0
Kansas:  [False False False False False False False  True False False]
Kansas:  7
Florida or Kansas mask is [ True False False False False False False  True False False]


### So now let us use `florida_or_kansas_mask` to pull out the `state_co2` rows that correspond to these two states

In [56]:
# Use florida_or_kansas_mask to select the rows corresponding to these two states

florida_kansas_co2 = state_co2[florida_or_kansas_mask, :]

print(florida_or_kansas_mask)
comp116.array_to_html(florida_kansas_co2, row_names=state_names[florida_or_kansas_mask], col_names=state_co2_categories)


[ True False False False False False False  True False False]


| State | Total coal | Total petro | Total nat gas | Transportation coal | Transportation petro | Transportation nat gas | Electrical coal | Electrical petro | Electrical nat gas | Industrial coal | Industrial petro | Industrial nat gas | Commercial coal | Commercial petro | Commercial nat gas |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Florida | 40.24242 | 114.793297 | 75.027273 | 0.0 | 102.558839 | 1.037895 | 39.011506 | 2.806961 | 64.085503 | 1.230914 | 5.141591 | 5.661359 | 0.0 | 3.935123 | 3.406185 |
| Kansas | 23.903218 | 23.402334 | 14.757128 | 0.0 | 17.592105 | 1.079915 | 23.688839 | 0.027943 | 1.122042 | 0.214379 | 4.89786 | 7.683401 | 0.0 | 0.484509 | 1.906644 |
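As the list of rows to select grows, chaining `|` gets long. `np.isin` is a NumPy function that builds the same kind of mask from a list of values. The sketch below uses the state names from the output above as stand-in data and checks that the result matches the `|` version.

```python
import numpy as np

state_names = np.array(['Florida', 'Georgia', 'Hawaii', 'Iowa', 'Idaho',
                        'Illinois', 'Indiana', 'Kansas', 'Kentucky', 'Alaska'])

or_mask = (state_names == 'Florida') | (state_names == 'Kansas')
isin_mask = np.isin(state_names, ['Florida', 'Kansas'])

print(np.array_equal(or_mask, isin_mask))  # True
print(state_names[isin_mask])              # ['Florida' 'Kansas']
```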


### Selecting multiple columns using Boolean Selectors

* Just like rows, Boolean element-wise operations can be used to select multiple columns at the same time
* Let us say that we just want to look at the columns that correspond to the total $CO_2$ emissions from coal, petroleum and natural gas, i.e. the `Total coal`, `Total petro` and `Total nat gas` columns.
* Set the variable `total_boolean_array` to a Boolean selector that picks the `state_co2_categories` corresponding to `Total coal`, `Total petro`, or `Total nat gas`.
* Let us do this in 3 steps in the next few cells.


In [48]:
# In the first step, create Boolean selectors for the three columns of interest
print('The state CO2 categories are\n', state_co2_categories)


total_coal_boolean_array = (state_co2_categories == 'Total coal')
total_petro_boolean_array = (state_co2_categories == 'Total petro')
total_nat_gas_boolean_array = (state_co2_categories == 'Total nat gas')


print('The offset of Total coal is\n', np.argmax(total_coal_boolean_array))
print('The offset of Total petro is\n', np.argmax(total_petro_boolean_array))
print('The offset of Total nat gas is\n', np.argmax(total_nat_gas_boolean_array))

The state CO2 categories are
 ['Total coal' 'Total petro' 'Total nat gas' 'Transportation coal'
 'Transportation petro' 'Transportation nat gas' 'Electrical coal'
 'Electrical petro' 'Electrical nat gas' 'Industrial coal'
 'Industrial petro' 'Industrial nat gas' 'Commercial coal'
 'Commercial petro' 'Commercial nat gas']
The offset of Total coal is
 0
The offset of Total petro is
 1
The offset of Total nat gas is
 2


### In step 2, create a `total_boolean_array` that chooses the three total $CO_2$ emission columns by combining the Boolean arrays you created in the last step with element-by-element operations.


In [None]:
# In step 2, use Boolean element-by-element operations to choose 
# the three total CO2 emissions for all states
print('The state CO2 categories are\n', state_co2_categories)


total_boolean_array = total_coal_boolean_array | total_petro_boolean_array | \
    total_nat_gas_boolean_array


print('The Boolean selector that picks all total columns is', total_boolean_array)


The state CO2 categories are
 ['Total coal' 'Total petro' 'Total nat gas' 'Transportation coal'
 'Transportation petro' 'Transportation nat gas' 'Electrical coal'
 'Electrical petro' 'Electrical nat gas' 'Industrial coal'
 'Industrial petro' 'Industrial nat gas' 'Commercial coal'
 'Commercial petro' 'Commercial nat gas']
The Boolean selector that picks all total columns is [ True  True  True False False False False False False False False False
 False False False]


### Finally, in Step 3, use `total_boolean_array` to display the total $CO_2$ emissions from coal, petroleum and natural gas for all states.

In [53]:
# In step 3, use total_boolean_array to show
# the three total CO2 emissions for all states
print('The state CO2 categories are\n', state_co2_categories)
print('The state names are', state_names)

total_co2_all_states = state_co2[:, total_boolean_array]
comp116.array_to_html(total_co2_all_states, 
                      row_names=state_names, 
                      col_names=state_co2_categories[total_boolean_array])

The state CO2 categories are
 ['Total coal' 'Total petro' 'Total nat gas' 'Transportation coal'
 'Transportation petro' 'Transportation nat gas' 'Electrical coal'
 'Electrical petro' 'Electrical nat gas' 'Industrial coal'
 'Industrial petro' 'Industrial nat gas' 'Commercial coal'
 'Commercial petro' 'Commercial nat gas']
The state names are ['Florida' 'Georgia' 'Hawaii' 'Iowa' 'Idaho' 'Illinois' 'Indiana' 'Kansas'
 'Kentucky' 'Alaska']


| State | Total coal | Total petro | Total nat gas |
|---|---|---|---|
| Florida | 40.24242 | 114.793297 | 75.027273 |
| Georgia | 37.702088 | 59.912014 | 38.627747 |
| Hawaii | 1.551589 | 16.884229 | 0.008648 |
| Iowa | 28.120345 | 28.183284 | 16.828779 |
| Idaho | 0.22857 | 12.298536 | 5.854803 |
| Illinois | 66.214247 | 82.398941 | 55.453767 |
| Indiana | 89.142017 | 51.322376 | 41.38871 |
| Kansas | 23.903218 | 23.402334 | 14.757128 |
| Kentucky | 69.55201 | 39.296103 | 15.072179 |
| Alaska | 1.563846 | 15.789255 | 17.556345 |
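Here the three names were typed out one by one. As an alternative, `np.char.startswith` tests each string element for a prefix, so a single call builds a mask for every category that begins with `'Total'`. In the sketch below, the `categories` array is a shortened stand-in for `state_co2_categories`.

```python
import numpy as np

categories = np.array(['Total coal', 'Total petro', 'Total nat gas',
                       'Transportation coal', 'Transportation petro'])

# Element-by-element string test: True where the name begins with 'Total'
total_mask = np.char.startswith(categories, 'Total')
print(total_mask)              # [ True  True  True False False]
print(categories[total_mask])
```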


### Finally, finally, can we combine row and column selectors to isolate a subset of the data?
* Can we use `florida_or_kansas_mask` and `total_boolean_array` to get the total $CO_2$ emissions from coal, petroleum and natural gas for the states of Florida and Kansas?
* We are doing row and column selection here
* Write your code in the cell below.

In [61]:
# Pulling out the total CO2 emissions for Florida and Kansas from the state_co2 array

print(state_co2.shape)
print(florida_or_kansas_mask.shape)
print(total_boolean_array.shape)
total_co2_florida_kansas = state_co2[florida_or_kansas_mask,:][:, total_boolean_array]


comp116.array_to_html(total_co2_florida_kansas, 
                      row_names=state_names[florida_or_kansas_mask], 
                      col_names=state_co2_categories[total_boolean_array])

(10, 15)
(10,)
(15,)


| State | Total coal | Total petro | Total nat gas |
|---|---|---|---|
| Florida | 40.24242 | 114.793297 | 75.027273 |
| Kansas | 23.903218 | 23.402334 | 14.757128 |
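The two-step indexing above first picks rows and then picks columns from the result. Passing both masks at once, as in `state_co2[florida_or_kansas_mask, total_boolean_array]`, does something different. NumPy pairs up the `True` positions element by element, and with 2 selected rows but 3 selected columns that raises an `IndexError`. `np.ix_` turns the two masks into a row/column grid so you can select the block in one step. The sketch below uses small stand-in arrays.

```python
import numpy as np

co2 = np.arange(12).reshape(3, 4)        # stand-in data: 3 rows, 4 columns
row_mask = np.array([True, False, True])
col_mask = np.array([True, True, False, False])

# Two-step version used above: rows first, then columns
two_step = co2[row_mask, :][:, col_mask]

# np.ix_ builds an open mesh from the two masks, selecting the block in one step
one_step = co2[np.ix_(row_mask, col_mask)]

print(one_step.tolist())                   # [[0, 1], [8, 9]]
print(np.array_equal(two_step, one_step))  # True
```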


Things to try:
 * Select other categories by name.
 * Select states by comparing their names: which names are *greater* than some string?
 * Select $CO_2$ values greater than a certain value using a boolean selector
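Here is a possible starting point for the last two ideas. The sketch works on a stand-in copy of the three "Total" columns, with values copied from the table above; `names` and `totals` are illustrative names. String comparison is alphabetical. A 2-D mask applied to a 2-D array returns the matching values as a flat 1-D array. Reducing the mask with `np.any(..., axis=1)` turns it into a row selector instead.

```python
import numpy as np

# Stand-in copy of the three 'Total' columns (million metric tons of CO2, 2016)
names = np.array(['Florida', 'Georgia', 'Hawaii', 'Iowa', 'Idaho',
                  'Illinois', 'Indiana', 'Kansas', 'Kentucky', 'Alaska'])
totals = np.array([[40.24242, 114.793297, 75.027273],
                   [37.702088, 59.912014, 38.627747],
                   [1.551589, 16.884229, 0.008648],
                   [28.120345, 28.183284, 16.828779],
                   [0.22857, 12.298536, 5.854803],
                   [66.214247, 82.398941, 55.453767],
                   [89.142017, 51.322376, 41.38871],
                   [23.903218, 23.402334, 14.757128],
                   [69.55201, 39.296103, 15.072179],
                   [1.563846, 15.789255, 17.556345]])

# String comparison is alphabetical: names at or after 'K'
print(names[names >= 'K'])             # ['Kansas' 'Kentucky']

# A 2-D mask on a 2-D array gives a flat 1-D array of the matching values
big = totals > 50
print(totals[big].size)                # 9

# Reduce the mask across each row to get a row selector instead
print(names[np.any(big, axis=1)])
```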
