25/05/2020 - DataScience - SoloLearn
https://www.sololearn.com/Play/data-science/

# Data Manipulation

Heights (cm) of 45 U.S. presidents and their ages at the start of their presidency, both in chronological order

In [0]:
import numpy as np

heights = [189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191]
ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54, 47, 70]

heights_arr = np.array(heights)

heights_and_ages = heights + ages 
# convert a list to a numpy array
height_age_arr = np.array(heights_and_ages)

height_age_arr = height_age_arr.reshape((2,45))
height_age_arr = height_age_arr.transpose()
height_age_arr

array([[189,  57],
       [170,  61],
       [189,  57],
       [163,  57],
       [183,  58],
       [171,  57],
       [185,  61],
       [168,  54],
       [173,  68],
       [183,  51],
       [173,  49],
       [173,  64],
       [175,  50],
       [178,  48],
       [183,  65],
       [193,  52],
       [178,  56],
       [173,  46],
       [174,  54],
       [183,  49],
       [183,  51],
       [180,  47],
       [168,  55],
       [180,  55],
       [170,  54],
       [178,  42],
       [182,  51],
       [180,  56],
       [183,  55],
       [178,  51],
       [182,  54],
       [188,  51],
       [175,  60],
       [179,  62],
       [183,  43],
       [193,  55],
       [182,  56],
       [183,  61],
       [177,  52],
       [185,  69],
       [188,  64],
       [188,  46],
       [182,  54],
       [185,  47],
       [191,  70]])

## Numpy (**Num**erical **P**ython)

We collected the heights of 45 U.S. presidents in centimeters, in chronological order, and stored them in a list, a built-in data type in Python.

In [0]:
heights = [189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191]

In this example, George Washington was the first president, and his height was 189 cm.

If we wanted to know how many presidents are taller than 188 cm, we could iterate through the list, compare each element against 188, and increase the count by 1 each time the criterion is met.

In [0]:
cnt = 0
for height in heights:
  if height > 188:
    cnt +=1
print(cnt)

5


This shows that there are five presidents who are taller than 188 cm.

### Same problem with Numpy

In [0]:
import numpy as np
heights_arr = np.array(heights)
print((heights_arr > 188).sum())

5

The import statement gives us access to the functions and modules inside the numpy library. The library will be used frequently, so by convention numpy is imported under a shorter name, np. The second line converts the list into a numpy array object, via np.array(), so that the tools provided in numpy can work with it. The last line provides a simple and natural solution, enabled by numpy, to the original question.

As our datasets grow larger and more complicated, numpy allows us the use of a more efficient and for-loop-free method to manipulate and analyze our data. Our dataset example in this module will include the US Presidents' height, age and party.
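To see that the loop and the numpy expression give the same answer, here is a minimal sketch. It uses only the first ten heights from the list above to stay short, and the names `loop_count` and `numpy_count` are illustrative, not part of the course.

```python
import numpy as np

# First ten heights from the list above (a subset, to keep the sketch short).
heights = [189, 170, 189, 163, 183, 171, 185, 168, 173, 183]

# Loop version: test one element at a time.
loop_count = 0
for height in heights:
    if height > 188:
        loop_count += 1

# Numpy version: the comparison yields a boolean array; True counts as 1.
heights_arr = np.array(heights)
numpy_count = int((heights_arr > 188).sum())

print(loop_count, numpy_count)
```

Both counts agree; the numpy version simply replaces the explicit loop with one array expression.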

Fill in the blanks to create a numpy array 'arr' from the list 'lst' given below:
```
import numpy as np
lst = [1,0,1,0]
arr = ________(lst)
```

In [0]:
import numpy as np
lst = [1,0,1,0]
arr = np.array(lst)

### Size (~len: length) and Shape (dimension)

The array class in numpy is called ndarray, short for n-dimensional array. To count the number of presidents in heights_arr, use the attribute numpy.ndarray.size:

In [0]:
heights_arr.size

45

Note that once an array is created in numpy, its size cannot be changed.

Size tells us how big the array is; shape tells us its dimensions. To get the current shape of an array, use the attribute shape:

In [0]:
heights_arr.shape

(45,)

The output is a tuple (recall that the built-in data type tuple is immutable, whereas a list is mutable) containing a single value, indicating that there is only one dimension, i.e., axis 0. Along axis 0, there are 45 elements, one for each president. Here, heights_arr is a 1darray.

For a 1darray, the attribute size is similar to the built-in function len in Python, which computes the length of Python objects such as str, list, and dict.
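The difference between size, len, and shape only shows up once an array has more than one dimension. A small sketch with two made-up arrays (`arr_1d` and `arr_2d` are illustrative names):

```python
import numpy as np

arr_1d = np.array([189, 170, 189, 163])
arr_2d = np.array([[189, 170, 189], [57, 61, 57]])

# For a 1darray, size and len agree.
print(arr_1d.size, len(arr_1d), arr_1d.shape)

# For a 2darray, size counts every element, while len only counts
# the entries along axis 0 (the rows).
print(arr_2d.size, len(arr_2d), arr_2d.shape)
```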

### Reshape

In [0]:
ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54, 47, 70]

Since both heights and ages are all about the same presidents, we can combine them:

In [0]:
heights_and_ages = heights + ages 
# convert a list to a numpy array
heights_and_ages_arr = np.array(heights_and_ages)
heights_and_ages_arr.shape

(90,)

This produces one long array. It would be clearer if we could align height and age for each president and reorganize the data into a 2 by 45 matrix where the first row contains all heights and the second row contains ages. To achieve this, a new array can be created by calling numpy.ndarray.reshape with new dimensions specified in a tuple:

In [0]:
print(heights_and_ages_arr.reshape((2,45)))
heights_and_ages_arr = heights_and_ages_arr.reshape((2,45))


[[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
  174 183 183 180 168 180 170 178 182 180 183 178 182 188 175 179 183 193
  182 183 177 185 188 188 182 185 191]
 [ 57  61  57  57  58  57  61  54  68  51  49  64  50  48  65  52  56  46
   54  49  51  47  55  55  54  42  51  56  55  51  54  51  60  62  43  55
   56  61  52  69  64  46  54  47  70]]


The reshaped array is now a 2darray. Note that reshape returns a new array and leaves the original unchanged, which is why the result is assigned back to heights_and_ages_arr above. We can reshape an array in multiple ways, as long as the size of the reshaped array matches that of the original.

Numpy can calculate the shape (dimension) for us if we indicate the unknown dimension as -1. For example, given a 2darray `arr` of shape (3,4), arr.reshape(-1) would output a 1darray of shape (12,), while arr.reshape((-1,2)) would generate a 2darray of shape (6,2).
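The -1 shorthand can be tried on a small array. This is a sketch only: `np.arange` is used here just to build a (3,4) example quickly, and the variable names are illustrative.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))   # 2darray of shape (3, 4)

flat = arr.reshape(-1)          # numpy infers 12 elements
pairs = arr.reshape((-1, 2))    # numpy infers 6 rows

print(flat.shape, pairs.shape)

# A shape whose size cannot match the original raises an error.
try:
    arr.reshape((5, -1))        # 12 is not divisible by 5
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)
```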

Review the code below and reshape the 1darray of shape (45,) to a 2darray with a shape of (5, 9).
```
heights_arr.______
>>> (45, )
heights_arr_reshaped =
  heights_arr._______((5, 9))
```

In [0]:
heights_arr.shape
heights_arr_reshaped = heights_arr.reshape((5, 9))

### Data Type

Another characteristic of a numpy array is that it is homogeneous, meaning each element must be of the same data type.

For example, in heights_arr, we recorded all heights in whole numbers; thus each element is stored as an integer in the array. To check the data type, use numpy.ndarray.dtype:

In [0]:
heights_arr.dtype

dtype('int64')

If we mixed a float number in, say, the first element is 189.0 instead of 189:

In [0]:
heights_float = [189.0, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191]

Then after converting the list into an array, we’d see all other numbers are coerced into floats:

In [0]:
heights_float_arr = np.array(heights_float)
heights_float_arr
heights_float_arr.dtype

dtype('float64')

Numpy supports several data types such as int (integer), float (numeric floating point), and bool (boolean values, True and False). The number after the data type, e.g., the 64 in int64, represents the size of the data type in bits.
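A short sketch of how the element types of a list decide the dtype of the resulting array. The exact integer width (for example int64 or int32) depends on the platform, so the sketch only checks the kind of type; the variable names are illustrative.

```python
import numpy as np

ints = np.array([189, 170])        # all integers
mixed = np.array([189.0, 170])     # one float coerces every element to float
flags = np.array([True, False])    # booleans

print(ints.dtype, mixed.dtype, flags.dtype)

# itemsize is the number of bytes per element: 8 bytes = 64 bits for float64.
print(mixed.dtype.itemsize)
```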

What is the data type of heights_and_ages_arr?

Recall:

```
heights = [189, 170, 189, 163, ... 182, 185, 191]
ages = [57, 61, 57, ... 62, 43, 55, 56]
```



float, bool, int ?

int

### Indexing

We can use array indexing to select individual elements from arrays. As with Python lists, numpy indexing starts from 0.

To access the height of the 3rd president Thomas Jefferson in the 1darray 'heights_arr':

In [0]:
heights_arr[2]

189

In a 2darray, there are two axes, axis 0 and axis 1. Axis 0 runs vertically down the rows, whereas axis 1 runs horizontally across the columns.

In the 2darray heights_and_ages_arr, recall that its dimensions are (2, 45). To find Thomas Jefferson’s age at the beginning of his presidency, you would need to access the second row, where the ages are stored:

In [0]:
heights_and_ages_arr[1,2]

57

In a 2darray, the row is axis 0 and the column is axis 1; therefore, to access an element of a 2darray, numpy first looks for the position in rows, then in columns. So in our example heights_and_ages_arr[1,2], we are accessing the second row (index 1, the ages) and the third column (index 2, the third president) to find Thomas Jefferson’s age.
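The row-then-column order can be checked on a cut-down version of the data. The sketch below keeps only the first three presidents, so `small` is an illustrative stand-in for heights_and_ages_arr:

```python
import numpy as np

# Row 0 holds heights, row 1 holds ages (first three presidents only).
small = np.array([[189, 170, 189],
                  [57, 61, 57]])

jefferson_height = small[0, 2]   # row index 0, column index 2
jefferson_age = small[1, 2]      # row index 1, column index 2

print(jefferson_height, jefferson_age)

# small[1][2] selects row 1 first and then element 2 of that row,
# which reaches the same value in two steps.
print(small[1][2] == small[1, 2])
```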

Which of the following would correctly select row 5 and column 1 in the 2darray 'arr'?
```
arr[1,5]
arr[5,1]
arr[0,4]
arr[4,0]
```



arr[4,0]

### Slicing

What if we want to inspect the first three elements from the first row in a 2darray? We use ":" to select all the elements from the starting index up to but not including the ending index. This is called slicing.

In [0]:
heights_and_ages_arr[0, 0:3]

array([189, 170, 189])

When the starting index is 0, we can omit it as shown below:

In [0]:
heights_and_ages_arr[0, :3]

array([189, 170, 189])

What if we’d like to see an entire column, say the fourth one (index 3)? Specify this by using a ":" in place of the row index, as follows:

In [0]:
heights_and_ages_arr[:, 3]

array([163,  57])

Numpy slicing syntax follows that of a Python list: arr[start:stop:step]. When any of these are unspecified, they default to start=0, stop=size of the dimension, and step=1.
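A quick sketch of the start:stop:step defaults on a short 1darray (the first six heights; `row` is an illustrative name):

```python
import numpy as np

row = np.array([189, 170, 189, 163, 183, 171])

first_three = row[:3]      # start defaults to 0
from_fourth = row[3:]      # stop defaults to the size of the dimension
every_other = row[::2]     # start and stop default, step is 2
middle = row[1:5:2]        # indices 1 and 3

print(first_three, from_fourth, every_other, middle)
```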

Which of the following is the correct syntax to select the second column of the 2darray heights_and_ages_arr of shape (2, 45)?
```
heights_and_ages_arr[:, 1]
heights_and_ages_arr[, 1]
heights_and_ages_arr[1, :]
```



In [0]:
heights_and_ages_arr[:, 1]

array([170,  61])

### Assigning values

Sometimes you need to change the values of particular elements in the array. For example, suppose we noticed that the fourth entry in heights_arr was incorrect and should be 165 instead of 163. We can re-assign the correct number by:

In [0]:
heights_arr[3] = 165

In a 2darray, single values can be assigned just as easily, using indexing for one element. For example, change the fourth entry in the first row of heights_and_ages_arr to 165:

In [0]:
heights_and_ages_arr[0, 3] = 165
heights_and_ages_arr

array([[189, 170, 189, 165, 183, 171, 185, 168, 173, 183, 173, 173, 175,
        178, 183, 193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178,
        182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177,
        185, 188, 188, 182, 185, 191],
       [ 57,  61,  57,  57,  58,  57,  61,  54,  68,  51,  49,  64,  50,
         48,  65,  52,  56,  46,  54,  49,  51,  47,  55,  55,  54,  42,
         51,  56,  55,  51,  54,  51,  60,  62,  43,  55,  56,  61,  52,
         69,  64,  46,  54,  47,  70]])

Or we can use slicing for multiple elements. For example, to replace the first row by its mean 180 in heights_and_ages_arr:

In [0]:
heights_and_ages_arr[0,:] = 180
heights_and_ages_arr

array([[180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180],
       [ 57,  61,  57,  57,  58,  57,  61,  54,  68,  51,  49,  64,  50,
         48,  65,  52,  56,  46,  54,  49,  51,  47,  55,  55,  54,  42,
         51,  56,  55,  51,  54,  51,  60,  62,  43,  55,  56,  61,  52,
         69,  64,  46,  54,  47,  70]])

We can also combine slices to change any subset of the array. For example, to reassign 0 to the upper-left corner:

In [0]:
heights_and_ages_arr[:2, :2] = 0
heights_and_ages_arr

array([[  0,   0, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180],
       [  0,   0,  57,  57,  58,  57,  61,  54,  68,  51,  49,  64,  50,
         48,  65,  52,  56,  46,  54,  49,  51,  47,  55,  55,  54,  42,
         51,  56,  55,  51,  54,  51,  60,  62,  43,  55,  56,  61,  52,
         69,  64,  46,  54,  47,  70]])

It is easy to update values in a subarray when you combine arrays with slicing. For more on basic slicing and advanced indexing in numpy check out this [link](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).
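Because slice assignment overwrites values in place, it can be worth practicing on a copy so that the original data survives. A minimal sketch, assuming a cut-down array `data` and a working copy `work` (both names are illustrative):

```python
import numpy as np

data = np.array([[189, 170, 189],
                 [57, 61, 57]])
work = data.copy()      # independent copy; edits to work leave data untouched

work[0, :] = 180        # overwrite the whole first row
work[:, :2] = 0         # overwrite the first two columns of every row

print(work)
print(data)             # still the original values
```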

Replace the value in the second row and third column of the array heights_and_ages_arr with 2.
```
heights_and_ages_arr[__,__] = 2
```

In [0]:
heights_and_ages_arr[1,2] = 2
heights_and_ages_arr

array([[  0,   0, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180],
       [  0,   0,   2,  57,  58,  57,  61,  54,  68,  51,  49,  64,  50,
         48,  65,  52,  56,  46,  54,  49,  51,  47,  55,  55,  54,  42,
         51,  56,  55,  51,  54,  51,  60,  62,  43,  55,  56,  61,  52,
         69,  64,  46,  54,  47,  70]])

### Assigning an array to an array

In addition, a 1darray or a 2darray can be assigned to a subset of another 2darray, as long as their shapes match. Recall the 2darray heights_and_ages_arr:

In [0]:
heights_and_ages_arr

array([[  0,   0, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180],
       [  0,   0,   2,  57,  58,  57,  61,  54,  68,  51,  49,  64,  50,
         48,  65,  52,  56,  46,  54,  49,  51,  47,  55,  55,  54,  42,
         51,  56,  55,  51,  54,  51,  60,  62,  43,  55,  56,  61,  52,
         69,  64,  46,  54,  47,  70]])

If we want to update both height and age of the first president with new data, we can supply the data in a list:

In [0]:
heights_and_ages_arr[:, 0] = [190, 58]
heights_and_ages_arr

array([[190,   0, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180],
       [ 58,   0,   2,  57,  58,  57,  61,  54,  68,  51,  49,  64,  50,
         48,  65,  52,  56,  46,  54,  49,  51,  47,  55,  55,  54,  42,
         51,  56,  55,  51,  54,  51,  60,  62,  43,  55,  56,  61,  52,
         69,  64,  46,  54,  47,  70]])

We can also update data in a subarray with a numpy array as such:

In [0]:
new_record = np.array([[180, 183, 190], [54, 50, 69]])
heights_and_ages_arr[:, 42:] = new_record
heights_and_ages_arr

array([[190,   0, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 183, 190],
       [ 58,   0,   2,  57,  58,  57,  61,  54,  68,  51,  49,  64,  50,
         48,  65,  52,  56,  46,  54,  49,  51,  47,  55,  55,  54,  42,
         51,  56,  55,  51,  54,  51,  60,  62,  43,  55,  56,  61,  52,
         69,  64,  46,  54,  50,  69]])

Note the last three columns' values have changed.

Updating a multidimensional array with a new record is straightforward in numpy as long as their shapes match.
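What "shapes match" means in practice can be seen by trying one assignment that fits and one that does not. This is a sketch on a small placeholder array (`arr` is filled with zeros purely for illustration):

```python
import numpy as np

arr = np.zeros((2, 5), dtype=int)
new_record = np.array([[180, 183, 190],
                       [54, 50, 69]])      # shape (2, 3)

arr[:, 2:] = new_record      # target slice has shape (2, 3): fits

try:
    arr[:, 3:] = new_record  # target slice has shape (2, 2): does not fit
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(arr)
print(mismatch_raised)
```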

Drag and drop to update both heights and ages for the first five presidents in heights_and_ages_arr with a numpy array new_record:
```
new_record = np.__([[188, 190, 189, 165, 180],
                       [58, 62, 55, 68, 80]])
new_record.shape
>>> __
heights_and_ages_arr[:2,__] = new_record

ndarray 
(2, 5)
(10, 1)
:5
:4
array
list
```

In [0]:
new_record = np.array([[188, 190, 189, 165, 180],
                       [58, 62, 55, 68, 80]])
new_record.shape

(2, 5)

In [0]:
heights_and_ages_arr[:2,:5] = new_record
heights_and_ages_arr

array([[188, 190, 189, 165, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180, 180,
        180, 180, 180, 180, 183, 190],
       [ 58,  62,  55,  68,  80,  57,  61,  54,  68,  51,  49,  64,  50,
         48,  65,  52,  56,  46,  54,  49,  51,  47,  55,  55,  54,  42,
         51,  56,  55,  51,  54,  51,  60,  62,  43,  55,  56,  61,  52,
         69,  64,  46,  54,  50,  69]])

### Combining two arrays

Often we obtain data stored in different arrays and need to combine them into one array to keep everything in one place. For example, instead of having the ages stored in a list, they could be stored in a 2darray of shape (45, 1):

In [0]:
heights_arr = np.array(heights)
heights_arr.shape
ages_arr = np.array(ages)
# reshape the ages into a single column so that ages_arr is a 2darray
ages_arr = ages_arr.reshape((45,1))
ages_arr.shape
ages_arr[:3,]

array([[57],
       [61],
       [57]])

If we reshape heights_arr to (45,1), the same as 'ages_arr', we can stack them horizontally (by column) to get a 2darray using 'hstack':

In [0]:
heights_arr = heights_arr.reshape((45,1))
height_age_arr = np.hstack((heights_arr, ages_arr))
height_age_arr.shape
height_age_arr[:3,]

array([[189,  57],
       [170,  61],
       [189,  57]])

Now height_age_arr has both heights and ages for the presidents: each row corresponds to one president, the first column holds the heights, and the second column holds the ages.

Similarly, if we want to combine the arrays vertically (by row), we can use 'vstack'.

In [0]:
heights_arr = heights_arr.reshape((1,45))
ages_arr = ages_arr.reshape((1,45))

height_age_arr = np.vstack((heights_arr, ages_arr))
height_age_arr.shape
height_age_arr[:,:3]

array([[189, 170, 189],
       [ 57,  61,  57]])

To combine more than two arrays horizontally, simply add the additional arrays into the tuple.
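As a sketch of stacking three arrays at once, the example below adds a third column to the first three presidents. The `order` column (each president's position in the sequence) is a made-up extra for illustration and is not part of the course dataset.

```python
import numpy as np

heights_col = np.array([189, 170, 189]).reshape((3, 1))
ages_col = np.array([57, 61, 57]).reshape((3, 1))
order = np.array([1, 2, 3]).reshape((3, 1))   # hypothetical extra column

# All three arrays go into one tuple.
combined = np.hstack((heights_col, ages_col, order))

print(combined.shape)
print(combined)
```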

Fill in the blank: To combine arr1 of shape (10, 2) and arr2 of shape (5, 2) into a new array arr3 of shape (15, 2):
```
arr3 = np.___((arr1, arr2))
```

In [0]:
arr1, arr2 = np.zeros((10, 2)), np.zeros((5, 2))  # placeholder arrays with the shapes from the question
arr3 = np.vstack((arr1, arr2))

Visualization 😁

hstack:
[ [ 🍔 🍕 ] [ 🍔 🍕 ] [ 🍔 🍕 ] ]

vstack:
[ [ 🍔 🍔 🍔 ]
  [🍕 🍕 🍕  ] ]

### Concatenate

More generally, we can use the function numpy.concatenate. To link two 2darrays together horizontally (side by side, along the columns), pass 'axis = 1' to achieve the same result as numpy.hstack; pass 'axis = 0' to combine them vertically, as numpy.vstack does.

In the previous part we used hstack to combine two arrays horizontally. The same result can be obtained with concatenate:

In [0]:
# reshape both arrays back into columns of shape (45,1), then join them side by side
height_age_arr = np.concatenate((heights_arr.reshape((45,1)), ages_arr.reshape((45,1))), axis=1)

You can use np.hstack to concatenate arrays ONLY if they have the same number of rows.
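How the axis argument lines up with the stacking helpers can be checked on two small made-up 2darrays (`a` and `b` are illustrative names):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 joins along the rows (vertically), like vstack.
same_as_vstack = np.array_equal(np.concatenate((a, b), axis=0), np.vstack((a, b)))

# axis=1 joins along the columns (horizontally), like hstack.
same_as_hstack = np.array_equal(np.concatenate((a, b), axis=1), np.hstack((a, b)))

print(same_as_vstack, same_as_hstack)
print(np.concatenate((a, b), axis=1).shape)
```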

Fill in the blanks: To concatenate arr1 of shape (5, 2) and arr2 of shape (5, 1) horizontally:

```
np.concatenate((arr1, arr2), axis= __)
```



In [0]:
arr1 = np.random.rand(5,2)
arr2 = np.random.rand(5,1)
np.concatenate((arr1, arr2), axis= 1)

array([[0.80701412, 0.50448331, 0.1066778 ],
       [0.03232503, 0.75743669, 0.40894384],
       [0.09415603, 0.14354504, 0.02587719],
       [0.54053785, 0.61959336, 0.05005252],
       [0.37322634, 0.04261709, 0.86178055]])

### Operations

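Performing mathematical operations on arrays is straightforward. For instance, to convert the heights from centimeters to feet, knowing that 1 centimeter is equal to 0.0328084 feet, we can use multiplication:

In [0]:
height_age_arr[:,0]*0.0328084

array([6.2007876, 5.577428 , 6.2007876, 5.3477692, 6.0039372, 5.6102364,
       6.069554 , 5.5118112, 5.6758532, 6.0039372, 5.6758532, 5.6758532,
       5.74147  , 5.8398952, 6.0039372, 6.3320212, 5.8398952, 5.6758532,
       5.7086616, 6.0039372, 6.0039372, 5.905512 , 5.5118112, 5.905512 ,
       5.577428 , 5.8398952, 5.9711288, 5.905512 , 6.0039372, 5.8398952,
       5.9711288, 6.1679792, 5.74147  , 5.8727036, 6.0039372, 6.3320212,
       5.9711288, 6.0039372, 5.8070868, 6.069554 , 6.1679792, 6.1679792,
       5.9711288, 6.069554 , 6.2664044])

Now we have all heights in feet. Note that this operation won’t change the original array; it returns a new 1darray in which each element of the first column of 'height_age_arr' has been multiplied by 0.0328084.

Other mathematical operations for addition, subtraction, division and power (+, -, /, **) work the same way on arrays.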
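A small sketch of element-wise arithmetic on the first three heights, showing that the source array is left as it was. The names `heights_cm`, `heights_ft`, and `heights_m` are illustrative.

```python
import numpy as np

heights_cm = np.array([189, 170, 189])

heights_ft = heights_cm * 0.0328084   # multiply every element
heights_m = heights_cm / 100          # divide every element

print(heights_ft)
print(heights_m)
print(heights_cm)                     # unchanged by the operations above
```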

### Method

In addition, there are several methods in numpy to perform more complex calculations on arrays. For example, the sum() method finds the sum of all the elements in an array:

In [0]:
height_age_arr.sum()

10575

The sum of all heights and ages is 10575. In order to sum all heights and sum all ages separately, we can specify axis=0 to calculate the sum across the rows, that is, to compute the sum of each column (the column sums). On the other hand, to obtain the row sums, specify axis=1. In this example, we want the total of the heights and the total of the ages, respectively:

In [0]:
height_age_arr.sum(axis=0)

array([8100, 2475])

The output is the column sums: heights of all presidents (i.e., the first column) add up to 8100, and the sum of ages (i.e., the second column) is 2475.

Other operations, such as .min(), .max(), .mean(), work in a similar way to .sum().
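The effect of the axis argument is easiest to see on a tiny array. The sketch below uses the first three presidents, one row per president with height and age as the two columns (`small` is an illustrative name):

```python
import numpy as np

small = np.array([[189, 57],
                  [170, 61],
                  [189, 57]])

total = small.sum()             # every element
col_sums = small.sum(axis=0)    # one sum per column: heights, ages
row_sums = small.sum(axis=1)    # one sum per row: one per president
col_max = small.max(axis=0)     # other methods accept axis the same way

print(total, col_sums, row_sums, col_max)
```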

### Comparisons

In practicing data science, we often encounter comparisons to identify rows that match certain values. We can use operations including "<", ">", ">=", "<=", and "==" to do so. For example, in the height_age_arr dataset, we might be interested in only those presidents who started their presidency younger than 55 years old.

In [0]:
height_age_arr[:, 1] < 55

array([False, False, False, False, False, False, False,  True, False,
        True,  True, False,  True,  True, False,  True, False,  True,
        True,  True,  True,  True, False, False,  True,  True,  True,
       False, False,  True,  True,  True, False, False,  True, False,
       False, False,  True, False, False,  True,  True,  True, False])


The output is a 1darray with boolean values that indicates which presidents meet the criterion. If we are only interested in which presidents started their presidency at 51 years of age, we can use "==" instead.

In [0]:
height_age_arr[:, 1] == 51

array([False, False, False, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False, False,  True,
       False, False,  True, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False])

To find out how many rows satisfy the condition, use .sum() on the resultant 1d boolean array, e.g., (height_age_arr[:, 1] == 51).sum(), to see that there were exactly five presidents who started the presidency at age 51. True is treated as 1 and False as 0 in the sum.

Complete the code to count exactly how many presidents had a height of 170cm.

In [0]:
(height_age_arr[:, 0] == 170).sum()

2
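The comparison-then-count pattern can be sketched on the first eight ages. `np.count_nonzero` is shown as an alternative to .sum(); it is not used in the course text and is included here only for comparison.

```python
import numpy as np

ages_subset = np.array([57, 61, 57, 57, 58, 57, 61, 54])   # first eight ages

is_57 = ages_subset == 57              # boolean array, one entry per president
count_sum = int(is_57.sum())           # True counts as 1, False as 0
count_nz = int(np.count_nonzero(is_57))

print(is_57)
print(count_sum, count_nz)
```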

### Mask and Subsetting

Now that rows matching certain criteria can be identified, a subset of the data can be found. For example, instead of the entire dataset, we want only tall presidents, that is, those presidents whose height is greater than or equal to 182 cm. We first create a mask, a 1darray with boolean values:

In [0]:
mask = height_age_arr[:, 0] >= 182
mask.sum()

23

Then pass it to the first axis of `height_age_arr` to filter out presidents who don’t meet the criterion:

In [0]:
tall_presidents = height_age_arr[mask, ]
tall_presidents.shape

(23, 2)

This is a subarray of height_age_arr, and all presidents in tall_presidents were at least 182cm tall.

Masking is used to extract, modify, count, or otherwise manipulate values in an array based on some criterion. In our example, the criterion was a height of 182cm or taller.

Fill in the blanks to obtain a subset of presidents who started presidency under age 50.
```
mask = height_age_arr[:,1] __ 50
young_president = height_age_arr[__,]
```



In [0]:
mask = height_age_arr[:,1] < 50
young_president = height_age_arr[mask,]
young_president

array([[173,  49],
       [178,  48],
       [173,  46],
       [183,  49],
       [180,  47],
       [178,  42],
       [183,  43],
       [188,  46],
       [185,  47]])
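The three uses of a mask mentioned in this section (extract, count, modify) can be sketched on a four-row array. The rows are real entries from the dataset, but `small`, `tall`, and `capped` are illustrative names, and capping heights at 182 is an arbitrary example of a modification.

```python
import numpy as np

small = np.array([[189, 57],
                  [170, 61],
                  [183, 58],
                  [163, 57]])
mask = small[:, 0] >= 182        # one boolean per row

tall = small[mask]               # extract the matching rows
n_tall = int(mask.sum())         # count them

capped = small.copy()            # work on a copy to keep small intact
capped[mask, 0] = 182            # modify only the matching heights

print(tall)
print(n_tall)
print(capped[:, 0])
```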

### Multiple Criteria


We can create a mask satisfying more than one criterion. For example, in addition to height, we want to find those presidents that were 50 years old or younger at the start of their presidency. To achieve this, we use & to combine the conditions, with each condition enclosed in parentheses "()" as shown below:

In [0]:
mask = (height_age_arr[:, 0]>=182) & (height_age_arr[:,1]<=50)
height_age_arr[mask,]

array([[183,  49],
       [183,  43],
       [188,  46],
       [185,  47]])

The results show us that there are four presidents who satisfy both conditions.
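Besides & (and), boolean arrays can also be combined with | (or) and negated with ~ (not). These two operators go beyond what the text above covers and are shown only as a sketch on four made-up rows. The parentheses matter because &, | and ~ bind more tightly than comparisons in Python.

```python
import numpy as np

h = np.array([189, 170, 183, 163])   # illustrative heights
a = np.array([57, 61, 49, 57])       # illustrative ages

both = (h >= 182) & (a <= 50)        # tall AND young
either = (h >= 182) | (a <= 50)      # tall OR young
neither = ~either                    # NOT (tall OR young)

print(both)
print(either)
print(neither)
```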

Type in the code to create a mask that identifies the rows where the presidents are shorter than 180cm and started their presidency younger than 60 years old.
```
__ = (height_age_arr[:,0]<180) & (height_age_arr[:, 1]< __)
```



In [0]:
mask = (height_age_arr[:,0]<180) & (height_age_arr[:, 1]< 60)
height_age_arr[mask,]

array([[163,  57],
       [171,  57],
       [168,  54],
       [173,  49],
       [175,  50],
       [178,  48],
       [178,  56],
       [173,  46],
       [174,  54],
       [168,  55],
       [170,  54],
       [178,  42],
       [178,  51],
       [177,  52]])

### Quiz

Fill in the blanks to create a numpy array from the list given:
```
import numpy as __
lst = [1,-1,1,-1]
arr = np.__(lst)
list array arr np
```



In [0]:
import numpy as np
lst = [1,-1,1,-1]
arr = np.array(lst)
arr

array([ 1, -1,  1, -1])

What is the correct shape of  arr = np.array([1,2,3])?
```
(3, )
(3, 1)
(1, 3)
```

In [0]:
arr = np.array([1,2,3])
arr.shape

(3,)

To compute the column sums of the 2darray `arr`, which sum() method would we use?
```
arr.sum(axis = 0)
arr.sum()
arr.sum(axis = 1)
```



In [0]:
arr.sum(axis = 0)

6

Fill in the blanks to find the column minimums.
```
import numpy as np
arr = np.array([[ 1, 2, 3], [2, 4, 6]])
arr.__(__=0)
```


In [0]:
arr = np.array([[ 1, 2, 3], [2, 4, 6]])
arr.min(axis=0)

array([1, 2, 3])

Which of the following commands is to multiply the first row of a 2darray arr by 3:
```
arr[:,0]*3
arr[0,:]*3
arr[0] * 3
```



In [0]:
arr[0,:]*3

array([3, 6, 9])

# Data Analysis