## [Numpy Exercises for Data Analysis](https://www.machinelearningplus.com/101-numpy-exercises-python/)

<span style="color: red">1.Import numpy as np and see the version</span>

Q. Import numpy as `np` and print the version number.

In [1]:
import numpy as np
print(np.__version__)

1.13.3


<span style="color: red">2.How to create a 1D array?</span>

Q. Create a 1D array of numbers from 0 to 9

In [2]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

<span style="color: red">3.How to create a boolean array?</span>

Q. Create a 3×3 numpy array of all True’s

In [3]:
np.ones((3, 3), dtype=bool)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

In [4]:
np.zeros((3, 3), dtype=bool)

array([[False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

<span style="color: red">4.How to extract items that satisfy a given condition from 1D array?</span>

Input:  
`arr = np.arange(10)`  
Desired Output:  
`array([1, 3, 5, 7, 9])`

In [5]:
arr = np.arange(10)
arr[arr % 2 == 1]

array([1, 3, 5, 7, 9])

<span style="color: red">5.How to replace items that satisfy a condition with another value in numpy array?</span>

Q. Replace all odd numbers in **arr** with -1

In [6]:
arr = np.arange(10)
arr[arr % 2 == 1] = -1
arr

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

<span style="color: red">6.How to replace items that satisfy a condition without affecting the original array?</span>

Q. Replace all odd numbers in **arr** with -1 without changing **arr**

In [7]:
arr = np.arange(10)
np.where(arr % 2 == 1, -1, arr)

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

In [8]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

<span style="color: red">7.How to reshape an array?</span>

Q. Convert a 1D array to a 2D array with 2 rows

In [9]:
arr = np.arange(10)
arr.reshape((2, 5))

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

<span style="color: red">8.How to stack two arrays vertically?</span>

Q. Stack arrays **a** and **b** vertically.

In [10]:
a = np.arange(10).reshape(2, -1)
b = np.repeat(1, 10).reshape(2, -1)
a

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [11]:
b

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [12]:
np.r_[a, b]

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

<span style="color: red">9.How to stack two arrays horizontally?</span>

Q. Stack the arrays **a** and **b** horizontally.

In [13]:
np.c_[a, b]

array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])
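
The index tricks `np.r_` and `np.c_` used in exercises 8 and 9 have more explicit counterparts. A minimal sketch, assuming the same `a` and `b` as above:

```python
import numpy as np

a = np.arange(10).reshape(2, -1)
b = np.repeat(1, 10).reshape(2, -1)

# Vertical stacking: the rows of b are appended below the rows of a.
vertical = np.vstack([a, b])      # same as np.concatenate([a, b], axis=0)

# Horizontal stacking: the columns of b are appended to the right of a.
horizontal = np.hstack([a, b])    # same as np.concatenate([a, b], axis=1)

print(vertical.shape, horizontal.shape)
```

The named functions tend to read more clearly in scripts, while `np.r_` and `np.c_` are handy for quick interactive work.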

<span style="color: red">10.How to generate custom sequences in numpy without hardcoding?</span>

Q. Create the following pattern without hardcoding. Use only numpy functions and the input array **a** below.

Input:  
`a = np.array((1, 2, 3))`  
Desired Output:  
`array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])`

In [14]:
a = np.array((1, 2, 3))
np.r_[np.repeat(a, 3), np.tile(a, 3)]

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

<span style="color: red">11.How to get the common items between two python numpy arrays?</span>

Q. Get the common items between **a** and **b**.

In [15]:
a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])
np.intersect1d(a, b)

array([2, 4])

<span style="color: red">12.How to remove from one array those items that exist in another?</span>

Q. From array **a** remove all items present in array **b**.

In [16]:
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 6, 7, 8, 9])
np.setdiff1d(a, b)

array([1, 2, 3, 4])
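
Note that `np.setdiff1d` returns the sorted unique values of the difference. If the original order and any repeated values should be kept, a boolean mask built with `np.isin` is one option. A small sketch with made-up arrays (not the ones from the exercise):

```python
import numpy as np

a = np.array([5, 1, 1, 3, 4])   # illustrative input with a repeated value
b = np.array([3, 9])

# setdiff1d sorts and de-duplicates its result.
unique_diff = np.setdiff1d(a, b)

# The mask keeps the original order and the duplicate 1.
kept = a[~np.isin(a, b)]

print(unique_diff)
print(kept)
```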

<span style="color: red">13.How to get the positions where elements of two arrays match?</span>

Q. Get the positions where elements of **a** and **b** match

In [17]:
a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])
np.where(a == b)

(array([1, 3, 5, 7], dtype=int64),)

<span style="color: red">14.How to extract all numbers between a given range from a numpy array?</span>

Q. Get all items between 5 and 10 from **a**.

In [18]:
a = np.arange(15)
a[(a >= 5) & (a <= 10)]

array([ 5,  6,  7,  8,  9, 10])

In [19]:
np.where((a >= 5) & (a <= 10))

(array([ 5,  6,  7,  8,  9, 10], dtype=int64),)

In [20]:
np.where(np.logical_and(a >= 5, a <= 10))

(array([ 5,  6,  7,  8,  9, 10], dtype=int64),)

<span style="color: red">15.How to make a python function that handles scalars to work on numpy arrays?</span>

Q. Convert the function `maxx`, which works on two scalars, so that it works on two arrays.

Input:  
```python
def maxx(x, y):
    """Get the maximum of two items."""
    if x >= y:
        return x
    else:
        return y

#> maxx(1, 5)
#> 5
```  
Desired Output:
```python
a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

#> pair_max(a, b)
#> array([6., 7., 9., 8., 9., 7., 5.])
```

In [21]:
def maxx(x, y):
    """Get the maximum of two items."""
    if x >= y:
        return x
    else:
        return y

pair_max = np.vectorize(maxx, otypes=[float])

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

pair_max(a, b)

array([ 6.,  7.,  9.,  8.,  9.,  7.,  5.])
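
`np.vectorize` is a convenience wrapper that still calls the Python function once per element. For this particular task NumPy already ships an element-wise maximum, so a sketch of the built-in route, using the same `a` and `b`, looks like this:

```python
import numpy as np

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# Element-wise maximum of two arrays, computed without a Python-level loop.
pair_max_builtin = np.maximum(a, b)
print(pair_max_builtin)
```

Unlike the `otypes=[float]` version above, the result keeps the integer dtype of the inputs.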

<span style="color: red">16.How to swap two columns in a 2d numpy array?</span>

Q. Swap columns 1 and 2 in the array **arr**.

In [22]:
arr = np.arange(9).reshape((3, 3))
arr[:, [1, 0, 2]]

array([[1, 0, 2],
       [4, 3, 5],
       [7, 6, 8]])

<span style="color: red">17.How to swap two rows in a 2d numpy array?</span>

Q. Swap rows 1 and 2 in the array **arr**.

In [23]:
arr = np.arange(9).reshape((3, 3))
arr[[1, 0, 2]]

array([[3, 4, 5],
       [0, 1, 2],
       [6, 7, 8]])

<span style="color: red">18.How to reverse the rows of a 2D array?</span>

Q. Reverse the rows of a 2D array **arr**.

In [24]:
arr = np.arange(9).reshape((3, 3))
arr[::-1]

array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

In [25]:
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

<span style="color: red">19.How to reverse the columns of a 2D array?</span>

Q. Reverse the columns of a 2D array **arr**.

In [26]:
arr[:, ::-1]

array([[2, 1, 0],
       [5, 4, 3],
       [8, 7, 6]])

<span style="color: red">20.How to create a 2D array containing random floats between 5 and 10?</span>

Q. Create a 2D array of shape 5x3 to contain random decimal numbers between 5 and 10.

In [27]:
np.random.randint(5, 10, size=(5, 3)) + np.random.rand(5, 3)

array([[ 8.77305711,  6.57468646,  9.17556093],
       [ 7.40076307,  7.48461222,  7.37856231],
       [ 7.83176895,  8.18174803,  7.66970361],
       [ 9.28652074,  5.45747331,  6.33414947],
       [ 7.16296112,  6.42996746,  9.05844405]])
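
The same kind of array can also be drawn in one step from a uniform distribution. A minimal sketch, assuming the newer `Generator` interface is available (NumPy 1.17 or later); the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, only for reproducibility

# Uniform floats in the half-open interval [5, 10).
rand_arr = rng.uniform(5, 10, size=(5, 3))
print(rand_arr)
```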

<span style="color: red">21.How to print only 3 decimal places in python numpy array?</span>

Q. Print or show only 3 decimal places of the numpy array **rand_arr**.

In [28]:
np.set_printoptions(precision=3)
rand_arr = np.random.random((5, 3))
rand_arr

array([[ 0.401,  0.632,  0.54 ],
       [ 0.604,  0.815,  0.249],
       [ 0.403,  0.795,  0.287],
       [ 0.759,  0.135,  0.457],
       [ 0.613,  0.985,  0.026]])

<span style="color: red">22.How to pretty print a numpy array by suppressing the scientific notation (like 1e10)?</span>

Q. Pretty print **rand_arr** by suppressing the scientific notation (like 1e10)

In [29]:
np.random.seed(1000)
np.set_printoptions(precision=6, suppress=True)
rand_arr = np.random.random((3, 3)) / 1e3
rand_arr

array([[ 0.000654,  0.000115,  0.00095 ],
       [ 0.000482,  0.000872,  0.000212],
       [ 0.000041,  0.000397,  0.000233]])

<span style="color: red">23.How to limit the number of items printed in output of numpy array?</span>

Q. Limit the number of items printed in python numpy array a to a maximum of 6 elements.

In [30]:
np.set_printoptions(threshold=6)
a = np.arange(15)
a

array([ 0,  1,  2, ..., 12, 13, 14])

<span style="color: red">24.How to print the full numpy array without truncating?</span>

Q. Print the full numpy array a without truncating.

In [31]:
import sys
a = np.arange(15)
np.set_printoptions(threshold=sys.maxsize)  # recent NumPy releases reject threshold=np.nan
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
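
`np.set_printoptions` changes the print settings globally. When the full output is needed only in one place, the `np.printoptions` context manager (available in NumPy 1.15 and later) restores the previous settings afterwards. A sketch with a made-up array that is long enough to be truncated by default:

```python
import sys
import numpy as np

a = np.arange(2000)   # longer than the default threshold of 1000 elements

with np.printoptions(threshold=sys.maxsize):
    full_text = repr(a)    # every element is rendered inside the block

short_text = repr(a)       # outside the block the earlier settings apply again

print(len(full_text), len(short_text))
```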

<span style="color: red">25.How to import a dataset with numbers and texts keeping the text intact in python numpy?</span>

Q. Import the iris dataset keeping the text intact.

In [32]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
iris[:3]

array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)

<span style="color: red">26.How to extract a particular column from 1D array of tuples?</span>

Q. Extract the text column *species* from the **iris** array imported in the previous question.

In [33]:
species = np.array([row[4] for row in iris])
species[: 5]

array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
       b'Iris-setosa'],
      dtype='|S15')

<span style="color: red">27.How to convert a 1d array of tuples to a 2d numpy array?</span>

Q. Convert the 1D **iris** to 2D array **iris_2d** by omitting the *species* text field.

In [34]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
iris_1d[:10]
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
iris_2d[:5]

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2]])

<span style="color: red">28.How to compute the mean, median, standard deviation of a numpy array?</span>

Q. Find the mean, median, standard deviation of iris's sepallength (1st column)


In [35]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

mu, med, std = np.mean(sepallength), np.median(sepallength), np.std(sepallength)
print(mu, med, std)

5.84333333333 5.8 0.825301291785


<span style="color: red">29.How to normalize an array so the values range exactly between 0 and 1?</span>

Q. Create a normalized form of iris's sepallength whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.

In [36]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
(sepallength - sepallength.min()) / (sepallength.max() - sepallength.min())

array([ 0.222222,  0.166667,  0.111111,  0.083333,  0.194444,  0.305556,
        0.083333,  0.194444,  0.027778,  0.166667,  0.305556,  0.138889,
        0.138889,  0.      ,  0.416667,  0.388889,  0.305556,  0.222222,
        0.388889,  0.222222,  0.305556,  0.222222,  0.083333,  0.222222,
        0.138889,  0.194444,  0.194444,  0.25    ,  0.25    ,  0.111111,
        0.138889,  0.305556,  0.25    ,  0.333333,  0.166667,  0.194444,
        0.333333,  0.166667,  0.027778,  0.222222,  0.194444,  0.055556,
        0.027778,  0.194444,  0.222222,  0.138889,  0.222222,  0.083333,
        0.277778,  0.194444,  0.75    ,  0.583333,  0.722222,  0.333333,
        0.611111,  0.388889,  0.555556,  0.166667,  0.638889,  0.25    ,
        0.194444,  0.444444,  0.472222,  0.5     ,  0.361111,  0.666667,
        0.361111,  0.416667,  0.527778,  0.361111,  0.444444,  0.5     ,
        0.555556,  0.5     ,  0.583333,  0.638889,  0.694444,  0.666667,
        0.472222,  0.388889,  0.333333,  0.333333, 

<span style="color: red">30.How to compute the softmax score?</span>

Q. Compute the softmax score of **sepallength**.

```python
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
```

In [37]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

softmax(sepallength)

array([ 0.00222 ,  0.001817,  0.001488,  0.001346,  0.002008,  0.002996,
        0.001346,  0.002008,  0.001102,  0.001817,  0.002996,  0.001644,
        0.001644,  0.000997,  0.00447 ,  0.004044,  0.002996,  0.00222 ,
        0.004044,  0.00222 ,  0.002996,  0.00222 ,  0.001346,  0.00222 ,
        0.001644,  0.002008,  0.002008,  0.002453,  0.002453,  0.001488,
        0.001644,  0.002996,  0.002453,  0.003311,  0.001817,  0.002008,
        0.003311,  0.001817,  0.001102,  0.00222 ,  0.002008,  0.001218,
        0.001102,  0.002008,  0.00222 ,  0.001644,  0.00222 ,  0.001346,
        0.002711,  0.002008,  0.01484 ,  0.008144,  0.013428,  0.003311,
        0.009001,  0.004044,  0.007369,  0.001817,  0.009947,  0.002453,
        0.002008,  0.00494 ,  0.005459,  0.006033,  0.003659,  0.010994,
        0.003659,  0.00447 ,  0.006668,  0.003659,  0.00494 ,  0.006033,
        0.007369,  0.006033,  0.008144,  0.009947,  0.01215 ,  0.010994,
        0.005459,  0.004044,  0.003311,  0.003311, 
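
Subtracting `np.max(x)` before exponentiating does not change the result, because softmax is invariant to adding a constant to every score, but it keeps `np.exp` from overflowing on large inputs. A small check on made-up scores (not the iris column):

```python
import numpy as np

def softmax(x):
    """Softmax of a 1D array, shifted by its maximum for numerical stability."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

scores = np.array([1000.0, 1001.0, 1002.0])   # illustrative values; exp(1000) alone would overflow

probs = softmax(scores)
print(probs, probs.sum())

# Shifting every score by the same constant leaves the output unchanged.
print(np.allclose(probs, softmax(scores - 1000.0)))
```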

<span style="color: red">31.How to find the percentile scores of a numpy array?</span>

Q. Find the 5th and 95th percentile of iris's **sepallength**.

In [38]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

np.percentile(sepallength, [5, 95])

array([ 4.6  ,  7.255])

<span style="color: red">32.How to insert values at random positions in an array?</span>

Q. Insert np.nan values at 20 random positions in iris_2d dataset.

In [39]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')

np.random.seed(100)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
iris_2d[:10]

array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
       [b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'5.0', b'3.4', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.4', nan, b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa']], dtype=object)

<span style="color: red">33.How to find the position of missing values in numpy array?</span>

Q. Find the number and position of missing values in **iris_2d**'s *sepallength* (1st column)

In [40]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float')
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum())
print("Position of missing values: \n", np.where(np.isnan(iris_2d[:, 0])))

Number of missing values: 
 5
Position of missing values: 
 (array([ 38,  80, 106, 113, 121], dtype=int64),)


<span style="color: red">34.How to filter a numpy array based on two or more conditions?</span>

Q. Filter the rows of **iris_2d** that have *petallength (3rd column) > 1.5* and *sepallength (1st column) < 5.0*.

In [41]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

condition = (iris_2d[:, 2] > 1.5) & (iris_2d[:, 0] < 5.0)
iris_2d[condition]

array([[ 4.8,  3.4,  1.6,  0.2],
       [ 4.8,  3.4,  1.9,  0.2],
       [ 4.7,  3.2,  1.6,  0.2],
       [ 4.8,  3.1,  1.6,  0.2],
       [ 4.9,  2.4,  3.3,  1. ],
       [ 4.9,  2.5,  4.5,  1.7]])

<span style="color: red">35.How to drop rows that contain a missing value from a numpy array?</span>

Q. Select the rows of iris_2d that do not contain any nan value.

In [42]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# True for every row that contains no nan
no_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d])
iris_2d[no_nan_in_row][:5]

array([[ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2],
       [ 4.6,  3.4,  1.4,  0.3]])
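
The row-by-row list comprehension can be replaced by a single reduction along `axis=1`. A sketch on a small made-up array, so that it runs without downloading the dataset:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])   # illustrative data, not iris

# any(axis=1) flags rows with at least one nan; ~ inverts the flag.
row_is_complete = ~np.isnan(data).any(axis=1)
complete_rows = data[row_is_complete]
print(complete_rows)
```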

<span style="color: red">36.How to find the correlation between two columns of a numpy array?</span>

Q. Find the correlation between SepalLength (1st column) and PetalLength (3rd column) in iris_2d.

In [43]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

np.corrcoef(iris_2d[:, 0], iris_2d[:, 2])

array([[ 1.      ,  0.871754],
       [ 0.871754,  1.      ]])

In [44]:
from scipy.stats import pearsonr
corr, p_value = pearsonr(iris_2d[:, 0], iris_2d[:, 2])
print(corr)

0.871754157305


<span style="color: red">37.How to find if a given array has any null values?</span>

Q. Find out if iris_2d has any missing values.

In [45]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

np.isnan(iris_2d).any()

False

> There are no missing values.

<span style="color: red">38.How to replace all missing values with 0 in a numpy array?</span>

Q. Replace all occurrences of nan with 0 in a numpy array.

In [46]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan


iris_2d[np.isnan(iris_2d)] = 0

<span style="color: red">39.How to find the count of unique values in a numpy array?</span>

Q. Find the unique values and the count of unique values in iris's *species*.

In [47]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

species = np.array([row[4] for row in iris])
# species = np.array([row.tolist()[4] for row in iris])
np.unique(species, return_counts=True)

(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
       dtype='|S15'), array([50, 50, 50], dtype=int64))

<span style="color: red">40.How to convert a numeric to a categorical (text) array?</span>

Q. Bin the petal length (3rd) column of iris_2d to form a text array, such that if petal length is:
+ Less than 3 --> 'small'
+ From 3 to 5 --> 'medium'
+ Greater than 5 --> 'large'

In [48]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# bin petallength
petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])
petal_length_bin

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 2, 3, 2, 3, 3, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3], dtype=int64)

In [49]:
# map it to respective category
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]
petal_length_cat[:10]

['small',
 'small',
 'small',
 'small',
 'small',
 'small',
 'small',
 'small',
 'small',
 'small']
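
With the default `right=False`, `np.digitize` places a value that equals a bin edge into the higher bin, so a petal length of exactly 5.0 would be labelled 'large' by the cells above. If 5.0 should count as 'medium', as the wording "from 3 to 5" suggests, the conditions can be spelled out with `np.select`. A sketch on made-up values:

```python
import numpy as np

petal_length = np.array([1.4, 3.0, 4.7, 5.0, 6.1])   # illustrative values, not the iris column

# Conditions are checked in order; the first one that holds decides the label.
conditions = [petal_length < 3, petal_length <= 5]
choices = ['small', 'medium']
petal_length_cat = np.select(conditions, choices, default='large')
print(petal_length_cat)
```

The result is a NumPy string array rather than a Python list, so it can be used directly for boolean masking.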

<span style="color: red">41.How to create a new column from existing columns of a numpy array?</span>

Q. Create a new column for volume in iris_2d, where volume is $(\pi \times petallength \times sepallength^2) \div 3$

In [50]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

petallength = iris_2d[:, 2].astype('float')
sepallength = iris_2d[:, 0].astype('float')
volume = np.pi * petallength * np.power(sepallength, 2) / 3

# add a new dimension to volume so that it becomes a column vector
volume = volume[:, np.newaxis]

# add the new column
# new_iris_2d = np.c_[iris_2d, volume]
np.c_[iris_2d, volume][:4]

array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa', 38.13265162927291],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa', 35.200498485922445],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa', 30.0723720777127],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa', 33.238050274980004]], dtype=object)

<span style="color: red">42.How to do probabilistic sampling in numpy?</span>

Q. Randomly sample **iris**'s **species** such that **setosa** appears twice as often as each of **versicolor** and **virginica**.

In [51]:
# Import iris keeping the text column intact
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

species = iris[:, 4]


# Approach 1: Generate probabilistically
np.random.seed(100)
a = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
species_out = np.random.choice(a, 150, p=[.5, .25, .25])
np.unique(species_out, return_counts=True)

(array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
       dtype='<U15'), array([77, 37, 36], dtype=int64))

In [52]:
# Approach 2: Probabilistic sampling (preferred)
np.random.seed(100)
probs = np.r_[np.linspace(0, 0.500), np.linspace(0.501, 0.750), np.linspace(.751, 1.0)]
index = np.searchsorted(probs, np.random.random(150))
species_out = species[index]
np.unique(species_out, return_counts=True)

(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'], dtype=object),
 array([77, 37, 36], dtype=int64))

> Approach 2 is preferred because it creates an index variable that can be used to sample 2d tabular data.
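
To illustrate why a row index is useful, here is a hedged sketch that samples whole rows of a table with per-row weights. It swaps in `Generator.choice` instead of the `searchsorted` construction above, and the `table`, `labels`, and `class_prob` names are hypothetical stand-ins for the iris data:

```python
import numpy as np

rng = np.random.default_rng(100)   # arbitrary seed

# Stand-in for a table grouped like iris: 50 rows per class.
labels = np.repeat(['setosa', 'versicolor', 'virginica'], 50)
table = np.arange(150 * 4).reshape(150, 4)   # hypothetical feature columns

# Per-row weights: setosa rows share half of the probability mass,
# the other two classes a quarter each.
class_prob = {'setosa': 0.5, 'versicolor': 0.25, 'virginica': 0.25}
weights = np.array([class_prob[name] / 50 for name in labels])

# One index array selects matching rows from every aligned array.
index = rng.choice(len(labels), size=150, p=weights)
sampled_rows = table[index]
sampled_labels = labels[index]

print(np.unique(sampled_labels, return_counts=True))
```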

<span style="color: red">43.How to get the second largest value of an array when grouped by another array?</span>

Q. What is the second-longest **petallength** of species **setosa**?

In [53]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Get the petal length column for the setosa rows
iris_setosa = iris[iris[:, 4] == b'Iris-setosa', 2].astype('float')

# np.unique(iris_setosa)
np.unique(iris_setosa)[-2]

1.7

<span style="color: red">44.How to sort a 2D array by a column?</span>

Q. Sort the iris dataset based on *sepallength* column.

In [54]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

iris[iris[:, 0].argsort()][: 20]

array([[b'4.3', b'3.0', b'1.1', b'0.1', b'Iris-setosa'],
       [b'4.4', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.4', b'3.0', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.5', b'2.3', b'1.3', b'0.3', b'Iris-setosa'],
       [b'4.6', b'3.6', b'1.0', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'4.6', b'3.2', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.6', b'0.2', b'Iris-setosa'],
       [b'4.8', b'3.0', b'1.4', b'0.1', b'Iris-setosa'],
       [b'4.8', b'3.0', b'1.4', b'0.3', b'Iris-setosa'],
       [b'4.8', b'3.4', b'1.9', b'0.2', b'Iris-setosa'],
       [b'4.8', b'3.4', b'1.6', b'0.2', b'Iris-setosa'],
       [b'4.8', b'3.1', b'1.6', b'0.2', b'Iris-setosa'],
       [b'4.9', b'2.4', b'3.3', b'1.0', b'Iris-versicolor'],
       [b'4.9', b'2.5', b'4

<span style="color: red">45.How to find the most frequent value in a numpy array?</span>

Q. Find the most frequent value of a column in the iris dataset. The solution below uses column index 3, which is petal width (4th column); use index 2 for petal length (3rd column).

In [55]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

vals, counts = np.unique(iris[:, 3], return_counts=True)
print('Vals:', vals, '\nCounts:', counts)

Vals: [b'0.1' b'0.2' b'0.3' b'0.4' b'0.5' b'0.6' b'1.0' b'1.1' b'1.2' b'1.3'
 b'1.4' b'1.5' b'1.6' b'1.7' b'1.8' b'1.9' b'2.0' b'2.1' b'2.2' b'2.3'
 b'2.4' b'2.5'] 
Counts: [ 6 28  7  7  1  1  7  3  5 13  8 12  4  2 12  5  6  6  3  8  3  3]


In [56]:
vals[np.argmax(counts)]

b'0.2'

<span style="color: red">46.How to find the position of the first occurrence of a value greater than a given value?</span>

Q. Find the position of the first occurrence of a value greater than 1.0 in the petalwidth (4th) column of the iris dataset.

In [57]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
np.argmax(iris[:, 3].astype('float') > 1.0)

50

<span style="color: red">47.How to replace all values greater than a given value to a given cutoff?</span>

Q. From the array **a**, replace all values greater than 30 with 30 and all values less than 10 with 10.

In [58]:
np.random.seed(100)
a = np.random.uniform(1, 50, 20)

np.where(a > 30, 30, np.where(a < 10, 10, a))  # Solution 1

array([ 27.626842,  14.6401  ,  21.801362,  30.      ,  10.      ,
        10.      ,  30.      ,  30.      ,  10.      ,  29.179573,
        30.      ,  11.250904,  10.081083,  10.      ,  11.765177,
        30.      ,  30.      ,  10.      ,  30.      ,  14.429614])

In [59]:
np.clip(a, 10, 30)  # Solution 2

array([ 27.626842,  14.6401  ,  21.801362,  30.      ,  10.      ,
        10.      ,  30.      ,  30.      ,  10.      ,  29.179573,
        30.      ,  11.250904,  10.081083,  10.      ,  11.765177,
        30.      ,  30.      ,  10.      ,  30.      ,  14.429614])

<span style="color: red">48.How to get the positions of top n values from a numpy array?</span>

Q. Get the positions of top 5 maximum values in a given array **a**.

In [60]:
np.random.seed(100)
a = np.random.uniform(1, 50, 20)

# Solution 1
np.argsort(a)[-5:]

array([18,  7,  3, 10, 15], dtype=int64)

In [61]:
np.argpartition(-a, 5)[:5]  # Solution 2

array([15, 10,  3,  7, 18], dtype=int64)

In [62]:
np.argpartition(a, kth=-5)[-5:]  # Solution 3

array([18,  7,  3, 10, 15], dtype=int64)

In [63]:
np.argpartition(a, kth=16)[-5:]   # Solution 4

array([18,  7,  3, 10, 15], dtype=int64)

In [64]:
np.sort(a)[-5:]  # Method 1

array([ 40.995013,  41.466785,  42.39403 ,  44.674776,  48.952565])

In [65]:
a[a.argsort()][-5:]  # Method 2

array([ 40.995013,  41.466785,  42.39403 ,  44.674776,  48.952565])

> Method 1 and 2 will get you the values.
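
`np.argpartition` only guarantees which positions end up in the top group, not their order within it. If the positions are needed from largest to smallest, one option is to sort just the selected entries afterwards. A sketch on a made-up array:

```python
import numpy as np

a = np.array([10.0, 50.0, 20.0, 40.0, 30.0, 5.0])   # illustrative values
n = 3

# Full sort: positions of the n largest values, largest first.
top_desc = np.argsort(a)[::-1][:n]

# Partial sort: find the n largest positions, then order only those n.
part = np.argpartition(a, -n)[-n:]
top_desc_fast = part[np.argsort(a[part])[::-1]]

print(top_desc, top_desc_fast)
```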

<span style='color: red'>49.How to compute the row wise counts of all possible values in an array?</span>

Q. Compute the counts of unique values row-wise.

In [66]:
np.random.seed(100)
arr = np.random.randint(1, 11, size=(6, 10))
arr

array([[ 9,  9,  4,  8,  8,  1,  5,  3,  6,  3],
       [ 3,  3,  2,  1,  9,  5,  1, 10,  7,  3],
       [ 5,  2,  6,  4,  5,  5,  4,  8,  2,  2],
       [ 8,  8,  1,  3, 10, 10,  4,  3,  6,  9],
       [ 2,  1,  8,  7,  3,  1,  9,  3,  6,  2],
       [ 9,  2,  6,  5,  3,  9,  4,  6,  1, 10]])

Input:
```python
array([[ 9,  9,  4,  8,  8,  1,  5,  3,  6,  3],
       [ 3,  3,  2,  1,  9,  5,  1, 10,  7,  3],
       [ 5,  2,  6,  4,  5,  5,  4,  8,  2,  2],
       [ 8,  8,  1,  3, 10, 10,  4,  3,  6,  9],
       [ 2,  1,  8,  7,  3,  1,  9,  3,  6,  2],
       [ 9,  2,  6,  5,  3,  9,  4,  6,  1, 10]])
```
Desired Output:
```python
array([[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
       [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
       [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
       [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
       [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]])
```

The output contains 10 columns representing the numbers from 1 to 10. The values are the counts of those numbers in the respective rows.  
For example, cell [0, 2] has the value 2, which means the number 3 occurs exactly 2 times in the 1st row.

In [67]:
def counts_of_all_values_rowwise_1(arr2d):
    # the number of columns of returned array
    column_num = len(np.unique(arr2d))
    xixi = np.zeros((arr2d.shape[0], column_num))
    for i in range(arr2d.shape[0]):
        # Unique values and its counts row wise
        vals, counts = np.unique(arr2d[i], return_counts=True)
        for j in range(column_num):
            if j+1 in vals:
                xixi[i, j] = counts[vals.tolist().index(j+1)]
            else:
                xixi[i, j] = 0
    return xixi.astype('int')

counts_of_all_values_rowwise_1(arr)

array([[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
       [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
       [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
       [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
       [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]])

In [68]:
def counts_of_all_values_rowwise_2(arr2d):
    '''Reference answer.'''
    # Unique values and its counts row wise
    num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]
    # Counts of all values row wise
    return ([[int(b[a == i][0]) if i in a else 0 for i in np.unique(arr2d)]
             for a, b in num_counts_array])

In [69]:
counts_of_all_values_rowwise_2(arr)

[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
 [2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
 [0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
 [1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
 [2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
 [1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]

In [70]:
# Another example
arr2 = np.array([np.array(list('bill clinton')),
                 np.array(list('narendramodi')),
                np.array(list('jjayalalitha'))])
arr2

array([['b', 'i', 'l', 'l', ' ', 'c', 'l', 'i', 'n', 't', 'o', 'n'],
       ['n', 'a', 'r', 'e', 'n', 'd', 'r', 'a', 'm', 'o', 'd', 'i'],
       ['j', 'j', 'a', 'y', 'a', 'l', 'a', 'l', 'i', 't', 'h', 'a']],
      dtype='<U1')

In [71]:
np.unique(arr2)

array([' ', 'a', 'b', 'c', 'd', 'e', 'h', 'i', 'j', 'l', 'm', 'n', 'o',
       'r', 't', 'y'],
      dtype='<U1')

In [72]:
counts_of_all_values_rowwise_2(arr2)

[[1, 0, 1, 1, 0, 0, 0, 2, 0, 3, 0, 2, 1, 0, 1, 0],
 [0, 2, 0, 0, 2, 1, 0, 1, 0, 0, 1, 2, 1, 2, 0, 0],
 [0, 4, 0, 0, 0, 0, 1, 1, 2, 2, 0, 0, 0, 0, 1, 1]]
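
Both functions above loop in Python. A broadcasting comparison gives the same counts in one expression, at the cost of a temporary boolean array of shape (rows, columns, unique values). A sketch with a hypothetical function name and a tiny made-up input:

```python
import numpy as np

def counts_of_all_values_rowwise_vec(arr2d):
    """Row-wise counts of every unique value, computed by broadcasting."""
    uniq = np.unique(arr2d)
    # (rows, cols, 1) compared with (n_unique,) broadcasts to (rows, cols, n_unique).
    matches = arr2d[:, :, np.newaxis] == uniq
    # Summing over the column axis counts the matches per row and value.
    return matches.sum(axis=1)

small = np.array([[1, 2, 2],
                  [3, 3, 3]])   # illustrative input
result = counts_of_all_values_rowwise_vec(small)
print(result)
```

Because it compares against `np.unique(arr2d)` rather than assuming the values 1 to 10, it should also handle character arrays such as `arr2`.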

<span style="color: red">50.How to convert an array of arrays into a flat 1d array?</span>

Q. Convert **array_of_arrays** into a flat linear 1d array.

In [73]:
# Input
arr1 = np.arange(3)
arr2 = np.arange(3, 7)
arr3 = np.arange(7, 10)

array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)  # dtype=object is needed for unequal lengths in recent NumPy
array_of_arrays

array([array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])], dtype=object)

In [74]:
# np.ravel does not help here: the outer array is already 1D and its items are arrays
np.ravel(array_of_arrays)

array([array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])], dtype=object)

In [75]:
np.array([a for row in array_of_arrays for a in row])

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [76]:
np.concatenate(array_of_arrays)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

<span style="color: red">51.How to generate one-hot encodings for an array in numpy?</span>

Q. Compute the one-hot encodings (dummy binary variables for each unique value in the array)

In [77]:
# Input
np.random.seed(101)
arr = np.random.randint(1, 4 ,size=6)

def one_hot_encoding(arr):
    uniqs = np.unique(arr)
    out = np.zeros((arr.shape[0], uniqs.shape[0]))
    for i, k in enumerate(arr):
        out[i, k-1] = 1
    return out

one_hot_encoding(arr)

array([[ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 1.,  0.,  0.]])
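
Note that `out[i, k-1] = 1` relies on the values being exactly 1, 2, ..., n. A sketch that drops this assumption is shown below: it looks up each value's position among the sorted unique values and indexes an identity matrix with it. The function name `one_hot_any_labels` is an illustrative choice, not from the original exercise.

```python
import numpy as np

def one_hot_any_labels(arr):
    """One-hot encode a 1D array whose values need not be 1..n (hypothetical helper)."""
    # inverse[i] is the column index of arr[i] among the sorted unique values
    uniqs, inverse = np.unique(arr, return_inverse=True)
    # Row k of the identity matrix is the one-hot vector for column k
    return np.eye(uniqs.shape[0])[inverse]

print(one_hot_any_labels(np.array([10, 30, 10, 20])))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```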

<span style="color: red">52.How to create row numbers grouped by a categorical variable?</span>

Q. Create row numbers grouped by a categorical variable. Use the following sample from **iris** **species** as input.

In [78]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
species_small = np.sort(np.random.choice(species, size=20))
species_small

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica'],
      dtype='<U15')

In [79]:
print([index for val in np.unique(species_small)
       for index, value in enumerate(species_small[species_small == val])])

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7, 8]
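
For larger inputs, the row numbers can be computed without a Python loop. The sketch below assumes the input is already sorted by group (as `species_small` is); `row_numbers_sorted` and the small `demo` array are illustrative names, not part of the original exercise.

```python
import numpy as np

def row_numbers_sorted(groups):
    """Row number within each group; assumes `groups` is already sorted."""
    # starts[k] is the first position of group k, counts[k] its size
    _, starts, counts = np.unique(groups, return_index=True, return_counts=True)
    # Subtract each group's first position from the running index
    return np.arange(groups.shape[0]) - np.repeat(starts, counts)

demo = np.array(['a', 'a', 'b', 'b', 'b', 'c'])
print(row_numbers_sorted(demo))
# [0 1 0 1 2 0]
```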


<span style='color: red'>53.How to create group ids based on a given categorical variable?</span>

Q. Create group ids based on a given categorical variable. Use the following sample from **iris species** as input.

In [80]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
species_small = np.sort(np.random.choice(species, size=20))
species_small

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica'],
      dtype='<U15')

In [81]:
# Works because species_small is sorted: group i occupies counts[i] consecutive positions
vals, counts = np.unique(species_small, return_counts=True)
output = np.repeat(np.arange(len(vals)), counts)
output

array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2])

In [82]:
# Reference solution
[np.argwhere(np.unique(species_small) == s).tolist()[0][0]
 for val in np.unique(species_small) for s in species_small[species_small == val]]

[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]

In [83]:
# Reference solution: For loop version
output = []
uniqs = np.unique(species_small)

for val in uniqs:  # unique values in group
    for s in species_small[species_small == val]:  # each element in group
        groupid = np.argwhere(uniqs == s).tolist()[0][0]  # groupid
        output.append(groupid)
        
print(output)

[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
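
`np.unique` can also return the group ids directly through its `return_inverse` argument, which does not require the input to be sorted. A minimal sketch on a small made-up array (`species_demo` is an illustrative stand-in for `species_small`):

```python
import numpy as np

species_demo = np.array(['setosa', 'virginica', 'setosa', 'versicolor'])

# group_ids[i] is the index of species_demo[i] in the sorted unique values
uniqs, group_ids = np.unique(species_demo, return_inverse=True)
print(uniqs)      # ['setosa' 'versicolor' 'virginica']
print(group_ids)  # [0 2 0 1]
```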


<span style='color: red'>54.How to rank items in an array using numpy?</span>

Q. Create the ranks for the given numeric array **a**.

In [84]:
# Input
np.random.seed(10)
a = np.random.randint(20, size=10)
a

array([ 9,  4, 15,  0, 17, 16, 17,  8,  9,  0])

Desired Output:
```python
[4 2 6 0 8 7 9 3 5 1]
```

In [85]:
np.argsort(a)

array([3, 9, 1, 7, 0, 8, 2, 5, 4, 6], dtype=int64)

In [86]:
np.argsort(np.argsort(a))

array([4, 2, 6, 0, 8, 7, 9, 3, 5, 1], dtype=int64)
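
The double `argsort` sorts twice. The same ranks can be built with a single sort by writing the positions 0..n-1 back into the sorted order. This is a sketch of that idea; it uses `kind='stable'` so that tied values are ranked in order of appearance, which is an added assumption about how ties should be handled.

```python
import numpy as np

a = np.array([9, 4, 15, 0, 17, 16, 17, 8, 9, 0])

# order[r] is the index of the element that has rank r
order = np.argsort(a, kind='stable')
# Invert the permutation: the element at index order[r] gets rank r
ranks = np.empty_like(order)
ranks[order] = np.arange(a.size)
print(ranks)
# [4 2 6 0 8 7 9 3 5 1]
```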

<span style="color: red">55.How to rank items in a multidimensional array using numpy?</span>

Q. Create a rank array of the same shape as a given numeric array **a**.

In [87]:
# Input
np.random.seed(10)
a = np.random.randint(20, size=(2, 5))
a

array([[ 9,  4, 15,  0, 17],
       [16, 17,  8,  9,  0]])

Desired Output:
```python
[[4 2 6 0 8]
 [7 9 3 5 1]]
```

In [88]:
print(np.ravel(a).argsort().argsort().reshape(a.shape))

[[4 2 6 0 8]
 [7 9 3 5 1]]


<span style="color: red">56.How to find the maximum value in each row of a numpy array 2d?</span>

Q. Compute the maximum for each row in the given array.

In [89]:
# Input
np.random.seed(100)
a = np.random.randint(1, 10, size=[5, 3])

np.max(a, axis=1)

array([9, 8, 6, 3, 9])

In [90]:
# Reference solution 1
np.amax(a, axis=1)

array([9, 8, 6, 3, 9])

In [91]:
# Reference solution 2
np.apply_along_axis(np.max, arr=a, axis=1)

array([9, 8, 6, 3, 9])

<span style="color: red">57.How to compute the min-by-max for each row for a numpy array 2d?</span>

Q. Compute the min-by-max (minimum divided by maximum) for each row of the given 2d numpy array.

In [92]:
# Input 
np.random.seed(100)
a = np.random.randint(1, 10, size=(5, 3))
a

array([[9, 9, 4],
       [8, 8, 1],
       [5, 3, 6],
       [3, 3, 3],
       [2, 1, 9]])

In [93]:
np.apply_along_axis(lambda x: np.min(x)/np.max(x), arr=a, axis=1)

array([ 0.444444,  0.125   ,  0.5     ,  1.      ,  0.111111])
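
`np.apply_along_axis` calls the lambda once per row. For this particular ratio, the row-wise reductions can be divided directly, as in the sketch below, which re-creates the array shown above by hand instead of relying on the random seed.

```python
import numpy as np

a = np.array([[9, 9, 4],
              [8, 8, 1],
              [5, 3, 6],
              [3, 3, 3],
              [2, 1, 9]])

# Row-wise minimum divided by row-wise maximum, computed on whole arrays
ratio = a.min(axis=1) / a.max(axis=1)
print(ratio.round(6))
# [0.444444 0.125    0.5      1.       0.111111]
```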

<span style="color: red">58.How to find the duplicate records in a numpy array?</span>

Q. Find the duplicate entries (2nd occurrence onwards) in the given numpy array and mark them as `True`. First time occurrences should be `False`.

In [94]:
# Input
np.random.seed(100)
a = np.random.randint(0, 5, size=10)
a

array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])

Desired Output:
```python
[False True False True False False True True True True]
```

In [95]:
# Create an all True array
out = np.full(a.shape[0], True)
# Find the index position of the first occurrence of each unique element
unique_positions = np.unique(a, return_index=True)[1]
# Mark those positions as False
out[unique_positions] = False
out

array([False,  True, False,  True, False, False,  True,  True,  True,  True], dtype=bool)

<span style="color: red">59.How to find the grouped mean in numpy?</span>

Q. Find the mean of a numeric column grouped by a categorical column in a 2D numpy array.

In [96]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

In [97]:
numeric_column = iris[:, 1].astype('float')  # sepal width (column index 1)
grouping_species = iris[:, 4]  # species

In [98]:
# For loop version
output = []
for group_val in np.unique(grouping_species):
    output.append([group_val, numeric_column[grouping_species == group_val].mean()])
output

[[b'Iris-setosa', 3.4180000000000001],
 [b'Iris-versicolor', 2.7700000000000005],
 [b'Iris-virginica', 2.9740000000000002]]

In [99]:
# List comprehension version
[[group_val, numeric_column[grouping_species == group_val].mean()]
 for group_val in np.unique(grouping_species)]

[[b'Iris-setosa', 3.4180000000000001],
 [b'Iris-versicolor', 2.7700000000000005],
 [b'Iris-virginica', 2.9740000000000002]]
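
Both versions above build one boolean mask per group. A loop-free sketch using `np.bincount` is shown below; the `groups` and `values` arrays are made-up stand-ins for the species and numeric columns, so that the example runs without downloading the dataset.

```python
import numpy as np

groups = np.array(['a', 'b', 'a', 'c', 'b', 'a'])
values = np.array([1.0, 10.0, 2.0, 100.0, 20.0, 3.0])

# inverse[i] is the integer id of the group that values[i] belongs to
uniqs, inverse = np.unique(groups, return_inverse=True)
# Sum of values per group and number of items per group
sums = np.bincount(inverse, weights=values)
counts = np.bincount(inverse)
means = sums / counts
print(dict(zip(uniqs.tolist(), means.tolist())))
# {'a': 2.0, 'b': 15.0, 'c': 100.0}
```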

<span style="color: red">60.How to convert a PIL image to numpy array?</span>

Q. Import the image from the following URL and convert it to a numpy array.

URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'

In [100]:
from io import BytesIO
from PIL import Image
import requests

# Import image from URL
URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'
response = requests.get(URL)

# Read it as an image
I = Image.open(BytesIO(response.content))

# Optionally resize
I = I.resize((150, 150))

# Convert to numpy array
arr = np.asarray(I)

# Optionally convert it back to an image and show
im = Image.fromarray(np.uint8(arr))
im.show()

<span style="color: red">61.How to drop all missing values from a numpy array?</span>

Q. Drop all `nan` values from a 1D numpy array.

In [101]:
# Input
a = np.array([1, 2, 3, np.nan, 5, 6, 7, np.nan])

Desired Output:
```python
array([1., 2., 3., 5., 6., 7.])
```

In [102]:
np.isnan(a)

array([False, False, False,  True, False, False, False,  True], dtype=bool)

In [103]:
a[~np.isnan(a)]

array([ 1.,  2.,  3.,  5.,  6.,  7.])

<span style='color: red'>62.How to compute the euclidean distance between two arrays?</span>

Q. Compute the euclidean distance between two arrays **a** and **b**.

In [104]:
# Input
a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])

np.linalg.norm(a-b, ord=2)

6.7082039324993694
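
For reference, the 2-norm of the difference is the square root of the sum of squared element-wise differences. The sketch below spells that out and checks it against `np.linalg.norm`.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])

# Every difference is -3, so the sum of squares is 5 * 9 = 45
dist = np.sqrt(np.sum((a - b) ** 2))
print(np.isclose(dist, np.linalg.norm(a - b)))
# True
```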

<span style='color:red'>63.How to find all the local maxima (or peaks) in a 1d array?</span>

Q. Find all the peaks in a 1D numpy array **a**. Peaks are points surrounded by smaller values on both sides.

In [105]:
# Input
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

Desired Output:
```python
array([2, 5])
```
where 2 and 5 are the positions of the peak values 7 and 6.

In [106]:
# Solution
doublediff = np.diff(np.sign(np.diff(a)))
peak_locations = np.where(doublediff == -2)[0] + 1
peak_locations

array([2, 5], dtype=int64)
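
To see why `-2` marks a peak, it may help to print the intermediate arrays. In this sketch the names `slopes` and `turns` are illustrative: the sign of the first difference is +1 while rising and -1 while falling, so a rise followed by a fall gives a second difference of -1 - 1 = -2.

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

slopes = np.sign(np.diff(a))   # +1 where the series rises, -1 where it falls
turns = np.diff(slopes)        # -2 where a rise is immediately followed by a fall
peaks = np.where(turns == -2)[0] + 1  # +1 because each diff shifts positions by one

print(slopes.tolist())  # [1, 1, -1, 1, 1, -1, 1]
print(turns.tolist())   # [0, -2, 2, 0, -2, 2]
print(peaks.tolist())   # [2, 5]
```

One limitation of this approach: a flat-topped peak such as `[1, 3, 3, 1]` produces slopes `[1, 0, -1]` and never a `-2`, so plateaus are not reported.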

<span style="color: red">64.How to subtract a 1d array from a 2d array, where each item of 1d array subtracts from respective row?</span>

Q. Subtract the 1d array **b_1d** from the 2d array **a_2d**, such that each item of **b_1d** is subtracted from the respective row of **a_2d**.

In [107]:
# Input
a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

Desired Output:
```python
[[2 2 2]
 [2 2 2]
 [2 2 2]]
```

In [108]:
a_2d - b_1d[:, np.newaxis]

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

In [109]:
# Reference solution
a_2d - b_1d[:, None]

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

In [110]:
b_1d[:, np.newaxis]

array([[1],
       [2],
       [3]])

In [111]:
b_1d[:, None]

array([[1],
       [2],
       [3]])

<span style="color: red">65.How to find the index of n'th repetition of an item in an array?</span>

Q. Find the index of the 5th repetition of the number 1 in **x**.

In [112]:
# Input
x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])

In [113]:
# Solution 1: List comprehension
n = 5
[index for index, value in enumerate(x) if value == 1]

[0, 2, 3, 7, 8, 10, 11]

In [114]:
[index for index, value in enumerate(x) if value == 1][n-1]

8

In [115]:
# Solution 2: NumPy version
np.where(x == 1)[0][n-1]

8

<span style="color: red">66.How to convert numpy's datetime64 object to datetime's datetime object?</span>

Q. Convert numpy's `datetime64` object to datetime's `datetime` object.

In [116]:
# Input: a numpy datetime64 object
dt64 = np.datetime64('2018-03-08 15:21:15')
dt64

numpy.datetime64('2018-03-08T15:21:15')

In [117]:
from datetime import datetime

In [118]:
dt64.astype(datetime)

datetime.datetime(2018, 3, 8, 15, 21, 15)

In [119]:
# Solution 2
dt64.tolist()

datetime.datetime(2018, 3, 8, 15, 21, 15)

<span style='color: red'>67.How to compute the moving average of a numpy array?</span>

Q. Compute the moving average of window size 3, for the given 1D array.

In [120]:
# Input
np.random.seed(100)
Z = np.random.randint(10, size=10)
Z

array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

In [121]:
# Solution 1
def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n-1:] / n

moving_average(Z).round(2)

array([ 6.33,  6.  ,  5.67,  4.67,  3.67,  2.  ,  3.67,  3.  ])

In [122]:
# Solution 2
np.convolve(Z, np.ones(3)/3, mode='valid').round(2)

array([ 6.33,  6.  ,  5.67,  4.67,  3.67,  2.  ,  3.67,  3.  ])

> `np.ones(3)/3` gives equal weights. Use `np.ones(4)/4` for window size 4.
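
A third option, assuming NumPy 1.20 or later is available (newer than the version printed at the top of this notebook), is `sliding_window_view`, which exposes every window as a row so that any reduction can be applied to it. A minimal sketch with the same `Z` written out by hand:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

Z = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

# Each row of the view is one window of 3 consecutive items
windows = sliding_window_view(Z, 3)
ma = windows.mean(axis=1)
print(ma.round(2))
# [6.33 6.   5.67 4.67 3.67 2.   3.67 3.  ]
```

Swapping `.mean(axis=1)` for `.max(axis=1)` or `.std(axis=1)` gives other rolling statistics with the same window.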

<span style="color: red">68.How to create a numpy array sequence given only the starting point, length and the step?</span>

Q. Create a numpy array of length 10, starting from 5, with a step of 3 between consecutive numbers.

In [123]:
np.arange(5, 5+3*10, 3)

array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])

In [124]:
def seq(start, length, step):
    end = start + length * step
    return np.arange(start, end, step)

seq(5, 10, 3)

array([ 5,  8, 11, 14, 17, 20, 23, 26, 29, 32])
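
An alternative sketch builds the sequence from the item count instead of a computed end point, so the result always has exactly `length` items, even with a fractional step where the end-point form can be sensitive to rounding. The name `seq_by_count` is illustrative.

```python
import numpy as np

def seq_by_count(start, length, step):
    """Return `length` values starting at `start`, spaced by `step` (hypothetical helper)."""
    # np.arange(length) is 0, 1, ..., length-1; scale it and shift it
    return start + step * np.arange(length)

print(seq_by_count(5, 10, 3))
# [ 5  8 11 14 17 20 23 26 29 32]
```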

<span style="color: red">69.How to fill in missing dates in an irregular series of numpy dates?</span>

Q. Given an array of a non-continuous sequence of dates, make it a continuous sequence of dates by filling in the missing dates.

In [125]:
# Input
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
dates

array(['2018-02-01', '2018-02-03', '2018-02-05', '2018-02-07',
       '2018-02-09', '2018-02-11', '2018-02-13', '2018-02-15',
       '2018-02-17', '2018-02-19', '2018-02-21', '2018-02-23'], dtype='datetime64[D]')

In [126]:
# Solution (the last date, dates[-1], is not included here; the for loop version appends it)
filled_in = np.array([np.arange(date, (date+d))
                      for date, d in zip(dates, np.diff(dates))]).reshape(-1)
filled_in

array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04',
       '2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08',
       '2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12',
       '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16',
       '2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20',
       '2018-02-21', '2018-02-22'], dtype='datetime64[D]')

In [127]:
# Solution: For loop version
out = []
for date, d in zip(dates, np.diff(dates)):
    out.append(np.arange(date, date+d))
    
filled_in = np.array(out).reshape(-1)

# add the last day
output = np.hstack([filled_in, dates[-1]])
output

array(['2018-02-01', '2018-02-02', '2018-02-03', '2018-02-04',
       '2018-02-05', '2018-02-06', '2018-02-07', '2018-02-08',
       '2018-02-09', '2018-02-10', '2018-02-11', '2018-02-12',
       '2018-02-13', '2018-02-14', '2018-02-15', '2018-02-16',
       '2018-02-17', '2018-02-18', '2018-02-19', '2018-02-20',
       '2018-02-21', '2018-02-22', '2018-02-23'], dtype='datetime64[D]')
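
The list-of-arrays approach above flattens cleanly only because every gap has the same length. If the dates are sorted, a simpler sketch is to generate every day between the first and last date directly, which works for gaps of any size:

```python
import numpy as np

dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)

# Assumes `dates` is sorted; the stop value is exclusive, so add one day
full = np.arange(dates[0], dates[-1] + np.timedelta64(1, 'D'))
print(full.size)   # 23
print(full[-1])    # 2018-02-23
```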

<span style="color: red">70.How to create strides from a given 1D array?</span>

Q. From the given 1d array **arr**, generate a 2d matrix using strides, with a window length of 4 and strides of 2, like [[0,1,2,3], [2,3,4,5], [4,5,6,7]..].

In [128]:
# Input
arr = np.arange(15)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [129]:
# Solution
def gen_strides(arr, stride_len=5, window_len=5):
    # Number of full windows that fit in arr
    n_strides = ((arr.size-window_len)//stride_len) + 1
    # Slice one window at each start position 0, stride_len, 2*stride_len, ...
    return np.array([arr[s:(s+window_len)] for s in np.arange(0, n_strides*stride_len, stride_len)])

print(gen_strides(arr, stride_len=2, window_len=4))

[[ 0  1  2  3]
 [ 2  3  4  5]
 [ 4  5  6  7]
 [ 6  7  8  9]
 [ 8  9 10 11]
 [10 11 12 13]]
