# 101 Numpy Exercises for Data Analysis (Python)
source: Machine Learning +

**1. Import numpy as np and see the version**

In [1]:
import numpy as np

print(np.__version__)

2.1.3


**2. How to create a 1D array?**

In [3]:
arr = np.array([0,1,2,3,4,5,6,7,8,9])

# arr = np.arange(10) (alternate answer)

arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

**3. How to create a boolean array?**

In [6]:
arr = np.array([[True, True, True], [True, True, True],[True, True, True]])

# comment: dtype is inferred as bool from the True values, so dtype=bool is optional here

arr

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
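
Typing every element by hand gets tedious for larger shapes. A minimal sketch of two other ways to build the same 3x3 all-True array, using the standard `np.full` and `np.ones` functions; the variable names are only illustrative.

```python
import numpy as np

# Fill a 3x3 array with True directly
full_true = np.full((3, 3), True, dtype=bool)

# Or start from ones and let dtype=bool turn each 1 into True
ones_true = np.ones((3, 3), dtype=bool)

print(full_true.dtype)
print(np.array_equal(full_true, ones_true))
```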

**4. How to extract items that satisfy a given condition from 1D array?**

In [8]:
# Example: Extract all odd numbers from `arr`

arr = np.arange(10)

arr[arr % 2 == 1] # "modulo is 1" => the number is odd

array([1, 3, 5, 7, 9])

**5. How to replace items that satisfy a condition with another value in numpy array?**

In [12]:
# Example: Replace all odd numbers in `arr` with -1

arr = np.arange(10)

np.where(arr % 2 == 1, -1, arr)

# alternate answer (modifies arr in place): arr[arr % 2 == 1] = -1

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

**6. How to replace items that satisfy a condition without affecting the original array?**

Q: Replace all odd numbers in `arr` with -1 without changing arr


In [30]:


arr = np.arange(10)
out = arr.copy()

np.where((out % 2 == 1), -1, out) # np.where returns a new array, so `arr` is never modified

# alternate answer: 
# arr = np.arange(10)
# out = np.where(arr % 2 == 1, -1, arr)

# print(arr)
# out

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])
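
To make the copy behaviour concrete, here is a small sketch comparing `np.where` with boolean-mask assignment. The names `out` and `in_place` are illustrative only.

```python
import numpy as np

arr = np.arange(10)

# np.where builds a new array; arr is left untouched
out = np.where(arr % 2 == 1, -1, arr)

# Boolean-mask assignment writes into the array it is applied to,
# so apply it to a copy when the original must survive
in_place = arr.copy()
in_place[in_place % 2 == 1] = -1

print(arr)
print(out)
```

Both approaches give the same result here; the difference is only whether the input array is overwritten.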

**7. How to reshape an array?**

Q: Convert a 1D array to a 2D array with 2 rows

In [29]:


arr = np.arange(10)

arr.reshape(2,5)

# alternate answer:
# arr.reshape(2, -1) setting -1 automatically decides the number of cols

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
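
A short sketch of how the `-1` placeholder behaves, including the case where the requested shape cannot hold all elements. The flag `reshape_failed` is a hypothetical name used only to record the outcome.

```python
import numpy as np

arr = np.arange(10)

two_rows = arr.reshape(2, -1)   # NumPy infers 5 columns
two_cols = arr.reshape(-1, 2)   # NumPy infers 5 rows

# A shape that does not divide the 10 elements evenly raises ValueError
try:
    arr.reshape(3, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(two_rows.shape, two_cols.shape, reshape_failed)
```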

**8. How to stack two arrays vertically?**

Q: Stack arrays `a` and `b` vertically

In [28]:

a = np.arange(10).reshape(2, -1)
b = np.repeat(1, 10).reshape(2, -1)

np.vstack([a,b])

# alternate answers:
# method 1:
np.concatenate([a,b], axis=0)

# method 2
np.r_[a,b]

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

**9. How to stack two arrays horizontally?**

Q: Stack the arrays `a` and `b` horizontally

In [34]:
a = np.arange(10).reshape(2, -1)
b = np.repeat(1, 10).reshape(2, -1)

# method 1
np.hstack([a,b])

# method 2

np.concatenate([a,b], axis=1)

# method 3
np.c_[a,b]

array([[0, 1, 2, 3, 4, 1, 1, 1, 1, 1],
       [5, 6, 7, 8, 9, 1, 1, 1, 1, 1]])

**10. How to generate custom sequences in numpy without hardcoding?**

Q: Create the pattern shown in the output below without hardcoding. Use only numpy functions and the input array `a`.

In [37]:
a = np.array([1,2,3])

repeat = np.repeat(a, 3)
tile = np.tile(a, 3)

np.hstack([repeat, tile])

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
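
The difference between the two building blocks is easier to see on their own. The sketch below also shows that `np.repeat` accepts a separate count per element; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

repeated = np.repeat(a, 2)        # each element repeated in place: [1 1 2 2 3 3]
tiled = np.tile(a, 2)             # the whole array repeated: [1 2 3 1 2 3]
varied = np.repeat(a, [1, 2, 3])  # per-element counts: [1 2 2 3 3 3]

print(repeated, tiled, varied)
```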

**11. How to get the common items between two python numpy arrays?**

Q: Get the common items between `a` and `b`

In [38]:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

np.intersect1d(a,b)

array([2, 4])

**12. How to remove from one array those items that exist in another?**

Q: From array `a` remove all items present in array `b`

In [43]:
a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])

a[~np.isin(a,b)]

# alternate answer:
np.setdiff1d(a,b)

array([1, 2, 3, 4])
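
The two answers agree on this input, but they are not interchangeable in general. A small sketch with an unsorted array containing duplicates (the example values are made up for illustration):

```python
import numpy as np

a = np.array([5, 3, 3, 1, 9])
b = np.array([9, 10])

# Boolean mask: keeps the original order and any duplicates
masked = a[~np.isin(a, b)]

# setdiff1d: returns the sorted unique values
set_diff = np.setdiff1d(a, b)

print(masked)
print(set_diff)
```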

**13. How to get the positions where elements of two arrays match?**

Q: Get the positions where elements of `a` and `b` match

In [48]:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

np.argwhere(a==b).flatten()

# alternate answer:
np.where(a==b)

(array([1, 3, 5, 7]),)
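
The answers above return the same positions in different containers. A sketch of the three shapes one might get back, with `np.flatnonzero` added as a further option that was not in the original answer:

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

as_tuple = np.where(a == b)        # tuple with one index array per dimension
as_column = np.argwhere(a == b)    # shape (4, 1): one row per match
as_flat = np.flatnonzero(a == b)   # plain 1D index array

print(as_tuple[0], as_column.shape, as_flat)
```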

**14. How to extract all numbers between a given range from a numpy array?**

Q: Get all items between 5 and 10 from `a`.

In [53]:
a = np.array([2,6,1,9,10,3,27])

a[np.where((a >= 5) & (a <= 10))]

array([ 6,  9, 10])

**15. How to make a python function that handles scalars to work on numpy arrays?**

Q: Convert the function `maxx` that works on two scalars, to work on two arrays.

In [60]:
def maxx(x, y):
    """Get the maximum of two items"""

    if x >= y:
        return x
    else:
        return y
    

pair_max = np.vectorize(maxx, otypes=[float])

a = np.array([5,7,9,8,6,4,5])
b = np.array([6,3,4,8,9,7,1])

pair_max(a,b)

array([6., 7., 9., 8., 9., 7., 5.])
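
`np.vectorize` is a convenience wrapper that still calls the Python function once per element. For this particular function NumPy already ships an element-wise equivalent, `np.maximum`, sketched below as an alternative rather than as the exercise's intended answer:

```python
import numpy as np

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# Built-in element-wise maximum; keeps the integer dtype of the inputs
pair_max_builtin = np.maximum(a, b)

print(pair_max_builtin)
```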

**16. How to swap two columns in a 2d numpy array?**

Q: Swap columns 1 and 2 in the array `arr`.

In [64]:
arr = np.arange(9).reshape(3,3)

# Answer:
arr[:, [1,0,2]]

array([[1, 0, 2],
       [4, 3, 5],
       [7, 6, 8]])
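
Fancy indexing as used above returns a new array and leaves `arr` unchanged. If the swap should happen in place, one possible sketch is to assign the reordered columns back:

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# The right-hand side is a copy made by fancy indexing,
# so it is safe to write it back into the same columns
arr[:, [0, 1]] = arr[:, [1, 0]]

print(arr)
```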

**17. How to swap two rows in a 2d numpy array?**

Q: Swap rows 1 and 2 in the array `arr`

In [65]:
arr = np.arange(9).reshape(3,3)

arr[[1,0,2],:]

array([[3, 4, 5],
       [0, 1, 2],
       [6, 7, 8]])

**18. How to reverse the rows of a 2D array?**

Q: Reverse the rows of a 2D array `arr`

In [69]:
arr = np.arange(9).reshape(3,3)

arr[::-1, :] # can just be arr[::-1]

array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

**19. How to reverse the columns of a 2D array?**

Q: Reverse the columns of a 2D array `arr`

In [70]:
arr = np.arange(9).reshape(3,3)

arr[:, ::-1]

array([[2, 1, 0],
       [5, 4, 3],
       [8, 7, 6]])

**20. How to create a 2D array containing random floats between 5 and 10?**

Q: Create a 2D array of shape 5x3 to contain random decimal numbers between 5 and 10.

In [85]:
arr = np.random.uniform(5, 10, (5,3))

arr

array([[9.53 , 7.474, 6.087],
       [6.954, 5.219, 6.285],
       [6.059, 8.571, 5.807],
       [8.22 , 8.408, 5.566],
       [9.029, 9.378, 6.856]])
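
The same array can also be drawn with NumPy's newer `Generator` interface, which makes the result reproducible through an explicit seed. This is a sketch; the seed value 0 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed for reproducibility

# Samples are drawn from the half-open interval [5, 10)
arr = rng.uniform(5, 10, size=(5, 3))

print(arr.shape, arr.min() >= 5, arr.max() < 10)
```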

**21. How to print only 3 decimal places in python numpy array?**

Q: Print or show only 3 decimal places of the numpy array `rand_arr`

In [75]:
rand_arr = np.random.random((5,3))

rand_arr.round(3)


# Answer:

rand_arr = np.random.random((5,3))

np.set_printoptions(precision=3)
rand_arr[:4]

array([[0.976, 0.653, 0.003],
       [0.168, 0.526, 0.863],
       [0.112, 0.713, 0.25 ],
       [0.624, 0.812, 0.438]])
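
`np.set_printoptions` changes the display for the rest of the session. When the setting should apply to a single block only, the `np.printoptions` context manager is one option; the sketch below uses made-up values.

```python
import numpy as np

values = np.array([0.123456789, 1.5, 2.0])

before = np.get_printoptions()["precision"]

# Precision is 3 only inside the with-block
with np.printoptions(precision=3):
    short = np.array2string(values)

after = np.get_printoptions()["precision"]

print(short)
print(before == after)
```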

**22. How to pretty print a numpy array by suppressing the scientific notation (like 1e10)?**

Q: Pretty print `rand_arr` by suppressing the scientific notation (like 1e10)

In [90]:
# Answer:
np.set_printoptions(suppress=True, precision=6)

np.random.seed(100)
rand_arr = np.random.random([3,3])/1e3

rand_arr

array([[0.000543, 0.000278, 0.000425],
       [0.000845, 0.000005, 0.000122],
       [0.000671, 0.000826, 0.000137]])

**23. How to limit the number of items printed in output of numpy array?**

Q: Limit the number of items printed in python numpy array `a` to a maximum of 6 elements.

In [92]:
# Answer:
np.set_printoptions(threshold=6)

a = np.arange(15)

a

array([ 0,  1,  2, ..., 12, 13, 14])

**24. How to print the full numpy array without truncating?**

Q: Print the full numpy array `a` without truncating

In [96]:
np.set_printoptions(threshold=np.inf)
a = np.arange(15)

a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

**25. How to import a dataset with numbers and texts keeping the text intact in python numpy?**

Q: Import the iris dataset keeping the text intact

In [97]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

iris[:3]

array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)

**26. How to extract a particular column from 1D array of tuples?**

Q: Extract the text column `species` from the 1D `iris` imported in previous question.

In [108]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Answer:
species = np.array([row[4] for row in iris_1d]) # you can use list comprehension

species[:5]

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa'], dtype='<U15')

**27. How to convert a 1d array of tuples to a 2d numpy array?**

Q: Convert the 1D `iris` to 2D array `iris_2d` by omitting the `species` text field.

In [107]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Answer:

iris_2d = np.array([row.tolist()[:4] for row in iris_1d])

iris_2d[:4]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2]])

**28. How to compute the mean, median, standard deviation of a numpy array?**

Q: Find the mean, median, standard deviation of iris's `sepallength` (1st column)

In [111]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

# Answer:

sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0]) # changing the dtype to float and use usecols to extract the first column

print(np.mean(sepallength), np.median(sepallength), np.std(sepallength))

5.843333333333334 5.8 0.8253012917851409
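
One detail worth knowing about `np.std`: by default it divides by N (population standard deviation). Passing `ddof=1` divides by N - 1 instead. A sketch on a small made-up array:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(data)       # divides by N
sample_std = np.std(data, ddof=1)   # divides by N - 1

print(np.mean(data), np.median(data), population_std, sample_std)
```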


**29. How to normalize an array so the values range exactly between 0 and 1?**

Q: Create a normalized form of `iris`'s `sepallength` whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.

In [113]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# Not this: z-score standardization rescales to mean 0 and std 1, not to the range [0, 1]
(sepallength-np.mean(sepallength))/np.std(sepallength)

# Answer:

Smax, Smin = sepallength.max(), sepallength.min()
S = (sepallength - Smin) / (Smax - Smin)

S

array([0.222222, 0.166667, 0.111111, 0.083333, 0.194444, 0.305556,
       0.083333, 0.194444, 0.027778, 0.166667, 0.305556, 0.138889,
       0.138889, 0.      , 0.416667, 0.388889, 0.305556, 0.222222,
       0.388889, 0.222222, 0.305556, 0.222222, 0.083333, 0.222222,
       0.138889, 0.194444, 0.194444, 0.25    , 0.25    , 0.111111,
       0.138889, 0.305556, 0.25    , 0.333333, 0.166667, 0.194444,
       0.333333, 0.166667, 0.027778, 0.222222, 0.194444, 0.055556,
       0.027778, 0.194444, 0.222222, 0.138889, 0.222222, 0.083333,
       0.277778, 0.194444, 0.75    , 0.583333, 0.722222, 0.333333,
       0.611111, 0.388889, 0.555556, 0.166667, 0.638889, 0.25    ,
       0.194444, 0.444444, 0.472222, 0.5     , 0.361111, 0.666667,
       0.361111, 0.416667, 0.527778, 0.361111, 0.444444, 0.5     ,
       0.555556, 0.5     , 0.583333, 0.638889, 0.694444, 0.666667,
       0.472222, 0.388889, 0.333333, 0.333333, 0.416667, 0.472222,
       0.305556, 0.472222, 0.666667, 0.555556, 0.361111, 0.333
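
The difference between the two rescalings in this cell can be checked on a tiny made-up array, without downloading the dataset:

```python
import numpy as np

x = np.array([4.3, 5.8, 7.9])

# Min-max scaling: minimum maps to 0, maximum maps to 1
scaled = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: mean 0 and standard deviation 1, range not fixed
zscore = (x - x.mean()) / x.std()

print(scaled)
print(zscore)
```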

**30. How to compute the softmax score?**

Q: Compute the softmax score of `sepallength`

In [114]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# Answer:

def softmax(x):
    """Compute softmax values for each set of scores in x."""

    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

print(softmax(sepallength))

[0.00222  0.001817 0.001488 0.001346 0.002008 0.002996 0.001346 0.002008
 0.001102 0.001817 0.002996 0.001644 0.001644 0.000997 0.00447  0.004044
 0.002996 0.00222  0.004044 0.00222  0.002996 0.00222  0.001346 0.00222
 0.001644 0.002008 0.002008 0.002453 0.002453 0.001488 0.001644 0.002996
 0.002453 0.003311 0.001817 0.002008 0.003311 0.001817 0.001102 0.00222
 0.002008 0.001218 0.001102 0.002008 0.00222  0.001644 0.00222  0.001346
 0.002711 0.002008 0.01484  0.008144 0.013428 0.003311 0.009001 0.004044
 0.007369 0.001817 0.009947 0.002453 0.002008 0.00494  0.005459 0.006033
 0.003659 0.010994 0.003659 0.00447  0.006668 0.003659 0.00494  0.006033
 0.007369 0.006033 0.008144 0.009947 0.01215  0.010994 0.005459 0.004044
 0.003311 0.003311 0.00447  0.005459 0.002996 0.005459 0.010994 0.007369
 0.003659 0.003311 0.003311 0.006033 0.00447  0.002008 0.003659 0.004044
 0.004044 0.006668 0.00222  0.004044 0.007369 0.00447  0.016401 0.007369
 0.009001 0.02704  0.001817 0.020032 0.010994 0.01812
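
Subtracting `np.max(x)` before exponentiating does not change the result but avoids overflow for large inputs. A sketch with deliberately large made-up scores:

```python
import numpy as np

def softmax(x):
    """Softmax of a 1D array, shifted by the maximum for numerical stability."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

big = np.array([1000.0, 1001.0, 1002.0])

stable = softmax(big)

# Without the shift, exp(1000) overflows to inf and inf / inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(big) / np.exp(big).sum()

print(stable)
print(naive)
```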

**31. How to find the percentile scores of a numpy array?**

Q: Find the 5th and 95th percentile of iris's `sepallength`

In [None]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

np.percentile(sepallength, q=[5, 95])

# np.percentile expects q on a 0-100 scale, so the 5th and 95th percentiles are q=[5, 95];
# passing 0.05 and 0.95 would return the 0.05th and 0.95th percentiles (4.30745 and 4.4)

array([4.6  , 7.255])
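
The reason `q=[5, 95]` is needed is that `np.percentile` takes `q` on a 0-100 scale, while `np.quantile` takes it on a 0-1 scale. A sketch on the integers 0 to 100, where the p-th percentile is simply p:

```python
import numpy as np

data = np.arange(101)   # 0, 1, ..., 100

p5, p95 = np.percentile(data, q=[5, 95])    # 0-100 scale
q5, q95 = np.quantile(data, [0.05, 0.95])   # 0-1 scale, same result

# 0.05 on the percentile scale means the 0.05th percentile, not the 5th
tiny = np.percentile(data, 0.05)

print(p5, p95, q5, q95, tiny)
```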


**32. How to insert values at random positions in an array?**

Q: Insert `np.nan` values at 20 random positions in `iris_2d` dataset

In [121]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')

pos1 = np.random.randint(0,len(iris_2d), 20)
pos2 = np.random.randint(0, 5, 20)

iris_2d[pos1, pos2] = np.nan

iris_2d

array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
       [nan, b'3.2', b'1.3', b'0.2', b'Iris-setosa'],
       [b'4.6', b'3.1', b'1.5', b'0.2', b'Iris-setosa'],
       [b'5.0', b'3.6', b'1.4', b'0.2', b'Iris-setosa'],
       [b'5.4', b'3.9', b'1.7', b'0.4', b'Iris-setosa'],
       [b'4.6', b'3.4', b'1.4', b'0.3', b'Iris-setosa'],
       [b'5.0', b'3.4', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.4', b'2.9', b'1.4', b'0.2', b'Iris-setosa'],
       [b'4.9', b'3.1', b'1.5', b'0.1', b'Iris-setosa'],
       [b'5.4', b'3.7', b'1.5', b'0.2', b'Iris-setosa'],
       [b'4.8', b'3.4', b'1.6', b'0.2', b'Iris-setosa'],
       [b'4.8', b'3.0', b'1.4', b'0.1', b'Iris-setosa'],
       [b'4.3', nan, b'1.1', b'0.1', b'Iris-setosa'],
       [b'5.8', b'4.0', b'1.2', b'0.2', nan],
       [b'5.7', b'4.4', b'1.5', b'0.4', b'Iris-setosa'],
       [b'5.4', b'3.9', b'1.3', b'0.4', b'Iris-setosa'],
       [b'5.1', b'3.5', b'1.4', b'0.3', nan],
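
Because `np.random.randint` samples with replacement, the same (row, column) pair can be drawn twice, leaving fewer than 20 distinct `nan` cells. A sketch of one way to guarantee distinct positions, shown on a small made-up float array rather than the iris data:

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed

data = np.arange(20, dtype=float).reshape(5, 4)

# Pick 6 distinct flat positions, then convert them to (row, col) pairs
flat_pos = rng.choice(data.size, size=6, replace=False)
rows, cols = np.unravel_index(flat_pos, data.shape)

data[rows, cols] = np.nan

print(np.isnan(data).sum())
```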
  

**34. How to filter a numpy array based on two or more conditions?**

Q: Filter the rows of `iris_2d` that have `petallength (3rd column) > 1.5` and `sepallength (1st column) < 5.0`

In [123]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

iris_2d[(iris_2d[:,2] > 1.5) & (iris_2d[:,0] < 5.0)]

array([[4.8, 3.4, 1.6, 0.2],
       [4.8, 3.4, 1.9, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [4.9, 2.4, 3.3, 1. ],
       [4.9, 2.5, 4.5, 1.7]])

**35. How to drop rows that contain a missing value from a numpy array?**

Q: Select the rows of `iris_2d` that do not have any `nan` value.

In [129]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Answer:

no_nan_in_row = np.array([~np.any(np.isnan(row)) for row in iris_2d]) # True for rows without any nan
iris_2d[no_nan_in_row][:5]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4]])
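
The row-by-row list comprehension can also be written as a single vectorized expression using the `axis` argument. A sketch on a small made-up array:

```python
import numpy as np

data = np.array([[1.0, 2.0],
                 [np.nan, 3.0],
                 [4.0, 5.0]])

# any(axis=1) checks each row; ~ flips it to "row has no nan"
keep = ~np.isnan(data).any(axis=1)
clean = data[keep]

print(clean)
```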