In this notebook, we will take a look at NumPy's Set Routines.

In [1]:
import numpy as np

Let's start by defining some sample arrays.

In [2]:
array_1 = np.array([['New York', 'New Mexico'], ['New Jersey', 'New Brunswick']])
array_1

array([['New York', 'New Mexico'],
       ['New Jersey', 'New Brunswick']],
      dtype='<U13')

In [3]:
list_2 = ['Alabama', 'Alaska', 'Arizona', 'Arkansas',
               'California', 'Colorado', 'Connecticut', 'Delaware',
               'Florida', 'Georgia', 'Hawaii', 'Idaho',
               'Illinois', 'Indiana', 'Iowa', 'Kansas',
               'Kentucky', 'Louisiana', 'Maine', 'Maryland',
               'Massachusetts', 'Michigan', 'Minnesota',
               'Mississippi', 'Missouri', 'Montana', 'Nebraska',
               'Nevada', 'New Hampshire', 'New Jersey',
               'New Mexico', 'New York', 'North Carolina',
               'North Dakota', 'Ohio', 'Oklahoma',
               'Oregon', 'Pennsylvania', 'Rhode Island',
               'South Carolina', 'South Dakota', 'Tennessee',
               'Texas', 'Utah', 'Vermont', 'Virginia',
               'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']
array_2 = np.array(list_2).reshape(10, -1)
array_2

array([['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California'],
       ['Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia'],
       ['Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa'],
       ['Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland'],
       ['Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri'],
       ['Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey'],
       ['New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio'],
       ['Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island',
        'South Carolina'],
       ['South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont'],
       ['Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']],
      dtype='<U14')

In [4]:
array_1.shape, array_2.shape

((2, 2), (10, 5))

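Let's see which elements of array 1 are present in array 2. NumPy's **isin(ar1, ar2)** function returns a boolean array of the same shape as *ar1*, holding *True* where the element of *ar1* is in *ar2* and *False* otherwise. Indexing *ar1* with this mask selects the matching elements. The argument **assume_unique=True** tells NumPy that both inputs contain no duplicates, which can speed up the calculation.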

In [5]:
mask = np.isin(array_1, array_2, assume_unique=True)
array_1[mask]

array(['New York', 'New Mexico', 'New Jersey'],
      dtype='<U13')
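
As a small illustration of the shape behaviour described above, the sketch below uses made-up integer data (the names `grid` and `allowed` are only for this example). It shows that the mask keeps the shape of the first argument, while boolean indexing returns a flat, 1-D result.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
grid = np.array([[1, 2], [3, 4]])
allowed = [2, 4, 6]

# The mask has the same shape as the first argument.
mask = np.isin(grid, allowed)
print(mask.shape)   # (2, 2)
print(mask)

# Boolean indexing flattens the selection into a 1-D array.
print(grid[mask])   # [2 4]
```

This is why the output of the cell above is one-dimensional even though *array_1* has shape (2, 2).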

Calling the same function with the argument **invert=True** flips the boolean mask, so indexing *ar1* with it returns the values in *ar1* that are not present in *ar2*.

In [6]:
mask = np.isin(array_1, array_2, assume_unique=True, invert=True)
array_1[mask]

array(['New Brunswick'],
      dtype='<U13')
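
A quick way to see what **invert=True** does is to compare it with negating the ordinary mask. The sketch below uses made-up integer arrays (`values` and `known` are illustrative names) and checks that both approaches give the same mask.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
values = np.array([10, 20, 30, 40])
known = np.array([20, 40])

# invert=True flips the membership test...
inverted = np.isin(values, known, invert=True)

# ...which matches negating the regular mask with ~.
negated = ~np.isin(values, known)

print(np.array_equal(inverted, negated))  # True
print(values[inverted])                   # [10 30]
```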

The same output as in the preceding case can also be obtained using NumPy's **setdiff1d(ar1, ar2)** function. It returns the sorted, unique values in *ar1* that are not in *ar2*.

In [7]:
np.setdiff1d(array_1, array_2, assume_unique=False)

array(['New Brunswick'],
      dtype='<U13')
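
The two approaches agree here only because *array_1* has a single non-matching value. In general they can differ: masking with **isin** keeps the original order and any duplicates, whereas **setdiff1d** sorts and deduplicates. A minimal sketch with made-up data (`a` and `b` are illustrative names):

```python
import numpy as np

# Hypothetical sample data with duplicates, chosen only for illustration.
a = np.array([5, 3, 5, 1, 3])
b = np.array([1])

# Masking keeps the original order and the repeated values.
kept = a[np.isin(a, b, invert=True)]
print(kept)   # [5 3 5 3]

# setdiff1d returns each remaining value once, in sorted order.
diff = np.setdiff1d(a, b)
print(diff)   # [3 5]
```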

To find the common elements between two arrays, we can use the **intersect1d(ar1, ar2)** function. It returns the sorted, unique values that are present in both input arrays.

In [8]:
np.intersect1d(array_1, array_2)

array(['New Jersey', 'New Mexico', 'New York'],
      dtype='<U14')
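
If the positions of the common values are also needed, **intersect1d** accepts a **return_indices=True** argument. The sketch below uses made-up 1-D integer arrays (`x` and `y` are illustrative names) and assumes a NumPy version that supports this argument (1.15 or later). The returned indices point to the first occurrence of each common value in each input.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([1, 1, 2, 3, 4])
y = np.array([2, 1, 4, 6])

# common: sorted, unique shared values
# x_idx / y_idx: index of the first occurrence of each shared value
common, x_idx, y_idx = np.intersect1d(x, y, return_indices=True)

print(common)  # [1 2 4]
print(x_idx)   # [0 2 4]
print(y_idx)   # [1 0 2]
```

Indexing each input with its index array, as in `x[x_idx]` and `y[y_idx]`, recovers the common values from both sides.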