set: a collection of distinct objects; each element occurs at most once
    - uniqueness: no element appears more than once
    - unordered: the elements have no imposed order
    - operations: union, intersection, difference, Cartesian product, disjoint union
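A minimal sketch of these properties and operations using Python's built-in `set` type. The element values are arbitrary. Python has no built-in disjoint union, so it is modelled here by tagging each element with the index of the set it came from. That is one common convention, assumed for illustration.

```python
from itertools import product

# Uniqueness: duplicates collapse when the set is built
A = set([1, 2, 2, 3, 3, 3])
B = {3, 4, 5}
print(sorted(A))  # [1, 2, 3]

# Unordered: equality ignores the order in which elements are written
print({1, 2, 3} == {3, 2, 1})  # True

union = A | B                   # elements in A or B
intersection = A & B            # elements in both A and B
difference = A - B              # elements in A but not in B
cartesian = set(product(A, B))  # all ordered pairs (a, b) with a in A and b in B

# Disjoint union (assumed tagging convention): the shared element 3 appears
# twice, once as (0, 3) and once as (1, 3), so nothing is merged
disjoint_union = {(0, a) for a in A} | {(1, b) for b in B}

print(sorted(union))        # [1, 2, 3, 4, 5]
print(intersection)         # {3}
print(sorted(difference))   # [1, 2]
print(len(cartesian))       # 9
print(len(disjoint_union))  # 6
```

The plain union has 5 elements while the disjoint union has 6, because the tagging keeps the copy of 3 from each set apart.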

set: a well-defined collection of distinct objects, i.e. there is a clear rule (often a shared property) that decides whether any given object is a member
    - uniqueness: each element appears only once (repeated listings of the same element still count as one element)
    - unordered: there is no inherent order or arrangement of the elements
    - operations: union, intersection, difference, disjoint union, Cartesian product
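"Well-defined" means membership can be decided by a rule. A Python set comprehension is a close analogue of set-builder notation {x | P(x)}. The predicate below (multiples of 3 from 1 to 20) is only an illustrative choice.

```python
# Set-builder style: {x in 1..20 | x is a multiple of 3}
multiples_of_3 = {x for x in range(1, 21) if x % 3 == 0}
print(sorted(multiples_of_3))  # [3, 6, 9, 12, 15, 18]

# The rule decides membership for any candidate object
print(9 in multiples_of_3, 10 in multiples_of_3)  # True False

# Uniqueness: adding an element that is already present changes nothing
before = len(multiples_of_3)
multiples_of_3.add(9)
print(len(multiples_of_3) == before)  # True

# Unordered: sets have no positions, so indexing raises TypeError
try:
    multiples_of_3[0]
except TypeError as err:
    print("TypeError:", err)
```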
    

In [6]:
from sklearn.datasets import load_iris
import numpy as np
# load the iris data set
iris = load_iris()  # returns a Bunch: a dict-like object whose keys can also be read as attributes
data = iris.data  # feature matrix: NumPy array of shape (150, 4); columns are sepal length, sepal width, petal length, petal width (cm)
target = iris.target  # class label of each sample: 0, 1 or 2, one per iris species (setosa, versicolor, virginica)

# data: feature matrix of shape (150, 4), i.e. 150 samples and 4 features
indices = np.arange(len(data))  # integers 0 to 149, one per sample

# target: NumPy array of class labels for the iris data set (values 0, 1, 2)
# target == x: elementwise comparison; returns a boolean array of the same shape, True where the element equals x and False otherwise
# np.where(condition) with a single argument returns a tuple of arrays, one per dimension, holding the indices where the condition is True
# np.where(...)[0] takes the index array for the only dimension, which is then passed to Python's built-in set() constructor
set_A = set(np.where(target == 0)[0])  # indices of all class-0 (setosa) samples


# data[:, 0]: all rows of the first column (sepal length), giving a 1-D array
# np.median: computes the median of the array it receives
median_sepal = np.median(data[:, 0])

# indices of the samples whose sepal length (first feature) exceeds the median
set_B = set(np.where(data[:, 0] > median_sepal)[0])
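The `np.where` and median steps can be checked on a small hand-made array. The values below are made up and are not taken from the iris data; `toy_target` and `toy_sepal` are hypothetical stand-ins for `target` and `data[:, 0]`.

```python
import numpy as np

# Hypothetical stand-ins for iris.target and the sepal-length column
toy_target = np.array([0, 1, 0, 2, 0, 1])
toy_sepal = np.array([5.1, 6.4, 4.9, 6.3, 5.8, 5.5])

mask = toy_target == 0  # elementwise comparison gives a boolean array
print(mask)             # [ True False  True False  True False]

result = np.where(mask)           # one-argument form: a tuple with one index array per dimension
print(type(result), len(result))  # <class 'tuple'> 1

toy_A = set(result[0].tolist())  # .tolist() converts NumPy integers to plain Python ints
print(sorted(toy_A))             # [0, 2, 4]

# Even number of values: the median is the mean of the two middle values (5.5 and 5.8)
toy_median = np.median(toy_sepal)
toy_B = set(np.where(toy_sepal > toy_median)[0].tolist())
print(sorted(toy_B))  # [1, 3, 4]
```

Without `.tolist()`, as in the cell above, the sets hold NumPy integer scalars; they still compare equal to Python ints, so the set operations behave the same. `np.flatnonzero(mask)` is an equivalent way to get the 1-D index array directly.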

In [None]:
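# set operations on the index sets, using NumPy's 1-D set routines
# (Python sets are unordered, so convert each one to a sorted NumPy array first)
arr_A = np.array(sorted(set_A))
arr_B = np.array(sorted(set_B))
np_union = np.union1d(arr_A, arr_B)             # sorted unique indices in A or B
np_intersection = np.intersect1d(arr_A, arr_B)  # sorted unique indices in both A and B
np_difference = np.setdiff1d(arr_A, arr_B)      # sorted unique indices in A but not in B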



[  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107
 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
 144 145 146 147 148 149]
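NumPy's `union1d`, `intersect1d` and `setdiff1d` return sorted arrays of unique values, while Python's built-in operators `|`, `&` and `-` return unordered sets with the same members. A small sketch comparing the two; the index sets here are hypothetical toy values, not the iris ones.

```python
import numpy as np

# Hypothetical toy index sets
set_A = {0, 2, 4}
set_B = {1, 3, 4}

arr_A = np.array(sorted(set_A))
arr_B = np.array(sorted(set_B))

# NumPy routines: sorted arrays of unique values
np_union = np.union1d(arr_A, arr_B)
np_intersection = np.intersect1d(arr_A, arr_B)
np_difference = np.setdiff1d(arr_A, arr_B)

# Built-in set operators: unordered sets
py_union = set_A | set_B
py_intersection = set_A & set_B
py_difference = set_A - set_B

print(np_union, sorted(py_union))                # [0 1 2 3 4] [0, 1, 2, 3, 4]
print(np_intersection, sorted(py_intersection))  # [4] [4]
print(np_difference, sorted(py_difference))      # [0 2] [0, 2]
```

When the inputs are already Python sets, the built-in operators avoid the list and array conversions; the NumPy routines are more convenient when the indices are already arrays, for example straight from `np.where(...)[0]`.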
