First we load the iris data set:

In [1]:
from sklearn.datasets import load_iris
iris_data = load_iris()

Next we extract the data matrix from the iris data.

In [2]:
X = iris_data['data']

We can see the number of rows and columns in this data matrix by viewing X's "shape" attribute:

In [3]:
X.shape

(150, 4)
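To see what the 4 columns represent, we can also look at the attribute names stored with the data. The following is a minimal sketch, assuming the object returned by load_iris has a 'feature_names' entry alongside 'data'; the names n_rows, n_cols, and feature_names are introduced here only for illustration.

```python
from sklearn.datasets import load_iris

iris_data = load_iris()
X = iris_data['data']

# Unpack the shape tuple into the number of rows and columns
n_rows, n_cols = X.shape
print(n_rows, n_cols)

# Assumed key: 'feature_names' holds one name per column of X
feature_names = iris_data['feature_names']
for name in feature_names:
    print(name)
```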

The iris data set has 150 data instances (rows) and 4 attributes (columns). Each data instance also has a label that tells us which class it belongs to. We can see the array of class labels, with one label per row in X, by accessing the 'target' entry of the iris data:

In [4]:
iris_data['target']

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

This tells us that there are 3 classes, labeled 0, 1, and 2. The first 50 rows in X belong to the first class (label 0), the next 50 rows to the second class (label 1), and the last 50 rows to the third class (label 2).
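Rather than reading the class sizes off the printed array, we can count them. This is a small sketch using NumPy's unique function; the variable names y, labels, counts, and is_sorted are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
y = iris_data['target']

# Distinct class labels and how many rows carry each one
labels, counts = np.unique(y, return_counts=True)
print(labels)
print(counts)

# True if the labels never decrease, i.e. the rows are grouped by class
is_sorted = bool(np.all(np.diff(y) >= 0))
print(is_sorted)
```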

However, we're not going to use the class labels; they are shown only for reference. We are only interested in using Local Outlier Factor (LOF) to predict which instances in the data set X are anomalies.

In [6]:
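from sklearn.neighbors import LocalOutlierFactor
# Score each instance using its 2 nearest neighbors
lof = LocalOutlierFactor(n_neighbors=2)
# fit_predict returns 1 for predicted inliers and -1 for predicted outliers
anomaly_predictions = lof.fit_predict(X)
print(anomaly_predictions)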

[ 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1 -1  1
 -1  1  1  1  1  1  1 -1  1  1  1 -1  1  1  1  1  1 -1  1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1 -1  1 -1 -1  1  1  1  1  1  1  1  1  1
  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1 -1  1 -1 -1  1  1  1  1 -1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1
  1  1  1  1  1  1]


The output above tells us that the first row (first data instance) in X is not predicted to be an anomaly, and neither is the second, and so on. The first -1 appears in the 21st position, which tells us that the 21st row in X (index 20) is predicted to be an outlier by LOF. In other words, a 1 marks an "inlier" and a -1 marks an outlier. In total, 15 of the 150 rows are flagged as outliers.
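Scanning the printed array for -1 entries is error-prone, so it can help to pull out the outlier row indices directly. The sketch below is one way to do that, not part of the original notebook. It makes one assumption worth flagging: the 15 flagged rows in the output above are 10% of the 150 instances, which is consistent with a contamination setting of 0.1. Newer scikit-learn releases default to contamination='auto', so the value is passed explicitly here, and the flagged rows may still differ from the output shown above. The names outlier_rows and scores are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import LocalOutlierFactor

X = load_iris()['data']

# Assumption: contamination=0.1 is chosen to mirror the 10% of rows
# flagged in the output above; it is not stated in the original cell.
lof = LocalOutlierFactor(n_neighbors=2, contamination=0.1)
anomaly_predictions = lof.fit_predict(X)

# Indices of the rows predicted to be outliers (label -1)
outlier_rows = np.flatnonzero(anomaly_predictions == -1)
print(outlier_rows)

# Negated LOF score per row: values near -1 look like inliers,
# more negative values look more anomalous
scores = lof.negative_outlier_factor_
print(scores[outlier_rows])
```

The scores give a sense of how strongly each row is flagged, which the 1 and -1 labels alone do not show.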