Iris Dataset
Created 7/22/22

In [2]:
from sklearn.datasets import load_iris
iris = load_iris() # since we are loading the data in from sklearn, we will assume it is clean

In [4]:
X = iris.data # feature matrix; capital X (a matrix) and lowercase y (a vector) are the usual convention
y = iris.target # target labels; we want to learn a function f such that f(X) is close to y

feature_names = iris.feature_names
target_names = iris.target_names


array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
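Before splitting, it can help to confirm the size of the data and how many samples each species has. The sketch below reloads the dataset so it runs on its own; the name class_counts is only for illustration and is not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 flowers, 4 measurements each
print(iris.data.shape)

# the four measured features, in column order
print(iris.feature_names)

# np.bincount counts how many times each integer label (0, 1, 2) occurs
class_counts = np.bincount(iris.target)
print({str(name): int(n) for name, n in zip(iris.target_names, class_counts)})
```

The classes are balanced (50 samples each), which is one reason plain accuracy is a reasonable first metric for this dataset.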

We now have the data we need. Next we will split it into training and testing sets.

In [6]:
from sklearn.model_selection import train_test_split # this randomly selects training and testing subsets
# test_size is the fraction of the data held out for testing.
# No random_state is set, so the split (and the accuracy below) changes from run to run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print(X_train.shape) # prints (120, 4): 120 rows and 4 columns
print(X_test.shape) # prints (30, 4): 30 rows and 4 columns

(120, 4)
(30, 4)
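If we want the same split every time, and the same share of each species in both subsets, train_test_split accepts random_state and stratify. This is a minimal sketch of that variant, not what the cell above does; the seed 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# random_state fixes the shuffle; stratify=y keeps class proportions equal in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_test))  # number of test samples per species
```

With stratification, the 30 test rows contain 10 flowers of each species.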


Now we will create the model using the k-nearest neighbors algorithm

In [7]:
from sklearn.neighbors import KNeighborsClassifier
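# n_neighbors is k, the number of nearest training points that vote on each prediction.
# It is a tunable hyperparameter and is unrelated to the number of species; 3 is just a starting value.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train) # the model is now fitted; next we check its predictions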

y_pred = knn.predict(X_test)
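Since k is a hyperparameter, one common way to pick it is to compare a few values with cross-validation. The sketch below is one possible approach, assuming 5-fold cross-validation on the full dataset and an arbitrary list of candidate values; candidate_ks, mean_scores, and best_k are illustrative names.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9, 11]  # assumed candidates, odd to reduce tied votes
mean_scores = {}

for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # cross_val_score returns one accuracy per fold; we keep the mean
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()
    print(f"k={k}: mean accuracy {mean_scores[k]:.3f}")

# pick the k with the highest mean accuracy (ties go to the smallest k listed first)
best_k = max(mean_scores, key=mean_scores.get)
print("best k:", best_k)
```

In a stricter workflow we would run this search on the training set only and keep the test set for the final check.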

Now we will measure the model's accuracy on the test set

In [8]:
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_pred))

0.9333333333333333
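Accuracy is a single number, so it does not show which species get confused with each other. A confusion matrix does. The sketch below rebuilds a small version of the pipeline so it runs on its own; the fixed random_state=0 is an assumption added for repeatability, so its numbers need not match the output above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# rows are the true species, columns are the predicted species
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(iris.target_names)
print(cm)

# the diagonal holds the correct predictions, so diagonal / total equals the accuracy
acc = accuracy_score(y_test, y_pred)
print(cm.trace() / cm.sum(), acc)
```

Any nonzero entry off the diagonal is a misclassified flower.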


We can also use other algorithms. Here is a decision tree.

In [9]:
from sklearn.tree import DecisionTreeClassifier
knn = DecisionTreeClassifier() # note: this rebinds the name knn to a decision tree, so later cells that use knn use the tree
knn.fit(X_train, y_train) # fit the tree on the same training split

y_pred = knn.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))

0.9333333333333333
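One thing a decision tree offers that k-nearest neighbors does not is a measure of how much each feature contributed to its splits. The sketch below fits a tree on the full dataset and prints feature_importances_; the variable name tree and the fixed random_state=0 are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# random_state makes tie-breaking between equally good splits repeatable
tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# feature_importances_ has one non-negative value per feature, and the values sum to 1
importances = tree.feature_importances_
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.3f}")
```

A larger value means the tree relied more on that measurement when separating the species.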


We can also create our own testing samples and check the model predictions

In [10]:
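# each sample is [sepal length, sepal width, petal length, petal width] in cm
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]
predictions = knn.predict(sample) # knn is the decision tree fitted in the previous cell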
pred_species = [iris.target_names[p] for p in predictions]
print("predictions:", pred_species)

predictions: ['versicolor', 'virginica']
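A predicted label alone does not say how sure the model is. Classifiers in sklearn also provide predict_proba, which returns one probability per species. The sketch below uses a k-nearest neighbors model with k = 3 fitted on the full dataset (an assumption made so the example runs on its own), where each probability is simply the share of the 3 neighbors that voted for that species.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn_model = KNeighborsClassifier(n_neighbors=3).fit(iris.data, iris.target)

# same hand-made samples as above: [sepal length, sepal width, petal length, petal width]
sample = [[3, 5, 4, 2], [2, 3, 5, 4]]

# one row per sample, one column per species; each row sums to 1
proba = knn_model.predict_proba(sample)
for row in proba:
    print({str(name): round(float(p), 2) for name, p in zip(iris.target_names, row)})
```

These hand-made samples lie far from typical iris measurements, so a split vote here is a useful warning that the prediction may not be reliable.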
