# A very gentle introduction to machine learning

In this notebook we take a very gentle introduction to machine learning (or perhaps <b>an introduction to the introduction of machine learning</b>) using the most commonly used Python libraries for machine learning. We will look at three common classification algorithms and their scores.
<ul>
    <li>Logistic Regression LR</li>
    <li>KNeighborsClassifier KNN</li>
    <li>Support Vector Machine SVC</li>
</ul>

This notebook is mainly intended to make you comfortable with the syntax of these algorithms (i.e. parameters, utility functions, how train and test splits are performed, etc.). We will practise on a classic dataset that you have probably guessed if you have ever done machine learning: the <b>iris dataset</b>.

In [None]:
import sys
print(sys.version)

In [None]:
import scipy
import numpy as np
import matplotlib
import pandas 
import sklearn

print("Python: {}".format(sys.version))
print("scipy: {}".format(scipy.__version__))
print("numpy: {}".format(np.__version__))
print("matplotlib: {}".format(matplotlib.__version__))
print("pandas: {}".format(pandas.__version__))
print("sklearn: {}".format(sklearn.__version__))

In [None]:
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

In [None]:
#Load Dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ["sepal-length","sepal-width","petal-length","petal-width",'class']
dataset = pandas.read_csv(url,names=names)

# Take a look at the data <a href="https://archive.ics.uci.edu/ml/datasets/iris" target="_blank">here</a>

<b>dataset</b> is an object of type <b>DataFrame</b>, implemented in <b>pandas</b>. It organises the data in a much simpler fashion. A <b>DataFrame</b> (<b>df</b> for short) is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet. It is generally the most commonly used pandas object. The following are some of the methods used to get a quick look at the data.

In [None]:
dataset.head()

In [None]:
dataset.describe()

In [None]:
print (dataset.shape)

In [None]:
dataset.tail()

In [None]:
#Distribution (by class)
print (dataset.groupby("class").size())

In [None]:
dataset.hist()
plt.show()

In [None]:
scatter_matrix(dataset,figsize=(10,10))
plt.show()

In [None]:
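# Splitting the data
array = dataset.values  # Convert the pandas DataFrame to a NumPy array
X = array[:, 0:4]       # The four measurement columns (features)
Y = array[:, 4]         # The class column (labels)
test_size = 0.20        # Hold out 20% of the rows for testing
seed = 7                # Fixed seed so the split is reproducible
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)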
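
To make the idea behind the split concrete, here is a minimal pure-Python sketch of a seeded hold-out split. It is not how `train_test_split` is implemented; the function name `simple_train_test_split` and the stand-in `rows` list are assumptions made for illustration only.

```python
import random

def simple_train_test_split(rows, test_size=0.20, seed=7):
    """Shuffle row indices with a fixed seed, then hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # same seed -> same shuffle every run
    n_test = round(len(rows) * test_size)     # number of held-out rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(150))  # stand-in for the 150 iris rows
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))  # 120 30
```

Because the seed is fixed, calling the function again returns exactly the same split, which is the role `random_state` plays in the cell above.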

In [None]:
#Test option and evaluation metric
scoring = "accuracy"
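
Accuracy is simply the fraction of predictions that match the true labels. The sketch below shows that calculation in plain Python; the helper name `accuracy` and the four example labels are made up for illustration and are not taken from the dataset.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels: three of the four predictions are correct.
y_true = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
y_pred = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"]
print(accuracy(y_true, y_pred))  # 0.75
```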

# Creating a list of classifiers
Now we create the list of classifiers for our classification problem.

In [None]:
models = []
models.append(("LR",LogisticRegression()))
models.append(("KNN",KNeighborsClassifier()))
models.append(("SVM",SVC()))

In [None]:
for model in models:
    print (model)

## K-Folds as per official docs
K-Folds cross-validator
Provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default).
<br>
Each fold is then used once as a validation while the k - 1 remaining folds form the training set.
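
The description above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not scikit-learn's code; the generator name `kfold_indices` is an assumption. It splits the indices into k consecutive folds without shuffling and gives the first few folds one extra sample when the data does not divide evenly.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs for k consecutive folds, without shuffling."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1                    # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))    # this fold is the validation set
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop

for train_idx, val_idx in kfold_indices(n_samples=10, n_splits=3):
    print("validation:", val_idx, "training:", train_idx)
```

Every index appears in exactly one validation fold and in the training set of all the other folds.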
## cross_val_score 
cross_val_score evaluates a score by cross-validation: the model is trained and scored once per fold, and one score per fold is returned.
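
As a rough picture of what happens during cross-validation, the sketch below trains a fresh model on each training portion and scores it on the held-out fold. Everything here is hypothetical and for illustration: `MajorityClassifier` is a toy model that always predicts the most common training label, and `cross_val_accuracy` is a simplified stand-in that assumes the number of samples divides evenly by the number of folds.

```python
from collections import Counter

class MajorityClassifier:
    """Hypothetical toy model: always predicts the most common training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

def cross_val_accuracy(make_model, X, y, n_splits):
    """Return one accuracy score per consecutive fold (no shuffling)."""
    fold_size = len(X) // n_splits            # assumes len(X) is divisible by n_splits
    scores = []
    for k in range(n_splits):
        start, stop = k * fold_size, (k + 1) * fold_size
        X_val, y_val = X[start:stop], y[start:stop]
        X_fit, y_fit = X[:start] + X[stop:], y[:start] + y[stop:]
        model = make_model().fit(X_fit, y_fit)   # a fresh model for every fold
        predictions = model.predict(X_val)
        correct = sum(1 for t, p in zip(y_val, predictions) if t == p)
        scores.append(correct / len(y_val))
    return scores

X = [[i] for i in range(12)]      # made-up single-feature rows
y = ["a"] * 8 + ["b"] * 4         # made-up labels
scores = cross_val_accuracy(MajorityClassifier, X, y, n_splits=4)
print(scores)
```

The last fold contains only the minority label, so the toy model scores zero there. Unshuffled folds on data sorted by class can give misleading scores in exactly this way.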

In [None]:
results = []
names = []
print ("Type   MEAN     Std")
for name,model in models:
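    # shuffle=True is needed for random_state to have any effect;
    # recent scikit-learn versions raise an error if random_state is set without it
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)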
    cv_results = model_selection.cross_val_score(model,X_train,y_train,scoring=scoring,cv=kfold)
    results.append(cv_results)
    names.append(name)
    msg = "%s:   %0.3f    (%0.5f)" % (name,cv_results.mean(),cv_results.std())
    print (msg)
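
The two numbers printed for each model are the mean and the standard deviation of the per-fold scores. NumPy's `std()` computes the population standard deviation by default, which corresponds to `statistics.pstdev` in the standard library. The fold scores below are made up for illustration and are not results from this notebook.

```python
import statistics

fold_scores = [1.0, 0.9, 1.0, 0.8]          # made-up per-fold accuracies

mean = statistics.mean(fold_scores)          # average accuracy across folds
std = statistics.pstdev(fold_scores)         # population standard deviation (divide by n)

print("%s:   %0.3f    (%0.5f)" % ("DEMO", mean, std))
```

A small standard deviation suggests the model performs consistently across folds; a large one suggests the score depends heavily on which rows were held out.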

In [None]:
#Make predictions on test dataset
for name,model in models:
    model.fit(X_train,y_train)
    predictions = model.predict(X_test)
    print (name)
    print(accuracy_score(y_test,predictions))
    print (classification_report(y_test,predictions))
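
The classification report lists precision, recall, F1 score and support for each class. As a minimal sketch of how those per-class numbers are defined (not the library's implementation; `per_class_report` and the example labels are made up for illustration):

```python
def per_class_report(y_true, y_pred):
    """Compute precision, recall, F1 and support for every label."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
        fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
        fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": tp + fn}
    return report

# Made-up labels: one versicolor flower is misclassified as virginica.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
for label, row in per_class_report(y_true, y_pred).items():
    print(label, row)
```

Here the single mistake lowers recall for versicolor (a true versicolor was missed) and precision for virginica (a predicted virginica was wrong).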

This concludes our very first machine learning tutorial. I suggest writing this code yourself to get a complete understanding. Understand each and every complex 