## A Simple Machine Learning App to Classify Apples and Oranges

<img src="images/apple_and_orange4.jpg" />

This short tutorial illustrates the working principle behind supervised learning. We will build a small machine learning application that, after being given a few labeled examples, learns the differences between two fruits (an apple and an orange) and predicts which one a new sample is.

### Let's get started

First, we will import the required libraries.


In [1]:
from sklearn import tree

Next, our supervised learning recipe has the following flow:
* Data Preparation
* Supervised Learning
* Making Predictions

Let's talk about each of the steps.

### Data Preparation

Data preparation is a very important phase in building a successful machine learning application. This single stage can include several sub-phases, such as data collection, exploration, quality improvement, feature engineering, and splitting the data for training, testing, and evaluation.

To keep this tutorial as simple as possible, we will use a tiny data set taken from [here](https://blog.education-ecosystem.com/a-simple-machine-learning-algorithm-to-differentiate-between-an-apple-and-an-orange/).

This data set consists of 4 examples, as follows:

| Weight (grams) | Texture | Class/Label |
| --- | --- | --- |
| 155 | Rough | Orange |
| 180 | Rough | Orange |
| 135 | Smooth | Apple |
| 110 | Smooth | Apple |

The examples of apples and oranges in the table will be used as training data in what follows.

In [2]:
# Use Python lists to store the table's values

# TODO: Create a feature array with the appropriate features values
features = [[155, "rough"],
            [180, "rough"],
            [135, "smooth"],
            [110, "smooth"]]

# TODO: Create a label array with the appropriate class/label values
labels = ["orange",
          "orange",
          "apple",
          "apple"]


**Note** 

Some machine learning algorithms work better when categorical (text) data is converted into numbers, and many of the algorithms implemented in scikit-learn require it; see [*LabelEncoder*](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) and [this article on label encoding vs. one-hot encoding](https://medium.com/@contactsunny/label-encoder-vs-one-hot-encoder-in-machine-learning-3fc273365621).

Since scikit-learn requires numerical features, let's encode the texture feature and the class labels as integers; oranges and apples get the values **1** and **0**, respectively.


_New Python Code_
We will update our code to reflect this change, so that we have:
* rough as 0 and smooth as 1
* oranges as 1 and apples as 0

In [3]:
# Convert categorical data into numbers and rewrite the Python lists above

# TODO: modify categorical data into numbers
features = [[155, 0],
            [180, 0],
            [135, 1],
            [110, 1]]

# TODO: modify categorical data into numbers
labels = [1,
          1,
          0,
          0]
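
The same mapping can also be produced with scikit-learn's `LabelEncoder` instead of typing the numbers by hand. The following is a minimal sketch, not part of the original notebook; the variable names (`weights`, `textures`, `fruit_names`, `texture_encoder`, `label_encoder`) are illustrative assumptions. `LabelEncoder` assigns integers in sorted order of the class names, so `rough`/`smooth` and `apple`/`orange` happen to receive the same codes as the hand-written lists.

```python
from sklearn.preprocessing import LabelEncoder

# The raw table values, split into columns (names are illustrative).
weights = [155, 180, 135, 110]
textures = ["rough", "rough", "smooth", "smooth"]
fruit_names = ["orange", "orange", "apple", "apple"]

# One encoder per categorical column, so each mapping can be reversed later.
texture_encoder = LabelEncoder()
label_encoder = LabelEncoder()

texture_codes = texture_encoder.fit_transform(textures)   # rough -> 0, smooth -> 1
label_codes = label_encoder.fit_transform(fruit_names)    # apple -> 0, orange -> 1

# Rebuild plain Python lists in the same shape as the manual version.
encoded_features = [[w, int(t)] for w, t in zip(weights, texture_codes)]
encoded_labels = [int(c) for c in label_codes]

print(encoded_features)  # [[155, 0], [180, 0], [135, 1], [110, 1]]
print(encoded_labels)    # [1, 1, 0, 0]

# inverse_transform maps integer codes back to the original names.
print([str(name) for name in label_encoder.inverse_transform([0, 1])])  # ['apple', 'orange']
```

Note that the scikit-learn documentation describes `LabelEncoder` as intended for target labels; for feature columns, `OrdinalEncoder` is the documented counterpart. It is used for both here only to keep the sketch short.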


### Supervised Learning

In machine learning, classification is a supervised learning task in which the algorithm learns from a set of labeled examples.

To achieve this task, we will use
* [Scikit-Learn](https://scikit-learn.org/stable/) - Machine learning library for Python
* [Decision Tree](https://scikit-learn.org/stable/modules/tree.html) algorithm as the supervised machine learning classifier.

Now, let's pass our training set, which consists of features and labels, to the _fit_ method in order to _learn_ from these examples.

In [9]:
# Training the model
classifier = tree.DecisionTreeClassifier()  # initialize the classifier
classifier = classifier.fit(features, labels) # train (learn) from our data

After being fitted (trained), the model can be used to predict the class of new samples. Here we ask about a smooth fruit (texture 1) that weighs 120 grams:



In [10]:
prediction = classifier.predict([[120, 1]])
print(prediction)

[0]


_Or you can print it in a more readable way:_

In [11]:
# predict returns an array with one entry per sample; take the first entry
prediction = "Orange" if classifier.predict([[120, 1]])[0] == 1 else "Apple"
print(prediction)

Apple
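
To see what the classifier actually learned from the four examples, the fitted tree can be printed as text rules. This is a minimal sketch that goes beyond the original notebook: `random_state=0` and the feature names `"weight"` and `"texture"` are assumptions added for illustration. Because either feature alone separates these four examples perfectly, the rule the tree picks may differ between runs or library versions, so no output is shown.

```python
from sklearn import tree

# Same encoded training data as in the tutorial.
features = [[155, 0], [180, 0], [135, 1], [110, 1]]
labels = [1, 1, 0, 0]

# random_state is an addition here so that repeated runs build the same tree.
classifier = tree.DecisionTreeClassifier(random_state=0)
classifier.fit(features, labels)

# export_text renders the learned decision rules as indented plain text.
rules = tree.export_text(classifier, feature_names=["weight", "texture"])
print(rules)

# predict_proba returns one row per sample; columns follow classifier.classes_.
print(classifier.classes_)
print(classifier.predict_proba([[120, 1]]))
```

With such a tiny data set the tree needs only a single split, which is why one rule is enough to classify every training example.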


### Exercise


Please do the following:
* Convert the orange and apple data set into a CSV file
* Use pandas to read the file
* Use the LabelEncoder class to change categorical data into numerical data

In [None]:
import pandas as pd

# Load the apple and orange data set from the CSV file
fruit_data = pd.read_csv('apple_and_orange.csv')
fruit_data.head()

In [None]:

features = [[155, 0], [180, 0], [135, 1], [110, 1]] 
labels = [1, 1, 0, 0]
classifier = tree.DecisionTreeClassifier()
classifier = classifier.fit(features, labels)
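
One possible way to approach the exercise end to end is sketched below. It is only a sketch under stated assumptions: the column names (`weight`, `texture`, `label`) are made up for illustration, and the CSV is written to a temporary directory so the example is self-contained rather than relying on an existing `apple_and_orange.csv`.

```python
import os
import tempfile

import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder

# The four examples from the table; the column names are assumptions.
fruit_data = pd.DataFrame({
    "weight": [155, 180, 135, 110],
    "texture": ["rough", "rough", "smooth", "smooth"],
    "label": ["orange", "orange", "apple", "apple"],
})

# Step 1: write the data set to a CSV file (in a temporary directory here).
csv_path = os.path.join(tempfile.mkdtemp(), "apple_and_orange.csv")
fruit_data.to_csv(csv_path, index=False)

# Step 2: read the file back with pandas.
loaded = pd.read_csv(csv_path)

# Step 3: encode the categorical columns as integers.
loaded["texture"] = LabelEncoder().fit_transform(loaded["texture"])
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(loaded["label"])

# Train on the encoded data, then predict a smooth, 120 gram fruit.
classifier = tree.DecisionTreeClassifier()
classifier.fit(loaded[["weight", "texture"]], encoded_labels)

new_fruit = pd.DataFrame({"weight": [120], "texture": [1]})
predicted = label_encoder.inverse_transform(classifier.predict(new_fruit))
print(predicted[0])  # apple
```

Keeping a reference to `label_encoder` makes it possible to turn the numeric prediction back into a fruit name with `inverse_transform`.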
