# Exercise 1: Fruit Classification

## Exercise Description
* This exercise gives a basic introduction to machine learning model development.
* The model built in this exercise predicts the name of a fruit from a short text description of its features. For example, the features "yellow curved soft tropical" describe the fruit "banana".

## Learning Outcomes
1. Understand how an ML model is created
2. Learn cloud deployment techniques
3. Gain experience with API creation
4. Get hands-on experience with different ML paradigms

## Notebook Guidelines
* This notebook is shared as a template for you to use as you follow along with the practical exercise.

## Time Allocation: 10 minutes

## Help
* The webinar is recorded
* If you get stuck, you can refer to the recording afterwards
* You are welcome to use the Gemini interface to get guidance on the exercise.



# Step 1: Import Libraries

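```python
import pandas as pd  # tabular data handling
import numpy as np  # numerical arrays
from sklearn.feature_extraction.text import CountVectorizer  # text -> word counts
from sklearn.naive_bayes import MultinomialNB  # Naive Bayes classifier for count features
from sklearn.model_selection import train_test_split  # train/test splitting
import pickle  # serialise Python objects to files
```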

# Step 2: Create Sample Fruit Feature Dataset

```python
data = {
    'features': [
        'round red sweet grows on trees',
        'yellow curved soft tropical',
        'small round orange citrus',
        'green oval tropical large seed'
    ],
    'fruit': ['apple', 'banana', 'orange', 'avocado']
}
```

# Step 3: Load the Fruit Feature Dataset into a DataFrame

```python
df = pd.DataFrame(data)
```

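# Step 4: Vectorize Text Features
* `CountVectorizer` learns a vocabulary from all the descriptions and converts each description into a vector of word counts over that vocabulary.

```python
# Learn the vocabulary and convert each description into word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['features'])
# Target labels: the fruit names
y = df['fruit']
```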

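# Step 5: Split the Data into Training and Test Sets
* A 75%:25% train-test split is used in this exercise.
* With only four rows, this leaves three rows for training and one for testing. Because each fruit appears only once, the held-out fruit is never seen during training, so the model cannot predict it; a real project needs several examples per fruit.

```python
# Hold out 25% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```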
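With only four rows, `test_size=0.25` holds out exactly one fruit. This sketch checks the split sizes and confirms that the held-out fruit is missing from the training labels. `rows` is a hypothetical stand-in for the vectorized features, and `random_state=42` is an arbitrary seed chosen here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

fruits = np.array(['apple', 'banana', 'orange', 'avocado'])
rows = np.arange(len(fruits)).reshape(-1, 1)  # stand-in feature matrix, one row per fruit

X_train, X_test, y_train, y_test = train_test_split(
    rows, fruits, test_size=0.25, random_state=42
)

print(len(y_train), len(y_test))  # 3 1
# No label is shared: the test fruit never appears in training
print(set(y_test) & set(y_train))  # set()
```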

# Step 6: Train Machine Learning Model
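* A classification algorithm is chosen to learn the mapping from word counts to fruit names.
* The Multinomial Naive Bayes algorithm (`MultinomialNB`) is used because it is designed for discrete count features, such as the word counts produced by `CountVectorizer`.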
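For reference, scikit-learn's `MultinomialNB` estimates a smoothed word probability $\hat{\theta}_{ci}$ for each fruit $c$ and picks the fruit with the highest score:

$$
\hat{\theta}_{ci} = \frac{N_{ci} + \alpha}{N_c + \alpha n},
\qquad
\hat{y} = \arg\max_{c} \left[ \log P(c) + \sum_{i=1}^{n} x_i \log \hat{\theta}_{ci} \right]
$$

Here $x_i$ is how often word $i$ appears in the new description, $N_{ci}$ is how often word $i$ appears in the training descriptions of fruit $c$, $N_c = \sum_i N_{ci}$, $n$ is the vocabulary size, $P(c)$ is the fraction of training rows labelled $c$, and $\alpha = 1$ (Laplace smoothing) is the default. With the sample data $n = 17$, so if banana ($N_c = 4$) is among the training rows, each of its own words gets $\hat{\theta} = (1 + 1)/(4 + 17) = 2/21$ and every other word gets $1/21$ rather than zero.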

```python
classifier = MultinomialNB()
classifier.fit(X_train, y_train)  # learn word-count statistics for each fruit
```
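Once trained, the model can classify new descriptions. This is a minimal sketch, assuming (unlike Step 5) that the classifier is fitted on all four rows so every fruit is a possible answer. The key detail is that new text goes through `transform()`, not `fit_transform()`, so it is mapped onto the vocabulary learned during training; words outside that vocabulary are simply ignored. The example descriptions are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

features = [
    'round red sweet grows on trees',
    'yellow curved soft tropical',
    'small round orange citrus',
    'green oval tropical large seed',
]
fruits = ['apple', 'banana', 'orange', 'avocado']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(features)

# Assumption for this sketch: train on every row so all four fruits are known
classifier = MultinomialNB()
classifier.fit(X, fruits)

# Reuse the fitted vocabulary for new text (transform, not fit_transform)
new_descriptions = ['yellow curved soft', 'green oval large']
predictions = classifier.predict(vectorizer.transform(new_descriptions))
print(predictions.tolist())  # ['banana', 'avocado']
```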

# Step 7: Save the Model as a Deployable Object
* The fitted vectorizer and the trained classifier are each serialised to a pickle file (`.pkl`).
* These files are what gets deployed: an application loads both of them to convert new descriptions into word counts and return real-time predictions (inference).

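```python
# Save the fitted vectorizer: new text must be converted with the same vocabulary
with open('fruit_vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)
# Save the trained classifier
with open('fruit_classifier.pkl', 'wb') as f:
    pickle.dump(classifier, f)
```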
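On the deployment side, an application (for example, the handler behind an API endpoint) would load both pickle files and call them in the same order as in training. The sketch below is one way to do that; `load_and_predict` is a hypothetical helper, and the demo retrains the model and saves it to a temporary folder so it runs on its own without overwriting the notebook's files.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def load_and_predict(description, vectorizer_path='fruit_vectorizer.pkl',
                     classifier_path='fruit_classifier.pkl'):
    """Hypothetical helper: load the pickled vectorizer and classifier, predict one fruit."""
    # Only unpickle files you created yourself or fully trust
    with open(vectorizer_path, 'rb') as f:
        vectorizer = pickle.load(f)
    with open(classifier_path, 'rb') as f:
        classifier = pickle.load(f)
    # Convert text with the training vocabulary, then classify
    return classifier.predict(vectorizer.transform([description]))[0]


# Self-contained demo: train on the sample data, save, then reload
features = [
    'round red sweet grows on trees',
    'yellow curved soft tropical',
    'small round orange citrus',
    'green oval tropical large seed',
]
fruits = ['apple', 'banana', 'orange', 'avocado']
vectorizer = CountVectorizer()
classifier = MultinomialNB().fit(vectorizer.fit_transform(features), fruits)

with tempfile.TemporaryDirectory() as tmp:
    vec_path = os.path.join(tmp, 'fruit_vectorizer.pkl')
    clf_path = os.path.join(tmp, 'fruit_classifier.pkl')
    with open(vec_path, 'wb') as f:
        pickle.dump(vectorizer, f)
    with open(clf_path, 'wb') as f:
        pickle.dump(classifier, f)
    result = load_and_predict('small round orange', vec_path, clf_path)

print(result)  # orange
```

Saving the vectorizer separately is what keeps inference consistent: a vectorizer refitted on new text would build a different vocabulary, and the classifier's learned word probabilities would no longer line up with its columns.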