# ECS7020P mini-project submission

The mini-project has two separate components:


1.   **Basic component** [6 marks]: Using the MLEnd Yummy Dataset, build a machine learning pipeline that takes as an input a photo of a dish that has either rice or chips and predicts whether the picture has rice or chips.
2.   **Advanced component** [10 marks]: Formulate your own machine learning problem and build a machine learning solution using the MLEnd Yummy Dataset.

**Submit two Jupyter notebooks**, one for the basic component and another for the advanced component. Please **name each notebook**:

* ECS7020P_miniproject_basic.ipynb
* ECS7020P_miniproject_advanced.ipynb

then **zip and submit them together**.

Each uploaded notebook should include:

*   **Text cells**, concisely describing each step and its results.
*   **Code cells**, implementing each step.
*   **Output cells**, i.e. the output from each code cell.

and **should have the structure** (9 sections) indicated below. Notebooks might not be run, so please make sure that the **output cells are saved**.

How will we evaluate your submission?

*   Conciseness in your writing.
*   Correctness in your methodology.
*   Correctness in your analysis and conclusions.
*   Completeness.
*   Originality and efforts to try something new.

Suggestion: Why don't you use **GitHub** to manage your project? GitHub can be used as a presentation card that showcases what you have done and gives evidence of your data science skills, knowledge and experience.

Each notebook should be structured into the following 9 sections:


# 1 Author

**Student Name**:  Ishwar Joshi \
**Student ID**:  230194814



# 2 Problem formulation

In this notebook, I take a photo of a dish that contains either rice or chips as input and predict whether it shows rice or chips.
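
To make the target variable concrete, here is a minimal sketch of how a binary rice/chips label could be derived from the `Dish_name` and `Ingredients` attributes of the dataset. The helper name `rice_or_chips` and the keyword `fries` are assumptions made for illustration; rows that mention both or neither are left unlabelled.

```python
import re

def rice_or_chips(dish_name, ingredients):
    """Return 'rice', 'chips', or None when no clean binary label exists."""
    # Dish_name and Ingredients use underscores and commas as separators,
    # e.g. "chicken_katsu_rice" and "rice,chicken_breast,spicy_curry_sauce".
    tokens = set(re.split(r"[_,\s]+", f"{dish_name},{ingredients}".lower()))
    has_rice = "rice" in tokens
    # "fries" is an assumed synonym for chips, not something the dataset defines.
    has_chips = bool(tokens & {"chips", "fries"})
    if has_rice == has_chips:
        # Both or neither keyword appears, so the row is ambiguous or irrelevant.
        return None
    return "rice" if has_rice else "chips"

print(rice_or_chips("chicken_katsu_rice", "rice,chicken_breast,spicy_curry_sauce"))
print(rice_or_chips("gulab_jamun", "sugar,water,khoya,milk"))
```

Rows for which the helper returns `None` would be dropped before training.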

In [85]:
# Install the mlend library, which is used to download the MLEnd datasets
!pip install mlend






# 3 Machine Learning pipeline

Describe your ML pipeline. Clearly identify its input and output, any intermediate stages (for instance, transformation -> models), and format of the intermediate data moving from one stage to the next. It's up to you to decide which stages to include in your pipeline.

In [16]:
# pip install --upgrade google-api-python-client

In [3]:
# pip install google

In [2]:
# pip install google.colab

In [1]:
# from google.colab import drive

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import spkit as sp

from skimage import exposure
from skimage.color import rgb2hsv, rgb2gray
import skimage as ski

import mlend
from mlend import download_yummy, yummy_load,download_yummy_small,yummy_small_load

import os, sys, re, pickle, glob
import urllib.request
import zipfile

import IPython.display as ipd
from tqdm import tqdm
import librosa

# drive.mount('/content/drive')

# 4 Transformation stage

Describe any transformations, such as feature extraction. Identify input and output. Explain why you have chosen this transformation stage.
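
As an illustration of what a feature-extraction step could look like, the sketch below computes the fraction of "yellow" pixels in an image, a colour feature that might help separate chips from rice. This is an assumption, not the transformation used in this notebook: the function name `yellow_fraction` and the hue, saturation and value thresholds are hypothetical, and the image is represented as a plain list of `(r, g, b)` tuples so the example only needs the standard library.

```python
import colorsys

def yellow_fraction(pixels, hue_lo=0.10, hue_hi=0.20, min_sat=0.3, min_val=0.3):
    """Fraction of pixels whose HSV hue lies in an assumed yellow band.

    `pixels` is an iterable of (r, g, b) tuples with channels in 0..255.
    """
    total = 0
    yellow = 0
    for r, g, b in pixels:
        # colorsys expects channels scaled to the range 0..1.
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        total += 1
        # Require some saturation and brightness so grey or dark pixels do not count.
        if hue_lo <= h <= hue_hi and s >= min_sat and v >= min_val:
            yellow += 1
    return yellow / total if total else 0.0

# One pure yellow pixel and one white pixel: half of the pixels count as yellow.
print(yellow_fraction([(255, 255, 0), (255, 255, 255)]))
```

Input: the RGB pixels of one image. Output: a single number between 0 and 1 that can be passed to a model as one feature.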

In [2]:
subset = {}
baseDir = download_yummy(save_to = 'Documents/Data/MLEnd',subset = subset,verbose=1,overwrite=False)
baseDir


Downloading 3250 image files from https://github.com/MLEndDatasets/Yummy
100%|[0m▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓[0m|3250\3250|003250.jpg
Done!


'Documents/Data/MLEnd\\yummy'

In [3]:
os.listdir(baseDir)

['MLEndYD_images',
 'MLEndYD_images_small',
 'MLEndYD_image_attributes_benchmark.csv',
 'MLEndYD_image_attributes_small.csv']

In [4]:
# Build the path from baseDir so it stays in sync with the download location
MLENDYD_df = pd.read_csv(os.path.join(baseDir, 'MLEndYD_image_attributes_benchmark.csv')).set_index('filename')
MLENDYD_df

| filename | Diet | Cuisine_org | Cuisine | Dish_name | Home_or_restaurant | Ingredients | Healthiness_rating | Healthiness_rating_int | Likeness | Likeness_int | Benchmark_A |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 000001.jpg | non_vegetarian | japanese | japanese | chicken_katsu_rice | marugame_udon | rice,chicken_breast,spicy_curry_sauce | neutral | 3.0 | like | 4.0 | Train |
| 000002.jpg | non_vegetarian | english | english | english_breakfast | home | eggs,bacon,hash_brown,tomato,bread,tomato,bake... | unhealthy | 2.0 | like | 4.0 | Train |
| 000003.jpg | non_vegetarian | chinese | chinese | spicy_chicken | jinli_flagship_branch | chili,chicken,peanuts,sihuan_peppercorns,green... | neutral | 3.0 | strongly_like | 5.0 | Train |
| 000004.jpg | vegetarian | indian | indian | gulab_jamun | home | sugar,water,khoya,milk,salt,oil,cardamon,ghee | unhealthy | 2.0 | strongly_like | 5.0 | Train |
| 000005.jpg | non_vegetarian | indian | indian | chicken_masala | home | chicken,lemon,turmeric,garam_masala,coriander_... | healthy | 4.0 | strongly_like | 5.0 | Train |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 003246.jpg | vegetarian | indian | indian | zeera_rice | home | 1_cup_basmati_rice,2_cups_water,2_tablespoons_... | healthy | 4.0 | strongly_like | 5.0 | Train |
| 003247.jpg | vegetarian | indian | indian | paneer_and_dal | home | fried_cottage_cheese,ghee,lentils,milk,wheat_f... | healthy | 4.0 | strongly_like | 5.0 | Test |
| 003248.jpg | vegetarian | indian | indian | samosa | home | potato,onion,peanut,salt,turmeric_powder,red_c... | very_unhealthy | 1.0 | like | 4.0 | Test |
| 003249.jpg | vegan | indian | indian | fruit_milk | home | kiwi,banana,apple,milk | very_healthy | 5.0 | strongly_like | 5.0 | Train |


In [8]:
help(yummy_load)

Help on function yummy_load in module mlend.processing:

yummy_load(datadir_main='../MLEnd/yummy/', train_test_split='Benchmark_A', verbose=1, attributes_as_labels='all', encode_labels=False)
    Read files of Yummy Dataset and create training and testing sets.
    
    
    # Arguments
        datadir_main (str): local path where 'MLEndYD_images' directory is stored
                  relative to `../MLEnd/yummy/`).
        train_test_split (str): split type for training and testing
          - 'Benchmark_A': A predifined fixed split
             Training (70%) and Testing (30%)
          - 'Random' or 'random': random split woth 70-30
          - float (e.g. 0.8) (>0 and <1)
            random split with given fraction for training set.
            if train_test_split = 0.8, Training set will be 80% and Testing 20%
    
        attributes_as_labels: list of attribuetes as labels
          - attributes_as_labels = 'all' will return all the attribuetes as label
          - attributes_as

In [14]:
TrainSet, TestSet, Map = yummy_load(datadir_main=baseDir,train_test_split='Benchmark_A',attributes_as_labels='all',encode_labels=True)

Total 3250 found in Documents/Data/MLEnd\yummy/MLEndYD_images/


In [15]:
TrainSet

{'X_paths': [],
 'Y': Empty DataFrame
 Columns: [Diet, Cuisine, Home_or_restaurant, Healthiness_rating, Likeness, Dish_name, Ingredients, Healthiness_rating_int, Likeness_int]
 Index: [],
 'Y_encoded': array([], shape=(0, 4), dtype=float64)}
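
The output above shows a training set with no image paths and an empty label table, so it is worth checking a split before moving on. The sketch below is a hypothetical helper (`check_split` is not part of `mlend`); it assumes only that a split is a dict with the `'X_paths'` and `'Y'` keys shown in the output.

```python
def check_split(split, name="split"):
    """Raise ValueError if a loaded split is empty or inconsistent; return its size."""
    paths = split.get("X_paths", [])
    labels = split.get("Y")
    n_paths = len(paths)
    # len() works for lists and for a pandas DataFrame (number of rows).
    n_labels = 0 if labels is None else len(labels)
    if n_paths == 0:
        raise ValueError(f"{name} has no image paths; check the data directory and split settings")
    if n_paths != n_labels:
        raise ValueError(f"{name} has {n_paths} paths but {n_labels} label rows")
    return n_paths

# Toy split with one image and one label row.
print(check_split({"X_paths": ["000001.jpg"], "Y": ["rice"]}, name="TrainSet"))
```

Calling `check_split(TrainSet, "TrainSet")` straight after loading would stop the notebook early instead of letting an empty set reach the later stages.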

In [11]:
TestSet.keys()

dict_keys(['X_paths', 'Y'])

In [12]:
Map

{'Diet': {},
 'Home_or_restaurant': {'home': 0, 'restaurant': 1},
 'Healthiness_rating': {'very_unhealthy': 1,
  'unhealthy': 2,
  'neutral': 3,
  'healthy': 4,
  'very_healthy': 5},
 'Likeness_int': {'strongly_dislike': 1,
  'dislike': 2,
  'neutral': 3,
  'like': 4,
  'strongly_like': 5}}
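
Each entry of `Map` goes from a label name to its integer code. To read encoded predictions back as names, the mapping can be inverted. A minimal sketch, using the `Healthiness_rating` entry copied from the output above (`invert_label_map` is a hypothetical helper name):

```python
def invert_label_map(label_map):
    """Turn a {name: code} mapping into a {code: name} mapping."""
    return {code: name for name, code in label_map.items()}

# Copied from the 'Healthiness_rating' entry of Map shown above.
healthiness = {
    "very_unhealthy": 1,
    "unhealthy": 2,
    "neutral": 3,
    "healthy": 4,
    "very_healthy": 5,
}

decode = invert_label_map(healthiness)
print([decode[code] for code in [3, 2, 5]])
```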

In [13]:
TrainSet['Y']

| | Diet | Cuisine | Home_or_restaurant | Healthiness_rating | Likeness | Dish_name | Ingredients | Healthiness_rating_int | Likeness_int |
|---|---|---|---|---|---|---|---|---|---|


In [76]:
TrainSet['Y_encoded']

KeyError: 'Y_encoded'

# 5 Modelling

Describe the ML model(s) that you will build. Explain why you have chosen them.
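
As one simple candidate, and not necessarily the model this project ends up with, a nearest-centroid classifier can serve as a baseline: it stores the mean feature vector of each class and assigns a new sample to the closest mean. The sketch below uses only the standard library; the function names and the toy one-dimensional features are assumptions for illustration.

```python
import math
from collections import defaultdict

def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Toy data: a single made-up feature per image.
X_train = [[0.1], [0.2], [0.8], [0.9]]
y_train = ["rice", "rice", "chips", "chips"]
centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [0.3]), predict(centroids, [0.7]))
```

A baseline like this has no hyperparameters to tune, which makes it a convenient reference point before trying more flexible models.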

# 6 Methodology

Describe how you will train and validate your models, and how model performance is assessed (e.g. accuracy or a confusion matrix).
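
For reference, the two metrics mentioned above can be computed as follows. This is a minimal standard-library sketch with made-up labels, not results from this project; rows of the matrix are true classes and columns are predicted classes.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred, labels):
    """Matrix M where M[i][j] counts samples of true class i predicted as class j."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Made-up labels for illustration only.
y_true = ["rice", "rice", "chips", "chips", "chips"]
y_pred = ["rice", "chips", "chips", "chips", "rice"]

print(accuracy(y_true, y_pred))
print(confusion_matrix(y_true, y_pred, labels=["rice", "chips"]))
```

The confusion matrix shows which class the errors come from, which accuracy alone hides when the classes are unbalanced.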

# 7 Dataset

Describe the dataset that you will use to create your models and validate them. If you need to preprocess it, do it here. Include visualisations too. You can visualise raw data samples or extracted features.
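
A first look at the dataset can be as simple as counting how many images fall into each part of the predefined `Benchmark_A` split. The sketch below reads a CSV with the standard library; the function name `split_counts` is hypothetical, and the three sample rows are a cut-down excerpt of the attributes table (only three of its columns) used in place of the real file.

```python
import csv
import io
from collections import Counter

def split_counts(csv_file, split_column="Benchmark_A"):
    """Count rows per value of the split column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[split_column] for row in reader)

# Stand-in for the attributes CSV, reduced to three columns and three rows.
sample = io.StringIO(
    "filename,Dish_name,Benchmark_A\n"
    "000001.jpg,chicken_katsu_rice,Train\n"
    "003247.jpg,paneer_and_dal,Test\n"
    "003249.jpg,fruit_milk,Train\n"
)
print(split_counts(sample))
```

With the real file, the same function could be called on `open(path, newline="")` and the counts compared with the 70/30 split described in the `yummy_load` help text.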

# 8 Results

Carry out your experiments here, explain your results.

# 9 Conclusions

Your conclusions, suggestions for improvements, etc. should go here.