This repository contains the project work of our group for the DTU special course Machine Learning Operations for the Autumn semester 2023.
Group members:
- Elena Muniz s213579
- Theodoros Loukis s223526
- Ioannis Louvis s222556
- Ioannis Karampinis s222559
The goal of the project is to fine-tune a deep learning model based on the Vision Transformer (ViT) that classifies the quality of fruits from their images.
We plan to use the Transformers framework from Hugging Face, specifically its implementation of the Vision Transformer introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
The framework includes many pretrained models, which we intend to use for transfer learning: we will fine-tune a pretrained model on the dataset described below.
We plan to use the FRUIT CLASSIFICATION dataset from Kaggle. It contains more than 14,700 high-quality images of 6 kinds of fruit: apple, banana, guava, lime, orange, and pomegranate. Our goal is to classify each image into one of three quality classes:
- Good.
- Bad.
- Mixed.
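The three quality classes above can be encoded as integer labels for training. The following is a minimal sketch, assuming the class order Good, Bad, Mixed; the order and the names `QUALITY_CLASSES`, `label2id`, and `id2label` are our own choices for illustration and are not fixed by the dataset.

```python
# Quality classes from the list above; the ordering is an assumption.
QUALITY_CLASSES = ["Good", "Bad", "Mixed"]

# Mappings between class names and integer ids, in the form
# commonly passed to image classification models.
label2id = {name: idx for idx, name in enumerate(QUALITY_CLASSES)}
id2label = {idx: name for name, idx in label2id.items()}

print(label2id)
print(id2label)
```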
We expect to use the Vision Transformer (ViT), a transformer-based deep learning model aimed at vision tasks such as image recognition. We may also try the BERT Pre-Training of Image Transformers (BEiT) and/or Data-efficient Image Transformers (DeiT) models, which are follow-up works on the original ViT.
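Switching between these model families could be as simple as changing a checkpoint name. The sketch below assumes the `transformers` and `torch` packages are installed. The checkpoint names are examples from the Hugging Face Hub and may not be the ones we end up using, and `CANDIDATE_CHECKPOINTS` and `load_classifier` are hypothetical names introduced only for illustration.

```python
# Example Hub checkpoints per model family (assumed choices, not final).
CANDIDATE_CHECKPOINTS = {
    "vit": "google/vit-base-patch16-224-in21k",
    "beit": "microsoft/beit-base-patch16-224-pt22k-ft22k",
    "deit": "facebook/deit-base-patch16-224",
}

# Quality classes; the ordering is an assumption.
QUALITY_CLASSES = ["Good", "Bad", "Mixed"]


def load_classifier(family: str):
    """Load a pretrained backbone with a fresh 3-class head (hypothetical helper)."""
    # Imported lazily so the mapping above is usable without the heavy dependency.
    from transformers import AutoModelForImageClassification

    return AutoModelForImageClassification.from_pretrained(
        CANDIDATE_CHECKPOINTS[family],
        num_labels=len(QUALITY_CLASSES),
        id2label=dict(enumerate(QUALITY_CLASSES)),
        label2id={name: i for i, name in enumerate(QUALITY_CLASSES)},
        # Needed when the checkpoint ships a head with a different class count.
        ignore_mismatched_sizes=True,
    )
```

For example, `model = load_classifier("vit")` would download the pretrained weights on first use and attach a new, untrained classification head for the three quality classes.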