# Instructions

## Overview

In this project you are asked to build an ML service that solves a task of your choice.

Define the task you are going to solve. The task definition should contain a description of the inputs and outputs, the chosen approach (model description), the dataset used for model training, and the runtime architecture of the resulting service.

You can publish your code in a private GitHub repo.

## Details for the project structure:

1. Project documentation

* 1.1. design document

* 1.2. run instructions (env, commands)

* 1.3. architecture, losses, metrics

2. Data set

3. Model training code

* 3.1. Jupyter Notebook

* 3.2. MLflow project

4. Service deployment and usage instructions

* 4.1. Dockerfile or docker-compose file

* 4.2. required services: databases

* 4.3. client for service

* 4.4. model

1. **Project Documentation**
    * 1.1. Design Document: Describe the problem you're solving, why you chose ResNet50, and how you're fine-tuning it.
    * 1.2. Run Instructions: Document the environment setup (Python version, required libraries), and the commands to run your code.
    * 1.3. Architecture, Losses, Metrics: Describe the architecture of ResNet50, the loss function you're using (e.g., CrossEntropyLoss for multi-class classification), and the metrics (e.g., accuracy).

2. **Dataset**
    * Describe the Food-101 dataset, how you selected 50 subclasses, and how you split the data into training, validation, and test sets.

3. **Model Training Code**
    * 3.1. Jupyter Notebook: Write the code for data loading, model training, and evaluation in a Jupyter notebook.
    * 3.2. MLflow Project: Use MLflow to track your experiments, log metrics, and save models.

4. **Service Deployment and Usage Instructions**
    * 4.1. Dockerfile or Docker-compose file: Create a Dockerfile for your service, which includes the environment setup and the command to run your service.
    * 4.2. Required Services: Databases: If you need to store results, describe the database you're using.
    * 4.3. Client for Service: Provide a client script to call your service and get predictions.
    * 4.4. Model: Include instructions on how to load the trained model for making predictions.

Remember, this is a high-level plan. Each step will involve more detailed tasks. For example, for model training, you'll need to write code for data loading, data augmentation, model training, model validation, and model saving.
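
As one illustration of the data-loading step, the sketch below splits a list of `(path, label)` pairs class by class, so every class keeps the same proportions in each subset. The function name `stratified_split`, the 70/15/15 fractions, the fixed seed, and the file names are all assumptions made for this example, not requirements of the project.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Split (path, label) pairs into train/val/test, class by class.

    The default fractions and the seed are illustrative assumptions.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * train_frac)
        n_val = round(len(items) * val_frac)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical file names; a real project would list the dataset folder.
samples = [(f"{label}/{i}.jpg", label)
           for label in ("pizza", "sushi") for i in range(100)]
train, val, test = stratified_split(samples)
print(len(train), len(val), len(test))
```

Splitting once, with a fixed seed, and saving the resulting file lists makes it easier to show that no test image was ever used for training or model selection.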

## Useful links:

- Datasets: http://kaggle.com/
- Pretrained models: https://paperswithcode.com/
- GPU training (limited, but suitable for transfer learning): http://colab.research.google.com/
- Recommended models for transfer learning:
    - text: BERT
    - images: Big Transfer (BiT)

Project examples:
- Image or text classification and semantic analysis
- Lyrics generator with musician style (RNN model)
- Image super resolution (CNN)
- Image inpainting or generation (GAN, VAE, DDPMs)
- Image Style Transfer (GAN model)
- House price prediction based on image and table information
- Tags generation for StackOverflow questions


## Grading Criteria

1.5 points – data collection and model training. Points may be deducted for:
- improper use of data
- incorrect train/validation/test split
- too little data
- poor model quality evaluation
- no comparison to baseline
- no estimate of model runtime and size

1.5 points – service implementation. Criteria: Justification of architecture selection based on evaluation of:
- the service
- possible RPS (requests per second)
- models used
- implementation of backend architecture
- implementation of some interface (API or UI)
- evaluation of service quality, operation/response time

1 point – answers to additional questions on the implementation

You are allowed to create any kind of service, as long as there is a clear objective and solution.

Total: 4 points

You should submit a GitHub or GitLab repo that meets the requirements.

Docker is a good fit for building the web service. It lets you package your application together with all its dependencies into an image, which can then be run as a container on any system that has Docker installed. This ensures that your application runs the same way regardless of the environment.

Here's a high-level overview of how you might use Docker in this project:

1. **Dockerfile:** Write a Dockerfile for your application. This is a script that contains instructions for how to build the Docker image. It will specify the base image (e.g., a Python image), the application's dependencies (which can be installed with pip), and the application's entry point.

2. **Build the Docker Image:** Use the `docker build` command to build the Docker image based on the Dockerfile.

3. **Run the Docker Container:** Use the `docker run` command to run a container based on the Docker image. This will start the web service.

4. **Docker Compose:** If your application has multiple services (e.g., a web service and a database), you can use Docker Compose to manage them. With a docker-compose.yml file, you can start all your services with a single command (`docker-compose up`).

Remember to include instructions for building and running the Docker image in your project's documentation.
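
To make the "entry point" concrete, here is a minimal sketch of a prediction service written with the Python standard library only. Everything in it is an assumption for illustration: the `/predict` route, port 8000, the raw-image-bytes request body, and the placeholder `predict` function, which would be replaced by a call to the trained model. A real project would more likely use a web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(image_bytes: bytes) -> dict:
    """Placeholder for real inference (decode image, run model, map to label)."""
    return {"label": "unknown", "bytes_received": len(image_bytes)}


def handle_predict(body: bytes):
    """Turn a raw request body into (HTTP status, response dict)."""
    if not body:
        return 400, {"error": "empty request body; send the image bytes"}
    return 200, predict(body)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_predict(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server(host="0.0.0.0", port=8000):
    """Blocking entry point; the container's start command would call this."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Keeping the request logic in `handle_predict`, separate from the HTTP handler, lets you unit-test it without starting a server. Binding to `0.0.0.0` rather than `127.0.0.1` matters inside a container, because otherwise the published port is not reachable from the host.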

Given the grading scale, here's how you can approach your project:

1. **Data Collection and Model Training (1.5 points)**

    - **Proper use of data**: Ensure that you're using the Food-101 dataset correctly. For example, don't use test data for training.
    - **Train/Test/Validation split**: Use a standard split such as 70% for training, 15% for validation, and 15% for testing.
    - **Data size**: Food-101 contains 101,000 images (1,000 per class), so dataset size shouldn't be an issue. If you use only 50 classes, that still leaves about 50,000 images; check that each class keeps enough examples in every subset after the split.
    - **Model quality evaluation**: Use appropriate metrics such as accuracy, precision, recall, and F1 score. Also, plot confusion matrices or ROC curves if applicable.
    - **Comparison to baseline**: You could use a simpler model (like a basic CNN) as a baseline and compare its performance with your fine-tuned ResNet50.
    - **Estimate of model runtime and size**: Measure the time it takes to train your model and to make predictions. Also, note the size of the saved model file.

2. **Service Implementation (1.5 points)**

    - **Justification of architecture selection**: Explain why you chose ResNet50 and how it's suitable for your task.
    - **The service**: Describe your service, what it does (e.g., takes an image and returns a prediction of the food class), and how it uses your trained model.
    - **Possible RPS (requests per second)**: Estimate how many requests your service can handle per second. This will depend on the runtime of your model and the resources of your server.
    - **Models used**: Document the use of the ResNet50 model, how it was trained, and how it's used in the service.
    - **Implementation of backend architecture**: Describe the backend of your service, e.g., how requests are handled, how the model is loaded and used, etc.
    - **Implementation of some interface (API or UI)**: Implement an API for your service that clients can use to send requests and receive predictions. If you're ambitious, you could also create a simple web UI.
    - **Evaluation of service quality, operation/response time**: Measure and document the response time of your service, i.e., how long it takes to return a prediction after receiving a request.
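
The response-time and RPS items above can be backed by a small measurement script. The sketch below times repeated calls and derives a rough throughput ceiling; `fake_predict` is a stand-in for one model inference or one HTTP round trip, and the formula `n_workers / mean latency` assumes each worker handles one request at a time with no batching. All names here are hypothetical.

```python
import statistics
import time


def measure_latency(fn, n_requests=50):
    """Call fn() n_requests times and return per-call latencies in seconds."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, n_workers=1):
    """Mean and p95 latency plus a rough requests-per-second estimate."""
    ordered = sorted(latencies)
    # Index of the 95th percentile, clamped to the last element.
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    mean = statistics.mean(ordered)
    return {"mean_s": mean, "p95_s": p95, "est_rps": n_workers / mean}


def fake_predict():
    """Stand-in for one model inference or one HTTP round trip."""
    time.sleep(0.002)


stats = summarize(measure_latency(fake_predict, n_requests=20))
print(stats)
```

Reporting a tail percentile alongside the mean is useful because a service can have a good average while still being slow for a noticeable share of requests. For the final report, measure against the running container rather than the bare function, so that network and serialization overhead are included.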