# Research Engineer test task

Dear applicant!
We are asking you to implement a thoughtful pipeline for a rather simple toy task.

You are given the dataset from the Tiny ImageNet Challenge, which is the default final project for Stanford's [CS231N](http://cs231n.stanford.edu/) course. It runs similarly to the [ImageNet challenge](http://www.image-net.org/challenges/LSVRC/2014/) (ILSVRC). The goal of the original challenge is for you to do as well as possible on the Image Classification problem.

However, the goal of this task is **not** just to do as well as possible on the Image Classification problem.
Instead, we encourage you to demonstrate best practices and your skill at rapidly prototyping reliable pipelines.
You may want to take a look at Andrej Karpathy's [Recipe for Training Neural Networks](https://karpathy.github.io/2019/04/25/recipe/).

We have provided a simple PyTorch baseline. You are free to use it or to design the whole solution from scratch.
We do not restrict your choice of frameworks: you can install any package you need and organise your files however you want. Just make sure to provide all the sources, checkpoints, logs, visualisations, etc.
If you decide to use our platform setup, this is all already taken care of, and we will simply review the artifacts of your work on our storage. Otherwise, it is your responsibility.

To add some measurable results to the task, your final goal is to achieve the best accuracy on the provided __test__ split of the Tiny ImageNet dataset.
You are also expected to show all the visualisations you find necessary alongside the final evaluation metrics. We have already set up TensorBoard for you, so you can track some of your plots there. Please follow [README.md](https://github.com/neuromation/test-task/blob/master/README.md) for instructions.


# Table of contents:
- [Preparing the data](#Preparing-the-data)
- [Training model](#Training-model)
- [Visualizing results](#Visualizing-results)
- [Testing the solution](#Testing-the-solution)

In [82]:
import os
import shutil
import csv
import zipfile
import urllib.request

# Preparing the data

Let's define the path to the data.

In [16]:
# os.path.join keeps the path portable (the hard-coded "..\\" only works on Windows).
DATA_PATH = os.path.join("..", "data")

Downloading data.
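The download step has no code cell here. A minimal sketch of one, assuming the archive is fetched from the public CS231N location; the `TINY_IMAGENET_URL` constant and the `download_if_missing` helper are hypothetical, and if you use the platform setup the zip may already be in place.

```python
import os
import shutil
import urllib.request

# Assumption: public CS231N location of the archive; replace it if your platform hosts its own copy.
TINY_IMAGENET_URL = "http://cs231n.stanford.edu/tiny-imagenet-200.zip"


def download_if_missing(url: str, dest_path: str) -> str:
    """Hypothetical helper: download `url` to `dest_path` unless it already exists."""
    if os.path.exists(dest_path):
        return dest_path  # skip re-downloading when the notebook is re-run

    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    tmp_path = dest_path + ".part"
    # Stream into a temporary file so an interrupted download never looks complete.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp_path, dest_path)
    return dest_path


# Usage (DATA_PATH is defined in the cell above):
# download_if_missing(TINY_IMAGENET_URL, os.path.join(DATA_PATH, "tiny-imagenet-200.zip"))
```

Writing to a `.part` file and renaming it at the end means a half-finished download is never mistaken for a valid archive by the unzip cell.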

Unzipping data.

In [None]:
archive_path = os.path.join(DATA_PATH, "tiny-imagenet-200.zip")
with zipfile.ZipFile(archive_path, "r") as zip_ref:
    zip_ref.extractall(DATA_PATH)

# Deleting the archive once it has been extracted.
os.remove(archive_path)

Now let's put all the logic that handles the data into one class.

In [83]:
class PrepareData:
    """Reorganises the extracted Tiny ImageNet folders into one sub-folder per class."""

    # Moving folders to upper directory.
    @staticmethod
    def move_folders():
        root = os.path.join(DATA_PATH, "tiny-imagenet-200")

        for item in os.listdir(root):
            shutil.move(os.path.join(root, item), os.path.join(DATA_PATH, item))

        shutil.rmtree(root)
    
    # Preparing data in train folder: drop the per-class .txt annotation files
    # and move each class's images out of its "images" sub-folder.
    @staticmethod
    def prepare_train():
        train_path = os.path.join(DATA_PATH, "train")

        for item in os.listdir(train_path):
            class_path = os.path.join(train_path, item)

            # Removing .txt
            for file_name in os.listdir(class_path):
                if file_name.endswith(".txt"):
                    os.remove(os.path.join(class_path, file_name))

            images_path = os.path.join(class_path, "images")
            for image in os.listdir(images_path):
                shutil.move(os.path.join(images_path, image), os.path.join(class_path, image))
            shutil.rmtree(images_path)
    
    # Shared logic for val and test: sort <split>/images into one sub-folder per class
    # using the tab-separated annotations file (image name, class id, bounding box).
    @staticmethod
    def _sort_by_annotations(split, annotations_file):
        split_path = os.path.join(DATA_PATH, split)
        annotations_path = os.path.join(split_path, annotations_file)

        # Map every image file name to its class folder.
        with open(annotations_path, newline="") as f:
            image_to_class = {row[0]: row[1] for row in csv.reader(f, delimiter="\t") if row}

        for folder in set(image_to_class.values()):
            os.makedirs(os.path.join(split_path, folder), exist_ok=True)

        images_path = os.path.join(split_path, "images")
        for item in os.listdir(images_path):
            shutil.move(os.path.join(images_path, item),
                        os.path.join(split_path, image_to_class[item]))

        shutil.rmtree(images_path)
        os.remove(annotations_path)

    # Preparing data in val folder.
    @staticmethod
    def prepare_val():
        PrepareData._sort_by_annotations("val", "val_annotations.txt")

    # Preparing data in test folder (images stay inside "test").
    @staticmethod
    def prepare_test():
        PrepareData._sort_by_annotations("test", "test_annotations.txt")

In [76]:
PrepareData.move_folders()

In [77]:
PrepareData.prepare_train()

In [78]:
PrepareData.prepare_val()
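Before training, it is worth checking that the reorganised folders have the layout the rest of the pipeline expects (one sub-folder per class). A small sanity check, assuming the standard Tiny ImageNet sizes of 200 classes with 500 training and 50 validation images each; `count_images_per_class` and `check_split` are hypothetical helpers.

```python
import os
from collections import Counter


def count_images_per_class(split_path: str) -> Counter:
    """Count the files inside every class sub-folder of a prepared split."""
    counts = Counter()
    for class_name in sorted(os.listdir(split_path)):
        class_path = os.path.join(split_path, class_name)
        if os.path.isdir(class_path):
            # Explicit assignment keeps empty class folders in the Counter.
            counts[class_name] = len(os.listdir(class_path))
    return counts


def check_split(split_path: str, n_classes: int, per_class: int) -> Counter:
    """Raise ValueError if the split has the wrong number of classes or is unbalanced."""
    counts = count_images_per_class(split_path)
    if len(counts) != n_classes:
        raise ValueError(f"{split_path}: expected {n_classes} classes, found {len(counts)}")
    wrong = {c: n for c, n in counts.items() if n != per_class}
    if wrong:
        raise ValueError(f"{split_path}: unexpected image counts {wrong}")
    return counts


# Usage, assuming the standard Tiny ImageNet split sizes:
# check_split(os.path.join(DATA_PATH, "train"), n_classes=200, per_class=500)
# check_split(os.path.join(DATA_PATH, "val"), n_classes=200, per_class=50)
```

Failing loudly here is cheaper than discovering a mis-sorted split after a long training run.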

# Training model

# Visualizing results

# Testing the solution

To test the solution, add the `test_annotations.txt` file to the `test` folder and run the cells below.

In [None]:
PrepareData.prepare_test()

In [79]:
# Test generator
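Whatever framework ends up training the model, the test split prepared by `PrepareData.prepare_test()` can be read the same way as train and val. A framework-agnostic sketch of such a generator, assuming class indices are assigned by sorting the class-folder names (the convention torchvision's `ImageFolder` follows); `class_to_index`, `iter_labelled_images` and `IMAGE_EXTENSIONS` are hypothetical names.

```python
import os
from typing import Dict, Iterator, Tuple

IMAGE_EXTENSIONS = (".jpeg", ".jpg", ".png")


def class_to_index(split_path: str) -> Dict[str, int]:
    """Map class-folder names to integer labels in sorted order."""
    classes = sorted(
        name for name in os.listdir(split_path)
        if os.path.isdir(os.path.join(split_path, name))
    )
    return {name: idx for idx, name in enumerate(classes)}


def iter_labelled_images(split_path: str, mapping: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Yield (image_path, label) pairs in a deterministic order."""
    for class_name in sorted(mapping):
        class_path = os.path.join(split_path, class_name)
        if not os.path.isdir(class_path):
            continue  # a class may be missing from a small or partial split
        for file_name in sorted(os.listdir(class_path)):
            if file_name.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(class_path, file_name), mapping[class_name]


# Usage: build the mapping from train so test labels use the same indices.
# mapping = class_to_index(os.path.join(DATA_PATH, "train"))
# test_samples = list(iter_labelled_images(os.path.join(DATA_PATH, "test"), mapping))
```

Building the mapping from the train split, rather than from each split separately, keeps label indices consistent even if some class were absent from the test folder.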

In [80]:
# model.evaluate
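The final metric requested by the task is accuracy on the test split. A minimal framework-agnostic sketch of that computation, assuming the model's predictions have already been collected as integer labels in the same order as the ground truth; `top1_accuracy` and `per_class_accuracy` are hypothetical helpers. Per-class accuracy is included because a single average can hide classes the model never gets right.

```python
from collections import defaultdict
from typing import Dict, Sequence


def top1_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Accuracy restricted to the samples of each true class."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / totals[c] for c in sorted(totals)}


# Example with made-up labels:
print(top1_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))       # 0.75
print(per_class_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # {0: 1.0, 1: 0.5, 2: 1.0}
```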