<a href="https://colab.research.google.com/github/novoic/ml-challenge/blob/master/image_challenge.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://novoic.com"><img src="https://novoic.com/wp-content/uploads/2019/10/logo_320px.png" alt="Novoic logo" width="160"/></a>

# Novoic ML challenge – image data

## Introduction
Welcome to the Novoic ML challenge!

This is an open-ended ML challenge to help us identify exceptional researchers and engineers. The guidance below describes an open-source dataset that you can use to demonstrate your research skills, creativity, coding ability, scientific communication or anything else you think is important to the role.

Before starting the challenge, read our CEO's [Medium post](https://medium.com/@emil_45669/the-doctor-is-ready-to-see-you-tube-videos-716b12367feb) on what we're looking for in our Research Scientists, Research Engineers and ML Interns. We recommend spending around three hours on the challenge (more or less if you wish); you do not have to do it in one sitting. Please make use of any resources you like.

This is the image version of the challenge. Text and audio versions are also available. You can access all three from [this GitHub repo](https://github.com/novoic/ml-challenge).

Best of luck – we're looking forward to seeing what you can do!

## Prepare the data
Copy the dataset to a local directory by uncommenting and running the cell below – this should take around 10 minutes.

In [None]:
#!mkdir -p data
#!gsutil -m cp -r gs://novoic-ml-challenge-image-data/* ./data

## Data description

The data comprises 17,125 images in jpg format. Each image shows a realistic scene, typically containing several objects.

There are 20 object classes of interest: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, TV monitor. 

Each image is labelled with one of three numbers for each object class:
- `-1`: no objects of this class feature in the image
- `1`: at least one object of this class features in the image
- `0`: at least one object of this class features in the image, but they are all difficult to recognise
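
If you frame a class as a binary classification problem, the three label values need to be mapped to targets. The sketch below shows one possible convention, not a prescribed one: present becomes 1, absent becomes 0, and "difficult" images return a placeholder (`None` by default) so they can be masked out or reassigned. The function name `to_binary_target` and the `difficult_as` parameter are hypothetical names introduced for illustration.

```python
from typing import Optional

# Label values as described in the list above.
ABSENT, DIFFICULT, PRESENT = -1, 0, 1


def to_binary_target(label: int, difficult_as: Optional[int] = None) -> Optional[int]:
    """Map a raw label (-1, 0 or 1) to a binary target.

    Returns 1 for present and 0 for absent. Difficult examples (raw label 0)
    return `difficult_as`, which defaults to None so callers can mask them out.
    This handling of difficult examples is an assumption, not part of the dataset.
    """
    if label == PRESENT:
        return 1
    if label == ABSENT:
        return 0
    if label == DIFFICULT:
        return difficult_as
    raise ValueError(f"unexpected label: {label!r}")


raw_labels = [-1, 1, 0, 1]
targets = [to_binary_target(value) for value in raw_labels]
print(targets)  # [0, 1, None, 1]
```

Passing `difficult_as=1` would instead treat difficult images as positives; which choice is appropriate depends on what you want to evaluate.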


In [None]:
import IPython.display  # import the submodule explicitly so the attribute lookup below is reliable
IPython.display.Image(filename='data/images/2012_004258.jpg')

In [None]:
IPython.display.Image(filename='data/images/2008_007739.jpg')

Each object class file (e.g. `aeroplane.txt`) lists one image per line: the name of the image without the extension (e.g. `2008_007739`), followed by a space and then the class label (e.g. `-1`).
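
The file format described above can be parsed with a few lines of Python. This is a minimal sketch that assumes one whitespace-separated `name label` pair per line; the function name `parse_class_labels` is hypothetical, and the second example line uses a made-up image name.

```python
from typing import Dict, Iterable


def parse_class_labels(lines: Iterable[str]) -> Dict[str, int]:
    """Parse lines of the form '<image_name> <label>' into {image_name: label}."""
    labels: Dict[str, int] = {}
    for line in lines:
        parts = line.split()  # tolerates repeated spaces and trailing newlines
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"malformed line: {line!r}")
        name, value = parts
        labels[name] = int(value)
    return labels


# Illustrative lines in the documented format (not real dataset labels).
example = ["2008_007739 -1\n", "0000_000001 1\n"]
print(parse_class_labels(example))
# {'2008_007739': -1, '0000_000001': 1}
```

Because an open file is an iterable of lines, the same function can be applied to a real file, for example `with open("data/aeroplane.txt") as f: labels = parse_class_labels(f)`.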

For more information about the dataset, see its `README.md`.

Directory structure:
```
data/
├── images/         # dir for jpg files
├── aeroplane.txt   # aeroplane object class labels
├── bicycle.txt     # bicycle object class labels
├── bird.txt        # bird object class labels
├── boat.txt        # boat object class labels
├── bottle.txt      # bottle object class labels
├── bus.txt         # bus object class labels
├── car.txt         # car object class labels
├── cat.txt         # cat object class labels
├── chair.txt       # chair object class labels
├── cow.txt         # cow object class labels
├── diningtable.txt # dining table object class labels
├── dog.txt         # dog object class labels
├── horse.txt       # horse object class labels
├── motorbike.txt   # motorbike object class labels
├── person.txt      # person object class labels
├── pottedplant.txt # potted plant object class labels
├── sheep.txt       # sheep object class labels
├── sofa.txt        # sofa object class labels
├── train.txt       # train object class labels
├── tvmonitor.txt   # TV monitor object class labels
├── LICENSE
└── README.md
```
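
Given this layout, the 20 label files can be combined into one label vector per image. The sketch below is one way to do that under stated assumptions: class names are taken from the file stems in the listing, each file holds one `name label` pair per line, and an image missing from a class file is treated as absent (`-1`). The names `load_label_table` and `image_path` are hypothetical helpers, not part of the dataset.

```python
from pathlib import Path
from typing import Dict, List

# File stems taken from the directory listing above.
CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def load_label_table(data_dir, classes=CLASSES) -> Dict[str, List[int]]:
    """Return {image_name: [label for each class, in `classes` order]}."""
    data_dir = Path(data_dir)
    per_image: Dict[str, Dict[str, int]] = {}
    for cls in classes:
        with open(data_dir / f"{cls}.txt") as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:  # skip blank or malformed lines
                    per_image.setdefault(parts[0], {})[cls] = int(parts[1])
    # Assumption: an image missing from a class file is treated as absent (-1).
    return {
        name: [row.get(cls, -1) for cls in classes]
        for name, row in per_image.items()
    }


def image_path(data_dir, name: str) -> Path:
    """Build the path of the jpg file for an image name without extension."""
    return Path(data_dir) / "images" / f"{name}.jpg"
```

After the data has been copied, `table = load_label_table("data")` gives a dictionary keyed by image name, and `image_path("data", name)` points at the corresponding file.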

## The challenge
This is an open-ended challenge and we want to witness your creativity. Some obvious suggestions:
- Data exploration/visualization
- Binary/multiclass classification
- Anomaly detection
- Unsupervised clustering
- Model explainability

You're welcome to explore one or more of these topics, or do something entirely different.
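
As a possible starting point for data exploration, the sketch below counts how often each label value occurs per class. It assumes the labels have already been loaded into a dictionary of the form `{class_name: {image_name: label}}`; the function name `label_counts` and the toy data are made up for illustration and do not reflect the real dataset.

```python
from collections import Counter
from typing import Dict


def label_counts(labels_by_class: Dict[str, Dict[str, int]]) -> Dict[str, Counter]:
    """Count how many images carry each label value (-1, 0, 1) per class."""
    return {cls: Counter(labels.values()) for cls, labels in labels_by_class.items()}


# Made-up labels for illustration only.
toy = {
    "cat": {"img_a": 1, "img_b": -1, "img_c": -1},
    "dog": {"img_a": -1, "img_b": 0, "img_c": 1},
}

for cls, counts in label_counts(toy).items():
    total = sum(counts.values())
    print(f"{cls}: {counts[1]} present, {counts[0]} difficult, {counts[-1]} absent (of {total})")
```

Counts like these give a quick view of class imbalance, which may influence the choice of metric or sampling strategy.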

Create, iterate on, and validate your work in this notebook, using any packages of your choosing.
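
Validation needs a held-out set. If the dataset's `README.md` does not already define a split, one option is a deterministic split based on a hash of the image name, sketched below. The function name `assign_split` and the 20% validation fraction are assumptions chosen for illustration.

```python
import hashlib


def assign_split(image_name: str, val_fraction: float = 0.2) -> str:
    """Assign an image to 'train' or 'val' based on a hash of its name.

    The assignment depends only on the name, so it stays the same across
    runs and machines without storing a list of validation images.
    """
    digest = hashlib.sha256(image_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return "val" if bucket < val_fraction else "train"


names = ["2008_007739", "2012_004258"]
print({name: assign_split(name) for name in names})
```

A hash-based split keeps each image in the same subset even if the set of images changes, at the cost of the validation fraction being only approximately 20%.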

**You can access a GPU via `Runtime -> Change runtime type` in the toolbar.**

## Submission instructions
Once you're done, send this `.ipynb` notebook (or a link to it hosted on Google Drive/GitHub with appropriate permissions) to talent@novoic.com, ensuring that outputs from cells (text, plots, etc.) are preserved.

If you haven't applied already, make sure you submit an application first through our [job board](https://novoic.com/careers/).

## Your submission
The cells below set up TensorFlow as an example, but feel free to use any framework you like.

In [None]:
# When this notebook was written, the default TensorFlow version on Colab was 1.x. If your runtime still defaults to 1.x, uncomment the line below to use TensorFlow 2.x instead.
# %tensorflow_version 2.x

In [None]:
import tensorflow as tf
tf.__version__

Take the wheel!