
Note!!

Sometimes GitHub struggles to render Jupyter notebook files. If you click the file guide.ipynb and nothing renders (except possibly an error message), go to https://nbviewer.jupyter.org/github/epimedai/flowers/blob/master/guide.ipynb to view the file through Jupyter's own notebook viewer.

Flower species classification

This repo contains the basic building blocks for creating an image classification model for recognizing flowers of various species. It exists as an extension of this machine learning workshop, and subsequent variants, hosted by Tekna. The repo consists of three main components:

  1. The slides used in the workshop
  2. A fully working guide implemented as a Jupyter Notebook
  3. This readme, which doubles as a guide for setting up the necessary environment

Additionally, there is a version of the guide implemented as six stand-alone Python scripts, one per step, for users not familiar with Jupyter notebooks.

Setup

The goal of this setup is creating an environment where we can run the code listed in the guide. To achieve this goal there are two necessary prerequisites:

  • A working Python 3 environment with Tensorflow and Keras installed
  • A folder containing the dataset structured as we want it

Additionally, there is a brief introduction to Jupyter notebooks for those wanting to run the guide.

Configuring the environment

Setting up our environment means installing Python and all the packages we will need for this project. In this guide we use conda as an environment manager and pip as a package manager. There are many other options out there, and any of them is fine as long as you are able to run the sanity check.

Installing conda

Conda is a package, dependency and environment manager for several languages, but in this project we will take advantage of the environment management capabilities. We will be using a version called Miniconda, which is installed by downloading and running a bash-script. Note that both the URL and the name of the script depend on your operating system.

macOS and Linux

macOS: https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

Linux: https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

$ wget <url> 
$ sh Miniconda3-latest-<OS>-x86_64.sh

Running the script will trigger a bunch of prompts, one of which is

Do you wish the installer to prepend the Miniconda3 install location to PATH in your /home/esten/.bashrc ? [yes|no]

where we recommend you answer yes. Once the installer finishes, Miniconda is installed and both Python and pip are ready for use.
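The dependence of the URL and script name on the operating system can be made concrete with a small sketch. It only covers the two installers linked above; the function names are illustrative, and the mapping from Python's platform.system() values to the <OS> part of the file name is an assumption based on those two links.

```python
import platform

# Base URL and file-name pattern taken from the two links listed above.
BASE_URL = "https://repo.anaconda.com/miniconda/"

# Maps platform.system() values to the <OS> part of the installer name.
OS_NAMES = {"Darwin": "MacOSX", "Linux": "Linux"}


def installer_name(system=None):
    """Return the Miniconda installer file name for the given system."""
    system = system or platform.system()
    if system not in OS_NAMES:
        # Windows uses a graphical installer instead of a shell script.
        raise ValueError(f"No shell installer listed for {system!r}")
    return f"Miniconda3-latest-{OS_NAMES[system]}-x86_64.sh"


def installer_url(system=None):
    """Return the full download URL for the given system."""
    return BASE_URL + installer_name(system)
```

Calling installer_url() without arguments uses the system Python is running on; passing "Linux" or "Darwin" selects one explicitly.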

Windows

The Miniconda installer for Windows can be downloaded from https://conda.io/en/latest/miniconda.html

Follow the installer until it completes, then open the "Anaconda Prompt" from your start menu. On Linux and macOS, run the commands in this guide from a terminal; on Windows, run them from the "Anaconda Prompt".

Creating the environment

We can create an environment with our newly installed conda using the command conda create. First, however, we have to source the .bashrc (or .bash_profile for Mac users) file modified in the previous step, so that the conda command can be found. This is not necessary on Windows:

$ source .bashrc 

Next, we run the command that creates a new environment. Here we name it "ml" and set its Python version to 3.6.

$ conda create --name ml python=3.6

The new environment has to be activated.

Linux / macOS
$ source activate ml
Windows
$ conda activate ml

If everything went as intended the command line prompt should now be prefixed with the name of the environment.

(ml) $ .
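Besides looking at the prompt, the activation can be checked from inside Python. The following is a minimal sketch, assuming conda exports the name of the active environment in the CONDA_DEFAULT_ENV variable; the function name and the returned keys are illustrative.

```python
import os
import sys


def environment_status(environ=None, version_info=None, expected_env="ml"):
    """Summarise whether the expected conda environment and Python 3 are active."""
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
    active = environ.get("CONDA_DEFAULT_ENV")
    return {
        "active_env": active,
        "env_ok": active == expected_env,
        "python": f"{version_info[0]}.{version_info[1]}",
        "python_ok": version_info[0] == 3,
    }
```

Running print(environment_status()) in the activated environment should report env_ok and python_ok as True.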

Installing packages

The two most important packages needed for this project are Tensorflow and Keras, both of which can be installed with pip. Additionally, we will install matplotlib and Pillow (the maintained fork of PIL) for loading and displaying images.

(ml) $ pip install tensorflow
(ml) $ pip install keras
(ml) $ pip install matplotlib
(ml) $ pip install pillow

(Note: Users with a GPU could install tensorflow-gpu instead of tensorflow to greatly increase the training efficiency, but this is not necessary for this workshop)

We also recommend installing Jupyter to be able to run the guide as a notebook:

(ml) $ pip install jupyter

Environment sanity check

We can check that everything works as it should by importing the packages in Python:

(ml) $ python -c "import tensorflow" 
(ml) $ python -c "import keras"

If you are able to run these commands without anything failing horribly (warnings are OK!) you are all set up.
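The same check can be written as a short script that covers all installed packages in one go. This is a sketch rather than part of the guide: the function names are illustrative, and the version lookup assumes that a package exposes a __version__ attribute, which not every package does.

```python
import importlib


def package_versions(names):
    """Try to import each package; map its name to a version string or None."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # a broken install can raise more than ImportError
            found[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            found[name] = str(getattr(module, "__version__", "unknown"))
    return found


def report(names):
    """Print one line per package and return True if all imports worked."""
    versions = package_versions(names)
    for name, version in versions.items():
        print(f"{name:12s} {'MISSING' if version is None else version}")
    return all(version is not None for version in versions.values())
```

Inside the ml environment, report(["tensorflow", "keras", "matplotlib", "PIL"]) checks the four packages installed above. Note that pillow is imported under the name PIL.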

Note that whether or not this setup runs smoothly depends heavily on what already exists on your OS. Typical problems relate to image-specific libraries used by tensorflow. If you run into trouble you should get far by googling, or by sending me an email at esten@epimed.ai

Preparing the dataset

The dataset we will be using consists of images of flowers, 17 species with 80 samples each, and was created by the Visual Geometry Group at the University of Oxford. For later convenience we want the dataset structured as follows:

  • Two folders, called train and val, each containing
  • 17 folders, one per species, each containing
  • A set of images of flowers of the given species

We will be using 65 images per species for training and 15 images per species for validation.
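With 17 species, this split gives 17 × 65 = 1105 training images and 17 × 15 = 255 validation images. Once the dataset has been prepared, the layout can be verified with a sketch like the one below. The function names are illustrative, and the check counts every file in a species folder as an image.

```python
from pathlib import Path

SPECIES = 17
TRAIN_PER_SPECIES = 65
VAL_PER_SPECIES = 15


def expected_totals():
    """Total number of images expected in each split."""
    return {"train": SPECIES * TRAIN_PER_SPECIES,
            "val": SPECIES * VAL_PER_SPECIES}


def count_images(root):
    """Return {split: {species_folder: number_of_files}} for train and val."""
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            counts[split] = {}
            continue
        counts[split] = {
            d.name: sum(1 for f in d.iterdir() if f.is_file())
            for d in sorted(split_dir.iterdir()) if d.is_dir()
        }
    return counts


def layout_is_valid(root):
    """True if both splits have 17 species folders with the expected counts."""
    expected = {"train": TRAIN_PER_SPECIES, "val": VAL_PER_SPECIES}
    counts = count_images(root)
    return all(
        len(counts[split]) == SPECIES
        and all(n == expected[split] for n in counts[split].values())
        for split in expected
    )
```

Passing the path of the prepared dataset folder to layout_is_valid returns True only when the structure matches the description above.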

Downloading the dataset

Clone (download) the Flowers guide from https://github.com/epimedai/flowers/blob/master/guide.ipynb to your machine. If you have installed git, you can git clone it, otherwise you can visit the GitHub web site, press either Download or Raw, and save the file to a local folder. This folder will now be your repo's root folder.

Before continuing, Windows users must download and install wget, for instance from https://eternallybored.org/misc/wget/ (version 1.20).

The dataset can now be downloaded and unzipped into this repo's root folder as follows:

(ml) $ wget http://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz
(ml) $ tar -xvzf 17flowers.tgz

or by downloading the zipped file from http://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz and extracting the files into this repo's root folder.
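For users who prefer not to install wget, the download and extraction can also be done with Python's standard library. The sketch below is an alternative to the two commands above, not part of the guide itself; the function names and the default file name are illustrative.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL copied from the download command above.
DATASET_URL = "http://www.robots.ox.ac.uk/~vgg/data/flowers/17/17flowers.tgz"


def download(url=DATASET_URL, destination="17flowers.tgz"):
    """Download the archive unless a file with that name already exists."""
    destination = Path(destination)
    if not destination.exists():
        urllib.request.urlretrieve(url, str(destination))
    return destination


def extract(archive, target="."):
    """Unpack a gzip-compressed tar archive into the target folder."""
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(target))
    return Path(target)
```

Run from the repo's root folder, extract(download()) fetches the archive once and unpacks it in place.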

Restructuring

The newly downloaded dataset is ordered by index: the first 80 images belong to the first species, the next 80 to the second, and so on. For the sake of simplicity, we want to change this structure. Executing the script restructure.py (found in this repo) in the same folder as jpg structures the images as described in Preparing the dataset. This step can also be done manually. If you don't have git, download the file from https://github.com/epimedai/flowers/blob/master/restructure.py into the repo's root folder, then run it with python restructure.py.

Make sure that the newly created folder is called flowers (this will happen automatically after running restructure.py) and is placed in the same directory as this guide, i.e. root folder of this repo. When this is the case you are ready to go!
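For those doing the restructuring manually, the index-based ordering described under Restructuring reduces to integer arithmetic. The sketch below is not the repo's restructure.py, and several details are assumptions made for illustration: that images are numbered from 1, that the first 65 images of each species go to training, and the hypothetical file and folder names used in target_path.

```python
IMAGES_PER_SPECIES = 80
TRAIN_PER_SPECIES = 65  # the remaining 15 images go to validation


def placement(image_number):
    """Map a 1-based image number to (split, 0-based species index)."""
    if image_number < 1:
        raise ValueError("image numbers start at 1")
    species = (image_number - 1) // IMAGES_PER_SPECIES
    position = (image_number - 1) % IMAGES_PER_SPECIES
    # Assumption: the first 65 images of each species are used for training.
    split = "train" if position < TRAIN_PER_SPECIES else "val"
    return split, species


def target_path(image_number, root="flowers"):
    """Build a destination path; file and folder names are hypothetical."""
    split, species = placement(image_number)
    return f"{root}/{split}/{species}/image_{image_number:04d}.jpg"
```

Under these assumptions, image 66 is the first validation image of the first species and image 81 is the first training image of the second.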

Jupyter notebooks

Jupyter Notebook is a web application for running and sharing code and documentation in a user-friendly and readable format. If you installed Jupyter as described in Installing packages, you are all set up to start using notebooks. To start, run jupyter from the terminal (note: you should be in the folder where guide.ipynb is located, which should be the root folder of this repo):

(ml) $ jupyter notebook

If you open http://localhost:8888 you should see the file structure of the folder where you ran the command. If any .ipynb files (such as the guide) exist there, simply click one to get started.
