Simple image classification using CNN (Keras) and dataset of pears and Sons.

kmwolowiec/pear-sons

Pearson – image classification using CNN

General info

This project is an attempt at image classification using a Convolutional Neural Network built with Keras and a self-prepared Pearson image dataset.

Content

script.js - a few lines of code that scrape the URLs of images already loaded on a Google Images page

pears_urls.txt / sons_urls.txt - text files with the URLs of pear/Son images, generated using the script.js code

download-images.py - downloads images from the URLs listed in the above .txt files

images.zip - archive of downloaded and preselected images; the dataset is divided into training and validation sets

pearson.ipynb - contains data preparation, a sequential model (keras), its training and a simple evaluation

requirements.txt - list of the project's Python dependencies, e.g. numpy, tensorflow, keras (including the Jupyter Notebook packages)
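
For illustration, the download step handled by download-images.py could look roughly like the sketch below. It is not the repository's actual script: the function names (read_urls, target_path, download_all), the .jpg extension and the output folder layout are assumptions made for this example.

```python
from pathlib import Path
from urllib.error import URLError
from urllib.request import urlopen


def read_urls(url_file):
    """Return the non-empty, stripped lines of a text file of URLs."""
    text = Path(url_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def target_path(out_dir, label, index):
    """Build a path such as images/pears/pears_0007.jpg (naming is an assumption)."""
    return Path(out_dir) / label / f"{label}_{index:04d}.jpg"


def download_all(url_file, out_dir, label, timeout=10):
    """Download every URL listed in url_file and skip the ones that fail."""
    saved = []
    for index, url in enumerate(read_urls(url_file)):
        path = target_path(out_dir, label, index)
        path.parent.mkdir(parents=True, exist_ok=True)
        try:
            with urlopen(url, timeout=timeout) as response:
                path.write_bytes(response.read())
        except (URLError, OSError, ValueError) as error:
            # Dead links and malformed URLs are common in scraped lists.
            print(f"skipped {url}: {error}")
            continue
        saved.append(path)
    return saved
```

Under these assumptions, calling download_all("pears_urls.txt", "images", "pears") and download_all("sons_urls.txt", "images", "sons") would fill one folder per class. The downloaded files would still need the manual preselection mentioned for images.zip.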

Summary

The main goal of this project was to apply a simple CNN model to image classification using the tools provided by Keras. I'm not fully satisfied with the evaluation results, but I was aware of the issues that follow from the assumptions I made. My four main conclusions are listed below:

1. Dataset too small - after preselection the whole dataset contains 1108 images: 400 pear images and 508 Son images in the training set, and 100 of each in the validation set. The larger the dataset, the more accurate the predictions should be.

2. The model architecture wasn't ideal - my simple model consists of only 12 layers. By comparison, ResNet50 (known from the ImageNet competition) is far deeper: 50 weighted layers, and well over 100 layers once activation and normalization layers are counted, as Keras does. It has over 25 million parameters, but it was trained on more than 1.2 million images. In my opinion it makes no sense to build a more complicated model for such a tiny dataset, because the less data there is, the easier it is to overfit. However, some changes could increase accuracy without complicating the model much. I must admit that I am still improving my skills in deep learning and, more generally, in data science and IT. I learn new practical skills every day, including while working on this project.

3. Presence of undesirable objects - there is one more important issue: other objects that appear in the images affect the final result. For example, Son is usually surrounded by grass (a football pitch). If we try to predict the category of a pear lying on grass, the model is more likely to classify it as Son, and the grass in the image is one of the factors behind that. Additionally, Son wears a red shirt in many pictures, so a red pear on grass is even more likely to be classified as the Spurs player!

Sample of Pears (image)

Sample of Sons (image)

4. Image rotation not taken into account - the model should produce the same result regardless of how an image is rotated. In the presented model, a rotated image gives a different result.
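
One common way to reduce the rotation sensitivity described in point 4, and to stretch the small dataset from point 1, is to train on rotated copies of each image. The sketch below is a framework-free illustration and is not taken from pearson.ipynb: it treats an image as a nested list of pixel values and produces its four right-angle rotations. The function names rotate90 and rotations are hypothetical. In Keras the same idea is usually expressed with built-in augmentation, for example the rotation_range argument of ImageDataGenerator.

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    # Reversing the rows and transposing turns the first column,
    # read bottom to top, into the first row.
    return [list(row) for row in zip(*image[::-1])]


def rotations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees."""
    variants = [[list(row) for row in image]]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


# A tiny 2x3 "image"; real images would hold pixel values or RGB tuples.
sample = [
    [1, 2, 3],
    [4, 5, 6],
]

for angle, variant in zip((0, 90, 180, 270), rotations(sample)):
    print(angle, variant)
```

Each variant keeps the label of the original image, so this would multiply the number of training examples by four without collecting new data. Whether it improves validation accuracy on this dataset has not been checked here.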

How to use

To clone and run the project, you'll need Git (including Git Large File Storage, which is required to download the images.zip archive) and Python 3 installed. The steps below show how to prepare the environment. The easiest way is to use pip and a Python virtual environment (the virtualenv package), although conda can be more convenient because of its performance.

Windows

First, download and install git-lfs. Then, from your command line:

# Initialize Git LFS
$ git lfs install
# A message like 'Git LFS initialized.' should appear.

# Install the virtualenv package
$ pip install virtualenv

# Clone this repository
$ git clone https://github.com/ThePearsSon/pear-sons.git

# Go into the repository
$ cd pear-sons

# Create and activate virtual environment
$ python -m venv env
$ env\Scripts\activate.bat

# Install dependencies
$ pip install -r requirements.txt

# Launch jupyter notebook and have fun
$ jupyter notebook

Linux

First, download and install git-lfs (this requires root access). Then, from your terminal:

$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$ sudo apt-get install git-lfs
$ git lfs install
# A message like 'Git LFS initialized.' should appear.

# Install the virtualenv package
$ pip install virtualenv

# Clone this repository
$ git clone https://github.com/ThePearsSon/pear-sons.git

# Go into the repository
$ cd pear-sons

# Create and activate virtual environment
$ python -m venv env
$ source env/bin/activate

# Install dependencies
$ pip install -r requirements.txt

# Launch jupyter notebook and have fun
$ jupyter notebook
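
On either platform, a short check like the one below can confirm that the virtual environment is active and the dependencies were installed before opening pearson.ipynb. This is only a sketch: the function name check_environment is hypothetical, and the package list simply repeats the examples given in the requirements.txt description.

```python
import sys
from importlib.util import find_spec


def check_environment(packages, min_python=(3, 0)):
    """Report whether Python is recent enough and which packages are importable."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = find_spec(name) is not None
    return report


# Package names taken from the requirements.txt description above.
for item, ok in check_environment(["numpy", "tensorflow", "keras"]).items():
    print(f"{item}: {'ok' if ok else 'MISSING'}")
```

If a package is reported as MISSING, the most likely causes are that the virtual environment is not activated or that the pip install step did not finish.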
