The code in this repository uses Convolutional Neural Networks (CNNs) in TensorFlow/Keras to classify images of two sets of plant species (e.g. the morphologically similar plant families Lycopodiaceae and Selaginellaceae, or two species of the Frullania genus) based on the available corpus of images. Scripts are available to download and preprocess images. The CNN and classification programs are generic enough to accept images of any species of plants (or other objects!).
- Clone the repository to your local machine.
- Confirm you have the necessary Python version and packages installed (see Environment section below).
- Prepare two sets of images, each within a directory that indicates the class name. These folders should be put together in a directory, with no other image files. e.g.,
training_image_folder
└───species_a
└───species_b
- If you have TIF images, use the script in `utilities\image_process\tif_to_jpg.py` to quickly convert files. (The TIF files will be moved to a new subdirectory called `tif`.)
- You should prepare a separate group of test images, either manually or with the available utility script `utilities\image_processing\create_test_group.py`.
  - By default, this script creates a split of 90% for training/validation and 10% for testing. It creates copies of the images in four new directories: `folder1test`, `folder1train`, `folder2test`, and `folder2train`.
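The 90/10 split described above can be sketched as follows. This is a minimal illustration and not the code in `create_test_group.py`: the function name `split_class_folder`, the `IMAGE_SUFFIXES` set, the seeded shuffle, and the `<class name>train` / `<class name>test` output naming are all assumptions made for the example.

```python
import random
import shutil
from pathlib import Path

# Assumed set of image extensions; adjust to match your image files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_class_folder(class_dir, out_dir, test_fraction=0.1, seed=0):
    """Copy the images of one class into <name>train and <name>test folders.

    Hypothetical helper: originals are left untouched, copies are written
    under out_dir, and the number of files per group is returned.
    """
    class_dir, out_dir = Path(class_dir), Path(out_dir)
    images = sorted(
        p for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Seeded shuffle so the same split can be reproduced later.
    random.Random(seed).shuffle(images)
    n_test = round(len(images) * test_fraction)
    groups = {"test": images[:n_test], "train": images[n_test:]}
    for label, files in groups.items():
        target = out_dir / f"{class_dir.name}{label}"
        target.mkdir(parents=True, exist_ok=True)
        for image in files:
            shutil.copy2(image, target / image.name)
    return {label: len(files) for label, files in groups.items()}
```

Calling the helper once per class folder (for example `species_a` and `species_b`) produces four directories, matching the layout described above.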
This code has been tested with Python 3.9.4 on Windows and Ubuntu, using Anaconda for virtual environments. Please consult `requirements.txt` or the list below for the necessary Python packages.
- tensorflow 2.5.0-rc3 (Releases v1.0 and earlier are compatible with TensorFlow 1.15.0)
- matplotlib~=3.4.2
- numpy==1.19.5
- opencv-python~=4.5.2.52
- pandas==1.2.4
- scikit-learn==0.24.2
- pillow~=8.2.0
- scipy~=1.6.2
- requests~=2.25.1
- scikit-image~=0.18.1
- augmentor~=0.2.8
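One way to confirm that an environment matches the exact (`==`) pins in the list above is to compare them against the installed distributions. The sketch below uses only the standard library; the `PINNED` dictionary and the `check_versions` helper are illustrative names, not part of the repository, and the compatible-release (`~=`) pins are not checked here.

```python
from importlib import metadata

# Exact pins copied from the package list; extend as needed.
PINNED = {"numpy": "1.19.5", "pandas": "1.2.4", "scikit-learn": "0.24.2"}


def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at the pinned version.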
- Run `train_models_image_classification.py` or `train_handwriting_model.py`, using arguments to specify image sets and hyper-parameters.
- Arguments: (`-h` flag for full details)
  - `training_set` (positional, required) - file path of the directory that contains the training images (e.g. `training_image_folder` as described in the Setup section.)
  - `height` (positional, required) - desired image size (if the optional `-w` argument is not provided, images will be loaded as `height x height` square)
  - `-w` - image width for non-square images
  - (`-color`, `-bw`) - boolean flag for number of color channels (RGB or K) (default = color)
  - `-lr` - learning rate value (decimal number, default = 0.001)
  - `-e` - number of epochs per fold (integer >= 5, default=25)
  - `-b` - batch size for updates (integer >= 2, default=64)
  - `-cls` - number of classes (integer >= 2, default=2)
- Weights:
  - Determined in `model_training.py` lines 33-43, 51
  - Uncomment the line with the desired weights
  - To use no weighting, comment out:
    - Lines 33-43 (Optional)
    - Line 51 (`class_weight=self.class_weight`)
- Output:
  - Directory `saved_models` is created in the current working directory, which will contain one model file after training (`CNN_1.model`).
  - Directory `graphs` is created in the current working directory, which will contain all generated graphs/plots for each run, plus a CSV summary for each fold.
    - Note: This directory will be empty after CTC model training.
- Example execution (CNN):
python train_models_image_classification.py training_images 128 -color -lr 0.005 -f 10 -e 50 -b 64 -cls 2 > species_a_b_training_output.txt &
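The documented training arguments, with their defaults and lower bounds, could be expressed with `argparse` roughly as shown below. This is a hedged sketch for readers who want to see how the constraints fit together; it is not the repository's actual parser. The names `build_parser` and `int_at_least` and all `dest` names are hypothetical, and only the arguments listed above are modelled.

```python
import argparse


def int_at_least(minimum):
    """Build an argparse type that accepts only integers >= minimum."""
    def parse(text):
        value = int(text)
        if value < minimum:
            raise argparse.ArgumentTypeError(f"must be an integer >= {minimum}")
        return value
    return parse


def build_parser():
    """Hypothetical parser mirroring the documented training arguments."""
    parser = argparse.ArgumentParser(description="Train an image classifier.")
    parser.add_argument("training_set", help="directory with one subfolder per class")
    parser.add_argument("height", type=int_at_least(1), help="image height in pixels")
    parser.add_argument("-w", dest="width", type=int_at_least(1), default=None,
                        help="image width for non-square images")
    # -color and -bw toggle the same setting, so they are mutually exclusive.
    channels = parser.add_mutually_exclusive_group()
    channels.add_argument("-color", dest="color", action="store_true", default=True)
    channels.add_argument("-bw", dest="color", action="store_false")
    parser.add_argument("-lr", dest="learning_rate", type=float, default=0.001)
    parser.add_argument("-e", dest="epochs", type=int_at_least(5), default=25)
    parser.add_argument("-b", dest="batch_size", type=int_at_least(2), default=64)
    parser.add_argument("-cls", dest="classes", type=int_at_least(2), default=2)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["training_images", "128", "-color", "-lr", "0.005"])
width = args.width or args.height  # square images when -w is omitted
print(args.height, width, args.color, args.learning_rate, args.epochs)
```

Rejecting out-of-range values at parse time means a run with, say, `-e 3` fails immediately with a usage message rather than partway through training.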
- After the training is finished, use the model file(s) to classify test set images. The number of predictions generated equals the number of test images multiplied by the number of model files. To run `classify_images_by_vote.py`:
- Arguments: (`-h` flag for full details)
  - `images` (positional, required) - file path of a directory containing the test image folders
  - `models` (positional, required) - a single model file, or a folder of models (e.g. `saved_models` in working directory)
  - `height` (positional, required) - desired image size (if the optional `-w` argument is not provided, images will be loaded as `height x height` square)
  - `-w` - image width for non-square images
  - (`-color`, `-bw`) - boolean flag for number of color channels (RGB or K) (default = color)
- Output:
  - Directory `predictions` is created if needed, and the predictions are saved as a CSV file (`yyyy-mm-dd-hh-mm-ssmodel_vote_predict.csv`).
- Example execution (CNN only):
python classify_images_by_vote.py test_images saved_models 128 -color
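When several model files are supplied, each model produces one prediction per test image, which is where the "test images times model files" count comes from. The sketch below shows one plausible way to combine those predictions, a simple majority vote per image. The exact rule used by `classify_images_by_vote.py` is not described here, so treat `majority_vote` and its tie-breaking behaviour as assumptions.

```python
from collections import Counter


def majority_vote(votes_by_model):
    """Combine per-model predictions into one label per image.

    votes_by_model holds one list of predicted labels per model; every list
    must cover the same images in the same order. Ties go to the label that
    appears first among the models' votes (an arbitrary choice for this sketch).
    """
    n_images = len(votes_by_model[0])
    if any(len(votes) != n_images for votes in votes_by_model):
        raise ValueError("every model must predict the same number of images")
    winners = []
    for image_votes in zip(*votes_by_model):
        winners.append(Counter(image_votes).most_common(1)[0][0])
    return winners


# Three hypothetical models voting on three images (9 predictions in total).
votes = [
    ["species_a", "species_b", "species_a"],
    ["species_a", "species_b", "species_b"],
    ["species_b", "species_b", "species_a"],
]
print(majority_vote(votes))  # ['species_a', 'species_b', 'species_a']
```

Using an odd number of models avoids ties in the two-class case.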
- / - Contains the main files used for training and testing models.
- /data_visualization - Contains the files for generating and saving graphs/data visualizations after training. Creates a `graphs` directory, if it doesn't already exist.
- /labeled_images - Contains the files for loading in image sets.
- /models - Contains the files used to define neural network layer architectures.
- /utilities - Contains image preprocessing scripts, a simple program timer, and archived files.
- `train_models_image_classification.py`
  - The main training program for image classification -- see Workflow above.
- `train_handwriting_model.py`
  - The main training program for RNN/CTC handwriting digitization -- see Workflow above.
- `classify_images_by_vote.py`
  - The main testing program for image classification -- see Workflow above.
This code has been developed by Beth McDonald (emcdona1, Field Museum, former NEIU), Sean Cullen (SeanCullen11, NEIU) and Allison Chen (allisonchen23, UCLA).
This code was developed under the guidance of Dr. Matt von Konrat (Field Museum), Dr. Francisco Iacobelli (fiacobelli, NEIU), Dr. Rachel Trana (rtrana, NEIU), and Dr. Tom Campbell (NEIU).
This project was made possible thanks to the Grainger Bioinformatics Center at the Field Museum.
This project has been created for use in the Field Museum Gantz Family Collections Center, under the direction of Dr. Matt von Konrat, Head of Botanical Collections at the Field.
Please contact Dr. von Konrat for licensing inquiries.