There are two options for environment setup: a Conda environment or requirements.txt. The Conda environment is recommended, since the project was developed against a specific Python version.
- Go to root directory of project in terminal
- Run
conda env create -f caption_env.yml
- Confirm caption_env environment created with
conda env list
- Go to root directory of project in terminal
- Run
pip install -r requirements.txt
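After either setup route, a quick sanity check can confirm the dependencies import correctly. The sketch below is a minimal example; the package names in `EXPECTED` are placeholders, not the actual contents of requirements.txt or caption_env.yml — substitute the project's real dependency list.

```python
import importlib.util

# Placeholder names only -- check requirements.txt / caption_env.yml
# for the project's actual dependency list.
EXPECTED = ["numpy", "tensorflow", "PIL"]

def missing_packages(packages):
    """Return the subset of packages that cannot be found by the importer."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

if __name__ == "__main__":
    absent = missing_packages(EXPECTED)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("Environment looks complete.")
```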
Caption generation is done from the command line. The image to be captioned should be accessible from the root directory of the project.
- Go to root directory of project and activate Conda environment if necessary
- Run
python generate_caption.py --p PHOTO_PATH --m MODEL_NAME
- PHOTO_PATH is the path from the root directory to the desired photo. Ex: manedwolf.jpg
- MODEL_NAME is the name of one of the trained models
- base = base model
- selu = use of selu activation function rather than relu
- dropout = use of 0.2 dropout rather than 0.5
- layers = extra layer in the feature extraction and decoder
- adamax = use of adamax optimizer instead of adam
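The actual flag handling lives in generate_caption.py; the sketch below shows one way the `--p` and `--m` flags could be parsed with argparse. It is an illustration, not the project's real code — the function name and help strings are hypothetical.

```python
import argparse

# Model names accepted by the --m flag, as listed above.
MODEL_NAMES = ["base", "selu", "dropout", "layers", "adamax"]

def parse_args(argv=None):
    """Parse the --p (photo path) and --m (model name) flags."""
    parser = argparse.ArgumentParser(
        description="Generate a caption for an image.")
    parser.add_argument("--p", required=True,
                        help="path to the photo, e.g. manedwolf.jpg")
    parser.add_argument("--m", required=True, choices=MODEL_NAMES,
                        help="name of the trained model to use")
    return parser.parse_args(argv)
```

For example, `parse_args(["--p", "manedwolf.jpg", "--m", "base"])` yields a namespace with `p == "manedwolf.jpg"` and `m == "base"`.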
Files were run either by pressing the run button in PyCharm or with python FILE
in the terminal. The files were run in the following order to train the model:
- prep_dataset
- prep_text
- train_model
- evaluate_model
- save_tokenizer
Any ablations made to train_model are then followed by running train_model again.
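The five stages above can be chained into a single runner. This is a sketch, not part of the project; it assumes each stage is a standalone .py script in the root directory.

```python
import subprocess
import sys

# Training pipeline, in the order listed above
# (assuming each stage is a standalone .py script).
PIPELINE = [
    "prep_dataset.py",
    "prep_text.py",
    "train_model.py",
    "evaluate_model.py",
    "save_tokenizer.py",
]

def run_pipeline(runner=None):
    """Run each stage in order; `runner` is injectable for testing."""
    if runner is None:
        runner = lambda script: subprocess.run(
            [sys.executable, script], check=True)
    for script in PIPELINE:
        runner(script)
```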
For our project we used the Flickr8K dataset to train the model. However, there were issues with term frequency. For example, an abundance of the word 'red' in reference to shirts caused nearly every caption containing a shirt to include the word 'red'. Thus, we cleaned up the dataset by first counting the occurrences of adjectives in the dataset. Any word in a certain category, like color, that occurred disproportionately often was then altered: the word was removed from many of the captions containing it, so that the color frequencies were more even without making the captions inaccurate.
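The cleaning step described above could be sketched as follows. This is not the project's actual cleaning code — the color list and the `keep_fraction` mechanism are illustrative assumptions about how one might count a category of adjectives and thin an over-frequent word out of a random subset of captions.

```python
from collections import Counter
import random

# Hypothetical color vocabulary; the project counted adjectives by category.
COLORS = {"red", "blue", "green", "yellow", "black", "white"}

def color_counts(captions):
    """Count how often each color word appears across all captions."""
    counts = Counter()
    for caption in captions:
        for word in caption.lower().split():
            if word in COLORS:
                counts[word] += 1
    return counts

def thin_color(captions, color, keep_fraction, rng=None):
    """Remove `color` from a random subset of the captions containing it,
    so its frequency drops to roughly `keep_fraction` of the original."""
    rng = rng or random.Random(0)
    cleaned = []
    for caption in captions:
        words = caption.split()
        if color in (w.lower() for w in words) and rng.random() > keep_fraction:
            words = [w for w in words if w.lower() != color]
        cleaned.append(" ".join(words))
    return cleaned
```

Passing a seeded `random.Random` keeps the thinning reproducible across runs.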
If you possess another model you wish to generate captions with, simply change line 100 in generate_caption.py. The line should be as follows:
model = load_model(MODEL_PATH)
where MODEL_PATH is replaced with the path to the h5 file such as 'models/new-model.h5'.
It can then be run in the command line with python generate_caption.py --p PHOTO_PATH
Any new models trained can be verified against the base model, found at 'models/model-tf2.h5'. If the code has not been altered as described in the previous section, this model is run with the --m base flag.