MSJE is a prototype system for the paper:
Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service
- Recipe1M Dataset
- Vision models
- Out-of-the-box training
Given a recipe query that contains the recipe title, a list of ingredients and a sequence of cooking instructions, the goal is to train a statistical model to retrieve the associated image. For the recipe query, we list the top 5 images retrieved by JESR, ACME and our MSJE model.
We use Python 3.7.6 and PyTorch 1.4.0. Run
pip install --upgrade cython and then install the dependencies with
pip install -r requirements.txt. Our work is an extension of im2recipe.
The Recipe1M dataset is available for download here, along with some of the code used to construct the dataset, the structured recipe text, food images, pre-trained instruction features and so on.
The current version of the code uses a pre-trained ResNet-50.
To train the model, you will need to create the following files:
data/train_lmdb: LMDB (training) containing skip-instructions vectors, ingredient ids and categories.
data/train_keys: pickle (training) file containing skip-instructions vectors, ingredient ids and categories.
data/val_lmdb: LMDB (validation) containing skip-instructions vectors, ingredient ids and categories.
data/val_keys: pickle (validation) file containing skip-instructions vectors, ingredient ids and categories.
data/test_lmdb: LMDB (testing) containing skip-instructions vectors, ingredient ids and categories.
data/test_keys: pickle (testing) file containing skip-instructions vectors, ingredient ids and categories.
data/text/vocab.txt: file containing all the vocabulary found within the recipes.
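Before launching training, it can help to confirm that the files listed above are in place. The following is a minimal sketch, not part of this repository: the helper name `missing_data_files` is hypothetical, and the paths are copied verbatim from the list above.

```python
import os

# Paths copied from the list above; adjust them if your layout differs.
REQUIRED_PATHS = [
    "data/train_lmdb",
    "data/train_keys",
    "data/val_lmdb",
    "data/val_keys",
    "data/test_lmdb",
    "data/test_keys",
    "data/text/vocab.txt",
]


def missing_data_files(root="."):
    """Return the required paths that do not exist under `root`."""
    return [p for p in REQUIRED_PATHS if not os.path.exists(os.path.join(root, p))]
```

Calling `missing_data_files()` from the repository root returns an empty list when everything is present, and otherwise lists what still needs to be created or extracted.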
Recipe1M LMDBs and pickle files can be found in train.tar, val.tar and test.tar, available here.
Note that the code expects images to be located in a four-level folder structure, e.g. an image named
0fa8309c13.jpg can be found in
./data/images/0/f/a/8/0fa8309c13.jpg. Each of the tar files contains the first folder level, 16 folders in total.
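The folder layout above can be sketched as a small path helper. This is an illustration only: the function name `image_path` is hypothetical, and it assumes, based on the single example given, that the four folder levels are the first four characters of the file name.

```python
import os


def image_path(image_name, root="./data/images"):
    """Build the nested path for an image from its file name.

    Assumption: the four folder levels are the first four characters
    of the file name, as in the 0fa8309c13.jpg example.
    """
    if len(image_name) < 4:
        raise ValueError("image name must have at least four characters")
    levels = list(image_name[:4])
    return os.path.join(root, *levels, image_name)
```

For example, `image_path("0fa8309c13.jpg")` yields `./data/images/0/f/a/8/0fa8309c13.jpg` on a POSIX system.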
The pre-trained TFIDF vectors for each recipe, image category feature for each image and the optimized category label for each image-recipe pair can be found in id2tfidf_vec.pkl, id2img_101_cls_vec.pkl and id2class_1005.pkl respectively.
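To inspect one of these pickle files before wiring it into training, something like the following may be useful. It is a sketch under assumptions: the helper name `summarize_pickle` is hypothetical, and the internal structure of the files is not documented here, so the code only reports the top-level type and size rather than assuming a particular layout.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and return (type name, length or None).

    The structure of the .pkl files is not documented here, so this
    only reports what the top-level object looks like.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

For instance, `summarize_pickle("id2tfidf_vec.pkl")` would show whether the file holds a dictionary and how many entries it has. As with any pickle, only load files from a source you trust.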
Training word2vec with recipe data:
- Download and compile word2vec
- Train with:
./word2vec -hs 1 -negative 0 -window 10 -cbow 0 -iter 10 -size 300 -binary 1 -min-count 10 -threads 20 -train tokenized_text.txt -output vocab.bin
The pre-trained word2vec model can be found in vocab.bin.
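To check the contents of a file produced with -binary 1, the format can be read with the standard library alone. The sketch below is not the loader used by this repository: the function name `read_word2vec_binary` is hypothetical, it assumes little-endian float32 values (what word2vec writes on common x86 machines), and it loads everything into plain Python lists, which is slow and memory-hungry for a full vocabulary.

```python
import struct


def read_word2vec_binary(path):
    """Read a word2vec binary file into a {word: [floats]} dict."""
    vectors = {}
    with open(path, "rb") as f:
        # Header line in ASCII: "<vocab_size> <dim>\n".
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            # Each entry is the word, a space, then `dim` float32 values.
            chars = bytearray()
            while True:
                c = f.read(1)
                if c == b" " or c == b"":
                    break
                if c != b"\n":  # skip the newline that ends the previous entry
                    chars.extend(c)
            data = f.read(4 * dim)
            if len(data) != 4 * dim:
                raise ValueError("file ended before all vectors were read")
            word = chars.decode("utf-8", errors="replace")
            # Assumption: values are stored little-endian.
            vectors[word] = list(struct.unpack("<%df" % dim, data))
    return vectors
```

With the training command above, each vector returned for vocab.bin should have 300 entries, matching -size 300.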
- Train the model with:
CUDA_VISIBLE_DEVICES=0 python train.py
We ran the experiments with a batch size of 100, which takes about 11 GB of memory.
- Test the trained model with:
CUDA_VISIBLE_DEVICES=0 python test.py
- The results are saved in
results and include the MedR (median rank) and recall scores for both recipe-to-image and image-to-recipe retrieval.
- Our best model trained with Recipe1M (TSC paper) can be downloaded here.
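For readers unfamiliar with the MedR and recall scores mentioned above, the following sketch shows how such metrics are commonly computed from paired embeddings. It is not the evaluation code in test.py: the function names are hypothetical, the cut-offs 1, 5 and 10 are an assumed choice, and the code assumes that query i matches target i and that embeddings are L2-normalized so that the dot product acts as cosine similarity.

```python
from statistics import median


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieval_metrics(query_embs, target_embs, ks=(1, 5, 10)):
    """Return (MedR, {K: recall@K}), assuming query i matches target i."""
    ranks = []
    for i, q in enumerate(query_embs):
        scores = [dot(q, t) for t in target_embs]
        # Rank is 1 plus the number of targets scoring strictly higher
        # than the true match.
        rank = 1 + sum(s > scores[i] for s in scores)
        ranks.append(rank)
    medr = median(ranks)
    recall = {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return medr, recall
```

Passing recipe embeddings as queries and image embeddings as targets gives the recipe-to-image direction; swapping the two arguments gives image-to-recipe. A lower MedR and a higher recall@K indicate better retrieval.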
Development is ongoing, and our lab continues to work on cross-modal retrieval between cooking recipes and food images. For questions or suggestions, use the issues section or reach us at email@example.com.
Lead Developer: Zhongwei Xie, Georgia Institute of Technology
Advisor: Prof. Dr. Ling Liu, Georgia Institute of Technology
If you use our code, please cite
 Zhongwei Xie, Ling Liu, Yanzhao Wu, et al. Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service[J]//IEEE Transactions on Services Computing.
 Zhongwei Xie, Ling Liu, Lin Li, et al. Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images[C]//Proceedings of the 2021 International Conference on Information and Knowledge Management (CIKM).
 Zhongwei Xie, Ling Liu, Yanzhao Wu, et al. Cross-Modal Joint Embedding with Diverse Semantics[C]//2020 IEEE Second International Conference on Cognitive Machine Intelligence (CogMI). IEEE, 2020: 157-166.