A multimodal recipe retrieval system built in PyTorch, matching recipe text (instructions and ingredients) against food images
Used MIT's Recipe1M dataset for recipe information
InferSent sentence embeddings were used to embed the recipe instructions
The individual ingredients were cleaned and embedded using TensorFlow
The ResNet-50 CNN model was used for the image embeddings
Ingredient encodings were fed into a bi-directional LSTM, and instruction embeddings into a unidirectional LSTM. The two encodings were concatenated and passed through a fully connected layer, and the model was trained with a cosine similarity loss. The text side was trained first; the ResNet-50 image model was then fine-tuned.
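The text-side architecture and training loss described above could be sketched as follows. All layer sizes are illustrative assumptions (InferSent vectors are 4096-dimensional; the ingredient embedding size and hidden widths are placeholders), and `TextEncoder` is a hypothetical name, not the project's actual class.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Sketch of the text side: a bi-directional LSTM over ingredient
    embeddings, a unidirectional LSTM over InferSent instruction
    embeddings, concatenation, and a fully connected projection."""
    def __init__(self, ingr_dim=300, instr_dim=4096, hidden=512, out_dim=1024):
        super().__init__()
        self.ingr_lstm = nn.LSTM(ingr_dim, hidden, bidirectional=True, batch_first=True)
        self.instr_lstm = nn.LSTM(instr_dim, hidden, batch_first=True)
        self.fc = nn.Linear(2 * hidden + hidden, out_dim)

    def forward(self, ingredients, instructions):
        _, (h_ingr, _) = self.ingr_lstm(ingredients)     # (2, B, hidden): both directions
        _, (h_instr, _) = self.instr_lstm(instructions)  # (1, B, hidden)
        ingr = torch.cat([h_ingr[0], h_ingr[1]], dim=1)  # concat forward/backward states
        text = torch.cat([ingr, h_instr[0]], dim=1)
        return self.fc(text)

encoder = TextEncoder()
ingr = torch.randn(2, 10, 300)   # batch of 2 recipes, 10 ingredient embeddings each
instr = torch.randn(2, 8, 4096)  # 8 InferSent instruction vectors per recipe
text_emb = encoder(ingr, instr)

# Cosine similarity loss against image embeddings; target=1 marks
# matching text/image pairs (placeholder image embeddings here).
img_emb = torch.randn(2, 1024)
loss = nn.CosineEmbeddingLoss()(text_emb, img_emb, torch.ones(2))
```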
Ananya Gupta, Raghav Sriram, Samarth Ramesh, Samyak Jain