
oxai/world-wide-dishes


You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes

Jabez Magomere, Shu Ishida, Tejumade Afonja, Aya Salama, Daniel Kochin, Foutse Yuehgoh, Imane Hamzaoui, Raesetje Sefala, Aisha Aalagib, Elizaveta Semenova, Lauren Crais, Siobhan Mackenzie Hall

Official Website (used for data collection): https://worldwidedishes.com/

The World Wide Dishes Dataset

We present the World Wide Dishes dataset, which seeks to assess regional disparities in foundation models through a decentralised data collection effort. The effort gathers perspectives directly from people with a wide variety of backgrounds around the globe, with the aim of creating a dataset of their insights into their own experiences of foods relevant to their cultural, regional, national, or ethnic lives.

The metadata of the World Wide Dishes dataset is available in the Croissant format.
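Croissant metadata is stored as JSON-LD, so it can be inspected with the Python standard library alone. The following is a minimal sketch, not the repository's own tooling: the file name croissant.json is a hypothetical placeholder, and the function name summarise_croissant is introduced here for illustration. It assumes the file follows the Croissant convention of a top-level recordSet entry.

```python
import json


def summarise_croissant(path):
    """Return the dataset name and record set names from a Croissant JSON-LD file."""
    with open(path, encoding="utf-8") as handle:
        metadata = json.load(handle)
    # "recordSet" is normally a list; tolerate a single object as well.
    record_sets = metadata.get("recordSet", [])
    if isinstance(record_sets, dict):
        record_sets = [record_sets]
    return {
        "name": metadata.get("name"),
        "record_sets": [record_set.get("name") for record_set in record_sets],
    }


if __name__ == "__main__":
    # Hypothetical path: replace with the actual location of the metadata file.
    try:
        print(summarise_croissant("croissant.json"))
    except FileNotFoundError:
        print("Metadata file not found; adjust the path.")
```

For full validation and record loading, the mlcroissant package is likely the better choice; this sketch only reads the top-level structure.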

The World Wide Dishes website

Link to the website used during data collection: https://worldwidedishes.com/

The website includes our Data Protection Policy and FAQs developed to support contributors during the data collection process.

Running your own instance of the website:

Please refer to the README.md in the webapp directory for instructions on how to run your own instance of the website.

The World Wide Dishes Experiments

In addition to the World Wide Dishes dataset, we present 30 dishes for 5 selected African countries, plus 30 dishes for the US as a baseline. An additional test suite was curated for regional parity.

Reproducing experiments

Setting up the Python environment

conda create -n wwd python=3.10
conda activate wwd
pip install -r requirements.txt

Create a .env file with settings

Create a .env file in the root directory of the repository with the following settings:

WWD_CSV_PATH=./data/WorldWideDishes_2024_June_World_Wide_Dishes.csv
WWD_30_DISHES_CSV_PATH=./data/WorldWideDishes_2024_June_Selected_Countries.csv

These settings point to the World Wide Dishes dataset and to the 30 dishes selected for the African countries and the US.
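To confirm that both paths resolve before running any experiment, a short check like the one below can help. It is a sketch using only the standard library; the helper names load_env_file and read_rows are introduced here for illustration, and the repository's own scripts may load the .env file differently.

```python
import csv
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    settings = {}
    env_path = Path(path)
    if not env_path.exists():
        return settings
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def read_rows(csv_path):
    """Return every row of a CSV file as a dict keyed by its header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    settings = load_env_file()
    for name in ("WWD_CSV_PATH", "WWD_30_DISHES_CSV_PATH"):
        # Prefer a real environment variable over the value in .env.
        csv_path = os.environ.get(name) or settings.get(name)
        if csv_path and Path(csv_path).exists():
            print(f"{name}: {len(read_rows(csv_path))} rows")
        else:
            print(f"{name}: not set or file not found")
```

Run it from the root directory of the repository so that the relative paths in the .env file resolve.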

Obtaining an OpenAI API key and Groq API key

If you want to run experiments that use OpenAI products such as GPT-3.5 (required for the LLM experiments) or DALL-E 2 and DALL-E 3 (required for the dish image generation), please obtain an OpenAI API key and set it as the environment variable OPENAI_API_KEY by adding it to the .env file. (Make sure you don't commit this file to Git!)

While the Llama 3 (8B) and Llama 3 (70B) models can be run locally after obtaining a licence through Hugging Face, running them locally is computationally expensive and time-consuming.

Groq offers a fast and reliable API service for open-source LLMs, including the Llama 3 models. As of June 2024, the Groq API is free to use. Please obtain a Groq API key and set it as the environment variable GROQ_API_KEY by adding it to the .env file.
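Before starting a long experiment, it can be worth checking that both keys are actually visible to the Python process. The sketch below is illustrative only: missing_api_keys is a name introduced here, and it inspects the process environment, so values stored in the .env file must already have been loaded by whatever mechanism the experiment scripts use.

```python
import os


def missing_api_keys(required=("OPENAI_API_KEY", "GROQ_API_KEY"), environ=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("Both API keys are set.")
```

The check reports only the variable names and never prints the key values, so its output is safe to paste into logs or issue reports.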

LLM Experiments to evaluate common knowledge understanding

Please refer to the README.md in the llm_probing directory for instructions on how to run the experiments.

Code for generating images using the World Wide Dishes dataset

Please refer to the README.md in the gen_images directory for instructions on how to run the experiments.

CLIP Experiments to evaluate association of generated images with positive and negative descriptors

Please refer to the README.md in the clip_probing directory for instructions on how to run the experiments.

VQA Experiments to probe generated outputs for potential biases

Please refer to the README.md in the vqa directory for instructions on how to run the experiments.

Community Review of generated images

Due to the high degree of inaccurate and culturally insensitive imagery, we will not be releasing the generated images for safety reasons. Our terms of use also prohibit the generation of images for training models using the World Wide Dishes dataset.

For transparency and insight into the review conducted, we are releasing the text responses only.

Contributor-submitted CC-licenced dish images

In the World Wide Dishes dataset, the column uploaded_image_name contains paths to CC-licenced dish images submitted by contributors. These are a subset of the images contributed through the data collection website: we include only those images that we were personally able to verify as being owned by the contributor. We have uploaded these images to a Google Drive folder for public access.
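To find the dishes that have an associated image, the uploaded_image_name column can be filtered with the standard library. This is a sketch under two assumptions: that rows without an image leave the cell empty, and that the CSV path matches the WWD_CSV_PATH setting shown earlier. The function name rows_with_images is introduced here for illustration.

```python
import csv


def rows_with_images(csv_path, column="uploaded_image_name"):
    """Return the rows whose image column holds a non-empty value."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {csv_path}")
        # Assumption: an empty or whitespace-only cell means no image.
        return [row for row in reader if (row[column] or "").strip()]


if __name__ == "__main__":
    path = "./data/WorldWideDishes_2024_June_World_Wide_Dishes.csv"
    try:
        print(f"{len(rows_with_images(path))} rows reference an image")
    except FileNotFoundError:
        print("Dataset CSV not found; check WWD_CSV_PATH.")
```

Each returned row is a plain dict, so the image path can be read as row["uploaded_image_name"] and matched against the files downloaded from the shared folder.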
