1. Download the trial, train and test sets from the task page. Place the data in the same directory, in subdirectories named `trial_v1`, `train_v1`, and `test_v1`, respectively. In each subdirectory, create a folder with the images for the specific subset, named `trial_images_v1`, `train_images_v1`, or `test_images_v1`.
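
   Before moving on, you can sanity-check the layout with a short script. This is a minimal sketch (not part of the repository) that only assumes the directory names listed above; pass the path to the data directory as its argument:

   ```python
   import sys
   from pathlib import Path

   SUBSETS = ("trial", "train", "test")

   def check_layout(data_dir: str) -> None:
       """Report whether each subset directory and its image folder exist."""
       root = Path(data_dir)
       for subset in SUBSETS:
           subset_dir = root / f"{subset}_v1"
           images_dir = subset_dir / f"{subset}_images_v1"
           print(f"{subset_dir}: {'ok' if subset_dir.is_dir() else 'MISSING'}")
           print(f"{images_dir}: {'ok' if images_dir.is_dir() else 'MISSING'}")

   if __name__ == "__main__":
       check_layout(sys.argv[1] if len(sys.argv) > 1 else ".")
   ```
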
2. Wikipedia retrieval is handled by the `wiki-index` application, available in a separate repository: https://github.com/sdadas/wiki-index. Clone the repository, execute `mvn package` to build the app, and `java -jar target/wiki-index.jar` to run it. In order to build a new index, you need to download the appropriate Wikipedia dump in `pages-articles` format from the Wikimedia Downloads repository (see the sketch below), then execute the `vwsd/wikipedia.py` script. Instead of building an index from scratch, you can also download our pre-built indexes for English, Italian and Persian. Unzip the archive to the directory from which you run the Java program.
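
   If you decide to build an index yourself, the dump can be fetched with a few lines of Python. A minimal sketch, assuming the standard Wikimedia Downloads URL layout for the latest English dump (swap `enwiki` for e.g. `itwiki` or `fawiki` for the other languages); note that these files are tens of gigabytes in size:

   ```python
   import urllib.request

   # Assumed URL layout of the Wikimedia Downloads site; adjust the wiki prefix as needed.
   DUMP_URL = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"

   def download_dump(url: str = DUMP_URL, out_path: str = "pages-articles.xml.bz2") -> None:
       """Download the pages-articles dump used as input for building the index."""
       print(f"Downloading {url} ...")
       urllib.request.urlretrieve(url, out_path)
       print(f"Saved to {out_path}")

   if __name__ == "__main__":
       download_dump()
   ```
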
3. Download the WIT dataset from the official repository. You should download all `*.tsv.gz` files from the training, test and validation parts of the dataset, then unpack them to the directory of your choice.
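
   A minimal sketch for the unpacking step, using only the standard library; the source and target directory names below are placeholders:

   ```python
   import gzip
   import shutil
   from pathlib import Path

   def unpack_wit(src_dir: str, dst_dir: str) -> None:
       """Decompress every WIT *.tsv.gz file from src_dir into dst_dir."""
       out = Path(dst_dir)
       out.mkdir(parents=True, exist_ok=True)
       for archive in sorted(Path(src_dir).glob("*.tsv.gz")):
           target = out / archive.name[:-3]  # drop the trailing ".gz"
           with gzip.open(archive, "rb") as fin, open(target, "wb") as fout:
               shutil.copyfileobj(fin, fout)
           print(f"Unpacked {archive.name} -> {target.name}")

   if __name__ == "__main__":
       unpack_wit("wit_downloads", "wit")  # placeholder directory names
   ```
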
4. To generate predictions for the test dataset, execute the following command:

   ```shell
   python run_model.py --wit_dir [path_to_wit_directory] --data_dir [path_to_vwsd_task_data] --data_split test --lang en
   ```
5. For languages other than English, the code uses additional models which need to be downloaded. First, download the fine-tuned CLIP text encoders for Italian and Persian, then extract them in the project directory. Next, create a new directory named `embeddings` in the project root directory. Download the FastText models for Italian and Persian, and place them in the newly created directory.
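
   A small helper for this step; a minimal sketch that creates the `embeddings` directory and reports which model files it already contains (the exact FastText file names depend on the models you downloaded):

   ```python
   from pathlib import Path

   def prepare_embeddings_dir(project_root: str = ".") -> None:
       """Create the embeddings directory and list the model files placed in it."""
       emb_dir = Path(project_root) / "embeddings"
       emb_dir.mkdir(exist_ok=True)
       models = sorted(p.name for p in emb_dir.iterdir() if p.is_file())
       if models:
           print("Found model files:", ", ".join(models))
       else:
           print(f"Place the Italian and Persian FastText models in {emb_dir.resolve()}")

   if __name__ == "__main__":
       prepare_embeddings_dir()
   ```
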
6. To run the code for a language other than English, execute the same command as in step 4, changing the `lang` parameter. For example:

   ```shell
   python run_model.py --wit_dir [path_to_wit_directory] --data_dir [path_to_vwsd_task_data] --data_split test --lang it
   ```
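
   If you want to generate predictions for several languages in one go, the command can also be launched from Python. A minimal sketch using only the flags shown above; the paths are placeholders, and `fa` as the code for Persian is an assumption, since only `en` and `it` appear in the examples:

   ```python
   import subprocess

   def run_all(wit_dir: str, data_dir: str, langs=("en", "it", "fa"), split: str = "test") -> None:
       """Run run_model.py once per language with the flags shown above."""
       for lang in langs:
           cmd = [
               "python", "run_model.py",
               "--wit_dir", wit_dir,
               "--data_dir", data_dir,
               "--data_split", split,
               "--lang", lang,
           ]
           subprocess.run(cmd, check=True)

   if __name__ == "__main__":
       run_all("/path/to/wit", "/path/to/vwsd_task_data")  # placeholder paths
   ```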