Labeling other people

This repository contains code and data for the following paper:

@InProceedings{W18-6550,
  author = 	"van Miltenburg, Emiel
		and Elliott, Desmond
		and Vossen, Piek",
  title = 	"Talking about other people: an endless range of possibilities",
  booktitle = 	"Proceedings of the 11th International Conference on Natural Language Generation",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"415--420",
  location = 	"Tilburg University, The Netherlands",
  url = 	"http://aclweb.org/anthology/W18-6550"
}

The exact code and data used for this paper (this commit) are captured as a release.

Folder structure

There are three folders:

Flickr30k contains all code and data for the categorization of person labels in Flickr30k Entities.
VisualGenome contains code and data for the categorization of attributes from Visual Genome.
Other contains some additional functions to compute relevant statistics.

General requirements

The code has been tested with the following software. We expect results to be the same with other versions of Python and NLTK, but this has not been tested.

Python 3.6.3
nltk 3.2.2

How to use the code

We'll take the Flickr30K data as an example. The general logic is as follows:

The resources folder contains all files with categories, stopwords, etc.
Generate the grammar by running python update_grammar.py. This script compiles the resources into a grammar that matches labels to categories.
Check the labels by running python check_labels.py. This script checks which labels are covered by the grammar: covered labels are written to grammatical.txt, and the remaining labels to ungrammatical.txt. Reading ungrammatical.txt shows which labels (or parts of labels) still need to be categorized.
After adding (parts of) labels to the category files in the resources folder, run python update_grammar.py again.
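The loop above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the repository's code: the category contents and the helper names compile_lexicon, analyze, check_labels and write_lines are hypothetical. The actual update_grammar.py compiles an NLTK grammar, whereas this sketch only treats a label as covered when every one of its tokens appears in some category list.

```python
# Hypothetical sketch of the resources -> grammar -> check workflow.
# Assumption: each category is a plain list of words, and a label is
# "covered" when every token belongs to at least one category. The real
# scripts use an NLTK grammar, which can also constrain word order.
from typing import Dict, Iterable, List, Optional, Set, Tuple


def compile_lexicon(categories: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each (lowercased) word to the set of categories it belongs to."""
    lexicon: Dict[str, Set[str]] = {}
    for category, words in categories.items():
        for word in words:
            lexicon.setdefault(word.lower(), set()).add(category)
    return lexicon


def analyze(label: str, lexicon: Dict[str, Set[str]]) -> Optional[List[Tuple[str, Set[str]]]]:
    """Return (token, categories) pairs, or None if any token is not covered."""
    analysis = []
    for token in label.lower().split():
        if token not in lexicon:
            return None
        analysis.append((token, lexicon[token]))
    return analysis


def check_labels(labels: Iterable[str], lexicon: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Split labels into covered ("grammatical") and uncovered lists."""
    grammatical, ungrammatical = [], []
    for label in labels:
        if analyze(label, lexicon) is None:
            ungrammatical.append(label)
        else:
            grammatical.append(label)
    return grammatical, ungrammatical


def write_lines(path: str, lines: Iterable[str]) -> None:
    """Write one label per line, like grammatical.txt / ungrammatical.txt."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
```

Calling write_lines("grammatical.txt", grammatical) and write_lines("ungrammatical.txt", ungrammatical) mirrors the two output files described above. Adding a missing word to one of the category lists and re-running moves the affected labels into the covered list, which is the same iterate-until-covered cycle the scripts support.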

There are also two non-essential scripts:

To parse labels yourself, import the analyze_label function from label_parser.py.
Run flickr_stats.py to get statistics about the original data: the total number of unique labels classified as PEOPLE, and the size of the subset of those labels that end in boy, girl, male, female, woman, or man.
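The two statistics can be illustrated with a short sketch. It is not flickr_stats.py itself: the input format (pre-extracted (label, entity_type) pairs), the function name people_stats, and the reading of "end in" as "the last word is one of these nouns" are all assumptions made for this example.

```python
# Hypothetical sketch of the two statistics reported by flickr_stats.py.
# Assumptions: annotations are already extracted as (label, entity_type)
# pairs, labels are compared case-insensitively, and "ends in" means the
# final word of the label (so "fireman" does not count as ending in "man").
from typing import Iterable, Tuple

# Head nouns listed in this README.
GENDERED_HEADS = {"boy", "girl", "male", "female", "woman", "man"}


def people_stats(annotations: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (unique PEOPLE labels, unique PEOPLE labels ending in a gendered noun)."""
    people = {
        label.lower().strip()
        for label, entity_type in annotations
        if entity_type.lower() == "people"
    }
    gendered = {
        label for label in people
        if label.split() and label.split()[-1] in GENDERED_HEADS
    }
    return len(people), len(gendered)
```

Matching on whole words keeps the list in this README meaningful: with a plain string-suffix test, "man" would already match "woman" and "male" would match "female".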
