# Learning to Read Maps

PyTorch code for the SpLU-RoboNLP 2021 paper ["Learning to Read Maps: Understanding Natural Language Instructions from Unseen Maps"](https://aclanthology.org/2021.splurobonlp-1.2).
## Setup

- Download the ROSMI data from the official GitHub repo and place it under `data/rosmi`.

The repository is organized as follows:

```
REPO ROOT
|
|-- data
|   |-- rosmi
|-- snap
|-- src
```
Please contact us if anything is missing!
## Faster R-CNN Feature Extraction
We use the Faster R-CNN feature extractor introduced in ["Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering", CVPR 2018](https://arxiv.org/abs/1707.07998) and its released code from the [Bottom-Up-Attention GitHub repo](https://github.com/peteanderson80/bottom-up-attention).
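The extractor writes one row per image to a tab-separated file, with the bounding boxes and the 2048-d RoI features stored as base64-encoded arrays. Below is a minimal decoding sketch in the style of LXMERT's data utilities; the helper name `load_obj_tsv` and the exact fields kept per record are illustrative, but the column schema follows the TSVs produced by the bottom-up-attention code.

```python
import base64
import csv
import sys

import numpy as np

# Feature fields are long base64 strings; raise the csv field-size cap.
csv.field_size_limit(sys.maxsize)

# Column schema of the TSV files written by the bottom-up-attention extractor.
FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]


def load_obj_tsv(fname):
    """Decode one record per image: image size, boxes, and RoI features."""
    data = []
    with open(fname) as f:
        reader = csv.DictReader(f, FIELDNAMES, delimiter="\t")
        for item in reader:
            num_boxes = int(item["num_boxes"])
            # Boxes are (num_boxes, 4) float32; features are (num_boxes, 2048)
            # float32 vectors from the ResNet-101 backbone.
            boxes = np.frombuffer(base64.b64decode(item["boxes"]),
                                  dtype=np.float32).reshape(num_boxes, 4)
            feats = np.frombuffer(base64.b64decode(item["features"]),
                                  dtype=np.float32).reshape(num_boxes, -1)
            data.append({"img_id": item["img_id"],
                         "img_h": int(item["img_h"]),
                         "img_w": int(item["img_w"]),
                         "boxes": boxes,
                         "features": feats})
    return data
```

The decoded boxes and features can then be wrapped in a `torch.utils.data.Dataset` and fed to the LXMERT-based model.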
## Reference
```bibtex
@inproceedings{katsakioris-etal-2021-learning,
    title = "Learning to Read Maps: Understanding Natural Language Instructions from Unseen Maps",
    author = "Katsakioris, Miltiadis Marios and
      Konstas, Ioannis and
      Mignotte, Pierre Yves and
      Hastie, Helen",
    booktitle = "Proceedings of Second International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.splurobonlp-1.2",
    doi = "10.18653/v1/2021.splurobonlp-1.2",
    pages = "11--21",
    abstract = "Robust situated dialog requires the ability to process instructions based on spatial information, which may or may not be available. We propose a model, based on LXMERT, that can extract spatial information from text instructions and attend to landmarks on OpenStreetMap (OSM) referred to in a natural language instruction. Whilst, OSM is a valuable resource, as with any open-sourced data, there is noise and variation in the names referred to on the map, as well as, variation in natural language instructions, hence the need for data-driven methods over rule-based systems. This paper demonstrates that the gold GPS location can be accurately predicted from the natural language instruction and metadata with 72{\%} accuracy for previously seen maps and 64{\%} for unseen maps.",
}
```
## License

Distributed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
## Acknowledgements

This research received funding from Seebyte, Datalab and MASTS-S. This work was also supported by the EPSRC-funded ORCA Hub (EP/R026173/1, 2017-2021).