Skip to content


Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


Python 3.6 Language grade: Python Total alerts License: MIT


[Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [中文介绍]

Download [University-1652] upon request. You may use the request template.

This repository contains the dataset link and the code for our paper University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization, ACM Multimedia 2020. The offical paper link is at We collect 1652 buildings of 72 universities around the world. Thank you for your kindly attention.

Task 1: Drone-view target localization. (Drone -> Satellite) Given one drone-view image or video, the task aims to find the most similar satellite-view image to localize the target building in the satellite view.

Task 2: Drone navigation. (Satellite -> Drone) Given one satellite-view image, the drone intends to find the most relevant place (drone-view images) that it has passed by. According to its flight history, the drone could be navigated back to the target place.

Table of contents

About Dataset

The dataset split is as follows:

Split #imgs #buildings #universities
Training 50,218 701 33
Query_drone 37,855 701 39
Query_satellite 701 701 39
Query_ground 2,579 701 39
Gallery_drone 51,355 951 39
Gallery_satellite 951 951 39
Gallery_ground 2,921 793 39

More detailed file structure:

├── University-1652/
│   ├── readme.txt
│   ├── train/
│       ├── drone/                   /* drone-view training images 
│           ├── 0001
|           ├── 0002
|           ...
│       ├── street/                  /* street-view training images 
│       ├── satellite/               /* satellite-view training images       
│       ├── google/                  /* noisy street-view training images (collected from Google Image)
│   ├── test/
│       ├── query_drone/  
│       ├── gallery_drone/  
│       ├── query_street/  
│       ├── gallery_street/ 
│       ├── query_satellite/  
│       ├── gallery_satellite/ 
│       ├── 4K_drone/

We note that there are no overlaps between 33 univeristies of training set and 39 univeristies of test set.


1 Dec 2021 Fix the issue due to the latest torchvision, which do not allow the empty subfolder. Note that some buildings do not have google images.

3 March 2021 GeM Pooling is added. You may use it by --pool gem.

21 January 2021 The GPU-Re-Ranking, a GNN-based real-time post-processing code, is at Here.

21 August 2020 The transfer learning code for Oxford and Paris is at Here.

27 July 2020 The meta data of 1652 buildings, such as latitude and longitude, are now available at Google Driver. (You could use Google Earth Pro to open the kml file or use vim to check the value).
We also provide the spiral flight tour file at Google Driver. (You could open the kml file via Google Earth Pro to enable the flight camera).

26 July 2020 The paper is accepted by ACM Multimedia 2020.

12 July 2020 I made the baseline of triplet loss (with soft margin) on University-1652 public available at Here.

12 March 2020 I add the state-of-the-art page for geo-localization and tutorial, which will be updated soon.

Code Features

Now we have supported:

  • Float16 to save GPU memory based on apex
  • Multiple Query Evaluation
  • Re-Ranking
  • Random Erasing
  • ResNet/VGG-16
  • Visualize Training Curves
  • Visualize Ranking Result
  • Linear Warm-up


  • Python 3.6
  • GPU Memory >= 8G
  • Numpy > 1.12.1
  • Pytorch 0.3+
  • [Optional] apex (for float16)

Getting started


git clone
cd vision
python install
  • [Optinal] You may skip it. Install apex from the source
git clone
cd apex
python install --cuda_ext --cpp_ext

Dataset & Preparation

Download [University-1652] upon request. You may use the request template.

Or download CVUSA / CVACT.

For CVUSA, I follow the training/test split in (

Train & Evaluation

Train & Evaluation University-1652

python --name three_view_long_share_d0.75_256_s1_google  --extra --views 3  --droprate 0.75  --share  --stride 1 --h 256  --w 256 --fp16; 
python --name three_view_long_share_d0.75_256_s1_google

Default setting: Drone -> Satellite If you want to try other evaluation setting, you may change these lines at:

Ablation Study only Satellite & Drone

python --name two_view_long_no_street_share_d0.75_256_s1  --share --views 3  --droprate 0.75  --stride 1 --h 256  --w 256  --fp16; 
python --name two_view_long_no_street_share_d0.75_256_s1

Set three views but set the weight of loss on street images to zero.

Train & Evaluation CVUSA

python --name usa_vgg_noshare_warm5_lr2 --warm 5 --lr 0.02 --use_vgg16 --h 256 --w 256  --fp16 --batchsize 16;
python  --name usa_vgg_noshare_warm5_lr2 

Trained Model

You could download the trained model at GoogleDrive or OneDrive. After download, please put model folders under ./model/.


The following paper uses and reports the result of the baseline model. You may cite it in your paper.

  title={University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization},
  author={Zheng, Zhedong and Wei, Yunchao and Yang, Yi},
  journal={ACM Multimedia},

Instance loss is defined in

  title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
  author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  publisher={ACM New York, NY, USA}

Related Work

  • Instance Loss Code
  • Lending Orientation to Neural Networks for Cross-view Geo-localization Code
  • Predicting Ground-Level Scene Layout from Aerial Imagery Code


ACM Multimedia2020 University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization 🚁 annotates 1652 buildings in 72 universities around the world.







Sponsor this project


No packages published