Geolocation model inspired by ideas presented in: PlaNet - Photo Geolocation with Convolutional Neural Networks (ECCV 2016), Tobias Weyand, Ilya Kostrikov, James Philbin https://research.google.com/pubs/pub45488.html
## Data and Classes
Our data come from the geotagged images in the YFCC100M Multimedia Commons dataset.
Training, validation, and test images are split so that images uploaded by the same person do not appear in multiple sets.
Classes are created from the training data using Google's S2 Geometry Library
as described in the PlaNet paper above. The classes are defined in
grids.txt where the i-th line is the i-th class and the columns are:
S2 Cell Token, Latitude, Longitude.
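As a minimal sketch, the class file can be loaded as whitespace-separated columns; the sample lines below are illustrative and not taken from the actual grids.txt:

```python
# Sketch: parsing grids.txt, where the i-th line defines class i with
# columns: S2 cell token, latitude, longitude.
def load_classes(lines):
    """Return a list of (token, lat, lon) tuples, one per class."""
    classes = []
    for line in lines:
        token, lat, lon = line.split()
        classes.append((token, float(lat), float(lon)))
    return classes

# Hypothetical sample lines, for illustration only.
sample = [
    "89c25 40.69 -74.04",
    "4796 48.86 2.35",
]
classes = load_classes(sample)
print(classes[0])  # class 0: ('89c25', 40.69, -74.04)
```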
Differences between our model and PlaNet:

| | Our model | PlaNet |
|---|---|---|
| Dataset source | Multimedia Commons | Images crawled from the web |
| Training set | 33.9 million images | 91 million images |
| Validation set | 1.8 million images | 34 million images |
| S2 cell partitioning | t_1=5000, t_2=500 ==> 15,527 cells | t_1=10,000, t_2=50 ==> 26,263 cells |
| Optimization | SGD with momentum and an LR schedule | Adagrad |
| Training time | 9 days on 16 NVIDIA K80 GPUs (p2.16xlarge), 12 epochs | 2.5 months on 200 CPU cores |
| Test set | Placing Task 2016 test set (1.5 million Flickr images) | 2.3 million geotagged Flickr images |
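The adaptive partitioning behind the t_1/t_2 thresholds works by recursively splitting any cell that contains more than t_1 photos and discarding cells with fewer than t_2 photos. The sketch below illustrates the idea with plain lat/lon quad-split rectangles standing in for real S2 cells, and toy thresholds rather than the values in the table:

```python
# Simplified sketch of adaptive cell partitioning: split dense cells,
# drop sparse ones. Rectangles stand in for S2 cells; the real
# implementation would use the S2 Geometry Library's cell hierarchy.
def partition(photos, cell, t1, t2, max_depth=10):
    """photos: list of (lat, lon); cell: (lat0, lat1, lon0, lon1).
    Returns the list of cells kept as classes."""
    lat0, lat1, lon0, lon1 = cell
    inside = [(la, lo) for la, lo in photos
              if lat0 <= la < lat1 and lon0 <= lo < lon1]
    if len(inside) < t2:                      # too sparse: discard
        return []
    if len(inside) <= t1 or max_depth == 0:   # keep as one class
        return [cell]
    mla, mlo = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    cells = []
    for child in [(lat0, mla, lon0, mlo), (lat0, mla, mlo, lon1),
                  (mla, lat1, lon0, mlo), (mla, lat1, mlo, lon1)]:
        cells.extend(partition(inside, child, t1, t2, max_depth - 1))
    return cells
```

With two clusters of five photos each and t1=6, t2=2, the root cell is split until each cluster falls into its own kept cell, yielding two classes.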
## Im2GPS test set
The values indicate the percentage of test-set images that were correctly localized within the given distance. Note that the results in this table are not directly comparable, as the test set used in PlaNet has not been publicly released.
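The metric can be sketched as follows: compute the great-circle (haversine) distance between each predicted and true geotag, then report the fraction of images under a distance threshold. The coordinates below are illustrative:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def pct_within(preds, truths, threshold_km):
    """Percentage of predictions within threshold_km of the ground truth."""
    hits = sum(haversine_km(*p, *t) <= threshold_km
               for p, t in zip(preds, truths))
    return 100.0 * hits / len(preds)
```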