# Google Landmark Dataset

## Dataset

**Data Source**: https://github.com/cvdfoundation/google-landmark

Clone the dataset repository, then download the metadata CSVs and the first index image archive:

```bash
git clone https://github.com/cvdfoundation/google-landmark

curl -LO https://s3.amazonaws.com/google-landmark/metadata/train.csv
curl -LO https://s3.amazonaws.com/google-landmark/metadata/train_clean.csv
curl -LO https://s3.amazonaws.com/google-landmark/metadata/train_attribution.csv
curl -LO https://s3.amazonaws.com/google-landmark/metadata/train_label_to_category.csv
curl -LO https://s3.amazonaws.com/google-landmark/index/images_000.tar
```

**Code**: https://github.com/psinger/kaggle-landmark-recognition-2020-1st-place

In [1]:
!ls -lh

total 1.5G
drwxr-xr-x 5 studio-lab-user users   33 Feb  7 12:34 0
drwxr-xr-x 3 studio-lab-user users   62 Feb  7 12:30 google-landmark
-rw-r--r-- 1 studio-lab-user users  29K Feb  7 12:56 landmark.ipynb
-rw-r--r-- 1 studio-lab-user users 502M Feb  7 12:28 train.csv
-rw-r--r-- 1 studio-lab-user users 965M Feb  7 12:29 train_attribution.csv
-rw-r--r-- 1 studio-lab-user users  27M Feb  7 12:28 train_clean.csv
-rw-r--r-- 1 studio-lab-user users  15M Feb  7 12:29 train_label_to_category.csv
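
The listing shows that `train.csv` and `train_attribution.csv` are several hundred megabytes each, which is why the cells below read only the first `max_rows` rows. As a minimal sketch of one way to scan a whole file without loading it at once, the block below tallies rows per `landmark_id` in chunks. The helper name `count_images_per_landmark` and the demo rows are hypothetical and not part of the original notebook.

```python
import io
from collections import Counter

import pandas as pd


def count_images_per_landmark(csv_source, chunksize=100_000):
    """Hypothetical helper: tally rows per landmark_id, one chunk at a time."""
    counts = Counter()
    # usecols keeps memory low by parsing only the column we need.
    reader = pd.read_csv(csv_source, usecols=["landmark_id"], chunksize=chunksize)
    for chunk in reader:
        counts.update(chunk["landmark_id"].value_counts().to_dict())
    return counts


# Synthetic stand-in for train.csv (made-up rows, same three columns).
demo_csv = io.StringIO(
    "id,url,landmark_id\n"
    "aaa,http://example.com/a.jpg,1\n"
    "bbb,http://example.com/b.jpg,2\n"
    "ccc,http://example.com/c.jpg,1\n"
)
demo_counts = count_images_per_landmark(demo_csv, chunksize=2)
print(demo_counts.most_common(1))
```

Passing `'train.csv'` instead of `demo_csv` should work the same way, assuming the file has a `landmark_id` column as in the preview below.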


In [2]:
import pandas as pd

max_rows = 100

In [3]:
df = pd.read_csv('train.csv', nrows=max_rows, index_col=0)
df.head().T

| id | 6e158a47eb2ca3f6 | 202cd79556f30760 | 3ad87684c99c06e1 | e7f70e9c61e66af3 | 4072182eddd0100e |
|---|---|---|---|---|---|
| url | https://upload.wikimedia.org/wikipedia/commons... | http://upload.wikimedia.org/wikipedia/commons/... | http://upload.wikimedia.org/wikipedia/commons/... | https://upload.wikimedia.org/wikipedia/commons... | https://upload.wikimedia.org/wikipedia/commons... |
| landmark_id | 142820 | 104169 | 37914 | 102140 | 2474 |
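
Each row of `train.csv` pairs an image `id` with its source `url` and `landmark_id`. To locate the corresponding file inside an extracted image archive, a small helper can build the path from the id. This is a sketch that assumes the `a/b/c/<id>.jpg` layout described in the dataset repository, where `a`, `b`, and `c` are the first three characters of the id. The function name `image_path` and the `root` parameter are hypothetical.

```python
from pathlib import Path


def image_path(image_id: str, root: str = ".") -> Path:
    """Hypothetical helper: map an image id to its path under `root`.

    Assumes the a/b/c/<id>.jpg directory layout, where a, b, c are the
    first three characters of the id.
    """
    a, b, c = image_id[:3]
    return Path(root) / a / b / c / f"{image_id}.jpg"


# Example with the first id from the preview above.
print(image_path("6e158a47eb2ca3f6"))
```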


In [4]:
df_clean = pd.read_csv('train_clean.csv', nrows=max_rows, index_col=0)
df_clean.head().T

| landmark_id | 1 | 7 | 9 | 11 | 12 |
|---|---|---|---|---|---|
| images | 17660ef415d37059 92b6290d571448f6 cd41bf948edc... | 25c9dfc7ea69838d 28b13f94a6f1f3c1 307d6584f473... | 0193b65bb58d2c77 1a30a51a287ecf69 1f4e8ab1f1b2... | 1a6cb1deed46bb17 1cc2c8fbc83e1a0c 2361b8da868c... | 0a199c97c382b1ff 1492a5d344495391 290097bd36a6... |
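
In `train_clean.csv`, each `landmark_id` carries one space-separated string of image ids. A quick way to see how many images each landmark has is to split the string and take the list length. The sketch below uses a two-row sample frame built from ids that appear in this notebook's output; the variable names `sample` and `counts` are illustrative only.

```python
import pandas as pd

# Two rows in the train_clean.csv format: one space-separated id string per landmark.
sample = pd.DataFrame(
    {
        "images": [
            "17660ef415d37059 92b6290d571448f6 cd41bf948edc0340 fb09f1e98c6d2f70",
            "25c9dfc7ea69838d 28b13f94a6f1f3c1 307d6584f473ba35 4a7ba9eb16d51bc4 "
            "597353dfbb3df649 a40d00dc4fcc3a10 aff1d42de18d9efe c87bbcbf35a41875",
        ]
    },
    index=pd.Index([1, 7], name="landmark_id"),
)

# Split on whitespace, then count the ids in each resulting list.
counts = sample["images"].str.split().str.len()
print(counts)
```

The same expression should apply to `df_clean['images']` directly.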


In [5]:
df_clean['images'].str.split(pat=' ', expand=True)

| landmark_id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 494 | 495 | 496 | 497 | 498 | 499 | 500 | 501 | 502 | 503 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 17660ef415d37059 | 92b6290d571448f6 | cd41bf948edc0340 | fb09f1e98c6d2f70 |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 7 | 25c9dfc7ea69838d | 28b13f94a6f1f3c1 | 307d6584f473ba35 | 4a7ba9eb16d51bc4 | 597353dfbb3df649 | a40d00dc4fcc3a10 | aff1d42de18d9efe | c87bbcbf35a41875 |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 9 | 0193b65bb58d2c77 | 1a30a51a287ecf69 | 1f4e8ab1f1b2321c | 28267d88d4d9ea30 | 294c5690ad39a48e | 52ac5040369fc460 | 5f849ade1b4fbcb5 | 86463b5e23adde46 | 899f66ffe9ba3559 | 904efd09f3536f0e | ... |  |  |  |  |  |  |  |  |  |  |
| 11 | 1a6cb1deed46bb17 | 1cc2c8fbc83e1a0c | 2361b8da868c9113 | 32652480a7d99c5e | 34533ce2fb47a64f | 3c79cb8374f8ec83 | 49c20b7fcf95c10d | 6ad926b79d48e39d | 6ce47c7c47dd8531 | 73e5aa8fb1eac238 | ... |  |  |  |  |  |  |  |  |  |  |
| 12 | 0a199c97c382b1ff | 1492a5d344495391 | 290097bd36a6b01d | 2b87d221476447d2 | 2d685b1280ba366b | 30a8e693c1dae116 | 346204851c3234f5 | 39ae9ce73feeaa81 | 4ea6aed2ce0b2164 | 57175747c275757e | ... |  |  |  |  |  |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 250 | 17bc9b3848b9481d | 2e9bc2c3cc57b63a | 3bbb942ebc86444b | 42de117c87fd44c6 | 6932896210b58bc6 | 7aff0e7f3e176d55 |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 259 | 04334376e14f8ae5 | 09ec292aff9e60e7 | 2e635f695638bcb2 | 3127bb70da2c9866 | 394922dfeb821301 | 4572aceb7dbd7827 | 4d2d79153425846f | 55be3ed65c9fa6c7 | 8a4d1d46cacb8fd7 | 9181bea6fd575f5c | ... |  |  |  |  |  |  |  |  |  |  |
| 260 | 14caf71e8e93ef77 | 30762db798d9dcd6 | 378a036747aac9a3 | 61615286df341578 | 69e2a3539558c6ec | 6b519141fd92c261 | 7b1351cb598fb189 | b76db5cacef05d65 | ba396b41fb056043 | bd85e9cc72634597 | ... |  |  |  |  |  |  |  |  |  |  |
| 262 | 1272ef793ebba7a2 | 172999b2cc578a66 | 3f509df1d66997a0 | 7c9eb6d53e98e77e | 8327ec1243899c11 | 9981810cd64b2e5b | 9a27c2f18d737148 | a263751cf3b9d364 | b0cca170edffde0b | b897ea4300767588 | ... |  |  |  |  |  |  |  |  |  |  |
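
Splitting with `expand=True` produces one column per image position, so most cells are empty for landmarks with few images. A long layout, with one row per (`landmark_id`, image id) pair, may be easier to join against `train.csv`. The following is a minimal sketch of that alternative using `Series.explode`; the helper name `to_long` and the output column name `id` are assumptions, not part of the original notebook.

```python
import pandas as pd


def to_long(df_clean: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per (landmark_id, image id) pair."""
    return (
        df_clean["images"]
        .str.split()      # list of ids per landmark
        .explode()        # one id per row, landmark_id index repeated
        .rename("id")
        .reset_index()    # landmark_id becomes a regular column
    )


# Small frame in the train_clean.csv format, using ids from the output above.
demo = pd.DataFrame(
    {
        "images": [
            "17660ef415d37059 92b6290d571448f6 cd41bf948edc0340 fb09f1e98c6d2f70",
            "17bc9b3848b9481d 2e9bc2c3cc57b63a",
        ]
    },
    index=pd.Index([1, 250], name="landmark_id"),
)
long_df = to_long(demo)
print(long_df)
```

Calling `to_long(df_clean)` on the frame loaded earlier should yield the same two columns without the padding of the wide layout.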
