---
title: "30 Day Map Challenge 2024 - Day 14: A world map"
categories:
  - Project
tags:
  - 30daymapchallenge
  - python
  - programming
classes: wide
header:
  teaser: /assets/images/30daymapchallenge2024-day14.png
---

The theme for day 14 is _A world map_:
> Map the whole world. Whether it’s continents, ecosystems, or oceans, this is the day to map the entire planet.

### Data

Today, we will use data from the [_PANGAEA_ data warehouse](https://www.pangaea.de/).
I found the search engine of EUDAT's B2FIND service to be better than PANGAEA's own search.
Therefore, I query B2FIND for "Polarstern Master Track", scrape the results, and follow the links to the PANGAEA website to locate the actual datasets.
A Python library can then download the datasets.
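The scraping step boils down to pulling PANGAEA dataset identifiers out of the B2FIND result pages. The sketch below assumes the result pages contain PANGAEA DOIs of the form `10.1594/PANGAEA.<number>` (for example in `https://doi.pangaea.de/...` links). The function name `extract_pangaea_ids` and the HTML snippet are hypothetical, and fetching the pages is left out.

```python
import re

# Assumption: datasets are referenced by DOIs like 10.1594/PANGAEA.<number>
PANGAEA_DOI = re.compile(r"10\.1594/PANGAEA\.(\d+)")


def extract_pangaea_ids(text: str) -> list[str]:
    """Return the unique PANGAEA dataset IDs in `text`, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in PANGAEA_DOI.finditer(text):
        # dicts keep insertion order, so this drops duplicates but keeps the order
        seen.setdefault(match.group(1), None)
    return list(seen)


# Hypothetical fragment of a scraped result page (IDs are made up)
html = """
<a href="https://doi.pangaea.de/10.1594/PANGAEA.123456">Master track A</a>
<a href="https://doi.org/10.1594/PANGAEA.654321">Master track B</a>
<a href="https://doi.pangaea.de/10.1594/PANGAEA.123456">duplicate</a>
"""
print(extract_pangaea_ids(html))  # ['123456', '654321']
```

Matching on the DOI rather than on a specific host catches both `doi.pangaea.de` and `doi.org` links; the resulting IDs can then be handed to whichever download library is used.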


### Data Download

```shell
# Men's results; for the women's results use "gender":"F" and results_f.json
curl 'https://search-es-wmm-prod-4wpjacc7xjk4fixaluzofej5vq.eu-west-1.es.amazonaws.com/rankings/_search' -H 'content-type: application/json' \
  --data-raw '{"from":"0","size":"1000000","sort":[{"_score":{"order":"desc"}},{"fastest_finish_time_secs":{"order":"asc"}}],"query":{"bool":{"must":[],"filter":[{"match":{"edition":"6"}},{"match":{"gender":"M"}}],"must_not":{"match":{"overall_ranking":0}}}}}' \
  --create-dirs -o data/abbott/2024worldranking/results_m.json
```
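The same request can be issued from Python, which makes it easy to fetch both genders in one go. This is a sketch: the endpoint and request body are copied from the curl command, while the function names and the `edition` parameter are my own additions. The endpoint is not something I can vouch for long-term, so it may change.

```python
import json
import urllib.request
from pathlib import Path

# Endpoint copied from the curl command
ENDPOINT = (
    "https://search-es-wmm-prod-4wpjacc7xjk4fixaluzofej5vq.eu-west-1.es.amazonaws.com"
    "/rankings/_search"
)


def build_query(gender: str, edition: str = "6") -> dict:
    """Return the Elasticsearch request body for one gender ("M" or "F")."""
    return {
        "from": "0",
        "size": "1000000",
        "sort": [
            {"_score": {"order": "desc"}},
            {"fastest_finish_time_secs": {"order": "asc"}},
        ],
        "query": {
            "bool": {
                "must": [],
                "filter": [
                    {"match": {"edition": edition}},
                    {"match": {"gender": gender}},
                ],
                "must_not": {"match": {"overall_ranking": 0}},
            }
        },
    }


def download(gender: str, target: Path) -> None:
    """POST the query and write the raw JSON response to `target`."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_query(gender)).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    target.parent.mkdir(parents=True, exist_ok=True)  # same effect as curl --create-dirs
    with urllib.request.urlopen(request) as response:
        target.write_bytes(response.read())
```

Calling `download(sex.upper(), Path("data/abbott/2024worldranking") / f"results_{sex}.json")` for `sex` in `("m", "f")` would produce the two JSON files that the implementation reads.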

## Implementation

Convert the downloaded rankings into CSV files, combine the men's and women's results into one dataframe, and compute the mean fastest finish time per gender and nationality. Each row of the final table holds one gender/nationality combination.

```python
import json
from pathlib import Path

import pandas as pd
from tqdm import tqdm
```

```python
# Convert the raw Elasticsearch responses into lean CSV files (one per sex)
data_folder = Path("data/abbott/2024worldranking")
columns = ["age_group", "gender", "nationality", "fastest_finish_time_secs"]
for sex in ("m", "f"):
    data_file_path = data_folder / f"results_{sex}.json"
    with open(data_file_path) as data_file:
        data = json.load(data_file)
    # Each hit stores one runner's record under "_source"
    df = pd.concat(
        [pd.json_normalize(s["_source"]).loc[:, columns] for s in tqdm(data["hits"]["hits"])]
    )
    df.to_csv(data_file_path.with_suffix(".csv"), index=False)
```

```
100%|██████████| 321171/321171 [01:53<00:00, 2832.65it/s]
100%|██████████| 133007/133007 [00:47<00:00, 2810.74it/s]
```
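The progress bars show roughly 2,800 records per second for the per-hit `pd.json_normalize` calls. Since `pd.json_normalize` also accepts a list of records, a single call over all hits is an alternative worth trying. This is a sketch I have not benchmarked against the loop; `hits_to_frame` and the sample response are hypothetical.

```python
import pandas as pd

COLUMNS = ["age_group", "gender", "nationality", "fastest_finish_time_secs"]


def hits_to_frame(data: dict) -> pd.DataFrame:
    """Flatten all hits with one json_normalize call and keep only the needed columns."""
    records = [hit["_source"] for hit in data["hits"]["hits"]]
    return pd.json_normalize(records).loc[:, COLUMNS]


# Tiny hand-made response in the same shape as results_m.json (values are made up)
sample = {
    "hits": {
        "hits": [
            {"_source": {"age_group": "40-44", "gender": "M", "nationality": "GER",
                         "fastest_finish_time_secs": 12000, "overall_ranking": 1}},
            {"_source": {"age_group": "30-34", "gender": "M", "nationality": "KEN",
                         "fastest_finish_time_secs": 9000, "overall_ranking": 2}},
        ]
    }
}
print(hits_to_frame(sample))
```

Inside the loop, `hits_to_frame(data).to_csv(data_file_path.with_suffix(".csv"), index=False)` would replace the `pd.concat` line; columns missing from some records would come out as `NaN` instead of raising.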


```python
# Combine the men's and women's results into one dataframe
df = pd.concat(
    [pd.read_csv(data_folder / f"results_{sex}.csv") for sex in ("m", "f")],
    axis=0
).reset_index(drop=True)
```

```python
# Mean fastest finish time (in seconds) per gender and nationality
df.drop(["age_group"], axis="columns").groupby(["gender", "nationality"]).mean().reset_index()
```

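|     | gender | nationality | fastest_finish_time_secs |
|-----|--------|-------------|--------------------------|
| 0   | F      | AFG         | 16131.000000             |
| 1   | F      | ALB         | 17404.428571             |
| 2   | F      | ALG         | 17880.000000             |
| 3   | F      | AND         | 15325.250000             |
| 4   | F      | ANG         | 15657.333333             |
| ... | ...    | ...         | ...                      |
| 361 | M      | VEN         | 15452.579431             |
| 362 | M      | VIE         | 16566.221679             |
| 363 | M      | VIN         | 17933.666667             |
| 364 | M      | ZAM         | 15987.173913             |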
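Two small helpers could make this table map-ready. The mean times are in seconds, which are hard to read on a map legend. The nationality codes also look like IOC codes (e.g. ALG for Algeria, VIE for Vietnam) rather than the ISO 3166-1 alpha-3 codes that many country-boundary datasets use for joining. The sketch below assumes that; the mapping is deliberately incomplete (it only covers the differing codes visible above, and others such as GER → DEU would need entries too), and the helper names are hypothetical.

```python
def format_duration(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS, rounding to the nearest second."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Assumption: `nationality` holds IOC codes; only codes that differ from
# ISO 3166-1 alpha-3 need an entry. This mapping is intentionally partial.
IOC_TO_ISO3 = {
    "ALG": "DZA",  # Algeria
    "ANG": "AGO",  # Angola
    "VIE": "VNM",  # Vietnam
    "VIN": "VCT",  # Saint Vincent and the Grenadines
    "ZAM": "ZMB",  # Zambia
}


def to_iso3(code: str) -> str:
    """Translate an IOC code to ISO 3166-1 alpha-3, falling back to the code itself."""
    return IOC_TO_ISO3.get(code, code)


print(format_duration(16131.0))  # 4:28:51
print(to_iso3("ALG"), to_iso3("AFG"))  # DZA AFG
```

With pandas, `Series.map(to_iso3)` and `Series.map(format_duration)` would apply these to the `nationality` and `fastest_finish_time_secs` columns before joining with the country boundaries.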
