# How to load the Kontur population dataset in QGIS on an old laptop
The popular [Kontur Population Dataset](https://www.kontur.io/portfolio/population-dataset/) provides detailed (400 m resolution) population data for the whole world in a hexagonal grid. However, the dataset is huge. When I try to open the geopackage with QGIS on my old laptop, QGIS slowly adds puzzle pieces to a world map for hours before finally crashing.

It took me a while, but I found a way to open a subset of the data covering my area of interest. The key is the underlying [H3 system](https://h3geo.org/), a hexagonal hierarchical geospatial indexing system used by the dataset. It allows us to select the data of a certain region with simple SQL, without any (slow and resource-intensive) spatial queries.

![Screenshot](kontur.png)

This notebook helps to formulate the SQL query that can be used in the QGIS database manager. It requires [H3-Pandas](https://github.com/DahnJ/H3-Pandas), a module that integrates H3 into Pandas and Geopandas (install with pip or conda).

In [4]:
import pandas as pd
import geopandas as gpd
import h3pandas

As a start, I use a point in the center of my area of interest.

In [5]:
from shapely.geometry import Point

gdf = gpd.GeoDataFrame({'name':['AOI'], 
                        'geometry':[Point(13.40449, 52.50021)]}, 
                        crs="EPSG:4326")

gdf

| | name | geometry |
|---|---|---|
| 0 | AOI | POINT (13.40449 52.50021) |


H3 systematically divides the world into a hierarchical system of hexagon grids of different resolutions, and every hexagon gets an ID (a 64-bit integer, usually represented as a string in hexadecimal notation). Resolution 0 has an edge length of about 1280 km, resolution 8 (used by Kontur) about 0.5 km, and the maximum resolution of 15 about 0.6 m.
[To get to the next higher resolution](https://h3geo.org/docs/highlights/indexing), 7 smaller hexagons are fitted as well as possible into every hexagon.
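
As a quick check of how fast this hierarchy grows, the sketch below counts the cells below a single hexagon. It assumes exactly 7 children per level, which holds for hexagons (the few pentagon cells in each resolution have fewer children). The function name `descendants` is only used for illustration.

```python
def descendants(parent_res: int, child_res: int, per_level: int = 7) -> int:
    """Number of cells at child_res below one hexagon at parent_res."""
    if child_res < parent_res:
        raise ValueError("child_res must not be coarser than parent_res")
    return per_level ** (child_res - parent_res)


# Resolution 8 is the resolution of the Kontur data.
for res in (2, 3):
    print(f"resolution {res} -> resolution 8: {descendants(res, 8):,} cells")
```

One resolution 2 hexagon therefore covers up to 117,649 resolution 8 cells, and one resolution 3 hexagon up to 16,807.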

The conversion between geographic location and cell ID is done with a fast indexing algorithm. We can use the [H3 library](https://uber.github.io/h3-py/intro.html) or [H3-Pandas](https://github.com/DahnJ/H3-Pandas) to get IDs for latitude/longitude values or to get the children/parents of a hexagon in the grids of higher/lower resolution. (Note: this can also be used to [aggregate data](https://h3-pandas.readthedocs.io/en/latest/notebook/00-intro.html) to a coarser resolution.) Even better: the relationship between children and parents can be read directly from the IDs, which is great for a simple SQL query.
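
To illustrate how parents can be read from the IDs, here is a minimal sketch that derives a parent ID from a child ID with plain integer arithmetic, without the H3 library. It assumes the bit layout described in the H3 documentation (a 4-bit resolution field starting at bit 52 and 3 bits per resolution level below the base cell). `parent_id` is a hypothetical helper, not part of H3 or H3-Pandas, and the example ID is the resolution 8 cell that appears in the output further down.

```python
def parent_id(cell: str, parent_res: int) -> str:
    """Return the ID of the ancestor of `cell` at the coarser `parent_res`."""
    value = int(cell, 16)
    res = (value >> 52) & 0xF  # current resolution of the cell
    if not 0 <= parent_res <= res:
        raise ValueError("parent_res must be between 0 and the cell resolution")
    # Write the new resolution into the resolution field.
    value = (value & ~(0xF << 52)) | (parent_res << 52)
    # Mark all levels finer than parent_res as unused (all three bits set).
    for level in range(parent_res + 1, res + 1):
        value |= 0b111 << (3 * (15 - level))
    return format(value, "x")


for res in (7, 3, 2, 0):
    print(res, parent_id("881f18b259fffff", res))
```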

Let's get the H3 cell of our point in resolution 0:


In [6]:
gdf.h3.geo_to_h3(0)

| h3_00 | name | geometry |
|---|---|---|
| 801ffffffffffff | AOI | POINT (13.40449 52.50021) |


Note that we get the H3 cell as index of the data frame. 

We can do the same for different resolutions, save the hexagons of these cells to a file and open it in QGIS to decide which resolution gives the best coverage for our area of interest.

In [8]:
resolutions = list(range(9))  # H3 resolutions 0 to 8
h3s = [gdf.h3.geo_to_h3(res).index[0] for res in resolutions]
df  = pd.DataFrame({'resolution': resolutions, 'h3': h3s})
gdf = df.set_index('h3').h3.h3_to_geo_boundary() # Adds geometry of the H3 hexagons
gdf


| h3 | resolution | geometry |
|---|---|---|
| 801ffffffffffff | 0 | POLYGON ((5.52365 55.70677, 2.02657 45.18425, ... |
| 811f3ffffffffff | 1 | POLYGON ((5.52365 55.70677, 6.25969 51.96477, ... |
| 821f1ffffffffff | 2 | POLYGON ((11.24517 53.03204, 10.54962 51.56399... |
| 831f18fffffffff | 3 | POLYGON ((12.19189 52.26095, 12.24402 51.69695... |
| 841f1d5ffffffff | 4 | POLYGON ((13.13764 52.80725, 13.02989 52.59974... |
| 851f18b3fffffff | 5 | POLYGON ((13.29015 52.45516, 13.29638 52.37491... |
| 861f18b27ffffff | 6 | POLYGON ((13.35819 52.49388, 13.34272 52.46416... |
| 871f18b25ffffff | 7 | POLYGON ((13.39356 52.49604, 13.39443 52.48459... |
| 881f18b259fffff | 8 | POLYGON ((13.40329 52.50157, 13.40107 52.49732... |


In [9]:
gdf.to_file('hexagons.geojson', driver="GeoJSON")

You'll probably decide to use resolution 2 or 3. At finer resolutions (higher numbers), the hexagon becomes so small that it might cut off parts of your area of interest.

In [13]:
use_res = 2

s = gdf.index[use_res]
s

'821f1ffffffffff'

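Have a look at the ID values above:
- The first character (`8`) only marks the ID as a cell index and can be ignored.
- The second character is the resolution.
- From the third character on, the position is encoded: first the base cell (the resolution 0 hexagon), then a few more bits for each finer level. A cell shares this leading part with its parents and children.
- The rest is filled with f.

Each level adds 3 bits, not a full hexadecimal character, so the characters line up exactly with a level only at some resolutions (such as 3 and 7). At other resolutions, a character prefix only approximates the parent. The resolution 4 row looks different because the resolution 4 hexagon that contains the point is not the parent of the point's resolution 5 hexagon; children only approximately fit into their parent.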
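
The structure of an ID can be made visible with a few bit operations. The following is a minimal sketch, assuming the bit layout from the H3 documentation (4 mode bits, a 4-bit resolution field, a 7-bit base cell, then 3 bits per level). `decode_h3` is a hypothetical helper written for this notebook, not a function of the H3 library.

```python
def decode_h3(cell: str) -> dict:
    """Split an H3 cell ID (hex string) into its bit fields."""
    value = int(cell, 16)
    res = (value >> 52) & 0xF
    # One 3-bit digit (0 to 6) per level from 1 up to the cell resolution.
    digits = [(value >> (3 * (15 - level))) & 0b111 for level in range(1, res + 1)]
    return {
        "mode": (value >> 59) & 0xF,        # 1 means "cell"
        "resolution": res,
        "base_cell": (value >> 45) & 0x7F,  # resolution 0 hexagon
        "digits": digits,
    }


for cell in ("821f1ffffffffff", "831f18fffffffff", "881f18b259fffff"):
    print(cell, decode_h3(cell))
```

All three IDs share base cell 15 and start with the digits 4 and 3; the finer cells only append further digits.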

Note that we have the data as a geopackage, and a geopackage is basically an SQLite database. We know the data uses H3 at resolution 8, and it turns out we can select the children of a certain cell with a simple SQL query such as:

`SELECT * FROM population WHERE h3 LIKE '881f1%';` 



To be precise, the pattern for the LIKE clause with our selected ancestor hexagon (resolution 2, six levels above resolution 8) is:

In [18]:
'88' + s[2:].rstrip('f') + '%'

'881f1%'
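
Stripping the `f` padding works on whole hexadecimal characters, while H3 adds 3 bits per level, so a prefix can match somewhat more (or fewer) cells than the exact set of children. As a hedged alternative, the sketch below computes the smallest and largest possible child ID and builds a `BETWEEN` query instead. It assumes the H3 bit layout from the H3 documentation, lowercase IDs of equal length, and the table and column names (`population`, `h3`) from the query above. `child_id_range` is a hypothetical helper.

```python
def child_id_range(parent: str, child_res: int = 8) -> tuple:
    """Smallest and largest possible ID of the children of `parent` at `child_res`."""
    value = int(parent, 16)
    parent_res = (value >> 52) & 0xF
    if not parent_res <= child_res <= 15:
        raise ValueError("child_res must be between the parent resolution and 15")
    # Write the child resolution into the 4-bit resolution field.
    value = (value & ~(0xF << 52)) | (child_res << 52)
    lo = hi = value
    for level in range(parent_res + 1, child_res + 1):
        shift = 3 * (15 - level)
        lo &= ~(0b111 << shift)                       # lowest digit: 0
        hi = (hi & ~(0b111 << shift)) | (6 << shift)  # highest digit: 6
    return format(lo, "x"), format(hi, "x")


lo, hi = child_id_range("821f1ffffffffff")
print(f"SELECT * FROM population WHERE h3 BETWEEN '{lo}' AND '{hi}';")
```

For the resolution 2 hexagon this gives the range `881f180001fffff` to `881f1edb6dfffff`. The comparison works on strings because all IDs have the same length and use the same lowercase hexadecimal characters.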

Now, simply open the database manager in QGIS and: 
- connect to the Kontur population geopackage, 
- open the query window, 
- enter and execute the query (took 20 seconds on my laptop), 
- check "load as new layer", 
- click "load".

You might want to export the result; otherwise QGIS has to run the query again when you close and reopen the project.
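
Because the geopackage is an SQLite file, the pattern can also be tried outside QGIS first, for example to see how many rows it matches. This is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`population`, `h3`) are taken from the query above, `count_matches` is a hypothetical helper, and the in-memory table with made-up rows only stands in for the real file.

```python
import sqlite3


def count_matches(conn, pattern, table="population", column="h3"):
    """Count the rows whose H3 ID matches a LIKE pattern."""
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" LIKE ?'
    return conn.execute(query, (pattern,)).fetchone()[0]


# Stand-in for the geopackage: a tiny in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (h3 TEXT, population REAL)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?)",
    [("881f18b259fffff", 100.0), ("881f18b25bfffff", 50.0), ("8828308281fffff", 10.0)],
)
print(count_matches(conn, "881f1%"))  # 2 of the 3 stand-in rows match
```

For the real data, connect to the path of the downloaded geopackage instead of `":memory:"`.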