## Problem 3: How many people live near shopping centers? (8 points)

In the last step of this analysis, use a *spatial join* to relate data from a population grid data set to the buffer layer created in *problem 2*, and find out how many people live in the population grid cells that are **within** 1.5 km of each shopping centre.


*Feel free to divide your solution into more codeblocks than prepared! Remember to add comments to your code :)*

### a) Load the population grid data set and the buffer geometries

Use the same population grid data set as during [lesson 3](https://autogis-site.readthedocs.io/en/latest/lessons/lesson-3/spatial-join.html) (load it directly from WFS, don’t forget to assign a CRS). Load the data into a `GeoDataFrame` called `population_grid`.

(optional) If you want, discard unneeded columns and translate the remaining column names from Finnish to English.

In [1]:
import os
import pathlib

# Tell GeoPandas to use Shapely instead of PyGEOS.
# This must be set before geopandas is imported to take effect.
os.environ["USE_PYGEOS"] = "0"

import geopandas as gpd


In [24]:
# ADD YOUR OWN CODE HERE
population_grid = gpd.read_file(
    (
        "https://kartta.hsy.fi/geoserver/wfs"
        "?service=wfs"
        "&version=2.0.0"
        "&request=GetFeature"
        "&typeName=asuminen_ja_maankaytto:Vaestotietoruudukko_2020"
        "&srsName=EPSG:3879"
    ),
)
population_grid.crs = "EPSG:3879"  # for WFS data, the CRS needs to be specified manually

# Keep only the population count and the geometry, and translate the Finnish column name
population_grid = population_grid[["asukkaita", "geometry"]]
population_grid = population_grid.rename(columns={"asukkaita": "population"})
population_grid.head()

Unnamed: 0,population,geometry
0,5,"POLYGON ((25472499.995 6685998.998, 25472499.9..."
1,8,"POLYGON ((25472499.995 6684249.004, 25472499.9..."
2,5,"POLYGON ((25472499.995 6683999.005, 25472499.9..."
3,13,"POLYGON ((25472499.995 6682998.998, 25472499.9..."
4,5,"POLYGON ((25472749.993 6690249.003, 25472749.9..."


In [25]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
import geopandas
import pyproj

assert isinstance(population_grid, geopandas.GeoDataFrame)
assert population_grid.crs == pyproj.CRS("EPSG:3879")
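
Assigning a CRS, as done above, only labels the coordinates; it does not move them. The following minimal sketch illustrates the difference between labelling and reprojecting. The point coordinates are made up for illustration, and `set_crs()` is used here as an equivalent of assigning to `.crs` on a frame that has no CRS yet.

```python
import geopandas as gpd
from shapely.geometry import Point

# A hypothetical point whose coordinates are already in EPSG:3879 metres,
# but which carries no CRS metadata (as can happen with WFS responses)
gdf = gpd.GeoDataFrame({"population": [5]}, geometry=[Point(25496000, 6673000)])
assert gdf.crs is None

# set_crs() only attaches the CRS label; the coordinates stay untouched
labelled = gdf.set_crs("EPSG:3879")

# to_crs() actually transforms the coordinates into another CRS
reprojected = labelled.to_crs("EPSG:4326")

print(labelled.geometry.iloc[0])     # still in projected metres
print(reprojected.geometry.iloc[0])  # longitude/latitude in degrees
```

Use `to_crs()` only when the data really has to be transformed; labelling data with the wrong CRS silently misplaces it.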



Load the buffers computed in *problem 2* into a `GeoDataFrame` called `shopping_centre_buffers`. Add an `assert` statement to check whether the two data frames are in the same CRS.

In [26]:
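# Read the buffer layer computed in problem 2 from the GeoPackage next to this notebook
NOTEBOOK_PATH = pathlib.Path().resolve()
file_path = NOTEBOOK_PATH / "shopping_centres.gpkg"

shopping_centre_buffers = gpd.read_file(file_path, layer="buffers")

# Both data frames must share a CRS before they can be spatially joined
assert shopping_centre_buffers.crs == population_grid.crs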

In [27]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
assert isinstance(shopping_centre_buffers, geopandas.GeoDataFrame)
assert shopping_centre_buffers.geometry.geom_type.unique() == ["Polygon"]
assert shopping_centre_buffers.crs == pyproj.CRS("EPSG:3879")
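
The checks above can be tried out on synthetic data without the files from problem 2. This is only a sketch: the centre location, the grid cell and the names `centres`, `buffers` and `grid` are hypothetical stand-ins, not the exercise data.

```python
import math

import geopandas as gpd
from shapely.geometry import Point, box

# One hypothetical shopping centre in EPSG:3879 (coordinates in metres)
centres = gpd.GeoDataFrame(
    {"id": [1], "name": ["Example centre"]},
    geometry=[Point(25496000, 6673000)],
    crs="EPSG:3879",
)

# Replace the point geometry with a 1.5 km buffer polygon around it
buffers = centres.set_geometry(centres.geometry.buffer(1500))

# One hypothetical 250 m population grid cell in the same CRS
grid = gpd.GeoDataFrame(
    {"population": [10]},
    geometry=[box(25496000, 6673000, 25496250, 6673250)],
    crs="EPSG:3879",
)

# The same kind of checks as in the test cell: matching CRS, polygon geometries
assert buffers.crs == grid.crs
assert (buffers.geom_type == "Polygon").all()

# The buffer area should be close to that of a circle with a 1.5 km radius
print(buffers.area.iloc[0], math.pi * 1500**2)
```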


---

### b) Carry out a *spatial join* between the `population_grid` and the `shopping_centre_buffers`

Join the shopping centre’s `id` column (and others, if you want) to the population grid data frame, for all population grid cells that are **within** the buffer area of each shopping centre. [Use a *join-type* that retains only rows from both input data frames for which the geometric predicate is true](https://geopandas.org/en/stable/gallery/spatial_joins.html#Types-of-spatial-joins). 


In [28]:
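# Inner join: keep only the buffer/grid-cell pairs whose geometries intersect.
# Note: "intersects" also matches cells that only partly overlap a buffer;
# with the population grid on the left, predicate="within" would keep
# only the cells that lie completely inside a buffer.
buffers_with_population_data = shopping_centre_buffers.sjoin(
    population_grid,
    how="inner",
    predicate="intersects"
)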
buffers_with_population_data.head()

Unnamed: 0,id,name,addr,geometry,index_right,population
0,1,Itis,"Itäkatu 1-7, 00930 Helsinki","POLYGON ((25506080.930 6677684.921, 25506073.7...",5232,63
0,1,Itis,"Itäkatu 1-7, 00930 Helsinki","POLYGON ((25506080.930 6677684.921, 25506073.7...",5231,43
0,1,Itis,"Itäkatu 1-7, 00930 Helsinki","POLYGON ((25506080.930 6677684.921, 25506073.7...",5230,80
0,1,Itis,"Itäkatu 1-7, 00930 Helsinki","POLYGON ((25506080.930 6677684.921, 25506073.7...",5229,319
0,1,Itis,"Itäkatu 1-7, 00930 Helsinki","POLYGON ((25506080.930 6677684.921, 25506073.7...",5344,202
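
The choice of geometric predicate changes which grid cells are counted. The sketch below compares `within` and `intersects` on three made-up grid cells and one made-up buffer; all geometries, names and population values are assumptions for illustration only, not the exercise data.

```python
import geopandas as gpd
from shapely.geometry import Point, box

crs = "EPSG:3879"

# A hypothetical 1.5 km buffer around the origin
example_buffer = gpd.GeoDataFrame(
    {"id": [1], "name": ["Example centre"]},
    geometry=[Point(0, 0).buffer(1500)],
    crs=crs,
)

# Three hypothetical 250 m grid cells
cells = gpd.GeoDataFrame(
    {"population": [10, 20, 40]},
    geometry=[
        box(0, 0, 250, 250),          # completely inside the buffer
        box(1400, 0, 1650, 250),      # straddles the buffer edge
        box(3000, 3000, 3250, 3250),  # completely outside the buffer
    ],
    crs=crs,
)

# Grid cells on the left, so the predicate reads "cell within buffer"
cells_within = cells.sjoin(example_buffer, how="inner", predicate="within")
cells_intersecting = cells.sjoin(example_buffer, how="inner", predicate="intersects")

print(cells_within["population"].sum())        # only the fully contained cell
print(cells_intersecting["population"].sum())  # also the cell on the edge
```

With `intersects`, cells on the buffer edge add their whole population to the total, so the sums tend to be higher than with `within`.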



---

### c) Compute the population sum around shopping centres

Group the resulting (joined) data frame by shopping centre (`id` or `name`), and calculate the `sum()` of the population living inside the 1.5 km radius around them.

Print the results, for instance, in the form "12345 people live within 1.5 km from REDI".

In [38]:
# ADD YOUR OWN CODE HERE
result = buffers_with_population_data.groupby("name").population.sum()

for name, pop in result.items():
    print(f"{pop} people live within 1.5 km from {name}")

81056 people live within 1.5 km from Forum
35613 people live within 1.5 km from Iso-omena
30207 people live within 1.5 km from Itis
11449 people live within 1.5 km from Jumbo
45131 people live within 1.5 km from REDI
31279 people live within 1.5 km from Sello
43929 people live within 1.5 km from Tripla
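
The per-centre sums should be read one at a time. If two buffers overlap, the grid cells in the overlap are counted for both centres, so adding the sums together would double-count people. A small sketch with made-up numbers follows; the frame `joined` and its values are hypothetical and only mimic the shape of the joined data frame.

```python
import pandas as pd

# Hypothetical join result: grid cell 1 falls inside both buffers
joined = pd.DataFrame(
    {
        "name": ["Centre A", "Centre A", "Centre B", "Centre B"],
        "index_right": [0, 1, 1, 2],  # index of the matched grid cell
        "population": [10, 20, 20, 40],
    }
)

# Population per shopping centre (cell 1 counts for both centres)
per_centre = joined.groupby("name")["population"].sum()

# Population near at least one centre: count every grid cell only once
unique_total = joined.drop_duplicates("index_right")["population"].sum()

for name, pop in per_centre.items():
    print(f"{pop} people live within 1.5 km from {name}")
print(f"{unique_total} people live within 1.5 km from at least one centre")
```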



---

### d) Reflection

Good job! You are almost done with this week’s exercise. Please quickly answer the following short questions:
    
- How challenging did you find problems 1-3 (on a scale of 1-5), and why?
- What was easy?
- What was difficult?

Add your answers in a new *Markdown* cell below:

- How challenging did you find problems 1-3 (on a scale of 1-5), and why?
    - Not very challenging. The problems did not require much creativity.
- What was easy?
    - Everything
- What was difficult?
    - Nothing