## Problem 3: How many people live near shopping centers? (10 points)

In the last step of this analysis, use a *spatial join* to relate data from a population grid data set to the buffer layer created in *problem 2*, and find out how many people live in the population grid cells that are **within** 1.5 km of each shopping centre.


*Feel free to divide your solution into more code blocks than prepared! Remember to add comments to your code :)*

### a) Load the population grid data set and the buffer geometries (2 points)

Use the same population grid data set as during [lesson 3](https://autogis-site.readthedocs.io/en/latest/lessons/lesson-3/spatial-join.html) (load it directly from WFS, don’t forget to assign a CRS). Load the data into a `GeoDataFrame` called `population_grid`.

(optional) If you want, discard unneeded columns and translate the remaining column names from Finnish to English.
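The optional clean-up could look like the following minimal sketch. It uses a toy `GeoDataFrame` instead of the WFS layer: `asukkaita` is the column name used later in this notebook, while the extra `index` column, the toy geometries, and the English name `population` are illustrative assumptions.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the WFS layer; "asukkaita" (residents) is the column used
# in this notebook, "index" is a hypothetical column we do not need.
toy_grid = gpd.GeoDataFrame(
    {
        "index": [1, 2],
        "asukkaita": [10, 25],
        "geometry": [box(0, 0, 250, 250), box(250, 0, 500, 250)],
    },
    crs="EPSG:3879",
)

# Keep only the needed columns (always including the geometry column) ...
toy_grid = toy_grid[["asukkaita", "geometry"]]

# ... and translate the Finnish column name to English
toy_grid = toy_grid.rename(columns={"asukkaita": "population"})

print(toy_grid.columns.tolist())
```

Selecting columns with a list that includes `geometry` keeps the result a `GeoDataFrame`, so the CRS assigned earlier is preserved.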

In [1]:
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"

In [6]:
# ADD YOUR OWN CODE HERE
import geopandas as gpd

# Load the 2020 population grid directly from the HSY WFS endpoint
population_grid = gpd.read_file(
    (
        "https://kartta.hsy.fi/geoserver/wfs"
        "?service=wfs"
        "&version=2.0.0"
        "&request=GetFeature"
        "&typeName=asuminen_ja_maankaytto:Vaestotietoruudukko_2020"
        "&srsName=EPSG:3879"
    ),
)

# Assign the CRS explicitly, as the instructions above require
population_grid.crs = "EPSG:3879"

In [7]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
import geopandas
import pyproj

assert isinstance(population_grid, geopandas.GeoDataFrame)
assert population_grid.crs == pyproj.CRS("EPSG:3879")



Load the buffers computed in *problem 2* into a `GeoDataFrame` called `shopping_centre_buffers`. Add an `assert` statement to check whether the two data frames are in the same CRS.

In [15]:
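# ADD YOUR OWN CODE HERE
# Load the buffers saved in problem 2
shopping_centre_buffers = gpd.read_file(DATA_DIRECTORY / "shopping_centres.gpkg")

# Both layers must share a CRS before they can be joined spatially
assert shopping_centre_buffers.crs == population_grid.crs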

In [16]:
# NON-EDITABLE CODE CELL FOR TESTING YOUR SOLUTION
assert isinstance(shopping_centre_buffers, geopandas.GeoDataFrame)
assert shopping_centre_buffers.geometry.geom_type.unique() == ["Polygon"]
assert shopping_centre_buffers.crs == pyproj.CRS("EPSG:3879")
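If a CRS check like the one above ever fails, one layer can be reprojected to match the other. The following is a minimal sketch on made-up data; the two layers, their names, and their coordinates are hypothetical and only illustrate the compare-then-reproject pattern.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical layers: one in WGS 84 (EPSG:4326), one in EPSG:3879
points_wgs84 = gpd.GeoDataFrame(geometry=[Point(24.94, 60.17)], crs="EPSG:4326")
grid_gk25 = gpd.GeoDataFrame(geometry=[Point(25496000, 6672000)], crs="EPSG:3879")

# Reproject only when the two CRS actually differ
if points_wgs84.crs != grid_gk25.crs:
    points_gk25 = points_wgs84.to_crs(grid_gk25.crs)
else:
    points_gk25 = points_wgs84

# The same kind of check as in the solution cell now passes
assert points_gk25.crs == grid_gk25.crs
```

Note that `to_crs()` transforms the coordinates, whereas assigning a CRS only labels the existing coordinates without changing them.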


---

### b) Carry out a *spatial join* between the `population_grid` and the `shopping_centre_buffers`  (4 points)

Join the shopping centre’s `id` column (and others, if you want) to the population grid data frame, for all population grid cells that are **within** the buffer area of each shopping centre. [Use a *join-type* that retains only rows from both input data frames for which the geometric predicate is true](https://geopandas.org/en/stable/gallery/spatial_joins.html#Types-of-spatial-joins). 


In [24]:
# ADD YOUR OWN CODE HERE
# Inner spatial join with the "within" predicate; sjoin() tests the left
# frame's geometries (the buffers) against the right frame's (the grid cells)
population_grid_within_shopping_centers = shopping_centre_buffers.sjoin(
    population_grid,
    how="inner",
    predicate="within",
)

# Keep only the relevant columns and translate their names to English
population_grid_within_shopping_centers = population_grid_within_shopping_centers[
    ["asukkaita", "geometry", "id", "name"]
]
population_grid_within_shopping_centers = population_grid_within_shopping_centers.rename(
    columns={"asukkaita": "population", "name": "shopping_center", "id": "shopping_center_id"}
)
population_grid_within_shopping_centers.head()

|   | population | geometry | shopping_center_id | shopping_center |
|---|---|---|---|---|
| 1 | 133 | POLYGON ((25496557.313 6672875.135, 25496557.3... | 2 | Forum |
| 2 | 501 | POLYGON ((25485471.935 6672070.961, 25485471.9... | 3 | Iso-omena |
| 3 | 924 | POLYGON ((25489378.845 6678410.981, 25489378.8... | 4 | Sello |
| 5 | 647 | POLYGON ((25498837.031 6674981.435, 25498837.0... | 6 | REDI |
| 6 | 534 | POLYGON ((25496135.293 6676176.74, 25496135.28... | 7 | Tripla |
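With the `within` predicate, the order of the two frames matters: `sjoin()` tests the geometries of the frame it is called on against those of the frame passed as argument. The sketch below illustrates this on made-up data. The three grid cells, the single "Toy centre" buffer, and all variable names are hypothetical, not the exercise data.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical 100 m grid cells with a population count each
cells = gpd.GeoDataFrame(
    {
        "population": [10, 20, 30],
        "geometry": [
            box(0, 0, 100, 100),      # close to the centre
            box(100, 0, 200, 100),    # close to the centre
            box(5000, 0, 5100, 100),  # far away from the centre
        ],
    },
    crs="EPSG:3879",
)

# Hypothetical shopping centre, buffered by 1.5 km
buffers = gpd.GeoDataFrame(
    {"id": [1], "name": ["Toy centre"], "geometry": [Point(100, 50).buffer(1500)]},
    crs="EPSG:3879",
)

# Grid cells that lie within a buffer (cells on the left)
cells_in_buffers = cells.sjoin(buffers, how="inner", predicate="within")

# Buffers that lie within a grid cell (buffers on the left)
buffers_in_cells = buffers.sjoin(cells, how="inner", predicate="within")

print(len(cells_in_buffers), len(buffers_in_cells))
```

In this toy setup the first join keeps the two nearby cells, while the second one is empty, because a 1.5 km buffer cannot fit inside a 100 m cell. The `how="inner"` join type drops every row for which the predicate is false.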



---

### c) Compute the population sum around shopping centres (4 points)

Group the resulting (joined) data frame by shopping centre (`id` or `name`), and calculate the `sum()` of the population living inside the 1.5 km radius around them.

Print the results, for instance, in the form "12345 people live within 1.5 km from REDI".

In [None]:
# ADD YOUR OWN CODE HERE


In [26]:
population_grid_within_shopping_centers.groupby("shopping_center")["population"].sum()

shopping_center
Forum        133
Iso-omena    501
REDI         647
Sello        924
Tripla       534
Name: population, dtype: int64
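To print the sums in the sentence form requested above, one option is to iterate over the grouped result. This is a minimal sketch on a hypothetical joined table; the population values are invented and only the column names follow this notebook.

```python
import pandas as pd

# Hypothetical joined table: one row per (grid cell, shopping centre) pair
joined = pd.DataFrame(
    {
        "shopping_center": ["REDI", "REDI", "Sello"],
        "population": [100, 250, 75],
    }
)

# Sum the population of all grid cells belonging to each shopping centre
population_by_centre = joined.groupby("shopping_center")["population"].sum()

# Print one sentence per shopping centre
for name, population in population_by_centre.items():
    print(f"{population} people live within 1.5 km from {name}")
```

The same loop works on a `GeoDataFrame`, since `groupby()` and `sum()` behave there as they do on a plain `DataFrame`.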


---

### d) Reflection

Good job! You are almost done with this week’s exercise. Please quickly answer the following short questions:
    
- How challenging did you find problems 1-3 (on a scale of 1 to 5), and why?
- What was easy?
- What was difficult?

Add your answers in a new *Markdown* cell below: