# Comparing Data From Different Sources

In this set of exercises we will attempt overlaying data from two sources:

1. keybiotopes_habitatgroups_shapefiles:
    - These shapefiles are the result of the last set of exercises analysing sksNyckelBiotoper_shapefiles. Both shapefiles display the locations and attributes of the key biotopes (Nyckelbiotoper) in Sweden.
    - Key-biotopes are high conservation value forests mapped by the Swedish Forest Agency through field surveys. The database contains approximately 67 000 areas, each stored as a polygon that represents an area determined to have high conservation value.

2. rutor_shapefiles:
    - This folder contains shapefiles for a specific (square-shaped) area in Sweden, for which laser measurements and satellite data are available.
    - The laser measurements are made using Lidar (light detection and ranging), an optical remote-sensing technique that uses laser light to densely sample the surface of the earth, producing highly accurate x, y, z measurements.
    - Lidar measurements are stored in `.laz` files. These files exist separately, but their names are registered in rutor_shapefiles.

The overall goal is to use laser data (referenced in rutor_shapefiles) to study and identify key-biotopes (registered in sksNyckelBiotoper_shapefiles). So in this set of exercises we will find the intersection of these two datasets and study it.

# Read The Data

- To Do:

1. Import necessary libraries and read both keybiotopes_habitatgroups_shapefiles and rutor_shapefiles.

# Explore and Filter

- To Do:
1. Explore the GeoDataFrames. What are the properties registered for the key-biotopes? What are the properties registered for the laser data?
        - Each key-biotope is identified by a code 'Beteckn'.
        - Each square-tile of laser data is identified by the coordinates of its south-west corner 'square'.

2. Can you figure out what columns of non-geometric data are not relevant? Drop them.

3. Before we can compare the geometries of these GeoDataFrames, we must make sure that the geometric objects are in the same coordinate system. (Use the `crs` attribute to read the Coordinate Reference System.)
        - We won't need this here, but if the two CRS were different we could reproject one GeoDataFrame to the other's CRS with its `to_crs` method (which relies on `pyproj`).

4. Laser measurements are provided for 2.5 km x 2.5 km square tiles. Check whether all of the tiles indeed have areas of 6.25 km$^2$.
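
The filtering steps above could be sketched as follows. The data are synthetic stand-ins: the column `note`, the identifier values and the CRS (EPSG:3006) are assumptions made for illustration, not the contents of the real shapefiles.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import box

# Synthetic stand-ins (coordinates in metres); "note" plays the irrelevant column.
tiles = gpd.GeoDataFrame(
    {"square": ["T1", "T2"], "note": ["a", "b"]},
    geometry=[
        box(500_000, 6_500_000, 502_500, 6_502_500),
        box(502_500, 6_500_000, 505_000, 6_502_500),
    ],
    crs="EPSG:3006",  # assumption
)
biotopes = gpd.GeoDataFrame(
    {"Beteckn": ["B1"]},
    geometry=[box(500_100, 6_500_100, 500_200, 6_500_200)],
    crs="EPSG:3006",  # assumption
)

# Explore: list the registered properties.
print(tiles.columns.tolist(), biotopes.columns.tolist())

# Drop a column judged irrelevant.
tiles = tiles.drop(columns=["note"])

# Compare the coordinate reference systems; reproject only if they differ.
same_crs = tiles.crs == biotopes.crs
if not same_crs:
    biotopes = biotopes.to_crs(tiles.crs)

# Areas come out in the CRS units squared (m^2 here); convert to km^2.
tile_area_km2 = tiles.geometry.area / 1e6
all_tiles_ok = bool(np.isclose(tile_area_km2, 6.25).all())
print(same_crs, all_tiles_ok)
```

Comparing with `np.isclose` rather than `==` avoids false alarms from floating-point rounding in the area calculation.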

## Plotting
- To Do:
1. Plot both geometries and then overlay them.
        - Use `GeoPandas` plot function
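
One way to overlay the two layers is to draw both on the same Matplotlib axes. This is a sketch on synthetic stand-in data; the identifiers, CRS and styling choices are assumptions.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Synthetic stand-ins for the two layers (coordinates in metres).
tiles = gpd.GeoDataFrame(
    {"square": ["T1", "T2"]},
    geometry=[
        box(500_000, 6_500_000, 502_500, 6_502_500),
        box(502_500, 6_500_000, 505_000, 6_502_500),
    ],
    crs="EPSG:3006",  # assumption
)
biotopes = gpd.GeoDataFrame(
    {"Beteckn": ["B1", "B2"]},
    geometry=[
        box(500_500, 6_500_500, 501_500, 6_501_000),
        box(503_000, 6_501_000, 503_800, 6_502_000),
    ],
    crs="EPSG:3006",  # assumption
)

fig, ax = plt.subplots(figsize=(6, 6))
# Draw the tiles as outlines, then the key-biotopes on top on the same axes.
tiles.plot(ax=ax, facecolor="none", edgecolor="black")
biotopes.plot(ax=ax, color="green", alpha=0.6)
ax.set_title("Key-biotopes over laser tiles")
```

Passing the same `ax` to both `plot` calls is what puts the layers in one figure; calling `plot` without `ax` would create a separate figure for each layer.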

# Intersection

The analysis we aim to (potentially) do will focus on the areas for which we have both kinds of data: ground surveys and laser measurements. In other words, the intersection of the geometries we plotted previously.

- To Do:
1. Get the intersection of the two GeoDataFrames' geometries.
        - use `GeoPandas` overlay function.
2. Get a GeoDataFrame of laser tiles that **don't** have key-biotopes.
3. How many square-tiles include key-biotopes? (i.e. how many square-tiles are in the intersection?)
4. How many key-biotopes (of the total) are not covered by laser data (outside of the intersection)? What percentage of the total key-biotope area do they account for?
        - You can add a new column to each GeoDataFrame, called 'Area' for example, and fill it using the attribute `.area` applied to 'geometry'
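
The four questions above could be approached as in the following sketch. It uses synthetic stand-in data (three tiles, three key-biotopes); the identifiers and the CRS are assumptions, and the `isin` filtering is one possible approach, not the only one.

```python
import geopandas as gpd
from shapely.geometry import box

X0, Y0, SIDE = 500_000, 6_500_000, 2_500  # hypothetical origin and tile side (m)

tiles = gpd.GeoDataFrame(
    {"square": ["T1", "T2", "T3"]},
    geometry=[box(X0 + i * SIDE, Y0, X0 + (i + 1) * SIDE, Y0 + SIDE) for i in range(3)],
    crs="EPSG:3006",  # assumption
)
biotopes = gpd.GeoDataFrame(
    {"Beteckn": ["B1", "B2", "B3"]},
    geometry=[
        box(X0 + 100, Y0 + 100, X0 + 200, Y0 + 200),    # inside T1
        box(X0 + 3000, Y0 + 100, X0 + 3200, Y0 + 200),  # inside T2
        box(X0 + 9000, Y0 + 100, X0 + 9100, Y0 + 200),  # outside every tile
    ],
    crs="EPSG:3006",  # assumption
)

# 1. Intersection: each row carries the attributes of both inputs.
inter = gpd.overlay(biotopes, tiles, how="intersection")

# 2. Tiles whose identifier never appears in the intersection.
empty_tiles = tiles[~tiles["square"].isin(inter["square"])]

# 3. Number of distinct tiles that contain at least part of a key-biotope.
n_tiles_with_biotopes = inter["square"].nunique()

# 4. Key-biotopes with no part inside any tile, and their share of the total area.
biotopes["Area"] = biotopes.geometry.area
uncovered = biotopes[~biotopes["Beteckn"].isin(inter["Beteckn"])]
pct_uncovered_area = 100 * uncovered["Area"].sum() / biotopes["Area"].sum()

print(n_tiles_with_biotopes, len(empty_tiles), len(uncovered), pct_uncovered_area)
```

Note that a key-biotope straddling a tile border still appears in `inter`, so `uncovered` here holds only the biotopes that lie entirely outside the laser data.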

# Groupby

One important operation in `Pandas` (and `GeoPandas`) is `groupby`. It is used for splitting a dataset into groups, applying a function, and combining the results. We will employ it in the following:

- To Do:
1. For each square-tile of laser data, how much of its area is occupied by key-biotopes?
2. Differentiate key-biotopes by their type. How many types are included in each square-tile? How much area is occupied by each habitat type?
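
A sketch of both groupby questions on synthetic stand-in data. The column name `habitat` for the habitat type is hypothetical, as are the identifiers and the CRS; replace them with the names found in the real GeoDataFrames.

```python
import geopandas as gpd
from shapely.geometry import box

X0, Y0, SIDE = 500_000, 6_500_000, 2_500  # hypothetical origin and tile side (m)

tiles = gpd.GeoDataFrame(
    {"square": ["T1", "T2", "T3"]},
    geometry=[box(X0 + i * SIDE, Y0, X0 + (i + 1) * SIDE, Y0 + SIDE) for i in range(3)],
    crs="EPSG:3006",  # assumption
)
biotopes = gpd.GeoDataFrame(
    {"Beteckn": ["B1", "B2", "B3"], "habitat": ["A", "B", "A"]},  # "habitat" is hypothetical
    geometry=[
        box(X0 + 100, Y0 + 100, X0 + 200, Y0 + 200),    # T1, type A
        box(X0 + 500, Y0 + 100, X0 + 700, Y0 + 200),    # T1, type B
        box(X0 + 3000, Y0 + 100, X0 + 3100, Y0 + 400),  # T2, type A
    ],
    crs="EPSG:3006",  # assumption
)

# Clip the key-biotopes to the tiles, then measure each clipped piece.
inter = gpd.overlay(biotopes, tiles, how="intersection")
inter["Area"] = inter.geometry.area

# 1. Split by tile, sum the areas, and compare with the tile's own area.
area_per_tile = inter.groupby("square")["Area"].sum()
tile_area = tiles.set_index("square").geometry.area
fraction_per_tile = (area_per_tile / tile_area).fillna(0)  # tiles with no biotopes get 0

# 2. Number of habitat types per tile, and area per (tile, habitat type).
types_per_tile = inter.groupby("square")["habitat"].nunique()
area_per_type = inter.groupby(["square", "habitat"])["Area"].sum()

print(fraction_per_tile)
print(types_per_tile)
print(area_per_type)
```

The sums assume that key-biotope polygons do not overlap each other; overlapping polygons would be counted twice in `area_per_tile`.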

*In later sets of exercises we will load `.laz` files and analyse the laser measurements stored within. We will search for patterns in the laser measurements of square-tiles that include key-biotopes, compared to tiles that have no key-biotopes.*