# Generate Land Use with CLC

## Objective
# ToDo:
why did we create this notebook?
why landuse?

## Scope
In this file we process a bird dataframe and merge it with the Corine Land Cover Dataframe to get the land use on the location of a bird sighting.
With the functions given in this file it is possible to get the land use on the coordinate, to get the most common land use in a square or to get all land uses per percentage in a square around the coordinate.

The CLC was used because the LUCAS dataframe was used first and gave a much less detailed result. The LUCAS dataframe has points on a map which describe the land use whereas the CLC dataframe has polygons all over the map. 

To get the land use with LUCAS the nearest point to the coordinate has to be selected. To do the same with CLC (altough much more precises) the coordinates of the bird dataframe are checked whether they are in any polygon. Then the land use of this polygon is used.

To get the most common land use within a square we create squares around each coordinate. Then each square is checked for the landuses within. Then the most common land uses gets returned.

The probably best solution to get the land use is to get the percentage of landuses in each square. This can be done by checking how much of each polygon is in each square. This is really compute heavy and this is also why there might be a better option:

To accomplish this another aspect of the CLC data is used. The CLC data also has points all over the map which can be used to get the land use. While those points also could be inaccurate if looking for the land use on a coordinate, those specific points inside of the square could be counted and calculated how much of each land use is in each square.


In [3]:
import geopandas as gpd
from shapely.geometry import Point, box
import pandas as pd
import math

In [4]:
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

In [4]:
df_path = 'D:\Simon\Documents\GP\data\datasets\selected_bird_species_with_grids_50km.csv'
df = pd.read_csv(df_path, index_col=0)

  df = pd.read_csv(df_path, index_col=0)


In [5]:
df.head(2)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300
1,29654244,397.0,Schwarzkehlchen,2018-01-01,,53.127639,8.957263,square,0.760781,2.0,0,37803.0,de,50kmE4250N3300


In [6]:
clc_path = 'D:\\Simon\\Documents\GP\data\\util_files\\CLC2018\\U2018_CLC2018_V2020_20u1.shp'
clc = gpd.read_file(clc_path)

In [None]:
clc['Code_18'].fillna(999, inplace=True)
clc['Code_18'] = clc['Code_18'].astype(int)
clc.drop(columns=['OBJECTID', 'Remark','Area_Ha','ID','Shape_Leng','Shape_Area'], axis=1, inplace=True)

In [12]:
clc.head(2)

Unnamed: 0,Code_18,geometry
0,111,"POLYGON ((4.65404 43.80421, 4.65492 43.80702, ..."
1,111,"POLYGON ((4.64857 43.80864, 4.64914 43.80790, ..."


### Get landuse on coordinate

In [14]:
geometry = [Point(lon, lat) for lon, lat in zip(df['coord_lon'], df['coord_lat'])]
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

merged_gdf = gpd.sjoin(gdf, clc, how="left", predicate="within")

merged_gdf.drop(columns=['geometry','index_right'], axis=1, inplace=True)

### Get most common landuse within 1km square around coord

In [19]:
# Convert 1km to degrees
km_to_degrees = 1 / 111.0

# Define the size of the square in degrees
square_size_horizontal = 1 * km_to_degrees
square_size_vertical = square_size_horizontal * 1.6667  # To make it a square

# Create a GeoDataFrame with square polygons around each point
geometry = [Point(lon, lat) for lon, lat in zip(df['coord_lon'], df['coord_lat'])]
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

# Create squares around each point
squares_gdf = gdf.copy()
squares_gdf['geometry'] = gdf['geometry'].apply(lambda point: box(
    point.x - square_size_vertical/2, 
    point.y - square_size_horizontal/2,
    point.x + square_size_vertical/2, 
    point.y + square_size_horizontal/2
))

In [20]:
squares_gdf.head(2)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,geometry
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,"POLYGON ((8.68450 53.15326, 8.68450 53.16226, ..."
1,29654244,397.0,Schwarzkehlchen,2018-01-01,,53.127639,8.957263,square,0.760781,2.0,0,37803.0,de,50kmE4250N3300,"POLYGON ((8.96477 53.12313, 8.96477 53.13214, ..."


In [21]:
# Spatially join the squares with the land use data
merged_squares = gpd.sjoin(squares_gdf, clc, how="left", predicate="intersects")

In [24]:
merged_squares.head(3)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,geometry,index_right,Code_18
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,"POLYGON ((8.68450 53.15326, 8.68450 53.16226, ...",519790.0,231.0
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,"POLYGON ((8.68450 53.15326, 8.68450 53.16226, ...",507381.0,141.0
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,"POLYGON ((8.68450 53.15326, 8.68450 53.16226, ...",538281.0,324.0


In [38]:
merged_squares.reset_index(drop=True, inplace=True)

In [23]:
# write
df_path = 'D:\Simon\Documents\GP\data\datasets\merged_squares.csv'
merged_squares.to_csv(df_path)

In [None]:
# read
df_path = 'D:\Simon\Documents\GP\data\datasets\merged_squares.csv'
merged_squares = pd.read_csv(df_path, index_col=0)

In [26]:
# Use value_counts to find the most common land use category in each square
common_land_use = merged_squares.groupby(merged_squares.index)['Code_18'].agg(lambda x: x.value_counts().idxmax() if not x.empty and not x.value_counts().empty else None)

# Create a new DataFrame with the common land use and the corresponding index
common_land_use_df = pd.DataFrame({'Code_18_sq': common_land_use.values}, index=common_land_use.index)

In [None]:
common_land_use_df.head(3)

Unnamed: 0,num_land_use_sq
0,231.0
1,231.0
2,211.0


In [27]:
# Merge the common land use DataFrame with the original DataFrame based on the index
merged_gdf = merged_squares.merge(common_land_use_df, left_index=True, right_index=True)
merged_gdf.drop(columns=['geometry'], axis=1, inplace=True)

In [28]:
merged_gdf.head(3)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,index_right,Code_18,Code_18_sq
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,519790.0,231.0,231.0
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,507381.0,141.0,231.0
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,538281.0,324.0,231.0


In [30]:
clc_code_to_numerical_label = {
    111: 1,  # Urban
    112: 1,  # Urban
    121: 1,  # Urban
    122: 1,  # Urban
    123: 1,  # Urban
    124: 1,  # Urban
    131: 2,  # Industrial
    132: 2,  # Industrial
    133: 2,  # Industrial
    141: 1,  # Urban
    142: 1,  # Urban
    211: 3,  # Agriculture
    212: 3,  # Agriculture
    213: 3,  # Agriculture
    221: 3,  # Agriculture
    222: 3,  # Agriculture
    223: 3,  # Agriculture
    231: 3,  # Agriculture
    241: 3,  # Agriculture
    242: 3,  # Agriculture
    243: 3,  # Agriculture
    244: 3,  # Agriculture
    311: 4,  # Forest
    312: 4,  # Forest
    313: 4,  # Forest
    321: 5,  # Grassland
    322: 5,  # Grassland
    323: 5,  # Grassland
    324: 5,  # Grassland
    331: 6,  # Water
    332: 6,  # Water
    333: 6,  # Water
    334: 6,  # Water
    335: 6,  # Water
    411: 6,  # Water
    412: 6,  # Water
    421: 6,  # Water
    422: 6,  # Water
    423: 6,  # Water
    511: 6,  # Water
    512: 6,  # Water
    521: 6,  # Water
    522: 6,  # Water
    523: 6,  # Water
    990: 7,  # UNCLASSIFIED LAND SURFACE
    995: 6,  # UNCLASSIFIED WATER BODIES
    999: 8   # NODATA
}

merged_gdf['num_land_use_coord'] = merged_gdf['Code_18'].map(clc_code_to_numerical_label)
merged_gdf['num_land_use_sq'] = merged_gdf['Code_18_sq'].map(clc_code_to_numerical_label)

In [31]:
numerical_label_to_description = {
    1: 'Urban',
    2: 'Industrial',
    3: 'Agriculture',
    4: 'Forest',
    5: 'Grassland',
    6: 'Water',
    7: 'NODATA',
    8: 'UNCLASSIFIED LAND SURFACE'
}

merged_gdf['land_use_coord'] = merged_gdf['num_land_use_coord'].map(numerical_label_to_description)
merged_gdf['land_use_sq'] = merged_gdf['num_land_use_sq'].map(numerical_label_to_description)

In [41]:
merged_gdf.head(3)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,index_right,num_land_use_coord,num_land_use_sq,land_use_coord,land_use_sq
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,519790.0,3.0,3.0,Agriculture,Agriculture
1,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,507381.0,1.0,3.0,Urban,Agriculture
2,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,538281.0,5.0,3.0,Grassland,Agriculture


In [33]:
merged_gdf['atlas_code'].fillna(0, inplace=True)

In [34]:
merged_gdf.drop(columns=['Code_18', 'Code_18_sq'], axis=1, inplace=True)

In [42]:
df_path = 'D:\Simon\Documents\GP\data\datasets\selected_species_50km_luse.csv'
merged_gdf.to_csv(df_path)