# Generate Land Use
## Objective
ToDo: explain why we created this notebook and why land use is relevant.

## Scope

In this notebook we take a dataframe of bird sightings and join it with the Corine Land Cover (CLC) dataset to obtain the land use at the location of each sighting. The functions here can return the land use at the coordinate itself, the most common land use in a square around it, or the share of each land use within that square.

We use CLC because our first attempt, with the LUCAS dataset, gave much less detailed results. LUCAS describes land use at individual points on the map, whereas CLC covers the whole map with polygons.

To get the land use with LUCAS, the point nearest to the coordinate has to be selected. With CLC the lookup is much more precise: each coordinate in the bird dataframe is tested against the polygons, and the land use of the polygon that contains it is used.
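The point-in-polygon lookup described above can be illustrated with a minimal sketch that uses shapely alone. The two polygons are made-up toy geometry and `land_use_at` is a hypothetical helper, not a function from this notebook; the notebook itself performs the same test for all sightings at once with a geopandas spatial join.

```python
from shapely.geometry import Point, box

# Toy stand-ins for CLC polygons (made-up geometry, CLC codes used as labels)
toy_polygons = [
    (231, box(0.0, 0.0, 1.0, 1.0)),  # 231 = pastures
    (512, box(1.0, 0.0, 2.0, 1.0)),  # 512 = water bodies
]

def land_use_at(lon, lat, polygons):
    """Return the code of the first polygon that contains the point, or None."""
    point = Point(lon, lat)
    for code, polygon in polygons:
        if point.within(polygon):
            return code
    return None

print(land_use_at(0.5, 0.5, toy_polygons))  # point inside the first polygon
print(land_use_at(1.5, 0.5, toy_polygons))  # point inside the second polygon
print(land_use_at(5.0, 5.0, toy_polygons))  # point outside both polygons
```

A point that lies exactly on a polygon boundary is not "within" it, so such sightings would come back without a land use.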

To get the most common land use within a square, we create a square around each coordinate, collect the land uses of all polygons that intersect it, and return the one that occurs most often.

Probably the best solution is to compute the percentage of each land use within each square, by checking how much of each polygon lies inside the square. This is very compute-heavy, which is why there might be a better option:

That option uses another aspect of the CLC data. CLC also provides points all over the map that carry a land use. A single point can be inaccurate for the land use at one exact coordinate, but the points that fall inside a square can be counted to estimate how much of each land use the square contains.
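The point-counting idea can be sketched as follows. This is only an illustration on made-up points, and `land_use_shares` is a hypothetical helper; it is not the code used later in this notebook, which works with polygon areas.

```python
from collections import Counter
from shapely.geometry import Point, box

# Toy labelled points (made-up positions, CLC codes used as labels)
toy_points = [
    (Point(0.1, 0.1), 231),
    (Point(0.2, 0.8), 231),
    (Point(0.9, 0.9), 512),
    (Point(3.0, 3.0), 311),  # lies outside the square below
]
toy_square = box(0.0, 0.0, 1.0, 1.0)

def land_use_shares(square, labelled_points):
    """Estimate land use shares from the labelled points inside the square."""
    counts = Counter(code for point, code in labelled_points if square.contains(point))
    total = sum(counts.values())
    if total == 0:
        return {}  # no points inside the square, nothing to estimate
    return {code: n / total for code, n in counts.items()}

toy_shares = land_use_shares(toy_square, toy_points)
print(toy_shares)
```

The estimate is only as good as the point density: a square with few points inside gives coarse shares, and a square with none gives no answer at all.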

In [3]:
import geopandas as gpd
from shapely.geometry import Point, box
import pandas as pd
import math

In [4]:
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

In [4]:
df_path = r'D:\Simon\Documents\GP\data\datasets\selected_bird_species_with_grids_50km.csv'
df = pd.read_csv(df_path, index_col=0)


In [5]:
df.head(2)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300
1,29654244,397.0,Schwarzkehlchen,2018-01-01,,53.127639,8.957263,square,0.760781,2.0,0,37803.0,de,50kmE4250N3300


In [6]:
clc_path = r'D:\Simon\Documents\GP\data\util_files\CLC2018\U2018_CLC2018_V2020_20u1.shp'
clc = gpd.read_file(clc_path)


In [7]:
clc.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [8]:
# Missing codes become 999 (unclassified); then store the codes as integers
clc['Code_18'] = clc['Code_18'].fillna(999).astype(int)
# Keep only the land use code and the geometry
clc.drop(columns=['OBJECTID', 'Remark', 'Area_Ha', 'ID', 'Shape_Leng', 'Shape_Area'], inplace=True)

In [9]:
clc.head(2)

Unnamed: 0,Code_18,geometry
0,111,"POLYGON ((4.65404 43.80421, 4.65492 43.80702, ..."
1,111,"POLYGON ((4.64857 43.80864, 4.64914 43.80790, ..."


### Create squares around each coordinate
The squares are used to determine both the most common land use and the percentage of each land use.

In [7]:
# Convert 1 km to degrees of latitude (1 degree is roughly 111 km)
km_to_degrees = 1 / 111.0

# Side lengths of the box in degrees. A degree of longitude is shorter than a
# degree of latitude, so the east-west side is stretched by a fixed factor
# to keep the box roughly square on the ground.
square_size_lat = 1 * km_to_degrees
square_size_lon = square_size_lat * 1.6667

# Create a GeoDataFrame with one point per sighting
geometry = [Point(lon, lat) for lon, lat in zip(df['coord_lon'], df['coord_lat'])]
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

# Create a box around each point (x is longitude, y is latitude)
squares_gdf = gdf.copy()
squares_gdf['geometry'] = gdf['geometry'].apply(lambda point: box(
    point.x - square_size_lon / 2,
    point.y - square_size_lat / 2,
    point.x + square_size_lon / 2,
    point.y + square_size_lat / 2
))
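The fixed factor of 1.6667 appears to correspond to 1 / cos(latitude) at roughly 53° north, so the boxes are only approximately square and become wider than intended further south. A minimal sketch of a latitude-dependent size, assuming the same 111 km per degree approximation, could look like this; `square_size_degrees` is a hypothetical helper and is not used elsewhere in this notebook.

```python
import math

KM_PER_DEGREE_LAT = 111.0  # same approximation as in the cell above

def square_size_degrees(lat_deg, side_km=1.0):
    """Return (lon_size, lat_size) in degrees for a square of side_km at lat_deg."""
    lat_size = side_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    lon_size = lat_size / math.cos(math.radians(lat_deg))
    return lon_size, lat_size

for lat in (47.0, 53.0):
    lon_size, lat_size = square_size_degrees(lat)
    print(lat, round(lon_size / lat_size, 4))
```

With per-row sizes like these the boxes no longer all have the same area in degrees, so any later step that assumes one common box area would need to be adapted.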

In [8]:
squares_gdf.head(1)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,geometry
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,"POLYGON ((8.68450 53.15326, 8.68450 53.16226, ..."


In [93]:
squares_gdf.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

## Get land use percentage v2

In [48]:
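# execution time: 215m
# Note: EPSG:4326 is a geographic CRS (degrees), not a projected one, so
# to_crs changes nothing here and the areas below are in square degrees.
# Only area ratios are used, which keeps the error small for 1 km squares.
crs_projected = "EPSG:4326"

# Transform the data to the target CRS
squares_gdf_projected = squares_gdf.to_crs(crs_projected)
clc_projected = clc.to_crs(crs_projected)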

# Perform a spatial join between squares_gdf and clc
joined = gpd.sjoin(squares_gdf_projected, clc_projected, how="inner", predicate='intersects')

# Calculate the area of intersection 
joined['intersection_area'] = joined.apply(lambda x: x['geometry'].intersection(clc_projected.loc[x['index_right'], 'geometry']).area, axis=1)

# Group by the index of squares_gdf and Code_18, and sum the intersection areas 
grouped = joined.groupby([joined.index, 'Code_18'])['intersection_area'].sum()

# Calculate the square area - all squares have the same size so we can just take the area of the first square
total_area = squares_gdf_projected.geometry.area.loc[grouped.index.get_level_values(0)].iloc[0]

# Calculate the percentage of each land use type within each square 
percentage_land_use = grouped / total_area

# Unstack the grouped dataframe to get a dataframe where each row corresponds to a square and each column corresponds to a land use type
land_use_df = percentage_land_use.unstack(fill_value=0)

# Join the land use dataframe with squares_gdf 
result = pd.concat([squares_gdf, land_use_df], axis=1)
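The area-share calculation in the cell above can be reduced to a small sketch on toy rectangles with shapely alone. The geometry is made up and `area_shares` is a hypothetical helper. The coordinates are treated as planar units; for real data an equal-area projection such as EPSG:3035 (ETRS89-LAEA Europe) would give more reliable areas than degrees.

```python
from collections import defaultdict
from shapely.geometry import box

# Toy square and toy land use polygons in planar units (made-up geometry)
toy_square = box(0, 0, 10, 10)
toy_polygons = [
    (231, box(-5, 0, 4, 10)),    # covers the left 40% of the square
    (512, box(4, 0, 20, 10)),    # covers the right 60% of the square
    (311, box(50, 50, 60, 60)),  # does not touch the square
]

def area_shares(square, polygons):
    """Return {code: fraction of the square's area covered by that code}."""
    shares = defaultdict(float)
    for code, polygon in polygons:
        if polygon.intersects(square):
            # Clip the polygon to the square and relate the clipped area to the square
            shares[code] += polygon.intersection(square).area / square.area
    return dict(shares)

toy_area_shares = area_shares(toy_square, toy_polygons)
print(toy_area_shares)
```

Summing per code matters because several polygons with the same code can intersect one square.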



In [86]:
result.head(1)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,geometry,111,112,121,122,123,124,131,132,133,141,142,211,221,222,231,242,243,311,312,313,321,322,324,331,332,333,334,335,411,412,421,423,511,512,521,522,523
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,"POLYGON ((8.68450 53.15326, 8.68450 53.16226, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012687,0.17861,0.0,0.0,0.0,0.39956,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021766,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.28978,0.0,0.097597,0.0


In [95]:
category_dict = {
    111: 'urban_area_percent',
    112: 'urban_area_percent',
    121: 'urban_area_percent',
    122: 'urban_area_percent',
    123: 'urban_area_percent',
    124: 'urban_area_percent',
    131: 'industrial_area_percent',
    132: 'industrial_area_percent',
    133: 'industrial_area_percent',
    141: 'urban_area_percent',
    142: 'urban_area_percent',
    211: 'agriculture_area_percent',
    212: 'agriculture_area_percent',
    213: 'agriculture_area_percent',
    221: 'agriculture_area_percent',
    222: 'agriculture_area_percent',
    223: 'agriculture_area_percent',
    231: 'agriculture_area_percent',
    241: 'agriculture_area_percent',
    242: 'agriculture_area_percent',
    243: 'agriculture_area_percent',
    244: 'agriculture_area_percent',
    311: 'forest_area_percent',
    312: 'forest_area_percent',
    313: 'forest_area_percent',
    321: 'grassland_area_percent',
    322: 'grassland_area_percent',
    323: 'mediterranean_vegetation_area_percent',
    324: 'shrubland_area_percent',
    331: 'coastal_area_percent',
    332: 'rocky_area_percent',
    333: 'sparsley_vegetated_area_percent',
    334: 'burnt_area_percent',
    335: 'glacier_area_percent',
    411: 'wetlands_area_percent',
    412: 'wetlands_area_percent',
    421: 'wetlands_area_percent',
    422: 'wetlands_area_percent',
    423: 'wetlands_area_percent',
    511: 'water_area_percent',
    512: 'water_area_percent',
    521: 'water_area_percent',
    522: 'water_area_percent',
    523: 'water_area_percent',
    990: 'unclassified_land_area_percent',
    995: 'unclassified_water_area_percent',
    999: 'unclassified_area_percent'
}

category_df = result.copy()
for code, category in category_dict.items():
    if code in result.columns:
        if category not in category_df.columns:
            category_df[category] = result[code]
        else:
            category_df[category] += result[code]


columns_to_drop = [col for col in category_dict.keys() if col in category_df.columns]
category_df.drop(columns=columns_to_drop, inplace=True)
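The loop above sums all code columns that belong to the same category. The same aggregation can be sketched more compactly on toy data by renaming the code columns to their categories and summing columns that share a name. The values and the reduced mapping below are made up for illustration.

```python
import pandas as pd

# Toy per-square shares, one column per CLC code (made-up values)
toy_shares = pd.DataFrame({
    111: [0.2, 0.0],
    112: [0.3, 0.0],
    231: [0.5, 0.4],
    512: [0.0, 0.6],
})

# Reduced version of the code-to-category mapping
toy_categories = {
    111: 'urban_area_percent',
    112: 'urban_area_percent',
    231: 'agriculture_area_percent',
    512: 'water_area_percent',
}

# Rename each code column to its category, then sum columns with the same name.
# Grouping on the transposed frame avoids the deprecated groupby(axis=1).
toy_grouped = toy_shares.rename(columns=toy_categories).T.groupby(level=0).sum().T
print(toy_grouped)
```

Unlike the loop, this only covers the code columns, so the sighting columns would have to be joined back on afterwards.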


In [11]:
category_df.tail(3)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,geometry,urban_area_percent,industrial_area_percent,agriculture_area_percent,forest_area_percent,grassland_area_percent,shrubland_area_percent,coastal_area_percent,rocky_area_percent,sparsley_vegetated_area_percent,burnt_area_percent,glacier_area_percent,wetlands_area_percent,water_area_percent
2660040,15002272,123.0,Bergente,2018-02-17,,47.512154,9.436332,precise,391.37,1.0,0,11245.0,ch,50kmE4250N2700,POLYGON ((9.443839266020303 47.507649232485264...,0.519785,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.480215
2660041,15002282,8.0,Haubentaucher,2018-02-17,,47.512154,9.436332,precise,391.37,,0,11245.0,ch,50kmE4250N2700,POLYGON ((9.443839266020303 47.507649232485264...,0.519785,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.480215
2660042,15002291,8.0,Haubentaucher,2018-02-17,11:11:00,47.121205,7.240961,precise,433.98572,1.0,0,14061.0,ch,50kmE4100N2650,POLYGON ((7.248468666873759 47.116700337277834...,0.837752,0.0,0.060568,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.101679


In [97]:
# write
df_path = r'D:\Simon\Documents\GP\data\datasets\land_use\land_use_percentage.csv'
category_df.to_csv(df_path)

In [10]:
# read
df_path = r'D:\Simon\Documents\GP\data\datasets\land_use\land_use_percentage.csv'
category_df = pd.read_csv(df_path, index_col=0)


## Get land use on coordinate

In [87]:
geometry = [Point(lon, lat) for lon, lat in zip(df['coord_lon'], df['coord_lat'])]
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

merged_gdf = gpd.sjoin(gdf, clc, how="left", predicate="within")

merged_gdf.drop(columns=['geometry', 'index_right'], inplace=True)

In [88]:
# execution time: 132m
merged_squares = gpd.sjoin(squares_gdf, clc, how="left", predicate="intersects")

In [98]:
merged_squares.head(1)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,geometry,index_right,Code_18
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,"POLYGON ((8.68450 53.15326, 8.68450 53.16226, ...",519790.0,231.0


In [99]:
merged_squares.reset_index(drop=True, inplace=True)

In [100]:
# write
df_path = r'D:\Simon\Documents\GP\data\datasets\land_use\land_use_on_coord.csv'
merged_squares.to_csv(df_path)

In [None]:
# read
df_path = r'D:\Simon\Documents\GP\data\datasets\land_use\land_use_on_coord.csv'
merged_squares = pd.read_csv(df_path, index_col=0)

## Get most common land use within square

In [26]:
common_land_use = merged_squares.groupby(merged_squares.index)['Code_18'].agg(lambda x: x.value_counts().idxmax() if not x.empty and not x.value_counts().empty else None)

common_land_use_df = pd.DataFrame({'Code_18_sq': common_land_use.values}, index=common_land_use.index)
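To make the aggregation above easier to follow, here is a minimal sketch on a toy join result, where each row is one (square, intersecting polygon) pair and the index identifies the square. The data and the helper `most_common` are made up for illustration. Note that this counts polygons, not area, so a small polygon weighs as much as a large one.

```python
import pandas as pd

# Toy join result: index = square id, one row per intersecting polygon
toy_join = pd.DataFrame(
    {"Code_18": [231, 141, 231, 211, None]},
    index=[0, 0, 0, 1, 2],  # square 2 intersects no polygon
)

def most_common(codes):
    """Return the most frequent code in the group, or None if there is none."""
    counts = codes.value_counts()  # missing values are ignored by default
    return counts.idxmax() if not counts.empty else None

toy_common = toy_join.groupby(level=0)["Code_18"].agg(most_common)
print(toy_common)
```

When two codes occur equally often, `idxmax` returns whichever of them comes first in the `value_counts` result, so ties are not resolved in any meaningful way.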

In [None]:
common_land_use_df.head(2)

In [None]:
# write
df_path = r'D:\Simon\Documents\GP\data\datasets\land_use\land_use_most_common.csv'
common_land_use_df.to_csv(df_path)

## Get land use percentage 

In [19]:
squares_with_clc = gpd.sjoin(clc, squares_gdf, how='inner', predicate='intersects')

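# Note: x.area is the area of each whole CLC polygon that intersects a square,
# not of the part clipped to the square, so the shares below are only a rough
# estimate (the v2 section above uses the clipped intersection areas instead)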
intersection_areas = squares_with_clc.groupby(['index_right', 'Code_18'])['geometry'].apply(lambda x: x.area)
total_areas = intersection_areas.groupby('index_right').sum()

# Calculate percentages of each Code_18 within each square
results = {}
for square_id, group in intersection_areas.groupby(level=0):
    percentages = {}
    for code_18, area in group.items():
        percentages[code_18[1]] = (area / total_areas[square_id])
    results[square_id] = percentages

# Create a DataFrame from the results dictionary
results_df = pd.DataFrame(results).T.fillna(0)

In [20]:
results_df.head(2)

Unnamed: 0,141,142,231,324,512,522,112,121,211,131,311,312,313,511,132,111,411,521,412,221,124,322,331,421,423,523,321,222,242,243,122,133,123,333,332,335,334
0,0.999438,0.728023,7.835214,0.315655,0.385008,89.736663,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,54.900457,0.0,0.0,0.0,2.212007,2.129309,40.758226,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.006888,0.0,0.018852,0.0,0.0,0.0,99.966571,0.007689,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,7.389199,0.0,11.458981,0.0,40.447079,0.0,0.0,0.0,25.929732,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [23]:
# Reset the index of results_df to make it easier to merge
results_df.reset_index(inplace=True)

# Merge the original DataFrame 'df' with results_df on the common index or a column
merged_df = pd.merge(df, results_df, left_index=True, right_on='index')

In [26]:
merged_df.head(2)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,index,141,142,231,324,512,522,112,121,211,131,311,312,313,511,132,111,411,521,412,221,124,322,331,421,423,523,321,222,242,243,122,133,123,333,332,335,334
0,29666972,8.0,Haubentaucher,2018-01-01,,53.157760,8.676993,place,-1.051010,0.0,0,37718.0,de,50kmE4200N3300,0,0.999438,0.728023,7.835214,0.315655,0.385008,89.736663,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
1,29654244,397.0,Schwarzkehlchen,2018-01-01,,53.127639,8.957263,square,0.760781,2.0,0,37803.0,de,50kmE4250N3300,1,0.000000,0.000000,100.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
2,29654521,463.0,Wiesenpieper,2018-01-01,,50.850941,12.146953,place,270.831300,2.0,0,39627.0,de,50kmE4450N3050,2,0.000000,0.000000,54.900457,0.000000,0.000000,0.000000,2.212007,2.129309,40.758226,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
3,29666414,8.0,Haubentaucher,2018-01-01,,51.076006,11.038316,place,158.941010,8.0,0,38301.0,de,50kmE4350N3100,3,0.000000,0.000000,0.006888,0.000000,0.018852,0.000000,0.000000,0.000000,99.966571,0.007689,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
4,29656211,8.0,Haubentaucher,2018-01-01,,51.389380,7.067282,place,52.362160,10.0,0,108167.0,de,50kmE4100N3100,4,0.000000,0.000000,7.389199,0.000000,11.458981,0.000000,40.447079,0.000000,0.000000,0.000000,25.929732,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2659911,27523548,469.0,Bergpieper,2022-08-01,09:35:04,46.563896,8.551648,precise,2099.035000,4.0,0,11482.0,ch,50kmE4200N2600,2660038,0.000000,0.000000,0.000000,0.000000,0.644112,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.13615,0.0,0.000000,0.0,0.0,0.0,0.0,3.511372,84.708366,0.0,0.0
2659912,27523686,338.0,Mittelspecht,2022-10-08,09:17:16,47.383318,7.666533,precise,802.159700,1.0,0,11482.0,ch,50kmE4100N2650,2660039,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,5.850286,0.0,42.285625,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,8.912231,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
2659913,15002272,123.0,Bergente,2018-02-17,,47.512154,9.436332,precise,391.370000,1.0,0,11245.0,ch,50kmE4250N2700,2660040,0.000000,0.000000,0.000000,0.000000,99.120961,0.000000,0.701437,0.177602,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0
2659914,15002282,8.0,Haubentaucher,2018-02-17,,47.512154,9.436332,precise,391.370000,,0,11245.0,ch,50kmE4250N2700,2660041,0.000000,0.000000,0.000000,0.000000,99.120961,0.000000,0.701437,0.177602,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0


In [25]:
# write
df_path = r'D:\Simon\Documents\GP\data\datasets\land_use\land_use_percentage.csv'
merged_df.to_csv(df_path)

In [None]:
common_land_use_df.head(3)

Unnamed: 0,num_land_use_sq
0,231.0
1,231.0
2,211.0


In [27]:
# Merge the common land use DataFrame with the original DataFrame based on the index
merged_gdf = merged_squares.merge(common_land_use_df, left_index=True, right_index=True)
merged_gdf.drop(columns=['geometry'], inplace=True)

In [28]:
merged_gdf.head(3)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,index_right,Code_18,Code_18_sq
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,519790.0,231.0,231.0
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,507381.0,141.0,231.0
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,538281.0,324.0,231.0


In [30]:
clc_code_to_numerical_label = {
    111: 'urban',
    112: 'urban',
    121: 'urban',
    122: 'urban',
    123: 'urban',
    124: 'urban',
    131: 'industrial',
    132: 'industrial',
    133: 'industrial',
    141: 'urban',
    142: 'urban',
    211: 'agriculture',
    212: 'agriculture',
    213: 'agriculture',
    221: 'agriculture',
    222: 'agriculture',
    223: 'agriculture',
    231: 'agriculture',
    241: 'agriculture',
    242: 'agriculture',
    243: 'agriculture',
    244: 'agriculture',
    311: 'forest',
    312: 'forest',
    313: 'forest',
    321: 'grassland',
    322: 'grassland',
    323: 'mediterranean_vegetation',
    324: 'shrubland',
    331: 'coastal',
    332: 'rocky_area',
    333: 'sparsley_vegetated',
    334: 'burnt_area',
    335: 'glacier',
    411: 'wetlands',
    412: 'wetlands',
    421: 'wetlands',
    422: 'wetlands',
    423: 'wetlands',
    511: 'water',
    512: 'water',
    521: 'water',
    522: 'water',
    523: 'water',
    990: 'unclassified_land',
    995: 'unclassified_water',
    999: 'unclassified'
}
merged_gdf['land_use_coord'] = merged_gdf['Code_18'].map(clc_code_to_numerical_label)
merged_gdf['land_use_sq'] = merged_gdf['Code_18_sq'].map(clc_code_to_numerical_label)

In [41]:
merged_gdf.head(3)

Unnamed: 0,id_sighting,id_species,name_species,date,timing,coord_lat,coord_lon,precision,altitude,total_count,atlas_code,id_observer,country,eea_grid_id,index_right,num_land_use_coord,num_land_use_sq,land_use_coord,land_use_sq
0,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,519790.0,3.0,3.0,Agriculture,Agriculture
1,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,507381.0,1.0,3.0,Urban,Agriculture
2,29666972,8.0,Haubentaucher,2018-01-01,,53.15776,8.676993,place,-1.05101,0.0,0,37718.0,de,50kmE4200N3300,538281.0,5.0,3.0,Grassland,Agriculture


In [33]:
merged_gdf['atlas_code'] = merged_gdf['atlas_code'].fillna(0)

In [34]:
merged_gdf.drop(columns=['Code_18', 'Code_18_sq'], inplace=True)

In [42]:
df_path = r'D:\Simon\Documents\GP\data\datasets\selected_species_50km_luse.csv'
merged_gdf.to_csv(df_path)