In [1]:
%load_ext watermark
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.colors
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
import seaborn as sns

(landusereporter)=
# Land use class

The land use class groups the survey results according to the rate of land use at each survey location. It provides the correlation matrix of the feature variables, along with the number of samples and the average result for each land use type and magnitude.

## Why is this important?

__Because it is another way to get proxies for usage and population.__

We assume there is a relationship between how the land is used and what we find on the ground. Archaeologists and anthropologists make this basic assumption every time they undertake an excavation and interpret the results in the context of other findings. This interpretation of beach litter data does exactly the same. As discussed in [Near or far](https://www.hammerdirt.ch) and the federal report [IQAASL](https://www.hammerdirt.ch), there is strong evidence at the national level of a correlation between the density of objects found and specific topographic features that can be isolated on a standard topographical map.

### What is important?

__The relationship between the topographical features and the density of the objects found.__

However, the measured features are not independent of each other. For example, if there are buildings in an area, we expect to also find a road that leads to those buildings. This multicollinearity can lead to unstable coefficient estimates and make it challenging to interpret the individual effects of the correlated variables on the target variable.
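
One common way to check how strongly the features overlap is the variance inflation factor (VIF). It is not part of the package described here. The sketch below uses synthetic data in which `streets` is built to follow `buildings` closely and `forest` is drawn independently; the column names, sample size and noise level are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd


def variance_inflation_factors(features: pd.DataFrame) -> pd.Series:
    """Regress each column on the others and return 1 / (1 - R^2)."""
    vifs = {}
    for col in features.columns:
        y = features[col].to_numpy()
        others = features.drop(columns=col).to_numpy()
        # add an intercept so the residuals are centered on zero
        X = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals = y - X @ coef
        r_squared = 1 - residuals.var() / y.var()
        vifs[col] = 1 / (1 - r_squared)
    return pd.Series(vifs, name="vif")


# synthetic land use rates (illustrative, not survey data)
rng = np.random.default_rng(0)
n = 500
buildings = rng.uniform(0, 1, n)
toy = pd.DataFrame({
    "buildings": buildings,
    # streets follows buildings closely, as a road leads to buildings
    "streets": 0.9 * buildings + rng.normal(0, 0.05, n),
    # forest is drawn independently of the other two
    "forest": rng.uniform(0, 1, n),
})

vif = variance_inflation_factors(toy)
print(vif.round(2))
```

A VIF near 1 means a feature is not explained by the others. Values well above that indicate that coefficient estimates for the feature will be unstable.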

The topographical data from the confederation provides continuity to what could be interpreted as unrelated observations. Furthermore, the labels provided for the various topographical features are indicators of use and have a real meaning to geographers and engineers in planning and development. Local associations that are involved in preventing and reducing litter may also be interested.

## Make a land use object

After the topographical features are extracted, the results are applied to the data. The land use class is available by calling `geospatial.LandUseReport(df_target, features)`. The `df_target` and `features` variables are generated in the `SurveyReport`.

__Instantiate a `LandUseReport`__

```python
# start a survey report
import session_config
import reports
import geospatial

# available data
surveys = session_config.collect_survey_data()

# boundaries / search parameters
feature_type = 'canton'
feature_name = 'Vaud'

df = surveys[surveys[feature_type] == feature_name].copy()
vaud_report = reports.SurveyReport(dfc=df)

# the parameters for the land use report
target_df = vaud_report.sample_results
features = vaud_report.sampling_conditions()
land_use_report = geospatial.LandUseReport(target_df, features)
```

### Report contents

1. Number of samples per feature and magnitude
2. The total number of objects collected per feature and magnitude
3. The number of locations per feature and magnitude
4. The average pieces per meter for each feature and magnitude
5. The correlation matrix of the feature variables
6. The correlated pairs
7. The landuse on a continuous scale
8. The landuse on a categorical scale
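
Items 1 to 4 are grouped summaries of the survey results by the magnitude of one feature. A minimal sketch of that idea with `pandas` follows. The rows are made up for illustration, and the column names (`sample_id`, `location`, `quantity`, `pcs_m`) are assumptions, not the package's confirmed schema.

```python
import pandas as pd

# made-up survey rows; "buildings" holds the magnitude (1 to 5) at each location
samples = pd.DataFrame({
    "sample_id": ["a-1", "a-2", "b-1", "c-1"],
    "location": ["a", "a", "b", "c"],
    "quantity": [120, 80, 40, 10],
    "pcs_m": [2.4, 1.6, 0.8, 0.2],
    "buildings": [5, 5, 1, 1],
})

# one row per magnitude of the chosen feature
summary = samples.groupby("buildings").agg(
    n_samples=("sample_id", "nunique"),     # item 1
    n_pieces=("quantity", "sum"),           # item 2
    n_locations=("location", "nunique"),    # item 3
    average_pcs_m=("pcs_m", "mean"),        # item 4
)
print(summary)
```

Repeating the grouping for each feature and joining the results side by side would give tables shaped like the ones in this section.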

In [2]:
import session_config
import reports
import geospatial

# available data
surveys = session_config.collect_survey_data()

# boundaries / search parameters
feature_type = 'feature_name'
feature_name = 'lac-leman'

df = surveys[surveys[feature_type] == feature_name].copy()
vaud_report = reports.SurveyReport(dfc=df)

# the parameters for the landuse report
target_df = vaud_report.sample_results
features = geospatial.collect_topo_data(locations=target_df.location.unique())
land_use_report = geospatial.LandUseReport(target_df, features)

# creates a list of tuples of the correlated pairs
correlated_pairs = land_use_report.correlated_pairs()

# pass the correlated pairs to combine features method
# this will categorize the features and combine the correlated pairs
# into new columns
land_use_report.combine_features(correlated_pairs)

### Number of samples per feature

In [3]:
samples_per_feature = land_use_report.n_samples_per_feature()
samples_per_feature[session_config.feature_variables]

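| magnitude | orchards | vineyards | buildings | forest | undefined | public services | streets |
|---|---|---|---|---|---|---|---|
| 1 | 251 | 248 | 25 | 216 | 208 | 200 | 46 |
| 2 | 0 | 1 | 17 | 22 | 18 | 51 | 160 |
| 3 | 0 | 2 | 1 | 13 | 25 | 0 | 45 |
| 4 | 0 | 0 | 50 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 158 | 0 | 0 | 0 | 0 |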


### Quantity per feature

In [4]:
q_pf = land_use_report.n_pieces_per_feature()
q_pf[session_config.feature_variables]

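| magnitude | orchards | vineyards | buildings | forest | undefined | public services | streets |
|---|---|---|---|---|---|---|---|
| 1 | 80016 | 79749 | 2115 | 69630 | 68202 | 62279 | 14442 |
| 2 | 0 | 146 | 9513 | 1484 | 9699 | 17737 | 50897 |
| 3 | 0 | 121 | 186 | 8902 | 2115 | 0 | 14677 |
| 4 | 0 | 0 | 13315 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 54887 | 0 | 0 | 0 | 0 |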


### Locations per feature

In [5]:
l_pf = land_use_report.locations_per_feature()
l_pf[session_config.feature_variables]

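| magnitude | orchards | vineyards | buildings | forest | undefined | public services | streets |
|---|---|---|---|---|---|---|---|
| 1 | 38 | 36 | 3 | 35 | 31 | 29 | 7 |
| 2 | 0 | 1 | 3 | 2 | 4 | 9 | 22 |
| 3 | 0 | 1 | 1 | 1 | 3 | 0 | 9 |
| 4 | 0 | 0 | 5 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 26 | 0 | 0 | 0 | 0 |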


### Density per feature

In [6]:
r_pf = land_use_report.rate_per_feature().T
r_pf[session_config.feature_variables]

| magnitude | orchards | vineyards | buildings | forest | undefined | public services | streets |
|---|---|---|---|---|---|---|---|
| 1 | 8.921753 | 8.949919 | 4.6128 | 8.507593 | 8.632308 | 7.8458 | 11.502174 |
| 2 | 0.0 | 9.75 | 19.01 | 4.256364 | 18.251111 | 13.141176 | 7.273375 |
| 3 | 0.0 | 5.015 | 5.35 | 23.698462 | 4.6128 | 0.0 | 12.144889 |
| 4 | 0.0 | 0.0 | 5.5604 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 9.60443 | 0.0 | 0.0 | 0.0 | 0.0 |


### Correlation matrix

In [7]:
land_use_report.correlation_matrix()

| | orchards | vineyards | buildings | forest | undefined | public services | streets |
|---|---|---|---|---|---|---|---|
| orchards | 1.0 | 0.215632 | -0.232329 | -0.044311 | 0.205992 | -0.195685 | -0.087427 |
| vineyards | 0.215632 | 1.0 | -0.10063 | -0.190298 | -0.057757 | -0.215319 | -0.155203 |
| buildings | -0.232329 | -0.10063 | 1.0 | -0.872981 | -0.946109 | 0.480962 | 0.532584 |
| forest | -0.044311 | -0.190298 | -0.872981 | 1.0 | 0.767505 | -0.367753 | -0.538793 |
| undefined | 0.205992 | -0.057757 | -0.946109 | 0.767505 | 1.0 | -0.425264 | -0.412409 |
| public services | -0.195685 | -0.215319 | 0.480962 | -0.367753 | -0.425264 | 1.0 | 0.542128 |
| streets | -0.087427 | -0.155203 | 0.532584 | -0.538793 | -0.412409 | 0.542128 | 1.0 |
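
A symmetric matrix is easier to read as a ranked list of pairs. The sketch below copies four of the features from the matrix above, keeps the upper triangle so each pair appears once, and sorts by absolute correlation. It is a reading aid only: the 0.7 cutoff is an arbitrary assumption, and this is not the selection rule used by `correlated_pairs()`.

```python
import numpy as np
import pandas as pd

# values copied from the correlation matrix above (subset of features)
labels = ["buildings", "forest", "undefined", "streets"]
corr = pd.DataFrame(
    [
        [1.0, -0.872981, -0.946109, 0.532584],
        [-0.872981, 1.0, 0.767505, -0.538793],
        [-0.946109, 0.767505, 1.0, -0.412409],
        [0.532584, -0.538793, -0.412409, 1.0],
    ],
    index=labels,
    columns=labels,
)

# keep only the cells above the diagonal so each pair is listed once
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

# strongest relationships first, regardless of sign
ranked = pairs.sort_values(key=np.abs, ascending=False)

# illustrative cutoff, not a value taken from the package
strong = ranked[ranked.abs() >= 0.7]
print(strong)
```

The sign matters when interpreting the result: `buildings` and `undefined` move in opposite directions, while `forest` and `undefined` move together.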


### Correlated pairs

The `correlated_pairs` method identifies the land use features that are correlated with each other. It returns a list of tuples; each tuple holds the two correlated features and the name of the method that could be used to combine them.

In [8]:
print(f'Correlated pairs:\n{correlated_pairs}')

Correlated pairs:
[('buildings', 'public services', 'rate'), ('forest', 'undefined', 'sum')]


### Continuous land use

In [9]:
continuous = land_use_report.df_cont.copy()
examps = continuous[continuous.location.isin(['veveyse', 'la-pecherie'])].drop_duplicates('location')
examps[['location', *session_config.feature_variables]].fillna(0).set_index('location')

| location | orchards | vineyards | buildings | forest | undefined | public services | streets |
|---|---|---|---|---|---|---|---|
| la-pecherie | 0.171 | 0.162 | 0.106 | 0.096 | 0.464 | 0.010449 | 0.166033 |
| veveyse | 0.0 | 0.027 | 0.958 | 0.0 | 0.015 | 0.047393 | 0.344649 |


### Categorical land use

In [10]:
cat = land_use_report.df_cat.copy()
examps = cat[cat.location.isin(['veveyse', 'la-pecherie'])].drop_duplicates('location')
examps[['location', *session_config.feature_variables]].set_index('location')

| location | orchards | vineyards | buildings | forest | undefined | public services | streets |
|---|---|---|---|---|---|---|---|
| la-pecherie | 1 | 1 | 1 | 1 | 3 | 1 | 1 |
| veveyse | 1 | 1 | 5 | 1 | 1 | 1 | 2 |
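
The two example locations are consistent with cutting the continuous scale into five equal-width bins. The sketch below reproduces their categorical values from their continuous values with `pandas.cut`. The bin edges (0.2, 0.4, 0.6, 0.8) are an assumption inferred from these two rows only; the package may use different edges or a different rule.

```python
import pandas as pd

columns = ["orchards", "vineyards", "buildings", "forest",
           "undefined", "public services", "streets"]

# continuous values copied from the two example locations
continuous_example = pd.DataFrame(
    [
        [0.171, 0.162, 0.106, 0.096, 0.464, 0.010449, 0.166033],
        [0.0, 0.027, 0.958, 0.0, 0.015, 0.047393, 0.344649],
    ],
    index=pd.Index(["la-pecherie", "veveyse"], name="location"),
    columns=columns,
)

# assumed edges: five equal-width bins on the interval [0, 1]
bin_edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
magnitudes = [1, 2, 3, 4, 5]


def to_magnitude(column: pd.Series) -> pd.Series:
    """Map a column of rates in [0, 1] to an integer magnitude from 1 to 5."""
    # include_lowest=True places a rate of exactly 0 in the first bin
    binned = pd.cut(column, bins=bin_edges, labels=magnitudes, include_lowest=True)
    return binned.astype(int)


categorical_example = continuous_example.apply(to_magnitude)
print(categorical_example)
```

With these edges a rate of exactly 0 falls in magnitude 1, which matches the tables above, where magnitude 1 holds most of the samples for features such as orchards and vineyards.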


In [11]:
%watermark -a hammerdirt-analyst -co --iversions

Author: hammerdirt-analyst

conda environment: cantonal_report

pandas    : 2.0.3
seaborn   : 0.12.2
numpy     : 1.25.2
matplotlib: 3.7.1

