In [1]:
%load_ext watermark
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.colors
from matplotlib.colors import LinearSegmentedColormap, ListedColormap
import seaborn as sns

(landusereporter)=
# Land use class

The land use class groups the survey results according to the land use rate at each survey location. It provides the correlation matrix of the feature variables, as well as the number of samples and the average result for each land use type and magnitude.

## Why is this important?

__Because it is another way to get proxies for usage and population.__

We assume there is a relationship between how the land is used and what we find on the ground. Archaeologists and anthropologists make this basic assumption every time they undertake an excavation and interpret the results in the context of other findings. This interpretation of beach litter data does exactly the same. As discussed in [Near or far](https://www.hammerdirt.ch) and the federal report [IQAASL](https://www.hammerdirt.ch), at the national level there is strong evidence to support a correlation between the density of objects found and specific topographic features that can be isolated on a standard topographical map.

### What is important?

__The relationship between the topographical features and the density of the objects found.__

However, the measured features are not independent of each other. For example, if there are buildings in an area, we also expect to find a road that leads to those buildings. This multicollinearity can lead to unstable coefficient estimates and make it difficult to interpret the individual effects of the correlated variables on the target variable.
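One common way to quantify this kind of dependence is the variance inflation factor (VIF). The sketch below is illustrative only: the data is synthetic, the column names `buildings`, `roads` and `forest` are placeholders, and VIF is not necessarily what `LandUseReport` computes. It uses the fact that the VIF of each feature is the corresponding diagonal element of the inverse correlation matrix.

```python
import numpy as np
import pandas as pd

# Synthetic land use rates (illustrative only, not survey data)
rng = np.random.default_rng(0)
n = 200
buildings = rng.uniform(0, 1, n)
# roads follow buildings closely, so the two features carry similar information
roads = 0.9 * buildings + rng.normal(0, 0.05, n)
# forest is generated independently of the other two
forest = rng.uniform(0, 1, n)

demo = pd.DataFrame({"buildings": buildings, "roads": roads, "forest": forest})
corr = demo.corr()

# The VIF of each feature is the matching diagonal element
# of the inverse of the correlation matrix
vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=corr.columns)
print(vif.round(2))
```

A VIF close to 1 suggests a feature is nearly independent of the others, while large values flag features whose individual effects would be hard to separate in a regression.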

The topographical data from the Confederation provides continuity to what could be interpreted as unrelated observations. Furthermore, the labels provided for the various topographical features are indicators of use and have a real meaning to geographers and engineers in planning and development. Local associations that are involved in preventing and reducing litter may also be interested.

## Make a land use object

After the topographical features are extracted, the results are applied to the data. The land use class is available by calling `geospatial.LandUseReport(target_df, features)`. The `target_df` and `features` variables are generated from the `SurveyReport`.

__Instantiate a `LandUseReport`__

```python
# start a survey report
import session_config
import reports
import geospatial

# available data
surveys = session_config.collect_survey_data()

# boundaries / search parameters
feature_type = 'canton'
feature_name = 'Vaud'

df = surveys[surveys[feature_type] == feature_name].copy()
vaud_report = reports.SurveyReport(dfc=df)

# the parameters for the land use report
target_df = vaud_report.sample_results
features = vaud_report.sampling_conditions()
land_use_report = geospatial.LandUseReport(target_df, features)
```

### Identify covariates

Before categorizing the land use variables, the subcategories must be combined and the covariates of the feature variables identified.
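As a rough illustration of how correlated pairs could be identified, the sketch below flags every pair of feature columns whose absolute Pearson correlation meets a threshold. The helper `find_correlated_pairs`, the 0.5 threshold, and the example rates are assumptions made for this example; they are not taken from the `geospatial` module, whose `correlated_pairs` method may use a different statistic or cutoff.

```python
import itertools
import pandas as pd

def find_correlated_pairs(rates, threshold=0.5):
    """Return the column pairs whose absolute correlation is >= threshold.

    Hypothetical helper: the threshold and the use of Pearson
    correlation are assumptions for illustration.
    """
    corr = rates.corr()
    pairs = []
    for a, b in itertools.combinations(corr.columns, 2):
        if abs(corr.loc[a, b]) >= threshold:
            pairs.append((a, b))
    return pairs

# Made-up land use rates for five locations
rates = pd.DataFrame({
    "buildings": [0.1, 0.4, 0.6, 0.8, 0.9],
    "streets":   [0.2, 0.5, 0.6, 0.9, 1.0],
    "forest":    [0.5, 0.1, 0.9, 0.2, 0.6],
})

pairs = find_correlated_pairs(rates)
print(pairs)
```

In this toy data only buildings and streets move together, so they are the only pair returned.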

### Categorize land use data

The land use rate, 0 <= x <= 1, is categorized into five groups, 1 to 5. Each location has a land use profile of at least 7 topographic features, each rated from 1 to 5.

```python

# creates an array of tuples of the correlated pairs
correlated_pairs = land_use_report.correlated_pairs()

# pass the correlated pairs to combine features method
# this will categorize the features and combine the correlated pairs
# into new columns
land_use_report.combine_features(correlated_pairs)
```
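The categorization of a rate into groups 1 to 5 can be sketched with `pandas.cut`. This is a minimal illustration that assumes five equal-width bins on the 0 to 1 scale. The report contents refer to quintiles, so the bin edges used by `combine_features` may differ. The sample rates below are made up.

```python
import pandas as pd

# Made-up land use rates for one feature, on the 0 to 1 scale
rates = pd.Series([0.0, 0.15, 0.2, 0.45, 0.81, 1.0], name="buildings")

# Assumed equal-width bin edges; the real report may use other edges
bins = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

# Intervals are closed on the right; include_lowest keeps a rate of 0 in group 1
categories = pd.cut(rates, bins=bins, labels=[1, 2, 3, 4, 5], include_lowest=True)
print(categories.astype(int).tolist())
```

With right-closed intervals, a rate that falls exactly on an edge such as 0.2 is placed in the lower group.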

### Report contents

1. Number of samples per feature and magnitude
2. The total number of objects collected per feature and magnitude
3. The number of locations per feature and magnitude
4. The average pieces per meter for each feature and magnitude
5. The correlation matrix of the feature variables
6. The correlated pairs
7. The land use on a continuous scale
8. The land use on a categorical scale: quintiles
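The first four items are per-feature, per-magnitude summaries. The sketch below shows one way such tables could be produced with `melt` and `groupby`. The column names `sample_id`, `quantity` and `pcs_m` and all values are hypothetical, and this is not the implementation used by `LandUseReport`.

```python
import pandas as pd

# Made-up samples with categorized land use (magnitudes 1 to 5)
samples = pd.DataFrame({
    "sample_id": [1, 2, 3, 4],
    "location": ["a", "a", "b", "c"],
    "quantity": [10, 30, 5, 15],
    "pcs_m": [1.0, 3.0, 0.5, 1.5],
    "buildings": [1, 1, 3, 3],
    "forest": [2, 2, 2, 5],
})

# One row per sample and feature, with the magnitude as a value
long = samples.melt(
    id_vars=["sample_id", "location", "quantity", "pcs_m"],
    value_vars=["buildings", "forest"],
    var_name="feature",
    value_name="magnitude",
)

# Samples, pieces, locations and average rate per feature and magnitude
summary = long.groupby(["feature", "magnitude"]).agg(
    n_samples=("sample_id", "nunique"),
    n_pieces=("quantity", "sum"),
    n_locations=("location", "nunique"),
    mean_rate=("pcs_m", "mean"),
)
print(summary)
```

Each row of `summary` corresponds to one feature at one magnitude, which matches the layout described in items 1 to 4.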

In [2]:
import session_config
import reports
import geospatial

# available data
surveys = session_config.collect_survey_data()

# boundaries / search parameters
feature_type = 'feature_name'
feature_name = 'lac-leman'

df = surveys[surveys[feature_type] == feature_name].copy()
vaud_report = reports.SurveyReport(dfc=df)

# the parameters for the landuse report
target_df = vaud_report.sample_results
features = geospatial.collect_topo_data(locations=target_df.location.unique())
land_use_report = geospatial.LandUseReport(target_df, features)

# creates an array of tuples of the correlated pairs
correlated_pairs = land_use_report.correlated_pairs()

# pass the correlated pairs to combine features method
# this will categorize the features and combine the correlated pairs
# into new columns
land_use_report.combine_features(correlated_pairs)

### Number of samples per feature

In [3]:
samples_per_feature = land_use_report.n_samples_per_feature()
samples_per_feature[session_config.feature_variables]

### Quantity per feature

In [4]:
q_pf = land_use_report.n_pieces_per_feature()
q_pf[session_config.feature_variables]

### Locations per feature

In [5]:
l_pf = land_use_report.locations_per_feature()
l_pf[session_config.feature_variables]

### Density per feature

In [6]:
r_pf = land_use_report.rate_per_feature().T
r_pf[session_config.feature_variables]

### Correlation matrix

In [7]:
land_use_report.correlation_matrix()

### Correlated pairs

Unclassified land use correlates with vineyards, forests and orchards. Note that vineyards and forests are also correlated with each other.

In [8]:
print(f'Correlated pairs:\n{correlated_pairs}')

### Continuous land use

In [9]:
continuous = land_use_report.df_cont.copy()
examps = continuous[continuous.location.isin(['veveyse', 'la-pecherie'])].drop_duplicates('location')
examps[['location', *session_config.feature_variables]].fillna(0).set_index('location')

### Categorical

In [10]:
cat = land_use_report.df_cat.copy()
examps = cat[cat.location.isin(['veveyse', 'la-pecherie'])].drop_duplicates('location')
examps[['location', *session_config.feature_variables]].set_index('location')

In [11]:
%watermark -a hammerdirt-analyst -co --iversions