In [26]:
import builder
import geopandas as gpd
import pandas as pd

# Demonstration of the thesis code

### Loading the data files

In [2]:
klein_raw = pd.read_csv('data/klein/results.csv')
klein_agg = gpd.read_file('data/klein/agg_results.geojson')

gross_raw = pd.read_csv('data/gross/results.csv')
gross_agg = gpd.read_file('data/gross/agg_results.geojson')

cop_points = gpd.read_file('data/cop/ems.shp')
cop_buildings = gpd.read_file('data/cop/cop_clipped_footprints.shp')

# the buildings that were moved to account for the georeference shift of the
# MS-small project
cop_buildings_moved = gpd.read_file('data/cop/cop_footprints_moved.shp')

gt_agg = gpd.read_file('data/reference/agg_fertig.shp')

### Create Project Instances
Each object takes the raw data, the aggregated data and the Copernicus buildings as input. For MS-small, the reference dataset (gt_agg) is passed as well. The filter_bad_im argument sets a threshold: pictures whose bad imagery share is at or above it are ignored. To deactivate the filter, omit the argument or pass False.
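As a rough illustration of what such a threshold does, here is a minimal sketch. It does not reproduce the code in builder.py: the function name `filter_bad_imagery` and the `bad_imagery_share` key are hypothetical, and the "at or above" rule is taken from the message printed during initialization.

```python
def filter_bad_imagery(tiles, threshold=None):
    """Drop tiles whose bad imagery share is at or above `threshold`.

    `tiles` is a list of dicts with a hypothetical 'bad_imagery_share' key.
    Passing None or False disables the filter.
    """
    if threshold is None or threshold is False:
        return list(tiles)
    return [t for t in tiles if t["bad_imagery_share"] < threshold]


# Made-up example tiles, not taken from the MapSwipe data
tiles = [
    {"tile_id": "a", "bad_imagery_share": 0.0},
    {"tile_id": "b", "bad_imagery_share": 0.5},
    {"tile_id": "c", "bad_imagery_share": 0.8},
]
kept = filter_bad_imagery(tiles, threshold=0.5)
unfiltered = filter_bad_imagery(tiles, threshold=False)
```

With a threshold of 0.5, the tile that sits exactly on the threshold is dropped together with the one above it.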

The Statistics class inherits from the Project class and provides some more methods for user analyses.

Some information is printed while the class is initialized.

Note on the GLAD algorithm: because GLAD takes a while to run, its results are loaded from a file. If new MapSwipe data is added, line 945 in builder.py has to be reactivated.
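A common way to handle such slow steps is a load-or-compute cache. The sketch below is only an illustration of that pattern and not the code in builder.py: the helper `load_or_compute`, the JSON file format and the `compute` callback are all assumptions.

```python
import json
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return cached results if the file exists, otherwise compute and save them.

    `compute` is a hypothetical zero-argument function that stands in for the
    slow step; its result must be JSON-serializable.
    """
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text())
    result = compute()
    # Create the output folder on first use so the write cannot fail on it
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

With this pattern, a missing cache file triggers a recomputation instead of an error, at the cost of one slow first run.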

In [28]:
# Load an object of the MS-big data
Gb = builder.Project(gross_raw, gross_agg, cop_buildings, filter_bad_im=0.5)

# And one of the MS-small
Kbm = builder.Statistics(klein_raw, klein_agg, cop_buildings_moved, gt_agg, filter_bad_im=0.5)

Bad imagery threshold is active (0.5) 
ignoring all pictures with given or higher bad imagery share
Added Individual Answers!
Setted Tiles with Copernicus Polygons!
Added MapSwipe Collection!
Added Copernicus Collections!


FileNotFoundError: [Errno 2] No such file or directory: 'outputs/gross_glad_data.txt'

### Attributes
An object exposes the following attributes.
Not all of them are still in use, so some may be outdated.

In [None]:
list(Kbm.__dict__)

### Methods
The following public methods can be found for an object.

In [None]:
sorted(Kbm.__dir__()[47:72])
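Slicing the attribute list by position depends on the order and number of attributes, so it can break when the class changes. A possible alternative, sketched here with a stand-in class, selects public methods by name. The helper `public_methods` and the class `Demo` are hypothetical and not part of builder.py.

```python
import inspect


def public_methods(obj):
    """Return the sorted names of callable attributes that do not start with '_'."""
    return sorted(
        name
        for name, member in inspect.getmembers(obj, callable)
        if not name.startswith("_")
    )


class Demo:
    """Stand-in class used only for this illustration."""

    def __init__(self):
        self.value = 1

    def get(self):
        return self.value

    def _helper(self):
        return None


names = public_methods(Demo())
```

The same helper could be called on Kbm, assuming its public methods follow the usual convention of not starting with an underscore.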

### Non-plotting examples

In [None]:
# get returns a single tile by its ID
tile = Kbm.get('20-309459-470366')
#list(tile.__dict__)

In [None]:
# export methods save their result under a given filename
# export_idx, for example, takes a list of indices and exports their geometries as a shapefile
Kbm.export_idx([5, 8, 13, 21, 34], 'example.shp')

In [None]:
# get Accuracy, Precision, Sensitivity and F1 for a certain collection or user id
print(Kbm.get_spec_sens('min_45'))

# get the raw counts for the TN, FP, FN, TP values
print(Kbm.get_spec_sens('min_45', counts=True))

# or get the values based on a user_id
print(Kbm.get_spec_sens(user_id='NFlvMjIcKwOrui9olixZwlLUNFv2'))
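For reference, the four measures named above follow directly from the raw counts. The sketch below uses the standard definitions and is not taken from builder.py: the function name `quality_measures` and the choice to return NaN for an empty denominator are assumptions.

```python
import math


def quality_measures(tn, fp, fn, tp):
    """Compute accuracy, precision, sensitivity and F1 from confusion counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is zero
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Made-up counts for illustration
measures = quality_measures(tn=50, fp=10, fn=5, tp=35)
```

Returning NaN keeps the result usable for users with very few tiles, where some denominators are zero.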

### Confusion Matrices
Confusion matrices are built from collections, which are accessed by their keys, e.g. 'min_65'.
There are two ways of getting confusion matrices:
1. for two collections, with the get_confusion_matrix method
2. for multiple collections, with the get_big_confusion_matrix_meine method

However, the get_confusion_matrix method is outdated, so it is better to use the get_big_confusion_matrix_meine method in that case too, passing each collection as a one-element list.

In [None]:
# confusion matrix between the damaged and destroyed copernicus buildings and all tiles with at least 35 % positive answers
Kbm.get_big_confusion_matrix_meine(['min_35'], ['GT'])

In [None]:
# multiple confusion matrices in one Table with lists of collection keys
Kbm.get_big_confusion_matrix_meine(['min_35', 'min_65', 'glad_yes'], ['cop_all', 'cop_damaged_or_destroyed', 'GT'])
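Conceptually, each cell of such a table compares one predicted collection with one reference collection. A minimal sketch of that comparison follows, assuming a collection can be represented as a set of tile IDs. The function `confusion_counts` is hypothetical and does not reproduce the method above.

```python
def confusion_counts(all_tiles, predicted, reference):
    """Count TN, FP, FN and TP for two collections of tile IDs.

    `all_tiles` is the full set of tiles; `predicted` and `reference` are the
    tiles each collection marks as positive.
    """
    all_tiles = set(all_tiles)
    predicted = set(predicted) & all_tiles
    reference = set(reference) & all_tiles
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(all_tiles) - tp - fp - fn
    return {"TN": tn, "FP": fp, "FN": fn, "TP": tp}


# Made-up tile IDs for illustration
counts = confusion_counts(range(10), predicted={0, 1, 2, 3}, reference={2, 3, 4})
```

Here two tiles are in both collections, two only in the predicted one, one only in the reference, and five in neither.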

# Result plotting
Plotting can be done directly in the Project class. Quality assessment will not work for the MS-big project because no ground truth is initialized for it.

### Quality Parameters by Threshold
Figure 11

In [None]:
Kbm.plot_measures()

### ROC for observed TPRs and FPRs
Figure 12

In [None]:
Kbm.plot_roc()
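The points of such a curve can be obtained by sweeping the threshold on the share of positive answers and computing the true and false positive rates each time. The sketch below shows that idea with made-up data. The function `roc_points` is hypothetical and does not reproduce plot_roc.

```python
import math


def roc_points(shares, truth, thresholds):
    """Return (threshold, FPR, TPR) for each threshold.

    A tile counts as predicted positive when its share of positive answers
    is at or above the threshold.
    """
    points = []
    for thr in thresholds:
        tp = fp = fn = tn = 0
        for share, is_positive in zip(shares, truth):
            predicted = share >= thr
            if predicted and is_positive:
                tp += 1
            elif predicted:
                fp += 1
            elif is_positive:
                fn += 1
            else:
                tn += 1
        tpr = tp / (tp + fn) if tp + fn else math.nan
        fpr = fp / (fp + tn) if fp + tn else math.nan
        points.append((thr, fpr, tpr))
    return points


# Made-up shares of positive answers and reference labels
shares = [0.9, 0.7, 0.4, 0.2, 0.1]
truth = [True, True, False, True, False]
points = roc_points(shares, truth, thresholds=[0.35, 0.65])
```

Raising the threshold from 0.35 to 0.65 removes the single false positive in this toy data while leaving the true positive rate unchanged.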

### The cumulative tiles per user distribution
Figure 14

In [None]:
Kbm.cum_user_plot()
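One way to derive such a distribution is to count the tiles per user, sort the users by activity and accumulate their shares. The sketch below assumes that interpretation and uses made-up user IDs. The function `cumulative_share` is hypothetical and may differ from what cum_user_plot draws.

```python
from collections import Counter
from itertools import accumulate


def cumulative_share(user_ids):
    """Return the cumulative share of tiles, most active user first.

    `user_ids` holds one entry per processed tile.
    """
    counts = sorted(Counter(user_ids).values(), reverse=True)
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


# Made-up answers: one user ID per processed tile
answers = ["u1"] * 4 + ["u2"] * 3 + ["u3"] * 2 + ["u4"]
curve = cumulative_share(answers)
```

In this toy data the most active user alone accounts for 40 % of the tiles, and the two most active users for 70 %.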

### Quality Measures for each MapSwipe user
Figure 15

The warnings appear because some users processed only a few tiles, which leads to division by zero for some quality measures.

In [None]:
Kbm.plot_user_stats()

### ROCs for different user characteristics
Figure 16

In [None]:
Kbm.logit_by_statsmodels()

### Cohen's Kappas
Figures 17 and 19

In [None]:
_ = Kbm.plot_ms_cop_comparison()

In [None]:
_ = Gb.plot_ms_cop_comparison()
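For reference, Cohen's kappa for two binary classifications can be computed from the four confusion counts: it relates the observed agreement to the agreement expected by chance. The sketch below uses the standard formula with made-up counts. The function `cohens_kappa` is hypothetical and not taken from builder.py.

```python
import math


def cohens_kappa(tn, fp, fn, tp):
    """Compute Cohen's kappa for two binary ratings from confusion counts."""
    n = tn + fp + fn + tp
    if n == 0:
        return math.nan
    observed = (tp + tn) / n
    # Chance agreement: both say "yes" plus both say "no"
    chance_yes = ((tp + fp) / n) * ((tp + fn) / n)
    chance_no = ((tn + fn) / n) * ((tn + fp) / n)
    expected = chance_yes + chance_no
    if expected == 1:
        return math.nan
    return (observed - expected) / (1 - expected)


# Made-up counts for illustration
kappa = cohens_kappa(tn=50, fp=10, fn=5, tp=35)
```

A value of 1 means perfect agreement, while 0 means agreement no better than chance.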

### Note for Example Plotting

The example plotting file only works with the raster images. Unfortunately, these are too large for a GitHub upload and may be subject to redistribution limits.

### Data Sources

In [None]:
data = pd.read_csv('../data_overview.csv', sep=";")
data