<a href="https://colab.research.google.com/github/kleenman/tds_capstone/blob/main/Berechnen_der_Pixel_Anzahl_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Description
This notebook identifies green spaces within urban areas from satellite images. The total pixel count of the processed images serves as the basis for calculating green space percentages. This method assumes that inaccuracies in individual images average out over a large dataset. Specifically, the non-city areas included in the Geoatlas delineations of Würzburg and Munich are assumed to be roughly comparable.

##Configuration

*   Image Size Assumption: Each image is assumed to measure 500x500 pixels, matching the `Resize(500)` transform and the `pixel_pro_bild` value used in the code.
*   Dataset: The analysis covers images from two cities, Würzburg and Munich, with their respective image counts determining the total pixels considered in each city.
*   Percentage Calculation: The percentage of green spaces is calculated by dividing the sum of green pixels (as determined by the model) by the total number of pixels in all analyzed images, multiplied by 100 to get a percentage value.
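
The percentage calculation described above can be sketched as a small helper. This is a minimal illustration, not the notebook's own code: the function name `green_space_percentage` and the example counts are hypothetical, and the per-image size is passed in as a parameter rather than fixed.

```python
def green_space_percentage(park_pixels_per_image, pixels_per_image):
    """Return the share of park pixels across all images, in percent."""
    n_images = len(park_pixels_per_image)
    if n_images == 0:
        raise ValueError("no images were analysed")
    # Denominator: every image contributes the same assumed number of pixels
    total_pixels = n_images * pixels_per_image
    return sum(park_pixels_per_image) / total_pixels * 100


# Hypothetical park-pixel counts for three images (illustrative values only)
example_counts = [25_000, 0, 50_000]
print(f"{green_space_percentage(example_counts, 500 * 500):.2f}%")  # prints 10.00%
```

Because the denominator scales with the assumed image size, the result is only as reliable as that assumption.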


In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [2]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from PIL import Image
from fastai.vision.all import *


In [3]:
# run this cell
dataset_path = '/content/gdrive/MyDrive/Capstone_split/Data/munich_data'

# define mask function

def mask_function(name):
  img_name = str(name).split('/')[-1]
  mask_path = dataset_path + '/masks/' + img_name.replace('.jpg', '.npy')
  return np.load(mask_path)
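
The name mapping inside `mask_function` can be looked at in isolation. The sketch below uses `pathlib` instead of string splitting; the helper name `mask_path_for` and the example paths are hypothetical. Note one difference from the cell above: `with_suffix` swaps only the final extension, whereas `str.replace` would replace every occurrence of `.jpg` in the file name.

```python
from pathlib import Path


def mask_path_for(image_path, dataset_root):
    """Map an image file to its mask file: <root>/masks/<image stem>.npy."""
    mask_name = Path(image_path).with_suffix(".npy").name
    return Path(dataset_root) / "masks" / mask_name


# Hypothetical paths, used only to show the mapping
print(mask_path_for("images/tile_001.jpg", "/data/munich_data"))
```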


In [4]:
# run this cell
# Create datablock
parks_block = DataBlock(blocks = (ImageBlock, MaskBlock(codes = ['background','park'])),
                 get_items=get_image_files,
                 splitter=RandomSplitter(valid_pct=0.2, seed=42),
                 get_y=mask_function,
                 item_tfms=Resize(500),
                 batch_tfms=aug_transforms(size=500, max_lighting=0.3)
)

# Dataloaders with batch size 4
dls4 = parks_block.dataloaders(dataset_path, bs=4)

# Dataloaders with batch size 16
dls16 = parks_block.dataloaders(dataset_path, bs=16)

# Dataloaders with batch size 32
dls32 = parks_block.dataloaders(dataset_path, bs=32)


In [5]:
#dls16.show_batch(figsize=(15, 15), max_n=4)

In [6]:
# Build a U-Net learner, pick a learning rate with lr_find, and train with
# early stopping while saving the best checkpoint

def create_and_train_learner(dl, arch=resnet34, metrics=Dice, monitor='dice', epochs=40, fname='model', act_cls=torch.nn.modules.activation.ReLU):
  learn = unet_learner(dl, arch, metrics=metrics, act_cls=act_cls)
  lr = learn.lr_find()
  learn.fit_one_cycle(epochs, lr_max=lr.valley, cbs=[
      EarlyStoppingCallback(monitor=monitor,
                            min_delta=0.01,
                            patience=10,
                            comp=np.greater),
      SaveModelCallback(monitor=monitor,
                        min_delta=0.01,
                        comp=np.greater,
                        fname=fname)])
  return learn
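
The training function above stops early when the monitored Dice score has not improved by at least `min_delta` for `patience` consecutive epochs. The following sketch shows that rule in plain Python. It is a simplified illustration under that assumption, not fastai's implementation, and the function name and score values are hypothetical.

```python
def epoch_of_early_stop(scores, min_delta=0.01, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    A score counts as an improvement only if it beats the best score so far
    by more than min_delta (higher is better, as with a Dice score).
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores, start=1):
        if score - min_delta > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation scores: the gains after epoch 2 are below min_delta
print(epoch_of_early_stop([0.30, 0.40, 0.402, 0.401, 0.405], patience=3))  # prints 5
```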

In the next sections we experiment with different activation functions (ELU, LeakyReLU, GELU) in a U-Net with a resnet34 backbone, while keeping all transformations and the batch size constant.
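
For reference, the activation functions being compared can be written out as scalar formulas. This is a sketch for intuition only; the default parameters (`alpha=1.0`, `negative_slope=0.01`) are assumed to mirror PyTorch's defaults, and the GELU shown is the exact erf-based form.

```python
import math


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    # Smoothly saturates towards -alpha for large negative inputs
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def leaky_relu(x, negative_slope=0.01):
    # Keeps a small slope for negative inputs instead of zeroing them
    return x if x >= 0 else negative_slope * x


def gelu(x):
    # x multiplied by the standard normal CDF evaluated at x
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-2.0, -0.5, 0.0, 1.5):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  elu={elu(x):+.4f}  "
          f"leaky_relu={leaky_relu(x):+.4f}  gelu={gelu(x):+.4f}")
```

In the notebook, the corresponding modules (`torch.nn.ELU`, `torch.nn.LeakyReLU`, `torch.nn.GELU`) would be passed as `act_cls` to `create_and_train_learner`.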


The ELU activation function drastically accelerated learning. The model reached a Dice score above 0.5 as early as epoch 19, whereas the previous model, which used the Rectified Linear Unit (ReLU) activation function, reached a comparable Dice score only after its last layers were unfrozen.
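
The Dice score used for this comparison measures the overlap between predicted and true masks. Below is a minimal single-image sketch in plain Python. The function name and the tiny masks are hypothetical, and fastai's `Dice` metric accumulates over a whole validation set rather than one image.

```python
def dice_score(pred_mask, true_mask, positive=1):
    """Dice = 2 * |P and T| / (|P| + |T|) for one class, over 2D masks."""
    pred = [value == positive for row in pred_mask for value in row]
    true = [value == positive for row in true_mask for value in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred, true))
    denominator = sum(pred) + sum(true)
    # Convention chosen here (an assumption): two empty masks agree perfectly
    return 1.0 if denominator == 0 else 2 * intersection / denominator


# Hypothetical 2x3 masks: 3 predicted park pixels, 3 true ones, 2 shared
predicted = [[0, 1, 1],
             [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(f"{dice_score(predicted, target):.3f}")  # prints 0.667
```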

The ELU activation function provided slightly better performance than ReLU.
Next, we will try the LeakyReLU activation function.

Finally, we will unfreeze and train the last layers of the best model and export it.

In [7]:
# run this cell
path = Path('./gdrive/MyDrive/Capstone')
path.ls()

(#7) [Path('gdrive/MyDrive/Capstone/Literature'),Path('gdrive/MyDrive/Capstone/TDS2.pdf'),Path('gdrive/MyDrive/Capstone/Data'),Path('gdrive/MyDrive/Capstone/park_images'),Path('gdrive/MyDrive/Capstone/nopark_images'),Path('gdrive/MyDrive/Capstone/images for gradio'),Path('gdrive/MyDrive/Capstone/best_model.pkl')]

In [8]:

# run this cell
best_model_path = 'gdrive/MyDrive/Capstone_split/models/best_model.pkl'
learn_app = load_learner(best_model_path)

In [9]:


# Path to the images
bilder_pfad = '/content/gdrive/MyDrive/Capstone_split/Data/wuerzburg_data'

# List all files in the image directory
dateinamen = os.listdir(bilder_pfad)

# Collect one row per image, then build the DataFrame once at the end
rows = []

for dateiname in dateinamen:
    bild_pfad = os.path.join(bilder_pfad, dateiname)

    # Only process regular files with a .jpg extension
    if os.path.isfile(bild_pfad) and bild_pfad.endswith('.jpg'):
        try:
            # Predict the segmentation mask
            pred_mask = learn_app.predict(bild_pfad)[0]
            # Count the pixels assigned to the "park" class (index 1)
            pixelanzahl = int(np.sum(np.array(pred_mask) == 1))
            rows.append({'Bildname': dateiname, 'Pixelanzahl': pixelanzahl})
        except Exception as e:
            print(f"Error while processing image {dateiname}: {e}")

df = pd.DataFrame(rows, columns=['Bildname', 'Pixelanzahl'])

# Total number of park pixels across all images
gesamtflaeche = df['Pixelanzahl'].sum()
print(f"Total park area: {gesamtflaeche} pixels")


Total park area: 1203498 pixels


In [11]:
# Path to the images
bilder_pfad = '/content/gdrive/MyDrive/Capstone_split/Data/munich_data'

# List all files in the image directory
dateinamen = os.listdir(bilder_pfad)

# Collect one row per image, then build the DataFrame once at the end
rows_munich = []

for dateiname in dateinamen:
    bild_pfad = os.path.join(bilder_pfad, dateiname)

    # Only process regular files with a .jpg extension
    if os.path.isfile(bild_pfad) and bild_pfad.endswith('.jpg'):
        try:
            # Predict the segmentation mask
            pred_mask = learn_app.predict(bild_pfad)[0]
            # Count the pixels assigned to the "park" class (index 1)
            pixelanzahl = int(np.sum(np.array(pred_mask) == 1))
            rows_munich.append({'Bildname': dateiname, 'Pixelanzahl': pixelanzahl})
        except Exception as e:
            print(f"Error while processing image {dateiname}: {e}")

df_munich = pd.DataFrame(rows_munich, columns=['Bildname', 'Pixelanzahl'])

# Total number of park pixels across all images
gesamtflaeche_munich = df_munich['Pixelanzahl'].sum()
print(f"Total park area: {gesamtflaeche_munich} pixels")

Total park area: 27354605 pixels
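
The two counting cells above differ only in the directory they read. One way to avoid the duplication is a single helper that takes the directory and a prediction function. The sketch below is an assumption about how this could be factored, not the notebook's code: `count_park_pixels` and `fake_predict` are hypothetical names, and the demo runs against empty placeholder files rather than real images.

```python
import os
import tempfile


def count_park_pixels(image_dir, predict_mask, park_class=1):
    """Return {file name: park-pixel count} for every .jpg in image_dir.

    predict_mask is a hypothetical callable that takes an image path and
    returns a 2D sequence of class indices.
    """
    counts = {}
    for file_name in sorted(os.listdir(image_dir)):
        image_path = os.path.join(image_dir, file_name)
        if not (os.path.isfile(image_path) and file_name.endswith(".jpg")):
            continue
        try:
            mask = predict_mask(image_path)
        except Exception as exc:
            # Skip images the predictor cannot handle, but report them
            print(f"Skipping {file_name}: {exc}")
            continue
        counts[file_name] = sum(
            1 for row in mask for value in row if value == park_class
        )
    return counts


# Stand-in predictor (hypothetical): three "park" pixels out of four
def fake_predict(image_path):
    return [[1, 0], [1, 1]]


with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.jpg", "notes.txt", "c.jpg"):
        open(os.path.join(tmp_dir, name), "w").close()
    demo_counts = count_park_pixels(tmp_dir, fake_predict)

print(demo_counts)  # prints {'a.jpg': 3, 'c.jpg': 3}
```

In the notebook, `predict_mask` could wrap the loaded learner, for example `lambda p: np.array(learn_app.predict(p)[0])`. For full-size masks, counting with `np.sum(mask == 1)` as in the cells above is much faster than the pure-Python loop used here.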


#Calculation of the percentage of green spaces

In our analysis, we used the total number of pixels from all analysed images as the basis for calculating the percentage of green spaces. This approach is based on the assumption that any inaccuracies or deviations in the individual images will even out across the large amount of data. It is important to note that the image selection includes both central and peripheral urban areas, which means that different urban structures, including areas without development, were taken into account.

Instead of making a detailed calculation of the exact urban area of Würzburg and Munich, we decided to use the definitions of the GEO Atlas of Bavaria for the delineations of the cities. As a result, all pixels of all images were included in the analysis, providing a more comprehensive overview of urban and peri-urban green spaces. However, it should be noted that a more precise delineation of urban areas could lead to an adjustment of the calculated percentage, which is a separate and complex problem.

In [21]:
# Assumption: every image is 500x500 pixels
pixel_pro_bild = 500 * 500

# Number of images for Würzburg and Munich
anzahl_bilder_wuerzburg = len(df)
anzahl_bilder_muenchen = len(df_munich)

# Total number of pixels across all images for Würzburg and Munich
gesamt_pixel_wuerzburg = anzahl_bilder_wuerzburg * pixel_pro_bild
gesamt_pixel_muenchen = anzahl_bilder_muenchen * pixel_pro_bild

# Percentage of park pixels for Würzburg
prozent_parks_wuerzburg = (df['Pixelanzahl'].sum() / gesamt_pixel_wuerzburg) * 100

# Percentage of park pixels for Munich
prozent_parks_muenchen = (df_munich['Pixelanzahl'].sum() / gesamt_pixel_muenchen) * 100

# Print the results
print(f"Percentage of parks in Würzburg: {prozent_parks_wuerzburg:.2f}%")
print(f"Percentage of parks in Munich: {prozent_parks_muenchen:.2f}%")


Percentage of parks in Würzburg: 2.74%
Percentage of parks in Munich: 7.37%
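
The cell above multiplies the image count by a fixed per-image pixel count. A variant that derives the denominator from the masks themselves would not depend on that assumption. The sketch below illustrates the idea with plain nested lists; the function name and the tiny masks are hypothetical, and in the notebook the same totals could be taken from each predicted mask's shape.

```python
def park_percentage_from_masks(masks, park_class=1):
    """Compute the park share in percent, using each mask's actual size."""
    park_pixels = 0
    total_pixels = 0
    for mask in masks:
        for row in mask:
            total_pixels += len(row)
            park_pixels += sum(1 for value in row if value == park_class)
    if total_pixels == 0:
        raise ValueError("no pixels were analysed")
    return park_pixels / total_pixels * 100


# Hypothetical masks of different sizes: 4 park pixels out of 7 in total
example_masks = [
    [[1, 0],
     [0, 0]],
    [[1, 1, 1]],
]
print(f"{park_percentage_from_masks(example_masks):.2f}%")  # prints 57.14%
```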
