# Self-Organizing Map

Finally, we want to get an overview of the distribution of the features selected in {numref}`sec:correlations` (population density, overall crime rate, number of burdens (environmental justice), unemployment rate, social benefits receiver rate, child poverty rate).
The best way for us to do so without losing too much information is to use a so-called Self-Organizing Map (SOM). A SOM maps each observation to a unit on a two-dimensional grid, so that similar observations end up in the same or in neighbouring units. The SOM is depicted in {numref}`fig:codes`.


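```r
# Start from a clean workspace and load the required packages
rm(list = ls(all.names = TRUE))
library(kohonen)       # som(), somgrid(), plot method for SOMs
library(RColorBrewer)  # brewer.pal()
setwd("/Users/robert/Documents/Master Data Science/2. Semester/Data Visualization/PROJECT/VisuProj23")

# Load the CSV file into a data frame
df <- read.csv("data/07_wichtigste_features_aus_04_bis_06.csv")
df$LOR_str <- sprintf("%08d", df$LOR_str)
# Rename two entries so that the names used as row names below are unique
df[273, 3] <- "Schloßstraße Stegl."
df[407, 3] <- "Schloßstraße Ch'burg"
row.names(df) <- df$PLR_NAME
# Drop the first four columns so that only the features remain
df <- df[, -c(1:4)]

# Standardize the features (zero mean, unit variance)
dat <- scale(df)

# Fit the SOM 100 times and keep the run with the smallest total
# distance between the observations and their closest units
old <- cur <- Inf
for (i in 1:100) {
  erg <- som(dat, grid = somgrid(15, 10, "hexagonal"), rlen = 1000)
  cur <- sum(erg$distances)
  if (cur < old) {
    erg2 <- erg
    old <- cur
  }
}

# Return the grid coordinates of the unit each observation is mapped to
som2pts <- function(x) {
  stopifnot(inherits(x, "kohonen"))
  x$grid$pts[x$unit.classif, ]
}

som_out <- som2pts(erg2)

# Codes plot of the best run, one colour per feature
pal <- function(n) brewer.pal(n, "Set3")
par(cex.lab = 1)  # Label font size
par(mfrow = c(1, 1))
plot(erg2, shape = "straight", palette.name = pal)
```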
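
To see which planning areas end up in a given unit of the map, the unit assignments can be grouped by unit. The following is only a minimal sketch in base R: the two toy vectors `unit_classif` and `district` are hypothetical stand-ins for `erg2$unit.classif` and `rownames(dat)` from the code above, and the values in them are made up for illustration.

```r
# Toy stand-ins (hypothetical): in the analysis above these would be
# erg2$unit.classif and rownames(dat).
unit_classif <- c(3, 1, 3, 2, 1, 3)
district     <- c("A", "B", "C", "D", "E", "F")

# Group the district names by the unit they were assigned to
districts_per_unit <- split(district, unit_classif)

# Number of districts in each unit that received at least one district
unit_counts <- lengths(districts_per_unit)

print(districts_per_unit[["3"]])
print(unit_counts)
```

Units to which no planning area was assigned do not appear in the result, so on the real map the list may well have fewer entries than the grid has units.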

```{figure} plots/codes_plot.JPG
---
height: 500px
name: fig:codes
---
Self-Organizing Map with the features population density, overall crime rate, number of burdens (environmental justice), unemployment rate, social benefits receiver rate, and child poverty rate.
```
Overall, the values of all features are high on the right of the map and low on the left. A few things stand out. Almost everywhere in the codes plot, the high correlation between unemployment, social benefits, and child poverty is clearly visible, as the slices for these three features are always of similar size.

The middle of the last row shows that a high population density and a high number of burdens do not necessarily lead to high numbers of crimes, unemployment, social benefits, or child poverty. Row six, column four, shows a contrasting example: all features are low except for the crime rate, which is relatively high. The SOM plot offers many more insights, but discussing all of them would go beyond the scope of this project.
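
A reading such as "row six, column four" can be checked against the numbers behind the plot by converting the position into a unit index. The helper below is a hypothetical sketch, not part of the original analysis. It assumes that the units of the 15 x 10 grid are numbered row by row starting at the bottom left of the plot, and that rows in the text are counted from the top; both assumptions should be verified against the actual plot.

```r
# Convert a (row, column) position in the codes plot into a unit index.
# Assumptions: units are numbered row by row starting at the bottom left,
# and `row_from_top` counts the rows from the top of the plot.
unit_index <- function(row_from_top, col, xdim = 15, ydim = 10) {
  stopifnot(row_from_top >= 1, row_from_top <= ydim,
            col >= 1, col <= xdim)
  row_from_bottom <- ydim - row_from_top + 1
  (row_from_bottom - 1) * xdim + col
}

idx <- unit_index(row_from_top = 6, col = 4)
print(idx)

# With the fitted map, the codebook vector of this unit could then be
# inspected, e.g. via getCodes(erg2)[idx, ] (assuming getCodes() returns
# a single matrix for a one-layer map).
```

If the rows are instead counted from the bottom, `row_from_bottom` can be used directly in place of the conversion.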