# Predicting the most popular and trending locations within the localities of twelve boroughs in the city of Berlin 

### Sumudu R Samarasinghe
### 26th June 2019

## 1.	Introduction

### 1.1.	Background



Berlin is one of Germany’s federal states. It consists of twelve boroughs, each with its own local government under Berlin’s city and state government. Each borough is governed by a council with five councilors and a borough mayor, and is made up of several officially recognized neighborhoods (localities). These localities do not have their own governmental bodies, but the city and the boroughs recognize them for planning and statistical purposes. Each borough council benefits from an effective yearly plan for improving the infrastructure of its localities. Such a plan also helps the city government allocate funds appropriately and ultimately improves the living standards and well-being of the inhabitants. To achieve this, it is useful to identify how inhabitants behave within the localities of each borough, including which areas, venues and activities are most popular. This in turn helps the borough council decide what to improve, where, and how, so that the underlying infrastructure serves its people.



### 1.2.	Problem



Each borough’s local government is usually allocated funds to improve the living standards and general well-being of its inhabitants, and it is important that these funds are used so that the population receives the maximum benefit every year. It is therefore critical for each borough’s local government to identify what to spend the allocated funds on, where, and how.

Location data is freely available and easily accessible. Analyzing location data for the city of Berlin can identify the most popular and trending venues, and clusters of venues, among the inhabitants of each locality in each borough. This information helps to identify common behavioral patterns of inhabitants, including popular sports, other leisure activities, and the places where most people like to spend their free time, especially on weekends (e.g. popular places to eat).

Local governments such as borough councils can use this information in their yearly planning and fund management. It helps them allocate funds effectively and efficiently to improve the underlying infrastructure, raise the living standards of their inhabitants, and ensure well-being in terms of physical and mental health, safety, comfort, etc.



### 1.3.	Interest


The project findings will directly benefit the local government of each borough in its planning, strategy design and fund management. They will also provide valuable insights for future infrastructure planning and decision making. In addition, the findings will benefit the inhabitants of each borough’s localities by helping to ensure an easy, healthy, comfortable and safe lifestyle.


## 2.	Data

### 2.1.	Data Sources

Information about each borough of Berlin and each of its localities (such as names and population densities) was scraped from the Wikipedia page: https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin

Geographical coordinates of each borough and each of its localities were obtained using Python geocoder. The Foursquare API was used to get data about popular and trending venues, venue categories, etc. for each locality.
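
As a rough illustration of the Foursquare step, the sketch below only builds a request URL and sends nothing over the network. It assumes the legacy v2 `venues/explore` endpoint with its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` query parameters, which this report does not name explicitly. The function name `build_explore_url`, the placeholder credentials and the approximate coordinates are hypothetical; the default radius and limit mirror the values stated in the methodology.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Legacy Foursquare v2 "explore" endpoint (an assumption; the report does not name it).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20190626", radius=500, limit=100):
    """Return the request URL for venues around one locality centre."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date in YYYYMMDD form
        "ll": f"{lat},{lng}",    # latitude,longitude of the locality
        "radius": radius,        # search radius in metres
        "limit": limit,          # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials and approximate coordinates, for illustration only.
url = build_explore_url(52.52, 13.405, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

Keeping URL construction separate from the HTTP call makes it easy to inspect the request for each locality before spending any of the API quota.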


###### Part of the dataframe with latitude, longitude and postal code data for the three most densely populated localities per borough
![image.png](attachment:image.png)

###### Map of Berlin with the three most densely populated localities per borough (data as shown in the dataframe above)
![image.png](attachment:image.png)

### 2.2.	Data Preprocessing

The boroughs and their localities, together with location data (latitudes and longitudes), were arranged in a Python dataframe for downstream analysis. The final cleaned dataframe included columns for borough name, locality name, postal code of the locality, and latitude and longitude of the locality. For this analysis, the three localities with the highest population density for the reported year were chosen from each borough.
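
The selection of the densest localities per borough can be sketched as follows. This is a minimal illustration in plain Python rather than the dataframe code used in the project; the function name `top_n_by_density`, the borough and locality names, and the density values are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical rows: (borough, locality, population density per km^2).
# The names and numbers are placeholders, not data from the report.
rows = [
    ("Borough A", "Locality 1", 4000),
    ("Borough A", "Locality 2", 9000),
    ("Borough A", "Locality 3", 1500),
    ("Borough A", "Locality 4", 7000),
    ("Borough B", "Locality 5", 3000),
    ("Borough B", "Locality 6", 12000),
]


def top_n_by_density(rows, n=3):
    """Return the n densest localities of each borough, densest first."""
    grouped = defaultdict(list)
    for borough, locality, density in rows:
        grouped[borough].append((locality, density))
    return {
        borough: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for borough, items in grouped.items()
    }


densest = top_n_by_density(rows)
for borough, localities in densest.items():
    print(borough, [name for name, _ in localities])
```

A borough with fewer than three localities simply keeps all of them, since the slice never raises an error.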

The information in this final dataframe formed the basis for further analysis of each of the twelve boroughs of the city of Berlin. Using the Foursquare API, popular venues were retrieved for each locality. The 10 most popular venues and venue categories were identified from that data and stored in Python dataframes for detailed analysis. Trending venues were identified on specific days of the week based on foot traffic at a given time, e.g. on weekends.
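
One way to rank venue categories per locality is sketched below. It assumes that "most popular" means "most frequently returned category", which is an interpretation and not something the report states. The function name `top_categories` and the sample venue rows are hypothetical.

```python
from collections import Counter, defaultdict


def top_categories(venues, n=10):
    """Return the n most frequent venue categories for each locality.

    venues is an iterable of (locality, venue_category) pairs.
    """
    counts = defaultdict(Counter)
    for locality, category in venues:
        counts[locality][category] += 1
    return {
        locality: [category for category, _ in counter.most_common(n)]
        for locality, counter in counts.items()
    }


# Hypothetical venue rows, for illustration only.
sample = [
    ("Locality 1", "Cafe"), ("Locality 1", "Museum"), ("Locality 1", "Cafe"),
    ("Locality 1", "Bar"), ("Locality 1", "Cafe"), ("Locality 1", "Museum"),
    ("Locality 2", "Supermarket"), ("Locality 2", "Pharmacy"),
    ("Locality 2", "Supermarket"),
]
ranking = top_categories(sample, n=2)
print(ranking)
```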


## 3. Methodology

Because of limited resources, the analysis retrieved a maximum of 100 venues within a 500 m radius of each locality in a borough. Among these venues, the 10 most popular venues per locality were identified, including their venue categories. Identifying trending venues per locality was also added to the analysis pipeline, which is useful for finding popular venues based on foot traffic at a specific time of a specific day.
The localities within a borough were also clustered by their most popular venues using the KMeans clustering algorithm, with an optimum number of clusters, k, chosen. The resulting clusters were then analyzed to identify common features among the localities that clustered together.
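
The clustering idea can be illustrated with a very small k-means (Lloyd's algorithm) written in plain Python so that it runs without extra dependencies. It is not the implementation used in this project, which the text does not specify. Seeding the centroids with the first k points is a simplification, and the locality names and category-frequency vectors are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical share of venues per category in each locality:
# [restaurant, cafe, museum, supermarket]
profiles = {
    "Locality 1": [0.40, 0.30, 0.30, 0.00],
    "Locality 2": [0.35, 0.35, 0.25, 0.05],
    "Locality 3": [0.10, 0.10, 0.00, 0.80],
}
labels, centroids = kmeans(list(profiles.values()), k=2)
for name, label in zip(profiles, labels):
    print(name, "-> cluster", label)
```

Representing each locality as a vector of category frequencies lets localities with a similar venue mix fall into the same cluster regardless of how many venues each one returned.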


## 4. Results

We analyzed two boroughs of the city of Berlin, Mitte and Friedrichshain-Kreuzberg. Mitte has 6 localities, and we obtained postal code, latitude and longitude data for all of them. The Foursquare API was used to retrieve venue data for each of the 6 localities, with a limit of 100 venues within a radius of 500 m.


###### Locality data of Mitte borough
![image.png](attachment:image.png)

###### Part of dataframe showing venue data of Mitte borough localities
![image.png](attachment:image.png)

Among the venues retrieved, the 10 most popular venues in each locality were identified.

###### Part of dataframe showing the 5 most popular venues and venue data of Mitte borough localities
![image.png](attachment:image.png)

According to the analysis, the most popular venues across all localities of Mitte include restaurants/hotels, cafes and museums.

Clustering the localities of Mitte by their top venues shows that five of the six localities are highly similar to each other, while one locality (Wedding) is less similar to the other five.


###### Dataframe showing the cluster labels attached to each locality of Mitte borough
![image.png](attachment:image.png)

###### Map showing clusters of Mitte borough
![image.png](attachment:image.png)

The most popular venues within the localities of the borough were identified as restaurants, cafes, bakeries, drugstores, bars, bookstores, metro stations, etc.

###### Part of dataframes showing Venue data and most popular venues per locality in Friedrichshain-Kreuzberg borough
![image.png](attachment:image.png)

![image.png](attachment:image.png)

The analysis of the Friedrichshain-Kreuzberg localities shows that restaurants, cafes, drugstores, bars, metro stations, bookstores, etc. are the most popular venues among its inhabitants.

## 5. Discussion

The analysis of the Mitte localities suggests that this borough is likely to be historically important and probably attracts both local and foreign tourists. This is further supported by the fact that restaurants, hotels and cafes are similarly popular in almost all localities of Mitte, as are some outdoor leisure locations such as parks, trails and tennis courts.

According to the cluster analysis of the Mitte localities, the most popular venues in the cluster of five localities suggest that these are probably tourist attractions with historical value. The single locality that clustered apart from the rest appears to be a residential area, since its popular venues include supermarkets, pharmacies, gas stations, etc., unlike the localities in the other cluster.

The analysis of the Friedrichshain-Kreuzberg localities suggests that it is mostly a residential area. The most popular venues (restaurants/cafes, bakeries, drugstores, bars, metro stations and bookstores) support this suggestion.

The partial results above, covering borough localities and their popular venues, indicate that this type of location-data analysis can help identify which regions and places should be given the highest priority, what measures should be taken, and how they should be implemented when allocating and managing funds to improve the underlying infrastructure of a borough.

Because of the limited resources of the IBM Watson Studio free personal account and difficulties in readily obtaining location data, the boroughs of Berlin were analyzed only partially, as a demonstration. The complete analysis could be performed for all localities in each borough, and cluster analysis could likewise be performed for each borough to identify the regions that are most similar in what is popular among the population. In addition, although it is described in the methodology, the analysis of trending venues per locality is not reported here. Trending venues are identified from foot traffic at the specific time and day on which the analysis is run. This analysis should therefore be carried out on a weekend morning and evening, because the majority of the population is expected to be out spending leisure time with family and friends. This will help to understand where people tend to spend their time and what they do there.


## 6. Conclusion

A complete analysis of all localities in every borough of Berlin will ultimately help to understand the most popular places and venues among the people, as well as their popular activities and behavioral patterns. Local government bodies of the boroughs can then use this information in their decision making and strategy design to improve the underlying infrastructure, enhancing living standards primarily for their inhabitants and also for local and foreign tourists. Measures may include ensuring pedestrian safety through proper walking tracks, pavements and pedestrian crossings; improving facilities for disabled individuals; providing ample parking spaces with cheap and easy access; beautifying the surroundings with trees, plants, ponds, etc.; providing proper garbage disposal facilities; and ensuring that restaurants, hotels and cafes, especially in popular regions, provide healthy, clean food.
