# Location Searching based on weather conditions

## Final Report of the Coursera Capstone Project

<p style='text-align: right;'> By Tobias Machnitzki </p>

### Introduction
Today everything is rated: every restaurant, every coffee shop and even parks. Many people look at
these ratings from providers like Yelp or Tripadvisor and then decide where
to go based on them. But what if we could bring another component into play that would
be useful for every user: the weather conditions?

I come from a natural science background and therefore decided to apply
machine learning to meteorological data. More specifically, I will use the model output of one
of the German weather models and cluster the weather conditions by location over the northern part
of Germany. I will then use the Foursquare location data to find, for a given location, the
nearest coffee shop with better weather conditions. I will not use real positional data, but rather
an example position that is hardcoded in the program.

This is just a proof of concept, but in a real-world scenario many stakeholders could be
interested in such an application. Any rating service, such as Yelp or Tripadvisor,
could use such data to recommend not only the best coffee shop close to you, but also
the one with the best current weather conditions.

### Data
I need two datasets for my application: the weather data from the German weather
service (DWD) and the Foursquare location data.

#### 1. Weather data

The German weather service provides open access to many of its products. One of these products
is a set of daily values from a reanalysis of the past weather over Germany. From it I will use the
maximum temperature at 2 m above ground, the total precipitation and the sunshine duration for one
example day and cluster them. The day I examine is 31 July 2018. If this application were built
for a real stakeholder, we would need to think about how to retrieve live data, but since this is
just a proof of concept, the reanalysis data will do just fine.

The reanalysis data can be retrieved from a publicly accessible FTP server,
ftp://opendata.dwd.de/climate_environment/REA/COSMO_REA6/daily/2D/, in which each folder contains
one output variable of the reanalysis model. We will need the following:
- DURSUN: Duration of sunshine
- TOT_PRECIP: Total precipitation
- TMAX_2M: Maximum temperature 2 m above the ground

The files in those folders are GRIB (.grib) files, a common format for climate and weather data
that is quite easy to read with the Python packages "xarray" and "cfgrib".
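
As a minimal sketch of that reading step, one file could be opened as shown below. `xr.open_dataset(..., engine="cfgrib")` is the standard way to read GRIB through cfgrib; the function name and the file path in the usage comment are placeholders and not the names used in this project.

```python
def load_daily_field(grib_path):
    """Open one daily GRIB file and return its contents as a flat pandas DataFrame."""
    # Imported inside the function because xarray and cfgrib are third-party
    # packages (cfgrib additionally needs the ecCodes library to be installed).
    import xarray as xr

    dataset = xr.open_dataset(grib_path, engine="cfgrib")
    # One row per grid point, with the coordinates and the data variable as columns.
    return dataset.to_dataframe().reset_index()


# Hypothetical usage; the path is a placeholder for a downloaded file:
# sunshine = load_daily_field("path/to/downloaded_DURSUN_file.grb")
```

The variable names inside each file depend on how cfgrib decodes it, so it is worth inspecting the opened dataset once before merging the three variables.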

#### 2. Foursquare location data

Foursquare is a location database that provides an API to retrieve location data. We will only use the
explore endpoint of that API in combination with the search keyword "coffee".

`url = 'https://api.foursquare.com/v2/venues/explore'`

Using the API is straightforward: send a GET request with the desired keyword, the
latitude and longitude of your location and your credentials.
The result is a JSON string containing the locations that meet your search requirements.
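
The request described above can be sketched with the standard library alone. The parameter names (`client_id`, `client_secret`, `v`, `ll`, `query`, `limit`) follow the v2 explore endpoint as I understand it; the helper names, the default version date and the assumed response layout (`response.groups[].items[].venue`) are illustrative assumptions, not taken from this project's code.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, query, client_id, client_secret,
                      version="20180731", limit=30):
    """Build the GET URL for an explore request around (lat, lon)."""
    params = {
        "client_id": client_id,          # credentials from the Foursquare developer account
        "client_secret": client_secret,
        "v": version,                    # API version date, YYYYMMDD (assumed value)
        "ll": f"{lat},{lon}",            # "latitude,longitude"
        "query": query,                  # search keyword, e.g. "coffee"
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def extract_venues(payload):
    """Flatten the parsed JSON response into a list of small venue dicts."""
    venues = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            location = venue["location"]
            venues.append({
                "name": venue["name"],
                "lat": location["lat"],
                "lon": location["lng"],
                "distance_m": location.get("distance"),
            })
    return venues
```

The URL can then be fetched with any HTTP client, and the decoded JSON passed to `extract_venues`.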



### Methodology
Once the three weather datasets are loaded, they have to be cleaned of any unused columns.
Afterwards they are merged into one dataframe. For this dataset there was no need to remove NaN
values or to look for other missing data: the files are gridded reanalysis output described with
CF-convention (Climate and Forecast) metadata, so every grid point carries a value. Note that the
CF conventions standardize metadata only; they are not a quality check by themselves.

We can now explore the distribution of the three variables we are looking at and convert the
numerical values into categories for easier clustering later.

 Variable  | Lower boundary  | Upper boundary | Category
-----------|-----------------|----------------|-----------
Rain       | 0               | 0.001          | no rain
Rain       | 0.001           | 0.2            | low rain
Rain       | 0.2             | inf            | heavy rain
Sunshine   | 0               | 35000          | low sunshine
Sunshine   | 35000           | 45000          | medium sunshine
Sunshine   | 45000           | inf            | high sunshine
Temperature| 0               | 300            | low temperature
Temperature| 300             | 306            | medium temperature
Temperature| 306             | inf            | high temperature

The units of those variables are:
- Temperature [K] (Kelvin)
- Precipitation [kg/m^2]
- Sunshine [s] (Accumulated seconds of sunshine)

Remember that this example uses daily values, not instantaneous values.
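
The binning in the table can be sketched as a small lookup. The thresholds and category names come from the table; treating each bin as half-open (lower boundary included, upper boundary excluded) is my assumption, as are the names `BINS` and `categorize`.

```python
from bisect import bisect_right

# Inner bin edges and category labels per variable, taken from the table above.
BINS = {
    "rain": ([0.001, 0.2], ["no rain", "low rain", "heavy rain"]),
    "sunshine": ([35000, 45000],
                 ["low sunshine", "medium sunshine", "high sunshine"]),
    "temperature": ([300, 306],
                    ["low temperature", "medium temperature", "high temperature"]),
}


def categorize(variable, value):
    """Map a numerical value to its category, using half-open bins [lower, upper)."""
    edges, labels = BINS[variable]
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect_right(edges, value)]


# Example values (made up): a dry, fairly sunny, warm day.
print(categorize("rain", 0.0),
      categorize("sunshine", 40000),
      categorize("temperature", 303.5))
```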

After the conversion into categorical values, a K-means algorithm can be applied to all three
variables to cluster the weather conditions. I used K=3, so that each grid point can later be
assigned one of three weather conditions:

0. Raining with low sunshine and low temperatures
1. No rain with medium sunshine and low-medium temperatures
2. No rain and high temperatures with full sunshine

Before the algorithm is actually applied, the categorical data is first encoded
as integers (0, 1, 2) and then transformed into z-scores.
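
The encoding, scaling and clustering steps can be illustrated with a small standard-library sketch. The report does not name the K-means implementation that was used (a library version such as scikit-learn's `KMeans` would be the usual choice), so the plain Lloyd iteration below with a deterministic farthest-point initialization is a stand-in, and the six input rows are made up.

```python
import math
from statistics import mean, pstdev

# Category order per variable, lowest to highest, used for the integer encoding.
ORDER = {
    "rain": ["no rain", "low rain", "heavy rain"],
    "sunshine": ["low sunshine", "medium sunshine", "high sunshine"],
    "temperature": ["low temperature", "medium temperature", "high temperature"],
}


def encode(row):
    """Turn (rain, sunshine, temperature) category names into integers 0, 1, 2."""
    return [ORDER[var].index(cat) for var, cat in zip(ORDER, row)]


def zscore_columns(rows):
    """Scale every column to zero mean and unit (population) standard deviation."""
    scaled = []
    for column in zip(*rows):
        mu, sigma = mean(column), pstdev(column)
        # A constant column is mapped to zeros to avoid dividing by zero.
        scaled.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in column])
    return [list(point) for point in zip(*scaled)]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [mean(axis) for axis in zip(*members)]
    return labels, centroids


# Made-up grid points, two per expected weather condition.
rows = [
    ("heavy rain", "low sunshine", "low temperature"),
    ("low rain", "low sunshine", "low temperature"),
    ("no rain", "medium sunshine", "low temperature"),
    ("no rain", "medium sunshine", "medium temperature"),
    ("no rain", "high sunshine", "high temperature"),
    ("no rain", "high sunshine", "high temperature"),
]
points = zscore_columns([encode(row) for row in rows])
labels, centroids = kmeans(points, k=3)
print(labels)
```

Cluster numbers returned by K-means are arbitrary, so the clusters have to be inspected before they can be matched to the three weather conditions listed above.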

Now that we have the weather conditions at every point, we can turn to the location
data. We imagine a user at the position lat=53.7287773, lon=10.2656004, close to
the city of Hamburg. Visual exploration of the K-means results shows that it is raining
at that location. Let's imagine this user wants to find a coffee shop close by where it is not
raining at the moment. We therefore take the whole dataset of clustered weather conditions and
find the coordinates of the point in category 2 that is closest to the user's location. Let's call
this point the target-location. For this target-location we can now place a Foursquare query for
coffee shops. We get many results and take the first one, because the results are
ordered by distance to the target-location.
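
The search for the target-location can be sketched as a nearest-neighbour lookup over the clustered grid. Using the haversine great-circle distance is my choice for the sketch (the report does not state which distance was used), and the function names and the three grid points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_point_with_label(user_lat, user_lon, grid_points, wanted_label):
    """Return the (lat, lon, label) grid point with wanted_label closest to the user."""
    candidates = [p for p in grid_points if p[2] == wanted_label]
    if not candidates:
        return None  # no grid point belongs to the wanted cluster
    return min(candidates,
               key=lambda p: haversine_km(user_lat, user_lon, p[0], p[1]))


# Made-up grid: (latitude, longitude, cluster label).
grid = [(53.73, 10.27, 0), (53.90, 10.30, 2), (52.00, 9.00, 2)]
target = nearest_point_with_label(53.7287773, 10.2656004, grid, wanted_label=2)
print(target)
```

The coordinates of `target` are then what gets passed to the Foursquare query.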

That's it! We have found the nearest sunny coffee shop to the user's actual location.

### Results

For the example situation used here, that is the weather conditions of 31 July 2018 and the
user position lat=53.7287773, lon=10.2656004, the Foursquare query returned 29 results close to the
target-location. The user can therefore choose among those results where to
go. The clustering worked as I had hoped: the three clusters that the algorithm
produced are easy to interpret and correspond to the three weather conditions described above.


### Discussion

The algorithm works and is able to find the closest point of interest within a certain weather
region. But there are a few things to note.

First, the results returned by the Foursquare query are not checked again to confirm that they
really lie within the good-weather region around the target-location. Some of them could again be
in an area where the weather is not so nice. This could be addressed by checking the weather
conditions at each query result, but that would go too far for this example application.
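
Such a re-check is not part of this project, but it could look roughly like the sketch below: each venue is assigned the cluster label of its nearest grid point, and only venues in the wanted cluster are kept. The venue dictionaries, the grid points and the function names are hypothetical, and the flat-earth distance approximation is only meant for ranking nearby points.

```python
import math


def nearest_grid_label(lat, lon, grid_points):
    """Return the cluster label of the grid point closest to (lat, lon)."""
    def squared_distance(point):
        # Equirectangular approximation: longitude differences shrink with cos(lat).
        dy = point[0] - lat
        dx = (point[1] - lon) * math.cos(math.radians(lat))
        return dx * dx + dy * dy
    return min(grid_points, key=squared_distance)[2]


def keep_venues_with_label(venues, grid_points, wanted_label):
    """Keep only venues whose nearest grid point belongs to the wanted cluster."""
    return [v for v in venues
            if nearest_grid_label(v["lat"], v["lon"], grid_points) == wanted_label]


# Made-up data: two grid points (lat, lon, label) and two venues.
grid = [(53.90, 10.30, 2), (53.95, 10.50, 0)]
venues = [
    {"name": "Cafe A", "lat": 53.91, "lon": 10.31},
    {"name": "Cafe B", "lat": 53.95, "lon": 10.49},
]
sunny = keep_venues_with_label(venues, grid, wanted_label=2)
print([v["name"] for v in sunny])
```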

Second, we are using daily values and not instantaneous measurements. This is fine for this example
but would not work in a real-world scenario. Furthermore, it would make even more sense to use
a forecast: the travel time to the desired location could be taken into account and the weather at
the time of arrival could be evaluated. But again, all this is beyond the scope of this course.

### Conclusion
We tried to find the best spot nearby to drink coffee in the sun when it is actually raining at 
our current location. 

It was shown that the method works, but a lot of work still needs to be done before it is
actually applicable. On the other hand, a first example was built that shows the
potential of the idea and works with reanalysis weather data. With such an algorithm, Yelp
or Tripadvisor could expand their options for searching for the right places, which could increase
the number of potential customers and, in turn, the profit from commercial
placements on their websites.

For the customer, this would open up a whole new way to find the best location for lunch
or coffee: the weather often varies strongly over short distances, so a much nicer place to eat
can be found with only a short travel time.