# Final Report of the Coursera Capstone Project


<p style='text-align: justify;'> By Tobias Machnitzki </p>

### Data
I will need two datasets for my application: weather data from the German weather service (DWD)
and location data from Foursquare.



### 1. Weather Data
The German weather service provides open access to many of its products. One of these products
is a set of daily values from a reanalysis of past weather over Germany. From it I will use the
maximum temperature at 2 m above ground, the total precipitation and the sunshine duration for one
example day and cluster them. The day I will examine is the 31st of July 2018. If this application
were for a real stakeholder, we would need to think about how to retrieve live data, but since this
is just a proof of concept, the reanalysis data will do just fine.

The reanalysis data can be retrieved from a publicly accessible FTP server:
ftp://opendata.dwd.de/climate_environment/REA/COSMO_REA6/daily/2D/ where each folder contains
one output variable of the reanalysis model. We will need the following:
- DURSUN: Duration of sunshine
- TOT_PRECIP: Total precipitation
- TMAX_2M: Maximum temperature 2 m above the ground
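
As a rough sketch of how the folder layout described above could be addressed from Python, the snippet below builds one folder URL per variable and lists a folder with the standard-library `ftplib`. It assumes that each folder is named exactly like its variable and that the server accepts anonymous logins; the helper names `folder_url` and `list_folder` are made up for this illustration.

```python
from ftplib import FTP
from urllib.parse import urlparse

BASE_URL = "ftp://opendata.dwd.de/climate_environment/REA/COSMO_REA6/daily/2D/"
VARIABLES = ["DURSUN", "TOT_PRECIP", "TMAX_2M"]


def folder_url(variable: str) -> str:
    """Build the folder URL for one variable (assumes folder name == variable name)."""
    return f"{BASE_URL}{variable}/"


def list_folder(variable: str) -> list:
    """List the file names in one variable folder (needs network access)."""
    parts = urlparse(folder_url(variable))
    with FTP(parts.hostname) as ftp:
        ftp.login()  # anonymous login, assumed to be allowed by the server
        ftp.cwd(parts.path)
        return ftp.nlst()


# Only the URLs are built here; list_folder() is not called automatically.
folder_urls = {name: folder_url(name) for name in VARIABLES}
for name, url in folder_urls.items():
    print(f"{name}: {url}")
```

Calling `list_folder("TOT_PRECIP")` should then return the monthly files of that folder, from which the file for July 2018 can be picked and downloaded.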

The files in those folders are .grb (GRIB) files, a common format for climate and weather data
that is quite easy to read with the Python packages "xarray" and "cfgrib".


In [1]:
# cfgrib does not need to be imported; it only needs to be installed so that xarray can use it as an engine.
import xarray as xr
import pandas as pd

In [2]:
# Open the file:
ds_rain = xr.open_dataset('../data/TOT_PRECIP.2D.201807.DaySum.grb', engine='cfgrib')

# Select the last time step (index 30), which is the 31st of July 2018, and keep only the northern part of Germany:
ds_rain = ds_rain.isel(x=slice(400, 450), y=slice(450, 500)).isel(time=30)

# Load data into pandas dataframe:
df_rain = ds_rain.tp.to_dataframe()
df_rain

y,x,time,step,surface,latitude,longitude,valid_time,tp
0,0,2018-07-31 01:00:00,1 days,0,51.648840,7.648655,2018-08-01 01:00:00,0.0
0,1,2018-07-31 01:00:00,1 days,0,51.656460,7.736423,2018-08-01 01:00:00,0.0
0,2,2018-07-31 01:00:00,1 days,0,51.664020,7.824221,2018-08-01 01:00:00,0.0
0,3,2018-07-31 01:00:00,1 days,0,51.671512,7.912048,2018-08-01 01:00:00,0.0
0,4,2018-07-31 01:00:00,1 days,0,51.678940,7.999903,2018-08-01 01:00:00,0.0
...,...,...,...,...,...,...,...,...
49,45,2018-07-31 01:00:00,1 days,0,54.612072,11.223554,2018-08-01 01:00:00,0.0
49,46,2018-07-31 01:00:00,1 days,0,54.617060,11.317904,2018-08-01 01:00:00,0.0
49,47,2018-07-31 01:00:00,1 days,0,54.621984,11.412277,2018-08-01 01:00:00,0.0
49,48,2018-07-31 01:00:00,1 days,0,54.626836,11.506672,2018-08-01 01:00:00,0.0


The variable "tp" (total precipitation) holds the data we are interested in.

This procedure has to be repeated for the temperature and the sunshine duration data.
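
One way to avoid copying the cell for every variable is a small helper, sketched below. It assumes that every monthly file has one time step per day starting on the 1st (consistent with index 30 being the 31st of July), that it uses the same `x`/`y` grid, and that it holds exactly one data variable. The names `day_index` and `load_day` are hypothetical, and the file names for temperature and sunshine duration are not given in this report, so the usage lines only show a placeholder path.

```python
from datetime import date

# Grid window taken from the precipitation example (northern Germany).
X_SLICE = slice(400, 450)
Y_SLICE = slice(450, 500)


def day_index(day: date) -> int:
    """Zero-based position of `day` in a monthly file with one step per day."""
    return day.day - 1


def load_day(path: str, day: date):
    """Return the data variable of a monthly .grb file for one day as a DataFrame."""
    # Imported here so the helper module loads even where xarray is missing;
    # cfgrib must be installed as well.
    import xarray as xr

    ds = xr.open_dataset(path, engine="cfgrib")
    ds = ds.isel(x=X_SLICE, y=Y_SLICE).isel(time=day_index(day))
    # Assumption: the file contains a single data variable (like "tp").
    name = next(iter(ds.data_vars))
    return ds[name].to_dataframe()


# Usage (placeholder path, replace with the downloaded file):
# example_day = date(2018, 7, 31)
# df_tmax = load_day("../data/<TMAX_2M file>.grb", example_day)
```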

### 2. Foursquare Location Data

Foursquare is a location database that provides an API for retrieving location data. We will only use the
explore endpoint of that API, in combination with the search keyword "coffee".

url = 'https://api.foursquare.com/v2/venues/explore'

Using that API is straightforward: send a GET request with the desired keyword, the latitude and longitude of your location, and your credentials.
The result is a JSON string containing the locations that match your search.
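
To make the shape of such a GET request concrete, the sketch below assembles the full request URL with the standard library only. The parameter names and values mirror the ones used in this report; the function name `build_explore_url` and the placeholder credentials are made up for illustration.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, query, client_id, client_secret,
                      version="20191223", limit=100):
    """Return the full GET URL for one explore request."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lon}",  # latitude and longitude as one comma-separated value
        "query": query,
        "limit": limit,
    }
    # urlencode percent-encodes the values, e.g. the comma inside "ll".
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials, for illustration only.
example_url = build_explore_url(53, 9, "coffee", "YOUR_ID", "YOUR_SECRET")
print(example_url)
```

Passing a parameter dictionary to `requests.get` produces an equivalent query string, so this is mainly useful for inspecting or logging what is sent.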


In [6]:
import requests
import json

In [7]:
CLIENT_ID = ''
CLIENT_SECRET = ''

In [10]:
url = 'https://api.foursquare.com/v2/venues/explore'

lat = 53
lon = 9

params = dict(
  client_id=CLIENT_ID,
  client_secret=CLIENT_SECRET,
  v='20191223',
  ll=f"{lat},{lon}",
  query='coffee',
  limit=100
)
resp = requests.get(url=url, params=params)
resp.raise_for_status()  # stop here on HTTP errors, e.g. missing or invalid credentials
data = json.loads(resp.text)
result_list = data["response"]["groups"][0]["items"]
print(f"Found {len(result_list)} results at Foursquare!")

Found 6 results at Foursquare!


Each result looks like the following and has to be processed further before it is usable:

In [12]:
result_list[0]

{'reasons': {'count': 0,
  'items': [{'summary': 'This spot is popular',
    'type': 'general',
    'reasonName': 'globalInteractionReason'}]},
 'venue': {'id': '4c25d012f1272d7f647285c5',
  'name': 'Atrium',
  'location': {'address': 'Obernstr. 38',
   'lat': 53.011943979195955,
   'lng': 9.036327816928425,
   'labeledLatLngs': [{'label': 'display',
     'lat': 53.011943979195955,
     'lng': 9.036327816928425}],
   'distance': 2772,
   'postalCode': '28832',
   'cc': 'DE',
   'city': 'Achim',
   'state': 'Niedersachsen',
   'country': 'Deutschland',
   'formattedAddress': ['Obernstr. 38', '28832 Achim', 'Deutschland']},
  'categories': [{'id': '4bf58dd8d48988d16d941735',
    'name': 'Café',
    'pluralName': 'Cafés',
    'shortName': 'Café',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/cafe_',
     'suffix': '.png'},
    'primary': True}],
  'photos': {'count': 0, 'groups': []}},
 'referralId': 'e-0-4c25d012f1272d7f647285c5-0'}

Let's turn that into a pandas DataFrame, which is much nicer to explore:

In [13]:
results = []
for result in result_list:
    venue = result['venue']
    location = venue['location']
    results.append([location['lat'], location['lng'], venue['name']])
df_results = pd.DataFrame(results, columns=['lat', 'lon', 'name'])
df_results

,lat,lon,name
0,53.011944,9.036328,Atrium
1,53.053722,8.961498,Café del Sol
2,53.049475,8.959297,Starbucks
3,53.048535,8.957596,Cafe im Park Weserpark
4,53.049599,8.959073,Barnstorff
5,53.049084,8.957359,Kaffee Werk
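
If more fields than the name and the coordinates are needed later, the flattening step could be written as a small function, as sketched below. The function name `flatten_item` and the choice of extra fields (city, distance, primary category) are assumptions for illustration; the sample item is a shortened copy of the result shown earlier.

```python
def flatten_item(item: dict) -> dict:
    """Reduce one item of the explore response to a flat record."""
    venue = item["venue"]
    location = venue["location"]
    # Use the category flagged as primary, or None if there is none.
    primary = next(
        (c["name"] for c in venue.get("categories", []) if c.get("primary")),
        None,
    )
    return {
        "name": venue["name"],
        "lat": location["lat"],
        "lon": location["lng"],
        "city": location.get("city"),          # optional field
        "distance": location.get("distance"),  # optional field
        "category": primary,
    }


# Shortened copy of the first result from the response above.
sample_item = {
    "venue": {
        "id": "4c25d012f1272d7f647285c5",
        "name": "Atrium",
        "location": {
            "lat": 53.011943979195955,
            "lng": 9.036327816928425,
            "distance": 2772,
            "city": "Achim",
        },
        "categories": [{"name": "Café", "primary": True}],
    }
}

record = flatten_item(sample_item)
print(record)
```

Passing `[flatten_item(item) for item in result_list]` to `pd.DataFrame` would then give one row per venue with these columns.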
