# What is the relationship between geolocation and weather metrics?
## Introduction
[Weather](https://www.merriam-webster.com/dictionary/weather) is the condition of the atmosphere at a given time for a given location. Differences in weather metrics are attributed to the angle of the [Earth's tilt](https://sciencing.com/tilt-earth-affect-weather-8591690.html), which causes different parts of the Earth to be exposed to the Sun differently. In this study, associations between the geo-location of cities and four weather metrics were investigated: temperature, humidity, cloudiness, and wind speed.

## Data analyses
Several Python modules are used in running this script. 

* [Citipy](https://pypi.org/project/citipy/) finds nearest cities based on geographical coordinates. 

* [JSON](https://docs.python.org/2/library/json.html) handles data coming from the OpenWeatherMap API.

* [Requests](http://docs.python-requests.org/en/master/) sends requests to the OpenWeatherMap API.

* [Pprint](https://docs.python.org/3/library/pprint.html) allows JSON data to be presented in a more human-readable fashion. 

* Random numbers are generated using [NumPy](http://www.numpy.org/). 

* Dataframes are generated using [Pandas](https://pandas.pydata.org/). 

* [Matplotlib](https://matplotlib.org/) creates scatterplots from the data extracted from JSON files. 

In [1]:
# Dependencies
import os
import pandas as pd
from citipy import citipy # get city and country designations based on latitude and longitude
import numpy as np
from pprint import pprint
from config import api_key
from config import gkey
import matplotlib.pyplot as plt
import json
import requests
import datetime as dt # to put a datestamp on the outputs

Weather changes on a day-to-day basis. Hence, traceability of outputs is enhanced by adding a date suffix to the file names.

In [2]:
# Get the date today for the file date-stamp
today = dt.datetime.today().strftime('%Y%m%d')
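
The date stamp can be wrapped in a small helper so that every output file follows the same naming pattern. The sketch below is illustrative only: `dated_filename` is a hypothetical name that does not appear in the notebook, and it assumes the same `%Y%m%d` format used above.

```python
import datetime as dt


def dated_filename(stem, extension, when=None):
    """Return '<stem>_<YYYYMMDD>.<extension>' (hypothetical helper)."""
    # Default to today's date, as in the notebook cell above
    when = when or dt.datetime.today()
    return f"{stem}_{when.strftime('%Y%m%d')}.{extension}"


# A fixed, made-up date keeps the example reproducible
example = dated_filename("weather_output", "csv", dt.datetime(2018, 6, 1))
print(example)
```

Passing an explicit date makes it possible to regenerate the file name of an earlier run.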

### Creating a list of cities
To ensure that the selection of cities is random, 2000 random numbers between -90 and 90 (for latitude) and between -180 and 180 (for longitude) are generated. 

__Notes__: [Latitude](https://www.nasa.gov/audience/forstudents/k-4/dictionary/Latitude.html) is a geographic location metric expressing a point's distance north (positive) or south (negative) of the Equator while [longitude](https://www.nasa.gov/audience/forstudents/k-4/dictionary/Longitude.html) indicates the point's distance east or west from the [Prime Meridian](https://gisgeography.com/prime-greenwich-meridian/) in England. [Coordinates](https://www.britannica.com/science/latitude) are pairs of latitudes and longitudes that point to different locations on Earth. 

In [3]:
# Create a random list of latitudes and longitudes
# np.random.uniform makes sure that all the numbers in the range get equal chance of getting picked

latitudes = np.random.uniform(-90, 90, size = 2000) # 2000 random numbers from -90 to 90 deg latitude
longitudes = np.random.uniform(-180, 180, size = 2000) # 2000 random numbers from -180 to 180 deg longitude

coordinates = list(zip(latitudes, longitudes))

[Oceans](https://www.oceanicinstitute.org/aboutoceans/aquafacts.html) cover 71% of the Earth's surface, which indicates that there is a high chance that any randomly generated coordinate points to water, rather than to a city. Hence, to get at least __500__ cities in the survey, it is important to get the nearest cities to the coordinates. Country codes are included to reduce ambiguity (i.e., some cities have the same name but are found in different countries).

In [4]:
# Nearby cities per latitude-longitude pair
cities = []

for coord in coordinates:
    lat, lon = coord
    cities.append(citipy.nearest_city(lat, lon))
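
To illustrate what "nearest city" means here, the sketch below picks the closest entry from a tiny list using the great-circle (haversine) distance. This is an assumption-laden illustration, not citipy's actual implementation, which the notebook does not describe. `SAMPLE_CITIES`, `haversine_km`, and `nearest_city` are hypothetical names, and the city coordinates are approximate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical mini-gazetteer: (city, country code, latitude, longitude)
SAMPLE_CITIES = [
    ("cape town", "za", -33.92, 18.42),
    ("jamestown", "sh", -15.94, -5.72),
    ("reykjavik", "is", 64.15, -21.94),
]


def nearest_city(lat, lon, cities=SAMPLE_CITIES):
    """Return the entry of `cities` closest to the point (lat, lon)."""
    return min(cities, key=lambda c: haversine_km(lat, lon, c[2], c[3]))


# A random point in the South Atlantic resolves to the closest listed city
print(nearest_city(-55.87, -4.82)[:2])
```

A point over open water still resolves to a city, which is why a random coordinate can be turned into a usable sample.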

Several of the generated coordinates can point to the same city, leading to duplicates. The list of nearby cities is, therefore, further filtered for duplicates.

In [5]:
# Set of nearby cities
city_names = []
country = []

for city in cities:
    city_names.append(city.city_name) # loop through the city coordinates to get the city names
    country.append(city.country_code) # loop through the city coordinates to get the country codes
    
city_dict ={
    "latitude": latitudes,
    "longitude": longitudes,
    "city": city_names,
    "country": country
           }    

city_df = pd.DataFrame(city_dict)
city_df = city_df.drop_duplicates(["city","country"]) # drop city-country duplicates
city_df = city_df.dropna() # drop rows with missing values
city_df.head()

Unnamed: 0,latitude,longitude,city,country
0,22.025584,-84.59876,mantua,cu
1,77.675606,111.078651,saskylakh,ru
2,-33.669056,-20.050412,jamestown,sh
3,26.439971,-5.460941,taoudenni,ml
4,-55.870308,-4.823794,cape town,za
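
The role of the country code in de-duplication can be shown without pandas. The sketch below keeps the first occurrence of each (city, country) pair, which is the default behaviour of `drop_duplicates`. The function name and the sample pairs are made up for illustration.

```python
def drop_duplicate_cities(pairs):
    """Keep the first occurrence of each (city, country) pair, preserving order."""
    seen = set()
    unique = []
    for city, country in pairs:
        key = (city, country)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Illustrative input: the same city name appears under two country codes
sample = [
    ("jamestown", "sh"),
    ("jamestown", "us"),
    ("jamestown", "sh"),
    ("cape town", "za"),
]
print(drop_duplicate_cities(sample))
```

Only the repeated `("jamestown", "sh")` pair is dropped. The entry with a different country code is kept because it refers to a different place.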


Before determining weather conditions for the randomly selected cities, it is important to confirm that they appear in the [OpenWeatherMap](https://openweathermap.org/) database and have city IDs. A [JSON file](http://bulk.openweathermap.org/sample/) containing city names is cross-referenced to collect city IDs for the randomly selected cities.

In [6]:
# Get JSON file containing city ID (downloaded)
filepath = os.path.join("Resources","city.list.json")
with open(filepath) as json_file:
    json_data = json.load(json_file)

In [7]:
# Get the city name, country code, and city ID 
# City IDs are recommended by the OpenWeatherMap API because they are unique identifiers. 
# There could be cities in the list with the same name but are located in different countries.

ct_name = []
co_name = []
ct_ID = []

for i in json_data:
    ct_name.append(i["name"]) # city name from JSON file
    co_name.append(i["country"]) # country name from JSON file
    ct_ID.append(i["id"]) # city ID from JSON file

In [8]:
# Put the JSON-sourced data into a dataframe
json_dict = {
    "city": ct_name,
    "country": co_name,
    "city ID": ct_ID}

json_df = pd.DataFrame(json_dict)
json_df["city"] = json_df["city"].str.lower() # make the letters lowercase
json_df["country"] = json_df["country"].str.lower() # make the letters lowercase
json_df.head()

Unnamed: 0,city,country,city ID
0,hurzuf,ua,707860
1,novinki,ru,519188
2,gorkhā,np,1283378
3,state of haryāna,in,1270260
4,holubynka,ua,708546


In [9]:
# Merge json_df and city_df
city_df2 = pd.merge(city_df, json_df, on = ["city", "country"]) # keep only cities found in both dataframes
city_df2 = city_df2.drop_duplicates(["city", "country"])
print(f"There are {len(city_df2)} cities in this dataframe.")
city_df2.head()

There are 668 cities in this dataframe.


Unnamed: 0,latitude,longitude,city,country,city ID
0,22.025584,-84.59876,mantua,cu,3547930
1,77.675606,111.078651,saskylakh,ru,2017155
2,-33.669056,-20.050412,jamestown,sh,3370903
3,26.439971,-5.460941,taoudenni,ml,2450173
4,-55.870308,-4.823794,cape town,za,3369157
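
The cross-referencing step can also be pictured as a dictionary lookup keyed on the lowercase city name and country code. The sketch below is a simplified stand-in for the merge, not the notebook's method. The JSON excerpt is hand-written to contain only the three fields read above (`id`, `name`, `country`), its capitalisation is assumed, and `build_city_index` is a hypothetical helper.

```python
import json

# Hand-written excerpt with the three fields the notebook reads from the city list
raw = """[
  {"id": 707860, "name": "Hurzuf", "country": "UA"},
  {"id": 519188, "name": "Novinki", "country": "RU"},
  {"id": 3369157, "name": "Cape Town", "country": "ZA"}
]"""


def build_city_index(entries):
    """Map (lowercase name, lowercase country) to the first matching city ID."""
    index = {}
    for entry in entries:
        key = (entry["name"].lower(), entry["country"].lower())
        index.setdefault(key, entry["id"])  # keep the first ID for repeated names
    return index


city_index = build_city_index(json.loads(raw))
print(city_index.get(("cape town", "za")))  # ID of a listed city
print(city_index.get(("atlantis", "xx")))   # None for a city that is not listed
```

Cities missing from the index have no ID and would be left out, just as the inner merge drops them.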


In [10]:
# save the dataframe as a (dated) csv file 
city_df2.to_csv('nearest_cities_{}.csv'.format(today), index = False, encoding = 'utf-8')

### Finding the geo-locations of the cities in the dataframe
With the nearby cities selected, the randomly generated coordinates are no longer needed and the columns can be dropped from the dataframe (to avoid mistaking the coordinates for those of the cities). 

Latitude and longitude information for the cities is retrieved through the [Google Maps API](https://developers.google.com/maps/documentation/javascript/tutorial).

In [11]:
# drop latitude and longitude
city_df2 = city_df2.drop(["latitude", "longitude"], axis = 1)

# Fill in the coordinates for each city
city_df2["city_latitude"] = "" # make the latitude blank to get city latitude via Google Maps API
city_df2["city_longitude"] = "" # make the longitude blank to get city longitude via Google Maps API
city_df2.head()

Unnamed: 0,city,country,city ID,city_latitude,city_longitude
0,mantua,cu,3547930,,
1,saskylakh,ru,2017155,,
2,jamestown,sh,3370903,,
3,taoudenni,ml,2450173,,
4,cape town,za,3369157,,


The progress of retrieving information from the Google Maps API and of recording the information in the dataframe is indicated by messages. In case there are errors encountered for a particular city, an error message is printed and the information retrieval and recording is continued with the next city.

In [None]:
# Create a parameters dictionary to contain the variables that will be updated through the search
params = {"key": gkey}

# Create the iterations through the list of cities 
for index, row in city_df2.iterrows():
    city = row["city"]
    country = row["country"]
    
    # base url
    base_url = "https://maps.googleapis.com/maps/api/geocode/json"
    
    # add location in params based on the cities in the dataframe
    params["address"] = f'{city},{country}'
    
    # create request and JSON-ify
    print(f"Retrieving the location of City {index}: {city},{country}")
    cities_coords_json = requests.get(base_url, params = params).json()
    
    # add the latitude and longitude of each city into the dataframe
    try:
        location = cities_coords_json["results"][0]["geometry"]["location"]
        city_df2.loc[index, "city_latitude"] = location["lat"]
        city_df2.loc[index, "city_longitude"] = location["lng"]
        # report success only after both values have been recorded
        print(f"{city} is on the map! Coordinates recorded.")
    except (KeyError, IndexError):
        print(f"{city} has not yet been discovered. Moving on...")
    
    print("----------")    
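
The parsing step can be tested in isolation by separating it from the network call. The sketch below assumes the response structure used in the loop above (`results[0].geometry.location`). The two payloads are hand-written stand-ins rather than real API responses, and `extract_lat_lng` is a hypothetical helper.

```python
def extract_lat_lng(payload):
    """Return (lat, lng) from a geocoding response dict, or None if absent."""
    try:
        location = payload["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]
    except (KeyError, IndexError, TypeError):
        # Missing keys or an empty result list mean the city was not found
        return None


# Hand-written payloads that mimic a hit and a miss
found = {"results": [{"geometry": {"location": {"lat": -33.92, "lng": 18.42}}}]}
empty = {"results": [], "status": "ZERO_RESULTS"}

print(extract_lat_lng(found))
print(extract_lat_lng(empty))
```

Returning `None` lets the caller decide how to handle a miss, for example by skipping the row.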

In [None]:
# drop cities that return blanks (dropna ignores empty strings, so convert them to NaN first)
city_df2 = city_df2.replace("", np.nan).dropna()

# save the dataframe as a (dated) csv file 
city_df2.to_csv('city_list_{}.csv'.format(today), index = False, encoding = 'utf-8')

### Current weather conditions in nearby cities
Blank columns for the four weather metrics are added to the dataframe.

In [None]:
# Add new columns (NaN marks values that have not been retrieved yet)
city_df2["Temperature (F)"] = np.nan
city_df2["Humidity (%)"] = np.nan
city_df2["Cloudiness (%)"] = np.nan
city_df2["Wind Speed (mph)"] = np.nan

The OpenWeatherMap API is used to collect current weather metrics for the cities. The progress of the run is indicated. While weather data is being extracted, the cloudiness of each city is printed. In case an error is encountered while retrieving data for a city, an error message ("It will rain meatballs!") is printed and the process is continued for the next city in the queue.

In [None]:
# Formulate the query URL
url = "http://api.openweathermap.org/data/2.5/weather?"
units = "imperial"

# Loop through the rows to get the different cities and countries
for index, row in city_df2.iterrows():
    x = row["city"]
    y = row["country"]
    
    # Create a variable for iteration through the dataframe
    city_ID = row["city ID"]
    
    # Create a query URL
    query_url = f"{url}appid={api_key}&id={city_ID}&units={units}"
    
    # Create a request and JSON-ify
    print(f"Retrieving information for City {index}: {x},{y} (ID: {city_ID}).")
    response = requests.get(query_url)
    response_json = response.json()
    
    # Extract response
    try:
        print(f"{x},{y} (ID: {city_ID}) has {response_json['clouds']['all']}% cloudiness.")
        
        city_df2.loc[index, "Cloudiness (%)"] = response_json["clouds"]["all"]
        city_df2.loc[index, "Humidity (%)"] = response_json["main"]["humidity"]
        city_df2.loc[index, "Wind Speed (mph)"] = response_json["wind"]["speed"]
        city_df2.loc[index, "Temperature (F)"] = response_json["main"]["temp"]
    
    except (KeyError, IndexError):
        print("The city is not on the list. It will rain meatballs!")
    
    print("----------")
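
The query construction and the extraction of the four metrics can be sketched as two small functions. The example is a sketch under stated assumptions: `build_query_url` and `extract_metrics` are hypothetical helpers, the sample payloads are hand-written with made-up values, and the key names are the ones read in the loop above.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def build_query_url(city_id, api_key, units="imperial"):
    """Build the query URL for one city ID."""
    query = urlencode({"id": city_id, "units": units, "appid": api_key})
    return f"{BASE_URL}?{query}"


def extract_metrics(payload):
    """Return the four weather metrics from a response dict, or None on error."""
    try:
        return {
            "Temperature (F)": payload["main"]["temp"],
            "Humidity (%)": payload["main"]["humidity"],
            "Cloudiness (%)": payload["clouds"]["all"],
            "Wind Speed (mph)": payload["wind"]["speed"],
        }
    except (KeyError, TypeError):
        return None


# Hand-written payloads with made-up values: one success, one error
sample = {"main": {"temp": 61.3, "humidity": 72}, "clouds": {"all": 40}, "wind": {"speed": 9.2}}
error = {"cod": "404", "message": "city not found"}

print(build_query_url(3369157, "YOUR_KEY"))
print(extract_metrics(sample))
print(extract_metrics(error))
```

Using `urlencode` avoids hand-assembling the `?` and `&` separators and escapes any special characters in the parameters.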

If there are no errors during the run (the forecast does not have meatballs), the dataframe should now contain the weather information for each city. The dataframe is also saved as a (dated) csv file.

In [None]:
# view complete dataframe
city_df2

# save the dataframe as a csv file 
city_df2.to_csv('weather_output_{}.csv'.format(today), index = False, encoding = 'utf-8')

# Preview the dataframe
city_df2.head()

### Visualising weather trends
The patterns for current weather conditions can be visualised via scatterplots. These plots follow the [ggplot](http://ggplot.yhathq.com/) aesthetic.

In [None]:
# Choose ggplot as style for plots
plt.style.use('ggplot')

# Size of plots
fig_size = plt.rcParams["figure.figsize"] # get current size
fig_size[0] = 12
fig_size[1] = 8
plt.rcParams["figure.figsize"] = fig_size # customise plot size

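The first four plots explore trends between latitude and the weather metrics. The next four repeat the comparison for longitude, and the last three compare the weather metrics with one another.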

In [None]:
# Latitude vs Temperature
plt.scatter(city_df2["city_latitude"], city_df2["Temperature (F)"]) # city coordinates from the Google Maps API
plt.xlabel("Latitude")
plt.ylabel("Temperature (deg F)")
plt.title(f"Temperature (deg F) by Latitude for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/lat_temp_{}.png".format(today))
plt.show()

In [None]:
# Latitude vs Humidity
plt.scatter(city_df2["city_latitude"], city_df2["Humidity (%)"]) # city coordinates from the Google Maps API
plt.xlabel("Latitude")
plt.ylabel("Humidity (%)")
plt.title(f"Humidity (%) by Latitude for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/lat_hum_{}.png".format(today))
plt.show()

In [None]:
# Latitude vs Cloudiness
plt.scatter(city_df2["city_latitude"], city_df2["Cloudiness (%)"]) # city coordinates from the Google Maps API
plt.xlabel("Latitude")
plt.ylabel("Cloudiness (%)")
plt.title(f"Cloudiness (%) by Latitude for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/lat_cloud_{}.png".format(today))
plt.show()

In [None]:
# Latitude vs Wind Speed
plt.scatter(city_df2["city_latitude"], city_df2["Wind Speed (mph)"]) # city coordinates from the Google Maps API
plt.xlabel("Latitude")
plt.ylabel("Wind Speed (mph)")
plt.title(f"Wind Speed (mph) by Latitude for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/lat_wind_{}.png".format(today))
plt.show()

In [None]:
# Longitude vs Temperature
plt.scatter(city_df2["city_longitude"], city_df2["Temperature (F)"]) # city coordinates from the Google Maps API
plt.xlabel("Longitude")
plt.ylabel("Temperature (deg F)")
plt.title(f"Temperature (deg F) by Longitude for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/long_temp_{}.png".format(today))
plt.show()

In [None]:
# Longitude vs Humidity
plt.scatter(city_df2["city_longitude"], city_df2["Humidity (%)"]) # city coordinates from the Google Maps API
plt.xlabel("Longitude")
plt.ylabel("Humidity (%)")
plt.title(f"Humidity (%) by Longitude for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/long_hum_{}.png".format(today))
plt.show()

In [None]:
# Longitude vs Cloudiness
plt.scatter(city_df2["city_longitude"], city_df2["Cloudiness (%)"]) # city coordinates from the Google Maps API
plt.xlabel("Longitude")
plt.ylabel("Cloudiness (%)")
plt.title(f"Cloudiness (%) by Longitude for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/long_cloud_{}.png".format(today))
plt.show()

In [None]:
# Longitude vs Wind Speed
plt.scatter(city_df2["city_longitude"], city_df2["Wind Speed (mph)"]) # city coordinates from the Google Maps API
plt.xlabel("Longitude")
plt.ylabel("Wind Speed (mph)")
plt.title(f"Wind Speed (mph) by Longitude for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/long_wind_{}.png".format(today))
plt.show()

In [None]:
# Humidity vs Wind Speed
plt.scatter(city_df2["Humidity (%)"], city_df2["Wind Speed (mph)"])
plt.xlabel("Humidity (%)")
plt.ylabel("Wind Speed (mph)")
plt.title(f"Wind Speed (mph) by Humidity (%) for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/hum_wind_{}.png".format(today))
plt.show()

In [None]:
# Humidity vs Cloudiness
plt.scatter(city_df2["Humidity (%)"], city_df2["Cloudiness (%)"])
plt.xlabel("Humidity (%)")
plt.ylabel("Cloudiness (%)")
plt.title(f"Cloudiness (%) by Humidity (%) for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/hum_cloud_{}.png".format(today))
plt.show()

In [None]:
# Cloudiness vs Wind Speed
plt.scatter(city_df2["Cloudiness (%)"], city_df2["Wind Speed (mph)"])
plt.xlabel("Cloudiness (%)")
plt.ylabel("Wind Speed (mph)")
plt.title(f"Wind Speed (mph) by Cloudiness (%) for {len(city_df2)} Cities.\n(Data retrieved via the OpenWeatherMap API on {today}.)")

# Save image
plt.savefig("Images/cloud_wind_{}.png".format(today))
plt.show()
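
The plotting cells above share one pattern: axis labels, a two-line title, and a dated image path. A small helper could generate these strings consistently. The sketch below is illustrative only: `plot_labels` and its `slug` parameter are hypothetical, and the city count and date in the example are placeholder values.

```python
def plot_labels(x_label, y_label, slug, n_cities, date_stamp):
    """Return the axis labels, title, and image path for one scatterplot."""
    title = (
        f"{y_label} by {x_label} for {n_cities} Cities.\n"
        f"(Data retrieved via the OpenWeatherMap API on {date_stamp}.)"
    )
    return {
        "xlabel": x_label,
        "ylabel": y_label,
        "title": title,
        "path": f"Images/{slug}_{date_stamp}.png",
    }


# Placeholder count and date, for illustration only
labels = plot_labels("Latitude", "Humidity (%)", "lat_hum", 668, "20180601")
print(labels["title"])
print(labels["path"])
```

The returned values could then be passed to `plt.xlabel`, `plt.ylabel`, `plt.title`, and `plt.savefig`, so that a wording change only has to be made in one place.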