# WeatherPy

## Observations

The first observation made from this data analysis is that the maximum temperature of the cities dataset is higher the closer the city is to the zero latitude line or equator.  This observation makes sense as the primary reason for higher temperatures at the equator is because that section of the planet is closer towards the sun than the poles are to the sun.

The second observation is that for this time of year (August 2021), it can be observed that the maximum temperature of the cities occurs at slightly higher latitudes from the equator due to the summer solstice.  Conversely, temperatures for cities below the zero line of the equator are lower at this time of year due to the souther winter solstice.

The third observation is that there is little to no correlation between a city's latitude and its humidity, cloudiness, or wind speed.  Humidity and cloudiness are probably more related to how close the city is to a water source such as an ocean, lake or other large body of water.  Wind speeds are probably more correlated with other natural phenomena such as global patterns of movement in the Earth's atmosphere, and whether the city is near mountains and has higher elevation, or is located in an open lower elevation plain.

More detail on these observations is provided below under each graph figure.

In [1]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import time
from scipy.stats import linregress
import math
import os
from dotenv import load_dotenv

In [2]:
# API key setup
load_dotenv()

weather_api_key = os.getenv("API_KEY_OPEN_WEATHER")

In [3]:
from citipy import citipy

# Output File (CSV)
output_data_file = "output_data/cities.csv"

# Range of latitudes and longitudes
lat_range = (-90, 90)
lng_range = (-180, 180)

## Generate Cities List

In [4]:
# List for holding lat_lngs and cities
lat_lngs = []
cities = []

# Create a set of random lat and lng combinations
lats = np.random.uniform(lat_range[0], lat_range[1], size=1500)
lngs = np.random.uniform(lng_range[0], lng_range[1], size=1500)
lat_lngs = zip(lats, lngs)

# Identify nearest city for each lat, lng combination
for lat_lng in lat_lngs:
    city = citipy.nearest_city(lat_lng[0], lat_lng[1]).city_name
    
    # If the city is unique, then add it to our cities list
    if city not in cities:
        cities.append(city)

# Print the city count to confirm sufficient count
len(cities)

633

In [5]:
city_name = pd.DataFrame(cities,columns=['Name'])
city_name.tail()

Unnamed: 0,Name
628,laufen
629,batagay-alyta
630,abha
631,halifax
632,passo de camaragibe


### Perform API Calls
* Perform a weather check on each city using a series of successive API calls.
* Include a print log of each city as it'sbeing processed (with the city number and city name).


In [6]:
# Build url for queries
base_url = "http://api.openweathermap.org/data/2.5/weather?"

In [None]:
print("Beginning Data Retrieval")
print("-"*32)
set_number = 1
record = 1
data = []

for x in range(len(city_name)):
    if record > 50:
        record = 1
        set_number += 1
    else:
        print(f"Processing Record {record} of Set {set_number} | #{x}, {city_name['Name'][x]}")
        query_url = f"{base_url}appid={weather_api_key}&q={city_name['Name'][x]}&units=imperial"
        weather = requests.get(query_url).json()
        data.append(weather)
        record += 1

print("-"*32)
print("Data Retrieval Complete")
print("-"*32)

Beginning Data Retrieval
--------------------------------
Processing Record 1 of Set 1 | #0, roma
Processing Record 2 of Set 1 | #1, tautira
Processing Record 3 of Set 1 | #2, victoria
Processing Record 4 of Set 1 | #3, antalya
Processing Record 5 of Set 1 | #4, san cristobal
Processing Record 6 of Set 1 | #5, san quintin
Processing Record 7 of Set 1 | #6, avarua
Processing Record 8 of Set 1 | #7, zhanakorgan
Processing Record 9 of Set 1 | #8, aklavik
Processing Record 10 of Set 1 | #9, morros
Processing Record 11 of Set 1 | #10, albany
Processing Record 12 of Set 1 | #11, port macquarie
Processing Record 13 of Set 1 | #12, nadym
Processing Record 14 of Set 1 | #13, qaanaaq
Processing Record 15 of Set 1 | #14, meulaboh
Processing Record 16 of Set 1 | #15, semey
Processing Record 17 of Set 1 | #16, ushuaia
Processing Record 18 of Set 1 | #17, quelimane
Processing Record 19 of Set 1 | #18, inirida
Processing Record 20 of Set 1 | #19, mataura
Processing Record 21 of Set 1 | #20, vaini
Pro

Processing Record 27 of Set 4 | #179, gra liyia
Processing Record 28 of Set 4 | #180, qaqortoq
Processing Record 29 of Set 4 | #181, samfya
Processing Record 30 of Set 4 | #182, ravar
Processing Record 31 of Set 4 | #183, palembang
Processing Record 32 of Set 4 | #184, nalut
Processing Record 33 of Set 4 | #185, huazolotitlan
Processing Record 34 of Set 4 | #186, mahebourg
Processing Record 35 of Set 4 | #187, sahuaripa
Processing Record 36 of Set 4 | #188, rehoboth
Processing Record 37 of Set 4 | #189, umzimvubu
Processing Record 38 of Set 4 | #190, luderitz
Processing Record 39 of Set 4 | #191, nuevo progreso
Processing Record 40 of Set 4 | #192, san patricio
Processing Record 41 of Set 4 | #193, porto novo
Processing Record 42 of Set 4 | #194, pevek
Processing Record 43 of Set 4 | #195, port-gentil
Processing Record 44 of Set 4 | #196, loralai
Processing Record 45 of Set 4 | #197, la ronge
Processing Record 46 of Set 4 | #198, mumford
Processing Record 47 of Set 4 | #199, hobart
Pro

### Convert Raw Data to DataFrame
* Export the city data into a .csv.
* Display the DataFrame

In [None]:
import json

print(json.dumps(data,indent=4,sort_keys=True))

In [None]:
from datetime import datetime # This is needed to get the timestamp for plotting purposes

dataframe = []

for x in range(len(data)):
    if data[x]['cod']=='404':
        dataframe.append([city_name['Name'][x]])
    else:
        dataframe.append([city_name['Name'][x],
                         data[x]['coord']['lat'],
                         data[x]['coord']['lon'],
                         data[x]['main']['temp_max'],
                         data[x]['main']['humidity'],
                         data[x]['clouds']['all'],
                         data[x]['wind']['speed'],
                         data[x]['sys']['country'],
                         data[x]['dt']])

df = pd.DataFrame(dataframe, columns =['Name','Latitude','Longitude',
                                       'Max_Temp','Humidity',
                                       'Cloudiness','Wind_Speed',
                                       'Country','Date'])
df_clean = df.dropna() # remove NaN values
df_clean = df_clean.reset_index(drop=True)
df_clean.to_csv('../output_data/weather_data.csv')
timestamp = datetime.now()
timestampstr = timestamp.strftime("%m/%d/%y") # this will be used for plotting
df_clean.tail()

## Inspect the data and remove the cities where the humidity > 100%.
----
Skip this step if there are no cities that have humidity > 100%. 

In [None]:
#  Get the indices of cities that have humidity over 100%.
humidity_check = df_clean[(df_clean['Humidity']>100)]
humid_count = humidity_check.Humidity.count()
if humid_count > 0:
    print(f"There are {humid_count} cities with humidity greater than 100%.")
    df_clean = df_clean[(df_clean['Humidity']<=100)]
else:
    print("There are no cities with humidity greater than 100%.")

## Plotting the Data
* Use proper labeling of the plots using plot titles (including date of analysis) and axes labels.
* Save the plotted figures as .pngs.

## Fig 1. Latitude vs. Temperature Plot

In [None]:
x = df_clean['Latitude']
y = df_clean['Max_Temp']
plt.title(f"City Latitude vs. Max Temperature ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Max Temperature (F)")
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig1_lat_v_temp.png')
plt.show()

Figure 1 shows the relationship between the max temperature and latitude for the date specified.

As can be seen, latitudes closer towards zero are generally higher, however we should add that at the current date of 8/27/2021, there is a lean towards higher temperatures at slightly higher latitudes due to the summer soltice.

## Fig 2. Latitude vs. Humidity Plot

In [None]:
x = df_clean['Latitude']
y = df_clean['Humidity']
plt.title(f"City Latitude vs. Humidity ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Humidity (%)")
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig2_lat_v_humidity.png')
plt.show()

Figure 2 shows the relationship between the humidity and latitude for the date specified.

There is no clear relationship between latitude and humidity.

## Fig 3. Latitude vs. Cloudiness Plot

In [None]:
x = df_clean['Latitude']
y = df_clean['Cloudiness']
plt.title(f"City Latitude vs. Cloudiness ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Cloudiness (%)")
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig3_lat_v_cloudiness.png')
plt.show()

Figure 3 shows the relationship between the cloudiness and latitude for the date specified.

There is no clear relationship between latitude and cloudiness.

## Fig 4. Latitude vs Wind Speed Plot

In [None]:
x = df_clean['Latitude']
y = df_clean['Wind_Speed']
plt.title(f"City Latitude vs. Wind Speed ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Wind Speed (mph)")
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig4_lat_v_windspeed.png')
plt.show()

Figure 4 shows the relationship between the wind speed and latitude for the date specified.

There is a very weak inverse correlation in wind speed vs latitude and max temperature vs latitude.  For example, the latitudes where max temperature is currently highest appear to have lower windspeed in general than latitudes at the extremes such as near the poles.  However, this inverse correlation appears to be very weak.

## Linear Regression

In [None]:
# Divide data between northern and southern hemispheres
n_hem = df_clean[(df_clean.Latitude > 0)]
s_hem = df_clean[(df_clean.Latitude < 0)]

####  Fig 5. Northern Hemisphere - Max Temp vs. Latitude Linear Regression

In [None]:
x = n_hem['Latitude']
y = n_hem['Max_Temp']
(slope, intercept, rvalue, pvalue, stderr) = linregress(x, y)
line = x * slope + intercept
plt.plot(x,line,'r',
         label='y = {:.2f}x + {:.2f}\n   $r^2$ = {:.2f}'.format(slope,intercept,rvalue))
plt.title(f"Northern Hemisphere Latitude vs. Max Temperature ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Max Temperature (F)")
plt.legend(fontsize=12)
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig5_n_hem_lat_v_temp.png')
plt.show()

####  Fig 6. Southern Hemisphere - Max Temp vs. Latitude Linear Regression

In [None]:
x = s_hem['Latitude']
y = s_hem['Max_Temp']
(slope, intercept, rvalue, pvalue, stderr) = linregress(x, y)
line = x * slope + intercept
plt.plot(x,line,'r',
         label='y = {:.2f}x + {:.2f}\n   $r^2$ = {:.2f}'.format(slope,intercept,rvalue))
plt.title(f"Southern Hemisphere Latitude vs. Max Temperature ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Max Temperature (F)")
plt.legend(fontsize=12)
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig6_s_hem_lat_v_temp.png')
plt.show()

In Figures 5 and 6, we look at the relationship between max temperature vs latitude for both the northern and southern hemispheres, respectively.

As is evidenced by the data through the linear regression r squared values of greater than or equal to 70, there is a strong correlation between latitudes that are closer to the zero latitude line and higher max temperatures.

As of this date, 8/27/2021, it can be seen that the r squared value of the southern hemisphere has a greater value than that of the northern hemisphere.  This is most probably due to the summer solstice in the northern hemisphere which causes higher temperatures to occur at slightly higher than the zero latitude line.

####  Fig 7. Northern Hemisphere - Humidity (%) vs. Latitude Linear Regression

In [None]:
x = n_hem['Latitude']
y = n_hem['Humidity']
(slope, intercept, rvalue, pvalue, stderr) = linregress(x, y)
line = x * slope + intercept
plt.plot(x,line,'r',
         label='y = {:.2f}x + {:.2f}\n   $r^2$ = {:.2f}'.format(slope,intercept,rvalue))
plt.title(f"Northern Hemisphere Latitude vs. Humidity ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Humidity (%)")
plt.legend(fontsize=12)
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig7_n_hem_lat_v_humidity.png')
plt.show()

####  Fig 8. Southern Hemisphere - Humidity (%) vs. Latitude Linear Regression

In [None]:
x = s_hem['Latitude']
y = s_hem['Humidity']
(slope, intercept, rvalue, pvalue, stderr) = linregress(x, y)
line = x * slope + intercept
plt.plot(x,line,'r',
         label='y = {:.2f}x + {:.2f}\n   $r^2$ = {:.2f}'.format(slope,intercept,rvalue))
plt.title(f"Southern Hemisphere Latitude vs. Humidity ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Humidity (%)")
plt.legend(fontsize=12)
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig8_s_hem_lat_v_humidity.png')
plt.show()

In Figures 7 and 8, we look at the relationship between humidity vs latitude for both the northern and southern hemispheres, respectively.

As is evidenced by the data through the linear regression r squared values of close to zero, there is little to no correlation between humidity and latitude.

####  Fig 9. Northern Hemisphere - Cloudiness (%) vs. Latitude Linear Regression

In [None]:
x = n_hem['Latitude']
y = n_hem['Cloudiness']
(slope, intercept, rvalue, pvalue, stderr) = linregress(x, y)
line = x * slope + intercept
plt.plot(x,line,'r',
         label='y = {:.2f}x + {:.2f}\n   $r^2$ = {:.2f}'.format(slope,intercept,rvalue))
plt.title(f"Northern Hemisphere Latitude vs. Cloudiness ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Cloudiness (%)")
plt.legend(fontsize=12)
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig9_n_hem_lat_v_cloudiness.png')
plt.show()

####  Fig 10. Southern Hemisphere - Cloudiness (%) vs. Latitude Linear Regression

In [None]:
x = s_hem['Latitude']
y = s_hem['Cloudiness']
(slope, intercept, rvalue, pvalue, stderr) = linregress(x, y)
line = x * slope + intercept
plt.plot(x,line,'r',
         label='y = {:.2f}x + {:.2f}\n   $r^2$ = {:.2f}'.format(slope,intercept,rvalue))
plt.title(f"Southern Hemisphere Latitude vs. Cloudiness ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Cloudiness (%)")
plt.legend(fontsize=12)
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig10_s_hem_lat_v_cloudiness.png')
plt.show()

In Figures 9 and 10, we look at the relationship between cloudiness vs latitude for both the northern and southern hemispheres, respectively.

As is evidenced by the data through the linear regression r squared values of close to zero, there is little to no correlation between humidity and latitude.

However, it should also be noted that on this particular day 8/27/2021, the data is more polarized, showing many datapoints at the extremes of either close to zero or close to 100 for both the northern and southern hemispheres with less datapoints at the midline of around 40 to 60% cloudiness.

####  Fig 11. Northern Hemisphere - Wind Speed (mph) vs. Latitude Linear Regression

In [None]:
x = n_hem['Latitude']
y = n_hem['Wind_Speed']
(slope, intercept, rvalue, pvalue, stderr) = linregress(x, y)
line = x * slope + intercept
plt.plot(x,line,'r',
         label='y = {:.2f}x + {:.2f}\n   $r^2$ = {:.2f}'.format(slope,intercept,rvalue))
plt.title(f"Northern Hemisphere Latitude vs. Wind Speed ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Wind Speed (mph)")
plt.legend(fontsize=12)
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig11_n_hem_lat_v_windspeed.png')
plt.show()

####  Fig 12. Southern Hemisphere - Wind Speed (mph) vs. Latitude Linear Regression

In [None]:
x = s_hem['Latitude']
y = s_hem['Wind_Speed']
(slope, intercept, rvalue, pvalue, stderr) = linregress(x, y)
line = x * slope + intercept
plt.plot(x,line,'r',
         label='y = {:.2f}x + {:.2f}\n   $r^2$ = {:.2f}'.format(slope,intercept,rvalue))
plt.title(f"Southern Hemisphere Latitude vs. Wind Speed ({timestampstr})")
plt.xlabel("Latitude")
plt.ylabel("Wind Speed (mph)")
plt.legend(fontsize=12)
plt.grid()
plt.scatter(x,y,edgecolor="black")
plt.savefig('../output_data/fig12_s_hem_lat_v_windspeed.png')
plt.show()

In Figures 11 and 12, we look at the relationship between wind speed vs latitude for both the northern and southern hemispheres, respectively.

As is evidenced by the data through the linear regression r squared values of close to zero, there is little to no correlation between wind speed and latitude.