<a href="https://colab.research.google.com/github/gsheara/Seattle-Weather/blob/main/GS_Analysis_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction
The purpose of this notebook is to answer whether it rains more in Seattle or New York City, using precipitation data measured by one weather station in each city across four years. Because the question is relatively subjective, the dataset is explored from a few different angles to build a more comprehensive comparison of the two cities.

## Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='whitegrid')
import missingno as msno
import altair as alt

## Load clean data

In [123]:
df = pd.read_csv('https://raw.githubusercontent.com/gsheara/Seattle-Weather/main/Seattle-Weather-clean.csv')
df.head()

Unnamed: 0,date,city,precipitation
0,2020-01-01,NYC,0.0
1,2020-01-02,NYC,0.0
2,2020-01-03,NYC,0.13
3,2020-01-04,NYC,0.16
4,2020-01-05,NYC,0.0


This dataset uses precipitation measurements collected by weather stations in New York City and Seattle from 2020 through 2023. In an attempt to isolate comparable data, only one weather station, located at each city's airport, has been used. The original data sets can be found here:

New York Data Set Link: https://github.com/gsheara/Seattle-Weather/blob/e34949f8fe3c82e2c1858e380be5d2c517d67f9f/ny_rain.csv

Seattle Data Set Link: https://github.com/gsheara/Seattle-Weather/blob/dbc8bb88f07899c70ba996b695aee64fb078f9e0/seattle_rain.csv
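
The `Unnamed: 0` column in the preview above looks like a row index that was saved along with the data. A minimal sketch of one way to load such a file without it, assuming the first CSV column is that saved index; `sample_csv` is a hypothetical inline stand-in for the real file, with made-up values:

```python
import io
import pandas as pd

# Hypothetical stand-in for the cleaned CSV: same columns as df.head(), made-up rows.
sample_csv = io.StringIO(
    ",date,city,precipitation\n"
    "0,2020-01-01,NYC,0.0\n"
    "1,2020-01-02,NYC,0.13\n"
    "2,2020-01-01,SEA,0.25\n"
    "3,2020-01-02,SEA,0.0\n"
)

# index_col=0 reads the saved index as the index instead of an 'Unnamed: 0' column;
# parse_dates converts the date strings to datetime64 values.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=['date'])

# Sanity check: number of distinct days recorded for each city.
days_per_city = sample.groupby('city')['date'].nunique()
print(sample.dtypes)
print(days_per_city)
```

Parsing the dates up front also keeps later aggregations from treating `date` as plain text.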

## State your questions
I intend to look at the data in two main ways:

1.   In which city is there a higher quantity of rain?
2.   In which city does it rain more often?

To answer question one, I will look not only at the overall amount of rainfall, but also at which city sees more heavy rainfall. For question two I will take a similar approach, comparing both the overall number of days with rain and the patterns in rainy-day rates throughout the year (e.g. is rain more common in certain months? Are rainfall patterns comparable between the two cities?).

## Analysis
#### 1a: In which city is there a higher quantity of rain?
This question focuses on the amount of rainfall each city sees. Chart 1 is a simple comparison of total inches of rainfall, revealing that NYC sees more rain, at 172.70 in, than Seattle, at 147.85 in. While the chart technically answers the question and gives a good overview of the two cities, it isn't a comprehensive or very insightful look at the data overall.

In [29]:
# Sum only the numeric column; summing the string 'date' column just concatenates text
df.groupby('city')[['precipitation']].sum()

city,precipitation
NYC,172.7
SEA,147.85


In [77]:
chart1 = alt.Chart(df).mark_bar().encode(
    alt.X('city:N', title='City'),
    alt.Y('precipitation:Q', title='Precipitation (in)', aggregate='sum'),
).properties(title='Chart 1: Total precipitation by city', width=100, height=400)

chart1

#### 1b: In which city is there heavier rainfall?
I also wanted to compare which city saw 'heavier' rain as opposed to lighter, sprinkling rainfall, but after some (admittedly limited) research online I found that the definition of heavy rainfall is fairly subjective: it varies by study and also depends on individual observation, such as wind making the rain feel heavier. The most consistent answer I found was 0.3 inches of rainfall per hour, with weather stations generally measuring for about 5 consecutive hours in a day. Multiplying the two gives 1.5 inches, so I decided to count any row (one day at one station) with at least 1.5 inches of rain as heavy rain.
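
The daily cutoff used in the chart filters below (`>= 1.5`) follows from 0.3 inches per hour over 5 hours. A rough sketch of that arithmetic and of flagging heavy days in pandas; the constant names and the `sample` rows are hypothetical, and the rate and hours are this notebook's working assumptions rather than an official standard:

```python
import pandas as pd

# Working assumptions from the text above, not an official definition.
HEAVY_RATE_IN_PER_HR = 0.3   # inches per hour treated as heavy rain
HOURS_MEASURED = 5           # consecutive hours assumed per day
HEAVY_DAY_THRESHOLD = HEAVY_RATE_IN_PER_HR * HOURS_MEASURED  # 1.5 inches per day

# Hypothetical daily totals, for illustration only.
sample = pd.DataFrame({
    'date': pd.to_datetime(['2021-01-03', '2021-01-04', '2021-08-22', '2021-08-23']),
    'city': ['SEA', 'SEA', 'NYC', 'NYC'],
    'precipitation': [1.62, 0.40, 2.10, 1.49],
})

# Flag each day that meets the threshold, then count flagged days per city.
sample['heavy'] = sample['precipitation'] >= HEAVY_DAY_THRESHOLD
heavy_days = sample.groupby('city')['heavy'].sum()
print(heavy_days)
```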

In [157]:
chart2 = alt.Chart(df).mark_circle(opacity=0.5).encode(
    alt.X('month(date):T', title='Month (2020-2023)'),
    alt.Y('precipitation:Q', title='Precipitation (in)'),
    alt.Color('city:N'),
    tooltip=['date:T', 'city', 'precipitation']
).transform_filter('datum.precipitation >= 1.5'
).properties(title='Chart 2: Days with heavy rain by month')

chart3 = alt.Chart(df).mark_bar().encode(
    alt.X('month(date):T', title='Month (2020-2023)'),
    alt.Y('count():Q'),
    alt.Color('city:N'),
    tooltip=['date:T', 'city', 'precipitation']
).transform_filter('datum.precipitation >= 1.5'
).properties(title='Chart 3: Number of days with heavy rain by month')

chart2 | chart3

Charts 2 and 3 both confirm that New York City sees not only more rainfall overall, but also more days with heavy rain than Seattle (12 in NYC versus 10 in Seattle). Chart 2 helps demonstrate that each city also seems to experience a different pattern in its heavy rainfall, with all but one of Seattle's heavy rain days happening in the winter months (December through February), and most of NYC's heavy rain days happening in the late summer to early fall months (July through October). While it may seem to be an error that a weather station recorded over 8 inches of rain in one day, it did in fact rain this much at JFK airport in September 2023.
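
The seasonal split described above can also be tabulated directly. A small sketch, assuming meteorological seasons (December through February as winter, and so on); the `heavy` rows are hypothetical and do not reproduce the real counts:

```python
import pandas as pd

# Hypothetical heavy-rain days, for illustration only.
heavy = pd.DataFrame({
    'date': pd.to_datetime(['2020-12-20', '2021-01-12', '2022-02-28',
                            '2021-07-08', '2021-09-01', '2023-09-29']),
    'city': ['SEA', 'SEA', 'SEA', 'NYC', 'NYC', 'NYC'],
})

# Map each calendar month to a meteorological season.
season_of_month = {12: 'winter', 1: 'winter', 2: 'winter',
                   3: 'spring', 4: 'spring', 5: 'spring',
                   6: 'summer', 7: 'summer', 8: 'summer',
                   9: 'fall', 10: 'fall', 11: 'fall'}
heavy['season'] = heavy['date'].dt.month.map(season_of_month)

# Rows are cities, columns are seasons, cells are counts of heavy-rain days.
by_season = pd.crosstab(heavy['city'], heavy['season'])
print(by_season)
```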

#### 2a: In which city does it rain the most often?

Next I wanted to see which city experienced the most rainy days, which could also be a justified interpretation of it "raining more" from a resident's perspective. I began as I did with question 1, starting with a basic overview of the total data.
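
As a numeric companion to the charts, rainy days can be counted with the same `precipitation > 0` rule used in the chart filters. A minimal sketch on hypothetical rows; the `rainy_days` and `share_of_days` column names are made up for illustration:

```python
import pandas as pd

# Hypothetical daily totals, four days per city.
sample = pd.DataFrame({
    'city': ['NYC'] * 4 + ['SEA'] * 4,
    'precipitation': [0.0, 0.13, 0.0, 0.16, 0.02, 0.0, 0.31, 0.05],
})

# A day counts as rainy if any precipitation was recorded.
sample['rainy'] = sample['precipitation'] > 0

# Summing booleans counts rainy days; the mean gives the share of days with rain.
summary = sample.groupby('city')['rainy'].agg(rainy_days='sum', share_of_days='mean')
print(summary)
```

Reporting the share alongside the count helps if the two stations have different numbers of recorded days.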

In [99]:
chart4 = alt.Chart(df).mark_bar().encode(
    alt.Y('count():Q'),
    alt.X('city:N', title='City')
).transform_filter('datum.precipitation > 0.00'
).properties(title='Chart 4: Total number of days with rainfall')

chart5 = alt.Chart(df).mark_bar().encode(
    alt.Y('count():Q'),
    alt.X('month(date):T', title='Month (2020-2023)')
).transform_filter('datum.precipitation > 0.00'
).properties(title='Chart 5: Total number of days with rainfall by month')


chart4 | chart5

#### 2b: Are there different patterns in rainfall in each city?

While NYC had a higher quantity of rain, Chart 4 suggests that Seattle has a higher number of rainy days. Chart 5 also shows an increase in rainy days in the winter months and a decrease in the summer months, which is similar to Seattle's pattern of days with heavy rain and dissimilar to NYC's (as shown in Chart 3).
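
The monthly counts behind charts like these can be built as a month-by-city table. A sketch of that aggregation in pandas, using hypothetical rows; as in the charts, days are pooled by calendar month across all years:

```python
import pandas as pd

# Hypothetical daily totals spanning several years.
sample = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-05', '2021-01-09', '2020-07-14', '2022-07-02',
                            '2020-01-06', '2021-01-10', '2023-01-11', '2021-07-20']),
    'city': ['NYC', 'NYC', 'NYC', 'NYC', 'SEA', 'SEA', 'SEA', 'SEA'],
    'precipitation': [0.20, 0.00, 0.45, 0.10, 0.30, 0.12, 0.08, 0.00],
})

# Keep rainy days only, then tag each with its calendar month (1-12).
rainy = sample[sample['precipitation'] > 0].copy()
rainy['month'] = rainy['date'].dt.month

# Count rows per (month, city) and spread cities into columns; missing pairs become 0.
monthly = rainy.groupby(['month', 'city']).size().unstack('city', fill_value=0)
print(monthly)
```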

In [125]:
chart6 = alt.Chart(df).mark_bar().encode(
    alt.Y('count():Q'),
    alt.X('month(date):T', title='Month (2020-2023)')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "SEA"'
).properties(title='Chart 6: Number of rainy days by month, Seattle')

chart7 = alt.Chart(df).mark_bar().encode(
    alt.Y('count():Q', scale=alt.Scale(domain=[0, 90])),
    alt.X('month(date):T', title='Month (2020-2023)')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "NYC"'
).properties(title='Chart 7: Number of rainy days by month, New York City')

chart6 | chart7

In [143]:
# Or, visualized differently:
chart6 = alt.Chart(df).mark_area(opacity=0.4, color='orange').encode(
    alt.Y('count():Q'),
    alt.X('month(date):T', title='Month (2020-2023)'),
    alt.Color('city:N')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "SEA"'
).properties(title='Chart 8: Number of rainy days by month, Seattle and NYC')

dots6= alt.Chart(df).mark_circle(opacity=0.4, color='black').encode(
    alt.Y('count():Q'),
    alt.X('month(date):T', title='Month (2020-2023)')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "SEA"'
).properties(title='Chart 8: Number of rainy days by month, Seattle and NYC')

line6 = alt.Chart(df).mark_line(opacity=0.4).encode(
    alt.Y('count():Q'),
    alt.X('month(date):T', title='Month (2020-2023)')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "SEA"'
).properties(title='Chart 8: Number of rainy days by month, Seattle and NYC')

text6 = line6.mark_text(dy= -7).encode(text='count():Q')

nyc6 = alt.Chart(df).mark_area(opacity=0.4).encode(
    alt.Y('count():Q', scale=alt.Scale(domain=[0, 90])),
    alt.X('month(date):T', title='Month (2020-2023)'),
    alt.Color('city:N')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "NYC"')

chart6 + dots6 + line6 + text6 + nyc6

In [151]:
chart7 = alt.Chart(df).mark_area(opacity=0.9, color='orange').encode(
    alt.Y('count():Q'),
    alt.X('month(date):T', title='Month (2020-2023)'),
    alt.Color('city:N')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "NYC"'
).properties(title='Chart 9: Number of rainy days by month, Seattle and NYC')

dots7= alt.Chart(df).mark_circle(opacity=0.4, color='black').encode(
    alt.Y('count():Q'),
    alt.X('month(date):T', title='Month (2020-2023)')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "NYC"')

line7 = alt.Chart(df).mark_line(opacity=0.4).encode(
    alt.Y('count():Q'),
    alt.X('month(date):T', title='Month (2020-2023)')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "NYC"')

text7 = line7.mark_text(dy= -7).encode(text='count():Q')

sea7 = alt.Chart(df).mark_area(opacity=0.2).encode(
    alt.Y('count():Q', scale=alt.Scale(domain=[0, 90])),
    alt.X('month(date):T', title='Month (2020-2023)'),
    alt.Color('city:N')
).transform_filter('datum.precipitation > 0.00'
).transform_filter('datum.city == "SEA"')

chart7 + dots7 + line7 + text7 + sea7

Charts 6 and 7 also confirm that not only is the overall number of rainy days higher in Seattle than in New York, but there are also more months in the year where Seattle sees more rainy days than NYC, which is consistent with Seattle's rainfall pattern throughout the year. The only exception is when Seattle's number of rainy days dips in the summer months, which is around the same time that NYC's number of days with heavy rain and overall number of rainy days increase.
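
The month-by-month comparison can be made explicit by subtracting one city's counts from the other's. A brief sketch, assuming a table of rainy-day counts per calendar month; the numbers in `monthly` are hypothetical and only illustrate the shape of the comparison:

```python
import pandas as pd

# Hypothetical rainy-day counts per calendar month, pooled over four years.
monthly = pd.DataFrame(
    {'NYC': [40, 38, 44, 47], 'SEA': [75, 45, 12, 70]},
    index=pd.Index([1, 5, 7, 11], name='month'),
)

# Positive values mean Seattle had more rainy days that month.
diff = monthly['SEA'] - monthly['NYC']
seattle_leads = diff[diff > 0].index.tolist()
nyc_leads = diff[diff < 0].index.tolist()
print('Seattle leads in months:', seattle_leads)
print('NYC leads in months:', nyc_leads)
```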

### Results for communication assignment

The graphs used in the communication assignment are Charts 1, 3, and 8 + 9.

## Conclusion

After exploring the data set, it seems that to a certain extent both cities can claim they experience more rain than the other. New York City sees a higher quantity of rainfall, having not only more inches of rainfall cumulatively but also more days with heavy rainfall (defined here as at least 1.5 inches in a day, based on 0.3 inches of rain per hour over about 5 hours). However, Seattle has more days with any rain at all, and also experiences a higher variation in the number of rainy days by month than New York City. Measured across the four years, there are more months where Seattle has more rainy days than New York, with the summer months being an exception. To conclude, if a New Yorker were hoping to visit Seattle but was wary of rain, the summer months should likely see a lower quantity of rainfall and a lower number of rainy days in Seattle than in New York.