# Weather Adjustment

**About Dataset** 

This repository contains a collection of weather event data covering 49 states in the United States. The dataset comprises about 8.6 million events, ranging from common occurrences such as rain and snow to extreme phenomena such as storms and freezing conditions. The data spans January 2016 to December 2022 and is sourced from 2,071 airport-based weather stations nationwide. For more detail about the dataset, refer to the official dataset page.

**Source**: https://www.kaggle.com/datasets/sobhanmoosavi/us-weather-events

The data processing steps using Python's pandas library are summarized below:

- Data is imported from a CSV file with the name "WeatherEvents_Jan2016-Dec2022.csv".
- The size of the data frame is inspected.
- The first few lines are previewed.
- The "StartTime(UTC)" column is converted to a date-time object.
- A new data frame, "filtered_weather", is created by selecting the relevant columns.
- New columns are created by parsing the date information ("StartDate" and "Date").
- The "StartTime(UTC)" column, which is no longer needed, is removed.
- After grouping by city and date, the most frequent weather type in each group is selected.
- The results are saved in a new data frame named "most_common_daily_types".
- The resulting data frame is exported as "filtered_weather_2016_2018.csv".


These cleaning and aggregation steps produce a dataset that records the most common weather type for each city on each day.
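The intermediate steps in the list above (date-time conversion, column selection, adding a "Date" column, dropping "StartTime(UTC)") are not shown as notebook cells. The following is a minimal sketch of how they might look, not the original code. The helper name `prepare_weather` and the exact set of kept columns are assumptions made for illustration, and the two sample rows are taken from the preview output.

```python
import pandas as pd


def prepare_weather(weather: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse timestamps, keep a few columns, add a Date column."""
    out = weather.copy()
    # Convert the start time from text to a date-time object.
    out["StartTime(UTC)"] = pd.to_datetime(out["StartTime(UTC)"])
    # The selected columns are an assumption; the original selection is not shown.
    out = out[["City", "Type", "Severity", "StartTime(UTC)"]]
    # Derive the calendar date, then drop the timestamp column.
    out = out.assign(Date=out["StartTime(UTC)"].dt.date)
    return out.drop(columns=["StartTime(UTC)"])


# Two rows copied from the preview of the raw data.
sample = pd.DataFrame({
    "City": ["Saguache", "Saguache"],
    "Type": ["Snow", "Snow"],
    "Severity": ["Light", "Light"],
    "StartTime(UTC)": ["2016-01-06 23:14:00", "2016-01-07 04:14:00"],
})
prepared = prepare_weather(sample)
print(prepared)
```

Using `assign` on the selected columns avoids modifying a slice of the original frame in place.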

In [12]:
import pandas as pd

In [13]:
weather = pd.read_csv("WeatherEvents_Jan2016-Dec2022.csv")

In [14]:
weather.shape

(8627181, 14)

In [28]:
weather.head()

|   | EventId | Type | Severity | StartTime(UTC) | EndTime(UTC) | Precipitation(in) | TimeZone | AirportCode | LocationLat | LocationLng | City | County | State | ZipCode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W-1 | Snow | Light | 2016-01-06 23:14:00 | 2016-01-07 00:34:00 | 0.0 | US/Mountain | K04V | 38.0972 | -106.1689 | Saguache | Saguache | CO | 81149.0 |
| 1 | W-2 | Snow | Light | 2016-01-07 04:14:00 | 2016-01-07 04:54:00 | 0.0 | US/Mountain | K04V | 38.0972 | -106.1689 | Saguache | Saguache | CO | 81149.0 |
| 2 | W-3 | Snow | Light | 2016-01-07 05:54:00 | 2016-01-07 15:34:00 | 0.03 | US/Mountain | K04V | 38.0972 | -106.1689 | Saguache | Saguache | CO | 81149.0 |
| 3 | W-4 | Snow | Light | 2016-01-08 05:34:00 | 2016-01-08 05:54:00 | 0.0 | US/Mountain | K04V | 38.0972 | -106.1689 | Saguache | Saguache | CO | 81149.0 |
| 4 | W-5 | Snow | Light | 2016-01-08 13:54:00 | 2016-01-08 15:54:00 | 0.0 | US/Mountain | K04V | 38.0972 | -106.1689 | Saguache | Saguache | CO | 81149.0 |


---------------------------

**Grouping operation to find the most common daily weather types**

In [18]:
most_common_daily_types = filtered_weather.groupby(['City', filtered_weather['StartTime(UTC)'].dt.date])['Type'].apply(lambda x: x.value_counts().idxmax()).reset_index()
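To make the grouping step easier to follow, here is a small self-contained sketch of the same pattern on a toy frame. The frame `events` and its rows are made up for illustration and are not part of the dataset. Within each (city, date) group, `value_counts()` counts the weather types and `idxmax()` returns the most frequent one.

```python
import pandas as pd

# Toy events, invented for illustration only.
events = pd.DataFrame({
    "City": ["Saguache", "Saguache", "Saguache", "Saguache",
             "Oakes", "Oakes", "Oakes"],
    "StartTime(UTC)": pd.to_datetime([
        "2016-01-07 04:14:00", "2016-01-07 05:54:00", "2016-01-07 10:00:00",
        "2016-01-08 05:34:00",
        "2018-03-11 01:00:00", "2018-03-11 06:00:00", "2018-03-11 12:00:00",
    ]),
    "Type": ["Snow", "Snow", "Fog", "Rain", "Fog", "Fog", "Rain"],
})

# Group by city and calendar date, then keep the most frequent type per group.
daily = (
    events.groupby(["City", events["StartTime(UTC)"].dt.date])["Type"]
    .apply(lambda x: x.value_counts().idxmax())
    .reset_index()
)
daily.columns = ["City", "Date", "Type"]
print(daily)
```

When two types are tied within a group, which one is returned depends on the order produced by `value_counts()`, so ties may deserve explicit handling if they matter for the analysis.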

----------------------------

In [19]:
most_common_daily_types.columns = ['City', 'Date', 'Type']

In [26]:
most_common_daily_types.sample(14)

|   | City | Date | Type |
|---|---|---|---|
| 605540 | Oakes | 2018-03-11 | Fog |
| 481354 | Luray | 2018-02-07 | Rain |
| 132702 | Chandler | 2016-09-01 | Rain |
| 98148 | Brooksville | 2017-07-09 | Rain |
| 113763 | Camden | 2018-09-26 | Rain |
| 442626 | Lamoni | 2017-09-03 | Fog |
| 60214 | Belgrade | 2017-01-05 | Cold |
| 467892 | Llano South | 2017-12-17 | Rain |
| 515743 | Meadville | 2017-10-16 | Rain |
| 874883 | West Chester | 2018-01-08 | Snow |


In [24]:
most_common_daily_types.isnull().sum()

City    0
Date    0
Type    0
dtype: int64

In [25]:
most_common_daily_types.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 908359 entries, 0 to 908358
Data columns (total 3 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   City    908359 non-null  object
 1   Date    908359 non-null  object
 2   Type    908359 non-null  object
dtypes: object(3)
memory usage: 20.8+ MB


In [27]:
most_common_daily_types.to_csv("filtered_weather_2016_2018.csv", index=False)
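The `info()` output shows that "Date" is stored with dtype `object`, and a CSV file does not preserve column types in any case. When the exported file is loaded again, the dates can be parsed on read. The sketch below assumes a two-row stand-in frame (rows copied from the sample output) and writes to an in-memory buffer rather than to "filtered_weather_2016_2018.csv", purely to stay self-contained.

```python
import datetime
import io

import pandas as pd

# Stand-in for most_common_daily_types, using two rows from the sample output.
daily = pd.DataFrame({
    "City": ["Oakes", "Luray"],
    "Date": [datetime.date(2018, 3, 11), datetime.date(2018, 2, 7)],
    "Type": ["Fog", "Rain"],
})

# Write to an in-memory buffer instead of a file on disk (an assumption
# made for this sketch only).
buffer = io.StringIO()
daily.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates turns the Date column into datetime64 values on load.
reloaded = pd.read_csv(buffer, parse_dates=["Date"])
print(reloaded.dtypes)
```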