# Weather in Szeged 2006-2016

#### Weather is not just small talk; it's a global conversation that affects us all, influencing everything from our daily plans to long-term climate strategies. The increasing attention to weather and climate change highlights our shared responsibility to protect the environment and adapt to the challenges of a changing world.

### Import importand libraries and read data 

In [21]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df=pd.read_csv(r"weatherHistory.csv")
df

Unnamed: 0,Formatted Date,Summary,Precip Type,Temperature (C),Apparent Temperature (C),Humidity,Wind Speed (km/h),Wind Bearing (degrees),Visibility (km),Loud Cover,Pressure (millibars),Daily Summary
0,2006-04-01 00:00:00.000 +0200,Partly Cloudy,rain,9.472222,7.388889,0.89,14.1197,251.0,15.8263,0.0,1015.13,Partly cloudy throughout the day.
1,2006-04-01 01:00:00.000 +0200,Partly Cloudy,rain,9.355556,7.227778,0.86,14.2646,259.0,15.8263,0.0,1015.63,Partly cloudy throughout the day.
2,2006-04-01 02:00:00.000 +0200,Mostly Cloudy,rain,9.377778,9.377778,0.89,3.9284,204.0,14.9569,0.0,1015.94,Partly cloudy throughout the day.
3,2006-04-01 03:00:00.000 +0200,Partly Cloudy,rain,8.288889,5.944444,0.83,14.1036,269.0,15.8263,0.0,1016.41,Partly cloudy throughout the day.
4,2006-04-01 04:00:00.000 +0200,Mostly Cloudy,rain,8.755556,6.977778,0.83,11.0446,259.0,15.8263,0.0,1016.51,Partly cloudy throughout the day.
...,...,...,...,...,...,...,...,...,...,...,...,...
96448,2016-09-09 19:00:00.000 +0200,Partly Cloudy,rain,26.016667,26.016667,0.43,10.9963,31.0,16.1000,0.0,1014.36,Partly cloudy starting in the morning.
96449,2016-09-09 20:00:00.000 +0200,Partly Cloudy,rain,24.583333,24.583333,0.48,10.0947,20.0,15.5526,0.0,1015.16,Partly cloudy starting in the morning.
96450,2016-09-09 21:00:00.000 +0200,Partly Cloudy,rain,22.038889,22.038889,0.56,8.9838,30.0,16.1000,0.0,1015.66,Partly cloudy starting in the morning.
96451,2016-09-09 22:00:00.000 +0200,Partly Cloudy,rain,21.522222,21.522222,0.60,10.5294,20.0,16.1000,0.0,1015.95,Partly cloudy starting in the morning.


### Cleaning data and preprocessing

In [3]:
print (df.info())
print ("-----------------------------------------")
print (df.shape)
print ("-----------------------------------------")
print (df.isna().sum())
print ("-----------------------------------------")
print (df.duplicated().sum())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96453 entries, 0 to 96452
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Formatted Date            96453 non-null  object 
 1   Summary                   96453 non-null  object 
 2   Precip Type               95936 non-null  object 
 3   Temperature (C)           96453 non-null  float64
 4   Apparent Temperature (C)  96453 non-null  float64
 5   Humidity                  96453 non-null  float64
 6   Wind Speed (km/h)         96453 non-null  float64
 7   Wind Bearing (degrees)    96453 non-null  float64
 8   Visibility (km)           96453 non-null  float64
 9   Loud Cover                96453 non-null  float64
 10  Pressure (millibars)      96453 non-null  float64
 11  Daily Summary             96453 non-null  object 
dtypes: float64(8), object(4)
memory usage: 8.8+ MB
None
-----------------------------------------
(96453, 12)
----------

In [4]:
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)

In [5]:
print (df.info())
print ("-----------------------------------------")
print (df.shape)
print ("-----------------------------------------")
print (df.isna().sum())
print ("-----------------------------------------")
print (df.duplicated().sum())

<class 'pandas.core.frame.DataFrame'>
Index: 95912 entries, 0 to 96452
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Formatted Date            95912 non-null  object 
 1   Summary                   95912 non-null  object 
 2   Precip Type               95912 non-null  object 
 3   Temperature (C)           95912 non-null  float64
 4   Apparent Temperature (C)  95912 non-null  float64
 5   Humidity                  95912 non-null  float64
 6   Wind Speed (km/h)         95912 non-null  float64
 7   Wind Bearing (degrees)    95912 non-null  float64
 8   Visibility (km)           95912 non-null  float64
 9   Loud Cover                95912 non-null  float64
 10  Pressure (millibars)      95912 non-null  float64
 11  Daily Summary             95912 non-null  object 
dtypes: float64(8), object(4)
memory usage: 9.5+ MB
None
-----------------------------------------
(95912, 12)
---------------

In [6]:
# Convert Date column type from object to datetime and set as index
df['Formatted Date']=pd.to_datetime(df['Formatted Date'])
df['Formatted Date']=df['Formatted Date'].apply(lambda x:x.timestamp())
df

  df['Formatted Date']=pd.to_datetime(df['Formatted Date'])


Unnamed: 0,Formatted Date,Summary,Precip Type,Temperature (C),Apparent Temperature (C),Humidity,Wind Speed (km/h),Wind Bearing (degrees),Visibility (km),Loud Cover,Pressure (millibars),Daily Summary
0,1.143842e+09,Partly Cloudy,rain,9.472222,7.388889,0.89,14.1197,251.0,15.8263,0.0,1015.13,Partly cloudy throughout the day.
1,1.143846e+09,Partly Cloudy,rain,9.355556,7.227778,0.86,14.2646,259.0,15.8263,0.0,1015.63,Partly cloudy throughout the day.
2,1.143850e+09,Mostly Cloudy,rain,9.377778,9.377778,0.89,3.9284,204.0,14.9569,0.0,1015.94,Partly cloudy throughout the day.
3,1.143853e+09,Partly Cloudy,rain,8.288889,5.944444,0.83,14.1036,269.0,15.8263,0.0,1016.41,Partly cloudy throughout the day.
4,1.143857e+09,Mostly Cloudy,rain,8.755556,6.977778,0.83,11.0446,259.0,15.8263,0.0,1016.51,Partly cloudy throughout the day.
...,...,...,...,...,...,...,...,...,...,...,...,...
96448,1.473440e+09,Partly Cloudy,rain,26.016667,26.016667,0.43,10.9963,31.0,16.1000,0.0,1014.36,Partly cloudy starting in the morning.
96449,1.473444e+09,Partly Cloudy,rain,24.583333,24.583333,0.48,10.0947,20.0,15.5526,0.0,1015.16,Partly cloudy starting in the morning.
96450,1.473448e+09,Partly Cloudy,rain,22.038889,22.038889,0.56,8.9838,30.0,16.1000,0.0,1015.66,Partly cloudy starting in the morning.
96451,1.473451e+09,Partly Cloudy,rain,21.522222,21.522222,0.60,10.5294,20.0,16.1000,0.0,1015.95,Partly cloudy starting in the morning.


In [7]:
print(df['Summary'].value_counts())
print ("----------------------------------------")
print (df['Daily Summary'].value_counts())

Summary
Partly Cloudy                          31628
Mostly Cloudy                          27914
Overcast                               16516
Clear                                  10746
Foggy                                   7117
Breezy and Overcast                      528
Breezy and Mostly Cloudy                 516
Breezy and Partly Cloudy                 386
Dry and Partly Cloudy                     86
Windy and Partly Cloudy                   67
Light Rain                                63
Breezy                                    54
Windy and Overcast                        45
Humid and Mostly Cloudy                   40
Drizzle                                   39
Breezy and Foggy                          35
Windy and Mostly Cloudy                   35
Dry                                       34
Humid and Partly Cloudy                   17
Dry and Mostly Cloudy                     14
Rain                                      10
Windy                                      8
Hu

In [8]:
# convet categorical data to numerical 
summary=pd.get_dummies(df['Summary'],drop_first=True,dtype='i')
daily_summary=pd.get_dummies(df['Daily Summary'],drop_first=True,dtype='i')
precip_type=pd.get_dummies(df['Precip Type'],drop_first=True,dtype='i')

In [9]:
df=pd.concat([df,summary,daily_summary,precip_type],axis=1)
df.drop(columns=['Summary'], inplace=True)
df.drop(columns=['Daily Summary'], inplace=True)
df.drop(columns=['Precip Type'], inplace=True)
df

Unnamed: 0,Formatted Date,Temperature (C),Apparent Temperature (C),Humidity,Wind Speed (km/h),Wind Bearing (degrees),Visibility (km),Loud Cover,Pressure (millibars),Breezy and Dry,...,Partly cloudy until morning.,Partly cloudy until night and breezy in the afternoon.,Partly cloudy until night and breezy in the morning.,Partly cloudy until night and breezy starting in the morning continuing until afternoon.,Partly cloudy until night.,Rain throughout the day.,Rain until afternoon.,Rain until morning.,Windy in the afternoon.,snow
0,1.143842e+09,9.472222,7.388889,0.89,14.1197,251.0,15.8263,0.0,1015.13,0,...,0,0,0,0,0,0,0,0,0,0
1,1.143846e+09,9.355556,7.227778,0.86,14.2646,259.0,15.8263,0.0,1015.63,0,...,0,0,0,0,0,0,0,0,0,0
2,1.143850e+09,9.377778,9.377778,0.89,3.9284,204.0,14.9569,0.0,1015.94,0,...,0,0,0,0,0,0,0,0,0,0
3,1.143853e+09,8.288889,5.944444,0.83,14.1036,269.0,15.8263,0.0,1016.41,0,...,0,0,0,0,0,0,0,0,0,0
4,1.143857e+09,8.755556,6.977778,0.83,11.0446,259.0,15.8263,0.0,1016.51,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96448,1.473440e+09,26.016667,26.016667,0.43,10.9963,31.0,16.1000,0.0,1014.36,0,...,0,0,0,0,0,0,0,0,0,0
96449,1.473444e+09,24.583333,24.583333,0.48,10.0947,20.0,15.5526,0.0,1015.16,0,...,0,0,0,0,0,0,0,0,0,0
96450,1.473448e+09,22.038889,22.038889,0.56,8.9838,30.0,16.1000,0.0,1015.66,0,...,0,0,0,0,0,0,0,0,0,0
96451,1.473451e+09,21.522222,21.522222,0.60,10.5294,20.0,16.1000,0.0,1015.95,0,...,0,0,0,0,0,0,0,0,0,0


In [10]:
# Split data to x and y
x=df.drop('Temperature (C)',axis=1)
y=df.iloc[:,2]

In [11]:
# Scalling data
from sklearn.preprocessing import MinMaxScaler
scl=MinMaxScaler()
x_scl=scl.fit_transform(x)

In [12]:
# Split data to train and test
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.20)

In [13]:
# Build model
from sklearn.linear_model import LinearRegression
lr=LinearRegression()
lr.fit(x_train,y_train)

In [14]:
# Evalute the model
from sklearn.metrics import mean_absolute_error
y_pred=lr.predict(x_test)
mae = mean_absolute_error(y_test, y_pred)
print ("mean absolute error using Multiregression",mae) # accuracy

mean absolute error using Multiregression 2.696804079474428e-10


In [15]:
### Polynomial regression

In [16]:
# Deep Learning 

In [17]:
# Build model
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    Dense(128,activation='relu',input_dim=248),
    Dense(64,activation='relu'),
    Dense(32,activation='relu'),
    Dense(1,activation=None)
])

In [18]:
from tensorflow.keras.metrics import RootMeanSquaredError

model.compile(
    optimizer='adam',
    loss=['mean_absolute_error','binary_crossentropy'],
    metrics=[RootMeanSquaredError()]
)

model.fit(
    x_train,
    y_train,
    validation_data=(x_test,y_test),
    epochs=10,
    batch_size=32
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x280c144abf0>

In [19]:
# Evalute the model

from sklearn.metrics import mean_absolute_error

# Predict on test data
y_pred = model.predict(x_test)

# Calculate Mean Absolute Error
mae = mean_absolute_error(y_test, y_pred)
print("mean absolute error using Deep learning:",mae)

mean absolute error using Deep learning: 8.993387584658329
