# Temperature Forecast Project using ML

## Project Description
This dataset supports bias correction of the next-day maximum and minimum air temperature forecasts produced by the LDAPS (Local Data Assimilation and Prediction System) model, which the Korea Meteorological Administration operates over Seoul, South Korea. It covers the summers of 2013 to 2017. The inputs consist mainly of the LDAPS model's next-day forecasts, the in-situ maximum and minimum temperatures observed on the present day, and auxiliary geographic variables. There are two outputs: the next-day maximum and minimum air temperatures. Hindcast validation was conducted for the period from 2015 to 2017.

Attribute Information:
For more information, see [Cho et al., 2020].

    1. station - Weather station number: 1 to 25
    2. Date - Present day: yyyy-mm-dd ('2013-06-30' to '2017-08-30'); stored in the CSV as dd-mm-yyyy, e.g. 30-06-2013
    3. Present_Tmax - Maximum air temperature between 0 and 21 h on the present day (°C): 20 to 37.6
    4. Present_Tmin - Minimum air temperature between 0 and 21 h on the present day (°C): 11.3 to 29.9
    5. LDAPS_RHmin - LDAPS model forecast of next-day minimum relative humidity (%): 19.8 to 98.5
    6. LDAPS_RHmax - LDAPS model forecast of next-day maximum relative humidity (%): 58.9 to 100
    7. LDAPS_Tmax_lapse - LDAPS model forecast of next-day maximum air temperature with the lapse rate applied (°C): 17.6 to 38.5
    8. LDAPS_Tmin_lapse - LDAPS model forecast of next-day minimum air temperature with the lapse rate applied (°C): 14.3 to 29.6
    9. LDAPS_WS - LDAPS model forecast of next-day average wind speed (m/s): 2.9 to 21.9
    10. LDAPS_LH - LDAPS model forecast of next-day average latent heat flux (W/m2): -13.6 to 213.4
    11. LDAPS_CC1 - LDAPS model forecast of next-day 1st 6-hour split average cloud cover (0-5 h) (%): 0 to 0.97
    12. LDAPS_CC2 - LDAPS model forecast of next-day 2nd 6-hour split average cloud cover (6-11 h) (%): 0 to 0.97
    13. LDAPS_CC3 - LDAPS model forecast of next-day 3rd 6-hour split average cloud cover (12-17 h) (%): 0 to 0.98
    14. LDAPS_CC4 - LDAPS model forecast of next-day 4th 6-hour split average cloud cover (18-23 h) (%): 0 to 0.97
    15. LDAPS_PPT1 - LDAPS model forecast of next-day 1st 6-hour split average precipitation (0-5 h) (%): 0 to 23.7
    16. LDAPS_PPT2 - LDAPS model forecast of next-day 2nd 6-hour split average precipitation (6-11 h) (%): 0 to 21.6
    17. LDAPS_PPT3 - LDAPS model forecast of next-day 3rd 6-hour split average precipitation (12-17 h) (%): 0 to 15.8
    18. LDAPS_PPT4 - LDAPS model forecast of next-day 4th 6-hour split average precipitation (18-23 h) (%): 0 to 16.7
    19. lat - Latitude (°): 37.456 to 37.645
    20. lon - Longitude (°): 126.826 to 127.135
    21. DEM - Elevation (m): 12.4 to 212.3
    22. Slope - Slope (°): 0.1 to 5.2
    23. Solar radiation - Daily incoming solar radiation (Wh/m2): 4329.5 to 5992.9
    24. Next_Tmax - The next-day maximum air temperature (°C): 17.4 to 38.9
    25. Next_Tmin - The next-day minimum air temperature (°C): 11.3 to 29.8
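The ranges above double as a cheap sanity check on the loaded data. The sketch below copies a handful of them into a dictionary (the names `DOCUMENTED_RANGES` and `out_of_range_counts` are illustrative, not part of the original notebook) and counts values that fall outside each range. A small tolerance is assumed because the listed ranges are rounded: the summary statistics further down report a maximum LDAPS_RHmax of 100.000153 against a documented upper bound of 100.

```python
import pandas as pd

# A subset of the documented ranges from the attribute list (extend as needed).
DOCUMENTED_RANGES = {
    "Present_Tmax": (20.0, 37.6),
    "Present_Tmin": (11.3, 29.9),
    "LDAPS_RHmin": (19.8, 98.5),
    "LDAPS_RHmax": (58.9, 100.0),
    "Next_Tmax": (17.4, 38.9),
    "Next_Tmin": (11.3, 29.8),
}


def out_of_range_counts(df: pd.DataFrame, ranges: dict, tol: float = 0.05) -> pd.Series:
    """Count non-missing values per column outside [low - tol, high + tol].

    Columns listed in `ranges` but absent from `df` are skipped.
    """
    counts = {}
    for col, (low, high) in ranges.items():
        if col in df.columns:
            values = df[col].dropna()
            counts[col] = int(((values < low - tol) | (values > high + tol)).sum())
    return pd.Series(counts, dtype="int64")


# Hand-made example: the second Present_Tmax value is deliberately implausible.
sample = pd.DataFrame({
    "Present_Tmax": [28.7, 45.0],
    "Present_Tmin": [21.4, 21.6],
    "LDAPS_RHmax": [91.116364, 100.000153],
})
print(out_of_range_counts(sample, DOCUMENTED_RANGES))
```

On the real data the call would be `out_of_range_counts(df, DOCUMENTED_RANGES)`; any non-zero count points at rows worth inspecting before training.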

The task is to build two separate models based on the features in the dataset: one that predicts the next-day maximum temperature (Next_Tmax) and one that predicts the next-day minimum temperature (Next_Tmin).
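Since the project is framed as bias correction of LDAPS forecasts, a natural reference point for the Next_Tmax model is the raw LDAPS forecast (LDAPS_Tmax_lapse) itself: a trained model is only useful if its error is lower than this baseline. The sketch below computes the mean bias and the mean absolute error (MAE) of the raw forecast using the five rows that `df.head()` prints further down. Five rows are far too few to be representative; on the real data the same lines would be applied to the full columns, and to LDAPS_Tmin_lapse vs Next_Tmin for the minimum-temperature model.

```python
import pandas as pd

# The two relevant columns of the first five rows shown by df.head().
rows = pd.DataFrame({
    "LDAPS_Tmax_lapse": [28.074101, 29.850689, 30.091292, 29.704629, 29.113934],
    "Next_Tmax": [29.1, 30.5, 31.1, 31.7, 31.2],
})

# Forecast error of the raw LDAPS model: observed minus forecast.
error = rows["Next_Tmax"] - rows["LDAPS_Tmax_lapse"]

mean_bias = error.mean()      # systematic offset that bias correction aims to remove
mae_raw = error.abs().mean()  # baseline MAE that a trained model should beat

print(f"Mean bias: {mean_bias:.3f} °C, raw MAE: {mae_raw:.3f} °C")
```

In these five rows every error is positive (LDAPS under-forecasts), so the mean bias and the MAE coincide; on the full dataset they generally will not.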

Dataset link:

https://github.com/dsrscientist/Dataset2/blob/main/temperature.csv


In [1]:
```python
# Step 1: Data loading and exploration
import pandas as pd

# Load the dataset from the raw CSV URL (the GitHub "blob" link above serves an HTML page)
url = "https://raw.githubusercontent.com/dsrscientist/Dataset2/main/temperature.csv"
df = pd.read_csv(url)

df.head()
```

| | station | Date | Present_Tmax | Present_Tmin | LDAPS_RHmin | LDAPS_RHmax | LDAPS_Tmax_lapse | LDAPS_Tmin_lapse | LDAPS_WS | LDAPS_LH | ... | LDAPS_PPT2 | LDAPS_PPT3 | LDAPS_PPT4 | lat | lon | DEM | Slope | Solar radiation | Next_Tmax | Next_Tmin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 30-06-2013 | 28.7 | 21.4 | 58.255688 | 91.116364 | 28.074101 | 23.006936 | 6.818887 | 69.451805 | ... | 0.0 | 0.0 | 0.0 | 37.6046 | 126.991 | 212.335 | 2.785 | 5992.895996 | 29.1 | 21.2 |
| 1 | 2.0 | 30-06-2013 | 31.9 | 21.6 | 52.263397 | 90.604721 | 29.850689 | 24.035009 | 5.69189 | 51.937448 | ... | 0.0 | 0.0 | 0.0 | 37.6046 | 127.032 | 44.7624 | 0.5141 | 5869.3125 | 30.5 | 22.5 |
| 2 | 3.0 | 30-06-2013 | 31.6 | 23.3 | 48.690479 | 83.973587 | 30.091292 | 24.565633 | 6.138224 | 20.57305 | ... | 0.0 | 0.0 | 0.0 | 37.5776 | 127.058 | 33.3068 | 0.2661 | 5863.555664 | 31.1 | 23.9 |
| 3 | 4.0 | 30-06-2013 | 32.0 | 23.4 | 58.239788 | 96.483688 | 29.704629 | 23.326177 | 5.65005 | 65.727144 | ... | 0.0 | 0.0 | 0.0 | 37.645 | 127.022 | 45.716 | 2.5348 | 5856.964844 | 31.7 | 24.3 |
| 4 | 5.0 | 30-06-2013 | 31.4 | 21.9 | 56.174095 | 90.155128 | 29.113934 | 23.48648 | 5.735004 | 107.965535 | ... | 0.0 | 0.0 | 0.0 | 37.5507 | 127.135 | 35.038 | 0.5055 | 5859.552246 | 31.2 | 22.5 |
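The `Date` column in the raw file is stored as day-month-year strings (`30-06-2013`), while the attribute list describes the dates as yyyy-mm-dd. If dates are needed later, for example for a time-based split, parsing them with an explicit format avoids ambiguous day/month inference. The sketch below assumes every non-missing `Date` uses the `%d-%m-%Y` layout seen in `df.head()`; `pd.to_datetime` raises an error on any value that does not match, and missing entries become `NaT`.

```python
import io

import pandas as pd

# A two-row excerpt in the same shape as the raw CSV (values taken from df.head()).
raw = io.StringIO(
    "station,Date,Present_Tmax\n"
    "1.0,30-06-2013,28.7\n"
    "2.0,30-06-2013,31.9\n"
)
df_sample = pd.read_csv(raw)

# Parse with an explicit day-first format so "30-06-2013" becomes 2013-06-30
# instead of relying on pandas' format inference.
df_sample["Date"] = pd.to_datetime(df_sample["Date"], format="%d-%m-%Y")
print(df_sample.dtypes["Date"], df_sample["Date"].iloc[0].date())
```

On the real data, the equivalent call would be `pd.to_datetime(df["Date"], format="%d-%m-%Y")`.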


In [2]:
```python
# Display information about the dataset
print("\nInformation about the dataset:")
df.info()
```

```text
Information about the dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7752 entries, 0 to 7751
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   station           7750 non-null   float64
 1   Date              7750 non-null   object 
 2   Present_Tmax      7682 non-null   float64
 3   Present_Tmin      7682 non-null   float64
 4   LDAPS_RHmin       7677 non-null   float64
 5   LDAPS_RHmax       7677 non-null   float64
 6   LDAPS_Tmax_lapse  7677 non-null   float64
 7   LDAPS_Tmin_lapse  7677 non-null   float64
 8   LDAPS_WS          7677 non-null   float64
 9   LDAPS_LH          7677 non-null   float64
 10  LDAPS_CC1         7677 non-null   float64
 11  LDAPS_CC2         7677 non-null   float64
 12  LDAPS_CC3         7677 non-null   float64
 13  LDAPS_CC4         7677 non-null   float64
 14  LDAPS_PPT1        7677 non-null   float64
 15  LDAPS_PPT2        7677 non-null   float64
 16  LDAPS_PPT3
```
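The non-null counts reported by `df.info()` differ between columns (7750, 7682 and 7677 out of 7752 rows), so missing values are spread across several columns; this is why `dropna()` in Step 2 shrinks the data from 7752 to 7588 rows. A compact per-column summary is easier to scan than the full `info()` listing. The helper name `missing_report` is illustrative, and it is shown here on a tiny hand-made frame rather than on the real data.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, largest first.

    Columns without any missing values are omitted.
    """
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Small illustrative frame; on the real data call missing_report(df).
demo = pd.DataFrame({
    "station": [1.0, 2.0, np.nan, 4.0],
    "Present_Tmax": [28.7, np.nan, np.nan, 32.0],
    "lat": [37.6046, 37.6046, 37.5776, 37.645],
})
print(missing_report(demo))
```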

In [3]:
```python
# Summary statistics of numerical columns
print("\nSummary statistics of numerical columns:")
df.describe()
```

```text
Summary statistics of numerical columns:
```

| | station | Present_Tmax | Present_Tmin | LDAPS_RHmin | LDAPS_RHmax | LDAPS_Tmax_lapse | LDAPS_Tmin_lapse | LDAPS_WS | LDAPS_LH | LDAPS_CC1 | ... | LDAPS_PPT2 | LDAPS_PPT3 | LDAPS_PPT4 | lat | lon | DEM | Slope | Solar radiation | Next_Tmax | Next_Tmin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7750.0 | 7682.0 | 7682.0 | 7677.0 | 7677.0 | 7677.0 | 7677.0 | 7677.0 | 7677.0 | 7677.0 | ... | 7677.0 | 7677.0 | 7677.0 | 7752.0 | 7752.0 | 7752.0 | 7752.0 | 7752.0 | 7725.0 | 7725.0 |
| mean | 13.0 | 29.768211 | 23.225059 | 56.759372 | 88.374804 | 29.613447 | 23.512589 | 7.097875 | 62.505019 | 0.368774 | ... | 0.485003 | 0.2782 | 0.269407 | 37.544722 | 126.991397 | 61.867972 | 1.257048 | 5341.502803 | 30.274887 | 22.93222 |
| std | 7.211568 | 2.969999 | 2.413961 | 14.668111 | 7.192004 | 2.947191 | 2.345347 | 2.183836 | 33.730589 | 0.262458 | ... | 1.762807 | 1.161809 | 1.206214 | 0.050352 | 0.079435 | 54.27978 | 1.370444 | 429.158867 | 3.12801 | 2.487613 |
| min | 1.0 | 20.0 | 11.3 | 19.794666 | 58.936283 | 17.624954 | 14.272646 | 2.88258 | -13.603212 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 37.4562 | 126.826 | 12.37 | 0.098475 | 4329.520508 | 17.4 | 11.3 |
| 25% | 7.0 | 27.8 | 21.7 | 45.963543 | 84.222862 | 27.673499 | 22.089739 | 5.678705 | 37.266753 | 0.146654 | ... | 0.0 | 0.0 | 0.0 | 37.5102 | 126.937 | 28.7 | 0.2713 | 4999.018555 | 28.2 | 21.3 |
| 50% | 13.0 | 29.9 | 23.4 | 55.039024 | 89.79348 | 29.703426 | 23.760199 | 6.54747 | 56.865482 | 0.315697 | ... | 0.0 | 0.0 | 0.0 | 37.5507 | 126.995 | 45.716 | 0.618 | 5436.345215 | 30.5 | 23.1 |
| 75% | 19.0 | 32.0 | 24.9 | 67.190056 | 93.743629 | 31.71045 | 25.152909 | 8.032276 | 84.223616 | 0.575489 | ... | 0.018364 | 0.007896 | 4.1e-05 | 37.5776 | 127.042 | 59.8324 | 1.7678 | 5728.316406 | 32.6 | 24.6 |
| max | 25.0 | 37.6 | 29.9 | 98.524734 | 100.000153 | 38.542255 | 29.619342 | 21.857621 | 213.414006 | 0.967277 | ... | 21.621661 | 15.841235 | 16.655469 | 37.645 | 127.135 | 212.335 | 5.17823 | 5992.895996 | 38.9 | 29.8 |


In [4]:
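```python
# Step 2: Data preprocessing
# Drop every row that contains at least one missing value
df.dropna(inplace=True)
# Drop the Date column, which the models below do not use
df.drop(columns=['Date'], inplace=True)
```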
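Dropping `Date` discards seasonal information that may help the models, since the data covers only the summer months. One option, sketched below under the assumption that the `%d-%m-%Y` format seen in `df.head()` holds throughout, is to derive numeric calendar features before the column is dropped. The feature names `month` and `day_of_year` are illustrative choices, not part of the original notebook, and whether they improve accuracy would need to be checked on held-out data.

```python
import pandas as pd

# Two documented dates (start and end of the data); on the real data,
# apply the same steps to df *before* the Date column is dropped.
frame = pd.DataFrame({"Date": ["30-06-2013", "30-08-2017"]})

dates = pd.to_datetime(frame["Date"], format="%d-%m-%Y")
frame["month"] = dates.dt.month            # calendar month (1-12)
frame["day_of_year"] = dates.dt.dayofyear  # position within the year (1-366)
frame = frame.drop(columns=["Date"])

print(frame)
```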

In [5]:
```python
df.shape
```

```text
(7588, 24)
```

In [7]:
```python
# Separate features (independent variables) and target variables (dependent variables)
X = df.drop(columns=["Next_Tmax", "Next_Tmin"])  # Features
y_max = df["Next_Tmax"]  # Target variable for maximum temperature
y_min = df["Next_Tmin"]  # Target variable for minimum temperature
```


In [8]:
```python
# Step 3: Splitting the dataset into training and testing sets
from sklearn.model_selection import train_test_split

# Split the data 80/20 into training and testing sets. Passing both targets to a
# single call keeps their rows aligned, so both models share X_train and X_test.
X_train, X_test, y_max_train, y_max_test, y_min_train, y_min_test = train_test_split(
    X, y_max, y_min, test_size=0.2, random_state=42
)

print("\nShape of training set for maximum temperature prediction:", X_train.shape)
print("Shape of testing set for maximum temperature prediction:", X_test.shape)
print("Shape of training set for minimum temperature prediction:", X_train.shape)
print("Shape of testing set for minimum temperature prediction:", X_test.shape)
```

```text
Shape of training set for maximum temperature prediction: (6070, 22)
Shape of testing set for maximum temperature prediction: (1518, 22)
Shape of training set for minimum temperature prediction: (6070, 22)
Shape of testing set for minimum temperature prediction: (1518, 22)
```
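`train_test_split` shuffles rows at random, so observations from the 25 stations on the same day can end up in both the training and the test set. Because stations within one city on one day are likely correlated, this may make test scores look better than they would be on genuinely future days. A chronological hold-out is a common alternative and closer in spirit to the hindcast validation mentioned in the project description. The sketch below uses an arbitrary cut-off of 2016-01-01 (an illustration, not the scheme used in Cho et al., 2020) and assumes `Date` is kept in the data until the split is made; the helper `chronological_split` and the example rows are hypothetical.

```python
import pandas as pd


def chronological_split(data: pd.DataFrame, date_col: str, test_start: str):
    """Split rows into train (before test_start) and test (on or after test_start)."""
    dates = pd.to_datetime(data[date_col], format="%d-%m-%Y")
    is_test = dates >= pd.Timestamp(test_start)
    return data[~is_test], data[is_test]


# Hypothetical rows spread over several summers.
data = pd.DataFrame({
    "station": [1, 2, 3, 4],
    "Date": ["30-06-2013", "15-07-2014", "01-07-2016", "30-08-2017"],
})
train_part, test_part = chronological_split(data, "Date", "2016-01-01")
print(len(train_part), len(test_part))
```

After splitting, `Date` would be dropped from both parts before fitting, exactly as in Step 2.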


In [9]:
```python
# Step 4: Building separate models for maximum and minimum temperature prediction
from sklearn.ensemble import RandomForestRegressor

# One RandomForestRegressor per target, with default hyperparameters
max_temp_model = RandomForestRegressor(random_state=42)
min_temp_model = RandomForestRegressor(random_state=42)

# Train the models on the training data
max_temp_model.fit(X_train, y_max_train)
min_temp_model.fit(X_train, y_min_train)

print("\nModels trained successfully.")
```

```text
Models trained successfully.
```
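The notebook trains both models but never scores them on `X_test`. A minimal evaluation sketch follows, reporting mean absolute error, root mean squared error and R² with standard `sklearn.metrics` functions. It runs on synthetic data from `make_regression` so that it is self-contained; on the real data the helper (an illustrative name) would be called as `evaluate(max_temp_model, X_test, y_max_test)` and `evaluate(min_temp_model, X_test, y_min_test)`, and the resulting MAE in °C can be compared directly with the error of the raw LDAPS forecast columns.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test) -> dict:
    """Return MAE, RMSE and R² of a fitted regressor on held-out data."""
    pred = model.predict(X_test)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": r2_score(y_test, pred),
    }


# Synthetic stand-in for (X, y_max) so the example runs without the dataset.
X_demo, y_demo = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
demo_model = RandomForestRegressor(n_estimators=50, random_state=42).fit(Xtr, ytr)
print(evaluate(demo_model, Xte, yte))
```

RMSE is computed as the square root of `mean_squared_error` rather than through a version-specific keyword argument, so the helper should behave the same across scikit-learn releases.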


In [10]:
```python
import joblib

# Save both trained models to disk
joblib.dump(max_temp_model, 'max_temp_model.pkl')
joblib.dump(min_temp_model, 'min_temp_model.pkl')

print("Models saved successfully.")
```

```text
Models saved successfully.
```


In [11]:
```python
import joblib

# Load both saved models back from disk
loaded_model_max = joblib.load('max_temp_model.pkl')
loaded_model_min = joblib.load('min_temp_model.pkl')

print("Models loaded successfully.")
```

```text
Models loaded successfully.
```
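A quick way to confirm that saving and loading worked is to check that the restored model reproduces the original model's predictions exactly. The sketch below does this with a small synthetic model and a temporary file (the file name is hypothetical). Pickled scikit-learn models are generally only guaranteed to load correctly with the same scikit-learn version that saved them, and `joblib.load` should only be used on trusted files because unpickling can execute arbitrary code.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic model standing in for max_temp_model / min_temp_model.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)
model = RandomForestRegressor(n_estimators=20, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.pkl")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# A faithful round trip must reproduce the original predictions exactly.
same = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print("Round trip OK:", same)
```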


In [13]:
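```python
# Inspect the first row of the feature matrix, used below as a sample input
X.head(1)
```

| | station | Present_Tmax | Present_Tmin | LDAPS_RHmin | LDAPS_RHmax | LDAPS_Tmax_lapse | LDAPS_Tmin_lapse | LDAPS_WS | LDAPS_LH | LDAPS_CC1 | ... | LDAPS_CC4 | LDAPS_PPT1 | LDAPS_PPT2 | LDAPS_PPT3 | LDAPS_PPT4 | lat | lon | DEM | Slope | Solar radiation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 28.7 | 21.4 | 58.255688 | 91.116364 | 28.074101 | 23.006936 | 6.818887 | 69.451805 | 0.233947 | ... | 0.130928 | 0.0 | 0.0 | 0.0 | 0.0 | 37.6046 | 126.991 | 212.335 | 2.785 | 5992.895996 |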
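For a real forecast, the input will usually arrive as individual values rather than as an existing row of `X`. Because the models were fitted on a DataFrame, recent scikit-learn versions check that prediction inputs carry the same feature names as during training, so it helps to build the row in the training column order. The helper `make_input_row` below is a hypothetical sketch of this; the shortened three-column feature list is only for illustration, and on the real data `X.columns` would be passed instead.

```python
import pandas as pd


def make_input_row(values: dict, feature_columns) -> pd.DataFrame:
    """Build a one-row DataFrame whose columns match the training features exactly.

    Raises KeyError if a required feature is missing; extra keys are ignored.
    """
    missing = [c for c in feature_columns if c not in values]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return pd.DataFrame([values])[list(feature_columns)]


# Shortened feature list for illustration; on the real data pass X.columns.
feature_columns = ["station", "Present_Tmax", "Present_Tmin"]
row = make_input_row({"Present_Tmin": 21.4, "station": 1.0, "Present_Tmax": 28.7}, feature_columns)
print(row)
```

The keys can be supplied in any order; the final column selection reorders them to match training.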


In [14]:
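```python
import joblib

# Use the first row of the feature matrix as a sample input. It comes from the
# dataset itself, so the models may already have seen it during training.
test_data = X.head(1)

# Load both models from disk
loaded_model_max = joblib.load('max_temp_model.pkl')
loaded_model_min = joblib.load('min_temp_model.pkl')

# Predict the next-day maximum and minimum temperatures for the sample row
prediction_max = loaded_model_max.predict(test_data)
prediction_min = loaded_model_min.predict(test_data)

print("Tomorrow's maximum temperature prediction (°C):", prediction_max)
print("Tomorrow's minimum temperature prediction (°C):", prediction_min)
```

```text
Tomorrow's maximum temperature prediction (°C): [29.528]
Tomorrow's minimum temperature prediction (°C): [21.543]
```

For reference, the observed values for this row in `df.head()` are Next_Tmax = 29.1 °C and Next_Tmin = 21.2 °C.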
