# Rainfall Prediction

Step 1: Data Collection
First, collect annual rainfall data for the past five years for four major cities. This example assumes the cities are Delhi, Mumbai, Kolkata, and Chennai.

Step 2: Data Preparation
Prepare the data by organizing it into a suitable format, such as a pandas DataFrame. Ensure that the data includes columns for year, city, and rainfall amount.
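
As a minimal sketch of the layout described above, the frame below holds one row per (city, year) pair with an empty rainfall column. The names `cities`, `years`, and `tidy` are assumptions made for illustration, and the rainfall values are deliberately left blank rather than invented.

```python
from itertools import product

import pandas as pd

# Assumed example values: the four cities and five years used in this walkthrough.
cities = ["Delhi", "Mumbai", "Kolkata", "Chennai"]
years = [2020, 2021, 2022, 2023, 2024]

# One row per (city, year) pair; Rainfall stays empty until real readings are loaded.
tidy = pd.DataFrame(list(product(cities, years)), columns=["City", "Year"])
tidy["Rainfall"] = float("nan")

# Sanity check: no (city, year) pair appears twice.
assert not tidy.duplicated(["City", "Year"]).any()
print(tidy.shape)  # (20, 3)
```

Building the pairs explicitly makes it easy to confirm that every city has exactly one row per year before any modelling starts.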

Step 3: Data Preprocessing
Clean the data, handle missing values, and perform any necessary transformations.
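
The dummy data used later in this notebook has no gaps, so the missing-value step is not exercised there. The sketch below shows one possible approach, filling a gap with the median of the same city's other readings. The frame `raw` and its values are hypothetical, and median imputation is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing reading, for illustration only.
raw = pd.DataFrame({
    "Year": [2020, 2021, 2022, 2020, 2021, 2022],
    "City": ["Delhi", "Delhi", "Delhi", "Mumbai", "Mumbai", "Mumbai"],
    "Rainfall": [850.0, np.nan, 950.0, 1100.0, 1200.0, 1250.0],
})

# Count missing values per column before deciding how to handle them.
print(raw.isna().sum())

# Per-city median (NaN readings are skipped), broadcast back to every row.
city_median = raw.groupby("City")["Rainfall"].transform("median")

# Fill each gap with the median for that row's city.
clean = raw.assign(Rainfall=raw["Rainfall"].fillna(city_median))
print(clean)
```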

Step 4: Feature Selection
Select relevant features for the model, such as year and city. Encode categorical features if necessary.
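
Integer codes for a city column imply an ordering between cities that a linear model will treat as meaningful. One common alternative is one-hot encoding, sketched below with `pd.get_dummies`. The frame `sample` is hypothetical, and this is not the encoding used in the notebook's own code.

```python
import pandas as pd

# Hypothetical small frame with the same columns as the walkthrough.
sample = pd.DataFrame({
    "Year": [2020, 2020, 2021, 2021],
    "City": ["Delhi", "Mumbai", "Delhi", "Chennai"],
    "Rainfall": [850, 1100, 1350, 1000],
})

# One indicator column per city, so no artificial order is imposed.
X_onehot = pd.get_dummies(sample[["Year", "City"]], columns=["City"], dtype=int)
y_sample = sample["Rainfall"]

print(list(X_onehot.columns))
# ['Year', 'City_Chennai', 'City_Delhi', 'City_Mumbai']
```

With an intercept in the model, passing `drop_first=True` to `pd.get_dummies` avoids redundant indicator columns.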

Step 5: Split the Data
Split the data into training and testing sets. This example holds out 20% of the rows for testing.
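
A random split can place later years in the training set and earlier years in the test set. When the goal is to forecast a future year, holding out the most recent year is a possible alternative. The sketch below assumes a hypothetical frame `frame` with made-up values and is not the split used in the notebook's code.

```python
import pandas as pd

# Hypothetical data for two cities over five years, for illustration only.
frame = pd.DataFrame({
    "Year": [2020, 2021, 2022, 2023, 2024] * 2,
    "City": ["Delhi"] * 5 + ["Mumbai"] * 5,
    "Rainfall": [850, 1000, 1250, 1350, 950, 1100, 1200, 1150, 1050, 950],
})

# Train on every year before the latest one, test on the latest year only.
last_year = frame["Year"].max()
train = frame[frame["Year"] < last_year]
test = frame[frame["Year"] == last_year]

print(len(train), len(test))  # 8 2
```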

Step 6: Model Selection
Choose a suitable regression model for the prediction. We'll use a linear regression model for this example.

Step 7: Model Training
Train the model using the training data.

Step 8: Model Evaluation
Evaluate the model's performance on the testing data using mean squared error (MSE).
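
MSE is expressed in squared units, which makes it hard to read against the rainfall values themselves. The sketch below computes a few complementary metrics on hypothetical arrays `y_true` and `y_hat`. The numbers are made up for illustration and are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted rainfall values, for illustration only.
y_true = np.array([1000.0, 1200.0, 1400.0, 1100.0])
y_hat = np.array([1100.0, 1150.0, 1300.0, 1250.0])

mse = mean_squared_error(y_true, y_hat)   # average squared error
rmse = float(np.sqrt(mse))                # same units as the rainfall column
mae = mean_absolute_error(y_true, y_hat)  # average absolute error
r2 = r2_score(y_true, y_hat)              # share of variance explained

print(f"MSE: {mse:.1f}, RMSE: {rmse:.1f}, MAE: {mae:.1f}, R^2: {r2:.3f}")
```

For reference, the MSE of about 42931 printed at the end of this notebook corresponds to an RMSE of roughly 207, in the same units as the rainfall column.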

Step 9: Predictions
Make predictions for 2025 using the trained model.

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

Step 1: Data Collection (Dummy Data)

In [3]:
data = {
    'Year': [2020, 2021, 2022, 2023, 2024] * 4,
    'City': ['Delhi', 'Mumbai', 'Kolkata', 'Chennai'] * 5,
    'Rainfall': [850, 1200, 1500, 1300, 950, 1100, 1250, 1400, 1350, 950, 1000, 1150, 1600, 1450, 1400, 1200, 1050, 1000, 1350, 1500]
}
df = pd.DataFrame(data)

In [4]:
df

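|   | Year | City | Rainfall |
|---|------|------|----------|
| 0 | 2020 | Delhi | 850 |
| 1 | 2021 | Mumbai | 1200 |
| 2 | 2022 | Kolkata | 1500 |
| 3 | 2023 | Chennai | 1300 |
| 4 | 2024 | Delhi | 950 |
| 5 | 2020 | Mumbai | 1100 |
| 6 | 2021 | Kolkata | 1250 |
| 7 | 2022 | Chennai | 1400 |
| 8 | 2023 | Delhi | 1350 |
| 9 | 2024 | Mumbai | 950 |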


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Step 1: Data Collection (Dummy Data)
data = {
    'Year': [2020, 2021, 2022, 2023, 2024] * 4,
    'City': ['Delhi', 'Mumbai', 'Kolkata', 'Chennai'] * 5,
    'Rainfall': [850, 1200, 1500, 1300, 950, 1100, 1250, 1400, 1350, 950, 1000, 1150, 1600, 1450, 1400, 1200, 1050, 1000, 1350, 1500]
}
df = pd.DataFrame(data)

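# Step 3: Data Preprocessing
# Convert city names to integer codes; cat.codes numbers the categories
# in alphabetical order (0=Chennai, 1=Delhi, 2=Kolkata, 3=Mumbai)
df['City'] = df['City'].astype('category').cat.codes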

# Step 4: Feature Selection
X = df[['Year', 'City']]
y = df['Rainfall']

# Step 5: Split the Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 6: Model Selection
model = LinearRegression()

# Step 7: Model Training
model.fit(X_train, y_train)

# Step 8: Model Evaluation
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

# Step 9: Predictions
future_data = pd.DataFrame({
    'Year': [2025, 2025, 2025, 2025],
    'City': [0, 1, 2, 3]  # Encoded values (alphabetical): 0=Chennai, 1=Delhi, 2=Kolkata, 3=Mumbai
})
predictions = model.predict(future_data)
print(f'Predicted Rainfall for 2025: {predictions}')


Mean Squared Error: 42931.180919909784
Predicted Rainfall for 2025: [1410.63664596 1373.91304348 1337.18944099 1300.46583851]
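
The printed array carries no city labels, so it helps to map each code back to its name. The sketch below rebuilds the category mapping and attaches it to the predictions copied from the output above. The names `code_to_city` and `labelled` are introduced here for illustration.

```python
import pandas as pd

# Same city column as the dummy data, converted to a categorical.
cities = pd.Series(['Delhi', 'Mumbai', 'Kolkata', 'Chennai'] * 5)
city_cat = cities.astype('category')

# Categories are sorted alphabetically; position i is the label for code i.
code_to_city = dict(enumerate(city_cat.cat.categories))
print(code_to_city)  # {0: 'Chennai', 1: 'Delhi', 2: 'Kolkata', 3: 'Mumbai'}

# Predictions copied from the output above, in code order 0..3.
predictions = [1410.63664596, 1373.91304348, 1337.18944099, 1300.46583851]
labelled = {code_to_city[code]: round(value, 1)
            for code, value in enumerate(predictions)}
print(labelled)
# {'Chennai': 1410.6, 'Delhi': 1373.9, 'Kolkata': 1337.2, 'Mumbai': 1300.5}
```

The even spacing between the four values is a consequence of feeding integer city codes to a linear model: each step in code shifts the prediction by the same fitted coefficient.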
