## Road Accident Severity Prediction using Linear Regression
This notebook fits a linear regression model that predicts accident severity from features such as speed limit, number of vehicles involved, weather, road surface, and lighting.

## Step 1: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import joblib

## Step 2: Upload and Load the Dataset
Upload the `sample_road_accidents.csv` file.

In [None]:
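from google.colab import files

# Open the Colab upload dialog; returns a dict mapping filename to file contents
uploaded = files.upload()
# Load the first uploaded file by name (Colab saves uploads to the working directory)
df = pd.read_csv(next(iter(uploaded)))
df.head()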

## Step 3: Preprocess the Dataset
Drop rows with missing values, one-hot encode the categorical columns, and separate the features from the target.

In [None]:
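# Drop every row that contains at least one missing value
df = df.dropna()
# One-hot encode the categorical columns; drop_first removes one redundant dummy per column
df_encoded = pd.get_dummies(df, columns=['Weather', 'Surface', 'Light'], drop_first=True)
# Separate the features (X) from the target (y)
X = df_encoded.drop('Accident_Severity', axis=1)
y = df_encoded['Accident_Severity']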
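To make the effect of `drop_first=True` concrete, here is a small standalone sketch on made-up rows. The category values (`Clear`, `Foggy`, `Rainy`) are assumptions for illustration; the real dataset may use different labels.

```python
import pandas as pd

# Hypothetical toy rows; the real dataset's categories may differ.
toy = pd.DataFrame({
    "Speed_limit": [50, 80, 60],
    "Weather": ["Clear", "Rainy", "Foggy"],
    "Accident_Severity": [1, 3, 2],
})

# drop_first=True drops the first category in sorted order ("Clear" here),
# which becomes the baseline represented by all-zero dummy columns.
toy_encoded = pd.get_dummies(toy, columns=["Weather"], drop_first=True)
encoded_columns = list(toy_encoded.columns)
print(encoded_columns)
```

Because the baseline category has no column of its own, a coefficient on `Weather_Rainy` is read as the difference relative to the baseline weather.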

## Step 4: Split the Data

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
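As a quick illustration of what `test_size=0.2` and `random_state=42` do, the sketch below splits a made-up 10-row array. The arrays are placeholders, not the accident data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 rows, 2 features.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# test_size=0.2 holds out 20% of the rows (2 of 10) for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

# Reusing the same random_state reproduces exactly the same split.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

print(len(X_tr), len(X_te))
print(np.array_equal(X_te, X_te2))
```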

## Step 5: Train and Evaluate the Linear Regression Model

In [None]:
# Fit ordinary least squares on the training split
model = LinearRegression()
model.fit(X_train, y_train)
# Evaluate on the held-out test split
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)
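The mean squared error is the average of the squared differences between true and predicted values, so it is expressed in squared severity units. The sketch below computes it by hand on made-up numbers and compares it with `mean_squared_error`; it also shows RMSE and R², which are often easier to interpret. The arrays are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted severities, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 2.0])
y_pred = np.array([1.5, 2.0, 2.5, 3.0])

# MSE by definition: mean of squared residuals.
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)

# RMSE is in the same units as the target; R^2 is the share of variance explained.
rmse = np.sqrt(mse_sklearn)
r2 = r2_score(y_true, y_pred)

print("MSE (manual): ", mse_manual)
print("MSE (sklearn):", mse_sklearn)
print("RMSE:", rmse)
print("R^2:", r2)
```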

## Step 6: Save the Trained Model

In [None]:
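# Serialize the fitted model to a file in the Colab working directory
joblib.dump(model, 'accident_severity_model.pkl')
from google.colab import files
# Download the saved file from the Colab runtime to the local machine
files.download('accident_severity_model.pkl')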
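A saved model can be restored later with `joblib.load`. The following minimal sketch round-trips a toy model through a temporary file and checks that predictions are unchanged; the toy data and the file name `toy_model.pkl` are assumptions for illustration, not part of the notebook's dataset.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data standing in for the real features and target.
X_toy = np.array([[30.0], [50.0], [80.0], [100.0]])
y_toy = np.array([1.0, 1.5, 2.5, 3.0])
toy_model = LinearRegression().fit(X_toy, y_toy)

# Save to a temporary directory, then load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")
    joblib.dump(toy_model, path)
    reloaded = joblib.load(path)

# The reloaded model should reproduce the original predictions.
same_predictions = np.allclose(toy_model.predict(X_toy), reloaded.predict(X_toy))
print("Predictions match after reload:", same_predictions)
```

Pickled scikit-learn models are generally best loaded with the same library version that saved them, and only from sources you trust.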

## Step 7: Predict Using Hypothetical Data

In [None]:
sample_input = pd.DataFrame([{
    'Speed_limit': 80,
    'Vehicles_involved': 3,
    'Weather_Rainy': 1,
    'Surface_Wet': 1,
    'Light_Night': 1
}])
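# Add any training column missing from the sample, filled with 0
for col in X.columns:
    if col not in sample_input.columns:
        sample_input[col] = 0
# Keep only the training columns, in the order the model was fitted on
sample_input = sample_input[X.columns]
predicted_severity = model.predict(sample_input)
print("Predicted Accident Severity:", predicted_severity[0])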
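The column alignment above can also be expressed in one call with `DataFrame.reindex`. The sketch below is standalone: `training_columns` and the `Light_Dusk` column are hypothetical stand-ins for `X.columns` and for a sample field the model never saw.

```python
import pandas as pd

# Hypothetical stand-in for X.columns from the training step.
training_columns = [
    "Speed_limit", "Vehicles_involved",
    "Weather_Rainy", "Surface_Wet", "Light_Night",
]

# A sample that lacks some training columns and has one unknown column.
raw_sample = pd.DataFrame([{"Speed_limit": 80, "Weather_Rainy": 1, "Light_Dusk": 1}])

# reindex adds missing columns filled with 0, drops unknown ones,
# and puts the columns in the training order.
aligned = raw_sample.reindex(columns=training_columns, fill_value=0)
aligned_values = aligned.iloc[0].tolist()
print(aligned)
```

Note that a column dropped this way (here `Light_Dusk`) is silently ignored, so it can be worth checking for unexpected names before predicting.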

## Step 8: Visualize the Correlation Matrix

In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(df_encoded.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()
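When the heatmap has many columns, it can help to list only the correlations with the target, ordered by strength. This is a minimal sketch on made-up numbers; with the notebook's data, the same expression would be applied to `df_encoded`.

```python
import pandas as pd

# Made-up values, for illustration only.
toy = pd.DataFrame({
    "Speed_limit": [30, 50, 60, 80, 100],
    "Vehicles_involved": [2, 1, 3, 2, 4],
    "Accident_Severity": [1, 1, 2, 3, 3],
})

# Take the target's column of the correlation matrix, drop its
# self-correlation, and sort by absolute value, strongest first.
target_corr = (
    toy.corr()["Accident_Severity"]
    .drop("Accident_Severity")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

A strong correlation indicates association only; it does not by itself show that a feature causes more severe accidents.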

## Conclusion
The fitted model and the correlation heatmap indicate which factors are associated with road accident severity. Such insights can inform infrastructure improvements and public-awareness efforts aimed at reducing risk, particularly in underdeveloped countries.