# CS145: Project 3 | Project Name

## Collaborators:
Please list the names and SUNet IDs of your collaborators below:
* Kashif Muneer 2021-CE-34
* Mehmood Ul Haq 2021-CE-35

## Project Overview

**Description:**

The goal of this project is to develop a machine learning model to predict the incident group based on London Fire Brigade service calls data. The model will use a Gradient Boosting Classifier to classify incidents into different groups, contributing to better resource allocation and incident response management.

**Programming Language:**
Python

**Environment:**
Google Colab

**Dataset Name:**
London Fire Brigade Service Calls Dataset

**Dataset Link:**
[BigQuery Public Dataset: London Fire Brigade Service Calls](https://console.cloud.google.com/bigquery?project=gcp-2021-ce-35&supportedpurview=project&ws=!1m10!1m4!4m3!1sbigquery-public-data!2slondon_fire_brigade!3sfire_brigade_service_calls!1m4!4m3!1sbigquery-public-data!2schicago_crime!3scrime)

---


## Set Up
**Description:** Authenticate the Colab session with Google Cloud so that the BigQuery client can be created in the following cells.

In [None]:
from google.colab import auth
auth.authenticate_user()

**Description:** We import the libraries needed to query the data and build machine learning models: the BigQuery client, pandas for data manipulation (including datetime handling), and scikit-learn for model training and evaluation. The cell also sets the Google Cloud project ID.

In [None]:
from google.cloud import bigquery
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
project_id = "rich-principle-407105"

## Analysis of Dataset

---

**Description:** Query a subset of data from BigQuery for initial analysis, load it into a DataFrame, and display a snapshot.

----

In [None]:
# Create the BigQuery client (set project_id above to your own Google Cloud project ID)
client = bigquery.Client(project=project_id)

query = """
SELECT
  incident_group,
  borough_code,
  date_of_call,
  hour_of_call
FROM
  `bigquery-public-data.london_fire_brigade.fire_brigade_service_calls`
ORDER BY
  date_of_call ASC, hour_of_call ASC
LIMIT 100
"""

df = client.query(query).to_dataframe()
df.head()


## Data Exploration

---

**Description:** Query the complete dataset from BigQuery (excluding rows without an incident group), convert the call date and borough code to numeric features, and extract the features and target variable.

---

In [None]:
# Load data from BigQuery
query = """
SELECT incident_group, borough_code, date_of_call, hour_of_call
FROM `bigquery-public-data.london_fire_brigade.fire_brigade_service_calls`
WHERE incident_group IS NOT NULL
"""
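df = client.query(query).to_dataframe()

# Convert the call date to whole seconds since the Unix epoch (independent of the datetime resolution)
df['date_in_seconds'] = (pd.to_datetime(df['date_of_call']) - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)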

# Drop the original 'date_of_call' column as it is no longer needed
df = df.drop(columns=['date_of_call'])

# Convert categorical variables to numerical representations
df['borough_code'] = pd.Categorical(df['borough_code']).codes

# Extract features and target variable
X = df[['borough_code', 'date_in_seconds', 'hour_of_call']]
y = df['incident_group']
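
To make the two encoding steps above concrete, here is a minimal sketch on a few made-up rows. The `toy` DataFrame, its values, and the `borough_code_encoded` column name are assumptions for illustration, not data from the BigQuery table. It shows that the date becomes whole seconds since the Unix epoch and that `pd.Categorical(...).codes` assigns integers following the sorted order of the categories.

```python
import pandas as pd

# Made-up rows standing in for the BigQuery result (illustrative values only)
toy = pd.DataFrame({
    "borough_code": ["E09000033", "E09000001", "E09000033"],
    "date_of_call": ["2017-01-01", "2017-01-02", "2017-01-03"],
})

# Dates -> whole seconds since 1970-01-01
epoch = pd.Timestamp("1970-01-01")
toy["date_in_seconds"] = (pd.to_datetime(toy["date_of_call"]) - epoch) // pd.Timedelta(seconds=1)

# Categories are sorted first, so "E09000001" -> 0 and "E09000033" -> 1
toy["borough_code_encoded"] = pd.Categorical(toy["borough_code"]).codes

print(toy[["date_in_seconds", "borough_code_encoded"]])
```

The integer codes carry no real ordering between boroughs. Tree-based models usually tolerate this, but one-hot encoding may be a safer choice for other model families.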

## Split the Dataset

**Description:** To evaluate the model's performance, we split the dataset into training and testing sets. The training set is used to train the model, and the testing set is used to assess its performance on unseen data. The train_test_split function from scikit-learn is commonly used for this purpose.

In [None]:
# Split the data into training and testing sets for prediction
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
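
If the incident groups turn out to be imbalanced (an assumption, not something verified here), a stratified split keeps the class proportions the same in the training and testing sets. The sketch below uses made-up labels and hypothetical variable names (`toy_X`, `toy_y`, `ys_test`) to show the effect of the `stratify` argument of `train_test_split`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 20 rows with a 10 / 6 / 4 class split (illustrative only)
toy_X = pd.DataFrame({"hour_of_call": list(range(20))})
toy_y = pd.Series(["False Alarm"] * 10 + ["Special Service"] * 6 + ["Fire"] * 4)

# stratify=toy_y preserves the class proportions in both halves
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    toy_X, toy_y, test_size=0.5, random_state=42, stratify=toy_y
)

print(ys_test.value_counts())
```

In the notebook, the equivalent change would be passing `stratify=y` to the existing call.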


## Data Prediction

---

**Description:** Train a Gradient Boosting model on the training set, make predictions on the test set, and evaluate the model.

**Model:** Gradient Boosting Classifier.

**Task:** Predict the incident group of a service call based on the input features (borough, date, and hour of call).

---

In [None]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report

# Create and train the Gradient Boosting model
gb_model = GradientBoostingClassifier(random_state=42)
gb_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_gb = gb_model.predict(X_test)

# Evaluate the model
accuracy_gb = accuracy_score(y_test, y_pred_gb)
print(f"Accuracy (Gradient Boosting): {accuracy_gb:.2f}")
print("Classification Report (Gradient Boosting):\n", classification_report(y_test, y_pred_gb))
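
An accuracy figure is easier to interpret next to a trivial baseline. As a hedged sketch, the block below fits scikit-learn's `DummyClassifier`, which always predicts the most frequent class, on made-up labels. The toy data and the names `toy_X`, `toy_y`, and `baseline_accuracy` are assumptions for illustration.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Made-up data: 6 of 10 calls belong to the majority class (illustrative only)
toy_X = [[hour] for hour in range(10)]
toy_y = ["False Alarm"] * 6 + ["Fire"] * 4

# The baseline ignores the features and always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(toy_X, toy_y)

baseline_accuracy = accuracy_score(toy_y, baseline.predict(toy_X))
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```

In the notebook, fitting the same baseline on `X_train`, `y_train` and scoring it on `X_test`, `y_test` would show how much the Gradient Boosting model gains over always guessing the majority group.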

## Conclusion

---

This section summarizes the model's performance, highlights key findings, gives recommendations for improvement, and emphasizes the importance of ongoing refinement for better results.

---

In [None]:
# Section 6: Conclusion

# General Observations
print("General Observations:")
print("- The initial data analysis provided a snapshot of the dataset, including key features.")

# Prediction Results
print("\nPrediction Results:")
print(f"- The Gradient Boosting model achieved an accuracy of {accuracy_gb:.2f} on the test set.")
print("- The classification report provides detailed metrics for each incident group, indicating strengths and areas for improvement.")
print("   - Classification Report:\n", classification_report(y_test, y_pred_gb))

# Implications for Fire Brigade Operations
print("\nImplications for Fire Brigade Operations:")
print("- Predictive models can assist in anticipating incident types, enhancing operational efficiency.")

# Concluding Remarks
print("\nConcluding Remarks:")
print("- The analysis and prediction serve as valuable tools for enhancing decision-making within the Fire Brigade.")



## Visualization

---
The code below creates a bar graph showing the count of incidents for each incident group. The graph shows how incidents are distributed across the groups and how frequent each type of incident is in the dataset.

---

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Assuming you have already loaded the data into the 'df' DataFrame

# Count the occurrences of each incident group
incident_counts = df['incident_group'].value_counts()

# Plotting the bar graph
plt.figure(figsize=(6, 6))
incident_counts.plot(kind='bar', color='skyblue')
plt.title('Count of Incidents by Incident Group')
plt.xlabel('Incident Group')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better visibility
plt.show()


---
**Explanation:**

The evaluation code in the Data Prediction section displays the model performance metrics.
It prints the accuracy and a detailed classification report (precision, recall, F1-score, and support) for the Gradient Boosting model.
These metrics show how well the model predicts each incident group.
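
As a small worked example of how these metrics are computed, the sketch below builds a classification report for eight made-up predictions. The labels and the names `y_true_toy`, `y_pred_toy`, and `report` are assumptions for illustration. For the "Fire" class there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3.

```python
from sklearn.metrics import classification_report

# Made-up ground truth and predictions (illustrative only)
y_true_toy = ["Fire", "Fire", "Fire",
              "False Alarm", "False Alarm", "False Alarm", "False Alarm", "False Alarm"]
y_pred_toy = ["Fire", "Fire", "False Alarm",
              "False Alarm", "False Alarm", "False Alarm", "False Alarm", "Fire"]

# output_dict=True returns the report as a nested dictionary instead of text
report = classification_report(y_true_toy, y_pred_toy, output_dict=True)

print(f"Fire precision: {report['Fire']['precision']:.2f}")
print(f"Fire recall:    {report['Fire']['recall']:.2f}")
print(f"Accuracy:       {report['accuracy']:.2f}")
```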

---

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
# Visualize temporal patterns using seaborn
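# Reload the data and repeat the preprocessing from the Data Exploration cell
df = client.query(query).to_dataframe()
df['date_in_seconds'] = (pd.to_datetime(df['date_of_call']) - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)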

# Drop the original 'date_of_call' column as it is no longer needed
df = df.drop(columns=['date_of_call'])

# Convert categorical variables to numerical representations
df['borough_code'] = pd.Categorical(df['borough_code']).codes
#-----
plt.figure(figsize=(12, 6))
sns.lineplot(x=df['date_in_seconds'], y=df['hour_of_call'], hue=df['incident_group'])
plt.title('Temporal Patterns of Incident Groups')
plt.xlabel('Date in Seconds')
plt.ylabel('Hour of Call')
plt.show()


---
**Explanation:**

This code addresses Research Question 2 by visualizing temporal patterns of incident groups over time.
The seaborn library is used to create a line plot showing the relationship between date_in_seconds and hour_of_call, with different colors indicating different incident groups.
The plot helps explore if there are specific patterns or trends in incidents based on the time of the call.
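
A table of counts per hour can complement the line plot when looking for time-of-day patterns. The following is a minimal sketch on made-up rows. The `toy` DataFrame and the `hourly_counts` name are assumptions for illustration, and this is an alternative view, not the method used in the cell above.

```python
import pandas as pd

# Made-up calls (illustrative only)
toy = pd.DataFrame({
    "hour_of_call": [0, 0, 13, 13, 13, 22],
    "incident_group": ["Fire", "False Alarm", "False Alarm",
                       "False Alarm", "Special Service", "Fire"],
})

# Rows: hour of call; columns: incident group; values: number of calls
hourly_counts = (
    toy.groupby(["hour_of_call", "incident_group"])
    .size()
    .unstack(fill_value=0)
)

print(hourly_counts)
```

Applied to the notebook's `df`, the same `groupby` would give one row per hour and one column per incident group.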

---

In [None]:
# Visualize the influence of borough on incident groups
plt.figure(figsize=(12, 6))
sns.countplot(x='borough_code', hue='incident_group', data=df)
plt.title('Incident Groups by Borough')
plt.xlabel('Borough Code')
plt.ylabel('Count')
plt.show()


---
**Explanation:**

This code addresses Research Question 3 by visualizing the influence of borough on incident groups.
The seaborn library is used to create a count plot showing the distribution of incident groups within each borough (borough_code).
The plot helps explore how incident groups are distributed across different boroughs.
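
Raw counts are dominated by boroughs with many calls, so it may help to compare shares within each borough instead. The sketch below uses `pd.crosstab` with `normalize="index"` on made-up rows. The toy values and the `shares` name are assumptions for illustration.

```python
import pandas as pd

# Made-up calls for two encoded boroughs (illustrative only)
toy = pd.DataFrame({
    "borough_code": [0, 0, 0, 0, 1, 1],
    "incident_group": ["Fire", "False Alarm", "False Alarm",
                       "False Alarm", "Fire", "False Alarm"],
})

# normalize="index" makes each borough's row sum to 1
shares = pd.crosstab(toy["borough_code"], toy["incident_group"], normalize="index")

print(shares)
```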

---