# **Cloud Computing Project - Elevator PdM**
### **The PdM - Predictive Maintenance System**
A predictive maintenance (PdM) system aims to predict faults before they occur, avoiding the material, maintenance, and profit losses they would otherwise cause the company.


This notebook includes an **elevator maintenance prediction system** to predict possible faults before they occur.

It relies on a cloud-native approach **(Google Cloud Platform - GCP).**



---

**First: Google Colab Setup**


This section sets up the environment in Google Colab. It installs libraries for working with Google Cloud services, data processing, and machine learning.

In [None]:
# Google Colab Setup
print("Setting up Google Colab environment...")
# Install required libraries
!pip install google-cloud-storage google-cloud-bigquery google-cloud-aiplatform scikit-learn pandas joblib

Setting up Google Colab environment...




---

**Second: Authenticate with Google Cloud**


Upload a Google Cloud Platform (GCP) service account JSON file for authentication.

In [None]:
# Authenticate with Google Cloud
from google.colab import files
print("Upload your GCP service account JSON file.")
uploaded = files.upload()
service_account_file = list(uploaded.keys())[0]

Upload your GCP service account JSON file.


Saving elevator-maintenance-system-5a14e992a0bd.json to elevator-maintenance-system-5a14e992a0bd.json




---
**Third: Set Environment Variable for GCP & Verify GCP Authentication**


Sets the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, enabling GCP authentication for subsequent operations.


In [None]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = service_account_file
print("GCP authenticated.")

GCP authenticated.
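Setting the environment variable only records a path; nothing is validated until a client is created. The following is a minimal sketch of a pre-check, assuming the uploaded file is a standard service account key (such files carry a `type` of `service_account` along with `project_id`, `client_email`, and `private_key` fields). The helper names `check_service_account_file` and `set_credentials` are hypothetical and not part of any Google library.

```python
import json
import os

# Fields expected in a service account key file (assumed standard layout).
REQUIRED_KEYS = {"type", "project_id", "client_email", "private_key"}


def check_service_account_file(path):
    """Return the project ID if `path` looks like a service account key, else raise."""
    with open(path) as f:
        info = json.load(f)
    if not isinstance(info, dict):
        raise ValueError("Key file does not contain a JSON object")
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"Key file is missing fields: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"Unexpected credential type: {info['type']!r}")
    return info["project_id"]


def set_credentials(path):
    """Validate the key file, then point the Google client libraries at it."""
    project_id = check_service_account_file(path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(path)
    return project_id
```

In the notebook this could be called as `set_credentials(service_account_file)` before creating the storage client, so that a wrong upload fails early with a readable message.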


In [None]:
# Verify GCP authentication
from google.cloud import storage
client = storage.Client()
buckets = list(client.list_buckets())
print("Buckets in GCP project:")
for bucket in buckets:
    print(bucket.name)

Buckets in GCP project:
elevator-backet2
elevator-model-bucket




---

# **Start The Model**
#### **Upload and Read Data File**
The elevator dataset contains 112,001 rows and 10 columns: an `ID`, eight sensor features, and the `failure_indicator` target.

In [None]:
import pandas as pd
from google.colab import files

print("Upload your data file.")
uploaded = files.upload()
data_file = list(uploaded.keys())[0]

print(f"Data file uploaded: {data_file}")

data = pd.read_csv(data_file)

Upload your data file.


Saving elevator_dataset.csv to elevator_dataset.csv
Data file uploaded: elevator_dataset.csv


In [None]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.utils import resample
import joblib

# **1. Data Processing**

In [None]:
# Data Processing
print("Initial dataset overview...")
data.head()

Initial dataset overview...


Unnamed: 0,ID,revolutions,humidity,vibration,x1,x2,x3,x4,x5,failure_indicator
0,1,93.744,73.999,18.0,167.743,19.745,1.266828,8787.937536,5475.852001,0
1,2,93.74,73.999,18.0,167.739,19.741,1.266774,8787.1876,5475.852001,0
2,3,93.736,73.998,18.0,167.734,19.738,1.266737,8786.437696,5475.704004,0
3,4,93.732,73.998,18.0,167.73,19.734,1.266683,8785.687824,5475.704004,0
4,5,93.729,73.998,18.0,167.727,19.731,1.266642,8785.125441,5475.704004,0


In [None]:
data.describe()

Unnamed: 0,ID,revolutions,humidity,vibration,x1,x2,x3,x4,x5,failure_indicator
count,112001.0,112001.0,112001.0,109563.0,112001.0,112001.0,112001.0,112001.0,112001.0,112001.0
mean,56001.0,46.275195,74.22414,28.340276,120.499335,-27.948945,0.623759,2503.994994,5509.691804,0.255516
std,32332.048087,19.042179,0.684711,24.2925,18.984921,19.123796,0.258677,1874.972912,101.395621,0.436153
min,1.0,16.933,72.399,2.0,90.132,-56.353,0.231328,286.726489,5241.615201,0.0
25%,28001.0,29.651,73.914,8.0,103.85,-44.548,0.399615,879.181801,5463.279396,0.0
50%,56001.0,43.348,74.212,21.28,117.64,-31.443,0.580561,1879.049104,5507.420944,0.0
75%,84001.0,63.997,74.731,39.21,138.119,-10.012,0.86533,4095.616009,5584.722361,1.0
max,112001.0,93.744,75.4,100.0,167.743,19.745,1.266828,8787.937536,5685.16,1.0


In [None]:
print("Dataset info:")
data.info()

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112001 entries, 0 to 112000
Data columns (total 10 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   ID                 112001 non-null  int64  
 1   revolutions        112001 non-null  float64
 2   humidity           112001 non-null  float64
 3   vibration          109563 non-null  float64
 4   x1                 112001 non-null  float64
 5   x2                 112001 non-null  float64
 6   x3                 112001 non-null  float64
 7   x4                 112001 non-null  float64
 8   x5                 112001 non-null  float64
 9   failure_indicator  112001 non-null  int64  
dtypes: float64(8), int64(2)
memory usage: 8.5 MB


In [None]:
# Check the NULL values count in every column and then fill the nulls with mean strategy

print("Missing values in each column:")
print(data.isnull().sum())

Missing values in each column:
ID                      0
revolutions             0
humidity                0
vibration            2438
x1                      0
x2                      0
x3                      0
x4                      0
x5                      0
failure_indicator       0
dtype: int64


In [None]:
# Handle missing values in 'vibration' (impute with mean).
# Assigning the result back avoids the chained in-place fillna,
# which pandas warns will stop working in pandas 3.0.
data['vibration'] = data['vibration'].fillna(data['vibration'].mean())


In [None]:
print(data.isnull().sum())

ID                   0
revolutions          0
humidity             0
vibration            0
x1                   0
x2                   0
x3                   0
x4                   0
x5                   0
failure_indicator    0
dtype: int64
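The mean used for imputation here is computed over the full dataset, so rows that later end up in the validation and test sets contribute to it. A hedged sketch of an alternative, in which the fill value comes from the training rows only, is shown below on toy data. The helper name `impute_with_train_mean` and the toy frames are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np
import pandas as pd


def impute_with_train_mean(train, others, column):
    """Fill NaNs in `column` everywhere, using the mean computed on `train` only."""
    fill_value = train[column].mean()  # NaNs are skipped by default
    filled_train = train.assign(**{column: train[column].fillna(fill_value)})
    filled_others = [
        df.assign(**{column: df[column].fillna(fill_value)}) for df in others
    ]
    return filled_train, filled_others, fill_value


# Toy frames standing in for the training and test splits.
train = pd.DataFrame({"vibration": [10.0, np.nan, 30.0]})
test = pd.DataFrame({"vibration": [np.nan, 100.0]})

train_f, (test_f,), mean_used = impute_with_train_mean(train, [test], "vibration")
print(mean_used)                      # 20.0
print(test_f["vibration"].tolist())   # [20.0, 100.0]
```

With only about 2% of `vibration` values missing, the practical effect is likely small, but the ordering keeps the evaluation sets untouched by training-time statistics.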


# **2. Check and Balance Class Distribution**
Checks for class imbalance in the **target** column `failure_indicator` and, if an imbalance is detected, balances the dataset by **upsampling** the minority class.

In [None]:
# Check for class imbalance
target = 'failure_indicator'
class_counts = data[target].value_counts()
print("\nClass distribution:")
print(class_counts)


Class distribution:
failure_indicator
0    83383
1    28618
Name: count, dtype: int64


In [None]:
# Balance the dataset using upsampling

if class_counts.min() / class_counts.max() < 0.5:
    majority_class = data[data[target] == class_counts.idxmax()]
    minority_class = data[data[target] == class_counts.idxmin()]
    minority_class_upsampled = resample(
        minority_class,
        replace=True,
        n_samples=len(majority_class),
        random_state=42
    )
    data = pd.concat([majority_class, minority_class_upsampled])
    print("\nBalanced class distribution:")
    print(data[target].value_counts())


Balanced class distribution:
failure_indicator
0    83383
1    83383
Name: count, dtype: int64
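Because upsampling duplicates minority rows, balancing before the split means copies of the same row can land in both the training and the evaluation sets, which can inflate the scores reported later. A minimal sketch of the alternative order (split first, then upsample only the training portion) follows. The function name `split_then_upsample` and the toy data are assumptions made for illustration; this is not the notebook's actual pipeline.

```python
import pandas as pd


def split_then_upsample(df, target, test_frac=0.3, seed=42):
    """Stratified hold-out split, then upsample the minority class in train only."""
    # Sample the same fraction from every class so the hold-out keeps the class ratio.
    test = df.groupby(target, group_keys=False).sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)

    counts = train[target].value_counts()
    majority = train[train[target] == counts.idxmax()]
    minority = train[train[target] == counts.idxmin()]
    # Draw minority rows with replacement until both classes have the same size.
    minority_up = minority.sample(n=len(majority), replace=True, random_state=seed)
    return pd.concat([majority, minority_up]), test


# Toy data with a 20/80 class ratio and a unique index.
toy = pd.DataFrame({
    "vibration": range(100),
    "failure_indicator": [1] * 20 + [0] * 80,
})
train_balanced, test = split_then_upsample(toy, "failure_indicator")
print(train_balanced["failure_indicator"].value_counts())
print(len(test))
```

The hold-out set keeps the original class ratio, which is what the deployed model would actually see, while the training set is balanced.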


# **3. Feature Selection and Stratified Splitting**
Performs feature selection by excluding ID and target columns, then applies a stratified split to partition the dataset into training, validation, and test sets.

In [None]:
# Feature selection
features = data.drop(columns=['ID', target]).columns
X = data[features]
y = data[target]

In [None]:
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# Stratified Split to ensure balanced distribution
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
for train_index, temp_index in sss.split(X, y):
    X_train, X_temp = X.iloc[train_index], X.iloc[temp_index]
    y_train, y_temp = y.iloc[train_index], y.iloc[temp_index]

# Split validation and test sets from the temp data
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# **4. Train a Random Forest Classifier**
- Trains a Random Forest model on the training set.
- Performs cross-validation and computes metrics such as F1 scores, validation classification report, and ROC-AUC score.

The **ROC-AUC score** measures a model's ability to distinguish between classes.
- ROC (Receiver Operating Characteristic): A curve showing the trade-off between the true positive rate (sensitivity) and false positive rate at different thresholds.
- AUC (Area Under Curve): A single number summarizing the ROC curve. A perfect model has an AUC of 1.0, while random guessing is 0.5.
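To make the AUC definition concrete: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The pure-Python sketch below computes it that way for small inputs. The name `pairwise_auc` is hypothetical, and the quadratic loop is meant for illustration only; the notebook itself uses `roc_auc_score`.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
print(pairwise_auc(labels, [0.1, 0.4, 0.35, 0.8]))  # 0.75: one pair is mis-ranked
print(pairwise_auc(labels, [0.1, 0.2, 0.8, 0.9]))   # 1.0: perfect separation
print(pairwise_auc(labels, [0.5, 0.5, 0.5, 0.5]))   # 0.5: no discrimination
```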

In [None]:
# Train a Random Forest model
model = RandomForestClassifier(
    random_state=42,
    n_estimators=50,
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=5
)
model.fit(X_train, y_train)

# Cross-validation
print("Performing cross-validation:\n")
cv_scores = cross_val_score(model, X, y, cv=5, scoring='f1_macro')
print(f"Cross-validation F1 scores: {cv_scores}")

Performing cross-validation:

Cross-validation F1 scores: [1.         1.         0.67704845 0.99994004 1.        ]
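With an integer `cv` and a classifier, `cross_val_score` uses stratified folds without shuffling, so the row order of the data can affect individual folds. The call above also scores on the full `X` and `y`, which include the rows later used for validation and testing. A hedged sketch of one alternative, using shuffled stratified folds on the training portion only, is shown below on synthetic data from `make_classification`. The variable names and the synthetic dataset are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the elevator data: 8 features, roughly 75/25 class ratio.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.75, 0.25], random_state=42
)
X_tr, X_hold, y_tr, y_hold = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

# Shuffled stratified folds, applied to the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=42)
scores = cross_val_score(clf, X_tr, y_tr, cv=cv, scoring="f1_macro")
print(f"F1 macro per fold: {scores}")
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```

A large gap between folds, such as the 0.68 next to scores near 1.0 above, is usually worth investigating before trusting the average.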


In [None]:
# Validation
val_preds = model.predict(X_val)
val_probs = model.predict_proba(X_val)[:, 1]

print("\nValidation Report:")
print(classification_report(y_val, val_preds))


Validation Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     12497
           1       1.00      1.00      1.00     12518

    accuracy                           1.00     25015
   macro avg       1.00      1.00      1.00     25015
weighted avg       1.00      1.00      1.00     25015



In [None]:
# Compute ROC-AUC
roc_auc = roc_auc_score(y_val, val_probs)
print(f"ROC-AUC Score: {roc_auc:.2f}")

ROC-AUC Score: 1.00


# **5. Generate Predictions with Alerts and Recommendations**
### **Then Save Predictions Locally**
Saves the predictions with alerts and recommendations to a CSV file for easy access and review.

In [None]:
# Generate predictions on test set with alerts and recommendations
test_preds = model.predict(X_test)
test_probs = model.predict_proba(X_test)[:, 1]

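# Look up each test row's ID by its original index label. Upsampling duplicated
# some labels, but every copy of a label carries the same ID, so keep one row per label.
id_by_label = data.loc[~data.index.duplicated(), 'ID']
predictions_df = X_test.copy()
predictions_df['elevator_id'] = id_by_label.loc[X_test.index].to_numpy()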
predictions_df['prediction'] = test_preds
predictions_df['probability'] = test_probs

# Add alerts and recommendations
predictions_df['alert'] = predictions_df.apply(
    lambda row: 'Critical' if row['prediction'] == 1 and row['vibration'] > 50 else 'Normal',
    axis=1
)

predictions_df['recommendation'] = predictions_df.apply(
    lambda row: (
        'Inspect immediately due to high vibration and failure prediction.'
        if row['alert'] == 'Critical' else
        'Monitor elevator performance.'
    ),
    axis=1
)
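Row-wise `apply` with a lambda calls Python once per row, which becomes slow on large frames. The same rule can be written with a boolean mask, as in the sketch below. The function name `add_alerts` and the small demo frame are hypothetical; the threshold of 50 and the message texts are taken from the cell above.

```python
import numpy as np
import pandas as pd

VIBRATION_THRESHOLD = 50  # same cut-off as the rule above


def add_alerts(df, label_col="prediction"):
    """Return a copy of df with 'alert' and 'recommendation' columns added."""
    out = df.copy()
    # Critical only when a failure is predicted AND vibration exceeds the threshold.
    critical = (out[label_col] == 1) & (out["vibration"] > VIBRATION_THRESHOLD)
    out["alert"] = np.where(critical, "Critical", "Normal")
    out["recommendation"] = np.where(
        critical,
        "Inspect immediately due to high vibration and failure prediction.",
        "Monitor elevator performance.",
    )
    return out


demo = pd.DataFrame({"prediction": [1, 1, 0], "vibration": [72.5, 30.0, 90.0]})
print(add_alerts(demo)[["alert", "recommendation"]])
```

Note that under this rule a predicted failure with vibration at or below 50 is still labeled `Normal`, exactly as in the original lambda.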

In [None]:
# Save predictions with alerts and recommendations
predictions_df.to_csv('predictions_with_alerts.csv', index=False)

In [None]:
# Save model locally
model_filename = 'model.pkl'
joblib.dump(model, model_filename)

['model.pkl']

# **6. Upload Model to Cloud Storage**
Uploads the trained model file to a specified Google Cloud Storage bucket named `elevator-model-bucket` for storage and potential deployment.

In [None]:
# Cloud Storage bucket setup
bucket_name = 'elevator-model-bucket'
model_blob_name = 'models/' + model_filename
blob = client.bucket(bucket_name).blob(model_blob_name)
blob.upload_from_filename(model_filename)
print(f"Model uploaded to bucket: {bucket_name}")

Model uploaded to bucket: elevator-model-bucket


# **7. Verify Model in Cloud Storage**
Confirms whether the model file exists in the designated Cloud Storage location and lists the files under the `models/` prefix (including `models/model.pkl`) for verification.

In [None]:
# Cloud Storage bucket setup
model_blob_path = 'models/model.pkl'
bucket_name = 'elevator-model-bucket'
bucket = client.bucket(bucket_name)

# Check if model file already exists in Cloud Storage
blobs = list(bucket.list_blobs(prefix='models/'))
if not any(blob.name == model_blob_path for blob in blobs):
    blob = bucket.blob(model_blob_path)
    blob.upload_from_filename(model_filename)
    print(f"Model uploaded to Cloud Storage at: gs://{bucket_name}/{model_blob_path}")
else:
    print(f"Model already exists at: gs://{bucket_name}/{model_blob_path}")

# Verify uploaded files
print("Files in the 'models/' directory:")
for blob in blobs:
    print(blob.name)

# Test the artifact URI
artifact_uri = f"gs://{bucket_name}/models/"
test_blob = bucket.blob(model_blob_path)
if test_blob.exists():
    print(f"Model file exists at: {artifact_uri}")
else:
    print("Model file not found at the specified artifact URI!")

Model already exists at: gs://elevator-model-bucket/models/model.pkl
Files in the 'models/' directory:
models/elevator_maintenance_model.pkl
models/model.pkl
Model file exists at: gs://elevator-model-bucket/models/
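The upload-if-missing logic can also be expressed with a single `exists()` check on the target object instead of listing the whole prefix. The sketch below assumes `bucket` behaves like a `google.cloud.storage.Bucket` (it only relies on `bucket.blob()`, `blob.exists()`, and `blob.upload_from_filename()`). The function name `upload_if_missing` and the `Fake*` classes are hypothetical stand-ins so the example runs without GCP access.

```python
def upload_if_missing(bucket, blob_name, local_path):
    """Upload local_path to blob_name unless the object already exists.

    Returns True if an upload happened, False if the object was already there.
    """
    blob = bucket.blob(blob_name)
    if blob.exists():
        return False
    blob.upload_from_filename(local_path)
    return True


# Test doubles that mimic the three methods used above (not part of any library).
class FakeBlob:
    def __init__(self, store, name):
        self.store, self.name = store, name

    def exists(self):
        return self.name in self.store

    def upload_from_filename(self, filename):
        self.store[self.name] = filename


class FakeBucket:
    def __init__(self):
        self.store = {}

    def blob(self, name):
        return FakeBlob(self.store, name)


fake = FakeBucket()
print(upload_if_missing(fake, "models/model.pkl", "model.pkl"))  # True: uploaded
print(upload_if_missing(fake, "models/model.pkl", "model.pkl"))  # False: already present
```

With a real client, the call would look like `upload_if_missing(client.bucket(bucket_name), model_blob_path, model_filename)`.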


# **8. Uploading the Model to Vertex AI Manually through the GCP Console**
First, enable the Vertex AI API for the `elevator-maintenance-system` project in the Google Cloud console.


I tried to upload the model with a code snippet, but it failed at some point, so I did it manually through the console:
- Open **Vertex AI / Model Registry** and import the model from Google Colab (in .ipynb format), creating a model named `elevator-model1`.
- Once the model has been imported successfully and shows a **deployed** deployment status, click the model and go to **DEPLOY & TEST** to create an **endpoint** and deploy the model to it.
- Once the endpoint named `elevator-endpoint` has been created, the model is ready to test and serve predictions.







In [None]:
# A print statement noting that Vertex AI is initialized and the model is deployed manually in the GCP console.

print("Model uploaded to Cloud Storage. Proceed to GCP for manual deployment.")

Model uploaded to Cloud Storage. Proceed to GCP for manual deployment.


In [None]:
# Kept for reference: this experiment shows the attempt to deploy the model to Vertex AI from Python code, which failed.

'''from google.cloud import aiplatform

# Initialize Vertex AI
project_id = 'elevator-maintenance-system'
region = 'us-central1'
aiplatform.init(project=project_id, location=region)

try:
    print("Uploading model to Vertex AI...")
    model = aiplatform.Model.upload(
        display_name="elevator_maintenance_model",
        artifact_uri=f"gs://{bucket_name}/models/",
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.0-24:latest",
    )
    print("Model uploaded. Deploying endpoint...")

    # Deploy the model
    endpoint = model.deploy(
        machine_type="n1-standard-2",
        traffic_split={"0": 100},  # Direct all traffic to the deployed model
    )
    print("Model deployed successfully at endpoint:", endpoint.resource_name)
except Exception as e:
    print("Deployment failed with error:", str(e))'''


# **9. Prepare JSON Payload for Vertex AI**
The JSON payload wraps test instances in the request format used by Google Vertex AI and is saved locally for easy integration.

- Here I use it only to test the deployed model and its predictions.



In [None]:
import pandas as pd
import json

test_instance = X_test.iloc[0].to_dict()

# Create JSON payload for Vertex AI
json_request = {
    "instances": [test_instance]
}

# Save JSON to file for easy copying
with open("test_payload.json", "w") as json_file:
    json.dump(json_request, json_file, indent=2)

print(json.dumps(json_request, indent=2))


{
  "instances": [
    {
      "revolutions": 47.036,
      "humidity": 75.176,
      "vibration": 8.97,
      "x1": 122.212,
      "x2": -28.14,
      "x3": 0.625678408,
      "x4": 2212.385296,
      "x5": 5651.430976
    }
  ]
}
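The payload above sends each instance as a dictionary keyed by feature name. If the endpoint is served by a prebuilt scikit-learn container, my understanding is that it expects each instance as a plain list of values in training-column order instead; this is an assumption worth checking against the endpoint before relying on it. A minimal sketch of that conversion follows, with the hypothetical helper `to_instance_list` and the feature order taken from the notebook.

```python
import json

# Column order used when the model was trained (from the notebook's feature list).
FEATURE_ORDER = ["revolutions", "humidity", "vibration", "x1", "x2", "x3", "x4", "x5"]


def to_instance_list(row):
    """Turn a {feature: value} mapping into a list ordered like the training columns."""
    missing = [name for name in FEATURE_ORDER if name not in row]
    if missing:
        raise KeyError(f"Row is missing features: {missing}")
    return [float(row[name]) for name in FEATURE_ORDER]


row = {
    "revolutions": 47.036, "humidity": 75.176, "vibration": 8.97,
    "x1": 122.212, "x2": -28.14, "x3": 0.625678408,
    "x4": 2212.385296, "x5": 5651.430976,
}
payload = {"instances": [to_instance_list(row)]}
print(json.dumps(payload, indent=2))
```

Fixing the order explicitly guards against a dictionary whose keys arrive in a different sequence than the columns the model was trained on.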


# **10. Prepare Input Data for Batch Prediction**
Once the model is deployed to an endpoint in Vertex AI, the **BATCH PREDICT** tab sits next to the **DEPLOY & TEST** tab where the endpoint was created.


The input file contains all feature columns without the target column. Batch prediction scores a large dataset in a single job instead of one instance at a time, which is useful for bulk tasks such as forecasting, classification, or generating alerts across many inputs.

In [None]:
features = ['revolutions', 'humidity', 'vibration', 'x1', 'x2', 'x3', 'x4', 'x5']

In [None]:
# Select the feature columns from the test dataset
input_data = X_test.copy()

# Save the feature data to a CSV file
input_data.to_csv('input-data.csv', index=False)

print("Input data for batch prediction saved as 'input-data.csv'.")


Input data for batch prediction saved as 'input-data.csv'.
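Vertex AI batch prediction accepts more than one input format, and JSON Lines is one of them. The sketch below converts CSV text with a header row into JSON Lines with one list of values per row. It assumes the model expects list-style instances in a fixed feature order; the function name `csv_to_jsonl` and the three-column sample are illustrative only.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text, feature_order):
    """Convert CSV text with a header row into JSON Lines, one value list per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for record in reader:
        # Emit values in the requested order, regardless of the CSV column order.
        lines.append(json.dumps([float(record[name]) for name in feature_order]))
    return "\n".join(lines)


sample_csv = "revolutions,humidity,vibration\n47.036,75.176,8.97\n93.744,73.999,18.0\n"
print(csv_to_jsonl(sample_csv, ["revolutions", "humidity", "vibration"]))
```

For the real file, the text of `input-data.csv` and the full eight-name feature list would be passed in place of the sample.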


In [None]:
# Upload the input data file to the project's Cloud Storage bucket, elevator-model-bucket.

input_blob = bucket.blob('input-data.csv')
input_blob.upload_from_filename('input-data.csv')

print("Input data uploaded to Cloud Storage at: gs://elevator-model-bucket/input-data.csv")


Input data uploaded to Cloud Storage at: gs://elevator-model-bucket/input-data.csv


# **Batch Prediction**
Here I also decided to run the batch prediction on GCP itself, through Vertex AI.

The code cells below trace the batch prediction process and confirm from the notebook that everything the job needs is in place.

In [None]:
batch_input = 'gs://elevator-model-bucket/input-data.csv'  # the input file uploaded in the previous step
batch_output = 'gs://elevator-model-bucket/output-predictions/'
# batch_output is the Cloud Storage folder where the predictions are written,
# in the same bucket that holds the pickled model.

# The batch prediction job reads batch_input, predicts which elevators need maintenance,
# and saves the results in the batch_output folder in CSV or JSONL format.

In [None]:
# List files in the bucket to confirm upload
blobs = list(bucket.list_blobs(prefix='input-data.csv'))
print("Files in the bucket:")
for blob in blobs:
    print(blob.name)

# The output confirms that the input file is in the bucket.

Files in the bucket:
input-data.csv


In [None]:
# Check the bucket for both the input file and the output folder

from google.cloud import storage

# Initialize Cloud Storage client
client = storage.Client()

# Specify bucket and paths
bucket_name = 'elevator-model-bucket'
input_file = 'input-data.csv'
output_folder = 'output-predictions/'

# Check input file
bucket = client.bucket(bucket_name)
input_blob = bucket.blob(input_file)

if input_blob.exists():
    print(f"Input file exists at: gs://{bucket_name}/{input_file}")
else:
    print(f"Input file NOT found at: gs://{bucket_name}/{input_file}")

# Check output folder
blobs = list(bucket.list_blobs(prefix=output_folder))
if blobs:
    print(f"Output folder exists at: gs://{bucket_name}/{output_folder}")
else:
    print(f"Output folder NOT found or empty: gs://{bucket_name}/{output_folder}")


# The output shows that both the input file and the output folder exist in elevator-model-bucket.

Input file exists at: gs://elevator-model-bucket/input-data.csv
Output folder exists at: gs://elevator-model-bucket/output-predictions/


# **11. Load Model and Generate Predictions**
Loads the previously trained model, generates predictions on the test data, and adds alerts and recommendations based on the results.
- Produces a CSV file containing the predictions.

In [None]:
import pandas as pd
import joblib
from google.cloud import storage

# Load the saved model
model_filename = "model.pkl"
model = joblib.load(model_filename)

# Prepare test data
# Load the input file and keep only the features used for predictions

features = ['revolutions', 'humidity', 'vibration', 'x1', 'x2', 'x3', 'x4', 'x5']
X_test = pd.read_csv('input-data.csv')[features]

# Generate predictions
predictions = model.predict(X_test)
prediction_probabilities = model.predict_proba(X_test)[:, 1]

# Add predictions to the DataFrame
X_test["predicted_label"] = predictions
X_test["failure_probability"] = prediction_probabilities

# Add alerts and recommendations
X_test["alert"] = X_test.apply(
    lambda row: "Critical" if row["predicted_label"] == 1 and row["vibration"] > 50 else "Normal",
    axis=1
)
X_test["recommendation"] = X_test.apply(
    lambda row: (
        "Inspect immediately due to high vibration and failure prediction."
        if row["alert"] == "Critical" else "Monitor elevator performance."
    ),
    axis=1
)

# Save the results locally
output_file = "predictions_with_alerts.csv"
X_test.to_csv(output_file, index=False)
print(f"Predictions saved locally to {output_file}")


Predictions saved locally to predictions_with_alerts.csv


# **12. Save Predictions and Upload to Cloud Storage**
Uploads the prediction results to a specified Cloud Storage location for sharing or further analysis.

In [None]:
# Set up Google Cloud Storage client
storage_client = storage.Client()
bucket_name = "elevator-model-bucket"
output_blob_name = "predictions/predictions_with_alerts.csv"

# Upload the file
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(output_blob_name)
blob.upload_from_filename(output_file)

print(f"Predictions uploaded to: gs://{bucket_name}/{output_blob_name}")


Predictions uploaded to: gs://elevator-model-bucket/predictions/predictions_with_alerts.csv
