**Step 1: Load Dataset from Azure ML**
Now that my dataset "Banking_Customer" is properly uploaded and registered in Azure ML, I will:
1. Connect to the Azure ML Workspace.
2. Retrieve the dataset using Dataset.get_by_name().
3. Convert it into a Pandas DataFrame.
4. Display the first few rows to verify that the dataset is loaded correctly.


In [7]:
from azureml.core import Workspace, Dataset
import pandas as pd

# Connect to Azure ML Workspace
ws = Workspace.from_config()

# Load dataset (Ensure the dataset name matches exactly)
dataset = Dataset.get_by_name(ws, name="Banking_Customer")

# Convert dataset directly to Pandas DataFrame
df = dataset.to_pandas_dataframe()

# Display first few rows to verify loading
df.head()



,customer_id,credit_score,country,gender,age,tenure,balance,products_number,credit_card,active_member,estimated_salary,churn
0,15634602,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,15647311,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,15619304,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,15701354,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,15737888,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


**Step 2: Data Preprocessing - Handle Missing Values**
Missing values in a dataset can lead to incorrect model predictions. 
We will check for missing values and handle them by:
- Filling missing numerical values with the column mean.
- Dropping rows with missing categorical values.


In [9]:
# Check for missing values
print("Missing values per column:\n", df.isnull().sum())

# Fill missing numerical values with column mean (avoid warning)
df.fillna(df.mean(numeric_only=True), inplace=True)

# Drop rows with missing categorical values (if any)
df.dropna(inplace=True)

# Verify if missing values are handled
print("Missing values after handling:\n", df.isnull().sum())


Missing values per column:
 customer_id         0
credit_score        0
country             0
gender              0
age                 0
tenure              0
balance             0
products_number     0
credit_card         0
active_member       0
estimated_salary    0
churn               0
dtype: int64
Missing values after handling:
 customer_id         0
credit_score        0
country             0
gender              0
age                 0
tenure              0
balance             0
products_number     0
credit_card         0
active_member       0
estimated_salary    0
churn               0
dtype: int64
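
Since this dataset has no missing values, the cell above does not visibly change anything. A minimal sketch on a made-up frame (the `toy` data is hypothetical and not taken from "Banking_Customer") shows what the two steps would do when gaps are present:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing numeric and one missing categorical value
toy = pd.DataFrame({
    "age": [40.0, np.nan, 50.0],
    "country": ["France", "Spain", None],
})

# Numeric gaps are filled with the column mean: (40 + 50) / 2 = 45
toy = toy.fillna(toy.mean(numeric_only=True))

# Any row that still has a gap (here: the missing country) is dropped
toy = toy.dropna()

print(toy)
```

Note that `dropna()` removes every row that still contains a missing value, so the order matters: numeric columns are filled first, and only rows with missing non-numeric values are lost.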


**Step 3: Convert Categorical Data to Numeric**
Machine learning models require numerical data. We will:
- Encode the categorical variables (`gender` and `country`) into numerical form.


In [11]:
print("Column Names in Dataset:\n", df.columns)


Column Names in Dataset:
 Index(['customer_id', 'credit_score', 'country', 'gender', 'age', 'tenure',
       'balance', 'products_number', 'credit_card', 'active_member',
       'estimated_salary', 'churn'],
      dtype='object')


In [12]:
# Convert categorical variables into numeric

# Encode 'gender' as 0/1 (any value other than 'Male' or 'Female' would become NaN)
df['gender'] = df['gender'].map({'Male': 0, 'Female': 1})

# Encode 'country' as integer category codes (assigned in alphabetical order)
df['country'] = df['country'].astype('category').cat.codes

# Verify changes
df.head()


,customer_id,credit_score,country,gender,age,tenure,balance,products_number,credit_card,active_member,estimated_salary,churn
0,15634602,619,0,1,42,2,0.0,1,1,1,101348.88,1
1,15647311,608,2,1,41,1,83807.86,1,0,1,112542.58,0
2,15619304,502,0,1,42,8,159660.8,3,1,0,113931.57,1
3,15701354,699,0,1,39,1,0.0,2,0,0,93826.63,0
4,15737888,850,2,1,43,2,125510.82,1,1,1,79084.1,0
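
Integer category codes imply an order between countries that a linear model may take literally. One common alternative is one-hot encoding. The sketch below uses `pd.get_dummies` on a small made-up frame; "Germany" is a hypothetical third category chosen for illustration, and this is not the encoding used in the rest of the notebook.

```python
import pandas as pd

# Made-up sample; "Germany" is a hypothetical third category for illustration
sample = pd.DataFrame({"country": ["France", "Spain", "Germany", "France"]})

# Integer codes follow the alphabetical order of the categories
codes = sample["country"].astype("category").cat.codes
print(codes.tolist())

# One-hot encoding creates one indicator column per category instead
one_hot = pd.get_dummies(sample, columns=["country"], dtype=int)
print(one_hot)
```

Since `country` is not among the features selected in Step 4, the choice of encoding does not affect the model trained below.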


**Step 4: Feature Selection**
Not all columns are useful for churn prediction. We will:
- Select only the features used for training.
- Leave out columns that are not used as predictors here, such as `customer_id`.


In [15]:
# Select the feature columns and the target variable
X = df[['credit_score', 'age', 'balance', 'products_number', 'active_member']]
y = df['churn']  # Target column

# Display selected features
X.head()


,credit_score,age,balance,products_number,active_member
0,619,42,0.0,1,1
1,608,41,83807.86,1,1
2,502,42,159660.8,3,0
3,699,39,0.0,2,0
4,850,43,125510.82,1,1


**Step 5: Split Data into Training & Testing Sets**
To evaluate our model, we divide our dataset into:
- Training Set (80%): Used to train the machine learning model.
- Testing Set (20%): Used to check the model’s accuracy.
This helps in understanding how well the model performs on unseen data.


In [16]:
from sklearn.model_selection import train_test_split  

# Split data into training (80%) and testing (20%)  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  

# Display dataset shapes
print(f"Training data: {X_train.shape}, Testing data: {X_test.shape}")


Training data: (8000, 5), Testing data: (2000, 5)
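
Churned customers are a minority class (the test-set support reported in Step 7 is 393 of 2000), so a purely random split can leave the two sets with slightly different churn rates. A hedged variant, assuming the same `test_size` and `random_state`, passes `stratify` so both sets keep the class ratio. The data below is synthetic and only stands in for `X` and `y`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y: 1000 rows, 20% positives
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 5))
y_demo = np.array([1] * 200 + [0] * 800)

# stratify=y_demo keeps the 20% positive rate in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(y_tr.mean(), y_te.mean())
```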


**Step 6: Train a Machine Learning Model**
I will use Logistic Regression, a simple and widely used baseline for binary classification problems such as churn prediction.
The model learns patterns from training data and makes predictions on unseen customers.


In [17]:
from sklearn.linear_model import LogisticRegression  

# Train the Logistic Regression model  
model = LogisticRegression()  
model.fit(X_train, y_train)  

print("Model training completed.")


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [18]:
from sklearn.preprocessing import StandardScaler

# Scale the features to help the solver converge (this addresses the warning above)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression on scaled data
model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)

print("Model training completed.")


Model training completed.


**Step 7: Evaluate Model Performance**
Once the model is trained, we will:
- Measure accuracy
- Generate a classification report (precision, recall, and F1-score)
- Check the ROC Curve to understand performance


In [20]:
from sklearn.metrics import accuracy_score, classification_report  

# Make sure X_test is scaled before predicting
y_pred = model.predict(X_test_scaled)  # Use the scaled version!

# Print accuracy and classification report  
print("Accuracy:", accuracy_score(y_test, y_pred))  
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Accuracy: 0.8105

Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.97      0.89      1607
           1       0.57      0.15      0.24       393

    accuracy                           0.81      2000
   macro avg       0.70      0.56      0.57      2000
weighted avg       0.77      0.81      0.76      2000
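
The ROC curve listed in this step is not computed by the cell above. A minimal sketch of how it could be done with `roc_curve` and `roc_auc_score` follows. It uses synthetic data from `make_classification` rather than the notebook's variables; in the notebook, the equivalent inputs would be `model.predict_proba(X_test_scaled)[:, 1]` and `y_test`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 80% negatives)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=5, weights=[0.8], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

demo_scaler = StandardScaler()
clf = LogisticRegression(max_iter=500)
clf.fit(demo_scaler.fit_transform(X_tr), y_tr)

# ROC analysis needs scores, not hard labels: use the probability of class 1
scores = clf.predict_proba(demo_scaler.transform(X_te))[:, 1]

# fpr/tpr trace the curve across thresholds; AUC summarizes it in one number
fpr, tpr, thresholds = roc_curve(y_te, scores)
auc = roc_auc_score(y_te, scores)
print(f"ROC AUC: {auc:.3f}")
```

Because the report above shows a recall of only 0.15 for class 1, a threshold-independent view such as the ROC curve is a useful complement to accuracy here.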



**Step 8: Deploy Model on Azure Machine Learning**
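Once the model is trained, the first step toward deploying it on Azure ML for real-time predictions is to save it to disk. The cells below save the model and locate the file; they do not yet register or deploy it.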


In [22]:
import joblib

# Save the trained model
joblib.dump(model, "banking_churn_model.pkl")

print("Model saved successfully.")


Model saved successfully.
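
The model above was trained on scaled features, so the saved file alone cannot score raw inputs; the fitted `StandardScaler` is needed too. One way to keep them together, sketched here on synthetic data with a hypothetical file name, is to wrap both in a scikit-learn `Pipeline` and save that:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (not the banking dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50_000, scale=20_000, size=(200, 5))
y_demo = (X_demo[:, 0] > 50_000).astype(int)

# The pipeline scales and then classifies, so raw features can be passed in
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
pipeline.fit(X_demo, y_demo)

# Hypothetical file name; a temporary directory keeps the sketch side-effect free
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "churn_pipeline_demo.pkl")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored pipeline reproduces the original predictions on raw inputs
same = bool((restored.predict(X_demo) == pipeline.predict(X_demo)).all())
print(same)
```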


In [23]:
import os

# Get the current directory
print("Current working directory:", os.getcwd())

# List all files in the directory
print("Files in directory:", os.listdir())


Current working directory: /mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181
Files in directory: ['.amlignore', '.amlignore.amltmp', '.ipynb_aml_checkpoints', 'Banking Customer Churn Prediction.ipynb', 'banking customer churn prediction.ipynb.amltmp', 'banking_churn_model.pkl']


In [29]:
import shutil
import os

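# The model was saved in the current working directory, so source and
# destination point to the same file. The move below leaves the file in
# place; its only practical effect is to print the absolute path.
source_file = "banking_churn_model.pkl"
destination_path = os.path.join(os.getcwd(), source_file)

# Move the file (effectively a no-op here)
shutil.move(source_file, destination_path)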

print(f"✅ Model moved successfully! Now download it from: {destination_path}")


✅ Model moved successfully! Now download it from: /mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181/banking_churn_model.pkl


In [31]:
import os

file_path = "/mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181/banking_churn_model.pkl"

# Check if file exists
if os.path.exists(file_path):
    print("✅ Model file exists at:", file_path)
else:
    print("❌ Model file not found.")


✅ Model file exists at: /mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181/banking_churn_model.pkl


In [32]:
import os

directory_path = "/mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181/"

# List all files
print("Files in directory:\n", os.listdir(directory_path))


Files in directory:
 ['.amlignore', '.amlignore.amltmp', '.ipynb_aml_checkpoints', 'Banking Customer Churn Prediction.ipynb', 'banking customer churn prediction.ipynb.amltmp', 'banking_churn_model.pkl']


In [33]:
import shutil

source_path = "/mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181/banking_churn_model.pkl"
destination_path = "/mnt/data/banking_churn_model.pkl"

try:
    shutil.move(source_path, destination_path)
    print("✅ Model moved successfully. Now download it from /mnt/data/")
except Exception as e:
    print("❌ Error:", e)


❌ Error: [Errno 2] No such file or directory: '/mnt/data/banking_churn_model.pkl'


In [34]:
import os

source_path = "/mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181/banking_churn_model.pkl"
destination_path = "/mnt/data/banking_churn_model.pkl"

# Check if source file exists
if os.path.exists(source_path):
    print("✅ Source file exists.")
else:
    print("❌ Source file does NOT exist.")

# Check write permission in destination
if os.access("/mnt/data/", os.W_OK):
    print("✅ Write permission available in /mnt/data/")
else:
    print("❌ No write permission in /mnt/data/")


✅ Source file exists.
❌ No write permission in /mnt/data/
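
`os.access` also returns `False` when the path does not exist, so the message above cannot tell a missing directory from a permissions problem. The earlier `[Errno 2]` error suggests that `/mnt/data/` is simply absent on this compute instance. A small helper (the name `describe_destination` is hypothetical) can separate the cases:

```python
import os
import tempfile


def describe_destination(directory: str) -> str:
    """Report whether a directory is missing, not writable, or writable."""
    if not os.path.isdir(directory):
        return "missing"
    if not os.access(directory, os.W_OK):
        return "not writable"
    return "writable"


# A freshly created temporary directory is writable; a made-up path is missing
with tempfile.TemporaryDirectory() as tmp_dir:
    print(describe_destination(tmp_dir))
    print(describe_destination(os.path.join(tmp_dir, "does_not_exist")))
```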


In [36]:
import os
import zipfile

source_path = "/mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181/banking_churn_model.pkl"
zip_path = os.path.expanduser("~/banking_churn_model.zip")  # Save zip in home directory

try:
    # Archive only the model file; calling shutil.make_archive on the parent
    # folder would zip every file in that directory
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(source_path, arcname=os.path.basename(source_path))
    print(f"✅ Model zipped successfully! Download it from: {zip_path}")
except Exception as e:
    print("❌ Error:", e)


✅ Model zipped successfully! Download it from: /home/azureuser/banking_churn_model.zip


In [37]:
import os

# List all directories and files in your working directory
print("Current working directory:", os.getcwd())
print("Files in directory:", os.listdir(os.getcwd()))


Current working directory: /mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181
Files in directory: ['.amlignore', '.amlignore.amltmp', '.ipynb_aml_checkpoints', 'Banking Customer Churn Prediction.ipynb', 'banking customer churn prediction.ipynb.amltmp', 'banking_churn_model.pkl']


In [38]:
import shutil

source_path = "/mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181/banking_churn_model.pkl"
destination_path = "/mnt/data/banking_churn_model.pkl"

try:
    shutil.copy(source_path, destination_path)
    print(f"✅ File copied successfully! Download it from: {destination_path}")
except Exception as e:
    print(f"❌ Error copying file: {e}")


❌ Error copying file: [Errno 2] No such file or directory: '/mnt/data/banking_churn_model.pkl'


**Step 1: Load the Trained Model & Test Data**

In this step, we will:
- Load the trained churn prediction model (`.pkl` file) saved in the Azure ML notebook's working directory.
- Load the test dataset (`CSV` file).
- Select the required features for predictions.


In [40]:
with open("banking_churn_model.pkl", "rb") as file:
    content = file.read(10)  # Read the first 10 bytes as a quick check of the pickle header
print(content)


b'\x80\x04\x95\t\x02\x00\x00\x00\x00\x00'
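
The first byte `\x80` is the pickle `PROTO` opcode and the second byte is the protocol number, so `\x80\x04` indicates a protocol 4 pickle. A small sketch of that check, using a hypothetical helper name and an in-memory object instead of the model file:

```python
import pickle


def pickle_protocol(header: bytes):
    """Return the protocol number if the bytes start with a PROTO opcode, else None."""
    # Pickles written with protocol 2 or later begin with b"\x80" + protocol byte
    if len(header) >= 2 and header[0] == 0x80:
        return header[1]
    return None


payload = pickle.dumps({"example": 1}, protocol=4)
print(pickle_protocol(payload[:10]))
print(pickle_protocol(b"not a pickle"))
```

A matching header is only a quick sanity check; actually loading the file is the real test of whether it is usable.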


In [42]:
import pickle

# Re-save the model with pickle (this overwrites the file written by joblib earlier)
model_filename = "banking_churn_model.pkl"
with open(model_filename, "wb") as file:
    pickle.dump(model, file)

print("Model saved successfully!")


Model saved successfully!


In [44]:
import os

file_size = os.path.getsize("banking_churn_model.pkl")
print(f"File size: {file_size} bytes")



File size: 755 bytes


In [45]:
import os
print(os.getcwd())  # Print the current working directory
print(os.listdir())  # List all files in the directory


/mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181
['.amlignore', '.amlignore.amltmp', '.ipynb_aml_checkpoints', 'Banking Customer Churn Prediction.ipynb', 'banking customer churn prediction.ipynb.amltmp', 'banking_churn_model.pkl']


In [46]:
import pickle

# Load the model
with open("banking_churn_model.pkl", "rb") as file:
    model = pickle.load(file)

print("✅ Model loaded successfully!")


✅ Model loaded successfully!


In [48]:
import os

# Print current working directory
print("Current Directory:", os.getcwd())

# List all files in the directory
print("Files Available:", os.listdir())


Current Directory: /mnt/batch/tasks/shared/LS_root/mounts/clusters/bank-churn-ml-instance/code/Users/rawatp181
Files Available: ['.amlignore', '.amlignore.amltmp', '.ipynb_aml_checkpoints', 'Banking Customer Churn Prediction.ipynb', 'banking customer churn prediction.ipynb.amltmp', 'banking_churn_model.pkl']
