| Field | Details |
| --- | --- |
| Job 02 | Create Project on Artificial Intelligence (AI) and Machine Learning (ML) |
| Time | 60 Minutes |
| Unit Covered | OU-ICT-AIIT-03-L6-V1: Create Project on AI and ML |



### **Working Procedure/ Steps:**

1. Wear the required **Personal Protective Equipment (PPE)** for the job.
2. Observe and follow appropriate **occupational health and safety (OHS)** requirements during the demonstration.
3. Check the electricity, peripheral device, and internet connections to your computer.
4. Start the Computer.
5. Check for software errors and troubleshoot problems.
6. Collect the resources (Dataset) and materials from your assessor as per the job requirement.
7. Create a **Weather Prediction System**.
8. Set up the **Python environment** required to create the system.
9. Load the weather dataset into the environment.
10. Perform initial **data inspection**.
11. Define the **ML Problem Statement**.
12. Perform **Feature Selection**.
13. **Split the dataset** for training and testing.
14. Implement a **baseline ML model**.
15. Apply **feature scaling** and compare model performance before and after scaling.
16. Train other **optimized ML models** (at least 2 models).
17. **Evaluate** Model Performance.
18. **Visualize** Model Performance.
19. Detect **Overfitting and Underfitting**.
20. **Save and deploy** the final model.
21. Submit the final output to the competency assessor.
22. Shut down the computer and clean your workplace.
23. Clean tools, equipment, materials, and work area.


| Category | Required Items |
| --- | --- |
| Tools, equipment, and furniture | 1. Personal Computer – 1 set; 2. Ergonomic Chair – 1 pc |
| Software | 1. Operating system – Windows or Linux; 2. Programming and ML tools: 2.1 Python (version 3.8 or later), 2.2 Jupyter Notebook / JupyterLab, 2.3 Integrated Development Environment (IDE): VS Code / PyCharm |
| PPE (Personal Protective Equipment) | 1. Rubber-soled shoes/sandals – 1 pair |

1. Wear the required Personal Protective Equipment (PPE)

Demonstration:

Ensure clean workspace

No liquid near computer

Proper seating & posture

Why: Prevents electrical hazards and physical strain.

2. Follow Occupational Health & Safety (OHS)

Actions:

Proper cable management

Adequate lighting

Take breaks

Why: Ensures safe and professional working conditions.

3. Check Electricity, Peripheral & Internet Connections

Check:

Power supply

Keyboard, mouse

Internet connectivity

Why: Required for software installation, dataset access, and deployment.

4. Start the Computer

Power on the system

Log in to the OS

5. Check for Software Errors & Troubleshoot

Check:

Python installed

No OS errors

In [None]:
!python --version
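
Beyond the interpreter version, it can help to confirm that the libraries used later in this job are installed. The following is a minimal sketch; the package list is an assumption based on the imports in Step 8, and `check_packages` is a hypothetical helper name.

```python
from importlib import metadata

# Assumed package list, taken from the imports used later in this job
# (distribution names as published on PyPI)
required = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn", "joblib"]

def check_packages(names):
    """Return {package: installed version, or None if missing}."""
    status = {}
    for name in names:
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = None  # package is not installed
    return status

for pkg, version in check_packages(required).items():
    print(f"{pkg}: {version if version else 'MISSING (install with pip)'}")
```

Any package reported as missing can usually be installed with `pip install <name>` before continuing.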

6. Collect Resources (Dataset)

Dataset:
Weather Test Data (provided by assessor)

Purpose: Used to train and test the Weather Prediction model.

7. Create a Weather Prediction System

Goal:
Predict weather outcome (e.g., Rain / No Rain or Temperature condition) using ML.

8. Set Up Python Environment

In [None]:
# Import the libraries needed for development
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import joblib

9. Load Weather Dataset

In [None]:
df = pd.read_csv("/content/sample_data/Job_2_resource_Weather Test Data.csv")

10. Perform Initial Data Inspection

Why:

Understand structure

Identify missing values

Check data types

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.isnull().sum()
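
Raw null counts are easier to judge as percentages, and it is worth knowing how many rows `dropna()` would remove before calling it. A minimal sketch on a small stand-in DataFrame; the `demo` frame and its values are illustrative assumptions, not the assessor's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the weather DataFrame (values are illustrative)
demo = pd.DataFrame({
    "MinTemp": [12.1, np.nan, 9.4, 15.0],
    "Rainfall": [0.0, 1.2, np.nan, np.nan],
    "RainToday": ["No", "Yes", "No", None],
})

# Share of missing values per column, in percent, highest first
missing_pct = (demo.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)

# Number of rows that would survive a plain dropna()
rows_kept = len(demo.dropna())
print(f"Rows kept after dropna: {rows_kept} of {len(demo)}")
```

If a large share of rows would be lost, imputing values (for example with the column median) may be preferable to dropping them.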

In [None]:
df.shape

In [None]:
df.dropna(inplace=True)

In [None]:
df.shape

In [None]:
df.select_dtypes(include="object").columns

In [None]:
df.sample(5)

In [None]:
df = df.drop(columns=["row ID"], errors="ignore")

In [None]:
df.select_dtypes(include="object").columns

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

categorical_cols = [
    'Location',
    'WindGustDir',
    'WindDir9am',
    'WindDir3pm',
    'RainToday'
]

for col in categorical_cols:
    df[col] = le.fit_transform(df[col].astype(str))

In [None]:
df.dtypes

11. Define ML Problem Statement

Problem Type: Supervised Learning
Task: Classification / Regression (based on target column)

Example:

Predict whether it will rain based on weather conditions.
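
Whether the task is classification or regression depends on the target column. The heuristic below is a rough sketch: `suggest_task` and its `max_classes` threshold are hypothetical, and the two sample series are illustrative.

```python
import pandas as pd

def suggest_task(target: pd.Series, max_classes: int = 20) -> str:
    """Heuristic: text targets or few distinct values suggest classification."""
    if not pd.api.types.is_numeric_dtype(target):
        return "classification"
    if target.nunique() <= max_classes:
        return "classification"
    return "regression"

# Illustrative targets: a Yes/No label and a continuous temperature
rain = pd.Series(["Yes", "No", "No", "Yes"], name="RainToday")
temp = pd.Series([x * 0.5 for x in range(100)], name="MaxTemp")

print(rain.name, "->", suggest_task(rain))
print(temp.name, "->", suggest_task(temp))
```

For a classification target, `target.value_counts(normalize=True)` also shows how balanced the classes are, which affects how accuracy should be interpreted.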

12. Perform Feature Selection

In [None]:
X = df.drop("RainToday", axis=1)
y = df["RainToday"]
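
The cell above separates features from the target but keeps every feature. One possible way to rank features is univariate scoring with scikit-learn's `SelectKBest`. This sketch uses synthetic data from `make_classification` as a stand-in, and the choice of `k=3` is an arbitrary assumption.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in data (not the weather dataset)
X_arr, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=42
)
X_demo = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(8)])

# Score each feature against the target with the ANOVA F-test, keep the top 3
selector = SelectKBest(score_func=f_classif, k=3)
selector.fit(X_demo, y_demo)

scores = pd.Series(selector.scores_, index=X_demo.columns).sort_values(ascending=False)
selected = X_demo.columns[selector.get_support()].tolist()

print(scores)
print("Selected features:", selected)
```

To avoid leaking test information, a selector like this would normally be fitted on the training split only.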

13. Split Dataset (Train & Test)

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
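
When the target classes are imbalanced (rainy days are often the minority), passing `stratify` keeps the class ratio the same in both splits. A small sketch on a toy target; the 80/20 class mix is an assumption for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced target: 80 samples of class 0, 20 samples of class 1
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Both splits preserve the 20% positive rate
print("Train positive rate:", y_tr.mean())
print("Test positive rate:", y_te.mean())
```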

14. Implement Baseline ML Model

In [None]:
from sklearn.preprocessing import LabelEncoder

# 1. Identify any remaining columns of "object" type (strings)
categorical_cols = X.select_dtypes(include=['object']).columns
print("Encoding these columns:", categorical_cols.tolist())

# 2. Convert strings to integer codes, using a fresh encoder per column
for col in categorical_cols:
    X[col] = LabelEncoder().fit_transform(X[col].astype(str))

# 3. Re-split so the train/test sets contain only numeric features
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Fit the baseline model on the unscaled training data
from sklearn.linear_model import LogisticRegression
baseline_model = LogisticRegression(max_iter=1000, random_state=42)
baseline_model.fit(X_train, y_train)

print("Baseline Model trained successfully!")

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# 1. Initialize the Baseline Model
# We set max_iter to 1000 to ensure the solver has enough iterations to converge
baseline_model = LogisticRegression(max_iter=1000, random_state=42)

# 2. Train (Fit) the model on the training data
# Note: This should be done on the data BEFORE scaling for the baseline comparison
baseline_model.fit(X_train, y_train)

# 3. Make Predictions
y_pred_baseline = baseline_model.predict(X_test)

# 4. Evaluate Performance
baseline_accuracy = accuracy_score(y_test, y_pred_baseline)

print(f"Baseline Model Accuracy: {baseline_accuracy:.4f}")
print("\nBaseline Classification Report:")
print(classification_report(y_test, y_pred_baseline))

15. Apply Feature Scaling & Compare

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
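
The step title calls for a comparison, which means training the same model on the raw and the scaled features and evaluating both on the same test split. A minimal sketch on synthetic data; the data and the deliberately inflated first feature are assumptions for illustration, and the names `raw_model`, `scaled_model`, and `demo_scaler` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; one feature is inflated to mimic mixed units
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=42)
X_demo[:, 0] *= 1000
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Before scaling
raw_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc_raw = accuracy_score(y_te, raw_model.predict(X_te))

# After scaling: fit the scaler on the training split only, then reuse it
demo_scaler = StandardScaler()
X_tr_s = demo_scaler.fit_transform(X_tr)
X_te_s = demo_scaler.transform(X_te)
scaled_model = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr)
acc_scaled = accuracy_score(y_te, scaled_model.predict(X_te_s))

print(f"Accuracy before scaling: {acc_raw:.4f}")
print(f"Accuracy after scaling:  {acc_scaled:.4f}")
```

In the notebook itself, the same pattern would apply to `X_train_scaled` and `X_test_scaled`, with the result compared against `baseline_accuracy` from Step 14.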

16. Train Optimized ML Models (At Least 2)

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

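# Fix random_state so results are reproducible across runs
dt = DecisionTreeClassifier(random_state=42)
rf = RandomForestClassifier(random_state=42)

# Tree-based models do not require feature scaling, so the unscaled data is used
dt.fit(X_train, y_train)
rf.fit(X_train, y_train)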
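
The models above use default hyperparameters. One common way to optimize them is a cross-validated grid search. This is a sketch on synthetic data, and the parameter grid is an assumed starting point rather than a recommended setting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (not the weather dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Assumed, deliberately small grid so the search finishes quickly
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,                # 3-fold cross-validation on the training split
    scoring="accuracy",
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.4f}")
print(f"Test accuracy:    {search.score(X_te, y_te):.4f}")
```

After fitting, `search.best_estimator_` holds the model refitted on the full training split with the best parameters.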

17. Evaluate Model Performance

In [None]:
from sklearn.metrics import accuracy_score

print("Logistic:", accuracy_score(y_test, baseline_model.predict(X_test)))
print("Decision Tree:", accuracy_score(y_test, dt.predict(X_test)))
print("Random Forest:", accuracy_score(y_test, rf.predict(X_test)))
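
Accuracy alone can hide poor performance on the minority class, so a confusion matrix with precision, recall, and F1 gives a fuller picture. The labels below are made-up illustrative values, not results from the weather dataset.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Illustrative labels only (1 = rain, 0 = no rain)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
print(cm)

precision = precision_score(y_true, y_hat)  # of predicted rain, how much was rain
recall = recall_score(y_true, y_hat)        # of actual rain, how much was found
f1 = f1_score(y_true, y_hat)                # harmonic mean of the two

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```

In the notebook, `y_true` and `y_hat` would be replaced by `y_test` and the predictions of each trained model.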

18. Visualize Model Performance

In [None]:
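import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

# Compute each model's test accuracy instead of hard-coding the values
models = {
    "Logistic": baseline_model,
    "Decision Tree": dt,
    "Random Forest": rf,
}
accuracy = [accuracy_score(y_test, m.predict(X_test)) for m in models.values()]

plt.bar(list(models.keys()), accuracy)
plt.ylim(0, 1)
plt.title("Model Performance Comparison")
plt.ylabel("Accuracy")
plt.show()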
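
A bar chart compares overall accuracy; a confusion-matrix heatmap additionally shows which kinds of errors a model makes. A minimal matplotlib sketch follows; `plot_confusion` is a hypothetical helper and the counts passed to it are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_confusion(cm, labels):
    """Draw a confusion matrix as an annotated heatmap and return the Axes."""
    cm = np.asarray(cm)
    fig, ax = plt.subplots(figsize=(4, 4))
    im = ax.imshow(cm, cmap="Blues")
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    # Write the count inside each cell
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    fig.colorbar(im, ax=ax)
    return ax

# Illustrative counts only; in the notebook, pass confusion_matrix(y_test, y_pred)
ax = plot_confusion([[5, 1], [1, 3]], ["No Rain", "Rain"])
ax.set_title("Confusion Matrix (illustrative counts)")
plt.show()
```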

19. Detect Overfitting & Underfitting

In [None]:
from sklearn.metrics import accuracy_score

# Training accuracy
y_train_pred = baseline_model.predict(X_train)
train_acc = accuracy_score(y_train, y_train_pred)

print("Training Accuracy:", train_acc)

In [None]:
# Testing accuracy
y_test_pred = baseline_model.predict(X_test)
test_acc = accuracy_score(y_test, y_test_pred)

print("Testing Accuracy:", test_acc)

Overfitting / Underfitting Detection Logic

In [None]:
if train_acc > 0.90 and test_acc < 0.70:
    print("⚠️ Model is OVERFITTING")
elif train_acc < 0.70 and test_acc < 0.70:
    print("⚠️ Model is UNDERFITTING")
else:
    print("✅ Model is BALANCED / GENERALIZED")
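
The fixed thresholds above would label a model with 0.99 training accuracy and 0.80 testing accuracy as balanced, even though the gap is large. An alternative sketch checks the train/test gap directly; `diagnose_fit` is a hypothetical helper, and the `max_gap` and `min_acc` values are assumptions to tune per task.

```python
def diagnose_fit(train_acc, test_acc, max_gap=0.10, min_acc=0.70):
    """Classify a model's fit from its training and testing accuracy."""
    # Poor on both splits: the model has not learned the pattern
    if train_acc < min_acc and test_acc < min_acc:
        return "underfitting"
    # Much better on training data than on unseen data
    if train_acc - test_acc > max_gap:
        return "overfitting"
    return "balanced"

# Illustrative accuracy pairs (train, test)
for pair in [(0.99, 0.80), (0.60, 0.58), (0.85, 0.83)]:
    print(pair, "->", diagnose_fit(*pair))
```

In the notebook, the call would be `diagnose_fit(train_acc, test_acc)`, repeated for each trained model rather than only the baseline.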

In [None]:
plt.bar(["Training Accuracy", "Testing Accuracy"], [train_acc, test_acc])
plt.ylim(0, 1)
plt.title("Overfitting & Underfitting Detection")
plt.ylabel("Accuracy")
plt.show()

20. Save and Deploy Final Model

In [None]:
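import joblib

# Assumption: the Random Forest from Step 16 is the final model;
# substitute whichever model scored best in Step 17
final_model = rf

# Save the trained model and scaler
joblib.dump(final_model, "weather_model.joblib")
joblib.dump(scaler, "scaler.joblib")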

Deployment:

Can be used in Flask / FastAPI / Desktop app
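
Whatever framework serves the model, the core of deployment is loading the saved files and calling `predict` on new rows. The sketch below trains and saves a tiny stand-in model in a temporary directory so that it runs on its own; the data, the `predict_from_files` helper, and the assumption that the model was trained on scaled features are all illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Train and save a tiny stand-in model (synthetic data, not the weather dataset)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_scaler = StandardScaler().fit(X_demo)
demo_model = LogisticRegression().fit(demo_scaler.transform(X_demo), y_demo)

tmp_dir = tempfile.mkdtemp()
model_path = os.path.join(tmp_dir, "weather_model.joblib")
scaler_path = os.path.join(tmp_dir, "scaler.joblib")
joblib.dump(demo_model, model_path)
joblib.dump(demo_scaler, scaler_path)

def predict_from_files(rows, model_file, scaler_file):
    """Load the saved scaler and model, then predict for new feature rows."""
    model = joblib.load(model_file)
    scaler = joblib.load(scaler_file)
    return model.predict(scaler.transform(np.asarray(rows)))

preds = predict_from_files(X_demo[:5], model_path, scaler_path)
print("Predictions:", preds)
```

A Flask or FastAPI endpoint would wrap a function like this, loading the files once at startup. New rows must have the same columns, in the same order and with the same encoding, as the training data.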

21. Submit Final Output to Assessor

Submit:

Dataset

Notebook (.ipynb)

Model file (.joblib)

README.md

22. Shut Down Computer

Close applications

Shut down the OS properly
