# Task 1: AI-Powered Code Completion

In [1]:
# Manual code
def sort_dicts_by_key(dict_list, key):
    return sorted(dict_list, key=lambda x: x[key])

people = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 20},
    {"name": "Charlie", "age": 30}
]

sorted_people = sort_dicts_by_key(people, "age")

for person in sorted_people:
    print(person)


{'name': 'Bob', 'age': 20}
{'name': 'Alice', 'age': 25}
{'name': 'Charlie', 'age': 30}


In [2]:
# GitHub Copilot
def fancy_sort_dicts(dict_list, sort_key, reverse=False, missing_policy='last', key_transform=lambda x: x):
    """
    Sort a list of dictionaries by a specific key in a fancy way.

    Args:
        dict_list (list): List of dictionaries to sort.
        sort_key (str): The key to sort by.
        reverse (bool): If True, sort in descending order.
        missing_policy (str): What to do if the key is missing in a dict.
            Options: 'last' (put missing at end), 'first' (put missing at start), 'ignore' (skip dicts missing key).
        key_transform (callable): Function to transform key before sorting (e.g., str.lower).

    Returns:
        list: Sorted list of dictionaries.
    """
    if missing_policy == 'ignore':
        filtered = [d for d in dict_list if sort_key in d]
    else:
        filtered = dict_list[:]

    def sort_helper(d):
        if sort_key not in d:
            # Large or small value depending on missing_policy
            return (float('inf') if missing_policy == 'last' else float('-inf'))
        value = d[sort_key]
        try:
            return key_transform(value)
        except Exception:
            return value

    return sorted(filtered, key=sort_helper, reverse=reverse)

# Example usage:
if __name__ == "__main__":
    people = [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25},
        {"name": "Charlie"},
        {"name": "Dave", "age": 35},
        {"name": "Eve", "age": 28}
    ]

    print("Sort by age ascending, missing last:")
    print(fancy_sort_dicts(people, 'age'))

    print("\nSort by age descending, missing first:")
    print(fancy_sort_dicts(people, 'age', reverse=True, missing_policy='first'))

    print("\nSort by name (case-insensitive), missing last:")
    print(fancy_sort_dicts(people, 'name', key_transform=str.lower))

    print("\nSort by age, ignore missing:")
    print(fancy_sort_dicts(people, 'age', missing_policy='ignore'))

Sort by age ascending, missing last:
[{'name': 'Bob', 'age': 25}, {'name': 'Eve', 'age': 28}, {'name': 'Alice', 'age': 30}, {'name': 'Dave', 'age': 35}, {'name': 'Charlie'}]

Sort by age descending, missing first:
[{'name': 'Dave', 'age': 35}, {'name': 'Alice', 'age': 30}, {'name': 'Eve', 'age': 28}, {'name': 'Bob', 'age': 25}, {'name': 'Charlie'}]

Sort by name (case-insensitive), missing last:
[{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}, {'name': 'Charlie'}, {'name': 'Dave', 'age': 35}, {'name': 'Eve', 'age': 28}]

Sort by age, ignore missing:
[{'name': 'Bob', 'age': 25}, {'name': 'Eve', 'age': 28}, {'name': 'Alice', 'age': 30}, {'name': 'Dave', 'age': 35}]


# Analysis
# Manual Code
The manual code is simple and does one job well: it sorts a list of dictionaries by a single key, provided that key is present in every dictionary.
It suits quick, straightforward sorting where every item is known to have the sort key.
Limitation: If even one dictionary in the list is missing that key, the lambda raises a KeyError and the sort fails.
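The failure is easy to reproduce. The sketch below repeats the one-line function from the manual cell and runs it on a list where one record has no `age`; the sample data and the `error_message` variable are illustrative and not part of the original notebook.

```python
def sort_dicts_by_key(dict_list, key):
    # Same one-liner as the manual cell.
    return sorted(dict_list, key=lambda x: x[key])

# Illustrative data: the second record has no "age" key.
people_with_gap = [
    {"name": "Alice", "age": 25},
    {"name": "Charlie"},
    {"name": "Bob", "age": 20},
]

try:
    sort_dicts_by_key(people_with_gap, "age")
    error_message = None
except KeyError as exc:
    # x[key] raises KeyError when sorted() reaches the incomplete record.
    error_message = f"KeyError: {exc}"

print(error_message)
```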
# AI Code

This code is more general. It covers everything the manual code does and also handles several other situations.
For example:
Handles missing information: Some items in the list may lack the sort key (like a person with no age listed). This code lets you choose what happens to them: put them at the end, put them at the start, or skip them entirely. The manual code would crash in this situation. Two caveats show up in the code and output above: in the "descending, missing first" run Charlie still appears last, because reverse=True also flips where the placeholder value lands, and the float('inf') placeholder only compares cleanly with numeric values.
Flexible sorting: You can sort ascending or descending, and you can transform values before they are compared (for example, sorting names alphabetically regardless of case).
Less work for you: Because it is configurable, you do not have to write a new function for every slightly different sorting need, which saves time and reduces mistakes.
For the simple case where every item has the key, both versions rely on the same built-in sorted() call, so the core sorting step performs about the same. The GitHub Copilot code is more "efficient" only in the sense of developer effort in real-world use.
In conclusion, the manual code is efficient for one narrow task, while the Copilot code is efficient in the way a multi-tool is: it handles many different sorting problems without a new function for each. It avoids the missing-key error the manual code would hit, which saves troubleshooting time and makes programs more robust across a wider range of situations.
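One detail in the recorded output is worth a closer look: in the "descending, missing first" run, Charlie is still printed last, because `reverse=True` also reverses the position of the `float('-inf')` placeholder. A float placeholder also cannot be compared with strings, so a missing `name` would raise a `TypeError`. The sketch below is one possible alternative, not the Copilot output: the hypothetical `sort_dicts_missing_aware` sorts only the records that have the key and then attaches the rest, so placement no longer depends on `reverse` or on the value type.

```python
def sort_dicts_missing_aware(dict_list, sort_key, reverse=False,
                             missing_policy="last", key_transform=None):
    """Hypothetical variant: sort present records, then attach missing ones."""
    if missing_policy not in ("first", "last", "ignore"):
        raise ValueError(f"unknown missing_policy: {missing_policy!r}")

    present = [d for d in dict_list if sort_key in d]
    missing = [d for d in dict_list if sort_key not in d]

    def sort_value(d):
        value = d[sort_key]
        return key_transform(value) if key_transform else value

    ordered = sorted(present, key=sort_value, reverse=reverse)

    if missing_policy == "ignore":
        return ordered
    if missing_policy == "first":
        return missing + ordered
    return ordered + missing


# Same sample data as the Copilot cell.
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Charlie"},
    {"name": "Dave", "age": 35},
    {"name": "Eve", "age": 28},
]

result = sort_dicts_missing_aware(people, "age", reverse=True, missing_policy="first")
print([d["name"] for d in result])
```

With this variant the record without an age comes first even when the rest of the list is sorted in descending order.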



# Task 3: Predictive Analytics for Resource Allocation

In [13]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("reihanenamdari/breast-cancer")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/reihanenamdari/breast-cancer?dataset_version_number=1...


100%|██████████| 42.8k/42.8k [00:00<00:00, 29.4MB/s]

Extracting files...
Path to dataset files: /root/.cache/kagglehub/datasets/reihanenamdari/breast-cancer/versions/1





In [25]:
#  Import Libraries
import pandas as pd
import numpy as np
import os
import kagglehub
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, f1_score, classification_report

In [26]:

# Download the dataset
path = kagglehub.dataset_download("reihanenamdari/breast-cancer")

# List files in the directory
print("Files in dataset folder:")
for file in os.listdir(path):
    print(file)

Files in dataset folder:
Breast_Cancer.csv


In [27]:
# Build the path with os.path.join so it works on any operating system
df = pd.read_csv(os.path.join(path, "Breast_Cancer.csv"))


In [28]:
# Quick Exploration
print(df.head())
print(df.columns)
print(df.info())
print(df['Status'].unique())  # Check values in target column

   Age   Race Marital Status T Stage  N Stage 6th Stage  \
0   68  White        Married       T1      N1       IIA   
1   50  White        Married       T2      N2      IIIA   
2   58  White       Divorced       T3      N3      IIIC   
3   58  White        Married       T1      N1       IIA   
4   47  White        Married       T2      N1       IIB   

               differentiate Grade   A Stage  Tumor Size Estrogen Status  \
0      Poorly differentiated     3  Regional           4        Positive   
1  Moderately differentiated     2  Regional          35        Positive   
2  Moderately differentiated     2  Regional          63        Positive   
3      Poorly differentiated     3  Regional          18        Positive   
4      Poorly differentiated     3  Regional          41        Positive   

  Progesterone Status  Regional Node Examined  Reginol Node Positive  \
0            Positive                      24                      1   
1            Positive                      1

In [30]:
# Create 'priority' label (Target)
df['priority'] = df['Status'].apply(lambda x: 'high' if str(x).strip().lower() == 'dead' else 'low')

In [31]:
# Encode 'priority' (LabelEncoder sorts classes alphabetically: 'high' -> 0, 'low' -> 1)
label_encoder = LabelEncoder()
df['priority_encoded'] = label_encoder.fit_transform(df['priority'])

In [32]:
#  Prepare Features
# Drop the target and the columns derived from it so they cannot leak into the features;
# one-hot encode the remaining categorical columns with get_dummies
drop_cols = ['Status', 'priority', 'priority_encoded']
df_encoded = pd.get_dummies(df.drop(columns=drop_cols))

X = df_encoded
y = df['priority_encoded']

In [33]:
# Hold out 20% of the rows as a test set (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [34]:
# Train Random Forest Model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

In [35]:
# Predictions
y_pred = model.predict(X_test)

In [37]:
#  Evaluation
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='macro')

print(f"\n✅ Accuracy: {acc:.2f}")
print(f"✅ F1 Score (macro): {f1:.2f}")
print("\n📊 Classification Report:")
print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))


✅ Accuracy: 0.91
✅ F1 Score (macro): 0.79

📊 Classification Report:
              precision    recall  f1-score   support

        high       0.84      0.52      0.64       120
         low       0.92      0.98      0.95       685

    accuracy                           0.91       805
   macro avg       0.88      0.75      0.79       805
weighted avg       0.91      0.91      0.90       805



# Part 3: Ethical Reflection

Prompt: Your predictive model from Task 3 is deployed in a company. Discuss:

Potential biases in the dataset (e.g., underrepresented teams):

Under-representation of minority groups: If the model uses attributes such as Race, groups like Black or Asian patients may be underrepresented, so the classifier might be less accurate for them and produce unfair priority predictions.

Survivor selection bias: Patients who survive longer are more likely to be included in datasets, which skews the outcome distribution ("Status" = Alive or Dead). This can distort learning and reduce predictive fairness for groups with different survival rates.
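A first check for both issues is to look at how large each group is and how often each group carries the `high` label. The sketch below uses a handful of made-up `(race, priority)` pairs purely for illustration; the values are not taken from the real dataset, and on the notebook's data the same counts could be taken from `df['Race']` and `df['priority']`.

```python
from collections import Counter

# Illustrative (race, priority) pairs; NOT real rows from Breast_Cancer.csv.
records = [
    ("White", "low"), ("White", "low"), ("White", "high"), ("White", "low"),
    ("White", "low"), ("White", "low"), ("Black", "high"), ("Black", "low"),
    ("Other", "low"),
]

group_sizes = Counter(race for race, _ in records)
high_counts = Counter(race for race, priority in records if priority == "high")

# Share of the data and rate of the 'high' label for each group.
summary = {
    race: {
        "share_of_data": size / len(records),
        "high_rate": high_counts[race] / size,
    }
    for race, size in group_sizes.items()
}

for race, stats in summary.items():
    print(f"{race}: share={stats['share_of_data']:.2f}, high_rate={stats['high_rate']:.2f}")
```

Very small shares or very different `high` rates between groups are a signal to look closer before trusting per-group predictions.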


How fairness tools like IBM AI Fairness 360 could address these biases.

Pre-processing Techniques
Reweighing: Assign a weight to each (group, label) combination so that under-represented combinations, such as minority race groups with a given outcome, count as much during training as their share of the data would suggest, addressing dataset imbalance.
Optimized preprocessing: Transform the input data to remove bias-inducing signal while preserving predictive features.
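The idea behind reweighing can be sketched without any fairness library. The hand-rolled function below computes the usual weight `P(group) * P(label) / P(group, label)` for each sample; it is an illustration of the formula, not AIF360's implementation, and the `groups` and `labels` lists are made-up data.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Illustrative data: group "A" is larger and mostly has label 1.
groups = ["A", "A", "A", "A", "B", "B"]
labels = [1, 1, 1, 0, 1, 0]

weights = reweighing_weights(groups, labels)
for g, y, w in zip(groups, labels, weights):
    print(g, y, round(w, 3))
```

Combinations that are rarer than independence would predict get weights above 1, and the weights sum to the number of samples. In the notebook they could be passed to `model.fit(X_train, y_train, sample_weight=...)`, which `RandomForestClassifier` accepts.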

In-processing Techniques
Prejudice Remover regularizer: Adds a fairness-oriented regularization term to the learning objective so that biased outcomes are penalized during training.
Adversarial debiasing: Trains the model together with an adversary that tries to predict the protected attribute (e.g., race) from the model's predictions; the model is penalized whenever the adversary succeeds.

Post-processing Techniques
Equalized odds / calibrated equalized odds: Adjust model outputs to equalize false positive and false negative rates across demographic groups (e.g., race), ensuring balanced error rates.
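Before adjusting outputs, it helps to measure how far the error rates actually differ. The sketch below only measures the per-group false positive and false negative rates; it does not perform the equalized-odds adjustment itself. The data is illustrative, and `1` is assumed to mark the high-priority class, whereas the notebook's `LabelEncoder` assigns codes alphabetically, so `high` is `0` there.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def rates_by_group(y_true, y_pred, groups):
    """Compute the error rates separately for each group value."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = error_rates([y_true[i] for i in idx],
                                [y_pred[i] for i in idx])
    return result


# Illustrative labels and predictions for two hypothetical groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = rates_by_group(y_true, y_pred, groups)
for g, (fpr, fnr) in rates.items():
    print(f"group {g}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```

Equalized odds is satisfied when both rates match across groups; here the two groups err in opposite directions, which overall accuracy alone would hide.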

Fairness Metrics and Monitoring
AIF360 provides 70+ fairness metrics, e.g. statistical parity difference (also known as demographic parity difference) and average odds difference, that help evaluate whether the predictive model treats groups fairly.
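To make one of these metrics concrete, the sketch below computes statistical parity difference and the related disparate impact ratio by hand, using the common convention of unprivileged minus (or divided by) privileged. It is a plain-Python illustration rather than a call into AIF360, and the predictions and group labels are made up.

```python
def selection_rate(y_pred, groups, group):
    """Fraction of samples in `group` that received the positive prediction (1)."""
    preds = [p for p, g in zip(y_pred, groups) if g == group]
    return sum(preds) / len(preds)


def statistical_parity_difference(y_pred, groups, unprivileged, privileged):
    # 0 means both groups receive the positive prediction at the same rate.
    return (selection_rate(y_pred, groups, unprivileged)
            - selection_rate(y_pred, groups, privileged))


def disparate_impact(y_pred, groups, unprivileged, privileged):
    # 1 means parity; values well below 1 disadvantage the unprivileged group.
    return (selection_rate(y_pred, groups, unprivileged)
            / selection_rate(y_pred, groups, privileged))


# Illustrative predictions for two hypothetical groups.
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

spd = statistical_parity_difference(y_pred, groups, unprivileged="B", privileged="A")
di = disparate_impact(y_pred, groups, unprivileged="B", privileged="A")
print(f"statistical parity difference: {spd:.2f}")
print(f"disparate impact: {di:.2f}")
```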

Continuous monitoring can be added on top of these metrics through dashboards (e.g., IBM watsonx.governance) that detect performance drift or subgroup bias over time.

Ethical Reflection & Practice
When deploying the model in production, first audit for baseline fairness: compare performance metrics (accuracy, recall, false-positive and false-negative rates) across race, gender, or other protected attributes.

Then iterate: apply pre-processing (reweighing or resampling), retrain using in-processing methods (e.g. prejudice remover), or use post-processing (equalized odds) to correct unfair outcomes.

Finally, maintain an ongoing fairness monitoring system that alerts stakeholders if metrics diverge for protected subgroups or drift over time.
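The monitoring step can be as simple as a periodic check on the gap between subgroup metrics. The sketch below assumes per-group recall has already been computed for each reporting period; the `subgroup_alerts` function, the `max_gap` threshold of 0.10, the period names, and the recall values are all hypothetical.

```python
def subgroup_alerts(history, max_gap=0.10):
    """Flag periods where recall differs between any two groups by more than max_gap."""
    alerts = []
    for period, recalls in history:
        gap = max(recalls.values()) - min(recalls.values())
        if gap > max_gap:
            alerts.append((period, round(gap, 3)))
    return alerts


# Illustrative per-period recall for two hypothetical groups.
history = [
    ("period-1", {"A": 0.80, "B": 0.78}),
    ("period-2", {"A": 0.81, "B": 0.74}),
    ("period-3", {"A": 0.82, "B": 0.65}),
]

alerts = subgroup_alerts(history)
for period, gap in alerts:
    print(f"ALERT {period}: recall gap {gap:.2f} exceeds threshold")
```

Only the last period is flagged in this example. In practice the threshold and the monitored metric would need to be agreed with stakeholders.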