
# 📊 **Antigranular** Heart Disease Prediction Contest (ft. **Harvard/OpenDP** and **TPDP**)

🎉 Welcome to a new [Antigranular](https://antigranular.com) contest in collaboration with the [TPDP Workshop](https://tpdp.journalprivacyconfidentiality.org/2024/) and [Harvard's OpenDP Community Meeting](https://opendp.org/)!

🩺 This time, we are focusing on [heart condition detection](https://en.wikipedia.org/wiki/Cardiovascular_disease) using our new [TensorFlow Privacy](https://github.com/tensorflow/privacy) and [Opacus (PyTorch)](https://opacus.ai/) models!

🦜 Any questions? Head over to our [Discord](https://discord.com/invite/KJwApgXs4s)!



## 🏃‍♂️ Getting Started

In this section, we install the Antigranular package and log in.




### 📦 Install Antigranular

This command installs the [Antigranular PyPI Package](https://pypi.org/project/antigranular/) in the local environment.


In [1]:
# Install the Antigranular package
!pip install antigranular &> /dev/null

### ✍ Log in to the Enclave

Head over to [Competitions](https://www.antigranular.com/competitions) to find your `<user_id>`, `<user_secret>` and the competition's name, then copy the login command here.

![img](https://docs.antigranular.com/shots/comp_cell.png)

In [2]:
import antigranular as ag
# Replace the placeholders with the credentials from your competition page
session = ag.login("<user_id>", "<user_secret>", competition="Heart Disease Prediction Hackathon")

Dataset "Heart Disease Prediction Hackathon Dataset" loaded to the kernel as heart_disease_prediction_hackathon_dataset
Key Name                       Value Type     
---------------------------------------------
train_y                        PrivateDataFrame
train_x                        PrivateDataFrame
test_x                         DataFrame      

Connected to Antigranular server session id: 3d2f0a93-3268-4c62-92d6-4906b3accad5, the session will time out if idle for 25 minutes
Cell magic '%%ag' registered successfully, use `%%ag` in a notebook cell to execute your python code on Antigranular private python server
🚀 Everything's set up and ready to roll!


### 🤖 Using AG

You can now simply use ``%%ag`` to run code on an enclave! You can always head over to our [Docs](https://docs.antigranular.com/) to learn more about AG, but for now, we can define train and test variables as follows.

In [3]:
%%ag
x_train = heart_disease_prediction_hackathon_dataset["train_x"]
y_train = heart_disease_prediction_hackathon_dataset["train_y"]
x_test = heart_disease_prediction_hackathon_dataset["test_x"]

### 🕵️‍♂️ Exploring data

Exploring data in Antigranular spends your epsilon (privacy) budget. Be mindful of your usage, but remember that the less epsilon you use, the noisier and less accurate your results will be!
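To make the trade-off concrete, here is a minimal sketch of the Laplace mechanism for a noisy mean, the textbook way a single bounded statistic is released under epsilon-DP. It is not Antigranular's implementation: `laplace_noise` and `dp_mean` are hypothetical helpers, the data is synthetic, and the row count is treated as public for simplicity. It reuses the age bounds (21, 86) that `x_train.info()` reports.

```python
import random
import statistics

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_mean(values, lower, upper, eps, rng):
    # Clamp to the public bounds so a single record has bounded influence.
    clamped = [min(max(v, lower), upper) for v in values]
    # With n treated as public, one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clamped)
    # Laplace mechanism: noise scale = sensitivity / eps.
    return statistics.fmean(clamped) + laplace_noise(sensitivity / eps, rng)

rng = random.Random(0)
ages = [rng.randint(21, 86) for _ in range(1000)]  # synthetic ages, not the contest data
true_mean = statistics.fmean(ages)

for eps in (0.01, 0.1, 1.0):
    errors = [abs(dp_mean(ages, 21, 86, eps, rng) - true_mean) for _ in range(200)]
    print(f"eps={eps}: mean absolute error {statistics.fmean(errors):.3f}")
```

The noise scale is `(upper - lower) / (n * eps)`, so each tenfold cut in epsilon makes the typical error roughly ten times larger.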

In [6]:
%%ag
x_train.info()

+----+----------+-------------+---------------+---------+------------+
|    | Column   | numerical   | categorical   | dtype   | bounds     |
|----+----------+-------------+---------------+---------+------------|
|  0 | age      | True        | False         | int64   | (21, 86)   |
|  1 | sex      | True        | False         | int64   | (0, 1)     |
|  2 | bp       | True        | False         | int64   | (80, 215)  |
|  3 | ch       | True        | False         | int64   | (102, 597) |
|  4 | bs       | True        | False         | int64   | (67, 157)  |
|  5 | phr      | True        | False         | int64   | (62, 222)  |
+----+----------+-------------+---------------+---------+------------+



In [11]:
%%ag
y_train.info()

+----+-----------+-------------+---------------+---------+----------+
|    | Column    | numerical   | categorical   | dtype   | bounds   |
|----+-----------+-------------+---------------+---------+----------|
|  0 | condition | True        | False         | int64   | (0, 1)   |
+----+-----------+-------------+---------------+---------+----------+



In [12]:
%%ag
# We can start by exploring the data, carefully using our epsilon
describe = x_train.describe(eps=0.1)
ag_print(describe)

               age          sex  ...           bs          phr
count  7695.000000  7752.000000  ...  7899.000000  8942.000000
mean     61.946775     0.622969  ...   104.820724   136.874356
std       8.565136     0.357713  ...    17.930347    17.454178
min      21.000000     0.000000  ...    67.000000    62.000000
25%      42.913354     0.420733  ...    80.603841   132.972221
50%      60.839196     0.951897  ...    97.487650   144.750982
75%      60.285386     0.970305  ...   131.634379   183.124354
max      58.928956     0.969281  ...   127.226633   189.291690

[8 rows x 6 columns]



In [13]:
%%ag
# Describe the labels the same way, again spending eps=0.1
describe = y_train.describe(eps=0.1)
ag_print(describe)

         condition
count  7971.000000
mean      0.536517
std       0.483828
min       0.000000
25%       0.006343
50%       0.302440
75%       0.995573
max       0.970229
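With only eps=0.1 per call, the noisy quantiles are not even monotone: for `age` the 75% value (60.29) falls below the median (60.84) and the max (58.93) falls below both, and for `condition` the max (0.970) is below the 75% value (0.996). Because differential privacy is closed under post-processing, such outputs can be repaired at no extra privacy cost. The sketch below clamps to the column bounds and sorts; `postprocess_quantiles` is a hypothetical helper, and sorting is only one simple repair strategy.

```python
def postprocess_quantiles(noisy, lower, upper):
    """Clamp noisy summary statistics to the column bounds, then sort them.

    Only already-released values are used, so by the post-processing property of
    differential privacy this costs no extra epsilon.
    """
    clamped = [min(max(q, lower), upper) for q in noisy]
    return sorted(clamped)

# min, 25%, 50%, 75%, max of `age` from x_train.describe(eps=0.1)
age_stats = [21.000000, 42.913354, 60.839196, 60.285386, 58.928956]
# min, 25%, 50%, 75%, max of `condition` from y_train.describe(eps=0.1)
cond_stats = [0.000000, 0.006343, 0.302440, 0.995573, 0.970229]

print(postprocess_quantiles(age_stats, 21, 86))
print(postprocess_quantiles(cond_stats, 0, 1))
```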



In [14]:
%%ag
# x_test is a public test set, so we can print it without using epsilon
ag_print(x_test)

      age  sex   bp   ch   bs  phr
0      71    1  128  326   95  117
1      61    1  153  270   98  123
2      59    1  113  236  106  181
3      69    0  109  151  109  108
4      55    0  137  235  101  150
...   ...  ...  ...  ...  ...  ...
1995   60    1  128  261  112  143
1996   50    1  143  216   94  100
1997   64    1  120  172   87  142
1998   56    1  158  294   82  144
1999   69    0  117  559  112  157

[2000 rows x 6 columns]




### 🎈 A quick solution

In this section, we evaluate an editorial solution in AG using TensorFlow!

In [72]:
%%ag
import tensorflow as tf
from op_pandas import standard_scaler, PrivateDataFrame
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from op_tensorflow import PrivateKerasModel, PrivateDataLoader

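# Caution: BatchNormalization normalizes with statistics of the whole batch, so one
# example's gradient depends on the others, which per-example clipping does not
# account for (Opacus rejects BatchNorm for this reason). LayerNormalization, or no
# normalization, is a safer choice under DP-SGD.
seqM = Sequential([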
    Dense(32, activation='relu', input_shape=(6,)),
    BatchNormalization(),
    Dropout(0.2),
    Dense(64, activation='relu'),
    BatchNormalization(),
    Dropout(0.2),
    Dense(1, activation='sigmoid')  # Binary classification
])

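# Wrap the Keras model for DP-SGD training: each per-example gradient is clipped
# to L2 norm l2_norm_clip, and Gaussian noise scaled by noise_multiplier is added
dp_model = PrivateKerasModel(model=seqM, l2_norm_clip=1.5, noise_multiplier=0.8)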

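# Learning-rate schedule: divide the rate by 10 every 10 epochs.
# Note: lr_scheduler is never passed to fit() below, so as written it has no effect.
def lr_schedule(epoch):
    return 0.001 * (0.1 ** (epoch // 10))

lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lr_schedule)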

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# PrivateKerasModel uses a compile() API similar to standard Keras
dp_model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=["accuracy"]
)
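`PrivateKerasModel` takes `l2_norm_clip` and `noise_multiplier`, the two knobs of DP-SGD (Abadi et al., 2016). The sketch below shows the standard DP-SGD gradient aggregation in plain Python: clip each per-example gradient, sum, add Gaussian noise with standard deviation `noise_multiplier * l2_norm_clip`, and average. It illustrates the technique only; it is not how `PrivateKerasModel` is implemented internally, and the toy gradients are made up.

```python
import math
import random

def clip(grad, max_norm):
    # Scale a per-example gradient down so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def dp_sgd_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    # 1. Clip every example's gradient independently.
    clipped = [clip(g, l2_norm_clip) for g in per_example_grads]
    # 2. Sum the clipped gradients coordinate by coordinate.
    total = [sum(coords) for coords in zip(*clipped)]
    # 3. Add Gaussian noise with std = noise_multiplier * l2_norm_clip.
    sigma = noise_multiplier * l2_norm_clip
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    # 4. Average over the batch to get the update direction.
    return [x / len(per_example_grads) for x in noisy]

rng = random.Random(42)
batch = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # toy per-example gradients
print(dp_sgd_gradient(batch, l2_norm_clip=1.5, noise_multiplier=0.8, rng=rng))
```

With `l2_norm_clip=1.5` and `noise_multiplier=0.8`, each coordinate of the summed gradient gets noise with standard deviation 1.2 before averaging, so clipping bounds any single patient's influence while the noise hides it.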


In [73]:
%%ag
x_train_scaled = standard_scaler(x_train, eps=.1)
x_train_scaled.info()

+----+----------+-------------+---------------+---------+------------------------------------------+
|    | Column   | numerical   | categorical   | dtype   | bounds                                   |
|----+----------+-------------+---------------+---------+------------------------------------------|
|  0 | age      | True        | False         | float64 | (-5.036653324423086, 4.626091810823674)  |
|  1 | sex      | True        | False         | float64 | (-1.4570691893922267,                    |
|    |          |             |               |         | 0.6950163449540622)                      |
|  2 | bp       | True        | False         | float64 | (-2.3828568800839087, 3.704169925070209) |
|  3 | ch       | True        | False         | float64 | (-1.4030797485122417, 3.782606722815023) |
|  4 | bs       | True        | False         | float64 | (-1.9080174880954714, 2.763831218278033) |
|  5 | phr      | True        | False         | float64 | (-2.1770312705313524, 1.811025143
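The scaled bounds in the `info()` table above appear to be the original bounds passed through the same affine map as the data, i.e. `(lower - mean) / std` and `(upper - mean) / std` with noisy estimates of mean and std. Below is a minimal sketch of such a DP standardization, assuming an even epsilon split between the mean and the mean of squares and a public row count. `dp_standard_scale` is hypothetical and is not `op_pandas.standard_scaler`.

```python
import math
import random

def laplace(scale: float, rng: random.Random) -> float:
    # Difference of two i.i.d. exponentials with mean `scale` ~ Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_standard_scale(values, lower, upper, eps, rng):
    """Hypothetical DP standardization of one bounded column.

    Splits eps evenly between the mean and the mean of squares (basic
    sequential composition) and treats the row count n as public.
    """
    n = len(values)
    clamped = [min(max(v, lower), upper) for v in values]
    half = eps / 2
    # One record moves the mean by at most (upper - lower) / n.
    mean = sum(clamped) / n + laplace((upper - lower) / n / half, rng)
    # One record moves the mean of squares by at most max(lower^2, upper^2) / n
    # (a conservative bound, since squares are never negative).
    max_sq = max(lower * lower, upper * upper)
    mean_sq = sum(v * v for v in clamped) / n + laplace(max_sq / n / half, rng)
    # Post-processing: guard against a negative noisy variance.
    std = math.sqrt(max(mean_sq - mean * mean, 1e-12))
    scaled = [(v - mean) / std for v in clamped]
    # The public bounds go through the same affine map as the data.
    bounds = ((lower - mean) / std, (upper - mean) / std)
    return scaled, bounds

rng = random.Random(7)
ages = [rng.randint(21, 86) for _ in range(1000)]  # synthetic ages, not the contest data
scaled, bounds = dp_standard_scale(ages, 21, 86, eps=1.0, rng=rng)
print(bounds)
```

The variance estimate is the noisier of the two; at small epsilon it can even come out negative, which the `max(..., 1e-12)` guard only papers over.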

In [74]:
%%ag
data_loader = PrivateDataLoader(feature_df=x_train_scaled, label_df=y_train, batch_size=32)
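Each epoch of DP-SGD makes `ceil(n / batch_size)` steps, and privacy accountants use the sampling rate `q = batch_size / n`. The training log below reports 250 steps per epoch at `batch_size=32`, which suggests roughly 8,000 training rows; that row count is an inference, not a documented figure. At the same row count, the later run's 2000 steps per epoch would correspond to a batch size of 4. A small sketch with hypothetical helpers:

```python
import math

def steps_per_epoch(n_rows: int, batch_size: int) -> int:
    # Number of batches needed to see every row once.
    return math.ceil(n_rows / batch_size)

def sampling_rate(n_rows: int, batch_size: int) -> float:
    # Expected fraction of rows in one batch (the q used by DP-SGD accountants).
    return batch_size / n_rows

n_rows = 8000  # assumption: inferred from 250 steps x 32 rows in the log, not documented
print(steps_per_epoch(n_rows, 32), sampling_rate(n_rows, 32))
```

More epochs mean more noisy gradient releases, which is why the epsilon consumed by `fit()` grows with `epochs`.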

In [81]:
%%ag
dp_model.fit(x=data_loader, epochs=4, target_delta=1e-5)

Epoch 1/4

250/250 - 4s - loss: 0.4305 - accuracy: 0.7982 - 4s/epoch - 17ms/step

Epoch 2/4

250/250 - 4s - loss: 0.4388 - accuracy: 0.7898 - 4s/epoch - 17ms/step

Epoch 3/4

250/250 - 4s - loss: 0.4134 - accuracy: 0.8072 - 4s/epoch - 17ms/step

Epoch 4/4

250/250 - 4s - loss: 0.4069 - accuracy: 0.8099 - 4s/epoch - 18ms/step

message: Error ID is - c8a2979a-6e72-4ba1-a50d-7c8bca0e7f83 - Error from upstream service: Client error '400 Bad Request' for url 'http://supervisor-prod-private.antigranular.com/privacyRequest'
For more information check: https://httpstatuses.com/400. Response text: {"detail":"Maximum delta or epsilon exceeded per query. Per query limits are eps : 1 and delta : 1"}

message: Error ID is - 414bca15-e037-432b-ab8d-9dda8be6c38d - Error from upstream service: Client error '400 Bad Request' for url 'http://supervisor-prod-private.antigranular.com/privacyRequest'
For more information check: https://httpstatuses.com/400. Response text: {"detail":"Maximum delta or epsilo
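The server rejected these requests because a single query may not exceed eps 1 (and delta 1). A client-side ledger makes it easier to plan queries before sending them. The sketch below is hypothetical (not part of the `antigranular` package), uses basic sequential composition where epsilons simply add, and mirrors the per-query cap from the error message; the epsilon charged for the `fit` call is an illustrative value.

```python
class PrivacyBudget:
    """Hypothetical client-side epsilon ledger using basic sequential composition."""

    def __init__(self, per_query_limit: float = 1.0):
        # Per-query cap taken from the server error message above.
        self.per_query_limit = per_query_limit
        self.spent = 0.0
        self.log = []

    def charge(self, name: str, eps: float) -> float:
        # Refuse a query that would break the per-query cap, without recording it.
        if eps > self.per_query_limit:
            raise ValueError(f"{name}: eps={eps} exceeds the per-query limit of {self.per_query_limit}")
        # Basic composition: the total epsilon is the sum of per-query epsilons.
        self.spent += eps
        self.log.append((name, eps))
        return self.spent

budget = PrivacyBudget()
budget.charge("x_train.describe", 0.1)
budget.charge("y_train.describe", 0.1)
budget.charge("standard_scaler(x_train)", 0.1)
print(f"spent so far: {budget.spent:.2f}")

try:
    budget.charge("dp_model.fit", 1.5)  # illustrative epsilon, not the value the server computed
except ValueError as err:
    print(err)
```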

In [None]:
%%ag
dp_model.fit(x=data_loader, epochs=5, target_delta=1e-5)

Epoch 1/5

2000/2000 - 31s - loss: 0.5626 - accuracy: 0.7028 - 31s/epoch - 16ms/step

Epoch 2/5

2000/2000 - 29s - loss: 0.5405 - accuracy: 0.7219 - 29s/epoch - 14ms/step

Epoch 3/5

2000/2000 - 29s - loss: 0.5194 - accuracy: 0.7260 - 29s/epoch - 14ms/step

Epoch 4/5

2000/2000 - 28s - loss: 0.5004 - accuracy: 0.7398 - 28s/epoch - 14ms/step

Epoch 5/5

2000/2000 - 30s - loss: 0.4888 - accuracy: 0.7496 - 30s/epoch - 15ms/step



In [76]:
%%ag
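# Note: this re-estimates the scaling statistics on the test set (with eps=0.1,
# even though x_test is public) instead of reusing the training statistics, so
# train and test features may be scaled slightly differently.
x_test_scaled = standard_scaler(PrivateDataFrame(x_test), eps=0.1)
y_pred = dp_model.predict(x_test_scaled, label_columns=["output"])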




In [78]:
%%ag
# The model outputs a sigmoid probability (a float in [0, 1]),
# so we threshold it at 0.5 to get a hard 0/1 label
def f(x: float) -> int:
  if x > 0.5:
    return 1
  else:
    return 0

y_pred["output"] = y_pred["output"].map(f, output_bounds=(0, 1))
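The `map` call above turns each sigmoid output into a hard label. Here is the same rule in plain Python, together with binary accuracy, which is presumably what the `BIN_ACC` entry in the submission log measures. The helpers are hypothetical and the toy probabilities are made up.

```python
def threshold(probs, cutoff=0.5):
    # Same rule as f() above: strictly greater than the cutoff -> 1, otherwise 0.
    return [1 if p > cutoff else 0 for p in probs]

def binary_accuracy(y_true, y_pred):
    # Fraction of rows whose predicted label matches the true label.
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)

probs = [0.91, 0.12, 0.50, 0.77]  # toy sigmoid outputs, not model predictions
labels = threshold(probs)
print(labels, binary_accuracy([1, 0, 1, 1], labels))  # prints [1, 0, 0, 1] 0.75
```

Because the comparison is strict, a probability of exactly 0.5 maps to 0.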

### 📝 Make your submission (Through AG)

Submit a prediction by simply typing `submit_predictions(your_prediction)` to find out how you rank on the leaderboard.

![img](https://www.antigranular.com/static/media/Step%209.8091828f3cff4324fe6d.png)


In [79]:
%%ag
result = submit_predictions(y_pred)

score: {'leaderboard': 0.6284638700752223, 'logs': {'BIN_ACC': 0.6506948203313454, 'LIN_EPS': -0.02223095025612314}}
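The logged numbers are consistent with the leaderboard score being `BIN_ACC + LIN_EPS`, where `LIN_EPS` is a negative, epsilon-related penalty. Its exact definition is not given here, so treat this as an observation about these particular numbers rather than the official scoring formula.

```python
import math

# Values copied from the submission log above
leaderboard = 0.6284638700752223
logs = {"BIN_ACC": 0.6506948203313454, "LIN_EPS": -0.02223095025612314}

# Check whether the leaderboard score equals accuracy plus the (negative) epsilon term
print(math.isclose(logs["BIN_ACC"] + logs["LIN_EPS"], leaderboard, abs_tol=1e-12))  # prints True
```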



### 🎈 Another quick solution

In this section we evaluate an editorial solution in AG using Diffprivlib!

In [24]:
%%ag
# We can follow by importing the OP ("Oblivious Private") version of diffprivlib
from op_diffprivlib.models import LogisticRegression

In [25]:
%%ag
# Create a DP logistic regression with epsilon=0.1; data_norm is the assumed
# maximum L2 norm of any feature row
reg = LogisticRegression(epsilon=0.1, data_norm=3)

# Fit it with all six predictors
reg.fit(x_train[["age", "sex", "bp", "ch", "bs", "phr"]], y_train)

  y = column_or_1d(y, warn=True)



In [26]:
%%ag
# Predict with the same features
y_pred = reg.predict(x_test[["age", "sex", "bp", "ch", "bs", "phr"]])

# Print the result
ag_print(y_pred)

[1 1 1 ... 1 1 1]
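Predicting 1 for almost every row is a warning sign. One plausible contributor: diffprivlib documents `data_norm` as the maximum L2 norm of any row of the data, and recent versions clip rows whose norm exceeds it (worth checking against the version available in the enclave). The raw features here have norms in the hundreds, so `data_norm=3` shrinks every training row by two orders of magnitude, while `predict` is then called on unscaled `x_test`. The sketch below, with hypothetical helpers, shows the effect on the first row of `x_test`:

```python
import math

def l2_norm(row):
    # Euclidean length of one feature row.
    return math.sqrt(sum(v * v for v in row))

def clip_to_norm(row, max_norm):
    # Rescale the row so its L2 norm is at most max_norm (no-op if already within).
    norm = l2_norm(row)
    if norm <= max_norm:
        return list(row)
    return [v * max_norm / norm for v in row]

# First row of x_test printed above: age, sex, bp, ch, bs, phr (unscaled)
row = [71, 1, 128, 326, 95, 117]
clipped = clip_to_norm(row, 3)
print(f"original norm: {l2_norm(row):.1f}")
print("clipped row:", [round(v, 4) for v in clipped])
```

Scaling the features before fitting, and choosing `data_norm` to match the scaled rows, avoids this distortion.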






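### 📩 Submit your prediction

We submit the diffprivlib predictions to the leaderboard directly from AG. Submissions are rate-limited: the error below means the previous submission was made less than five minutes earlier, so wait a few minutes before trying again.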

In [48]:
%%ag
# Submit prediction
result = submit_predictions(y_pred)

error: Last submission done under 5 minutes ago



### 🎉 That's it!

Congrats! You made your first submission to the competition! Now it's time to keep exploring the data and to try to achieve a better score! Here are the next steps:


1. 🏫 Head over to our [Docs](https://docs.antigranular.com/) and discover Opacus (PyTorch), TensorFlow Privacy and other libraries available in Antigranular!
2. 🦜 Any questions? Head over to our [Discord](https://discord.com/invite/KJwApgXs4s)!

We hope you have fun and enjoy the competition!

Best of luck,

Antigranular Team