(C) Johnson

The Titanic survival prediction project applies MLOps and DevOps principles, facilitated by Qwak. By automating the end-to-end workflow, from data collection to model deployment and monitoring, the project aims for efficient and reliable operation. Orchestrating MLOps through Qwak provides a streamlined approach to model development, testing, deployment, and maintenance, so that the model stays up to date and effective in production.

1.   **Problem Statement**: The goal of the Titanic survival prediction model is to build a machine learning model capable of predicting whether a passenger survived or not based on features such as age, sex, class, and other attributes available in the Titanic dataset. The main problem to solve is how to process this data, train a model on it, and deploy it in a way that allows for real-time predictions.
  -  **Data Preprocessing**: The Titanic dataset contains both categorical and numerical features, so preprocessing involves handling missing values, encoding categorical variables, and scaling numerical ones.
  The challenge is ensuring that data fed into the model during training matches the format of data used during prediction.
  -  **Model Training**: Machine learning algorithms such as Logistic Regression, Random Forest, or more advanced techniques are applied to the training data to predict survival.
  Model validation involves using metrics such as accuracy, precision, recall, and AUC to evaluate the model's effectiveness.
  -  **Model Deployment**: Once the model is trained, it needs to be deployed in a way that users can send data and get real-time predictions. This involves exposing the model as a REST API and integrating it with the front-end or other systems for consumption.
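
The training/prediction consistency concern above can be illustrated with a minimal sketch in plain Python. It is not the project's actual pipeline: the column groups, the median fill, and the `-1` code for unseen categories are assumptions made for illustration, and scaling is left out for brevity. The point is that the state learned from the training rows is reused unchanged at prediction time.

```python
from statistics import median

NUMERIC = ("Age", "Fare")          # assumed numeric columns
CATEGORICAL = ("Sex", "Embarked")  # assumed categorical columns


def fit_preprocessor(rows):
    """Learn fill values and category codes from the training rows only."""
    state = {"fill": {}, "codes": {}}
    for col in NUMERIC:
        seen = [r[col] for r in rows if r.get(col) is not None]
        state["fill"][col] = median(seen)
    for col in CATEGORICAL:
        values = sorted({r[col] for r in rows if r.get(col)})
        state["codes"][col] = {v: i for i, v in enumerate(values)}
    return state


def transform(row, state):
    """Apply the learned state; used unchanged at training and prediction time."""
    out = []
    for col in NUMERIC:
        value = row.get(col)
        out.append(float(state["fill"][col] if value is None else value))
    for col in CATEGORICAL:
        # Unseen or missing categories map to -1 instead of raising
        out.append(state["codes"][col].get(row.get(col), -1))
    return out


train_rows = [
    {"Age": 22, "Fare": 7.25, "Sex": "male", "Embarked": "S"},
    {"Age": None, "Fare": 71.28, "Sex": "female", "Embarked": "C"},
    {"Age": 26, "Fare": 7.92, "Sex": "female", "Embarked": "S"},
]
state = fit_preprocessor(train_rows)
features = [transform(r, state) for r in train_rows]

# A prediction-time row with a missing age and an unseen port of embarkation
new_passenger = transform(
    {"Age": None, "Fare": 8.05, "Sex": "male", "Embarked": "Q"}, state
)
print(new_passenger)
```

Persisting `state` together with the model artefact is one way to guarantee that the serving side cannot drift from the training-time encoding.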

2.   **Assumptions:**
  - **Data Quality**: The assumption is that the data provided is sufficient and representative of real-world cases. However, handling missing values and noisy data is a key consideration.
  The model assumes that patterns within historical Titanic data (e.g., sex, age, class) are indicative of survival probabilities in future scenarios.
  - **Model Assumptions**: The assumption is that the machine learning model, trained on this dataset, can generalize well to new, unseen data.
  The model assumes that the features used (e.g., passenger class, age) will remain relevant for making predictions about survival chances in similar contexts (e.g., modern passengers).
  - **Deployment Assumptions**: The system assumes the prediction model will be accessed in real-time (via API), and responses should be quick and accurate.
  Integration with external systems (such as a website or mobile app) is assumed to be feasible via API calls.

3. **MLOps with Qwak:**
MLOps (Machine Learning Operations) is a discipline that combines Machine Learning (ML) with DevOps practices to streamline the development, deployment, and maintenance of machine learning models. In the Titanic survival prediction project, MLOps principles are applied as follows:

  -  **Model Versioning:** Using Qwak’s versioning system, different versions of the Titanic model can be managed, tracked, and deployed. This makes it easy to roll back to a previous version if any issues arise with the current model.
  The versioning ensures that the production system always runs a stable version of the model.
  - **Model Deployment:** Qwak helps deploy the trained model to a production environment and expose it via an API, where data can be sent and predictions returned in real time.
  Qwak handles aspects like the model registry, ensuring that models are properly stored, tracked, and deployed.
  - **Monitoring and Feedback:** With Qwak’s integration into the MLOps pipeline, you can monitor the model’s performance continuously. This includes tracking accuracy, error rates, and any drift in predictions.
  As new data comes in, the model can be retrained periodically to adapt to changes in patterns (data drift).





############################### AIML Continuous Integration / Delivery (CI/CD) ##################################

**A CICD Process with Qwak**

*   What? (...is Qwak Flow MLOps)

    - Qwak serves well as a CI/CD (Continuous Integration/Continuous Deployment) process and an AI Developer Platform by integrating MLOps principles into a streamlined workflow. It enables AI teams to develop, deploy, and manage machine learning systems efficiently while ensuring security, collaboration, and automation.

*   How? (...CICD works)
      - Continuous Integration:
          Automates testing and validation of machine learning models.
          Integrates with version control systems (e.g., Git) to automatically test and validate new code or model changes.
          Ensures the latest changes in data preprocessing, feature engineering, or model code don't break the pipeline.
      - Continuous Deployment:
          Automates deployment of machine learning models to production environments after successful validation.
          Models can be deployed as REST APIs, batch jobs, or streaming services.
          Changes are pushed to production seamlessly, reducing manual effort and deployment time.

*     Why? (...Choose Qwak Flow)
      - End-to-end platform (e.g. from pre-processing to deployment and monitoring)
      - Flexible, scalable and secure deployment options (e.g. hybrid / private cloud / on-prem options, and handling of network traffic spikes and deployment environments)
      - Network safety compliance (e.g. ISO 27001, integrates well with IAM systems, supports VPN) and other security features (e.g. encryption, access control such as RBAC, API authentication)
      - Collaboration (e.g. centralized project management, multiple teammates working on different components such as the data pipeline at the same time)
      - Reproducibility (e.g. by tracking data, code, config, and versions)
      - Simplicity and popularity with MNCs (e.g. API documentation, low entry barrier for any teammate, one of the largest local banks uses it). API doc: https://docs.qwak.com/docs/introduction
      - Flexible language support, e.g. Python, Shell, JSON, etc.
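
The continuous integration step described above ("test and validate new code or model changes") can be pictured as a metric gate that runs before deployment. The sketch below is a generic illustration in plain Python, not Qwak's own validation mechanism. The threshold values and the function names `classification_metrics` and `passes_gate` are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def passes_gate(metrics, thresholds):
    """Return True only if every listed metric meets its minimum."""
    return all(metrics[name] >= minimum for name, minimum in thresholds.items())


# Hypothetical minimums; tune them for the project
THRESHOLDS = {"accuracy": 0.75, "precision": 0.70, "recall": 0.60}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
deploy_allowed = passes_gate(metrics, THRESHOLDS)
print(metrics, deploy_allowed)
```

In a CI job, a `False` result would typically fail the pipeline so that the continuous deployment stage never runs for that build.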


**#################################SETUP & INSTALLATION########################################**

**Setup and Installation with Qwak**
(Note: the setup is done on Colab for replicability and pushed to Git.)

1.   Install qwak-sdk in your own environment or container. The qwak-sdk Python library handles interaction with the Qwak platform and is used in the MLOps process for deployment, managing ML workflow projects, and making predictions (real-time or batch inference).
2.   The Qwak client handles the connection to your Qwak account, including authentication. Once authenticated with your registered API key, you can interact with projects, for example to deploy models.
3.   Deploying with qwak-sdk (with verbose output) makes your model accessible for real-time predictions or batch inference. The model is commonly published to a production-ready environment, which automates the deployment pipeline.
4.   qwak-inference is a module used to run inference tasks against deployed models, so that your team and others can use them. Inference is the process of applying production data to an ML model/pipeline to generate outputs or predictions. This phase also lets us assess whether the model is better than random guessing and therefore useful. In the dashboard, go to the top right and click "TestModel".
5.   Qwak Analytics allows querying various model artefacts and predictions, which are stored in a database.
6.   Qwak Monitor facilitates monitoring of and feedback on model performance, such as data distribution shift. More details and discussion can be found below.
7.   Alternative setups that integrate various local libraries such as transformers, pandas, etc.




# Setup and Installation: #1 Setting Up




In [None]:
# !pip install qwak-sdk (not installing due to version misalignment in colab environment)
!pip install --upgrade qwak-sdk
!pip install qwak-inference[batch,feedback]
!pip install catboost
# restart session if required
# for checking of version use : !qwak --version

Collecting qwak-sdk
  Downloading qwak_sdk-0.5.85-py3-none-any.whl.metadata (3.0 kB)
Collecting cookiecutter (from qwak-sdk)
  Downloading cookiecutter-2.6.0-py3-none-any.whl.metadata (7.3 kB)
Collecting croniter==1.4.1 (from qwak-sdk)
  Downloading croniter-1.4.1-py2.py3-none-any.whl.metadata (24 kB)
Collecting python-json-logger>=2.0.2 (from qwak-sdk)
  Downloading python_json_logger-2.0.7-py3-none-any.whl.metadata (6.5 kB)
Collecting qwak-core==0.4.111 (from qwak-sdk)
  Downloading qwak_core-0.4.111-py3-none-any.whl.metadata (2.3 kB)
Collecting qwak-inference<0.2.0,>=0.1.18 (from qwak-sdk)
  Downloading qwak_inference-0.1.18-py3-none-any.whl.metadata (2.2 kB)
Collecting yaspin>=2.0.0 (from qwak-sdk)
  Downloading yaspin-3.1.0-py3-none-any.whl.metadata (14 kB)
Collecting chevron==0.14.0 (from qwak-core==0.4.111->qwak-sdk)
  Downloading chevron-0.14.0-py3-none-any.whl.metadata (4.9 kB)
Collecting dacite==1.8.1 (from qwak-core==0.4.111->qwak-sdk)
  Downloading dacite-1.8.1-py3-none-any

Collecting boto3<2.0.0,>=1.24.89 (from qwak-inference[batch,feedback])
  Downloading boto3-1.35.71-py3-none-any.whl.metadata (6.7 kB)
Collecting pyarrow<11.0.0,>=6.0.0 (from qwak-inference[batch,feedback])
  Downloading pyarrow-10.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting botocore<1.36.0,>=1.35.71 (from boto3<2.0.0,>=1.24.89->qwak-inference[batch,feedback])
  Downloading botocore-1.35.71-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3<2.0.0,>=1.24.89->qwak-inference[batch,feedback])
  Downloading jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting s3transfer<0.11.0,>=0.10.0 (from boto3<2.0.0,>=1.24.89->qwak-inference[batch,feedback])
  Downloading s3transfer-0.10.4-py3-none-any.whl.metadata (1.7 kB)
Downloading boto3-1.35.71-py3-none-any.whl (139 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 139.2/139.2 kB 7.5 MB/s eta 0:00:00
Downloading pyarrow-1

In [98]:
import zipfile
# make sure your colab have default mount is /content
# Path to your uploaded zip file
zip_path = '/content/titanic.zip'

# Extract the zip file to a folder
extract_dir = '/content/titanic'
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_dir)

print("Extraction complete.")


Extraction complete.


In [99]:
import sys

# Add the parent directory of `main` to sys.path
sys.path.append('/content/titanic')

# Confirm it was added
print(sys.path)


['/content', '/env/python', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages', '/usr/local/lib/python3.10/dist-packages/IPython/extensions', '/usr/local/lib/python3.10/dist-packages/setuptools/_vendor', '/root/.ipython', '/content/titanic']


# Setup and Installation: (#2) Accessing your Qwak Flow Profile


In [None]:
# Obtain your registered API key; see https://docs.qwak.com/docs/installing-the-qwak-sdk
from qwak import QwakClient

client = QwakClient()

# Placeholder: substitute your own API key and keep real keys out of the notebook
api_key = '<YOUR_QWAK_API_KEY>'
token = client.get_token(api_key)

!qwak configure

Please enter your API key: <YOUR_QWAK_API_KEY>
User successfully configured for the 'default' environment
E0000 00:00:1733033747.927942   11581 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out.


# Setup and Installation: (#3) Custom Batch or RT Deployment


# You may build your own custom model from scratch, or use the default settings from the Qwak dashboard and still interact with the model from here

In [96]:
# building model
import numpy as np
import pandas as pd
import qwak
# Important to call run_local when using the Build SDK
from qwak.model.tools import run_local
from catboost import CatBoostClassifier, Pool, cv
from catboost.datasets import titanic
from qwak.model.base import QwakModel
from sklearn.model_selection import train_test_split


class TitanicSurvivalPrediction(QwakModel):
    def __init__(self):
        self.model = CatBoostClassifier(
            iterations=1000,
            custom_loss=["Accuracy"],
            loss_function="Logloss",
            learning_rate=None,
        )

    def build(self):
        titanic_train, _ = titanic()
        titanic_train.fillna(-999, inplace=True)

        x = titanic_train.drop(["Survived", "PassengerId"], axis=1)
        y = titanic_train.Survived

        x_train, x_test, y_train, y_test = train_test_split(
            x, y, train_size=0.85, random_state=42
        )

        # mark categorical features
        cate_features_index = np.where(x_train.dtypes != float)[0]

        self.model.fit(
            x_train,
            y_train,
            cat_features=cate_features_index,
            eval_set=(x_test, y_test),
        )

        # Cross validating the model (5-fold)
        cv_data = cv(
            Pool(x, y, cat_features=cate_features_index),
            self.model.get_params(),
            fold_count=5,
        )

    @qwak.api()
    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.drop(["PassengerId"], axis=1)
        return pd.DataFrame(
            self.model.predict_proba(df[self.model.feature_names_])[:, 1],
            columns=['Survived_Probability']
        )

In [100]:
# Training the model - for more explanation, check out the Qwak model build SDK
from titanic.main import TitanicSurvivalPrediction

# Create a new model instance
qwak_model_instance = TitanicSurvivalPrediction()

# Run the build function which trains the model
qwak_model_instance.build()

Streaming output truncated to the last 5000 lines.
24:	learn: 0.4647988	test: 0.4989451	best: 0.4989451 (24)	total: 145ms	remaining: 5.67s
25:	learn: 0.4625238	test: 0.4966815	best: 0.4966815 (25)	total: 148ms	remaining: 5.55s
26:	learn: 0.4596816	test: 0.4946564	best: 0.4946564 (26)	total: 152ms	remaining: 5.47s
27:	learn: 0.4558779	test: 0.4920113	best: 0.4920113 (27)	total: 158ms	remaining: 5.47s
28:	learn: 0.4544708	test: 0.4906978	best: 0.4906978 (28)	total: 160ms	remaining: 5.35s
29:	learn: 0.4497458	test: 0.4878697	best: 0.4878697 (29)	total: 166ms	remaining: 5.37s
30:	learn: 0.4476795	test: 0.4859160	best: 0.4859160 (30)	total: 170ms	remaining: 5.31s
31:	learn: 0.4446441	test: 0.4836616	best: 0.4836616 (31)	total: 176ms	remaining: 5.33s
32:	learn: 0.4407296	test: 0.4807482	best: 0.4807482 (32)	total: 185ms	remaining: 5.41s
33:	learn: 0.4386911	test: 0.4798837	best: 0.4798837 (33)	total: 193ms	remaining: 5.48s
34:	learn: 0.4357048	test: 0.4775072	best: 0.4775072 (3

In [None]:
# Deploy once you are satisfied with the build
!qwak models deploy realtime --model-id 'titanic_survival_prediction_89f2f5' --build-id '9739d512-3ec3-49dd-b182-0aa897cbb476' --pods 2  --instance small  --timeout 3000  --server-workers 4  --variation-name default  --daemon-mode

╒═══════════════╤══════════════════════════════════════╕
│ Environment   │ sapl                                 │
├───────────────┼──────────────────────────────────────┤
│ Model ID      │ titanic_survival_prediction_89f2f5   │
├───────────────┼──────────────────────────────────────┤
│ Build ID      │ 9739d512-3ec3-49dd-b182-0aa897cbb476 │
├───────────────┼──────────────────────────────────────┤
│ Deployment ID │ 2c233b26-5175-41b6-899d-94ce4c24b41e │
╘═══════════════╧══════════════════════════════════════╛
Deployment initiated successfully, Use --sync to wait for deployments to be ready.
E0000 00:00:1733033767.784637   11700 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out.


# Setup and Installation: (#4) Model Inference

In [54]:
from qwak_inference import RealTimeClient

feature_vector = [
{
		"Age" : 0,
		"Fare" : 0.0,
		"Sex" : "",
		"Cabin" : "",
		"SibSp" : 0,
		"Ticket" : "",
		"Pclass" : 0,
		"Name" : "",
		"PassengerId" : 0,
		"Parch" : 0,
		"Embarked" : ""
	}
]

client = RealTimeClient(model_id="titanic_survival_prediction_89f2f5")
client.predict(feature_vector)

[{'Survived_Probability': 0.5164705548}]

# Setup and Installation: (#5) Model Analytics


In [55]:
# Describe the model
!qwak models describe --model-id 'titanic_survival_prediction_89f2f5'


Model id: titanic_survival_prediction_89f2f5
Display name: Titanic Survival Prediction 89f2f5
Description: An example Titanic Survival Prediction model
Creation Date: Saturday, November 30, 2024 06:08:48
Last update: Saturday, November 30, 2024 06:08:48
E0000 00:00:1733040272.643313   37885 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out.


In [None]:
# Query the model database for all logged models, if any - in this case only 1 trained model (due to limited free resources)
from qwak import QwakClient

client = QwakClient()
df = client.run_analytics_query("select * from titanic_survival_prediction_89f2f5")
print(df)

                     timestamp                            partner_id  \
0  2024-12-01T06:34:58.779602Z  0000f88b-7cc6-428a-bd8d-d8bdca36b0e4   

     user_id                              build_id variation_name  \
0  qwak_user  9739d512-3ec3-49dd-b182-0aa897cbb476        default   

                             model_id  batch_job_id  fn_name sdk_version  \
0  titanic_survival_prediction_89f2f5           NaN  predict     0.0.137   

                             time  latency  feature_extraction_latency  \
0  2024-12-01 06:34:53.510000 UTC      NaN                         NaN   

                                     message                     error  \
0  'NoneType' object has no attribute 'drop'  <class 'AttributeError'>   

                                         stack_trace        date  
0  Traceback (most recent call last):\n  File "/o...  2024-12-01  


# Setup and Installation: (#6) Monitoring/Feedback Model

# Monitoring / Feedback of Model
Currently, only a limited set of distribution-shift checks is supported:

1.   Data Drift with KL Divergence - measures the distance between two data distributions (training data vs. production data) and alerts you if it exceeds your custom threshold
2.   Data Quality with Null Percentage - measures the share of null values and alerts you if it exceeds your custom threshold
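
To make the two checks concrete, here is a minimal plain-Python sketch of a KL-divergence drift score and a null-percentage score. It is an illustration only: how Qwak bins the data, which direction of the divergence it computes, and how it smooths empty bins are not stated in this notebook, so the bin edges, the epsilon smoothing, and the threshold below are assumptions.

```python
import math


def histogram(values, edges):
    """Share of values per bin; bin i covers [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts) or 1  # avoid dividing by zero on empty input
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats; eps keeps the logarithm finite when a bin of q is empty."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def null_percentage(values):
    """Percentage of missing (None) entries."""
    return 100.0 * sum(v is None for v in values) / len(values)


AGE_EDGES = [0, 20, 40, 60, 100]  # assumed bins for the Age feature
train_ages = [22, 38, 26, 35, 54, 2, 27, 14]
prod_ages = [61, 70, 58, 66, 45, 72, 39, 80]

train_hist = histogram(train_ages, AGE_EDGES)
prod_hist = histogram(prod_ages, AGE_EDGES)

drift = kl_divergence(train_hist, prod_hist)
DRIFT_THRESHOLD = 0.1  # hypothetical custom threshold
print(f"drift={drift:.3f}, alert={drift > DRIFT_THRESHOLD}")
print(f"null %: {null_percentage([22, None, 30, None])}")
```

Because KL divergence is asymmetric, swapping the two histograms gives a different score, so the direction should be fixed once and kept consistent with the chosen threshold.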


    . Alerts can be set up via PagerDuty

    . Qwak dashboard => Integration of alert => PagerDuty

    . Register with the PagerDuty JFrog Artifactory Notification
    

-       Note: Qwak also supports other alert options such as Slack; however, there were JFrog Artifactory integration issues on both vendors' sides
-      For a more detailed view, check out the cloud dashboard UI from the app.qwak dashboard



In [None]:
# This step is not required. If you use it, replace the placeholder with your own API key
import os

os.environ['QWAK_API_KEY'] = '<YOUR_QWAK_API_KEY>'

In [52]:
from qwak.model.experiment_tracking import log_param

log_param({"test": "value"})

In [94]:
# This process requires registration and streaming data, so the step is skipped here; the idea is to sample, e.g.:

# Draw a few samples from the new distribution with some example feature values.
# The request sends your actual data to the server where the model is hosted (this can be a private cloud or an on-prem intranet).
# QWAK_TOKEN is a placeholder environment variable holding your own bearer token.
!curl --location --request POST 'https://models.sapl.qwak.ai/v1/titanic_survival_prediction_89f2f5/predict' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $QWAK_TOKEN" \
--data '{"columns":["Sex","SibSp","Embarked","Cabin","Ticket","PassengerId","Parch","Fare","Name","Pclass","Age"],"index":[0],"data":[["male",0,"S","","",0,0,7.25,"John",3,22]]}'

[{"Survived_Probability":0.1842458654}]

In [91]:
# For reference:
# "columns":["Sex","SibSp","Embarked","Cabin","Ticket","PassengerId","Parch","Fare","Name","Pclass","Age"],
# "data":[["female",0,"S","","",0,0,7.25,"Johnny",3,22]]
# QWAK_TOKEN is a placeholder environment variable holding your own bearer token.

!curl --location --request POST 'https://models.sapl.qwak.ai/v1/titanic_survival_prediction_89f2f5/predict' --header 'Content-Type: application/json' --header "Authorization: Bearer $QWAK_TOKEN" --data '{"columns":["Sex","SibSp","Embarked","Cabin","Ticket","PassengerId","Parch","Fare","Name","Pclass","Age"],"index":[0],"data":[["female",0,"S","","",0,0,7.25,"Johnny",3,22]]}'

[{"Survived_Probability":0.7906566761}]
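
The request bodies in the two curl calls above use a `columns` / `index` / `data` layout. As a convenience, the sketch below builds that JSON from a list of passenger dictionaries using only the standard library. The helper name `build_payload` is hypothetical, and the column order is simply copied from the requests above.

```python
import json

COLUMNS = ["Sex", "SibSp", "Embarked", "Cabin", "Ticket", "PassengerId",
           "Parch", "Fare", "Name", "Pclass", "Age"]


def build_payload(records, columns=COLUMNS):
    """Convert passenger dicts into the columns/index/data JSON layout."""
    missing = sorted({c for r in records for c in columns if c not in r})
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return json.dumps({
        "columns": list(columns),
        "index": list(range(len(records))),
        "data": [[r[c] for c in columns] for r in records],
    })


passenger = {
    "Sex": "male", "SibSp": 0, "Embarked": "S", "Cabin": "", "Ticket": "",
    "PassengerId": 0, "Parch": 0, "Fare": 7.25, "Name": "John",
    "Pclass": 3, "Age": 22,
}
payload = build_payload([passenger])
print(payload)
```

Failing early on a missing field is a design choice: it surfaces schema mismatches on the client instead of as a server-side prediction error.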

# After a threshold is triggered, e.g. the data distribution shifts X times (assuming you have set up a monitoring alert with a vendor such as PagerDuty, synced to/from the Qwak dashboard)
-         You need to decide how to address the data/model quality issue, for example by rebuilding and redeploying the model after addressing the data distribution shift, e.g.:
            - !qwak models deploy realtime --model-id '<new model>' --build-id '<new-build-id>' --pods 2 --instance small --timeout 3000

-          You could also retire a poorly performing or no-longer-relevant model and/or start another, more relevant one. This includes addressing pertinent issues such as data quality, so that the data is in a good state for training a better or more specific model that balances bias and variance, among other plausible issues
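
The "shift X times" rule above can be sketched as a small counter over recent drift checks. This is a hypothetical illustration in plain Python, not a Qwak or PagerDuty feature: the class name `DriftTrigger`, the window size, and the threshold are all assumed values.

```python
from collections import deque


class DriftTrigger:
    """Signal a retrain once drift exceeds `threshold` in at least
    `max_breaches` of the last `window` checks."""

    def __init__(self, threshold, max_breaches, window):
        self.threshold = threshold
        self.max_breaches = max_breaches
        self.recent = deque(maxlen=window)  # oldest checks fall out automatically

    def observe(self, drift_score):
        self.recent.append(drift_score > self.threshold)
        return sum(self.recent) >= self.max_breaches


trigger = DriftTrigger(threshold=0.1, max_breaches=3, window=5)
scores = [0.02, 0.15, 0.30, 0.05, 0.22, 0.04]
decisions = [trigger.observe(score) for score in scores]
print(decisions)
```

Counting breaches over a window, instead of reacting to a single alert, is one way to avoid rebuilding the model on a one-off spike.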

# Setup and Installation: (#7) Other plausible use cases with paid resources



# **An alternative idea/usecase with internal KMS Chatbot trained on SLM (e.g. Flan T5)**

(Note: feature not available without paid GPU resources and allocation of resources for further training and fine-tuning)



In [None]:
!qwak models create "Pre Trained Model" --project "Pre-trained models"
#refer to pre-built models on docs.qwak if you have a cloud account

Project with name Pre-trained models doesn't exist. Creating it.
Model created
model id : pre_trained_model
E0000 00:00:1733037993.274895   28707 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out.


In [None]:
# Import required libraries
# pandas DataFrames handle the structured dataset
# torch is a deep learning framework for building your DNN
# transformers provides pre-trained models (e.g. Flan T5) that can be fine-tuned to your use case before deploying
import pandas as pd
import qwak
import torch
from qwak.model.adapters.output_adapters.qwak_with_default_fallback import AutodetectOutputAdapter
from qwak.model.base import QwakModel
from qwak.model.schema import ExplicitFeature, ModelSchema
from transformers import T5Tokenizer, T5ForConditionalGeneration


# Define the FLANT5Model class inheriting from QwakModel
# Assumes paid resources are available to fine-tune and deploy, which is not the case here
class FLANT5Model(QwakModel):

    # Initialize model parameters
    def __init__(self):
        self.model_id = "google/flan-t5-small"
        self.max_new_tokens = 50
        self.model = None
        self.tokenizer = None

    # Log model metrics (for demonstration)
    def build(self):
        qwak.log_metric({"val_accuracy": 1})

    # Define the input schema for the model
    def schema(self):
        model_schema = ModelSchema(
            inputs=[
                ExplicitFeature(name="prompt", type=str),
            ])
        return model_schema

    # Load the pre-trained FLAN-T5 model
    def initialize_model(self):
        self.tokenizer = T5Tokenizer.from_pretrained(self.model_id, legacy=False)
        self.model = T5ForConditionalGeneration.from_pretrained(self.model_id)

        # Keep the model on the same device as the inputs used in predict()
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        print(f"Inference using device: {self.device}")

    # Generate text based on the input prompt
    # @qwak.api(output_adapter=AutodetectOutputAdapter())
    def predict(self, df):
        input_text = list(df['prompt'].values)
        # padding=True lets prompts of different lengths share one batch
        inputs = self.tokenizer(input_text, return_tensors="pt", padding=True).to(self.device)

        # Generate text
        with torch.no_grad():
            gen_params = {
                "max_new_tokens": self.max_new_tokens,
                # top_k only takes effect when sampling (do_sample=True) is enabled
                "top_k": 50
            }
            output_ids = self.model.generate(**inputs, **gen_params)

        # Decode the generated text, one row per input prompt
        decoded_outputs = self.tokenizer.batch_decode(output_ids, skip_special_tokens=True)

        return pd.DataFrame([
            {
                "generated_text": text
            }
            for text in decoded_outputs
        ])


# If a GPU option is available
# (this model-id is a replica from my own resources)
# !qwak models deploy realtime --model-id 'FLAN-T5 4698dc90' --build-id '606c5769-6909-4af8-8d6c-64cf2addaf05' --pods 2  --instance small  --timeout 3000  --server-workers 4  --variation-name default  --daemon-mode false

# Or deploy your model once the build is complete with
# !qwak models build --model-id pre_trained_model . --deploy


# Or an example with classic ML and SKLearn Package

In [None]:
from qwak import QwakModel
from sklearn import svm, datasets
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from qwak.model.experiment_tracking import log_metric


class IrisClassifier(QwakModel):

    def __init__(self):
        self._gamma = 'scale'

    def build(self):
        # Load training data
        iris = datasets.load_iris()
        X, y = iris.data, iris.target
        X_train, X_test, y_train, y_test = train_test_split(X, y)

        # Train model
        clf = svm.SVC(gamma=self._gamma)
        self.model = clf.fit(X_train, y_train)

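        # Store model metrics
        y_predicted = self.model.predict(X_test)
        # Iris has three classes, so f1_score needs an explicit averaging mode
        f1 = f1_score(y_test, y_predicted, average="macro")

        # Log metrics to Qwak (the function imported above is log_metric)
        log_metric({"f1": f1})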

    def predict(self, df):
        return self.model.predict(df)