### Steps for deploying ML models

We'll use the same model we trained and evaluated
previously - the churn prediction model. Now we'll
deploy it as a web service.

**SUMMARY**

* Save models with pickle
* Use `Flask` to turn the model into a web service (other: FastAPI)
* Use `pipenv` dependency & env manager (other: virtual env, conda, poetry)
* Package it in Docker 
* Deploy to the `AWS` cloud (other: GCP, Azure, Heroku, Python Anywhere) 


_Saving and loading the model_

* Saving the model to pickle
* Loading the model from pickle
* Turning our notebook into a Python script
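
The save/load round trip can be sketched as below. This is a minimal illustration, not the course's exact script: the `dv` and `model` objects are plain-dict stand-ins (an assumption, so the sketch runs without scikit-learn), and the file is written to a temporary directory.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-ins for the fitted preprocessing object and model (assumption:
# plain dicts are used only so the sketch runs without scikit-learn).
dv = {"kind": "vectorizer"}
model = {"kind": "classifier", "C": 1.0}


def save_model(path, dv, model):
    """Write both objects together as one pickled tuple."""
    with open(path, "wb") as f_out:
        pickle.dump((dv, model), f_out)


def load_model(path):
    """Read the (dv, model) tuple back from disk."""
    with open(path, "rb") as f_in:
        return pickle.load(f_in)


# Temporary location so the sketch leaves no file in the working directory.
model_path = Path(tempfile.mkdtemp()) / "model_C=1.0.bin"
save_model(model_path, dv, model)
loaded_dv, loaded_model = load_model(model_path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.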

_Web services: introduction to Flask_

* Writing a simple ping/pong app
* Querying it with `curl` and browser

_Serving the churn model with Flask_

* Wrapping the predict script into a Flask app
* Querying it with `requests` 
* Preparing for production: gunicorn
* Running it on Windows with waitress
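
Querying the service could look roughly like the sketch below. It uses the standard library's `urllib` instead of `requests` so that it runs without extra installs. The customer fields are hypothetical, and the host assumes the service listens on port 9696 as in the commands used in these notes.

```python
import json
import urllib.request


def build_predict_request(host, customer):
    """Build a JSON POST request for the /predict endpoint (not sent yet)."""
    url = f"http://{host}/predict"
    body = json.dumps(customer).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical customer record; the real feature names come from the dataset.
customer = {"contract": "month-to-month", "tenure": 12, "monthlycharges": 29.85}
req = build_predict_request("localhost:9696", customer)

# To actually send it (the service must be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Keeping the request construction separate from sending makes it easy to point the same code at a local container or a cloud host by changing only `host`.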

_Python virtual environment: Pipenv_

* Dependency and environment management
* Why we need virtual environment
* Installing Pipenv
* Installing libraries with Pipenv
* Running things with Pipenv

_Environment management: Docker_

* Why we need Docker
* Running a Python image with docker
* Dockerfile
* Building a docker image
* Running a docker image

_Deployment to the cloud: AWS Elastic Beanstalk_

* Installing the eb cli
* Running eb locally
* Deploying the model

**Explore more**

* Flask is not the only framework for creating web services. Try others, e.g. FastAPI
* Experiment with other ways of managing environment, e.g. virtual env, conda, poetry.
* Explore other ways of deploying web services, e.g. GCP, Azure, Heroku, Python Anywhere, etc

## Most IMP Sections

### Deployment (Chapter 5)

- [Cross-industry standard process for data mining (CRISP-DM)](https://youtu.be/dCa3JvmJbr0?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR) as simple data science pipeline
- [ROC AUC](https://youtu.be/hvIQPAwkVZo?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR&t=419) probabilistic interpretation
- [K-fold model validation](https://youtu.be/BIIZaVtUbf4?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR&t=82) with model AUC $\pm$ std dev
- [Intro to model deployment](https://youtu.be/agIFak9A3m8?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)

    <img src="2025-08-15_22-41.png" alt="alt text" width="400"/>

- [Web services and intro to Flask](https://youtu.be/W7ubna1Rfv8?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)

- [Serving the Churn Model with Flask](https://youtu.be/Q7ZWPgPnRz8?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)
    
    Some useful commands
    ```bash
    python predict.py # start the Flask dev server (prints a dev server warning)
    python predict-test.py # send a test request to the running service
    gunicorn --bind 0.0.0.0:9696 predict:app # production server, no warning
    ```

- [Python Virtual Environment: Pipenv](https://youtu.be/BMXh8JGROHM?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)
    --> for handling python dependencies on same machine

    <img src="2025-08-16_18-29.png" alt="alt text" width="400"/>
    <img src="2025-08-16_18-31_1.png" alt="alt text" width="400"/>

    What are dev packages? A: Packages which we only need during the development phase and not in the production (deployment)

    Tools for creating virtual environments

    - virtual env (venv)
    - conda
    - **pipenv** (we use this)
    - poetry
    - ...

    Some useful commands (TIP: do not run the following from an active env)
    ```bash
    pip install pipenv
    pipenv install numpy pandas scikit-learn ... # create Pipfile and Pipfile.lock
    pipenv install # install env
    pipenv shell # activate env
    ```

    After activating the env, serve the model with

    `gunicorn --bind 0.0.0.0:9696 predict:app`

    To run the same command inside the env without activating it first, use

    `pipenv run gunicorn --bind=0.0.0.0:9696 predict:app`

- [Environment Management: Docker](https://youtu.be/wAtyYZ6zvAs?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)

    <img src="2025-08-18_12-24.png" alt="alt text" width="400"/>
    <img src="2025-08-18_12-25.png" alt="alt text" width="400"/>

    Create a `Dockerfile` with these steps

    NOTE: Get a docker base image from - https://hub.docker.com/_/python e.g. `3.9.23-slim`

    **Dockerfile**

    ```docker
    # base image
    FROM python:3.10-slim

    # we need pipenv to install libs and create env
    RUN pip install pipenv

    # create and enter app dir
    WORKDIR /app

    # copy pipfiles in current dir (i.e. app dir)
    COPY ["Pipfile", "Pipfile.lock", "./"]

    # install all libs system wide as we don't want to create a
    # virtual env in a docker image as it is already isolated
    RUN pipenv install --system --deploy

    # copy model and predict files
    COPY ["model_C=1.0.bin", "predict.py", "./"]

    # exposing port for communication
    EXPOSE 9696

    # entry point: start gunicorn serving the Flask app when the container runs
    ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
    ```
    
    **Build a docker image**
    ```
    docker build -t zoomcamp-test .
    ```

    **Run a docker image in terminal**

    ```
    docker run -it --rm zoomcamp-test:latest # use the default entrypoint from the Dockerfile
    docker run -it --rm --entrypoint=bash zoomcamp-test:latest # override the entrypoint to get a bash shell
    ```

    **PORT provides external access to the container**

    <img src="2025-08-18_17-56.png" alt="alt text" width="400"/>

    Now map the container port to a port on the host machine using

    `-p <host_machine_port>:<container_port>`

    This lets `predict-test.py` run on the host machine and communicate with the container
    
    ```
    docker run -it --rm -p 9696:9696 zoomcamp-test:latest
    ```

- [Deployment To The Cloud: AWS Elastic Beanstalk](https://youtu.be/HGPJ4ekhcLg?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)

    <img src="2025-08-18_22-44.png" alt="alt text" width="400"/>

    _Install_ the AWS Elastic Beanstalk CLI (`awsebcli`) as a _dev_ package only: we need it to deploy our web service, not inside the container. See the `[dev-packages]` section in Pipfile and the `"develop": { ...` entry in Pipfile.lock.
    ```
    pipenv install awsebcli --dev
    ```
    
    Then _get_ into the env (`pipenv shell`) and initialize an `eb` project on the cloud
    ```
    pipenv shell
    eb init -p docker -r eu-central-1 churn-serving
    ```
    Here `-p` is for platform which is docker and `-r` for region

    _Create_ an instance of this cloud (AWS) env
    ```
    eb create churn-serving-env
    ```

    _Add host_ and change the url in `predict-test.py` to:
    ```python
    host = "churn-serving-env.eba-umxcsddh.eu-central-1.elasticbeanstalk.com"
    url = f'http://{host}/predict'
    ```

    As of now this EB env is public, so anyone (any service) can access it

    _Terminate_ the AWS EB env
    ```
    eb terminate churn-serving-env
    ```

### Serverless (Chapter 9)

<img src="2025-08-26_15-28.png" alt="alt text" width="400"/>

[**AWS Lambda**](https://youtu.be/_UX8-2WhHZo?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)

- No need for any AWS EC2 machine/server; we use a Lambda function directly
    
    <img src="2025-08-26_15-49.png" alt="alt text" width="400"/>

- You pay per request
    
    <img src="2025-08-26_15-51.png" alt="alt text" width="400"/>


- Why use TF-lite?
    - (Earlier) AWS Lambda limits ($\le$ 50 MB zip file). But now up to 10 GB using containers
    - Large container image
        - pay more: for storing the image !
        - slow init: takes time to initialize the lambda function when we invoke it for the first time !
        - slow to import: tf takes long to be imported in our python script !

    - TF-lite only focuses on inference (not used for training models): `pip install --extra-index-url https://google-coral.github.io/py-repo/ tflite_runtime`

Now we follow these steps

- Save (or download) the model

- Create a python file for the model prediction and lambda handler -- call it e.g. `lambda_function.py`

- Create a Docker image 
    - using a base image from AWS ECR (https://gallery.ecr.aws/) - look for `python lambda`
    - `docker build -t clothing-model .` where `.` is the build context, i.e. the Dockerfile in this dir is used
    - try it out locally first
        - `docker run -it --rm -p 8080:8080 clothing-model:latest`
        - `python test.py`

- Publish the docker image on Amazon Elastic Container Registry (ECR) to be loaded later in AWS Lambda
    - (if not already done) Install and configure AWS Command Line Interface (awscli)
        - `pip install awscli`
        - `aws configure`
    - create a repository for container images using AWS CLI
        - `aws ecr create-repository --repository-name clothing-tflite-images`
    - log into our container repository (or registry) as it is private using
        - `aws ecr get-login-password --region eu-central-1 | docker login --username AWS --password-stdin 364155067933.dkr.ecr.eu-central-1.amazonaws.com` 
    - make a **URI** for the image that we are going to push to our ECR repository _clothing-tflite-images_
        ```bash
        ACCOUNT=364155067933
        REGION=eu-central-1
        REGISTRY=clothing-tflite-images
        PREFIX=${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${REGISTRY}
        TAG=clothing-model-xception-v4-001
        REMOTE_URI=${PREFIX}:${TAG}
        ```
    - now **tag** and **push** (just like git commit and push) the _clothing-model:latest_ docker image previously created with the _URI_ we just created
        - `docker tag clothing-model:latest ${REMOTE_URI}`
        - `docker push ${REMOTE_URI}`       
    - now create an AWS Lambda function and browse to the `clothing-model-xception-v4-001` image in the `clothing-tflite-images` repository
    - test the model on AWS lambda (just like we tested it locally)

- [Exposing the Lambda Function](https://youtu.be/wyZ9aqQOXvs?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR) as a web service using API Gateway

    _API Gateway helps to expose different AWS services as web services including lambda functions_
    - go to AWS and create an API: choose REST API, then create a resource e.g. `predict` with a method `POST` and a stage e.g. `test`
    - deploy (expose) this API, which creates a URL for testing our gateway (web service) from anywhere
    - we can then use this API Gateway URL (_see invoke URL in method tab_) in `test.py` instead of our local machine URL to directly use our web service!
    - **NOTE**: this URL is by default public and anyone can send requests just like we do in `test.py`. So for work we should make it private or limited to people who should have access to it!
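
The `lambda_function.py` file mentioned in the steps above can be sketched as follows. This is only an outline under assumptions: the `predict` stub stands in for the real TF-Lite inference, and the event is assumed to carry the image address under a `url` key.

```python
def predict(url):
    """Placeholder for the real TF-Lite inference (hypothetical stub)."""
    return {"url": url, "prediction": "not computed in this sketch"}


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls; `event` carries the request payload."""
    url = event["url"]  # assumption: the caller sends {"url": "..."}
    return predict(url)


# Local smoke test: call the handler directly with a sample event.
event = {"url": "http://bit.ly/mlbookcamp-pants"}
result = lambda_handler(event, None)
```

When the image runs locally with `-p 8080:8080`, the AWS base image's runtime emulator accepts the same event as a POST to `http://localhost:8080/2015-03-31/functions/function/invocations`.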

### [TF-Serving | Kubernetes (Chapter 10)](https://youtu.be/mvPER7YfTkw?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)

<img src="2025-08-28_15-56.png" alt="alt text" width="400"/>

_Note that we would make this service very efficient by splitting it into 2 services -- as we can use CPUs for gateway service and GPUs for TF-serving service separately!_

We move on to more than one service (each with its own Docker image) and use `docker-compose` and `kubernetes` for communication between them!

TensorFlow Serving

#######
* The saved_model format 
* Running TF-Serving locally with Docker
* Invoking the model from Jupyter

#######

- To use TF-Serving we need to first convert tf model to a special format called _saved model_ (similar to what we did earlier using _tflite_).

    ```python
    import tensorflow as tf
    from tensorflow import keras

    model = keras.models.load_model("clothing-model-v4.h5")
    tf.saved_model.save(model, "clothing-model")
    ```

- We can now look inside this model using `saved_model_cli`, which ships with TensorFlow.

    ```bash
    saved_model_cli show --dir clothing-model --all
    ```
    Look for the signature, inputs and outputs in _signature_def['serving_default']_ and note down the names of the input and output (e.g. in model-description.txt)

- Now we can run tf-serving using this model 
    - Instead of creating our docker image (as before) we first spin up an official tf-serving docker image (locally) and mount our model on it 

        ```bash
        docker run -it --rm \
            -p 8500:8500 \
            -v "$(pwd)/clothing-model:/models/clothing-model/1" \
            -e MODEL_NAME="clothing-model" \
            tensorflow/serving:2.7.0
        ```

    Here `-p` is for port mapping HOST:CONTAINER, `-v` mounts the model as a volume (also HOST:CONTAINER), `-e` sets the model name as an environment variable, and the last line is the image name

    - We now need to install a `grpc` client. gRPC is a protocol that uses a binary data format (faster than JSON) and is used to communicate with tf-serving
        ```
        pip install grpcio==1.42.0 tensorflow-serving-api==2.7.0
        ```

    - We then create the code that would go (as usual) in our inference script (like predict.py or lambda_function.py)

        ```python
        import grpc
        import tensorflow as tf  # needed below for tf.make_tensor_proto
        from tensorflow_serving.apis import predict_pb2
        from tensorflow_serving.apis import prediction_service_pb2_grpc

        host = 'localhost:8500'# port where our tf-service is currently running (see above docker run)

        # access this port (using insecure as tf-service will not be accessible from outside of Kubernetes)
        channel = grpc.insecure_channel(host)
        # tf-serving
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

        from keras_image_helper import create_preprocessor
        preprocessor = create_preprocessor('xception', target_size=(299, 299))
        url = 'http://bit.ly/mlbookcamp-pants'
        X = preprocessor.from_url(url)  
        # want to send this X to our prediction service which is currently running in tf-serving

        # now we want to make a proto buf request and make the prediction using the tf-serving which is currently running 
        def np_to_protobuf(data):
            return tf.make_tensor_proto(data, shape=data.shape)
        pb_request = predict_pb2.PredictRequest()
        pb_request.model_spec.name = 'clothing-model'
        pb_request.model_spec.signature_name = 'serving_default'
        pb_request.inputs['input_8'].CopyFrom(np_to_protobuf(X))
        pb_response = stub.Predict(pb_request, timeout=20.0)
        preds = pb_response.outputs['dense_7'].float_val
        ```
        As before, we then turn the output into a dictionary.

    - So in summary 
        - this is how we communicate with a model deployed with tensorflow-serving
        - we build the request (a step we always do to create a service) containing the name of the model, the signature and the input
        - we then send this protobuf request over gRPC to our instance of tf-serving, where the model processes it and returns the predictions!
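
Turning the raw scores into a dictionary can be sketched as below. The class labels and the example scores are assumptions for illustration (they are not listed in these notes), and `to_class_scores` is a hypothetical helper name.

```python
# Assumed class labels, in the order the model outputs them (hypothetical here).
classes = ["dress", "hat", "longsleeve", "outwear", "pants",
           "shirt", "shoes", "shorts", "skirt", "t-shirt"]


def to_class_scores(preds):
    """Pair each class label with its score."""
    return dict(zip(classes, preds))


# Made-up scores standing in for pb_response.outputs['dense_7'].float_val
example_preds = [-1.9, -4.8, -2.4, -1.1, 9.3, -2.8, -3.6, 3.2, -2.6, -4.8]
scores = to_class_scores(example_preds)

# The label with the highest score is the predicted class.
best = max(scores, key=scores.get)
```

Returning a plain dictionary keeps the response easy to serialize as JSON later in the Flask gateway.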

Pre-processing service

#######

* Converting the notebook to a Python script
* Wrapping the script into a Flask app
* Creating the virtual env with Pipenv
* Getting rid of the tensorflow dependency

#######

- We have everything in the above image except for a Flask service. Let's call this service gateway.py (as in the image), as opposed to the earlier predict.py or lambda_function.py

    - convert the above code into functions `prepare_request(X)`, `prepare_response(pb_response)` and `predict(url)`, and add a `__main__` block:
        ```python
        if __name__ == '__main__':
            url = 'http://bit.ly/mlbookcamp-pants'
            response = predict(url)
            print(response)
        ```
    - test by executing the script (note that the docker container still needs to be running): `python gateway.py`

    - now we turn this script into a Flask application (see chapter 5) by adding:
        ```python
        from flask import Flask, request, jsonify

        app = Flask('gateway')  # 'gateway' is the name of the app (service) -  can be anything

        @app.route('/predict', methods=['POST'])
        def predict_endpoint():
            data = request.get_json()
            url = data['url']
            result = predict(url)
            return jsonify(result)

        if __name__ == '__main__':
            app.run(debug=True, host='0.0.0.0', port=9696)

        ```

        Now running this script will start the Flask app, to which we can post requests using a test.py script (as we did in chapter 9).

        NOTE that this test only works while the docker container is still running and the Flask app is still up! This allows us to send the `REQUEST` to the `GATEWAY` which then `PREPARES` the request and sends it to `TF_SERVING` which uses the `MODEL` to make the `PREDICTION` and sends it back to `GATEWAY` which then post-processes and sends it to us!

- Creating a pipenv env

    ```
    pipenv install grpcio==1.42.0 flask gunicorn keras-image-helper
    ```

- Remove dependence on tensorflow in gateway

    We have this code which depends on tf
    ```python
    def np_to_protobuf(data):
        return tf.make_tensor_proto(data, shape=data.shape)
    ```

    To avoid this we install `tensorflow-protobuf` (as tf cpu is also large)
    ```
    pipenv install tensorflow-protobuf
    ```

    And then use it (without tf) to change the function `np_to_protobuf` -- see proto.py. Test it by running the gateway (not using the Flask service) in pipenv shell (while the container is running).

Running everything locally with Docker-compose
* Preparing the images 
* Installing docker-compose 
* Running the service 
* Testing the service

- Previously we used an official tf-serving docker image and mounted our model into it. Now we want to make a self-contained docker image so that it has everything when we deploy it

    - we first create (build and run) a docker file for tf-serving service see `image-model.dockerfile`

        ```
        docker build -t zoomcamp-10-model:xception-v4-001 -f image-model.dockerfile .
        docker run -it --rm -p 8500:8500 zoomcamp-10-model:xception-v4-001
        ```
        
        As before test it by running gateway (w/o flask) in pipenv shell

    - then we create a docker file for the gateway (Flask service) -- see `image-gateway.dockerfile` (same as in chapter 5)

        ```
        docker build -t zoomcamp-10-gateway:001 -f image-gateway.dockerfile .
        docker run -it --rm -p 9696:9696 zoomcamp-10-gateway:001
        ```

    - to test these services together -- if we run test.py it runs into an error as our gateway is not able to communicate with the tensorflow service!

        <img src="2025-09-16_13-11.png" alt="alt text" width="400"/>


- We now want to establish communication between the two services -- precisely, put them on the same network so that the gateway can reach tf-serving on its port

    - we use docker compose to do this

    ```
    sudo apt install -y docker-compose-plugin 
    ```

    - then we create the `docker-compose.yaml` file to include the two services and ports

        ```yaml
        version: "3.9"
        services:
          clothing-model:
            image: zoomcamp-10-model:xception-v4-001
          gateway:
            image: zoomcamp-10-gateway:001
            environment:
              - TF_SERVING_HOST=clothing-model:8500
            ports:
              - "9696:9696"

    - run docker compose (in the same dir as yaml file)
        ```
        docker compose up -d
        ```

    and test using test.py

    This way we only need to run it once instead of spinning up two containers as before.

    - stop it using `docker compose down`

This is how we use docker compose to run multiple connected services on the same machine!
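
The `TF_SERVING_HOST` variable set in the compose file only has an effect if `gateway.py` reads it. A minimal sketch of how that could look, assuming a hypothetical helper `tf_serving_host` and a fallback to the local address used earlier in these notes:

```python
import os


def tf_serving_host(env=os.environ):
    """Return the tf-serving address, falling back to the local default."""
    # Inside docker compose the variable points at the service name,
    # e.g. "clothing-model:8500"; outside it is usually unset.
    return env.get("TF_SERVING_HOST", "localhost:8500")


host = tf_serving_host()
# host would then be passed to grpc.insecure_channel(host) in the gateway
```

With this in place the same gateway script works both when run directly on the host and when run as a compose service.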




Kubernetes

* [The anatomy of a Kubernetes cluster](https://youtu.be/UjVkpszDzgk?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR)

**Kubernetes (K8s)** is an open-source system for automating deployment, scaling and management of containerized applications. In practice, it gives us a way to take the docker image that we created locally and deploy it to the cloud!

to be continued ...

### KServe (Chapter 11)