# gRPC Requests with Text Generation Inference Server

Change the following variable settings to match your deployed model's *Inference endpoint*. For example:

```
infer_endpoint = "https://flan-t5-small-predictor-userx-workshop.apps.clusterx.sandboxx.opentlc.com"
```


In [None]:
model_id = "flan-t5-small"
grpc_port = 443
infer_endpoint = "https://tgis-ai-example-single-model-serving.apps.cluster-4jksb.dynamic.redhatworkshops.io"

The UI doesn't currently show the gRPC endpoint, but we can derive the hostname from the *Inference endpoint*.

In [None]:
import re

# Strip the leading URL scheme and any trailing slash to get the bare hostname
hostname = re.sub(r"^https?://", "", infer_endpoint).rstrip("/")

hostname
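
As an alternative to the regular expression in the cell above, the hostname can also be derived with the standard library's URL parser. This is a minimal sketch; the function name `hostname_from_endpoint` is hypothetical and not part of the workshop code. Note that `urlsplit(...).hostname` lowercases the name and drops any explicit port.

```python
from urllib.parse import urlsplit


def hostname_from_endpoint(endpoint: str) -> str:
    """Return the bare host name of an inference endpoint URL (hypothetical helper)."""
    # urlsplit only recognises the host when a "//" separator is present,
    # so prepend one for inputs such as "example.com/".
    parts = urlsplit(endpoint if "://" in endpoint else f"//{endpoint}")
    if parts.hostname is None:
        raise ValueError(f"cannot derive a hostname from {endpoint!r}")
    return parts.hostname


# Example URL taken from the top of this notebook, with a trailing slash added
print(hostname_from_endpoint(
    "https://flan-t5-small-predictor-userx-workshop.apps.clusterx.sandboxx.opentlc.com/"
))
```

Using a URL parser also copes with endpoints that carry a path or port, which the simple prefix and suffix stripping does not.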

### gRPC

gRPC is a high-performance, open-source RPC framework originally developed by Google. It can be used to serve inference requests for machine learning (ML) models. Here’s an overview of using gRPC for inference:

*Advantages*  
- Low-latency: gRPC provides low-latency communication between clients and servers, making it suitable for real-time inference applications.
- High-performance: gRPC uses Protocol Buffers (protobuf) for serialization, a compact binary format that is typically smaller and faster to parse than text formats such as JSON or XML. This results in faster data transfer and processing.
- Scalability: gRPC allows for easy horizontal scaling, enabling you to distribute inference workloads across multiple machines or containers.
- Language-agnostic: gRPC supports multiple programming languages, including Python, Java, C++, and Go, making it a versatile choice for inference.

*gRPC Inference Architecture*  
- Client: The client sends a request to the gRPC server, specifying the input data and desired output.
- Server: The gRPC server receives the request, performs inference using the ML model, and returns the output to the client.
- Model: The ML model is deployed on the gRPC server, and can be updated or swapped without affecting the client.

*gRPC Inference Protobuf Definition*  
- Message definitions: Define the input and output messages using Protocol Buffers (protobuf). For example, InferenceRequest and InferenceResponse.
- Service definition: Define the gRPC service interface, specifying the methods for inference (e.g., Predict).
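
The roles described above can be mimicked in plain Python to make the contract concrete. This is only an in-process stand-in, assuming hypothetical field names (`model_id`, `text`, `generated_text`) and toy models; it uses neither real gRPC nor protobuf, and it is not the definition used by the inference server.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InferenceRequest:
    """Input message (hypothetical fields)."""
    model_id: str
    text: str


@dataclass(frozen=True)
class InferenceResponse:
    """Output message (hypothetical fields)."""
    model_id: str
    generated_text: str


class InferenceService:
    """Plays the server role: owns the model and exposes one predict method."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self._model = model

    def swap_model(self, model: Callable[[str], str]) -> None:
        # The model can change without the client changing its request.
        self._model = model

    def predict(self, request: InferenceRequest) -> InferenceResponse:
        return InferenceResponse(request.model_id, self._model(request.text))


# Toy "models" so the sketch runs without any ML dependency
def upper_model(text: str) -> str:
    return text.upper()


def reversed_model(text: str) -> str:
    return text[::-1]


service = InferenceService(upper_model)
request = InferenceRequest(model_id="toy-model", text="hello")
print(service.predict(request).generated_text)  # HELLO

service.swap_model(reversed_model)
print(service.predict(request).generated_text)  # olleh
```

In a real deployment the two message classes and the service interface would be generated from a `.proto` file, and the call would travel over the network instead of being a local method call.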

*Conclusion*  
gRPC provides a robust and efficient framework for inference in machine learning models. By defining protobuf messages and services, you can create a scalable and language-agnostic inference architecture.

### Request Function

Build and submit the gRPC request. 

We're using the `TgisGrpcClient` class from the `utils` directory.  If you're curious about the gRPC code, it's in the [utils/tgis_grpc_client.py](utils/tgis_grpc_client.py) file.

In [None]:
import sys
sys.path.append('./utils')

from utils.tgis_grpc_client import TgisGrpcClient

# If using RHOAI 2.13 or newer, set `verify=True`
# Older versions use a self-signed certificate that is not trusted by default,
# so verification may need to be disabled there (`verify=False`)
client = TgisGrpcClient(
    hostname,
    443,
    verify=True,
)

client.make_request("What is the capital of Italy?", model_id=model_id)

Change the question to "What is the capital of America?" or "What is the boiling point of water?"
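
To try several questions in one go, a small loop helper can be handy. The sketch below is hypothetical: `ask_all` is not part of the workshop utilities, and it assumes that `make_request(text, model_id=...)` returns the generated answer. A stand-in function replaces the real client so the block runs without a deployed model; in this notebook you would pass `client.make_request` instead.

```python
from typing import Callable, Dict, Iterable


def ask_all(
    make_request: Callable[..., object],
    questions: Iterable[str],
    model_id: str,
) -> Dict[str, object]:
    """Send each question through make_request and collect the answers."""
    answers = {}
    for question in questions:
        answers[question] = make_request(question, model_id=model_id)
    return answers


# Stand-in for client.make_request (assumption: the real method returns text)
def fake_make_request(text: str, model_id: str) -> str:
    return f"[{model_id}] would answer: {text}"


questions = [
    "What is the capital of Italy?",
    "What is the capital of America?",
    "What is the boiling point of water?",
]

for question, answer in ask_all(fake_make_request, questions, "flan-t5-small").items():
    print(question, "->", answer)
```

Comparing the answers side by side makes it easier to spot which kinds of questions the model handles poorly.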

Sometimes the model will not give a correct answer to the question that we ask it. 

Can you explain why?