### Making Requests to the Inference Server

With the model built, the next step is to make an inference request to it and analyze the response.

Since the model is deployed as a REST API, you can make inference requests to it using any client of your choice, in any language. The inference server is strict about the input it expects and how that input is built. Fortunately, Triton provides client libraries that help build the inputs.

For demonstration purposes, I will use the Python HTTP client to make inference requests, but nothing stops you from using another language to make HTTP requests to the API.
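To illustrate what "any client" means in practice, here is a minimal sketch that builds the same kind of request with only the Python standard library. It assumes the server exposes the standard v2 inference endpoint (`/v2/models/<model>/infer`) and accepts a JSON body. The model name `ensemble_model`, the input name `TEXT`, and the output name `3391` are the ones used later in this section; `localhost:8000` is a placeholder address, and `build_infer_request` is a hypothetical helper, not part of any library. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request


def build_infer_request(server, model_name, input_name, output_name, texts):
    """Build (but do not send) a POST request for the v2 inference endpoint."""
    payload = {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(texts)],   # one string per element
                "datatype": "BYTES",     # strings are sent as BYTES
                "data": list(texts),
            }
        ],
        "outputs": [{"name": output_name}],
    }
    return urllib.request.Request(
        url=f"http://{server}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    server="localhost:8000",  # assumption: replace with your server address
    model_name="ensemble_model",
    input_name="TEXT",
    output_name="3391",
    texts=["what cause covid"],
)
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

The same JSON body can be sent from curl, JavaScript, Go, or any other HTTP client, which is why the choice of language is open.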


In [1]:
import numpy as np
import tritonclient.http as httpclient

# Address of the inference server (host:port, without the http:// scheme)
url = "141.147.108.177:8000"
http_client = httpclient.InferenceServerClient(url=url, verbose=False)

The above code creates the HTTP client with our server URL. Next, let us define the model's input and output.

In [None]:
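# Input tensor: a single string, sent as BYTES
text_input = httpclient.InferInput("TEXT", shape=[1], datatype="BYTES")

# Requested output tensor "3391"; binary_data=False asks for the result as JSON
embedding_output = httpclient.InferRequestedOutput("3391", binary_data=False)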


These are placeholders for our input and output. Let us fill them now:


In [None]:
sentences = ["what cause covid"]
np_input_data = np.asarray([sentences], dtype=object)


In [None]:
np_input_data.reshape(-1)

array(['what cause covid'], dtype=object)
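The reshape is needed because the input was declared with shape [1], while wrapping the sentence list in another list produces a 2-D array. The following sketch uses only NumPy to show the shapes involved; the variable names `nested`, `flat`, and `direct` are illustrative.

```python
import numpy as np

sentences = ["what cause covid"]

# Wrapping the list in another list yields a 2-D array of shape (1, 1).
nested = np.asarray([sentences], dtype=object)
print(nested.shape)

# reshape(-1) flattens it to 1-D, matching the declared input shape [1].
flat = nested.reshape(-1)
print(flat.shape)

# Building the array from the list directly gives the same 1-D result.
direct = np.asarray(sentences, dtype=object)
print(direct.shape)
```

Either the flattened or the directly built array has one element per sentence, which is what an input declared with shape [1] expects here.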

In [None]:
text_input.set_data_from_numpy(np_input_data.reshape(-1))

<tritonclient.http._infer_input.InferInput at 0x10435b460>

In [None]:
results = http_client.infer(model_name="ensemble_model", inputs=[text_input], outputs=[embedding_output])

TimeoutError: timed out
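A TimeoutError like the one above means the server did not answer within the client's time limit. One possible way to cope with occasional timeouts is to retry the call a few times. The sketch below is a generic helper, not part of tritonclient: `call_with_retry` and `flaky_infer` are hypothetical names, and `flaky_infer` is a stand-in for the real inference call so that the example runs without a server.

```python
import time


def call_with_retry(func, attempts=3, delay_seconds=1.0):
    """Call func(); on TimeoutError wait and retry, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TimeoutError:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)


# Stand-in for the real call: it times out once and then succeeds.
calls = {"count": 0}


def flaky_infer():
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("timed out")
    return "ok"


outcome = call_with_retry(flaky_infer, attempts=3, delay_seconds=0.01)
print(outcome, calls["count"])
```

With the real client, the wrapped call would look like call_with_retry(lambda: http_client.infer(model_name="ensemble_model", inputs=[text_input], outputs=[embedding_output])). The client constructor also accepts connection_timeout and network_timeout arguments, which may be worth raising if the model is slow to respond.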

In [None]:
results

<tritonclient.http._infer_result.InferResult at 0x10c2f3130>

We can now convert the output back to a NumPy array:

In [None]:
inference_output = results.as_numpy('3391')
print(inference_output.shape)

(1, 1024)
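For clients that do not use the Python library, the same result can be read from the JSON response directly. Because the output was requested with binary_data=False, the response body carries the tensor as a flat list of numbers alongside its shape. The sketch below rebuilds the rows from such a body. The sample response is made up for illustration: the FP32 datatype is an assumption, the values are invented, and the embedding is shortened to 4 values instead of 1024. `output_rows` is a hypothetical helper.

```python
import json


def output_rows(response_json, output_name):
    """Return the named 2-D output of a v2 inference response as a list of rows."""
    body = json.loads(response_json)
    for output in body["outputs"]:
        if output["name"] == output_name:
            shape, data = output["shape"], output["data"]
            if len(shape) != 2:
                raise ValueError(f"expected a 2-D output, got shape {shape}")
            rows, cols = shape
            if len(data) != rows * cols:
                raise ValueError("data length does not match the declared shape")
            # The flat list is in row-major order: slice it into rows.
            return [data[r * cols:(r + 1) * cols] for r in range(rows)]
    raise KeyError(output_name)


# Made-up sample response with a shortened embedding (assumption, see above).
sample = json.dumps({
    "model_name": "ensemble_model",
    "outputs": [
        {"name": "3391", "datatype": "FP32", "shape": [1, 4],
         "data": [0.1, 0.2, 0.3, 0.4]}
    ],
})

rows = output_rows(sample, "3391")
print(len(rows), len(rows[0]))
```

Each row is one embedding vector, so a request with a single sentence yields a single row, matching the (1, 1024) shape printed above.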
