# gRPC Text Generation Inference with Caikit+TGIS Serving

### Install the client and set the inference server URL (replace with your own address)

In [1]:
!pip install -q caikit-nlp-client

In [None]:
host, port = "localhost", 8085 # replace with your own
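Before creating the client, it can be useful to confirm that the address is reachable at all. The following is a minimal standard-library sketch; the helper name `is_reachable` is hypothetical and not part of `caikit-nlp-client`. It only checks that a TCP connection can be opened and says nothing about TLS or about whether a gRPC service is listening.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `is_reachable(host, port)` with the values set above returns `False` when the server is down or the port is wrong.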

### Imports

In [2]:
from caikit_nlp_client import GrpcClient

### With a self-signed certificate

Note: to extract the server's certificate chain into `bundle.pem`, you can run the following in bash:
```bash
host=<your host>
port=<your port>
openssl s_client -showcerts -verify 5 -connect $host:$port < /dev/null |
    awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/{ if(/BEGIN CERTIFICATE/){a++}; out="cert"a".pem"; print >out}'
cat cert*.pem > bundle.pem
```
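Before handing `bundle.pem` to the client, it can help to confirm that the file contains the number of certificates you expect. The sketch below is a minimal standard-library check; the helper name `split_pem_bundle` is hypothetical and not part of `caikit-nlp-client`. It only splits on the PEM markers and does not validate the certificates themselves.

```python
import re
from typing import List

# Matches one PEM-encoded certificate, including its BEGIN/END markers.
_PEM_CERT = re.compile(
    rb"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(data: bytes) -> List[bytes]:
    """Return each certificate found in a PEM bundle, in file order."""
    return _PEM_CERT.findall(data)
```

Once the bundle has been read into `bundle` as bytes, `len(split_pem_bundle(bundle))` gives the number of certificates in the chain.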

In [4]:
with open('bundle.pem', 'rb') as f:
    bundle = f.read()

In [None]:
# Note: certificate verification can be disabled in the GrpcClient using verify=False for development purposes

### Query the service

In [5]:
# instantiate the client
client = GrpcClient(host, port, ca_cert=bundle) # replace ca_cert with verify=False to disable certificate verification

In [7]:
# Let's query the model!
model = 'Llama-2-7b-chat-hf'
generated_text = client.generate_text(
    model,  # ID of the model served by the runtime
    'How do you bake a cake?',
    preserve_input_text=False,
    max_new_tokens=200,
    min_new_tokens=10,
)
print(generated_text)



Baking a cake is a straightforward process that requires a few basic ingredients and some time in the oven. Here's a step-by-step guide on how to bake a cake:

1. Preheat the oven: Preheat the oven to the temperature specified in the recipe you're using. This can range from 325°F to 375°F (160°C to 190°C), depending on the type of cake you're making.

2. Prepare the cake pan: Choose a cake pan that's the right size for the recipe you're using. Grease the pan with butter or cooking spray to prevent the cake from sticking.

3. Mix the ingredients: In a large mixing bowl, combine the dry ingredients (flour,


### Query the service - Streaming answer

In [8]:
# Let's get some streaming answers!
for chunk in client.generate_text_stream(
    model,  # ID of the model served by the runtime
    'How do you bake a cake?',
    preserve_input_text=False,
    max_new_tokens=200,
    min_new_tokens=10,
):
    print(chunk, end="")



Baking a cake is a straightforward process that requires a few basic ingredients and some time in the oven. Here's a step-by-step guide on how to bake a cake:

1. Preheat the oven: Preheat the oven to the temperature specified in the recipe you're using. This can range from 325°F to 375°F (160°C to 190°C), depending on the type of cake you're making.

2. Prepare the cake pan: Choose a cake pan that's the right size for the recipe you're using. Grease the pan with butter or cooking spray to prevent the cake from sticking.

3. Mix the ingredients: In a large mixing bowl, combine the dry ingredients (flour,
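
When the complete answer is also needed after streaming (for logging or post-processing), the chunks can be accumulated while they are printed. This is a minimal sketch, assuming each item yielded by `generate_text_stream` is a plain string, as the `print(chunk, end="")` call above suggests. The helper `collect_stream` is hypothetical, and the stand-in generator `fake_stream` replaces the real client so the snippet runs without a server.

```python
from typing import Iterable, Iterator


def collect_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Optionally print chunks as they arrive and return the joined text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    """Stand-in for client.generate_text_stream(...); yields text pieces."""
    yield from ["Baking ", "a cake ", "is straightforward."]


full_text = collect_stream(fake_stream())
```

With the real client, pass the generator returned by `client.generate_text_stream(...)` in place of `fake_stream()`.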

### To go further: service, methods and parameters discovery

In [9]:
# List available services (relies on a private client attribute, so it may change between releases)
services = client._reflection_db.get_services()
print(f'Available services: {services}')

Available services: ['caikit.runtime.Nlp.NlpService', 'caikit.runtime.Nlp.NlpTrainingService', 'caikit.runtime.training.TrainingManagement', 'grpc.reflection.v1alpha.ServerReflection', 'mmesh.ModelRuntime']


In [10]:
# Selecting the NlpService, list available methods
nlp_service = client._desc_pool.FindServiceByName('caikit.runtime.Nlp.NlpService')
print('Available methods:')
for m in nlp_service.methods:
    print(m.name)

Available methods:
TextClassificationTaskPredict
TextGenerationTaskPredict
ServerStreamingTextGenerationTaskPredict
TokenizationTaskPredict
TokenClassificationTaskPredict
BidiStreamingTokenClassificationTaskPredict


In [11]:
# List the parameters accepted by the text generation methods, with their protobuf types
client.get_text_generation_parameters()

{'text': 'string',
 'max_new_tokens': 'int64',
 'min_new_tokens': 'int64',
 'truncate_input_tokens': 'int64',
 'decoding_method': 'string',
 'top_k': 'int64',
 'top_p': 'double',
 'typical_p': 'double',
 'temperature': 'double',
 'seed': 'uint64',
 'repetition_penalty': 'double',
 'max_time': 'double',
 'exponential_decay_length_penalty': {'start_index': 'int64',
  'decay_factor': 'double'},
 'stop_sequences': 'string',
 'preserve_input_text': 'bool'}
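
The dictionary above can be used to catch typos or wrong value types in keyword arguments before a request is sent. The sketch below is one possible client-side check, not something provided by `caikit-nlp-client`: the names `check_parameters`, `PROTO_TO_PYTHON` and `SCHEMA` are hypothetical, and `SCHEMA` is a hand-copied subset of the output above. The printed schema does not say whether a field is repeated, so the sketch assumes a list of matching values is also acceptable.

```python
# Subset of the schema printed above, copied by hand for illustration.
SCHEMA = {
    "text": "string",
    "max_new_tokens": "int64",
    "min_new_tokens": "int64",
    "top_k": "int64",
    "temperature": "double",
    "exponential_decay_length_penalty": {
        "start_index": "int64",
        "decay_factor": "double",
    },
    "stop_sequences": "string",
    "preserve_input_text": "bool",
}

# Python types accepted for each protobuf scalar type name.
PROTO_TO_PYTHON = {
    "string": (str,),
    "bool": (bool,),
    "int64": (int,),
    "uint64": (int,),
    "double": (int, float),
}


def _type_problem(path, expected, value):
    """Return a message if value does not fit the expected type, else None."""
    allowed = PROTO_TO_PYTHON.get(expected)
    if allowed is None:
        return None  # type name not covered by this sketch
    # bool is a subclass of int in Python, so reject it for numeric fields.
    if isinstance(value, bool) and expected != "bool":
        return f"{path}: expected {expected}, got bool"
    if not isinstance(value, allowed):
        return f"{path}: expected {expected}, got {type(value).__name__}"
    return None


def check_parameters(schema, params, prefix=""):
    """Return a list of problems found in params; an empty list means OK."""
    problems = []
    for name, value in params.items():
        path = prefix + name
        if name not in schema:
            problems.append(f"unknown parameter: {path}")
            continue
        expected = schema[name]
        if isinstance(expected, dict):
            # Nested message, e.g. exponential_decay_length_penalty.
            if isinstance(value, dict):
                problems += check_parameters(expected, value, path + ".")
            else:
                problems.append(f"{path}: expected a dict")
            continue
        # Assumption: a list or tuple is checked element by element.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            problem = _type_problem(path, expected, item)
            if problem:
                problems.append(problem)
    return problems
```

In practice the schema would come from `client.get_text_generation_parameters()` rather than a literal, and an empty result from `check_parameters(schema, kwargs)` would mean the keyword arguments at least match the advertised names and types.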