#### [APP_E_lane3](/home/hobs/code/hobs/nlpia-manuscript/manuscript/adoc/APP_E_lane3.adoc)

[source,python]
----
from predict_intent import INTENT_RECOGNIZER as pipe
pipe
----

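[source,python]
----
# The star import is assumed to provide pd, DATA_DIR, TRAINING_SET_PATH,
# and the helper functions used in the next few listings
from multilabel_intent_recognition import *
df = pd.read_csv(DATA_DIR / TRAINING_SET_PATH)
df
----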

[source,python]
----
df['Label_1'].unique()
----

[source,python]
----
df_uni = multilabel_to_unilabel(df)
df_uni
----

[source,python]
----
df_tags = tags_from_labels(df_uni)
df_tags
----

[source,python]
----
@app.post("/intent/", response_model=Response)
async def intent(request: Request):  # <1>
    response = Response()
    response.tags = predict_intents(request.text)
    response.created_at = datetime.now()
    return response
----
<1> To improve readability, use the same name for the endpoint function and the endpoint URL path.

The endpoint function receives a `Request` object and uses it to construct a `Response` object.
The `app.post` decorator takes care of all the HTTP POST protocol processing to convert the request payload (JSON string or bytes) into a Python `Request` object that your function can process just like any other Python object.
The endpoint decorator also takes care of serializing the returned `Response` object to create the appropriate HTTP protocol responses.
So FastAPI does the tedious work of parsing requests and generating the HTTP headers that a web API needs.
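To make the decorator's job concrete, here is a framework-free sketch of the same round trip using only the standard library: JSON body in, request object, handler, JSON string out. The names `json_endpoint`, `ToyRequest`, and `ToyResponse` are hypothetical and invented for illustration. FastAPI's real machinery also handles HTTP parsing, validation errors, status codes, and `async` handlers, which this toy skips.

[source,python]
----
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Callable, List


@dataclass
class ToyRequest:
    text: str = ""


@dataclass
class ToyResponse:
    tags: List[str] = field(default_factory=list)
    created_at: str = ""


def json_endpoint(request_cls: type) -> Callable:
    """Toy decorator: JSON body -> request object -> handler -> JSON string."""
    def decorator(handler: Callable) -> Callable[[str], str]:
        def wrapper(body: str) -> str:
            request = request_cls(**json.loads(body))  # deserialize the payload
            response = handler(request)  # run the endpoint logic
            return json.dumps(asdict(response))  # serialize the result
        return wrapper
    return decorator


@json_endpoint(ToyRequest)
def intent(request: ToyRequest) -> ToyResponse:
    # Hypothetical keyword rule standing in for predict_intents()
    tags = ["greeting"] if "hello" in request.text.lower() else ["unknown"]
    return ToyResponse(tags=tags, created_at=datetime.now().isoformat())


print(intent('{"text": "Hello there"}'))
----

The handler itself never touches JSON; it only sees and returns Python objects, which is the same division of labor the `app.post` decorator gives you.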

How does FastAPI know what a valid `Request` or `Response` object should look like?
For example, if the user calls this endpoint with the string `1`, how does it know whether to keep it as a string or convert it to an `int`, `float`, or `bool` type object?
The answer is it doesn’t.
You need to use the `pydantic` library to tell FastAPI what kind of data the requests and responses will contain.
That is all FastAPI needs to parse the incoming strings and convert them into the Python types you want.
Fortunately, `pydantic` just uses the built-in _type hints_ feature of Python (the `list[float]` hints in the models below require Python 3.9 or later).
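The core idea is that the annotation, not the incoming value, decides the target type. The hypothetical `coerce` function below is a toy illustration of that idea with the standard library's `typing.get_type_hints`; it is far cruder than pydantic (no validation, no nested models, and `bool("0")` would wrongly be `True`).

[source,python]
----
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Example:
    count: int = 0
    ratio: float = 0.0
    label: str = ""


def coerce(cls, raw: dict):
    """Toy converter: cast each raw value to the type named in cls's hints.

    Keys that are not annotated on cls are silently dropped.
    """
    hints = get_type_hints(cls)
    kwargs = {name: hints[name](value) for name, value in raw.items() if name in hints}
    return cls(**kwargs)


example = coerce(Example, {"count": "1", "ratio": "1", "label": 1})
print(example)  # Example(count=1, ratio=1.0, label='1')
----

The same string `"1"` becomes an `int` or a `float` depending only on the type hint of the field it lands in.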

Here is the `pydantic` data model (schema) for the incoming `Request` objects:

[source,python]
----
from typing import Optional

from pydantic import BaseModel

class Request(BaseModel):
    text: Optional[str] = None  # <1>
    embedding: Optional[list[float]] = None  # <2>
----
<1> Optional natural language text (user chat message, document, LLM text, etc.)
<2> Optional embedding vector associated with natural language text from a user

Wait a minute, you probably thought this endpoint was designed to handle natural language text.
What is this second optional input for a list of ``float``s called an embedding?
If you define multiple possible arguments to your endpoint, it gives your user more options when calling your API.
You should take the time to think about all the possible use cases for your API.
This /intent/ endpoint was designed to be multipurpose and accept either natural language text _or_ an embedding.

Best practice API design would split this into two separate endpoints, but in some cases, this multipurpose endpoint can be helpful if you want to upgrade an endpoint while remaining _reverse compatible_.
A reverse-compatible API will work in the original way that your users have been using it in the past, but it also enables new features.
For web APIs, you should always try to make your endpoints reverse compatible for a period of time before you _deprecate_ a feature and require your users to learn the new API.
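One way a multipurpose endpoint can stay backward compatible is to branch on whichever optional field the caller filled in. Older text-only callers keep working, and newer callers can skip the encoding step by sending an embedding. The sketch below assumes hypothetical names (`IntentRequest`, `toy_encode`, `resolve_embedding`) and uses a standard-library dataclass in place of the `pydantic` model; it is not the book's endpoint code.

[source,python]
----
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IntentRequest:
    text: Optional[str] = None
    embedding: Optional[List[float]] = None


def toy_encode(text: str) -> List[float]:
    # Hypothetical stand-in for a real text encoder such as BERT
    return [float(len(text)), float(text.count(" "))]


def resolve_embedding(request: IntentRequest) -> List[float]:
    """Prefer a caller-supplied embedding; fall back to encoding the text."""
    if request.embedding is not None:
        return request.embedding
    if request.text is not None:
        return toy_encode(request.text)
    raise ValueError("Provide either 'text' or 'embedding'")


print(resolve_embedding(IntentRequest(text="hi there")))  # [8.0, 1.0]
print(resolve_embedding(IntentRequest(embedding=[0.1, 0.2])))  # [0.1, 0.2]
----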

You define the `Response` object the same way you did for the `Request` class, using Pydantic:

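[source,python]
----
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel

class Response(BaseModel):
    tags: List[Tag] = []  # <1>
    embedding: Optional[list[float]] = None  # <2>
    created_at: Optional[datetime] = None  # <3>
----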
<1> Sorted list of Tag objects (named tuples) with the most likely tag at the top
<2> Embedding or encoding vector (list of floats)
<3> Timestamp when the response was composed

Here in the `Response` class, you can define all the pieces of information you’d like to send back to the other parts of your app.
In the case of this multilabel intent recognizer endpoint, you could return a single intent label, such as `positive` or `greeting`, or you can provide more detail.
You built this multilabel classifier to be able to handle ambiguity by providing multiple intent labels for each message.
So you probably want to return a ranked list of all the possible intents as the preceding code does.
In addition to the label itself, you might want to provide an integer index for that label as well as a floating-point value for the probability or confidence of that particular label.
You can think of this as the weight or emphasis that the text places on this particular intent label.
Python provides a nice data type for capturing triplets of information like this—a named tuple.

The following code creates a standard Python `NamedTuple` class, where you can store the intent label, a confidence score, and the index integer of the intent, in one compact tuple:

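[source,python]
----
from typing import NamedTuple, Optional

class Tag(NamedTuple):
    label: str  # intent label, such as "greeting"
    proba: Optional[float] = None  # probability (confidence) of this label
    index: Optional[int] = None  # integer index of the label
----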
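To see how these named tuples could fill the `Response.tags` list, here is a sketch of a hypothetical `rank_tags` helper that pairs each label with a classifier's probability and index and then sorts the tags so the most likely one comes first. The labels and probabilities are made up for illustration.

[source,python]
----
from typing import List, NamedTuple, Optional, Sequence


class Tag(NamedTuple):
    label: str
    proba: Optional[float] = None
    index: Optional[int] = None


def rank_tags(labels: Sequence[str], probas: Sequence[float]) -> List[Tag]:
    """Pair each label with its probability and index, most likely first."""
    tags = [Tag(label, proba, i) for i, (label, proba) in enumerate(zip(labels, probas))]
    return sorted(tags, key=lambda tag: tag.proba, reverse=True)


# Made-up probabilities for illustration only
ranked = rank_tags(["negative", "positive", "greeting"], [0.1, 0.2, 0.7])
print(ranked[0])  # Tag(label='greeting', proba=0.7, index=2)
----

Keeping the original `index` in each tuple means the caller can still map a tag back to the classifier's output column after sorting.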

Now that you have seen the `Tag` named tuple for the labels and the `pydantic` class for the response object, you might realize how to use that `Response.embedding` attribute to give the caller more information about the intents associated with the text.
As you saw in chapter 6, embedding vectors contain a lot of “ness” information (including sentiment) about a word or passage of text.
So if your user has NLP skills like you do, they may want to get access to the raw BERT encoding (embedding) this endpoint uses under the hood.

Here is the code to pop the hood on your /intent/ endpoint and expose the raw embedding vector to NLP engineers or conversation designers who might want to use it within other parts of the application:

[source,python]
----
@app.post("/intent/", response_model=Response)
async def intent(request: Request):
    response = Response()
    response.embedding = predict_encoding_cached(request.text)
    response.tags = predict_intents_from_encoding(response.embedding)
    response.created_at = datetime.now()
    return response
----

You can see that this multipurpose endpoint reveals two new opportunities for creating additional microservices.
You can imagine an /encode_text/ endpoint to provide the raw BERT encoding vector.
The user would call that endpoint first and then use that encoding vector to call a second endpoint: /intents_from_encoding/.
This would allow you to split your endpoints into separate microservices.
You may also be wondering what the `cached` suffix means at the end of `predict_encoding_cached`.
You can learn more about both caching and splitting in section E.3.3.
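Here is a minimal sketch of that split, assuming two hypothetical functions named after the proposed endpoints, `encode_text` and `intents_from_encoding`, with toy logic in place of BERT and the classifier. It uses `functools.lru_cache` as a simple in-memory cache; the book's `predict_encoding_cached` may be implemented differently (for example with `joblib.Memory`, which persists results to disk).

[source,python]
----
from functools import lru_cache
from typing import List, Tuple

LABELS = ("negative", "positive")  # hypothetical label set


@lru_cache(maxsize=1024)
def encode_text(text: str) -> Tuple[float, ...]:
    # Hypothetical stand-in for a BERT encoder; returns an immutable tuple
    # so callers cannot mutate the cached value
    return (float(text.count("!")), float(len(text.split())))


def intents_from_encoding(encoding: Tuple[float, ...]) -> List[str]:
    # Hypothetical stand-in for a classifier that only sees encodings
    return [LABELS[0], LABELS[1]] if encoding[0] > 0 else [LABELS[1], LABELS[0]]


def intent(text: str) -> List[str]:
    """The combined endpoint is just the two smaller services composed."""
    return intents_from_encoding(encode_text(text))


print(intent("Disturbing! That made me uncomfortable."))  # ['negative', 'positive']
intent("Disturbing! That made me uncomfortable.")  # reuses the cached encoding
print(encode_text.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=1024, currsize=1)
----

Because the combined function is only a composition, each half could later move behind its own endpoint without changing the results callers see.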

This microservice for predicting user intent can be kept separate from the rest of your application.
This architecture makes it possible for you to continue to improve the NLP pipeline while your teammates work on other parts of the application.
Because this microservice focuses on a single prediction task, it stays isolated from the other components that together deliver a successful chat experience.
This is called _separation of concerns_, a best practice that tends to make software easier to maintain and to tune for performance.
The microservice doing NLU prediction can ignore all the other tasks of a chatbot application, such as authentication and content management.

Well-designed and documented microservices are easy for developers to work with.
When microservices are clearly defined and have separate tasks, it becomes clear where errors originate.
For this prediction microservice, a failure at the prediction step points to a problem inside the microservice itself.
Failures in other parts of the application are easier to track down as well.
Additionally, you should write focused tests to ensure different parts of the prediction service work—the model download, storing the model in memory, making a prediction, and the cache growing with use.
These tests ensure the changes don’t break the service as development continues.
But the other components of the web application don’t need the NLP-related tests cluttering up their test directories, much like both baseball teams don’t sit in the same dugout to watch the game.
A microservice should be independent, only loosely coupled with the rest of the application.
The more rigorously you plan for this and make the application configurable, the more reusable your microservice will be for other applications.
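The appendix does not show its test suite here, so the following is only a sketch of what two focused tests could look like with the standard-library `unittest` module. One test checks that predictions come back ranked, and the other checks that the cache grows with use. `predict_intents` is a hypothetical stand-in with hard-coded outputs; real tests would exercise the actual model and might mock the model download.

[source,python]
----
import unittest
from functools import lru_cache


@lru_cache(maxsize=128)
def predict_intents(text: str) -> tuple:
    # Hypothetical stand-in for the real model: (label, probability) pairs
    if "!" in text:
        return (("negative", 0.8), ("positive", 0.2))
    return (("positive", 0.6), ("negative", 0.4))


class TestPredictionService(unittest.TestCase):
    def test_prediction_is_ranked(self):
        probas = [proba for _, proba in predict_intents("Disturbing!")]
        self.assertEqual(probas, sorted(probas, reverse=True))

    def test_cache_grows_with_use(self):
        predict_intents.cache_clear()
        predict_intents("hello")
        predict_intents("hello")  # repeated input should hit the cache
        predict_intents("goodbye")
        info = predict_intents.cache_info()
        self.assertEqual(info.currsize, 2)
        self.assertEqual(info.hits, 1)


if __name__ == "__main__":
    unittest.main(argv=["prediction-tests"], exit=False)
----

Tests like these live with the microservice, so the rest of the web application's test suite never needs to import the NLP code.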

Breaking out your NLU endpoint as a microservice not only improves the maintainability of your code but also makes it easier to optimize the NLU for throughput, latency, and accuracy without sacrificing the user experience in other parts of the app.
You can optimize each microservice separately, improving the scalability of your app and reducing server costs.
But before you try to optimize and scale up your application, you want to deploy your working prototype.
So the next step is creating a container (containerizing or Dockerizing) for your microservice.

=== Containerizing a microservice

Even though each microservice is a small component, microservices can be challenging to build and deploy correctly.
There are many steps to configure and build a microservice, and environments differ across platforms, services, and developers in several ways, such as how they handle environment variables stored in `.env` files.
While it's good to keep your staging and production branches as similar as possible, it might be more practical to use cheaper settings on staging to reduce costs.
It can be easy to miss or even break steps that are necessary to get a microservice set up and running.
Developers work across different environments and may need to use different steps to accomplish the same task.
A simple example comes from activating a venv virtual environment.
On Windows (in a Git Bash shell), you run `source .venv/Scripts/activate`, but on Linux you run `source .venv/bin/activate`.
The greater the variety of platforms you use, the more challenging it can be to communicate deployment steps and to collaborate in general.
Working through these platform differences can be time-consuming and frustrating.
Containerizing can help deal with the complexity of deploying to various systems.
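Even a difference this small leaks into scripts and docs as branching logic. The hypothetical `venv_activate_script` helper below makes that branch explicit; it follows Python's `os.name` convention, where `"nt"` means Windows. Inside a container, the base image fixes the platform, so this kind of branch usually disappears.

[source,python]
----
import os
from pathlib import Path
from typing import Optional


def venv_activate_script(venv_dir: str = ".venv", os_name: Optional[str] = None) -> Path:
    """Return the path of a venv's activate script for the given platform.

    os_name follows os.name conventions: "nt" for Windows, "posix" otherwise.
    Defaults to the platform this code is running on.
    """
    os_name = os_name or os.name
    scripts_dir = "Scripts" if os_name == "nt" else "bin"
    return Path(venv_dir) / scripts_dir / "activate"


print(venv_activate_script(os_name="nt").as_posix())  # .venv/Scripts/activate
print(venv_activate_script(os_name="posix").as_posix())  # .venv/bin/activate
----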


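[source,python]
----
from pathlib import Path

import joblib

# DATA_DIR and predict_intents are defined elsewhere in the app
cache_directory = Path(DATA_DIR) / "cache"
memory = joblib.Memory(cache_directory, verbose=0, backend="local")

async def clean_prediction_cache():
    # Shrink the on-disk cache to at most 500 items and 1 MiB (1,048,576 bytes)
    memory.reduce_size(items_limit=500, bytes_limit=1048576)

predict_intents_cached = memory.cache(predict_intents)
----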

[source,python]
----
INTENT_RECOGNIZER_MODEL = None

def predict_intents_list(text, num_intents=None):
    global INTENT_RECOGNIZER_MODEL
    # Lazy loading: download and unpickle the model only on the first call
    INTENT_RECOGNIZER_MODEL = INTENT_RECOGNIZER_MODEL or joblib.load(
        download_model(INTENT_RECOGNIZER_PATH)
    )
----
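The `global` variable in the preceding snippet is a lazy-loading pattern: the model is loaded on the first prediction rather than at import time, so the service starts quickly. An alternative sketch uses `functools.lru_cache` to memoize a loader and avoids the `global` statement. `load_model`, `get_model`, `predict_labels`, and the dictionary "model" are hypothetical placeholders, not the book's code.

[source,python]
----
from functools import lru_cache

LOAD_CALLS = []  # records each (expensive) load so we can count them


def load_model(path: str) -> dict:
    # Hypothetical stand-in for joblib.load(download_model(path))
    LOAD_CALLS.append(path)
    return {"path": path, "labels": ["negative", "positive"]}


@lru_cache(maxsize=None)
def get_model(path: str = "data/multi_intent_recognizer.pkl") -> dict:
    """Load the model on first use, then return the same object every time."""
    return load_model(path)


def predict_labels(text: str) -> list:
    model = get_model()  # no load cost after the first call
    return model["labels"]  # placeholder for a real prediction on text


predict_labels("hello")
predict_labels("again")
print(len(LOAD_CALLS))  # 1
----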

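[source,python]
----
from multilabel_intent_recognition import *
train_validate_save()
----

[source,bash]
----
# Replace {timestamp} with the timestamp in the newly saved model's filename
mv data/multi_intent_recognizer.{timestamp}.pkl data/multi_intent_recognizer.pkl
----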
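If you would rather not type the timestamp by hand, a hypothetical `promote_latest_model` helper could find the most recently modified `multi_intent_recognizer.<timestamp>.pkl` file and move it to the fixed filename, just as the `mv` command does. This sketch assumes the timestamped naming shown in the `mv` command and picks the newest file by modification time.

[source,python]
----
from pathlib import Path


def promote_latest_model(data_dir: str = "data", stem: str = "multi_intent_recognizer") -> Path:
    """Move the newest timestamped model file to the fixed name the app loads."""
    data_path = Path(data_dir)
    # Matches stem.<anything>.pkl but not the fixed stem.pkl target itself
    candidates = list(data_path.glob(f"{stem}.*.pkl"))
    if not candidates:
        raise FileNotFoundError(f"No timestamped {stem} model files in {data_path}")
    latest = max(candidates, key=lambda p: p.stat().st_mtime)
    target = data_path / f"{stem}.pkl"
    latest.replace(target)  # rename, overwriting any previous target (like mv)
    return target
----

If you want to keep every versioned file, `shutil.copy2(latest, target)` copies instead of moving.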

[source,python]
----
import requests

# The JSON key must match a field of the Request model (text or embedding)
resp = requests.post(
    "http://127.0.0.1:8080/intent/",
    json={"text": "Disturbing! That made me uncomfortable."},
)
resp
----

[source,python]
----
resp.json()['tags']
----

[source,python]
----
resp.json()['embedding']
----
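On the client side, you may want the tags back as `Tag` named tuples rather than raw lists. The sketch below assumes each tag arrives as a `[label, proba, index]` JSON array, which is how `pydantic` typically serializes a `NamedTuple`; check your own response body before relying on that. The `tags_from_json` helper and the sample body are hypothetical, and the numbers are made up.

[source,python]
----
import json
from typing import List, NamedTuple, Optional


class Tag(NamedTuple):
    label: str
    proba: Optional[float] = None
    index: Optional[int] = None


def tags_from_json(body: str) -> List[Tag]:
    """Rebuild Tag tuples from a JSON response body.

    Assumes each tag is serialized as a [label, proba, index] array.
    """
    return [Tag(*values) for values in json.loads(body)["tags"]]


# Made-up response body for illustration; a real one would come from resp.text
body = '{"tags": [["negative", 0.91, 3], ["positive", 0.05, 1]], "embedding": [0.1, -0.2]}'
tags = tags_from_json(body)
print(tags[0].label, tags[0].proba)  # negative 0.91
----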