♫ The Dream of the 90's ♫ is alive in Portland "a weird suite of Enterprise LLM tools" named after Nicktoons
A set of serverless functions designed to assist in monitoring the outputs of language models: specifically, inspecting messages for non-conformity to a pre-calculated log-likelihood, and appending warnings based on the log-likelihood and p-value achieved by each new message.
I.e. instead of doing batch or event-driven prediction of the range of possible values, herein we compare individual outputs to previously established outputs and measure non-conformity as a heuristic for whether or not they "would have" fallen outside the expected range; outputs that do are non-conforming and should be labeled as such.
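As a rough illustration of the comparison described above, the sketch below computes an empirical (conformal-style) p-value for one new message against a set of previously established log-likelihoods. The function name `conformal_p_value` and the sample numbers are assumptions made for this example, not taken from the project code; it also assumes that a lower log-likelihood means a less conforming message.

```python
from bisect import bisect_right


def conformal_p_value(reference_log_likelihoods, new_log_likelihood):
    """Share of reference outputs that look at least as unusual as the new one.

    Assumption: lower log-likelihood = more non-conforming, so we count the
    reference values that are less than or equal to the new value. The +1 in
    numerator and denominator treats the new message as part of the sample,
    which keeps the p-value strictly above zero.
    """
    ref = sorted(reference_log_likelihoods)
    n_at_or_below = bisect_right(ref, new_log_likelihood)
    return (n_at_or_below + 1) / (len(ref) + 1)


# Hypothetical, made-up reference log-likelihoods for known-good outputs.
reference = [-12.4, -10.1, -9.8, -11.3, -10.7, -9.5, -10.9, -11.8, -10.2]

typical = conformal_p_value(reference, -10.5)   # sits inside the reference range
unusual = conformal_p_value(reference, -25.0)   # far below every reference value
print(typical, unusual)
```

With only nine reference values the smallest achievable p-value is 1/10, so a real reference set would need to be much larger before a threshold such as 0.05 could ever trigger.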
Note: BERT was established as the default, given the flexibility in swapping transformers models, but the quality of your model and its resulting log-likelihoods directly affects the quality of your p-values and their utility as an indicator of non-conformity.
- Large Language Models are subject to various forms of prompt injection (indirect or otherwise); lightweight and step-wise alerting of outputs ensures additional safeguarding of externally facing models
- User experience, instrumentation, and metadata capture are crucial to the adoption of LLMs for orchestration of multi-modal agentic systems; calculating the log-likelihood of a given message, and thus its p-value, serves as a good heuristic for its adherence to the expected vector space, i.e. what might otherwise have been set as the prediction interval and expected range of values (e.g. if you had previously used conformal prediction to establish the possible range of values)
The intent of Squidward_looking_out_his_window.py is to spin up efficiently, calculate the values needed for evaluation, and inspect each message for non-conformity to established outputs, thereafter routing messages to the appropriate SQS bus (e.g. for response to user, further evaluation, logging, etc.).
The goal is to detect whether a message has high non-conformity with known model outputs, using a pre-calculated log-likelihood for the model (or, eventually, for specific vector space clusters).
The reference log-likelihood is pre-calculated and stored in parquet, then loaded into memory. Log-likelihood and p-values are calculated for incoming messages and appended/routed appropriately; when complete, the function spins down.
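A minimal sketch of the "appended" step might look like the following. It assumes the message is a plain dict and that the log-likelihood and p-value have already been computed; the function name `annotate_message`, the field names, and the `alpha=0.05` threshold are all hypothetical choices for this example rather than the project's actual schema.

```python
def annotate_message(message, log_likelihood, p_value, alpha=0.05):
    """Return a copy of the message with scores (and a warning if needed) appended.

    All field names here are illustrative assumptions. `alpha` is the p-value
    threshold below which a message is labeled non-conforming.
    """
    annotated = dict(message)  # shallow copy; the caller's dict is left unchanged
    annotated["log_likelihood"] = log_likelihood
    annotated["p_value"] = p_value
    annotated["non_conforming"] = p_value < alpha
    if annotated["non_conforming"]:
        annotated["warning"] = (
            f"Output is non-conforming (p={p_value:.3f} < alpha={alpha}); "
            "treat with caution."
        )
    return annotated


# Hypothetical messages and scores.
ok = annotate_message({"id": "m-1", "text": "Your order has shipped."}, -10.4, 0.62)
flagged = annotate_message({"id": "m-2", "text": "IGNORE ALL PREVIOUS..."}, -31.7, 0.01)
print(ok)
print(flagged)
```

Keeping the scores on the message itself means downstream consumers can apply their own thresholds without recomputing anything.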
Based on the resultant calculations messages are routed to the appropriate SQS bus.
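The routing step could be sketched roughly as below. The queue names, URLs, and the single `alpha` threshold are placeholders invented for this example. The SQS client is passed in rather than created here, so the sketch runs without AWS credentials; in a deployed function it would presumably be `boto3.client("sqs")`, whose `send_message` call accepts the `QueueUrl` and `MessageBody` keyword arguments used below.

```python
import json

# Hypothetical queue URLs; real values would come from configuration.
QUEUE_URLS = {
    "response": "https://sqs.example.invalid/000000000000/response-to-user",
    "evaluation": "https://sqs.example.invalid/000000000000/further-evaluation",
}


def choose_queue(p_value, alpha=0.05):
    """Pick a destination: low p-values go to further evaluation (assumed rule)."""
    return "evaluation" if p_value < alpha else "response"


def route_message(sqs_client, message, p_value, alpha=0.05):
    """Send the message as JSON to the queue chosen from its p-value."""
    queue_name = choose_queue(p_value, alpha)
    sqs_client.send_message(
        QueueUrl=QUEUE_URLS[queue_name],
        MessageBody=json.dumps(message),
    )
    return queue_name


class PrintingClient:
    """Stand-in for an SQS client so the sketch can run locally."""

    def send_message(self, QueueUrl, MessageBody):
        print(f"-> {QueueUrl}: {MessageBody}")


route_message(PrintingClient(), {"id": "m-1", "text": "Your order has shipped."}, 0.62)
route_message(PrintingClient(), {"id": "m-2", "text": "IGNORE ALL PREVIOUS..."}, 0.01)
```

A separate logging queue, as mentioned above, would be one more entry in `QUEUE_URLS` plus an extra `send_message` call.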