
LLM: Add simple capability system #69

Merged
3 commits merged into main from feature/llm/capability-system on Mar 5, 2024
Conversation

@Hialus (Member) commented Feb 28, 2024

Motivation

For our first attempt at selecting LLMs we want to try a capability-based system. It allows the pipeline subsystem to specify what it needs from an LLM, and the LLM subsystem then selects the best-fitting model that matches these requirements.

Description

This first version adds several capabilities. The most notable ones are:

  • gpt_version_equivalent: a first, simple measure of model skill
  • cost: the cost of using a model (split into input_cost and output_cost in the code). TODO: Specify unit used
  • speed: the generation speed of the model. Higher = faster
  • context_length: the maximum number of tokens the model can handle

To match against these capabilities there is a corresponding RequirementList class. Pipelines specify their requirements, and the new CapabilityRequestHandler selects the best-fitting model; models that do not fulfill the requirements are never selected.
The CapabilityRequestHandler can select either the best model or the worst model that passes the requirements check. The latter is useful because you often simply want the model that barely fulfills the requirements. Selecting the best should only be done together with a cost limit, as it will otherwise always choose e.g. GPT-4 32k (or Turbo 128k).

Only the RequirementList and CapabilityRequestHandler classes are considered part of the public interface of the LLM subsystem; these can be considered stable for the near future. The rest of the implementation will likely change in the short term and is subject to improvements and refactoring.
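As an illustration of the intended use, here is a minimal sketch of how a pipeline might request a model. It only uses the class names introduced in this PR; the import path (via the wildcard re-exports in app/llm/__init__.py) and the concrete requirement values are assumptions:

from llm import (
    CapabilityRequestHandler,
    CapabilityRequestHandlerSelectionMode,
    RequirementList,
)

# Ask for a model that is at least GPT-3.5 equivalent, handles 16k tokens of
# context, and supports JSON mode (illustrative values).
handler = CapabilityRequestHandler(
    requirements=RequirementList(
        gpt_version_equivalent=3.5,
        context_length=16000,
        json_mode=True,
    ),
    # WORST picks the weakest model that still passes the requirements check,
    # which is usually what you want unless you also set a cost limit.
    selection_mode=CapabilityRequestHandlerSelectionMode.WORST,
)

# handler.complete(...), handler.chat(...) and handler.embed(...) then run
# against whichever model was selected.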

@Hialus force-pushed the feature/llm/capability-system branch from daea54f to d64272a on February 28, 2024 03:33
@Hialus added this to the 1.0.0-Prototype milestone on Feb 28, 2024
coderabbitai bot (Contributor) commented Feb 29, 2024

Walkthrough

The recent updates aim to enhance type hinting for better clarity and consistency in various domain objects. Additionally, a new capability management system for language models (LLMs) has been introduced. This system includes classes for handling capabilities and requirements of LLMs, enabling smarter model selection based on capabilities.

Changes

  • .../codehint.py, .../dtos.py, .../submission.py: updated type hints to explicitly use list type annotations for improved clarity and consistency
  • app/llm/__init__.py, .../capability/__init__.py: added imports related to the new capability management system
  • .../capability/capability_checker.py, .../capability/capability_list.py, .../capability/requirement_list.py: introduced new classes and functions for managing and evaluating LLM capabilities
  • .../external/model.py, .../llm_manager.py: integrated capabilities into the LanguageModel class and added sorting methods based on capability scores
  • .../request_handler/__init__.py, .../request_handler/basic_request_handler.py, .../request_handler/capability_request_handler.py: reorganized imports and introduced CapabilityRequestHandler for model selection based on capabilities


from llm.request_handler import *
from llm.capability import RequirementList
The explicit import of RequirementList is redundant due to the preceding wildcard import from the same module (from llm.capability import *). It's a good practice to avoid such redundancies to keep the code clean and readable.

- from llm.capability import RequirementList

Comment on lines 1 to 37
class RequirementList:
    """A class to represent the requirements you want to match against"""

    input_cost: float | None
    output_cost: float | None
    gpt_version_equivalent: float | None
    speed: float | None
    context_length: int | None
    vendor: str | None
    privacy_compliance: bool | None
    self_hosted: bool | None
    image_recognition: bool | None
    json_mode: bool | None

    def __init__(
        self,
        input_cost: float | None = None,
        output_cost: float | None = None,
        gpt_version_equivalent: float | None = None,
        speed: float | None = None,
        context_length: int | None = None,
        vendor: str | None = None,
        privacy_compliance: bool | None = None,
        self_hosted: bool | None = None,
        image_recognition: bool | None = None,
        json_mode: bool | None = None,
    ) -> None:
        self.input_cost = input_cost
        self.output_cost = output_cost
        self.gpt_version_equivalent = gpt_version_equivalent
        self.speed = speed
        self.context_length = context_length
        self.vendor = vendor
        self.privacy_compliance = privacy_compliance
        self.self_hosted = self_hosted
        self.image_recognition = image_recognition
        self.json_mode = json_mode
The RequirementList class is well-structured and aligns with the PR's objective of introducing a flexible capability system for selecting large language models. Each attribute represents a specific requirement that can be matched against available models.

However, consider adding documentation comments for each attribute to clarify its purpose and expected values, enhancing maintainability and understanding for future developers.
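One possible shape for that per-attribute documentation, shown for a few fields; the wording (and the still-unspecified cost unit) are assumptions, not part of this PR:

class RequirementList:
    """A class to represent the requirements you want to match against"""

    # Maximum acceptable cost of input tokens; the unit is still a TODO in this PR.
    input_cost: float | None
    # Minimum GPT version the model should be roughly equivalent to, e.g. 3.5 or 4.
    gpt_version_equivalent: float | None
    # Minimum context window the model must support, in tokens.
    context_length: int | None
    # Exact vendor the model must come from, e.g. "OpenAI".
    vendor: str | None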

Comment on lines +41 to +49
def get_llms_sorted_by_capabilities_score(
    self, requirements: RequirementList, invert_cost: bool = False
):
    """Get the llms sorted by their capability to requirement scores"""
    scores = calculate_capability_scores(
        [llm.capabilities for llm in self.entries], requirements, invert_cost
    )
    sorted_llms = sorted(zip(scores, self.entries), key=lambda pair: -pair[0])
    return [llm for _, llm in sorted_llms]

The get_llms_sorted_by_capabilities_score method is a key addition that aligns with the PR's objectives, efficiently sorting LLMs based on capability scores. This method enhances the model selection process by considering the specified requirements.

However, consider adding error handling or a fallback mechanism for cases where no models match the specified requirements, ensuring robustness in all scenarios.
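A minimal sketch of such a guard, under the assumption that an empty result should fail fast with a clear message rather than surface later as an IndexError in _select_model (hypothetical addition, not part of this PR):

def get_llms_sorted_by_capabilities_score(
    self, requirements: RequirementList, invert_cost: bool = False
):
    """Get the llms sorted by their capability to requirement scores"""
    if not self.entries:
        # Surface the empty case explicitly instead of returning an empty
        # list that callers may index into.
        raise ValueError("No LLMs are registered; nothing to score")
    ...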

Comment on lines +18 to +63
class CapabilityRequestHandler(RequestHandler):
    """Request handler that selects the best/worst model based on the requirements"""

    requirements: RequirementList
    selection_mode: CapabilityRequestHandlerSelectionMode
    llm_manager: LlmManager

    def __init__(
        self,
        requirements: RequirementList,
        selection_mode: CapabilityRequestHandlerSelectionMode = CapabilityRequestHandlerSelectionMode.WORST,
    ) -> None:
        self.requirements = requirements
        self.selection_mode = selection_mode
        self.llm_manager = LlmManager()

    def complete(self, prompt: str, arguments: CompletionArguments) -> str:
        llm = self._select_model(CompletionModel)
        return llm.complete(prompt, arguments)

    def chat(
        self, messages: list[IrisMessage], arguments: CompletionArguments
    ) -> IrisMessage:
        llm = self._select_model(ChatModel)
        return llm.chat(messages, arguments)

    def embed(self, text: str) -> list[float]:
        llm = self._select_model(EmbeddingModel)
        return llm.embed(text)

    def _select_model(self, type_filter: type) -> LanguageModel:
        """Select the best/worst model based on the requirements and the selection mode"""
        llms = self.llm_manager.get_llms_sorted_by_capabilities_score(
            self.requirements,
            self.selection_mode == CapabilityRequestHandlerSelectionMode.WORST,
        )
        llms = [llm for llm in llms if isinstance(llm, type_filter)]

        if self.selection_mode == CapabilityRequestHandlerSelectionMode.BEST:
            llm = llms[0]
        else:
            llm = llms[-1]

        # Print the selected model for the logs
        print(f"Selected {llm.description}")
        return llm

The CapabilityRequestHandler class is a key addition that enhances the model selection process by leveraging the new capability system. It correctly implements functionality to select the best or worst model based on specified requirements and selection mode.

However, consider adding documentation to clarify the selection process and the impact of the selection mode on the chosen model, enhancing understanding and maintainability.
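For illustration, the suggested documentation could spell out the two modes roughly like this (wording is a suggestion, not part of the PR):

class CapabilityRequestHandler(RequestHandler):
    """Request handler that selects a model based on a RequirementList.

    Selection modes:
      BEST  -- pick the highest-scoring model that passes the requirements.
               Only sensible together with a cost limit, since the largest
               model wins otherwise.
      WORST -- pick the lowest-scoring model that still passes, i.e. the one
               that barely fulfills the requirements (the default).
    """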

Comment on lines +9 to +80
def capabilities_fulfill_requirements(
    capability: CapabilityList, requirements: RequirementList
) -> bool:
    """Check if the capability fulfills the requirements"""
    return all(
        getattr(capability, field).matches(getattr(requirements, field))
        for field in requirements.__dict__.keys()
        if getattr(requirements, field) is not None
    )


def calculate_capability_scores(
    capabilities: list[CapabilityList],
    requirements: RequirementList,
    invert_cost: bool = False,
) -> list[int]:
    """Calculate the scores of the capabilities against the requirements"""
    all_scores = []

    for requirement in requirements.__dict__.keys():
        requirement_value = getattr(requirements, requirement)
        if (
            requirement_value is None
            and requirement not in always_considered_capabilities_with_default
        ):
            continue

        # Calculate the scores for each capability
        scores = []
        for capability in capabilities:
            if (
                requirement_value is None
                and requirement in always_considered_capabilities_with_default
            ):
                # If the requirement is not set, use the default value if necessary
                score = getattr(capability, requirement).matches(
                    always_considered_capabilities_with_default[requirement]
                )
            else:
                score = getattr(capability, requirement).matches(requirement_value)
            # Invert the cost if required
            # The cost is a special case, as depending on how you want to use the scores
            # the cost needs to be considered differently
            if (
                requirement in ["input_cost", "output_cost"]
                and invert_cost
                and score != 0
            ):
                score = 1 / score
            scores.append(score)

        # Normalize the scores between 0 and 1 and multiply by the weight modifier
        # The normalization here is based on the position of the score in the sorted list
        # to balance out the different ranges of the capabilities
        sorted_scores = sorted(set(scores))
        weight_modifier = capability_weights[requirement]
        normalized_scores = [
            ((sorted_scores.index(score) + 1) / len(sorted_scores)) * weight_modifier
            for score in scores
        ]
        all_scores.append(normalized_scores)

    final_scores = []

    # Sum up the scores for each capability to get the final score for each list of capabilities
    for i in range(len(all_scores[0])):
        score = 0
        for j in range(len(all_scores)):
            score += all_scores[j][i]
        final_scores.append(score)

    return final_scores

The functions capabilities_fulfill_requirements and calculate_capability_scores are well-implemented, providing a robust mechanism for evaluating and scoring models based on their capabilities and specified requirements. This functionality is central to the new capability system.

However, consider adding documentation to clarify the scoring logic, especially the role of invert_cost in the calculation process, to enhance understanding and maintainability.
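To make the role of invert_cost concrete, here is a small self-contained example; the numbers are made up, and the formula is InverseOrderedNumberCapability.matches from further down in this diff:

# Worked example of invert_cost (illustrative numbers).
requirement = 10            # required maximum cost
capability_values = [1, 5]  # two models' input_cost capability values

# InverseOrderedNumberCapability.matches returns requirement - value + 1
# when the capability passes the check.
raw_scores = [requirement - v + 1 for v in capability_values]  # [10, 6]
inverted = [1 / s for s in raw_scores]                         # [0.1, 0.1666...]

# Raw scores rank the cheap model first; inverted scores rank it last, so the
# WORST selection mode (which takes the tail of the descending sort) ends up
# preferring the cheaper model.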

Comment on lines 1 to 121
# (The diff view clips the start of this file; the imports and the
# TextCapability class header below are reconstructed from the usage that
# follows.)
from typing import Any

from pydantic import BaseModel, Field, model_validator


class TextCapability(BaseModel):
    value: str

    def matches(self, text: str) -> int:
        return int(self.value == text)

    def __str__(self):
        return f"TextCapability({super().__str__()})"


class OrderedNumberCapability(BaseModel):
    """A capability that is better the higher the value"""

    value: int | float

    def matches(self, number: int | float) -> int | float:
        if self.value < number:
            return 0
        return self.value - number + 1

    def __str__(self):
        return f"OrderedNumberCapability({super().__str__()})"


class InverseOrderedNumberCapability(BaseModel):
    """A capability that is better the lower the value"""

    value: int | float

    def matches(self, number: int | float) -> int | float:
        if self.value > number:
            return 0
        return number - self.value + 1

    def __str__(self):
        return f"InverseOrderedNumberCapability({super().__str__()})"


class BooleanCapability(BaseModel):
    """A simple boolean capability"""

    value: bool

    def matches(self, boolean: bool) -> int:
        return int(self.value == boolean)

    def __str__(self):
        return f"BooleanCapability({str(self.value)})"


class CapabilityList(BaseModel):
    """A list of capabilities for a model"""

    input_cost: InverseOrderedNumberCapability = Field(
        default=InverseOrderedNumberCapability(value=0)
    )
    output_cost: InverseOrderedNumberCapability = Field(
        default=InverseOrderedNumberCapability(value=0)
    )
    gpt_version_equivalent: OrderedNumberCapability = Field(
        default=OrderedNumberCapability(value=2)
    )
    speed: OrderedNumberCapability = Field(default=OrderedNumberCapability(value=0))
    context_length: OrderedNumberCapability = Field(
        default=OrderedNumberCapability(value=0)
    )
    vendor: TextCapability = Field(default=TextCapability(value=""))
    privacy_compliance: BooleanCapability = Field(
        default=BooleanCapability(value=False)
    )
    self_hosted: BooleanCapability = Field(default=BooleanCapability(value=False))
    image_recognition: BooleanCapability = Field(default=BooleanCapability(value=False))
    json_mode: BooleanCapability = Field(default=BooleanCapability(value=False))

    @model_validator(mode="before")
    @classmethod
    def from_dict(cls, data: dict[str, Any]):
        """Prepare the data for handling by Pydantic"""
        for key, value in data.items():
            if type(value) is not dict:
                data[key] = {"value": value}
        return data


# The weights for the capabilities used in the scoring
capability_weights = {
    "input_cost": 0.5,
    "output_cost": 0.5,
    "gpt_version_equivalent": 4,
    "speed": 2,
    "context_length": 0.1,
    "vendor": 1,
    "privacy_compliance": 0,
    "self_hosted": 0,
    "image_recognition": 0,
    "json_mode": 0,
}

# The default values for the capabilities that are always considered
always_considered_capabilities_with_default = {
    "input_cost": 100000000000000,
    "output_cost": 100000000000000,
}

The CapabilityList class and various capability classes (TextCapability, OrderedNumberCapability, etc.) are well-structured and crucial for the new capability system, allowing for detailed specification and evaluation of model capabilities.

However, consider adding documentation comments for each capability class to clarify their purpose and expected use, enhancing maintainability and understanding for future developers.
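As a usage note, the from_dict validator lets callers declare capabilities with plain values instead of wrapping each one in a capability object. A small sketch with illustrative values:

# Plain values are wrapped by the from_dict validator, so this...
capabilities = CapabilityList(
    gpt_version_equivalent=3.5,
    context_length=16385,
    json_mode=True,
)
# ...is equivalent to passing OrderedNumberCapability(value=3.5),
# OrderedNumberCapability(value=16385) and BooleanCapability(value=True)
# explicitly; unset fields keep the defaults declared on CapabilityList.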

@Hialus merged commit 4e27d05 into main on Mar 5, 2024 (4 checks passed)
@Hialus deleted the feature/llm/capability-system branch on March 5, 2024 23:18