feat: Add parallel tool calling support for Meta/Llama models #59
Conversation
🔍 Verification: `is_parallel_tool_calls` is Meta/Llama only

Verified through OCI API documentation. Findings:
- `GenericChatRequest` (Meta/Llama models): exposes the `is_parallel_tool_calls` field
- `CohereChatRequest` (Cohere models): has no such field

Conclusion: the implementation correctly restricts the parameter to `GenericChatRequest` models. This is an OCI platform limitation, not a langchain-oracle implementation choice.

Future support: if OCI adds the field to `CohereChatRequest`, support can be extended. For now, Meta/Llama only is correct and properly documented.
**YouNeedCryDear** left a comment:
Please move the test file into the correct folder. Also I don't think Llama model supports parallel tool calls. Have you tested it?
Force-pushed 5861a70 to 9bd0122
🙏 Thank you for the review!

Thanks @YouNeedCryDear for catching these issues! Your feedback helped improve the implementation significantly.

📝 Clarification on Llama parallel tool calling support

After extensive testing with real OCI API calls, here's what we found: only Llama 4+ actually works.

Test evidence: when asked "What's the weather and population of Tokyo?", Llama 4 (…) made both tool calls in parallel, while Llama 3.3 (…) did not.

Conclusion: the OCI API accepts the `is_parallel_tool_calls` parameter for Llama 3.x, but those models do not actually call tools in parallel.

Reference: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3/

✅ Changes made: the implementation now properly restricts parallel tool calling to Llama 4+ only!
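As a sketch of what "parallel" buys in the test above: with two independent tools for that prompt, a parallel-capable model emits both tool calls in one assistant turn, and the client can dispatch them concurrently. The tool names and bodies below are illustrative stand-ins, not the actual test code:

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent tools for the "weather and population of Tokyo" prompt.
# Stand-ins only; the real test used live OCI tool calling.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def get_population(city: str) -> str:
    return f"{city}: ~14M people"

TOOLS = {"get_weather": get_weather, "get_population": get_population}

# A parallel-capable model returns several tool calls in one turn;
# a sequential one would return only the first and wait for its result.
tool_calls = [
    {"name": "get_weather", "args": {"city": "Tokyo"}},
    {"name": "get_population", "args": {"city": "Tokyo"}},
]

# Because the calls are independent, the client can run them concurrently.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: TOOLS[c["name"]](**c["args"]), tool_calls))

print(results)
```

With a sequential model, the second tool result would only be available after a second model round trip; here both come back in one pass.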
@YouNeedCryDear let me know if anything else is needed here.

👋 Hi! This PR is needed for executing Llama models with parallel tool calling capabilities. The implementation is ready and all tests are passing. Could we prioritize reviewing this to prevent the code from going stale? Thanks!
Force-pushed 9bd0122 to 6583478
🔄 Rebased onto latest main

cc @paxiaatucsdedu @YouNeedCryDear - This PR adds parallel tool calling support needed for executing Llama models efficiently. All tests passing and ready for review!

✅ Rebase and fixes completed

Changes made: …

Validation approach: …

CI checks are running to verify everything passes in the CI environment.
Add support for the parallel_tool_calls parameter to enable parallel
function calling in Meta/Llama models, improving performance for
multi-tool workflows.
- Add parallel_tool_calls class parameter to OCIGenAIBase (default: False)
- Add parallel_tool_calls parameter to bind_tools() method
- Support hybrid approach: class-level default + per-binding override
- Pass is_parallel_tool_calls to OCI API in MetaProvider
- Add validation for Cohere models (raises error if attempted)
- 9 comprehensive unit tests (all passing)
- 4 integration tests with live OCI API (all passing)
- No regression in existing tests
Class-level default:

```python
llm = ChatOCIGenAI(
    model_id="meta.llama-3.3-70b-instruct",
    parallel_tool_calls=True,
)
```

Per-binding override:

```python
llm_with_tools = llm.bind_tools(
    [tool1, tool2, tool3],
    parallel_tool_calls=True,
)
```
- Up to N× speedup for N independent tool calls
- Backward compatible (default: False)
- Clear error messages for unsupported models
- Follows existing parameter patterns
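The hybrid approach above reduces to a simple resolution rule: the per-binding value wins when given, otherwise the class-level default applies. A minimal sketch (the helper name is hypothetical, not from the codebase):

```python
from typing import Optional

def resolve_parallel_tool_calls(
    class_default: bool, binding_override: Optional[bool]
) -> bool:
    """Per-binding override wins when provided; otherwise use the class default."""
    return class_default if binding_override is None else binding_override

# Class default off, but this binding enables it:
print(resolve_parallel_tool_calls(False, True))   # True
# No override given: the class default applies.
print(resolve_parallel_tool_calls(True, None))    # True
```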
…ol calling

- Update README to include all GenericChatRequest models (Grok, OpenAI, Mistral)
- Update code comments and docstrings
- Update error messages with complete model list
- Clarify that the feature works with GenericChatRequest, not just Meta/Llama

Relocated test_parallel_tool_calling_integration.py to tests/integration_tests/chat_models/, following repository convention for integration test organization.
Only Llama 4+ models support parallel tool calling based on testing.

Parallel tool calling support:
- Llama 4+ - SUPPORTED (tested and verified with real OCI API)
- ALL Llama 3.x (3.0, 3.1, 3.2, 3.3) - BLOCKED
- Cohere - BLOCKED (existing behavior)
- Other models (xAI Grok, OpenAI, Mistral) - SUPPORTED

Implementation:
- Added _supports_parallel_tool_calls() helper method with regex version parsing
- Updated bind_tools() to validate the model version before enabling parallel calls
- Provides clear error messages: "only available for Llama 4+ models"

Unit tests added (8 tests, all mocked, no OCI connection):
- test_version_filter_llama_3_0_blocked
- test_version_filter_llama_3_1_blocked
- test_version_filter_llama_3_2_blocked
- test_version_filter_llama_3_3_blocked (Llama 3.3 doesn't support it either)
- test_version_filter_llama_4_allowed
- test_version_filter_other_models_allowed
- test_version_filter_supports_parallel_tool_calls_method
- Plus existing parallel tool calling tests updated to use Llama 4
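A sketch of the version-gating helper this commit describes; the function name mirrors the commit, but the exact signature and the fallback for unparseable model ids are assumptions:

```python
import re

def supports_parallel_tool_calls(model_id: str) -> bool:
    """Llama needs major version >= 4; other GenericChatRequest models pass."""
    lowered = model_id.lower()
    if "llama" not in lowered:
        return True  # e.g. xAI Grok, OpenAI, Mistral
    match = re.search(r"llama-(\d+)", lowered)
    if match is None:
        return False  # version can't be determined; be conservative
    return int(match.group(1)) >= 4

print(supports_parallel_tool_calls("meta.llama-4-maverick-17b-128e-instruct-fp8"))  # True
print(supports_parallel_tool_calls("meta.llama-3.3-70b-instruct"))                  # False
```

As the review below points out, this kind of model-id string matching is fragile, and it was later replaced with a provider-level flag.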
- Fix line length violations in chat_models and llms
- Replace print statements with logging in integration tests
- Fix import sorting and remove unused imports
- Fix unused variable in test

- Validation now happens at request preparation time
- Cohere validation remains in CohereProvider
- Llama 3.x validation added to GenericProvider
- Fixes failing unit tests

- Llama 3.x validation happens early at bind_tools time
- Cohere validation happens at provider level (_prepare_request time)
- All 16 parallel tool calling tests now pass
Force-pushed 56c0a46 to 1ed506a
✅ Rebased on latest main

Successfully rebased this PR on the latest main branch.

Changes in this rebase: …

Test results: …

@YouNeedCryDear @paxiaatucsdedu - Ready for review! This PR adds parallel tool calling support for Llama 4+ models. All tests passing.
Hey team - just wanted to follow up on this PR. Parallel tool calling is a critical feature that multiple teams are waiting on for their testing workflows. Without it, Llama 4 models can't fully leverage their parallel execution capabilities, which significantly impacts performance testing and multi-tool agent scenarios. Would really appreciate it if we could prioritize getting this reviewed and merged. Happy to address any feedback quickly. Thanks!
@fede-kamel Could you please fix the failed linting? |
Force-pushed eeb70c4 to 3bb4d01
@fede-kamel Seems like for all the other vendors, parallel_tool_calls is only introduced inside bind_tools. I think we need to be consistent.

@YouNeedCryDear Fixed the linting issues and moved …

Also updated the README - removed the class-level example; it now only shows …
- Add type: ignore[override] to bind_tools methods in oci_data_science.py and oci_generative_ai.py to handle the signature incompatibility with the BaseChatModel parent class
- Remove unused type: ignore comments in oci_generative_ai.py
- Add type: ignore[attr-defined] comments for RunnableBinding runtime attributes (kwargs, _prepare_request) in test_parallel_tool_calling.py
- Fix test_parallel_tool_calling_integration.py to use getattr for tool_calls attribute access on BaseMessage
- Fix test_tool_calling.py: import StructuredTool from langchain_core.tools
- Fix test_oci_data_science.py: remove unused type: ignore comment
- Fix test_oci_generative_ai_responses_api.py: add type: ignore for LangGraph invoke arg type

- Add type: ignore[unreachable] back to the BaseTool isinstance check in oci_generative_ai.py (CI mypy flags it as unreachable)
- Remove type: ignore[override] from bind_tools (CI reports it unused)
- Fix test_oci_data_science.py: explicitly type the output variable and use explicit addition instead of += to avoid an assignment type error
- Remove unused type: ignore comments from test files

- Use Optional[T] instead of T | None syntax for Python 3.9 compat
- Add type: ignore[assignment] for AIMessageChunk addition
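The Python 3.9 compatibility point is the standard one: PEP 604 `T | None` annotations only work on 3.10+ (or behind `from __future__ import annotations`), so older interpreters need `typing.Optional`. A minimal illustration (the function name is made up):

```python
from typing import Optional

# On Python 3.9, writing `calls: list | None` would fail at import time,
# because the union is evaluated when the function is defined.
def first_tool_call(calls: Optional[list]) -> Optional[dict]:
    return calls[0] if calls else None

print(first_tool_call([{"name": "get_weather"}]))  # {'name': 'get_weather'}
```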
Good to go - all passing @YouNeedCryDear
```python
# Extract provider from model_id
# (e.g., "meta" from "meta.llama-4-maverick-17b-128e-instruct-fp8")
provider = model_id.split(".")[0].lower()
```
I think there is already a property to get the provider.
```python
provider = model_id.split(".")[0].lower()

# Cohere models don't support parallel tool calling
if provider == "cohere":
```
This logic probably better fit into a bool method in provider class?
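What the suggested provider-level flag might look like; a sketch using the provider class names mentioned elsewhere in this thread, with the property name matching the commit that eventually adopted it:

```python
class Provider:
    """Base provider: conservative default, no parallel tool calls."""

    @property
    def supports_parallel_tool_calls(self) -> bool:
        return False

class CohereProvider(Provider):
    pass  # inherits False: CohereChatRequest has no parallel-call field

class GenericProvider(Provider):
    """Meta/Llama, Grok, OpenAI, Mistral all go through GenericChatRequest."""

    @property
    def supports_parallel_tool_calls(self) -> bool:
        return True

print(GenericProvider().supports_parallel_tool_calls)  # True
print(CohereProvider().supports_parallel_tool_calls)   # False
```

This moves the capability check off the model-id string and onto the provider object, which already encodes which request type is used.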
```python
if provider == "meta" and "llama" in model_id.lower():
    # Extract version number
    # (e.g., "4" from "meta.llama-4-maverick-17b-128e-instruct-fp8")
    version_match = re.search(r"llama-(\d+)", model_id.lower())
```
This string match method seems too hacky to me. Maybe just let it fail on the API side for those that don't have parallel tool support?
```python
if parallel_tool_calls:
    # Validate Llama 3.x doesn't support parallel tool calls (early check)
    model_id = self.model_id or ""
    is_llama = "llama" in model_id.lower()
```
This is too hacky as well. If the user has a DAC endpoint, the model_id won't be something like llama-4-XXXX.
…id parsing

Addresses reviewer feedback:
- Add supports_parallel_tool_calls property to the Provider base class (False)
- Override in GenericProvider to return True (supports parallel calls)
- CohereProvider inherits False (doesn't support parallel calls)
- Remove _supports_parallel_tool_calls method with hacky model_id parsing
- Simplify bind_tools to use the provider property for validation
- Remove Llama version-specific validation (let the API fail naturally)
- Update unit tests to focus on provider-based validation
@YouNeedCryDear Addressed all the review feedback:

The validation is now simple: …

For models that don't actually support parallel tool calls (like Llama 3.x), the OCI API will return an error naturally instead of us trying to pre-validate with hacky string matching.
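A minimal sketch of that validation path, with simplified provider stand-ins and a generic error message; only the `supports_parallel_tool_calls` flag echoes the actual change, the rest is illustrative:

```python
class CohereProvider:
    supports_parallel_tool_calls = False  # CohereChatRequest has no such field

class GenericProvider:
    supports_parallel_tool_calls = True   # GenericChatRequest exposes it

def validate_parallel_tool_calls(provider, parallel_tool_calls: bool) -> None:
    """Raise early (at bind_tools time) if the provider can't do parallel calls."""
    if parallel_tool_calls and not provider.supports_parallel_tool_calls:
        raise ValueError("Parallel tool calls are not supported by this provider.")

validate_parallel_tool_calls(GenericProvider(), True)  # passes silently
try:
    validate_parallel_tool_calls(CohereProvider(), True)
except ValueError as e:
    print("rejected:", e)
```

Anything finer-grained (such as Llama 3.x vs 4) is left to the OCI API to reject, rather than guessed from the model id.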
- Reorder convert_to_oci_tool checks to avoid an unreachable-code warning
- Fix type annotation in test_stream_vllm to use BaseMessageChunk
Force-pushed c1d67aa to d17fc8f
Summary

Add support for parallel tool calling to enable models to execute multiple tools simultaneously, improving performance for multi-tool workflows.

Problem

The langchain-oracle SDK did not expose the OCI API's `is_parallel_tool_calls` parameter, forcing sequential tool execution even when tools could run in parallel.

Solution

Implemented a hybrid approach allowing both class-level defaults and per-binding overrides.

Changes

- Add `parallel_tool_calls` parameter to `OCIGenAIBase` (default: False)
- Update `bind_tools()` method to accept a `parallel_tool_calls` parameter
- Update `GenericProvider` to pass `is_parallel_tool_calls` to the OCI API

Testing

Unit tests (9/9 passing)

Integration tests (4/4 passing)

All tests verified with the live OCI GenAI API.

Backward Compatibility

✅ Fully backward compatible: defaults to `False` (existing behavior).

Benefits

Model Support

Supported (GenericChatRequest models): …

Unsupported: …