Skip to content

Conversation

@dinmukhamedm
Copy link
Member

@dinmukhamedm dinmukhamedm commented Nov 12, 2025

Note

Adds datasets CLI and SDK (async/sync) with list/push/pull/create and links evaluation datapoints to datasets; updates logging, tests, CI, and deps.

  • CLI
    • New lmnr datasets subcommands: list, push, pull, create with shared --project-api-key/--base-url/--port and async handlers in lmnr/cli.
    • Central cli() entrypoint; eval runner refactored to lmnr/cli/evals.py; cursor rules moved to lmnr/cli/rules.py.
  • SDK (Async/Sync Clients)
    • New client.*.resources.datasets exposing list_datasets, get_dataset_by_name, push, pull; surfaced via client.datasets.
    • evals.get_datapoints now warns; use datasets.pull.
  • Evaluations
    • Link partial/final datapoints to datasets via dataset_link; auto-resolve dataset id from name when using LaminarDataset.
    • Guard empty data; minor progress/cleanup tweaks and background upload waits.
  • Datasets SDK
    • New sdk/datasets with LaminarDataset (by name or id) and file_utils to load JSON/CSV/JSONL.
  • Types & Logging
    • Extend Datapoint (id, createdAt, default {} for target/metadata); add Dataset, PushDatapointsResponse, EvaluationDatapointDatasetLink; map totalCounttotal_count.
    • get_default_logger(verbose=...) for compact console output.
  • Tests & Infra
    • Update CLI/evaluation tests for new module paths and human evaluators; refresh cassette/model; adjust assertions.
    • CI: test Python 3.14.
    • Deps: pin opentelemetry-instrumentation-langchain <0.48.0.

Written by Cursor Bugbot for commit 3c9cf7c. This will update automatically on new commits. Configure here.


Important

Enhance CLI and SDK for dataset management and evaluation, refactor CLI structure, extend logging/types, and update tests and dependencies.

  • CLI:
    • Add lmnr datasets with subcommands: list, push, pull, create in cli/__init__.py.
    • Refactor CLI entry: dedicated eval runner in cli/evals.py, rules.add_cursor_rules, and central cli() in cli/__init__.py.
  • SDK Clients (async/sync):
    • New client.*.resources.datasets exposing list_datasets, get_dataset_by_name, push, pull; surfaced via client.datasets.
    • Mark evals.get_datapoints deprecated and route to datasets endpoint.
  • Evaluations:
    • Attach dataset_link to partial/final datapoints; auto-resolve dataset id from name; guard empty data; minor progress handling.
  • Datasets SDK:
    • New sdk/datasets module with LaminarDataset (by name or id) and file_utils for JSON/CSV/JSONL I/O.
  • Types & Logging:
    • Extend Datapoint (id, createdAt, default {} for target/metadata); add Dataset, PushDatapointsResponse, EvaluationDatapointDatasetLink; map totalCounttotal_count.
    • get_default_logger gains verbose flag for simplified console formatting.
  • Tests & CI:
    • Update CLI/evaluation tests (human evaluators, dataset interactions); refresh VCR cassette/model; adjust patch paths.
    • CI: add Python 3.14 to matrix.
  • Dependencies:
    • Pin opentelemetry-instrumentation-langchain extra to <0.48.0.

This description was created by Ellipsis for 3c9cf7c. You can customize this summary. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to 05af33d in 2 minutes and 57 seconds. Click for details.
  • Reviewed 2925 lines of code in 18 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/lmnr/cli/__init__.py:24
  • Draft comment:
    The help string for the positional 'file' argument is concatenated without explicit spaces, which may lead to missing spaces in the output. Consider inserting spaces or using a multiline string for clarity.
  • Reason this comment was not posted:
    Marked as duplicate.
2. src/lmnr/sdk/client/synchronous/resources/datasets.py:93
  • Draft comment:
    Similar to the asynchronous version, the synchronous push() method returns during the first batch loop. Ensure that all batches are processed rather than returning early.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the author to ensure that all batches are processed rather than returning early. This is a request for confirmation of intended behavior, which violates the rule against asking the PR author to confirm their intention or ensure behavior.
3. src/lmnr/cli/evals.py:65
  • Draft comment:
    The module name is generated by concatenating 'user_module' with the file path without a separator. This could produce invalid module names. Consider adding a separator or sanitizing the file path.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_VJneyKDifb5BwdO9

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed d09b584 in 2 minutes and 34 seconds. Click for details.
  • Reviewed 313 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/lmnr/cli/__init__.py:60
  • Draft comment:
    The '--port' argument was removed from laminar args but is still used when creating the client (e.g., in datasets commands). This may lead to a missing attribute error.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is about a cross-file issue - it claims that code in other files (like datasets commands) still references the removed --port argument. According to the rules, I should "Ignore cross-file issues. Only think about the file you are reviewing." The diff only shows changes to argument parsing setup, not to any code that uses these arguments. The comment is speculative about what happens in other files when the client is created. Without seeing the actual client creation code or the datasets command implementation, I cannot verify if this is a real issue. This violates the rule that I need "STRONG EVIDENCE that the comment is correct." However, if the port argument was being used and is now removed, this could be a legitimate breaking change. The comment might be pointing out a real bug that would cause runtime errors. Perhaps the reviewer has context about other files that I don't have access to. Even if this is a real issue, the rules explicitly state to "Ignore cross-file issues. Only think about the file you are reviewing." The comment is about code in other files (datasets commands, client creation) that isn't shown in this diff. Without seeing evidence in THIS file that there's a problem, I should delete the comment. Additionally, the comment is placed on line 68 which is just a closing parenthesis, not even related to where the port argument was removed. This comment should be deleted because it's about a cross-file issue (how other files use the removed argument), which violates the rule to only consider the file being reviewed. There's no evidence in this diff that shows the port argument being used incorrectly within this file.
2. src/lmnr/sdk/types.py:33
  • Draft comment:
    Good use of default_factory for 'target' and 'metadata' to avoid mutable default pitfalls.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_T5SjAM373YvWYdN5

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed 66e2dab in 3 minutes and 23 seconds. Click for details.
  • Reviewed 245 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 9 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. pyproject.toml:74
  • Draft comment:
    The dependency spec for langchain now uses parentheses (e.g. opentelemetry-instrumentation-langchain(>=0.47.1,<0.48.0)). Verify that this non-standard syntax is supported and intentional.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% The comment claims the parentheses syntax is "non-standard" but this is incorrect. PEP 508 explicitly supports parentheses for grouping version specifiers. More importantly, the same syntax is already used in the file on line 18 for pydantic: pydantic (>=2.0.3,<3.0.0). This proves the syntax is both valid and already in use in this project. The comment is asking the author to "verify" something that is clearly intentional and correct, which violates the rule about not asking authors to confirm their intention. This is not a useful comment. Could the parentheses syntax cause issues with specific package managers or build tools that this project uses? Maybe there's a subtle difference between spaces around parentheses that matters? While different tools might have quirks, the syntax is standard PEP 508 and is already used successfully in this same file (line 18). If there were compatibility issues, they would already exist. The comment provides no specific evidence of a problem, just speculation. This comment should be deleted. The parentheses syntax is standard PEP 508 and is already used elsewhere in the same file (line 18 for pydantic). The comment incorrectly labels it as "non-standard" and asks the author to verify something that is clearly correct and intentional.
2. src/lmnr/cli/__init__.py:69
  • Draft comment:
    New '--port' argument is added. Consider handling cases where the LMNR_PORT environment variable might be empty or non-numeric to avoid potential type issues.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment raises a potential issue: if from_env("LMNR_PORT") returns a non-numeric string, the code would fail. However, I need to consider: 1) The type=int in argparse only applies to command-line arguments, not the default value, 2) If from_env("LMNR_PORT") returns a non-numeric string, it would be used as-is in the default, which could cause issues later, 3) But this is speculative - we don't know if this is actually a problem in practice, 4) The existing code for --base-url doesn't do this validation, suggesting it may not be necessary, 5) Without seeing the from_env implementation, I can't be sure if it already handles this. I don't have visibility into the from_env function implementation - it might already handle validation or return None for invalid values. The comment is somewhat speculative ("might be empty or non-numeric") rather than pointing to a definite issue. The existing codebase doesn't validate environment variables in this way for other arguments. While the comment could be valid, it's speculative without evidence that this is actually a problem. The existing code pattern doesn't validate environment variables this way, and we don't know if from_env already handles edge cases. This falls under "speculative comments" that should be removed per the rules. This comment is speculative and suggests defensive programming without strong evidence of an actual issue. The existing codebase doesn't validate environment variables in this manner, and we lack context on whether from_env already handles these cases. Per the rules, speculative comments should be removed.
3. src/lmnr/cli/__init__.py:156
  • Draft comment:
    The default value for '--output-format' in the datasets pull parser was removed. Confirm that inferring the format solely from the file extension (or user input) meets the expected behavior.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment is asking the PR author to "confirm" something, which is explicitly called out in the rules as not useful ("Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended"). The comment doesn't identify a specific bug or issue - it's just asking for confirmation that the change is intentional. The removal of the default value appears to be deliberate, and the help text already explains the new behavior. There's no strong evidence of a problem here, just a request for verification. Perhaps there's a legitimate concern that removing the default could break existing behavior or cause issues when no file extension is provided and no format is specified. Maybe the comment is trying to highlight a potential edge case that needs to be handled. Even if there's a potential edge case, the comment doesn't clearly articulate what the specific problem is. It just asks for confirmation. If there was a real issue (like "this will fail when output_path is None and no format is specified"), that would be actionable. But "confirm that X meets expected behavior" is exactly the type of comment the rules say to remove. This comment should be deleted because it asks the author to "confirm" their intention rather than identifying a specific, actionable issue. It violates the rule against asking authors to verify, confirm, or ensure their changes are correct.
4. src/lmnr/cli/datasets.py:20
  • Draft comment:
    The _dump_json function now accepts a 'do_indent' parameter. This is a useful enhancement for controlling formatting. Consider adding a comment in the docstring to clarify its usage.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This comment is suggesting documentation improvements for a private helper function. The parameter name do_indent is quite self-explanatory, and the default value of True makes the behavior clear. The comment doesn't point to a code issue but rather suggests adding documentation. According to the rules, comments should clearly require a code change and not be obvious or unimportant. Adding a docstring to a simple private helper function with a self-explanatory parameter seems like an obvious/minor suggestion. The function's purpose and the parameter's meaning are clear from the code itself. This feels like a "nice to have" rather than something that clearly requires a code change. The function is private and relatively simple, but adding documentation is generally a good practice. Perhaps for a public API this would be more important, but even for private functions, documentation can help maintainability. The comment is actionable and not technically wrong. While documentation is generally good, the rules specifically state not to make comments that are obvious or unimportant. The parameter name do_indent with a boolean type and default value of True is self-documenting. This is a private helper function, not a public API. The comment doesn't identify a bug or code quality issue that needs fixing - it's a minor documentation suggestion for something that's already clear. This comment should be deleted. It suggests adding documentation to a private helper function where the parameter name and type are already self-explanatory. This falls under "obvious or unimportant" comments that don't clearly require a code change.
5. src/lmnr/cli/datasets.py:72
  • Draft comment:
    Slicing the result with 'if limit is not None: return result[:limit]' ensures the returned list respects the limit, even if the loop overshoots. Double-check that this behavior is as intended.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to double-check the behavior of slicing a list with a limit. This falls under the rule of not asking the author to confirm their intention or to double-check things. Therefore, this comment should be removed.
6. tests/cassettes/test_tracing/test_use_span_with_auto_instrumentation_langchain.yaml:3
  • Draft comment:
    Test cassette updated to use 'gpt-5-nano' and adjusted header values. Ensure these changes align with the new model specs and that the updated headers (e.g., content-length, traceparent) are correct.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that the changes align with the new model specs and that the updated headers are correct. This falls under the category of asking the author to double-check things, which is against the rules. The comment does not provide a specific suggestion or point out a specific issue that needs to be addressed.
7. tests/test_tracing.py:267
  • Draft comment:
    The assertion for span count (assert len(spans) == 2) is commented out. Confirm if this is temporary or if the expected number of spans has changed; ideally, update the test to assert the correct count.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment says "Confirm if this is temporary or if the expected number of spans has changed; ideally, update the test to assert the correct count." This is explicitly asking the PR author to confirm their intention and explain the change. The comment also suggests uncommenting the line, but doesn't provide strong evidence that this is definitely wrong. The fact that the code still extracts two spans and asserts on them suggests the test might still expect 2 spans, but the comment is speculative ("if this is temporary") rather than definitively identifying an issue. This violates the rule about not asking the author to confirm their intention or explain things. The test code after the commented assertion still expects exactly 2 spans (it extracts test_span and openai_span and would fail with an IndexError if there weren't exactly 2 spans). This could be strong evidence that commenting out the assertion is a mistake, making the comment potentially valid. While the subsequent code does extract two specific spans, it uses list comprehension with filtering, which wouldn't necessarily fail if there were more or fewer spans - it would just fail when trying to access index [0] if no matching span exists. The comment is still phrased as asking for confirmation rather than stating definitively that this is wrong. The rules explicitly say not to ask the author to confirm their intention. This comment violates the rule against asking the PR author to confirm their intention or explain their changes. The comment is speculative ("Confirm if this is temporary") rather than definitively identifying a bug. While there's some evidence the assertion should remain, the comment doesn't provide strong enough evidence to keep it.
8. tests/test_tracing.py:259
  • Draft comment:
    The auto instrumentation test now calls the OpenAI client with model 'gpt-5-nano'. Ensure that all other related code and tests are updated accordingly to maintain consistency with the new model.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that all related code and tests are updated for consistency with the new model. This falls under the rule of not asking the author to ensure things are tested or consistent, which is not allowed.
9. pyproject.toml:102
  • Draft comment:
    Typographical note: The dependency on "opentelemetry-instrumentation-langchain" now uses parentheses for the version specifiers (e.g., (>=0.47.1,<0.48.0)), which is inconsistent with the formatting of the other dependencies. Verify if this is the intended syntax, or if it should be formatted as "opentelemetry-instrumentation-langchain>=0.47.1,<0.48.0" instead.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This comment violates the rule "Do NOT ask the PR author to confirm their intention, to explain, to double-check things, to ensure the behavior is intended". The comment explicitly says "Verify if this is the intended syntax". Both syntaxes (with and without parentheses) are valid in Python dependency specifications. Looking at the main dependencies section (lines 17-43), I can see parentheses ARE used with version specifiers (e.g., line 18: pydantic (>=2.0.3,<3.0.0)). So the author is actually being consistent with the existing style in the file. The comment is asking for verification rather than pointing out a definite issue, which violates the rules. Could the parentheses actually cause a parsing issue in this specific context (optional dependencies in pyproject.toml)? Maybe there's a subtle difference between the main dependencies section and optional dependencies section that I'm missing. The comment does provide a concrete suggestion, which could be valuable. Both syntaxes are valid in Python packaging. The main dependencies section uses parentheses (line 18), so this is actually consistent with existing style in the file. The comment asks to "verify" which is explicitly against the rules - we should not ask authors to confirm intentions. If there was a definite syntax error, it would be caught by the build system. This comment should be deleted. It asks the author to verify their intention, which violates the rules. Both syntaxes are valid, and the parentheses format is actually used elsewhere in the same file. Any syntax errors would be caught by the build.

Workflow ID: wflow_BeNXXsXMVKVlZbkR

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed ea76f54 in 59 seconds. Click for details.
  • Reviewed 58 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. pyproject.toml:71
  • Draft comment:
    Fixed langchain dependency syntax by removing parentheses; now correctly uses '>=' and '<' version specifiers.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. src/lmnr/sdk/datasets/__init__.py:55
  • Draft comment:
    Enhanced logging in _fetch_batch to include both dataset name and id for better traceability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. src/lmnr/sdk/datasets/__init__.py:92
  • Draft comment:
    Updated push() logging to display the dataset identifier (id or name) in a clearer format.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. src/lmnr/sdk/datasets/file_utils.py:55
  • Draft comment:
    Filters out empty lines when reading .jsonl files to avoid potential JSON parsing errors.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_hlyoH3QbFwJgscxB

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed 3c9cf7c in 1 minute and 49 seconds. Click for details.
  • Reviewed 51 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/lmnr/cli/__init__.py:78
  • Draft comment:
    Removed redundant call to setup_laminar_args in the 'list' subcommand; common flags are now set at the parent.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, explaining a change that was made. It does not provide a suggestion, ask for confirmation, or point out a potential issue. According to the rules, purely informative comments should be removed.
2. src/lmnr/cli/__init__.py:121
  • Draft comment:
    Removed duplicate setup_laminar_args call in the 'push' subcommand; verify common options are still parsed from the parent.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to verify that common options are still parsed from the parent after removing a duplicate function call. This falls under asking the author to double-check things, which is against the rules.
3. src/lmnr/cli/__init__.py:173
  • Draft comment:
    Removed redundant setup_laminar_args call in the 'pull' subcommand; ensure inherited flags work as intended.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that inherited flags work as intended after removing a redundant call. This falls under asking the author to ensure behavior is intended, which is against the rules.
4. src/lmnr/cli/__init__.py:218
  • Draft comment:
    Removed redundant setup_laminar_args call in the 'create' subcommand; rely on common options from the parent parser.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, describing a change made in the code without providing any actionable feedback or suggestions. It doesn't ask for confirmation or suggest improvements, which violates the rule against purely informative comments.
5. src/lmnr/cli/__init__.py:228
  • Draft comment:
    Centralized common laminar arguments by invoking setup_laminar_args on the parent 'datasets' parser.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply describes what was done in the code without offering any critique or questions.

Workflow ID: wflow_GC6YMlfz0mTQbBzs

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@dinmukhamedm dinmukhamedm merged commit 8d62a28 into main Nov 13, 2025
8 checks passed
@dinmukhamedm dinmukhamedm deleted the eval-dataset branch November 13, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants