Add TOML data loader by Copilot · Pull Request #123 · pickwicksoft/pystreamapi

Copilot · 2026-04-13T12:30:42Z

Run `poetry lock` to regenerate the lock file so that it matches pyproject.toml.
Closes Data loader for TOML #73

Summary by Sourcery

Add support for loading TOML data alongside existing CSV and JSON loaders.

New Features:

Introduce a TOML data loader that lazily parses TOML from files or strings into namedtuple-based objects.

Enhancements:

Expose the TOML loader via the loaders package API and include it in the aggregate extras group.

Build:

Add tomlkit as a dependency for TOML loading and include it in test environment dependencies.

Tests:

Add comprehensive tests for the TOML loader covering file and string input, empty and invalid inputs, nested structures, and native type preservation.

Agent-Logs-Url: https://github.com/pickwicksoft/pystreamapi/sessions/1a3f86a6-3bc8-41dc-9d9f-8f20f6746e42 Co-authored-by: garlontas <70283087+garlontas@users.noreply.github.com>

Agent-Logs-Url: https://github.com/pickwicksoft/pystreamapi/sessions/201e043c-1007-4214-bfe2-55563c47281f Co-authored-by: garlontas <70283087+garlontas@users.noreply.github.com>

sonarqubecloud · 2026-04-13T13:16:39Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code


sourcery-ai · 2026-04-13T13:24:58Z

Reviewer's Guide

Implements a new TOML data loader using tomlkit that lazily parses TOML content from files or strings into nested namedtuples, wires it into the loaders package and extras, adds tomlkit as a dependency, and introduces tests to validate behavior and edge cases.

Sequence diagram for TOML loader entrypoint and lazy loading

sequenceDiagram
    participant Client
    participant TomlLoader
    participant LoaderUtils
    participant FileSystem
    participant Tomlkit

    Client->>TomlLoader: toml(src, read_from_src)
    alt read_from_src is true
        TomlLoader->>TomlLoader: __lazy_load_toml_string(src)
        TomlLoader->>Tomlkit: loads(toml_string)
        Tomlkit-->>TomlLoader: dict_data
        TomlLoader->>TomlLoader: __dict_to_namedtuple(dict_data)
        TomlLoader-->>Client: Iterator[namedtuple]
    else read_from_src is false
        TomlLoader->>LoaderUtils: validate_path(src)
        LoaderUtils-->>TomlLoader: file_path
        TomlLoader->>FileSystem: open(file_path, read)
        FileSystem-->>TomlLoader: toml_string
        TomlLoader->>Tomlkit: loads(toml_string)
        Tomlkit-->>TomlLoader: dict_data
        TomlLoader->>TomlLoader: __dict_to_namedtuple(dict_data)
        TomlLoader-->>Client: Iterator[namedtuple]
    end

File-Level Changes

Change	Details	Files
Add a TOML loader that lazily parses TOML from files or strings into nested namedtuples using tomlkit.	Introduce a public toml(src, read_from_src=False) loader entry point mirroring other loaders, delegating to separate lazy file and string loader helpers. Implement __lazy_load_toml_file and __lazy_load_toml_string generators that read/parse TOML only when iterated and yield zero or one item depending on content presence. Add __dict_to_namedtuple to recursively convert TOML-parsed dicts/lists into nested namedtuples while preserving native scalar types and supporting arrays of tables.	`pystreamapi/loaders/__toml/__toml_loader.py`
Expose the new TOML loader through the loaders package and declare its dependency in project configuration and test environment.	Import the toml loader in the loaders package init and add it to `__all__` so it is available as pystreamapi.loaders.toml. Add a toml_loader extra depending on tomlkit and include tomlkit in the all extras group. Include tomlkit in tox test dependencies and regenerate poetry.lock to capture the new dependency.	`pystreamapi/loaders/__init__.py` `pyproject.toml` `tox.ini` `poetry.lock`
Add test coverage for TOML loader behavior, including normal, edge, and error scenarios.	Test loading TOML both from file paths and from in-memory strings, verifying laziness, iterability, and correct typed field extraction. Verify behavior with empty content, invalid file paths, directory paths (ValueError), and malformed TOML (tomlkit.exceptions.ParseError). Add tests for nested tables, arrays of tables with heterogeneous fields, and preservation of native scalar types for ints, floats, and bools.	`tests/_loaders/test_toml_loader.py`
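
The recursive conversion described in the first row might look roughly like the sketch below. The dict branch follows the snippet quoted in the review. The list branch and the scalar fall-through are assumptions, since the PR's full function body is not shown here, and `dict_to_namedtuple` is a hypothetical public name for the private helper.

```python
from collections import namedtuple


def dict_to_namedtuple(data, name="Item"):
    """Recursively turn dicts into namedtuples; leave scalars untouched."""
    if isinstance(data, dict):
        Item = namedtuple(name, list(data.keys()))
        return Item(**{k: dict_to_namedtuple(v, k) for k, v in data.items()})
    if isinstance(data, list):
        # Assumption: each list element is converted independently,
        # so tables in an array may end up with different fields.
        return [dict_to_namedtuple(element, name) for element in data]
    # Scalars (int, float, bool, str, ...) keep their native type.
    return data


# Hypothetical data standing in for a parsed TOML document.
parsed = {
    "title": "demo",
    "owner": {"name": "Ada", "age": 36, "active": True},
    "servers": [{"host": "alpha", "port": 8080}, {"host": "beta"}],
}
doc = dict_to_namedtuple(parsed)
```

Nested tables become attribute chains such as `doc.owner.age`, while the two `servers` entries are separate namedtuple types with different fields.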

Assessment against linked issues

Issue	Objective	Addressed	Explanation
#73	Implement a TOML data loader that lazily reads from a file path or from a TOML string and yields namedtuples with efficient, native type conversion.	✅
#73	Ensure the TOML loader correctly handles non-consistent data in arrays of tables (like the employees example) by yielding an iterable where each entry is a separate namedtuple that may have different fields.	❌	The implemented loader parses the entire TOML document once and yields a single top-level namedtuple, within which arrays-of-tables (e.g., employees.employee) are represented as lists of namedtuples. The issue specifies that a LazyFileIterable containing three differently structured namedtuples should be created for the example, implying one yielded item per employee. The PR’s behavior (and tests) instead yield a single item containing a list of employee namedtuples, so iteration is over documents, not over the individual employee entries as requested.
#73	Integrate the TOML loader into the public loaders API and project configuration, including extras and test/dependency setup.	✅
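
The unaddressed objective contrasts one yielded item per document with one yielded item per array entry. A minimal sketch of the per-entry alternative follows. The helper `iter_table_entries` and the sample data are hypothetical and do not reproduce the example from issue #73.

```python
from collections import namedtuple


def iter_table_entries(parsed, array_key):
    """Yield one namedtuple per entry of an array of tables."""
    for entry in parsed.get(array_key, []):
        # A fresh namedtuple type per entry allows differing fields.
        Record = namedtuple("Record", list(entry.keys()))
        yield Record(**entry)


# Hypothetical parsed document with inconsistently shaped entries.
parsed = {
    "employee": [
        {"name": "Ada", "role": "engineer"},
        {"name": "Grace", "salary": 5000},
        {"name": "Linus"},
    ]
}

records = list(iter_table_entries(parsed, "employee"))
```

Iterating this generator produces three records, whereas the merged loader, as described above, yields a single document whose array is a list attribute.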

Possibly linked issues

Data loader for TOML #73: The PR adds the TOML loader, using tomlkit and namedtuples, matching the issue’s requirements including inconsistent data handling.


sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

- The `__dict_to_namedtuple` conversion assumes all TOML keys are valid Python identifiers, which will break for quoted/space-containing keys; consider normalizing or falling back to a dict for such cases to avoid runtime `ValueError` from `namedtuple`.
- The TOML loader always yields a single top-level namedtuple (with arrays-of-tables nested inside), which may differ from how other loaders stream multiple records; it may be worth verifying and documenting that this behavior is intentionally aligned with the existing loader APIs.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `__dict_to_namedtuple` conversion assumes all TOML keys are valid Python identifiers, which will break for quoted/space-containing keys; consider normalizing or falling back to a dict for such cases to avoid runtime `ValueError` from `namedtuple`.
- The TOML loader always yields a single top-level namedtuple (with arrays-of-tables nested inside), which may differ from how other loaders stream multiple records; it may be worth verifying and documenting that this behavior is intentionally aligned with the existing loader APIs.

## Individual Comments

### Comment 1
<location path="pystreamapi/loaders/__toml/__toml_loader.py" line_range="55-59" />
<code_context>
+    return generator()
+
+
+def __dict_to_namedtuple(data, name='Item'):
+    """Recursively convert a dictionary (or list) to namedtuples."""
+    if isinstance(data, dict):
+        fields = list(data.keys())
+        Item = namedtuple(name, fields)
+        return Item(**{k: __dict_to_namedtuple(v, k) for k, v in data.items()})
+    if isinstance(data, list):
</code_context>
<issue_to_address>
**issue:** Raw TOML keys and the `name` argument may not be valid Python identifiers, which can cause `namedtuple` construction to fail.

Because TOML keys and table names can include `-`, spaces, or start with digits, both the generated field names and the `name` you pass to `namedtuple` can be invalid identifiers, causing a `ValueError` at runtime.

To handle this safely:
- Call `namedtuple(name, fields, rename=True)` to automatically fix invalid/duplicate field names.
- Sanitize `name` when it comes from a TOML key (e.g., replace non-identifier characters, or prefix if it starts with a digit).

This will let the loader handle a wider range of valid TOML inputs without failing.
</issue_to_address>

### Comment 2
<location path="tests/_loaders/test_toml_loader.py" line_range="58-67" />
<code_context>
+        with self.mock_file(""):
+            self.assertEqual(list(toml(file_path)), [])
+
+    def test_toml_loader_from_empty_string(self):
+        self.assertEqual(list(toml("", read_from_src=True)), [])
+
+    def test_toml_loader_with_invalid_path(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for whitespace-only TOML input to mirror the implementation’s `strip()` check

Since empty content is detected via `.strip()`, whitespace-only input (e.g. `"   \n"`) should be treated as empty as well. Please add a test like `self.assertEqual(list(toml("   \n", read_from_src=True)), [])` to capture this behavior and prevent regressions in whitespace handling.

```suggestion
    def test_toml_loader_with_empty_file(self):
        with self.mock_file(""):
            self.assertEqual(list(toml(file_path)), [])

    def test_toml_loader_from_empty_string(self):
        self.assertEqual(list(toml("", read_from_src=True)), [])

    def test_toml_loader_from_whitespace_string(self):
        self.assertEqual(list(toml("   \n", read_from_src=True)), [])

    def test_toml_loader_with_invalid_path(self):
        with self.assertRaises(FileNotFoundError):
            toml('path/to/invalid.toml')
```
</issue_to_address>


sourcery-ai · 2026-04-13T13:24:59Z

pystreamapi/loaders/__toml/__toml_loader.py

+def __dict_to_namedtuple(data, name='Item'):
+    """Recursively convert a dictionary (or list) to namedtuples."""
+    if isinstance(data, dict):
+        fields = list(data.keys())
+        Item = namedtuple(name, fields)


issue: Raw TOML keys and the `name` argument may not be valid Python identifiers, which can cause `namedtuple` construction to fail.

Because TOML keys and table names can include `-`, spaces, or start with digits, both the generated field names and the `name` you pass to `namedtuple` can be invalid identifiers, causing a `ValueError` at runtime.

To handle this safely:
- Call `namedtuple(name, fields, rename=True)` to automatically fix invalid/duplicate field names.
- Sanitize `name` when it comes from a TOML key (e.g., replace non-identifier characters, or prefix if it starts with a digit).

This will let the loader handle a wider range of valid TOML inputs without failing.
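
One way the two suggestions could be combined is sketched below. The helpers `sanitize_type_name` and `safe_namedtuple` are hypothetical and not part of the PR. Note that once `rename=True` is used, the record has to be built positionally, because renamed fields no longer match the original keys.

```python
import keyword
import re
from collections import namedtuple


def sanitize_type_name(raw, default="Item"):
    """Turn an arbitrary TOML key into a valid namedtuple type name."""
    cleaned = re.sub(r"\W", "_", raw)
    if not cleaned or cleaned[0].isdigit():
        # Unlike field names, a type name may start with an underscore.
        cleaned = "_" + cleaned
    if keyword.iskeyword(cleaned) or not cleaned.isidentifier():
        return default
    return cleaned


def safe_namedtuple(name, data):
    """Build a namedtuple even when keys are not valid identifiers."""
    Record = namedtuple(sanitize_type_name(name), list(data.keys()), rename=True)
    # Positional construction: keyword construction would fail for
    # fields that rename=True replaced with names like _0 or _1.
    return Record(*data.values())


row = safe_namedtuple("1st-table", {"first name": "Ada", "class": "A", "age": 36})
```

The trade-off is that `rename=True` replaces each invalid field name with a positional one such as `_0`, so the original key is no longer visible on the record.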

sourcery-ai · 2026-04-13T13:24:59Z

tests/_loaders/test_toml_loader.py

+    def test_toml_loader_with_empty_file(self):
+        with self.mock_file(""):
+            self.assertEqual(list(toml(file_path)), [])
+
+    def test_toml_loader_from_empty_string(self):
+        self.assertEqual(list(toml("", read_from_src=True)), [])
+
+    def test_toml_loader_with_invalid_path(self):
+        with self.assertRaises(FileNotFoundError):
+            toml('path/to/invalid.toml')


suggestion (testing): Consider adding a test for whitespace-only TOML input to mirror the implementation’s `strip()` check

Since empty content is detected via `.strip()`, whitespace-only input (e.g. `"   \n"`) should be treated as empty as well. Please add a test like `self.assertEqual(list(toml("   \n", read_from_src=True)), [])` to capture this behavior and prevent regressions in whitespace handling.

Suggested change

     def test_toml_loader_with_empty_file(self):
         with self.mock_file(""):
             self.assertEqual(list(toml(file_path)), [])

     def test_toml_loader_from_empty_string(self):
         self.assertEqual(list(toml("", read_from_src=True)), [])

+    def test_toml_loader_from_whitespace_string(self):
+        self.assertEqual(list(toml("   \n", read_from_src=True)), [])
+
     def test_toml_loader_with_invalid_path(self):
         with self.assertRaises(FileNotFoundError):
             toml('path/to/invalid.toml')

Initial plan

42227ca

Copilot AI assigned Copilot and garlontas Apr 13, 2026

Copilot started work on behalf of garlontas April 13, 2026 12:30 View session

Copilot AI linked an issue Apr 13, 2026 that may be closed by this pull request

Data loader for TOML #73

Closed

Copilot AI and others added 2 commits April 13, 2026 12:38

Add TOML data loader with full-coverage tests

6c3cca4

Agent-Logs-Url: https://github.com/pickwicksoft/pystreamapi/sessions/1a3f86a6-3bc8-41dc-9d9f-8f20f6746e42 Co-authored-by: garlontas <70283087+garlontas@users.noreply.github.com>

Rename parameter d to data in __dict_to_namedtuple

1d61f1e

Agent-Logs-Url: https://github.com/pickwicksoft/pystreamapi/sessions/1a3f86a6-3bc8-41dc-9d9f-8f20f6746e42 Co-authored-by: garlontas <70283087+garlontas@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add data loader for TOML files~~ Add TOML data loader Apr 13, 2026

Copilot AI requested a review from garlontas April 13, 2026 12:40

Copilot finished work on behalf of garlontas April 13, 2026 12:40

garlontas approved these changes Apr 13, 2026


garlontas added 3 commits April 13, 2026 15:03

Add toml_loader to poetry extras and update all

9479471

Reorder import statements in __toml_loader.py

eef1ffd

Add tomlkit to dependencies

b2cc6f0

Copilot started work on behalf of garlontas April 13, 2026 13:12 View session

Regenerate poetry.lock to match updated pyproject.toml

b74f77c

Agent-Logs-Url: https://github.com/pickwicksoft/pystreamapi/sessions/201e043c-1007-4214-bfe2-55563c47281f Co-authored-by: garlontas <70283087+garlontas@users.noreply.github.com>

Copilot finished work on behalf of garlontas April 13, 2026 13:16

Copilot AI requested a review from garlontas April 13, 2026 13:16

garlontas marked this pull request as ready for review April 13, 2026 13:23

garlontas merged commit 393abc7 into main Apr 13, 2026
10 checks passed

garlontas deleted the copilot/add-data-loader-for-toml branch April 13, 2026 13:23

sourcery-ai bot reviewed Apr 13, 2026

garlontas merged 7 commits into main from copilot/add-data-loader-for-toml
