
create_node|edge_attr_index for SQLGraph#223

Merged
JoOkuma merged 79 commits into royerlab:main from yfukai:sql_indexing on Jan 12, 2026

@yfukai
Contributor

@yfukai yfukai commented Dec 10, 2025

This pull request introduces a new feature for the SQLGraph backend: the ability to create explicit database indexes on node and edge attributes to improve query performance, especially for frequently filtered attributes. The documentation and tests have been updated to reflect and validate this functionality.
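At the SQL level, an index lets the database jump straight to matching rows instead of scanning the whole table. The sketch below uses Python's built-in sqlite3 module and a hypothetical nodes table (an assumption, not SQLGraph's actual schema) to show how SQLite's EXPLAIN QUERY PLAN output changes once an index on attr1 exists.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query-plan details for `sql` as a single string."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    # Each plan row has four columns; the last one is the human-readable detail.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table layout, for illustration only.
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, t INTEGER, attr1 INTEGER)")
conn.executemany(
    "INSERT INTO nodes (t, attr1) VALUES (?, ?)",
    [(i, i % 100) for i in range(10_000)],
)

query = "SELECT * FROM nodes WHERE attr1 = 0"
plan_before = query_plan(conn, query)  # expected: a full table scan

conn.execute("CREATE INDEX ix_nodes_attr1 ON nodes (attr1)")
plan_after = query_plan(conn, query)  # expected: a search using ix_nodes_attr1

print("without index:", plan_before)
print("with index:   ", plan_after)
```

Without the index SQLite reports a SCAN of the table; with it, a SEARCH through ix_nodes_attr1, which is where the speed-up for repeated filters comes from.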

SQLGraph indexing improvements:

  • Added methods ensure_node_attr_index and ensure_edge_attr_index (renamed create_node_attr_index and create_edge_attr_index before merging) to SQLGraph for creating indexes on node and edge attribute columns, including support for composite and unique indexes. (src/tracksdata/graph/_sql_graph.py)
  • Updated documentation to describe the new index feature and provide usage examples for creating indexes on attributes. (docs/concepts.md)
  • Updated the project README to mention SQLGraph's ability to index frequently queried attributes for faster filtering. (README.md)
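To make the composite and unique variants concrete, here is a minimal sketch of what such a method could do underneath, written against plain sqlite3. The helper name create_attr_index, its index-naming scheme, and the use of ValueError for missing columns are all assumptions for illustration, not the actual SQLGraph implementation.

```python
import sqlite3
from collections.abc import Sequence


def create_attr_index(
    conn: sqlite3.Connection,
    table: str,
    columns: Sequence[str],
    unique: bool = False,
) -> str:
    """Hypothetical helper: index `columns` of `table` and return the index name."""
    if not columns:
        raise ValueError("at least one column is required")
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    missing = [col for col in columns if col not in existing]
    if missing:
        raise ValueError(f"columns not found in {table!r}: {missing}")

    prefix = "ux" if unique else "ix"
    name = f"{prefix}_{table}_{'_'.join(columns)}"
    cols_sql = ", ".join(f'"{col}"' for col in columns)
    unique_sql = "UNIQUE " if unique else ""
    # IF NOT EXISTS turns a repeated call into a no-op instead of an error.
    conn.execute(
        f'CREATE {unique_sql}INDEX IF NOT EXISTS "{name}" ON "{table}" ({cols_sql})'
    )
    return name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, t INTEGER, attr1 INTEGER)")
composite_name = create_attr_index(conn, "nodes", ["t", "attr1"])
unique_name = create_attr_index(conn, "nodes", ["t", "attr1"], unique=True)
```

A unique index also acts as a constraint: once it exists, inserting a second row with the same (t, attr1) pair fails with sqlite3.IntegrityError.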

Testing and validation:

  • Added tests to ensure index creation works as expected, including checks for composite and unique indexes, and error handling for missing columns. (src/tracksdata/graph/_test/test_graph_backends.py)
  • Added sqlalchemy import to support index inspection in tests. (src/tracksdata/graph/_test/test_graph_backends.py)
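The PR's tests inspect the created indexes through SQLAlchemy. A comparable check is possible with SQLite's own PRAGMA index_list and PRAGMA index_info; the sketch below (the edges table and index names are hypothetical) verifies the unique flag and the column order of a composite index.

```python
import sqlite3


def describe_indexes(conn: sqlite3.Connection, table: str) -> dict[str, tuple[bool, list[str]]]:
    """Map each index on `table` to (is_unique, column names in index order)."""
    result = {}
    # PRAGMA index_list rows start with (seq, name, unique, ...).
    for row in conn.execute(f'PRAGMA index_list("{table}")'):
        name, is_unique = row[1], bool(row[2])
        # PRAGMA index_info rows are (seqno, cid, name); sort by seqno for column order.
        info = conn.execute(f'PRAGMA index_info("{name}")').fetchall()
        columns = [col for _, _, col in sorted(info)]
        result[name] = (is_unique, columns)
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edges (edge_id INTEGER PRIMARY KEY, source_id INTEGER, target_id INTEGER, weight REAL)"
)
conn.execute("CREATE UNIQUE INDEX ux_edges_source_target ON edges (source_id, target_id)")
conn.execute("CREATE INDEX ix_edges_weight ON edges (weight)")

indexes = describe_indexes(conn, "edges")
print(indexes)
```

Column order matters for composite indexes: an index on (source_id, target_id) can serve filters on source_id alone, but not on target_id alone.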

JoOkuma and others added 30 commits November 5, 2025 10:56
Co-authored-by: Jordão Bragantini <jordao.bragantini@gmail.com>
@codecov-commenter

codecov-commenter commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.50%. Comparing base (c5af9f3) to head (72000bd).
⚠️ Report is 7 commits behind head on main.

Files with missing lines              Patch %   Lines
src/tracksdata/graph/_sql_graph.py    83.33%    3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #223      +/-   ##
==========================================
+ Coverage   88.45%   88.50%   +0.04%     
==========================================
  Files          55       55              
  Lines        3890     3993     +103     
  Branches      674      700      +26     
==========================================
+ Hits         3441     3534      +93     
- Misses        267      275       +8     
- Partials      182      184       +2     
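The totals in the diff are internally consistent: tracked lines split into hits, misses, and partials, and coverage is the share of hits. The sketch below assumes Codecov's ratio is hits / (hits + misses + partials) and that the report truncates (rather than rounds) to two decimals; both are assumptions, but together they reproduce the 88.45%, 88.50%, and +0.04% figures above.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage of tracked lines (assumed Codecov definition)."""
    return 100 * hits / (hits + misses + partials)


def truncate2(value: float) -> float:
    """Truncate toward zero at two decimals, as the report appears to do."""
    return math.trunc(value * 100) / 100


base = {"hits": 3441, "misses": 267, "partials": 182}
head = {"hits": 3534, "misses": 275, "partials": 184}

base_lines = sum(base.values())
head_lines = sum(head.values())
base_cov = coverage_pct(**base)
head_cov = coverage_pct(**head)

print(f"base: {base_lines} lines, {truncate2(base_cov):.2f}%")
print(f"head: {head_lines} lines, {truncate2(head_cov):.2f}%")
print(f"delta: +{truncate2(head_cov - base_cov):.2f}%")
```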


@yfukai
Contributor Author

yfukai commented Dec 10, 2025

Maybe we also need functions to drop indexes.
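A drop counterpart could mirror index creation. Below is a minimal sketch with sqlite3; the helper name drop_attr_index is hypothetical and not part of the SQLGraph API described in this PR. DROP INDEX IF EXISTS keeps it safe to call when the index is already gone.

```python
import sqlite3


def drop_attr_index(conn: sqlite3.Connection, index_name: str) -> None:
    """Hypothetical helper: drop an index by name; no-op if it does not exist."""
    conn.execute(f'DROP INDEX IF EXISTS "{index_name}"')


def index_names(conn: sqlite3.Connection) -> set[str]:
    """Names of all indexes recorded in SQLite's schema table."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    return {name for (name,) in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, attr1 INTEGER)")
conn.execute("CREATE INDEX ix_nodes_attr1 ON nodes (attr1)")
before = index_names(conn)

drop_attr_index(conn, "ix_nodes_attr1")
drop_attr_index(conn, "ix_nodes_attr1")  # second call is a no-op
after = index_names(conn)
```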

Member

@JoOkuma JoOkuma left a comment

@yfukai, awesome PR, I hadn't thought of adding this.

Comment thread docs/concepts.md Outdated
Comment on lines +25 to +26
SQLGraph lets you create indexes on node or edge attributes to keep repeated
filters fast:
Member

@yfukai this is awesome. Could you briefly mention what kind of speed-up we can expect with this? 2x, 10x?

Contributor Author

I benchmarked the performance and added the result to the doc!

Comment thread src/tracksdata/graph/_sql_graph.py Outdated
Contributor Author

@yfukai yfukai left a comment

Benchmarked the performance improvement from indexing. Code:

import tempfile
import time

import tracksdata as td

if __name__ == "__main__":
    for node_count in [1_000_000, 100_000_000]:
        print(f"\nBenchmarking SQLGraph with {node_count} nodes")
        # delete=False keeps the file on disk after the handle closes, so SQLGraph can open it by path.
        graph_db_file = tempfile.NamedTemporaryFile(suffix=".db", delete=False).name
        graph: td.graph.SQLGraph = td.graph.SQLGraph(
            drivername="sqlite",
            database=graph_db_file,
            overwrite=True,
        )
        # Register "attr1" with default value 0, then insert all nodes in one bulk call.
        # Note: the list comprehension materializes every node dict in memory first.
        graph.add_node_attr_key("attr1", 0)
        graph.bulk_add_nodes([{td.DEFAULT_ATTR_KEYS.T: i, "attr1": i % 100} for i in range(node_count)])
        print("Finished adding nodes.")

        # Filter on attr1 before indexing (1% of nodes have attr1 == 0).
        start_time = time.perf_counter()
        filtered_graph = graph.filter(td.NodeAttr("attr1") == 0).subgraph()
        time_without_index = time.perf_counter() - start_time
        print(f"Time to filter nodes without index: {time_without_index:.2f} seconds")

        # This method was called ensure_node_attr_index when the benchmark was first posted.
        graph.create_node_attr_index("attr1")

        # Same filter again; note the second run may also benefit from a warm cache.
        start_time = time.perf_counter()
        filtered_graph = graph.filter(td.NodeAttr("attr1") == 0).subgraph()
        time_with_index = time.perf_counter() - start_time
        print(f"Time to filter nodes with index: {time_with_index:.2f} seconds")
        print(f"Speedup factor: {time_without_index / time_with_index:.2f}x")

@JoOkuma
Member

JoOkuma commented Jan 9, 2026

@yfukai, that's an amazing speedup!
One last comment: do you think we could replace "ensure" with "set" in the method names for clarity?
I can make the change myself if you agree but are busy.
The docs already state that nothing happens if an attribute is already indexed.
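The "nothing happens if already indexed" behavior corresponds to SQL's IF NOT EXISTS clause. This is one way such a guarantee can be implemented, not necessarily how SQLGraph does it; the small sqlite3 sketch below (hypothetical table and index names) contrasts the two forms of CREATE INDEX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, attr1 INTEGER)")
conn.execute("CREATE INDEX ix_nodes_attr1 ON nodes (attr1)")

# Re-running with IF NOT EXISTS is a silent no-op.
conn.execute("CREATE INDEX IF NOT EXISTS ix_nodes_attr1 ON nodes (attr1)")

# Re-running without it fails because the index name is already taken.
plain_create_failed = False
try:
    conn.execute("CREATE INDEX ix_nodes_attr1 ON nodes (attr1)")
except sqlite3.OperationalError as exc:
    plain_create_failed = True
    print(f"plain CREATE INDEX failed: {exc}")

# Exactly one index with this name exists afterwards.
count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND name = 'ix_nodes_attr1'"
).fetchone()[0]
```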

@yfukai
Contributor Author

yfukai commented Jan 10, 2026

Sure! Can we use "create_{node|edge}_attr_index" then? That matches the actual SQL statement, CREATE INDEX.

@JoOkuma JoOkuma changed the title from "ensure_node|edge_attr_index for SQLGraph" to "create_node|edge_attr_index for SQLGraph" on Jan 12, 2026
@JoOkuma
Member

JoOkuma commented Jan 12, 2026

@yfukai, that's even better.
Merged, thanks for this PR.

@JoOkuma JoOkuma merged commit e4e820f into royerlab:main Jan 12, 2026
7 checks passed

Labels: None yet
Projects: None yet
3 participants