Skip to content

Conversation

@moonmile
Copy link
Contributor

@moonmile moonmile commented Dec 20, 2025

フローチャートの矢印がうまく追えていなかったので、Shape の構造を修正しました。

  • name: str | null // Excel shape Name プロパティ
  • begin_connected_shape: str | null // コネクタ始点の接続先 Shape.Name
  • end_connected_shape: str | null // コネクタ終点の接続先 Shape.Name

を追加しています。

image image image

こんな感じに、フローチャートの矢印が mermaid 上でも繋がるようになります。

Summary by CodeRabbit

  • New Features

    • Shapes now include a nullable sequential id and expose begin_id/end_id to link connector endpoints by id.
  • Documentation

    • Added sample diagram and updated data model and changelog entries describing id-based connector references.
  • Tests

    • Expanded tests to validate shape ids, connector begin_id/end_id wiring, and updated fixtures to include ids.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 20, 2025

Walkthrough

Adds per-sheet sequential shape identifiers and connector endpoint references (id, begin_id, end_id); extracts connector-linked shape names during parsing, maps names to ids, and emits begin_id/end_id on Shape objects. Schemas, docs, samples, and tests updated to reflect id-based connector wiring and version bumps.

Changes

Cohort / File(s) Summary
Documentation & Changelog
docs/agents/DATA_MODEL.md
Bumped versions (0.9 → 0.10 → 0.11) and documented new Shape fields id, begin_id, end_id; explained connector endpoint semantics and nullability.
Schemas
schemas/shape.json, schemas/sheet.json, schemas/workbook.json
Added nullable integer properties id, begin_id, end_id to Shape schemas (default null) with titles/descriptions for connector mapping.
Core implementation
src/exstruct/core/shapes.py
Collects excel shape names, assigns sequential per-sheet id for non-relationship shapes, captures connector Begin/End names, accumulates pending connections, then resolves and writes begin_id/end_id using name→id map.
Public models
src/exstruct/models/__init__.py
Exposed new Shape attributes: id, begin_id, end_id on the public model.
Samples & Documentation example
sample/sample-shape-connector.json, sample/sample-shape-connector.md
Added sample workbook JSON with shapes/connectors and a Mermaid diagram markdown describing nodes and directed connections.
Tests
tests/com/test_shapes_extraction.py, tests/test_engine.py, tests/test_export_requirements.py, tests/test_models_validation.py, tests/test_print_area_views.py
Updated tests and fixtures to construct Shapes with id; added test coverage for connector wiring (begin_id/end_id), uniqueness/existence assertions for ids, and resilient behavior when ConnectorFormat properties are absent.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review mapping/resolution logic in src/exstruct/core/shapes.py for edge cases (duplicate names, missing names, ordering of assignment).
  • Verify consistency between model fields (src/exstruct/models/__init__.py) and all updated JSON schema files.
  • Check tests that assert id uniqueness and connector wiring for flakiness on platforms/Excel variations.

Possibly related issues

Poem

🐰 I hopped through sheets and gave each node a name,

I counted every shape and numbered them the same,
Connectors now point to ids, not just to text,
I stitched begin to end so diagrams link next,
Hop, skip, a tiny dance — the graph is less perplexed. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: adding connector source/destination fields to Shape model to enable flow tracking in flowcharts.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/exstruct/core/shapes.py (1)

221-242: Good defensive error handling for connector extraction.

The nested try/except structure correctly ensures:

  1. Connector shapes that don't have ConnectorFormat are gracefully skipped
  2. Partial extraction succeeds if only one end is connected

One minor readability nit: the variable name on lines 227 and 235 shadows the outer shape_name. Consider renaming to begin_name / end_name to avoid potential confusion.

🔎 Optional: Rename inner variables to avoid shadowing
                             try:
                                 begin_shape = connector.BeginConnectedShape
                                 if begin_shape is not None:
-                                    name = getattr(begin_shape, "Name", None)
-                                    if isinstance(name, str):
-                                        shape_obj.begin_connected_shape = name
+                                    begin_name = getattr(begin_shape, "Name", None)
+                                    if isinstance(begin_name, str):
+                                        shape_obj.begin_connected_shape = begin_name
                             except Exception:
                                 pass
                             try:
                                 end_shape = connector.EndConnectedShape
                                 if end_shape is not None:
-                                    name = getattr(end_shape, "Name", None)
-                                    if isinstance(name, str):
-                                        shape_obj.end_connected_shape = name
+                                    end_name = getattr(end_shape, "Name", None)
+                                    if isinstance(end_name, str):
+                                        shape_obj.end_connected_shape = end_name
                             except Exception:
                                 pass
tests/com/test_shapes_extraction.py (2)

48-54: Consider narrowing the exception handler scope.

The broad except Exception catch might mask unexpected errors during connector wiring. If there are known specific exceptions that can occur in some environments (e.g., COMError), consider catching those explicitly instead.

🔎 Example of more targeted exception handling
        try:
            connector.ConnectorFormat.BeginConnect(src_shape, 1)
            connector.ConnectorFormat.EndConnect(dst_shape, 1)
-       except Exception:
+       except (AttributeError, RuntimeError) as e:
            # In some environments connector wiring may fail; tests will
            # simply not find connected shapes in that case.
+           # Known issue: {type(e).__name__}
            pass

Note: Adjust exception types based on actual failures observed in different environments.


82-94: Consider extracting the filtering predicate for clarity.

The nested boolean logic spanning 11 lines makes the assertion harder to parse. Extracting it into a named helper function or variable would improve readability.

🔎 Example refactoring for improved readability
+   def is_non_relationship_autoshape(s):
+       has_no_text = s.text == "" or s.text is None
+       is_autoshape = (s.type or "").startswith("AutoShape")
+       has_relationship = (
+           s.direction
+           or s.begin_arrow_style is not None
+           or s.end_arrow_style is not None
+           or s.begin_connected_shape is not None
+           or s.end_connected_shape is not None
+       )
+       return has_no_text and is_autoshape and not has_relationship
+
    # Standard mode should not emit non-relationship AutoShapes without text.
-   assert not any(
-       (s.text == "" or s.text is None)
-       and (s.type or "").startswith("AutoShape")
-       and not (
-           s.direction
-           or s.begin_arrow_style is not None
-           or s.end_arrow_style is not None
-           or s.begin_connected_shape is not None
-           or s.end_connected_shape is not None
-       )
-       for s in shapes
-   )
+   assert not any(is_non_relationship_autoshape(s) for s in shapes)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0f1fe9 and 6b8e594.

⛔ Files ignored due to path filters (1)
  • sample/sample-shape-connector.xlsx is excluded by !**/*.xlsx
📒 Files selected for processing (7)
  • docs/agents/DATA_MODEL.md (4 hunks)
  • sample/sample-shape-connector.json (1 hunks)
  • sample/sample-shape-connector.md (1 hunks)
  • schemas/shape.json (2 hunks)
  • src/exstruct/core/shapes.py (2 hunks)
  • src/exstruct/models/__init__.py (2 hunks)
  • tests/com/test_shapes_extraction.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/com/test_shapes_extraction.py (1)
src/exstruct/core/integrate.py (1)
  • extract_workbook (290-388)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (9)
sample/sample-shape-connector.md (1)

1-45: LGTM! Good sample documentation for connector flowcharts.

The Mermaid diagram clearly demonstrates the new connector functionality with proper node definitions and flow connections. The loop section effectively shows cyclic relationships.

Minor: Line 45 has trailing whitespace after the closing code fence.

src/exstruct/models/__init__.py (1)

15-17: LGTM! Well-structured model extensions.

The new optional fields are properly defined with:

  • Correct str | None type annotations
  • Appropriate default=None values
  • Clear descriptions referencing the Excel ConnectorFormat API

Field placement is logical, keeping connector-related fields together.

Also applies to: 32-43

sample/sample-shape-connector.json (1)

1-313: LGTM! Comprehensive sample data for connector shapes.

The sample effectively demonstrates:

  • Shape names following Excel conventions
  • Connector relationships via begin_connected_shape / end_connected_shape
  • Various connector types (straight arrows, curved connectors)
  • Direction and arrow style metadata

The data aligns well with the Mermaid diagram in sample-shape-connector.md.

src/exstruct/core/shapes.py (1)

173-173: LGTM!

The isinstance(shape_name, str) check properly handles cases where getattr returns None or non-string values.

docs/agents/DATA_MODEL.md (1)

3-3: LGTM! Documentation properly updated.

The data model documentation is correctly updated with:

  • Version bump to 0.9
  • All three new Shape fields documented with proper types
  • Clear explanatory notes linking to Excel ConnectorFormat API
  • Appropriate changelog entry

The documentation maintains consistency with the code implementation.

Also applies to: 21-21, 30-31, 39-40, 194-194

tests/com/test_shapes_extraction.py (2)

76-76: LGTM!

The name field assertions correctly verify that Excel shape names are being captured.

Also applies to: 81-81


113-138: Well-structured connector validation test.

The test properly handles environment variability with pytest.skip and validates that connector metadata is correctly extracted and references valid shape names.

schemas/shape.json (2)

5-30: LGTM! Connector endpoint properties are well-defined.

The schema correctly defines begin_connected_shape and end_connected_shape as optional string fields with appropriate descriptions referencing the Excel COM properties.


121-133: LGTM! Shape name property is correctly defined.

The schema properly adds the name field as an optional string property with a clear description.

```jsonc
Shape {
text: str
name: str | null // Excel shape Name プロパティ
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id: int にしてコンテキスト量を削減できそうです。

rotation: float | null
begin_arrow_style: int | null
end_arrow_style: int | null
begin_connected_shape: str | null // コネクタ始点の接続先 Shape.Name
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLMが推論できそうならbegin_idくらい削りたいです!

Copy link
Owner

@harumiWeb harumiWeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo
PRありがとうございます!
矢印の前後関係を明示的に取得する方向性自体は、フローチャート認識精度の観点で非常に良いと思います。
実際にLLMに推論させた結果までて提示いただいて非常に助かります!

一方で、本ライブラリは RAG / LLM 投入前提の構造化データ生成を目的としているため、
出力コンテキスト量(トークン数)を強く意識しています。

現状の実装では、

  • 矢印オブジェクト側
  • 接続先図形側

の双方に name を持たせており、情報が重複しているように見えます。

以下のいずれかの形に整理できないか、ご検討いただけますでしょうか。

  • シートごとに図形に一意の整数IDを振り、name は冗長に持たない
  • もしくは name はオプション扱い(フラグ指定時のみ出力)

「フローチャート構造の推定に最低限必要な情報は何か」
という観点で、少し情報量を削減した形に調整いただけると助かります。

@moonmile
Copy link
Contributor Author

  • シートごとに図形に一意の整数IDを振り、name は冗長に持たない
  • もしくは name はオプション扱い(フラグ指定時のみ出力)

「フローチャート構造の推定に最低限必要な情報は何か」 という観点で、少し情報量を削減した形に調整いただけると助かります。

なるほど、了解です。
現状 max で設定を入れてしまったので、必要な分だけに絞ってみます。

  • name: string を id: int に変更
  • begin_connected_shape, end_connected_shape をbegin_id, end_id
  • "Arrow" っぽいものに id が不要

あたりで、修正してみます。

@moonmile
Copy link
Contributor Author

ひとまず、

  • name: string を id: int に変更
  • begin_connected_shape, end_connected_shape をbegin_id, end_id
  • AutoShape-Mixed の矢印 Shape では id 付与しない。

のバージョンです。

image

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/com/test_shapes_extraction.py (1)

85-96: Consider simplifying this complex assertion.

The nested condition with multiple or clauses is hard to parse. Consider extracting to a helper function for readability.

🔎 Suggested refactor
def _is_connector_like(s: Shape) -> bool:
    """Return True if shape has connector-related properties."""
    return bool(
        s.direction
        or s.begin_arrow_style is not None
        or s.end_arrow_style is not None
        or s.begin_id is not None
        or s.end_id is not None
    )

# Then use:
assert not any(
    (s.text == "" or s.text is None)
    and (s.type or "").startswith("AutoShape")
    and not _is_connector_like(s)
    for s in shapes
)
src/exstruct/core/shapes.py (1)

174-186: DRY violation: is_relationship_geom logic duplicates _should_include_shape.

The relationship detection logic here mirrors lines 91-103 in _should_include_shape. Consider extracting this into a shared helper to avoid divergence.

🔎 Proposed refactor

Extract a helper function:

def _is_relationship_shape(
    *,
    shape_type_num: int | None,
    shape_type_str: str | None,
    autoshape_type_str: str | None,
    shape_name: str | None,
) -> bool:
    """Return True if the shape represents a line/arrow/connector."""
    if shape_type_num in (3, 9):
        return True
    if autoshape_type_str and (
        "Arrow" in autoshape_type_str or "Connector" in autoshape_type_str
    ):
        return True
    if shape_type_str and (
        "Connector" in shape_type_str or shape_type_str in ("Line", "ConnectLine")
    ):
        return True
    if shape_name and ("Connector" in shape_name or "Line" in shape_name):
        return True
    return False

Then use it in both _should_include_shape and get_shapes_with_position.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b8e594 and 646d2b1.

📒 Files selected for processing (13)
  • docs/agents/DATA_MODEL.md (4 hunks)
  • sample/sample-shape-connector.json (1 hunks)
  • sample/sample-shape-connector.md (1 hunks)
  • schemas/shape.json (1 hunks)
  • schemas/sheet.json (3 hunks)
  • schemas/workbook.json (3 hunks)
  • src/exstruct/core/shapes.py (5 hunks)
  • src/exstruct/models/__init__.py (2 hunks)
  • tests/com/test_shapes_extraction.py (3 hunks)
  • tests/test_engine.py (1 hunks)
  • tests/test_export_requirements.py (1 hunks)
  • tests/test_models_validation.py (2 hunks)
  • tests/test_print_area_views.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • sample/sample-shape-connector.json
  • sample/sample-shape-connector.md
🧰 Additional context used
🧬 Code graph analysis (6)
tests/com/test_shapes_extraction.py (2)
tests/com/test_charts_extraction.py (1)
  • _ensure_excel (11-16)
src/exstruct/core/integrate.py (1)
  • extract_workbook (290-388)
tests/test_print_area_views.py (1)
src/exstruct/models/__init__.py (1)
  • Shape (11-46)
tests/test_engine.py (1)
src/exstruct/models/__init__.py (1)
  • Shape (11-46)
tests/test_models_validation.py (1)
src/exstruct/models/__init__.py (1)
  • Shape (11-46)
tests/test_export_requirements.py (1)
src/exstruct/models/__init__.py (1)
  • Shape (11-46)
src/exstruct/core/shapes.py (1)
src/exstruct/models/__init__.py (1)
  • Shape (11-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (14)
schemas/sheet.json (1)

251-263: Schema additions for shape identification and connector wiring look good.

The new id, begin_id, and end_id fields are properly defined as nullable integers with appropriate defaults and descriptions. The schema structure is consistent with the Python model definitions in src/exstruct/models/__init__.py.

Also applies to: 277-289, 313-325

tests/test_export_requirements.py (1)

65-65: LGTM!

The test correctly exercises Shape construction with the new id field, ensuring JSON roundtrip serialization works with the updated model.

tests/test_print_area_views.py (1)

9-10: LGTM!

The test correctly assigns distinct id values to the two shapes, ensuring proper testing of shape identification within print area views.

tests/test_models_validation.py (1)

15-15: LGTM!

The test updates correctly include the id field in Shape construction while continuing to validate model defaults and literal field validation.

Also applies to: 51-51

tests/test_engine.py (1)

39-39: LGTM!

The _sample_workbook helper correctly includes the id field, ensuring all dependent tests exercise the updated Shape model.

src/exstruct/models/__init__.py (1)

14-16: LGTM!

The new id, begin_id, and end_id fields are well-defined with appropriate types, defaults, and descriptions. The field ordering logically groups connector-related fields together.

Also applies to: 32-43

schemas/workbook.json (1)

251-263: LGTM!

The Shape definition in the workbook schema is consistent with schemas/sheet.json and the Python model, ensuring schema coherence across the project.

Also applies to: 277-289, 313-325

tests/com/test_shapes_extraction.py (3)

39-54: Good defensive coding for connector setup.

The try/except block appropriately handles environments where Excel cannot wire connectors, allowing tests to gracefully degrade rather than fail unexpectedly.


76-83: LGTM!

The assertions properly validate that:

  1. Shape ids are positive integers
  2. All shape ids within a sheet are unique

115-140: Thorough connector validation test.

The test properly:

  • Skips gracefully when Excel cannot populate ConnectorFormat properties
  • Validates that begin_id and end_id are distinct
  • Confirms referenced ids correspond to actual emitted shapes
docs/agents/DATA_MODEL.md (1)

16-41: Shape model documentation is clear and well-structured.

The documentation accurately reflects the new id, begin_id, and end_id fields with appropriate explanations of their purpose and nullability semantics. The notes about connectors referencing shape IDs via ConnectorFormat.BeginConnectedShape / EndConnectedShape provide helpful context.

src/exstruct/core/shapes.py (2)

232-253: Nested exception handling for COM API is appropriate.

The per-property try/except blocks for BeginConnectedShape and EndConnectedShape are well-structured. COM APIs can throw on unconnected connectors, so isolating each access prevents one failure from masking the other.


118-120: Deferred wiring approach is sound.

Using pending_connections to collect shapes first, then resolving begin_id/end_id after building the complete name_to_id map is a clean two-pass solution that handles forward references correctly.

Also applies to: 265-273

schemas/shape.json (1)

5-17: Schema definitions for new fields are correct.

The id field schema correctly uses anyOf with integer/null types, matching the Pydantic model definition in src/exstruct/models/__init__.py. The description aligns with the documented behavior.

# ExStruct データモデル仕様

**Version**: 0.8
**Version**: 0.10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Version header inconsistent with changelog.

The document header states Version 0.10, but the changelog documents changes through 0.11 (connector field rename to begin_id / end_id). Consider updating the version to 0.11 for consistency.

🔎 Proposed fix
-**Version**: 0.10  
+**Version**: 0.11  

Also applies to: 194-196

🤖 Prompt for AI Agents
In docs/agents/DATA_MODEL.md around lines 3 and also lines 194-196, the Version
header and corresponding mention still show 0.10 while the changelog documents
changes up to 0.11; update the header and any version references at those
locations to 0.11 so the document version matches the changelog (verify there
are no other scattered 0.10 occurrences and update them to 0.11 as needed).

Comment on lines +18 to +43
"begin_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
"title": "Begin Id"
},
"end_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
"title": "End Id"
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent indentation for begin_id and end_id properties.

The begin_id and end_id property definitions use 8-space indentation while id and other properties use 4-space indentation. This inconsistency should be corrected for uniformity.

🔎 Proposed fix
-        "begin_id": {
-          "anyOf": [
-            {
-              "type": "integer"
-            },
-            {
-              "type": "null"
-            }
-          ],
-          "default": null,
-          "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
-          "title": "Begin Id"
-        },
-        "end_id": {
-          "anyOf": [
-            {
-              "type": "integer"
-            },
-            {
-              "type": "null"
-            }
-          ],
-          "default": null,
-          "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
-          "title": "End Id"
-        },
+    "begin_id": {
+      "anyOf": [
+        {
+          "type": "integer"
+        },
+        {
+          "type": "null"
+        }
+      ],
+      "default": null,
+      "description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
+      "title": "Begin Id"
+    },
+    "end_id": {
+      "anyOf": [
+        {
+          "type": "integer"
+        },
+        {
+          "type": "null"
+        }
+      ],
+      "default": null,
+      "description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
+      "title": "End Id"
+    },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"begin_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
"title": "Begin Id"
},
"end_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
"title": "End Id"
},
"begin_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape).",
"title": "Begin Id"
},
"end_id": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Shape id at the end of a connector (ConnectorFormat.EndConnectedShape).",
"title": "End Id"
},
🤖 Prompt for AI Agents
In schemas/shape.json around lines 18 to 43 the begin_id and end_id property
blocks are indented with 8 spaces while other properties use 4 spaces; fix by
reformatting those blocks (their keys, inner fields like
anyOf/type/default/description/title, and closing braces) to use the same
4-space indentation as surrounding properties so the file formatting is
consistent.

Comment on lines +267 to +268
if pending_connections:
name_to_id = {name: sid for name, sid in excel_names}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's see if we can find the shapes.py file and read it around lines 267-268
find . -name "shapes.py" -path "*/core/*" -type f 2>/dev/null | head -5

Repository: harumiWeb/exstruct

Length of output: 91


🏁 Script executed:

# Read the file with context around lines 267-268 (let's get lines 250-290)
sed -n '250,290p' ./src/exstruct/core/shapes.py | cat -n

Repository: harumiWeb/exstruct

Length of output: 1450


🏁 Script executed:

# Get more context - let's find the function definition
# First, get line 250 and trace back to find the function start
head -n 270 ./src/exstruct/core/shapes.py | tail -n 100 | head -n 50

Repository: harumiWeb/exstruct

Length of output: 2193


🏁 Script executed:

# Let's find the function name and see the full function with line numbers
grep -n "def " ./src/exstruct/core/shapes.py | grep -B5 -A5 "267\|268" || grep -n "excel_names" ./src/exstruct/core/shapes.py

Repository: harumiWeb/exstruct

Length of output: 254


🏁 Script executed:

# Get the function definition - find the function that contains line 118
sed -n '100,130p' ./src/exstruct/core/shapes.py

Repository: harumiWeb/exstruct

Length of output: 1203


🏁 Script executed:

# Let's see the code around line 210 where items are appended to excel_names
sed -n '200,220p' ./src/exstruct/core/shapes.py

Repository: harumiWeb/exstruct

Length of output: 1019


🏁 Script executed:

# Now let's check if there are any tests for duplicate name handling
find . -name "*test*.py" -o -name "test_*" -type f | xargs grep -l "duplicate.*name\|name.*duplicate" 2>/dev/null

Repository: harumiWeb/exstruct

Length of output: 46


🏁 Script executed:

# Let's search for test files in the repository
find . -name "*test*.py" -type f 2>/dev/null | head -10

Repository: harumiWeb/exstruct

Length of output: 374


🏁 Script executed:

# Check the shapes extraction test for any duplicate name handling
grep -n "duplicate\|name" ./tests/com/test_shapes_extraction.py | head -20

Repository: harumiWeb/exstruct

Length of output: 92


🏁 Script executed:

# Let's read the entire shapes extraction test file to understand how it works
cat ./tests/com/test_shapes_extraction.py

Repository: harumiWeb/exstruct

Length of output: 4570


🏁 Script executed:

# Let's check if there are any comments or documentation about duplicate handling
grep -n -A5 -B5 "name_to_id\|excel_names" ./src/exstruct/core/shapes.py | head -40

Repository: harumiWeb/exstruct

Length of output: 1742


🏁 Script executed:

# Let's check if there are any safeguards or validation for duplicate names elsewhere
grep -n "shape_name\|excel_name" ./src/exstruct/core/shapes.py | grep -v "^Binary"

Repository: harumiWeb/exstruct

Length of output: 892


🏁 Script executed:

# Let's check the entire function to understand the flow better
sed -n '111,275p' ./src/exstruct/core/shapes.py

Repository: harumiWeb/exstruct

Length of output: 7487


🏁 Script executed:

# Let's check if there are any related issues or documentation about this
find . -name "*.md" -o -name "*.txt" -o -name "*.rst" | xargs grep -l "duplicate\|shape.*name" 2>/dev/null | head -5

Repository: harumiWeb/exstruct

Length of output: 44


Consider handling potential duplicate shape names in the dict comprehension.

The dict comprehension at line 268 silently overwrites duplicate shape names with the last occurrence. While Excel allows duplicate names, and duplicate shape names can be created through copy and paste operations, this silent overwrite could cause connectors to incorrectly resolve to the wrong shape IDs. Either validate that shape names are unique when building excel_names, log a warning on duplicates, or use the first occurrence instead of the last.

🤖 Prompt for AI Agents
In src/exstruct/core/shapes.py around lines 267-268, the dict comprehension
name_to_id = {name: sid for name, sid in excel_names} silently overwrites
duplicate shape names; update this code to detect and handle duplicates instead
of always keeping the last occurrence: iterate through excel_names in order, if
a name is not yet in name_to_id insert it (to keep first occurrence) or
alternatively detect when a name is already present and log a warning (use the
module logger) listing the duplicated name and both IDs; if you prefer
strictness, raise a ValueError when duplicates are found. Ensure tests or
callers still receive a deterministic mapping and that the chosen behavior
(keep-first, warn, or error) is clearly implemented and logged.

@harumiWeb
Copy link
Owner

ローカルで挙動を確認しましたが、完璧にフローチャート推論ができており、
現状の実装で問題ありません。
フローチャート認識精度向上に寄与する非常に素晴らしい改善です!
PRありがとうございます。マージします。

@harumiWeb harumiWeb merged commit c159b02 into harumiWeb:main Dec 21, 2025
9 checks passed
@harumiWeb harumiWeb mentioned this pull request Dec 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants