feat: integrate CoreML icon detection (OmniParser icon_detect_v1_5)#1

Open
scut204 wants to merge 2 commits into qdore:main from scut204:feat-omniparser-add

scut204 commented Apr 5, 2026

  • Add IconDetector class using CoreML + Vision for YOLOv8 icon detection
  • Add IoU/Containment dedup to prioritize AX elements over detected icons
  • Add IconElement model and wire into snapshot JSON output (iconElements)
  • Add icon elements to display pipeline with '#' marker and red debug boxes
  • Link CoreML framework in Swift bridge and CGO LDFLAGS
  • Include converted model_v1_5.mlpackage (38MB)
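  • The mlpackage was converted from the original model at https://huggingface.co/microsoft/OmniParser/tree/main/icon_detect_v1_5
  • This makes it possible to detect special buttons that AX does not expose as components, such as those in VS Code; for example, the icons below: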
image
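
For reviewers, the IoU/containment dedup step listed above can be sketched roughly as follows. This is only a minimal illustration in Go, not the code in this PR: the `Box` type, the function names, and the thresholds passed by the caller are all hypothetical. The idea is that a detected icon is dropped when it overlaps an AX element strongly (IoU) or lies mostly inside one (containment), so the AX element takes priority.

```go
package main

import "math"

// Box is a hypothetical axis-aligned rectangle in screen coordinates.
type Box struct{ X, Y, W, H float64 }

func (b Box) area() float64 { return b.W * b.H }

// intersection returns the overlapping area of a and b (0 if disjoint).
func intersection(a, b Box) float64 {
	w := math.Min(a.X+a.W, b.X+b.W) - math.Max(a.X, b.X)
	h := math.Min(a.Y+a.H, b.Y+b.H) - math.Max(a.Y, b.Y)
	if w <= 0 || h <= 0 {
		return 0
	}
	return w * h
}

// IoU is intersection area divided by union area.
func IoU(a, b Box) float64 {
	inter := intersection(a, b)
	union := a.area() + b.area() - inter
	if union <= 0 {
		return 0
	}
	return inter / union
}

// Containment is the fraction of the icon's area covered by the AX element.
func Containment(icon, ax Box) float64 {
	if icon.area() <= 0 {
		return 0
	}
	return intersection(icon, ax) / icon.area()
}

// DedupIcons keeps only the icons that no AX element already covers.
// The thresholds are assumptions chosen by the caller, not values from the PR.
func DedupIcons(icons, ax []Box, iouThresh, containThresh float64) []Box {
	kept := make([]Box, 0, len(icons))
	for _, ic := range icons {
		covered := false
		for _, e := range ax {
			if IoU(ic, e) >= iouThresh || Containment(ic, e) >= containThresh {
				covered = true
				break
			}
		}
		if !covered {
			kept = append(kept, ic)
		}
	}
	return kept
}
```

One caveat with this sketch: a containment check against large container elements (a whole window, say) would discard every icon inside them, so the AX list passed in would presumably need to be limited to leaf or actionable elements.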

tianyi and others added 2 commits April 4, 2026 22:01
- Add IconDetector class using CoreML + Vision for YOLOv8 icon detection
- Add IoU/Containment dedup to prioritize AX elements over detected icons
- Add IconElement model and wire into snapshot JSON output (iconElements)
- Add icon elements to display pipeline with '#' marker and red debug boxes
- Link CoreML framework in Swift bridge and CGO LDFLAGS
- Include converted model_v1_5.mlpackage (38MB)

qdore (Owner) commented Apr 8, 2026

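Right now there is only the detect model and no caption model? In that case the LLM has no way of knowing what each icon actually means, so this can't really work properly, can it?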
