Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 18 additions & 49 deletions .github/workflows/issue-arborist.lock.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

41 changes: 33 additions & 8 deletions .github/workflows/issue-arborist.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,21 +29,46 @@ steps:
- name: Fetch issues data
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_AW_ORIGINAL_GITHUB_API_URL: ${{ github.api_url }}
GH_AW_GITHUB_REPOSITORY: ${{ github.repository }}
run: |
# Create output directory
mkdir -p /tmp/gh-aw/issues-data

echo "⬇ Downloading the last 100 open issues (excluding sub-issues)..."

# Fetch the last 100 open issues that don't have a parent issue
gh issue list --repo "$GH_AW_GITHUB_REPOSITORY" \
--search "-parent-issue:*" \
--state open \
--json number,title,author,createdAt,state,url,body,labels,updatedAt,closedAt,milestone,assignees \
--limit 100 \
> /tmp/gh-aw/issues-data/issues.json
# Use REST API directly to avoid gh CLI failures under the DIFC proxy
# (see https://github.com/githubnext/agentics/issues/339 and microsoft/testfx#8571).
# The /meta block referenced in #339 was fixed in gh-aw-mcpg v0.3.12, but
# `gh issue list` still fails under the proxy with `malformed version:`
# (observed with mcpg v0.3.17), so we keep the curl-based fallback.
# Fetches the most recently created 100 issues (intentional limit matching previous behavior).
# State is normalized to uppercase (OPEN/CLOSED) to match gh CLI GraphQL output format.
curl -s \
-H "Authorization: Bearer ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github+json" \
--get \
--data-urlencode "q=repo:${GH_AW_GITHUB_REPOSITORY} is:issue is:open -is:sub-issue" \
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] Algorithmic Correctness

Semantic mismatch in issue filtering query

The old query uses -parent-issue:* which excludes issues that have a parent-issue: custom field (i.e., issues that are children in a parent-child relationship).

The new query uses -is:sub-issue which excludes issues marked by GitHub's native sub-issue feature.

These are not equivalent.

Concrete failing scenario:

  1. Repository has Issue DeploymentItem should be implemented via Fx extensibility #100 (parent issue)
  2. Issue Removed Attribute from the namespace for DynamicDataAttribute #200 is linked as a child via the custom parent-issue: #100 field
  3. Old query (-parent-issue:*) → excludes Removed Attribute from the namespace for DynamicDataAttribute #200 (has parent)
  4. New query (-is:sub-issue) → includes Removed Attribute from the namespace for DynamicDataAttribute #200 (not marked as native sub-issue)
  5. The arborist now incorrectly processes Issue Removed Attribute from the namespace for DynamicDataAttribute #200 as a candidate for linking

Impact: Child issues that should be excluded will now be included, breaking the workflow's core filtering logic. The arborist may attempt to re-link issues that are already children, or create duplicate parent-child relationships.

Recommendation:
The search query should match the original intent. Options:

  • Use NOT "parent-issue" in:body if the parent relationship is tracked in issue bodies
  • Switch to GitHub's native sub-issue feature consistently (document the migration)
  • Use the GitHub GraphQL API directly with the same query structure as gh issue list

Note: The comment states this "re-applies the exact same workaround" from #8185 and #8507, but if those PRs had the same query mismatch, the filtering bug has been present since the original workaround.

--data-urlencode "sort=created" \
--data-urlencode "order=desc" \
--data-urlencode "per_page=100" \
"${GH_AW_ORIGINAL_GITHUB_API_URL}/search/issues" \
| jq '.items // [] | map({
number: .number,
Comment on lines +55 to +57
title: .title,
author: {login: .user.login},
createdAt: .created_at,
state: (.state | ascii_upcase),
url: .html_url,
body: .body,
labels: [.labels[] | {name: .name}],
updatedAt: .updated_at,
closedAt: .closed_at,
milestone: (if .milestone != null then {title: .milestone.title} else null end),
assignees: [.assignees[] | {login: .login}]
})' \
> /tmp/gh-aw/issues-data/issues.json \
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] Defensive Coding at Boundaries

jq parse failure not caught by fallback

The || echo '[]' fallback is chained only to curl, not to the entire curl | jq pipeline. If jq fails to parse the response (malformed JSON, HTML error page, network corruption), the fallback does NOT execute.

Concrete failing scenario:

  1. GitHub API returns HTTP 500 with an HTML error page (service outage, proxy error)
  2. curl succeeds (exit 0) and writes HTML to stdout
  3. jq attempts to parse HTML → parse error, exits 1
  4. The || echo '[]' fallback does not execute (it's only attached to curl)
  5. /tmp/gh-aw/issues-data/issues.json is empty or truncated
  6. The next line runs jq 'length' /tmp/gh-aw/issues-data/issues.jsoncrashes with parse error
  7. Workflow fails with obscure error instead of graceful degradation

Proof:

# Simulate malformed JSON
echo "not json" | jq '.items // []' > output.json || echo '[]' > output.json
# jq exits 1, but the || fallback doesn't fire because it's not part of the piped command

Recommendation:
The fallback must apply to the entire pipeline:

curl -s \
  -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github+json" \
  --get \
  --data-urlencode "q=repo:${GH_AW_GITHUB_REPOSITORY} is:issue is:open -is:sub-issue" \
  --data-urlencode "sort=created" \
  --data-urlencode "order=desc" \
  --data-urlencode "per_page=100" \
  "${GH_AW_ORIGINAL_GITHUB_API_URL}/search/issues" \
  | jq '.items // [] | map({
      number: .number,
      title: .title,
      author: {login: .user.login},
      createdAt: .created_at,
      state: (.state | ascii_upcase),
      url: .html_url,
      body: .body,
      labels: [.labels[] | {name: .name}],
      updatedAt: .updated_at,
      closedAt: .closed_at,
      milestone: (if .milestone != null then {title: .milestone.title} else null end),
      assignees: [.assignees[] | {login: .login}]
    })' > /tmp/gh-aw/issues-data/issues.json \
  || echo '[]' > /tmp/gh-aw/issues-data/issues.json

Move the || echo '[]' to the end of the entire pipeline so it catches both curl and jq failures.

|| echo '[]' > /tmp/gh-aw/issues-data/issues.json

echo "✓ Issues data saved to /tmp/gh-aw/issues-data/issues.json"
echo "Total issues fetched: $(jq 'length' /tmp/gh-aw/issues-data/issues.json)"
Expand Down
Loading