Skip to content

feat(mcp): rewrite 49 tool descriptions to improve Glama TDQS#43

Merged
hyoshi merged 2 commits intomainfrom
feat/tdqs-improve-core-tool-descriptions
Apr 24, 2026
Merged

feat(mcp): rewrite 49 tool descriptions to improve Glama TDQS#43
hyoshi merged 2 commits intomainfrom
feat/tdqs-improve-core-tool-descriptions

Conversation

@hyoshi
Copy link
Copy Markdown
Collaborator

@hyoshi hyoshi commented Apr 24, 2026

Summary

Rewrite MCP tool descriptions for 49 core tools (google_ads.campaigns/ad_groups/ads/budget/accounts + keywords/negative_keywords; meta_ads.campaigns/ad_sets/ads) to improve Glama's Tool Definition Quality Score (TDQS). Also introduces docs/tdqs-style-guide.md as the contributor reference.

Why

Glama scored mureo's 173 tools at C grade, 3.1/5 avg, with 86 tools in the C bucket and the lowest at 2.1 (meta_ads.creatives.list). Descriptions were one-liners that restated tool names — failing five of the six TDQS dimensions. Example:

Before (google_ads.campaigns.diagnose, score 2.3):

Diagnose Google Ads campaign delivery status

After (target ~4.4):

Explains why a campaign is not serving or is under-delivering. Returns an ordered list of issues drawn from serving_status, primary_status, and primary_status_reasons (e.g. LIMITED_BY_BUDGET, AD_GROUPS_PAUSED, KEYWORDS_DISAPPROVED, NO_ELIGIBLE_ADS), each annotated with a plain-language description and a remediation hint. Read-only — does not change anything. Use this before pulling raw performance reports; it narrows the problem space.

Scope

Family # Tools
google_ads.campaigns / ad_groups / ads / budget / accounts 19
google_ads.keywords / negative_keywords 13
meta_ads.campaigns / ad_sets / ads 18
Total 50 (49 rewritten + policy_details reworked)

Remaining ~125 tools (meta_ads.creatives, analysis, extensions, search_console, etc.) will be covered in follow-up PRs under the same style guide to reach AAA.

Approach

docs/tdqs-style-guide.md codifies the 6 TDQS dimensions into a template:

<verb> <resource> in <platform>. Returns <returned fields>.
<Read-only | Mutating | Destructive>. <Defaults, thresholds>.
Use this when <scenario>; prefer <sibling tool> when <other scenario>.

Every rewritten tool now covers:

  • Specific verb + resource (Purpose Clarity)
  • Usage vs sibling tools (Usage Guidelines) — e.g. campaigns.update_status vs campaigns.update
  • Read-only / mutating / reversible flags (Behavioral Transparency)
  • Parameter format/examples (customer_id: 10-digit without dashes; account_id: act_ prefix)
  • Concise 50-100 word length (Conciseness)
  • Returned field list + pagination behavior (Contextual Completeness)

Test plan

  • Python syntax validation (ast.parse) on all 3 rewritten files.
  • docker build succeeds.
  • docker run + MCP initialize + tools/list — all 173 tools still load end-to-end.
  • No existing tests assert on description text.
  • Merge → Glama rebuilds → TDQS re-scores; confirm avg moves from 3.1 toward B+ and no regressions elsewhere.
  • Follow-up PRs for remaining tool families until AAA achieved.

Non-goals

  • No runtime behavior changes.
  • TOOLS: list[Tool] public interface unchanged.
  • No schema-shape changes (properties added where TDQS rewards it: enum, minimum, minItems, maxItems — backwards compatible).

Follow-up

Remaining target families for subsequent PRs (avg < 3.5 currently):

  • meta_ads.creatives.* (family avg 2.88, lowest 2.1)
  • google_ads.search_terms.*, google_ads.performance.*, google_ads.auction_insights.*, google_ads.landing_page.*, google_ads.capture.*
  • meta_ads.leads.*, meta_ads.pixels.*
  • Extensions / sitelinks / analysis

hyoshi added 2 commits April 24, 2026 09:02
Glama's Tool Definition Quality Score (TDQS) rated mureo's 173 tools
at an average of 3.1/5 (C), with 86 tools in the C bucket. Root cause:
descriptions were terse one-liners that repeated the tool name, with no
disclosure of returned fields, side effects, or sibling tool
differentiation.

This commit introduces docs/tdqs-style-guide.md — codifying the six
TDQS dimensions (Purpose Clarity, Usage Guidelines, Behavioral
Transparency, Parameter Semantics, Conciseness, Contextual
Completeness) into a template contributors can follow — and applies
the template to 49 core tools:

- google_ads.campaigns.* (6), ad_groups.* (3), ads.* (5 + policy), budget.* (3), accounts.* (1)
- google_ads.keywords.* (6), negative_keywords.* (4)
- meta_ads.campaigns.* (6), ad_sets.* (6), ads.* (6)

Each rewritten description now leads with a specific verb, names the
returned fields, discloses read-only vs mutating semantics and
rollback reversibility, and ends with a differentiation clause
pointing to sibling tools (e.g. campaigns.update_status vs
campaigns.update). Parameter descriptions add format/example
(e.g. customer_id: 10-digit without dashes; account_id: 'act_' prefix
required), defaults (limit: 50, max 1000), and enum constraints where
the Meta/Google Ads APIs define them.

No behavioral changes — TOOLS public interface is unchanged, the 173
tools still load, initialize + tools/list still respond end-to-end
(verified via docker run with the new image). Remaining tools will be
addressed in follow-up PRs.
CI lint job (black --check) flagged the 3 rewritten tool-definition
files. Apply black 24.10.0 formatting. No semantic changes.
@hyoshi hyoshi merged commit 0833d53 into main Apr 24, 2026
7 checks passed
@hyoshi hyoshi deleted the feat/tdqs-improve-core-tool-descriptions branch April 24, 2026 00:14
hyoshi added a commit that referenced this pull request Apr 24, 2026
… PR 2) (#44)

Applies the TDQS template from docs/tdqs-style-guide.md to every Meta
Ads tool not covered by PR #43. Files touched:

- _tools_meta_ads_creatives.py (9 tools: creatives, images, videos)
- _tools_meta_ads_audiences.py (9 tools: audiences, pixels)
- _tools_meta_ads_insights.py  (8 tools: insights, analysis)
- _tools_meta_ads_catalog.py   (10 tools: catalogs, products, feeds)
- _tools_meta_ads_conversions.py (3 tools: Conversions API)
- _tools_meta_ads_leads.py     (5 tools: lead forms, leads)
- _tools_meta_ads_other.py     (14 tools: split tests, ad rules, page posts, Instagram)

Notably rewrites meta_ads.creatives.list - the TDQS lowest-scoring
tool in mureo at 2.1/5 - to a full description covering returned
fields, read-only semantics, pagination defaults, and sibling-tool
differentiation.

All 173 tools still load; docker build + MCP initialize + tools/list
verified end-to-end. No behavioral changes - descriptions and
parameter schema hints only.
hyoshi added a commit that referenced this pull request Apr 24, 2026
* feat(mcp): rewrite lowest-5 TDQS tool descriptions (PR 3 pilot)

Targets the five tools currently bottlenecking the Glama server-level
Tool Definition Quality Score (TDQS = 0.6*mean + 0.4*min = 3.36, B):

  - google_ads.landing_page.analyze     (was 2.4/5, the min)
  - google_ads.performance.report       (was 2.5/5)
  - google_ads.schedule_targeting.list  (was 2.5/5)
  - search_console.sitemaps.submit      (was 2.5/5)
  - google_ads.search_terms.report      (was 2.6/5)

Rewrites follow docs/tdqs-style-guide.md: specific verb, returned
fields, side effects, sibling differentiation, parameter format hints.
Adds _CUSTOMER_ID_PARAM and _PERIOD_PARAM reusable fragments to
_tools_google_ads_analysis.py (same pattern already adopted by
_tools_google_ads_campaigns.py and _tools_meta_ads_campaigns.py in
PRs #43 and #44).

No tool name, required-field, or handler dispatch changes -- this is
purely description and schema metadata. Tool count remains 173.

Goal: lift server TDQS to A tier (>= 3.5) by raising the minimum.
If effective, the remaining ~46 C/B- tools get the same treatment
in a follow-up PR to carry quality_grade from B to A (AAB to AAA
composite grade).

* fix(mcp): correct factual claims in TDQS pilot descriptions

Code review flagged CRITICAL/HIGH/MEDIUM factual errors in the return-
shape and side-effect claims added by the previous commit. A TDQS
rewrite whose goal is "help agents call the tool correctly on first
try" cannot ship with descriptions that contradict the mapper. All
changes are description-only; schemas, tool names, required fields,
and handlers remain unchanged.

CRITICAL

- google_ads.performance.report: return shape is
  {campaign_id, campaign_name, metrics:{...}} per mappers.map_performance_report
  and client.get_performance_report GAQL. Previous description claimed
  flat fields and added status / conversion_value / conversion_rate
  that are neither SELECTed nor mapped. Rewritten to match the actual
  nested metrics object (impressions, clicks, cost_micros, cost,
  conversions, ctr, average_cpc_micros, average_cpc,
  cost_per_conversion_micros, cost_per_conversion).

- google_ads.search_terms.report: return shape is
  {search_term, metrics:{impressions, clicks, cost_micros, cost,
  conversions, ctr}} per mappers.map_search_term. Previous description
  invented keyword, match_type, ad_group, and conversion_value fields.
  Rewritten to match; clarifies that campaign_id / ad_group_id filters
  do not echo back in the output rows.

- search_console.sitemaps.submit: actual return is
  {"status": "submitted", "sitemap": feedpath} per
  SearchConsoleApiClient.submit_sitemap, not an empty object.
  Description corrected.

HIGH

- google_ads.schedule_targeting.list: end_hour range is 0-24 (not
  1-24; 24 denotes end-of-day per Google Ads). start_minute /
  end_minute are string forms of the MinuteOfHour enum
  ('ZERO' / 'FIFTEEN' / 'THIRTY' / 'FORTY_FIVE') per
  _extensions_targeting.list_schedule_targeting, not integers.
  bid_modifier is float-or-null. All now disclosed.

MEDIUM

- search_console.sitemaps.submit: softened the "idempotent" overclaim
  to "safe to call repeatedly; re-submitting re-queues a crawl without
  creating a duplicate entry".

- google_ads.landing_page.analyze: added truncation caps per
  lp_analyzer.py (features <= 30, structured_data <= 5 JSON-LD
  blocks, max_redirects = 5, User-Agent 'MarketingAgent/1.0') and
  expanded the SSRF disclosure to cover link-local / reserved ranges
  and the post-redirect re-validation.

Verified: ast.parse OK, ruff clean, black unchanged, pytest 1774
passed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant