chore: sync Arize skills from arize-skills#1690

Merged
aaronpowell merged 1 commit into github:staged from Arize-ai:sync/arize-skills
May 13, 2026

Conversation

@jimbobbennett
Contributor

Pull Request Checklist

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have read and followed the Guidance for submissions involving paid services.
  • My contribution adds a new instruction, prompt, agent, skill, or workflow file in the correct directory.
  • The file follows the required naming convention.
  • The content is clearly structured and follows the example format.
  • I have tested my instructions, prompt, agent, skill, or workflow with GitHub Copilot.
  • I have run npm start and verified that README.md is up to date.
  • I am targeting the staged branch for this pull request.

Description

Updating the Arize AX and Phoenix skills to the latest version.


Type of Contribution

  • New instruction file.
  • New prompt file.
  • New agent file.
  • New plugin.
  • New skill file.
  • New agentic workflow.
  • Update to existing instruction, prompt, agent, plugin, skill, or workflow.
  • Other (please specify):

Additional Notes


By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.

…cabff161d8aae6 and phoenix@30ccbe6b38cc83719038bf30041335f29bae45e9
Copilot AI review requested due to automatic review settings May 13, 2026 00:56
@github-actions github-actions Bot added the skills PR touches skills label May 13, 2026
@github-actions github-actions Bot added the skill-check-warning Skill validator reported warnings label May 13, 2026
@github-actions
Contributor

🔍 Skill Validator Results

⚠️ Warnings or advisories found

| Scope | Checked |
| --- | ---: |
| Skills | 11 |
| Agents | 1 |
| Total | 12 |

| Severity | Count |
| --- | ---: |
| ❌ Errors | 0 |
| ⚠️ Warnings | 14 |
| ℹ️ Advisories | 0 |

Summary

| Level | Finding |
| --- | --- |
| ℹ️ | Found 11 skill(s) |
| ℹ️ | [arize-ai-provider-integration] 📊 arize-ai-provider-integration: 2,684 BPE tokens [chars/4: 2,601] (standard ~), 29 sections, 16 code blocks |
| ℹ️ | [arize-ai-provider-integration] ⚠ Skill is 2,684 BPE tokens (chars/4 estimate: 2,601) — approaching "comprehensive" range where gains diminish. |
| ℹ️ | [arize-ai-provider-integration] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably. |
| ℹ️ | [arize-annotation] 📊 arize-annotation: 2,528 BPE tokens [chars/4: 2,696] (standard ~), 27 sections, 15 code blocks |
| ℹ️ | [arize-annotation] ⚠ Skill is 2,528 BPE tokens (chars/4 estimate: 2,696) — approaching "comprehensive" range where gains diminish. |
| ℹ️ | [arize-annotation] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably. |
| ℹ️ | [arize-dataset] 📊 arize-dataset: 3,861 BPE tokens [chars/4: 3,854] (standard ~), 51 sections, 16 code blocks |
| ℹ️ | [arize-dataset] ⚠ Skill is 3,861 BPE tokens (chars/4 estimate: 3,854) — approaching "comprehensive" range where gains diminish. |
| ℹ️ | [arize-evaluator] 📊 arize-evaluator: 7,825 BPE tokens [chars/4: 8,053] (comprehensive ✗), 59 sections, 28 code blocks |
Full validator output

```text
Found 11 skill(s)
[arize-ai-provider-integration] 📊 arize-ai-provider-integration: 2,684 BPE tokens [chars/4: 2,601] (standard ~), 29 sections, 16 code blocks
[arize-ai-provider-integration] ⚠ Skill is 2,684 BPE tokens (chars/4 estimate: 2,601) — approaching "comprehensive" range where gains diminish.
[arize-ai-provider-integration] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[arize-annotation] 📊 arize-annotation: 2,528 BPE tokens [chars/4: 2,696] (standard ~), 27 sections, 15 code blocks
[arize-annotation] ⚠ Skill is 2,528 BPE tokens (chars/4 estimate: 2,696) — approaching "comprehensive" range where gains diminish.
[arize-annotation] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[arize-dataset] 📊 arize-dataset: 3,861 BPE tokens [chars/4: 3,854] (standard ~), 51 sections, 16 code blocks
[arize-dataset] ⚠ Skill is 3,861 BPE tokens (chars/4 estimate: 3,854) — approaching "comprehensive" range where gains diminish.
[arize-evaluator] 📊 arize-evaluator: 7,825 BPE tokens [chars/4: 8,053] (comprehensive ✗), 59 sections, 28 code blocks
[arize-evaluator] ⚠ Skill is 7,825 BPE tokens (chars/4 estimate: 8,053) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[arize-experiment] 📊 arize-experiment: 4,616 BPE tokens [chars/4: 4,646] (standard ~), 34 sections, 20 code blocks
[arize-experiment] ⚠ Skill is 4,616 BPE tokens (chars/4 estimate: 4,646) — approaching "comprehensive" range where gains diminish.
[arize-instrumentation] 📊 arize-instrumentation: 6,117 BPE tokens [chars/4: 6,210] (comprehensive ✗), 19 sections, 4 code blocks
[arize-instrumentation] ⚠ Skill is 6,117 BPE tokens (chars/4 estimate: 6,210) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[arize-link] 📊 arize-link: 1,239 BPE tokens [chars/4: 1,121] (detailed ✓), 9 sections, 6 code blocks
[arize-prompt-optimization] 📊 arize-prompt-optimization: 4,489 BPE tokens [chars/4: 4,799] (standard ~), 58 sections, 19 code blocks
[arize-prompt-optimization] ⚠ Skill is 4,489 BPE tokens (chars/4 estimate: 4,799) — approaching "comprehensive" range where gains diminish.
[arize-trace] 📊 arize-trace: 5,896 BPE tokens [chars/4: 5,853] (comprehensive ✗), 43 sections, 10 code blocks
[arize-trace] ⚠ Skill is 5,896 BPE tokens (chars/4 estimate: 5,853) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[phoenix-cli] 📊 phoenix-cli: 3,920 BPE tokens [chars/4: 4,050] (standard ~), 20 sections, 17 code blocks
[phoenix-cli] ⚠ Skill is 3,920 BPE tokens (chars/4 estimate: 4,050) — approaching "comprehensive" range where gains diminish.
[phoenix-cli] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[phoenix-evals] 📊 phoenix-evals: 1,089 BPE tokens [chars/4: 1,126] (detailed ✓), 5 sections, 0 code blocks
[phoenix-evals] ⚠ No code blocks — agents perform better with concrete snippets and commands.
[phoenix-evals] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
✅ All checks passed (11 skill(s))
```
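For readers unfamiliar with the structural signals in this report, the following is a rough sketch of how a chars/4 estimate, a code-block count, and a numbered-step check could be computed from a skill's Markdown. It is not the validator's code: `analyzeSkill`, `structuralWarnings`, the rounding rule, and the regular expression are all assumptions made for illustration. The report's own figures show the chars/4 estimate landing both below and above the BPE count (2,601 vs 2,684, and 2,696 vs 2,528), so it is only an approximation.

```typescript
// Hypothetical sketch; none of these names come from the real skill validator.
interface SkillStats {
  charsOver4: number; // rough token estimate: character count divided by 4
  codeBlocks: number; // number of fenced code blocks
  hasNumberedSteps: boolean; // true if any "1. step" style line exists outside a fence
}

const FENCE = "`".repeat(3); // Markdown code fence marker

function analyzeSkill(markdown: string): SkillStats {
  let inFence = false;
  let codeBlocks = 0;
  let hasNumberedSteps = false;

  for (const line of markdown.split("\n")) {
    if (line.trimStart().startsWith(FENCE)) {
      if (!inFence) codeBlocks += 1; // count each opening fence once
      inFence = !inFence;
      continue;
    }
    // Assumed pattern for a numbered step: digits, a dot, whitespace, then text.
    if (!inFence && /^\s*\d+\.\s+\S/.test(line)) {
      hasNumberedSteps = true;
    }
  }

  return {
    charsOver4: Math.round(markdown.length / 4), // rounding rule is an assumption
    codeBlocks,
    hasNumberedSteps,
  };
}

// Mirrors the wording of two warnings in the report above.
function structuralWarnings(stats: SkillStats): string[] {
  const out: string[] = [];
  if (stats.codeBlocks === 0) out.push("No code blocks");
  if (!stats.hasNumberedSteps) out.push("No numbered workflow steps");
  return out;
}
```

Under these assumptions, a skill with zero fenced blocks and no numbered list would produce both warnings, which matches what the report shows for phoenix-evals.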

Contributor

Copilot AI left a comment

Pull request overview

This PR syncs Arize- and Phoenix-related skills/reference documentation to the latest upstream versions, expanding guidance for dataset upserts, Phoenix CLI workflows (open/axial coding), and refreshing Arize skill metadata/descriptions.

Changes:

  • Update Phoenix eval dataset references (Python/TypeScript) to document upsert semantics, stable example IDs, and split handling.
  • Expand the Phoenix CLI skill and workflow references (open coding / axial coding) with identifiers, sidecar handoff, profiles, and deletion/cleanup guidance.
  • Refresh Arize skill SKILL.md frontmatter descriptions/metadata and update the generated skills index entries accordingly.
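To make the upsert and stable-ID behavior in the first bullet concrete, here is a small in-memory sketch. The `Example` shape is copied from the updated interface in this PR's diff; everything else (`LocalDataset`, `normalizeSplits`, and the rule that examples without an `id` are always inserted as new rows) is a hypothetical stand-in and not the Phoenix client or server implementation.

```typescript
// Shape taken from the updated Example interface shown in this PR's diff.
interface Example {
  input: Record<string, unknown>;
  output?: Record<string, unknown> | null;
  metadata?: Record<string, unknown> | null;
  splits?: string | string[] | null;
  spanId?: string | null;
  id?: string | null;
}

// Hypothetical helper: turn the three allowed `splits` forms into a list.
function normalizeSplits(splits: Example["splits"]): string[] {
  if (splits == null) return [];
  return Array.isArray(splits) ? splits : [splits];
}

// Hypothetical in-memory stand-in for a dataset; NOT the Phoenix API.
class LocalDataset {
  private rows = new Map<string, Example>();
  private anonymousCount = 0;

  // Upsert: an example whose id matches an existing row replaces it;
  // anything else is inserted. Treating id-less examples as always-new
  // is an assumption of this sketch.
  upsert(examples: Example[]): { inserted: number; updated: number } {
    let inserted = 0;
    let updated = 0;
    for (const example of examples) {
      const key = example.id ?? `anonymous-${this.anonymousCount++}`;
      if (this.rows.has(key)) {
        updated += 1;
      } else {
        inserted += 1;
      }
      this.rows.set(key, example);
    }
    return { inserted, updated };
  }

  get(id: string): Example | undefined {
    return this.rows.get(id);
  }

  get size(): number {
    return this.rows.size;
  }
}
```

In this model, uploading the same two examples twice updates the one that carries an `id` and duplicates the one that does not, which is the motivation for supplying stable IDs when a dataset is re-uploaded.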

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| skills/phoenix-evals/references/experiments-datasets-typescript.md | Adds upsert + stable example ID guidance and updates the documented example type shape. |
| skills/phoenix-evals/references/experiments-datasets-python.md | Documents upsert behavior, stable IDs, and split key guidance for Python dataset creation. |
| skills/phoenix-cli/SKILL.md | Expands Phoenix CLI reference commands and introduces profiles + coding identifier workflow framing. |
| skills/phoenix-cli/references/open-coding.md | Substantially expands open-coding workflow (unit of analysis, identifiers, sidecar, UI filter, cleanup). |
| skills/phoenix-cli/references/axial-coding.md | Updates axial-coding workflow to use the open-coding identifier + sidecar-based gather/quantify. |
| skills/arize-trace/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-prompt-optimization/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-link/SKILL.md | Refreshes skill description and adds metadata fields. |
| skills/arize-instrumentation/SKILL.md | Refreshes skill description and expands guidance (including Go) plus metadata/compatibility fields. |
| skills/arize-experiment/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-evaluator/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-dataset/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-annotation/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-ai-provider-integration/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| docs/README.skills.md | Updates the skills index table descriptions for the Arize skills to match the refreshed SKILL.md content. |

});

// With stable example IDs for targeted updates across uploads
const { datasetId } = await createDataset({
Comment on lines +45 to +51
 interface Example {
   input: Record<string, unknown>; // Task input
-  output?: Record<string, unknown>; // Expected output
-  metadata?: Record<string, unknown>; // Additional context
+  output?: Record<string, unknown> | null; // Expected output
+  metadata?: Record<string, unknown> | null; // Additional context
+  splits?: string | string[] | null; // Split assignment ("train", ["train", "easy"], etc.)
+  spanId?: string | null; // OTEL span ID to link back to source trace
+  id?: string | null; // Stable user-provided ID; server updates matching row
 ---
 name: arize-ai-provider-integration
-description: "INVOKE THIS SKILL when creating, reading, updating, or deleting Arize AI integrations. Covers listing integrations, creating integrations for any supported LLM provider (OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM, custom), updating credentials or metadata, and deleting integrations using the ax CLI."
+description: Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize.
 ---
 name: arize-annotation
-description: "INVOKE THIS SKILL when creating, managing, or using annotation configs or annotation queues on Arize (categorical, continuous, freeform), or applying human annotations to project spans via the Python SDK. Configs are the label schema for human feedback; queues are review workflows that route records to annotators. Triggers: annotation config, annotation queue, label schema, human feedback schema, bulk annotate spans, update_annotations, labeling queue, annotate record."
+description: Creates and manages annotation configs (categorical, continuous, freeform label schemas) and annotation queues (human review workflows) on Arize. Applies human annotations to project spans via the Python SDK. Use when the user mentions annotation config, annotation queue, label schema, human feedback, bulk annotate spans, update_annotations, labeling queue, annotate record, or human review.
 ---
 name: arize-dataset
-description: "INVOKE THIS SKILL when creating, managing, or querying Arize datasets and examples. Also use when the user needs test data or evaluation examples for their model. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI."
+description: Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.
 ---
 name: arize-experiment
-description: "INVOKE THIS SKILL when creating, running, or analyzing Arize experiments. Also use when the user wants to evaluate or measure model performance, compare models (including GPT-4, Claude, or others), or assess how well their AI is doing. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI."
+description: Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.
 ---
 name: arize-instrumentation
-description: "INVOKE THIS SKILL when adding Arize AX tracing or observability to an app for the first time, or when the user wants to instrument their LLM app or get started with LLM observability. Follow the Agent-Assisted Tracing two-phase flow: analyze the codebase (read-only), then implement after user confirmation. When the app uses LLM tool/function calling, add manual CHAIN + TOOL spans. Leverages https://arize.com/docs/ax/alyx/tracing-assistant and https://arize.com/docs/PROMPT.md."
+description: Adds Arize AX tracing to an LLM application for the first time. Follows a two-phase agent-assisted flow to analyze the codebase then implement instrumentation after user confirmation. Use when the user wants to instrument their app, add tracing from scratch, set up LLM observability, integrate OpenTelemetry or openinference, or get started with Arize tracing.
 ---
 name: arize-link
-description: Generate deep links to the Arize UI. Use when the user wants a clickable URL to open or share a specific trace, span, session, dataset, labeling queue, evaluator, or annotation config, or when sharing Arize resources with team members.
+description: Generates deep links to the Arize UI for traces, spans, sessions, datasets, labeling queues, evaluators, and annotation configs. Produces clickable URLs for sharing Arize resources with team members. Use when the user wants to link to or open a trace, span, session, dataset, evaluator, or annotation config in the Arize UI.
 ---
 name: arize-prompt-optimization
-description: "INVOKE THIS SKILL when optimizing, improving, or debugging LLM prompts using production trace data, evaluations, and annotations. Also use when the user wants to make their AI respond better or improve AI output quality. Covers extracting prompts from spans, gathering performance signal, and running a data-driven optimization loop using the ax CLI."
+description: Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
 ---
 name: arize-trace
-description: "INVOKE THIS SKILL when downloading, exporting, or inspecting Arize traces and spans, or when a user wants to look at what their LLM app is doing using existing trace data, or when an already-instrumented app has a bug or error to investigate. Use for debugging unknown runtime issues, failures, and behavior regressions. Covers exporting traces by ID, spans by ID, sessions by ID, and root-cause investigation with the ax CLI."
+description: Downloads, exports, and inspects existing Arize traces and spans to understand what an LLM app is doing or debug runtime issues. Covers exporting traces by ID, spans by ID, sessions by ID, and root-cause investigation using the ax CLI. Use when the user wants to look at existing trace data, see what their LLM app is doing, export traces, download spans, investigate errors, or analyze behavior regressions.
@aaronpowell aaronpowell merged commit a4d0afc into github:staged May 13, 2026
15 of 16 checks passed

Labels

  • skill-check-warning: Skill validator reported warnings
  • skills: PR touches skills

3 participants