Note
This project is currently in beta (pre-v1.0) and may see breaking changes until the first stable release (v1.0).
This repository provides a set of agent skills to interact with Dataproc clusters and jobs. These skills can be used with various AI agents, including Gemini CLI, Claude Code, and Codex, to manage your clusters, monitor jobs, and troubleshoot issues using natural language prompts.
Important
We Want Your Feedback! Please share your thoughts with us by filling out our feedback form. Your input is invaluable and helps us improve the project for everyone.
- Why Use Dataproc Agent Skills?
- Prerequisites
- Getting Started
- Usage Examples
- Supported Skills
- Troubleshooting
- Seamless Workflow: Integrates directly into your AI agent's environment, so you don't need to switch contexts for common Dataproc tasks.
- Natural Language Queries: Stop wrestling with complex gcloud commands. Manage your clusters and jobs by describing what you want in plain English.
- Full Lifecycle Control: Manage the entire lifecycle of your Dataproc resources, from listing clusters to checking job statuses.
Before you begin, ensure you have the following:
- One of these AI agents installed:
  - Gemini CLI v0.6.0 or higher
  - Claude Code v2.1.94 or higher
  - Codex v0.117.0 or higher
  - Antigravity v1.14.2 or higher
- A Google Cloud project with the Dataproc API enabled.
- Ensure Application Default Credentials are available in your environment.
- IAM Permissions:
  - Dataproc Viewer (roles/dataproc.viewer) or Dataproc Editor (roles/dataproc.editor)
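The prerequisites above can be set up from a terminal with the gcloud CLI. The following is a minimal sketch, not an official setup script: the function names, the project ID, and the member address are placeholder assumptions, and the Viewer role is used only as an example of the two roles listed.

```shell
# Placeholder values (assumptions); replace with your own project and account.
PROJECT_ID="my-project-id"
MEMBER="user:you@example.com"

# Enable the Dataproc API for the project.
enable_dataproc_api() {
  gcloud services enable dataproc.googleapis.com --project="$PROJECT_ID"
}

# Grant read-only Dataproc access.
# Swap in roles/dataproc.editor if the agent should also modify resources.
grant_dataproc_viewer() {
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="$MEMBER" --role="roles/dataproc.viewer"
}
```

Call `enable_dataproc_api` once per project, then `grant_dataproc_viewer` for each account that will run the agent.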
Please keep these env vars handy during the installation process:
- DATAPROC_PROJECT: The GCP project ID.
- DATAPROC_REGION: The region of your Dataproc resources.
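For agents that read these settings from the shell environment, the two variables can be exported before launching the agent. This is a small sketch; the project ID and region shown are placeholder values, not defaults of this project.

```shell
# Placeholder values (assumptions); substitute your own project ID and region.
export DATAPROC_PROJECT="my-project-id"
export DATAPROC_REGION="us-central1"

# Fail early with a clear message if either variable is empty or unset.
: "${DATAPROC_PROJECT:?DATAPROC_PROJECT must be set}"
: "${DATAPROC_REGION:?DATAPROC_REGION must be set}"

echo "Using project ${DATAPROC_PROJECT} in region ${DATAPROC_REGION}"
```

Adding the two `export` lines to your shell profile keeps them available across sessions.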
Note
- Ensure Application Default Credentials are available in your environment.
- If your Cloud SQL for PostgreSQL instance uses private IPs, you must run your agent in the same Virtual Private Cloud (VPC) network.
To start interacting with Dataproc, install the skills for your preferred AI agent, then launch the agent and use natural language to ask questions or perform tasks.
For the latest version, check the releases page.
Gemini CLI
1. Install the extension:
gemini extensions install https://github.com/gemini-cli-extensions/dataproc
During the installation, enter your environment vars as described in the configuration section.
2. (Optional) Manage Configuration: To view or update your configuration in Gemini CLI:
- Terminal:
gemini extensions config dataproc [setting name] [--scope <scope>]
- Gemini CLI:
/extensions list
3. Start the agent:
gemini
(Tip: Run /extensions list to verify your configuration and active extensions.)
Warning
Changing Instance & Database Connections: Currently, the database connection must be configured before starting the agent and cannot be changed during a session. To save and resume conversation history in Gemini CLI, use the commands /chat save <tag> and /chat resume <tag>.
Claude Code
1. Set env vars: In your terminal, set your environment vars as described in the configuration section.
2. Start the agent:
claude
3. Add the marketplace:
/plugin marketplace add https://github.com/gemini-cli-extensions/dataproc.git#0.1.0
4. Install the plugin:
/plugin install dataproc@dataproc-marketplace
Codex
1. Clone the Repo:
git clone --branch 0.1.0 git@github.com:gemini-cli-extensions/dataproc.git
2. Install the plugin:
mkdir -p ~/.codex/plugins
cp -R /absolute/path/to/dataproc ~/.codex/plugins/dataproc
3. Set env vars: Enter your environment vars as described in the configuration section.
4. Create or update marketplace.json:
~/.agents/plugins/marketplace.json
{
  "name": "my-data-cloud-google-marketplace",
  "interface": {
    "displayName": "Google Data Cloud Skills"
  },
  "plugins": [
    {
      "name": "dataproc",
      "source": {
        "source": "local",
        "path": "./plugins/dataproc"
      },
      "policy": {
        "installation": "AVAILABLE",
        "authentication": "ON_INSTALL"
      },
      "category": "Data & Analytics"
    }
  ]
}
Antigravity
1. Clone the Repo:
git clone --branch 0.1.0 https://github.com/gemini-cli-extensions/dataproc.git
2. Install the skills:
Choose a location for the skills:
- Global (all workspaces):
~/.gemini/antigravity/skills/
- Workspace-specific:
<workspace-root>/.agents/skills/
Copy the skill folders from the cloned repository's skills/ directory to your chosen location:
cp -R dataproc/skills/* ~/.gemini/antigravity/skills/
3. Set env vars: Set your environment vars as described in the configuration section.
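The copy command above targets the global location. For either location, the copy step can be wrapped in a small helper that creates the destination first. This is only a sketch: `install_skills` is a hypothetical name, not something shipped in this repository, and it assumes the cloned repository contains a skills/ directory as described above.

```shell
# install_skills is a hypothetical helper name used for illustration.
# Usage: install_skills <cloned-repo-dir> <skills-destination-dir>
install_skills() {
  repo_dir="$1"
  dest_dir="$2"
  if [ ! -d "$repo_dir/skills" ]; then
    echo "no skills/ directory under $repo_dir" >&2
    return 1
  fi
  # Create the destination if it does not exist yet.
  mkdir -p "$dest_dir"
  # Copy every skill folder, keeping its directory structure.
  cp -R "$repo_dir"/skills/* "$dest_dir"/
}

# Example (workspace-specific install), run from the workspace root:
#   install_skills ./dataproc ./.agents/skills
```

For a global install, pass ~/.gemini/antigravity/skills as the second argument instead.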
Interact with Dataproc using natural language:
- List Clusters:
- "List all my Dataproc clusters in us-central1."
- Check Jobs:
- "Show me the status of the job with ID 'my-spark-job-123'."
- "List all failed jobs in my project."
- Get Details:
- "Get details for the cluster named 'my-cluster'."
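For comparison, the prompts above correspond roughly to the following gcloud commands. This is a sketch for orientation only: the function names are hypothetical, the fallback region is a placeholder, and the skills themselves may use a different mechanism to reach Dataproc.

```shell
# Use DATAPROC_REGION if set; otherwise fall back to a placeholder region.
REGION="${DATAPROC_REGION:-us-central1}"

# "List all my Dataproc clusters in us-central1."
list_clusters() {
  gcloud dataproc clusters list --region="$REGION"
}

# "Get details for the cluster named 'my-cluster'."
describe_cluster() {
  gcloud dataproc clusters describe "$1" --region="$REGION"
}

# "Show me the status of the job with ID 'my-spark-job-123'."
job_status() {
  gcloud dataproc jobs describe "$1" --region="$REGION" \
    --format='value(status.state)'
}

# "List all failed jobs in my project." (failed jobs report state ERROR)
failed_jobs() {
  gcloud dataproc jobs list --region="$REGION" --filter='status.state = ERROR'
}
```

For example, `job_status my-spark-job-123` prints only the job's state, and `describe_cluster my-cluster` prints the full cluster description.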
The following skills are available in this repository:
- Dataproc Skills - Skills to interact with your Dataproc clusters and jobs.
Run your agent in debug mode (e.g., gemini --debug) when diagnosing problems.
Common issues:
- "failed to find default credentials": Ensure Application Default Credentials are available in your environment.
- "cannot execute binary file": The Toolbox binary did not download correctly.
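For the "failed to find default credentials" error, one way to confirm whether Application Default Credentials resolve is to ask gcloud for an ADC access token. The helper below is a minimal sketch; `check_adc` is a hypothetical name and not part of this repository.

```shell
# check_adc is a hypothetical helper name used for illustration.
check_adc() {
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found; install the Google Cloud CLI first" >&2
    return 1
  fi
  # A token is printed only when Application Default Credentials resolve.
  if gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "ADC available"
  else
    echo "ADC missing; run: gcloud auth application-default login" >&2
    return 1
  fi
}
```

Run `check_adc` in the same shell you use to start the agent, since credentials set in one environment may not be visible in another.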