# Create a Dataproc Cluster Using Cloud Shell

Create a single-node **Google Cloud Dataproc** cluster with the Jupyter optional component and the Component Gateway enabled, using dedicated staging and temp buckets.  

**Prerequisites**
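- Your user account has permission, in the project, to create Dataproc clusters and Cloud Storage buckets and to set IAM policies on those buckets.
- APIs enabled: **Dataproc API** and **Compute Engine API**.
- The Compute Engine default service account, which the script uses as the cluster's VM service account, has the **Dataproc Worker** role (`roles/dataproc.worker`).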
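Before running the script, one way to check the API prerequisite is to compare the output of `gcloud services list --enabled` with the two required service names, which this sketch assumes are `dataproc.googleapis.com` and `compute.googleapis.com`. `missing_apis` is a hypothetical helper defined here, not a gcloud command; it only reads a list of names on stdin, so it can be tried without touching the project.

```bash
# Hypothetical helper (not part of gcloud): reads enabled service names,
# one per line, on stdin and prints each required API that is absent.
missing_apis() {
  local enabled api
  enabled="$(cat)"
  for api in dataproc.googleapis.com compute.googleapis.com; do
    # -x matches whole lines only; -F treats the name as a fixed string, not a regex
    if ! printf '%s\n' "$enabled" | grep -qxF "$api"; then
      echo "$api"
    fi
  done
}
```

In Cloud Shell it could be fed with `gcloud services list --enabled --format='value(config.name)' | missing_apis`; if it prints anything, `gcloud services enable dataproc.googleapis.com compute.googleapis.com` enables the missing APIs.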


```bash
# Set project & region
PROJECT_ID=pp-bigquery-03
REGION=us-east1
gcloud config set project "$PROJECT_ID"

# Default Compute Engine service account (used as the cluster's VM service account)
PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')
COMPUTE_SA="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Create staging/temp buckets in the same region; each bucket is created only if
# `describe` cannot find it, so re-running the cell does not fail on existing buckets
gcloud storage buckets describe "gs://${PROJECT_ID}-dp-staging" >/dev/null 2>&1 \
  || gcloud storage buckets create "gs://${PROJECT_ID}-dp-staging" --location="$REGION"
gcloud storage buckets describe "gs://${PROJECT_ID}-dp-temp" >/dev/null 2>&1 \
  || gcloud storage buckets create "gs://${PROJECT_ID}-dp-temp" --location="$REGION"

# Grant the VM service account object-level permissions on both buckets
gcloud storage buckets add-iam-policy-binding "gs://${PROJECT_ID}-dp-staging" \
  --member="serviceAccount:${COMPUTE_SA}" \
  --role="roles/storage.objectAdmin"

gcloud storage buckets add-iam-policy-binding "gs://${PROJECT_ID}-dp-temp" \
  --member="serviceAccount:${COMPUTE_SA}" \
  --role="roles/storage.objectAdmin"

# Cluster name: must start with a lowercase letter, contain only lowercase letters,
# digits, and hyphens, be at most 51 chars, and not end with a hyphen.
# Keep only a-z0-9 from the username (drops underscores, dots, etc.) and cap it at 40 chars.
USER_SLUG="$(whoami | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9' | cut -c1-40)"
# Fall back to "user" if nothing survives the filter, so the name never ends in "-"
CLUSTER="learner-${USER_SLUG:-user}"

# Create the cluster and EXPLICITLY use your buckets
# (--bucket and --temp-bucket take bare bucket names, without the gs:// prefix)
gcloud dataproc clusters create "$CLUSTER" \
  --region="$REGION" \
  --single-node \
  --master-boot-disk-size=60GB \
  --image-version=2.2-debian12 \
  --enable-component-gateway \
  --optional-components=JUPYTER \
  --service-account="$COMPUTE_SA" \
  --bucket="${PROJECT_ID}-dp-staging" \
  --temp-bucket="${PROJECT_ID}-dp-temp"
```
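The cluster-name step is the least obvious part of the script. Dataproc cluster names must start with a lowercase letter, contain only lowercase letters, digits, and hyphens, be at most 51 characters long, and not end with a hyphen. The sketch below wraps the same `tr`/`cut` pipeline in a hypothetical function, `cluster_name_for`, so it can be tried on sample usernames without creating anything; the fallback to `user` when no letters or digits survive is an assumption added here.

```bash
# Hypothetical helper wrapping the tr/cut pipeline used to build CLUSTER.
# Usage: cluster_name_for <username>
cluster_name_for() {
  local slug
  # Lowercase, drop every character outside a-z0-9, keep at most 40 chars
  slug="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9' | cut -c1-40)"
  # "learner-" (8 chars) + at most 40 chars = at most 48, within the 51-char limit;
  # the fallback keeps the name from ending in "-"
  printf 'learner-%s\n' "${slug:-user}"
}
```

For example, `cluster_name_for 'Jane_Doe.42'` prints `learner-janedoe42`, and a username with no letters or digits, such as `___`, yields `learner-user`.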