{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run RoseTTAFold Pipeline\n",
    "\n",
    "This notebook demonstrates how to:\n",
    "1. Connect to your Azure Machine Learning workspace.\n",
    "2. Locate and submit the published **RoseTTAFold** pipeline.\n",
    "3. Monitor the run and retrieve outputs.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Environment Setup\n",
    "\n",
    "Install or upgrade the Azure ML SDK packages if needed, for example when running locally. Run this in a terminal, or prefix it with `!` to run it from a notebook cell:\n",
    "```bash\n",
    "pip install --upgrade azureml-core azureml-pipeline-core\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import os\n",
    "from azureml.core import Workspace, Experiment\n",
    "from azureml.pipeline.core import PublishedPipeline\n",
    "\n",
    "# Settings are read from environment variables; replace the placeholders if those are not set\n",
    "subscription_id = os.getenv(\"AZURE_SUBSCRIPTION_ID\", \"<YOUR-SUBSCRIPTION-ID>\")\n",
    "resource_group = os.getenv(\"AZURE_RG\", \"<YOUR-RESOURCE-GROUP>\")\n",
    "workspace_name = os.getenv(\"AZURE_WORKSPACE_NAME\", \"<YOUR-AML-WORKSPACE>\")\n",
    "\n",
    "# Connect to the AML workspace\n",
    "ws = Workspace.get(\n",
    "    name=workspace_name,\n",
    "    subscription_id=subscription_id,\n",
    "    resource_group=resource_group\n",
    ")\n",
    "print(\"Workspace:\", ws.name, \"loaded.\")"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Locate the Published RoseTTAFold Pipeline\n",
    "\n",
    "When the pipelines were registered, a pipeline named `RoseTTAFold_Pipeline` should have been published to your workspace. The cell below lists all published pipelines and selects the one with that name."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "published_pipelines = PublishedPipeline.list(ws)\n",
    "pipeline_id = None\n",
    "\n",
    "for p in published_pipelines:\n",
    "    print(f\"Found pipeline: {p.name} (ID: {p.id})\")\n",
    "    if p.name == \"RoseTTAFold_Pipeline\":\n",
    "        pipeline_id = p.id\n",
    "\n",
    "if pipeline_id:\n",
    "    print(\"\\nRoseTTAFold_Pipeline found. ID:\", pipeline_id)\n",
    "else:\n",
    "    raise ValueError(\"No published pipeline named 'RoseTTAFold_Pipeline' found.\")"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Submit a Run of the Pipeline\n",
    "\n",
    "We create (or reuse) an **Experiment** in AML and submit the RoseTTAFold pipeline to it. The run uses the pipeline's default parameters unless you uncomment `pipeline_parameters`; adjust the input data path and parameters as needed."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Create or use an existing AML experiment\n",
    "experiment = Experiment(ws, \"rosettafold-inference\")\n",
    "\n",
    "published_pipeline = PublishedPipeline.get(ws, pipeline_id)\n",
    "# If your pipeline has parameters, you can pass them like so:\n",
    "# pipeline_parameters = {\"input_sequence\": \"data/sample.fasta\"}\n",
    "\n",
    "pipeline_run = experiment.submit(\n",
    "    published_pipeline,\n",
    "    # pipeline_parameters=pipeline_parameters\n",
    ")\n",
    "print(f\"Submitted pipeline run: {pipeline_run.id}\")\n",
    "pipeline_run.wait_for_completion(show_output=True)\n",
    "\n",
    "# Retrieve final status\n",
    "run_status = pipeline_run.get_status()\n",
    "print(\"\\nPipeline run finished with status:\", run_status)\n",
    "\n",
    "if run_status == \"Failed\":\n",
    "    raise RuntimeError(\"Pipeline run failed. Check logs for details.\")"
   ],
   "execution_count": null,
   "outputs": []
  },
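  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When a run fails or takes longer than expected, per-step statuses help narrow down where. The cell below is a minimal sketch that assumes `pipeline_run` from the previous cell and relies on `PipelineRun.get_steps()` and `get_status()` from the Azure ML SDK; the step names it prints depend on how your pipeline was defined."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Print the status of each step that has started or completed\n",
    "for step in pipeline_run.get_steps():\n",
    "    print(f\"{step.name}: {step.get_status()}\")"
   ],
   "execution_count": null,
   "outputs": []
  },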
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Inspect Outputs\n",
    "\n",
    "After the run completes, you can check Azure ML Studio → **Jobs** → **Pipeline runs** to see logs and outputs. The cell below is an example of downloading a step's files programmatically; it assumes the step writes its results under a known prefix such as `rosetta_output`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Find the first step run whose name starts with \"rosetta\" (case-insensitive)\n",
    "rosetta_step_run = None\n",
    "for step_run in pipeline_run.get_children():\n",
    "    if step_run.name.lower().startswith(\"rosetta\"):\n",
    "        rosetta_step_run = step_run\n",
    "        break\n",
    "\n",
    "if rosetta_step_run:\n",
    "    print(\"RoseTTAFold step run ID:\", rosetta_step_run.id)\n",
    "    rosetta_step_run.download_files(\n",
    "        output_directory=\"./local_rosetta_results\",\n",
    "        prefix=\"rosetta_output\"  # or a prefix your step uses\n",
    "    )\n",
    "    print(\"Downloaded RoseTTAFold results to local_rosetta_results/\")\n",
    "else:\n",
    "    print(\"No step run with a name starting with 'rosetta' found. Check pipeline step names.\")"
   ],
   "execution_count": null,
   "outputs": []
  },
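  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To confirm what was actually downloaded, the cell below is a minimal sketch that walks the local output directory and prints each file with its size. It assumes the `./local_rosetta_results` directory used in the previous cell; `list_downloaded_files` is a hypothetical helper name, not part of the Azure ML SDK."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import os\n",
    "\n",
    "\n",
    "def list_downloaded_files(root):\n",
    "    \"\"\"Return sorted (relative_path, size_in_bytes) pairs for files under root.\"\"\"\n",
    "    found = []\n",
    "    for dirpath, _dirnames, filenames in os.walk(root):\n",
    "        for filename in filenames:\n",
    "            full_path = os.path.join(dirpath, filename)\n",
    "            found.append((os.path.relpath(full_path, root), os.path.getsize(full_path)))\n",
    "    return sorted(found)\n",
    "\n",
    "\n",
    "# Directory name taken from the download cell above\n",
    "results = list_downloaded_files(\"./local_rosetta_results\")\n",
    "if not results:\n",
    "    print(\"No files found. Check the prefix passed to download_files().\")\n",
    "for rel_path, size in results:\n",
    "    print(f\"{rel_path}  ({size} bytes)\")"
   ],
   "execution_count": null,
   "outputs": []
  },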
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Next Steps\n",
    "1. **Adjust** the pipeline definition in `rosettafold_pipeline.py` to add more steps (e.g., MSA or post-processing).\n",
    "2. **Change** the environment Dockerfile in `environments/rosettafold_env.dockerfile` if you need different versions or dependencies.\n",
    "3. **Redeploy** pipeline changes by rerunning your GitHub Actions workflow or by running the `register_pipelines.py` script.\n",
    "4. **Submit** subsequent runs from here, from the AML Studio UI, or from a custom Python script.\n",
    "\n",
    "You now have a working RoseTTAFold pipeline in Azure ML!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
