Skip to content

Conversation

@momchil-flex
Copy link
Collaborator

@momchil-flex momchil-flex commented Nov 8, 2025

Greptile Overview

Greptile Summary

This PR successfully refactors the RF webapi to use a simplified workflow for Component Modeler batch tasks, migrating from legacy tidy3d/tasks/{id}/batch-* endpoints to new rf/task/{id}/* endpoints and removing ~400 lines of complex async split/validation logic.

Key Changes

  • Simplified batch workflow: Removed the _upload_component_modeler_subtasks() function that handled async job splitting and multi-phase validation (~170 lines), as this logic now happens server-side
  • Unified task handling: Introduced WebTask base class with BatchTask and SimulationTask subclasses, enabling consistent routing through TaskFactory.get()
  • Endpoint migration:
    • Task creation: POST rf/task (new) vs POST tidy3d/projects/{folder}/tasks (old)
    • Batch detail: GET rf/task/{id}/statistics (new) vs GET tidy3d/tasks/{id}/batch-detail (old)
    • Batch submit: POST rf/task/{id}/submit (new) vs POST tidy3d/projects/{id}/batch-submit (old)
  • Removed postprocessing logic: Eliminated separate postprocess phase and auto-trigger logic from Job and Batch.monitor() classes
  • Validation deferral: Component modeler simulations now use validate=False in intermediate updated_copy() calls to defer validation, with final validation happening in the base_sim property getter (terminal.py:541)
  • Proper error handling: BatchTask.submit() now raises NotImplementedError for unsupported pay_type/priority parameters instead of silently ignoring them

Implementation Notes

The validation deferral pattern is correctly implemented: intermediate simulation copies skip validation for performance, but the base_sim property (accessed during validate_pre_upload()) performs full validation before upload. Grid size checks were moved earlier to run on the validated base_sim rather than on individual unvalidated simulations.

Confidence Score: 4/5

  • This PR is reasonably safe to merge with proper testing - the refactor is well-structured and removes significant complexity
  • Score of 4 reflects: (1) Large refactor removing ~400 lines and migrating to new endpoints represents moderate risk despite clean implementation. (2) The previously reported critical bug with pay_type/priority parameters has been properly fixed with NotImplementedError exceptions. (3) Validation deferral pattern is correctly implemented with proper final validation. (4) Limited visibility into backend API changes and how new rf/task/* endpoints handle edge cases. (5) Removal of complex async split/validation logic appears sound but should be verified with integration tests.
  • All files look good - no specific files require special attention. Integration testing recommended to verify new RF endpoint behavior.

Important Files Changed

File Analysis

Filename Score Overview
tidy3d/web/api/webapi.py 4/5 Removed complex async split/validation logic (~200 lines), simplified RF batch handling to use new rf/task/* endpoints, removed _upload_component_modeler_subtasks() function
tidy3d/web/core/task_core.py 5/5 Introduced WebTask base class, migrated BatchTask to use rf/task/{id}/* endpoints, properly raises NotImplementedError for unsupported pay_type/priority params
tidy3d/web/api/container.py 5/5 Removed Component Modeler-specific postprocess logic and auto-trigger functionality (~100 lines cleanup), simplified status handling
tidy3d/plugins/smatrix/component_modelers/terminal.py 4/5 Added validate=False to intermediate updated_copy() calls to defer validation, moved grid size checks to run on validated base_sim, final validation happens in base_sim property (line 541)

Sequence Diagram

sequenceDiagram
    participant User
    participant webapi
    participant WebTask
    participant BatchTask
    participant API as RF API (rf/task/*)

    User->>webapi: upload(component_modeler)
    webapi->>webapi: Check task type (RF)
    webapi->>WebTask: create(task_type="RF")
    WebTask->>API: POST rf/task (create group)
    API-->>WebTask: batchId/groupId
    WebTask->>API: Upload modeler.hdf5.gz to S3
    webapi->>webapi: estimate_cost(resource_id)
    webapi-->>User: return batch_id

    User->>webapi: start(batch_id)
    webapi->>BatchTask: get(batch_id)
    BatchTask->>API: GET rf/task/{id}/statistics (verify exists)
    API-->>BatchTask: BatchDetail
    webapi->>BatchTask: submit()
    Note right of BatchTask: pay_type/priority raise<br/>NotImplementedError if provided
    BatchTask->>API: POST rf/task/{id}/submit
    API-->>BatchTask: Response

    User->>webapi: monitor(batch_id)
    
    loop Poll Status
        webapi->>BatchTask: detail()
        BatchTask->>API: GET rf/task/{id}/statistics
        API-->>BatchTask: BatchDetail
        Note right of webapi: No auto-postprocess<br/>triggering (removed)
    end

    User->>webapi: download(batch_id)
    webapi->>BatchTask: get_data_hdf5()
    BatchTask->>API: Download modeler_data.hdf5.gz from S3
    API-->>User: Data file
Loading

@momchil-flex momchil-flex changed the title Momchil/rf refactor SCRF-1630: Refactor RF webapi for new workflow Nov 8, 2025
Copy link
Contributor

@dmarek-flex dmarek-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All your changes to TCM seem good to me

@momchil-flex momchil-flex marked this pull request as ready for review November 11, 2025 15:37
Copilot AI review requested due to automatic review settings November 11, 2025 15:37
Copilot finished reviewing on behalf of momchil-flex November 11, 2025 15:41
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the RF (component modeler) web API to align with a new backend workflow, simplifying the task lifecycle management and removing manual postprocessing steps.

  • Removes multi-phase task workflows (validate, run, postprocess) in favor of a unified flow
  • Updates RF task creation to use new rf/task endpoints instead of legacy tidy3d/projects endpoints
  • Eliminates manual postprocessing orchestration, now handled automatically by the backend

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
tidy3d/web/core/task_info.py Removes TaskStatus and BatchStatus enums; adds taskType field to BatchDetail and removes deprecated totalStatus field
tidy3d/web/core/task_core.py Refactors task creation to use RF-specific endpoints; removes wait_for_validate, wait_for_run, and postprocess methods; adds abort method for batch tasks
tidy3d/web/core/http_util.py Temporarily hardcodes version string (should be reverted before production)
tidy3d/web/api/webapi.py Simplifies batch monitoring by removing manual postprocessing logic; updates status checking and error handling for new workflow; removes postprocess_start function
tidy3d/web/api/tidy3d_stub.py Generalizes validation to use else instead of explicit EMESimulation check
tidy3d/web/api/states.py Updates error state names to match new backend conventions (e.g., validate_error instead of validate_fail)
tidy3d/web/api/container.py Removes postprocess_status property and postprocess_start methods from Job and Batch classes
tidy3d/web/init.py Removes postprocess_start from public API exports
tidy3d/plugins/smatrix/component_modelers/terminal.py Adds validate=False to intermediate simulation copies for performance; moves grid checks to base simulation; adds validate_pre_upload method
tidy3d/plugins/smatrix/component_modelers/base.py Adds validate_pre_upload method to base class
tidy3d/config/profiles.py Updates dev profile endpoints to new infrastructure
tidy3d/components/mode/simulation.py Removes source_required parameter from validate_pre_upload signature
tidy3d/components/mode/mode_solver.py Removes source_required parameter from validate_pre_upload signature
docs/api/submit_simulations.rst Removes TaskStatus from documentation
Comments suppressed due to low confidence (1)

tidy3d/web/core/task_core.py:268

  • The payload structure for RF tasks differs from non-RF tasks. The RF task uses groupName and folderId directly while non-RF tasks use taskName. However, line 266 suggests that RF tasks should return a groupId or batchId as the taskId. Ensure that the response parsing logic on lines 264-268 correctly handles both response formats and that downstream code expecting a taskId can work with a groupId/batchId.
        if task_type == "RF":
            payload = {
                "groupName": task_name,
                "folderId": folder.folder_id,
                "fileType": file_type,
            }
            resp = http.post("rf/task", payload)
        else:
            payload = {
                "taskName": task_name,
                "taskType": task_type,
                "callbackUrl": callback_url,
                "simulationType": simulation_type,
                "parentTasks": parent_tasks,
                "fileType": file_type,
            }
            resp = http.post(f"{projects_endpoint}/{folder.folder_id}/tasks", payload)
        # RF group creation may return group-level info without 'taskId'.
        # Use 'groupId' (or 'batchId' as fallback) as the resource id for subsequent uploads.
        if "taskId" not in resp and task_type == "RF":
            # Prefer using 'batchId' as the resource id for uploads (S3 STS expects a task-like id).
            if "batchId" in resp:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI commented Nov 12, 2025

@momchil-flex I've opened a new pull request, #2993, to work on those changes. Once the pull request is ready, I'll request review from you.

@daquinteroflex daquinteroflex added rc3 3rd pre-release 2.10 labels Nov 14, 2025
@momchil-flex momchil-flex force-pushed the momchil/rf_refactor branch 2 times, most recently from 0fee7a6 to 23364ac Compare November 14, 2025 13:40
@github-actions
Copy link
Contributor

github-actions bot commented Nov 14, 2025

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/init.py (100%)
  • tidy3d/components/mode/mode_solver.py (100%)
  • tidy3d/components/mode/simulation.py (100%)
  • tidy3d/components/tcad/mesher.py (50.0%): Missing lines 32
  • tidy3d/plugins/smatrix/component_modelers/base.py (50.0%): Missing lines 348
  • tidy3d/plugins/smatrix/component_modelers/modal.py (42.9%): Missing lines 65,379-381
  • tidy3d/plugins/smatrix/component_modelers/terminal.py (81.8%): Missing lines 882-883
  • tidy3d/web/api/container.py (100%)
  • tidy3d/web/api/states.py (100%)
  • tidy3d/web/api/webapi.py (39.3%): Missing lines 117-119,122,125,127-131,134-135,140-141,144,152,716,722-726,737-739,790,952,989,1024,1057,1094,1163,1197,1199-1212,1214,1224-1225,1228,1233,1235-1236,1247-1250,1255-1257,1260-1261,1263-1265,1314-1316,1320,1368-1371,1373,1475-1478,1481-1487,1491,1500-1502,1505,1507-1513,1515-1516,1526,1528
  • tidy3d/web/core/constants.py (100%)
  • tidy3d/web/core/task_core.py (60.9%): Missing lines 203,209,226,230-234,259-260,262,264,298,311-312,322-323,353-354,368,373-380,782,850-853,855,931-932,935-936,953-955,966,976,981-982
  • tidy3d/web/core/task_info.py (100%)
  • tidy3d/web/core/types.py (100%)

Summary

  • Total: 322 lines
  • Missing: 155 lines
  • Coverage: 51%

tidy3d/components/tcad/mesher.py

Lines 28-33

  28     def validate_pre_upload(self):
  29         """Validate the VolumeMesher before uploading to the cloud.
  30         Currently no validation but method is required when calling ``web.upload``.
  31         """
! 32         return

tidy3d/plugins/smatrix/component_modelers/base.py

Lines 344-352

  344         return data.smatrix()
  345 
  346     def validate_pre_upload(self):
  347         """Validate the modeler before upload."""
! 348         self.base_sim.validate_pre_upload(source_required=False)
  349 
  350 
  351 AbstractComponentModeler.update_forward_refs()

tidy3d/plugins/smatrix/component_modelers/modal.py

Lines 61-69

  61 
  62     @property
  63     def base_sim(self):
  64         """The base simulation."""
! 65         return self.simulation
  66 
  67     @cached_property
  68     def sim_dict(self) -> SimulationMap:
  69         """Generates all :class:`.Simulation` objects for the S-matrix calculation.

Lines 375-382

  375         return max_mode_index_out, max_mode_index_in
  376 
  377     def task_name_from_index(self, matrix_index: MatrixIndex) -> str:
  378         """Compute task name for a given (port_name, mode_index) without constructing simulations."""
! 379         port_name, mode_index = matrix_index
! 380         port = self.get_port_by_name(port_name=port_name)
! 381         return self.get_task_name(port=port, mode_index=mode_index)

tidy3d/plugins/smatrix/component_modelers/terminal.py

Lines 878-887

  878         raise Tidy3dKeyError(f"No radiation monitor named '{monitor_name}'.")
  879 
  880     def task_name_from_index(self, source_index: NetworkIndex) -> str:
  881         """Compute task name for a given network index without constructing simulations."""
! 882         port, mode_index = self.network_dict[source_index]
! 883         return self.get_task_name(port=port, mode_index=mode_index)
  884 
  885     def _extrude_port_structures(self, sim: Simulation) -> Simulation:
  886         """
  887         Extrude structures intersecting a port plane when a wave port lies on a structure boundary.

tidy3d/web/api/webapi.py

Lines 113-148

  113     """
  114 
  115     # TODO: test properly
  116     try:
! 117         batch = BatchTask.get(resource_id)
! 118         batch_detail = batch.detail()
! 119         status = batch_detail.status.lower()
  120     except Exception as e:
  121         log.error(f"Could not retrieve batch details for '{resource_id}': {e}")
! 122         raise WebError(f"Failed to retrieve status for batch '{resource_id}'.") from e
  123 
  124     if status not in ERROR_STATES:
! 125         return
  126 
! 127     if hasattr(batch_detail, "validateErrors") and batch_detail.validateErrors:
! 128         try:
! 129             error_details = []
! 130             for key, error_str in batch_detail.validateErrors.items():
! 131                 msg = f"- Subtask '{key}' failed: {error_str}"
  132                 error_details.append(msg)
  133 
! 134             details_string = "\n".join(error_details)
! 135             full_error_msg = (
  136                 "One or more subtasks failed validation. Please fix the component modeler "
  137                 "configuration.\n"
  138                 f"Details:\n{details_string}"
  139             )
! 140         except Exception as e:
! 141             raise WebError(
  142                 "One or more subtasks failed validation. Failed to parse validation errors."
  143             ) from e
! 144         raise WebError(full_error_msg)
  145 
  146     # Handle all other generic error states
  147     else:
  148         error_msg = (

Lines 148-156

  148         error_msg = (
  149             f"Batch '{resource_id}' failed with status '{status}'. Check server "
  150             "logs for details or contact customer support."
  151         )
! 152         raise WebError(error_msg)
  153 
  154 
  155 def _copy_simulation_data_from_cache_entry(entry: CacheEntry, path: PathLike) -> bool:
  156     """

Lines 712-720

  712         Is ``None`` if run info not available.
  713     """
  714     task = TaskFactory.get(task_id)
  715     if isinstance(task, BatchTask):
! 716         raise NotImplementedError("Operation not implemented for modeler batches.")
  717     return task.get_running_info()
  718 
  719 
  720 def _get_batch_detail_handle_error_status(batch: BatchTask) -> BatchDetail:

Lines 718-730

  718 
  719 
  720 def _get_batch_detail_handle_error_status(batch: BatchTask) -> BatchDetail:
  721     """Get batch detail and raise error if status is in ERROR_STATES."""
! 722     detail = batch.detail()
! 723     status = detail.status.lower()
! 724     if status in ERROR_STATES:
! 725         _batch_detail_error(batch.task_id)
! 726     return detail
  727 
  728 
  729 def get_status(task_id: TaskId) -> str:
  730     """Get the status of a task. Raises an error if status is "error".

Lines 733-743

  733     ----------
  734     task_id : str
  735         Unique identifier of task on server.  Returned by :meth:`upload`.
  736     """
! 737     task = TaskFactory.get(task_id)
! 738     if isinstance(task, BatchTask):
! 739         return _get_batch_detail_handle_error_status(task).status
  740     else:
  741         task_info = get_info(task_id)
  742         status = task_info.status
  743         if status == "visualize":

Lines 786-794

  786 
  787     # Batch/modeler monitoring path
  788     task = TaskFactory.get(task_id)
  789     if isinstance(task, BatchTask):
! 790         return _monitor_modeler_batch(task_id, verbose=verbose)
  791 
  792     console = get_logging_console() if verbose else None
  793 
  794     task_info = get_info(task_id)

Lines 948-956

  948     console = get_logging_console()
  949 
  950     task = TaskFactory.get(task_id, verbose=False)
  951     if not task:
! 952         return None
  953     url = task.get_url()
  954     task.abort()
  955     console.log(
  956         f"Task is aborting. View task using web UI at [link={url}]'{url}'[/link] to check the result."

Lines 985-993

  985     task = TaskFactory.get(task_id, verbose=False)
  986     if isinstance(task, BatchTask):
  987         if path.name == "simulation_data.hdf5":
  988             path = path.with_name("cm_data.hdf5")
! 989         task.get_data_hdf5(
  990             to_file=path,
  991             remote_data_file_gz=CM_DATA_HDF5_GZ,
  992             verbose=verbose,
  993             progress_callback=progress_callback,

Lines 1020-1028

  1020 
  1021     """
  1022     task = TaskFactory.get(task_id, verbose=False)
  1023     if isinstance(task, BatchTask):
! 1024         raise NotImplementedError("Operation not implemented for modeler batches.")
  1025     task.get_simulation_json(path, verbose=verbose)
  1026 
  1027 
  1028 @wait_for_connection

Lines 1053-1061

  1053         Simulation loaded from downloaded json file.
  1054     """
  1055     task = TaskFactory.get(task_id, verbose=False)
  1056     if isinstance(task, BatchTask):
! 1057         raise NotImplementedError("Operation not implemented for modeler batches.")
  1058     path = Path(path)
  1059     if path.suffix == ".json":
  1060         task.get_simulation_json(path, verbose=verbose)
  1061     elif path.suffix == ".hdf5":

Lines 1090-1098

  1090     To load downloaded results into data, call :meth:`load` with option ``replace_existing=False``.
  1091     """
  1092     task = TaskFactory.get(task_id, verbose=False)
  1093     if isinstance(task, BatchTask):
! 1094         raise NotImplementedError("Operation not implemented for modeler batches.")
  1095     task.get_log(path, verbose=verbose, progress_callback=progress_callback)
  1096 
  1097 
  1098 @wait_for_connection

Lines 1159-1167

  1159 
  1160     if verbose and task_id is not None:
  1161         console = get_logging_console()
  1162         if isinstance(task, BatchTask):
! 1163             console.log(f"Loading component modeler data from {path}")
  1164         else:
  1165             console.log(f"Loading simulation from {path}")
  1166 
  1167     stub_data = Tidy3dStubData.postprocess(path, lazy=lazy)

Lines 1193-1218

  1193 
  1194 
  1195 def _status_to_stage(status: str) -> tuple[str, int]:
  1196     """Map task status to monotonic stage for progress bars."""
! 1197     s = (status or "").lower()
  1198     # Map a broader set of states to monotonic stages for progress bars
! 1199     if s in ("draft", "created"):
! 1200         return ("draft", 0)
! 1201     if s in ("queue", "queued"):
! 1202         return ("queued", 1)
! 1203     if s in ("validating",):
! 1204         return ("validating", 2)
! 1205     if s in ("validate_success", "validate_warn", "preprocess", "preprocessing"):
! 1206         return ("preprocess", 3)
! 1207     if s in ("running", "preprocess_success"):
! 1208         return ("running", 4)
! 1209     if s in ("run_success", "postprocess"):
! 1210         return ("postprocess", 5)
! 1211     if s in ("success", "postprocess_success"):
! 1212         return ("success", 6)
  1213     # Unknown states map to earliest stage to avoid showing 100% prematurely
! 1214     return (s or "unknown", 0)
  1215 
  1216 
  1217 def _monitor_modeler_batch(
  1218     task_id: str,

Lines 1220-1240

  1220     max_detail_tasks: int = 20,
  1221 ) -> None:
  1222     """Monitor modeler batch progress with aggregate and per-task views."""
  1223     console = get_logging_console() if verbose else None
! 1224     task = BatchTask.get(task_id=task_id)
! 1225     detail = _get_batch_detail_handle_error_status(task)
  1226     name = detail.name or "modeler_batch"
  1227     group_id = detail.groupId
! 1228     status = detail.status.lower()
  1229 
  1230     # Non-verbose path: poll without progress bars then return
  1231     if not verbose:
  1232         # Run phase
! 1233         while _status_to_stage(status)[0] not in END_STATES:
  1234             time.sleep(REFRESH_TIME)
! 1235             detail = _get_batch_detail_handle_error_status(task)
! 1236             status = detail.status.lower()
  1237 
  1238         return
  1239 
  1240     progress_columns = (

Lines 1243-1269

  1243         TaskProgressColumn(),
  1244         TimeElapsedColumn(),
  1245     )
  1246     # Make the header
! 1247     header = f"Subtasks status - {name}"
! 1248     if group_id:
! 1249         header += f"\nGroup ID: '{group_id}'"
! 1250     console.log(header)
  1251     with Progress(*progress_columns, console=console, transient=False) as progress:
  1252         # Phase: Run (aggregate + per-task)
  1253         p_run = progress.add_task("Run Total", total=1.0)
  1254         task_bars: dict[str, int] = {}
! 1255         stage = _status_to_stage(status)[0]
! 1256         prev_stage = _status_to_stage(status)[0]
! 1257         console.log(f"Batch status = {status}")
  1258 
  1259         # Note: get_status errors if an erroring status occurred
! 1260         while stage not in END_STATES:
! 1261             total = len(detail.tasks)
  1262             r = detail.runSuccess or 0
! 1263             if stage != prev_stage:
! 1264                 prev_stage = stage
! 1265                 console.log(f"Batch status = {stage}")
  1266 
  1267             # Create per-task bars as soon as tasks appear
  1268             if total and total <= max_detail_tasks and detail.tasks:
  1269                 name_to_task = {(t.taskName or t.taskId): t for t in (detail.tasks or [])}

Lines 1310-1324

  1310                     )
  1311 
  1312             progress.refresh()
  1313             time.sleep(REFRESH_TIME)
! 1314             detail = _get_batch_detail_handle_error_status(task)
! 1315             status = detail.status.lower()
! 1316             stage = _status_to_stage(status)[0]
  1317 
  1318         if console is not None:
  1319             console.log("Modeler has finished running successfully.")
! 1320             real_cost(task.task_id, verbose=verbose)
  1321 
  1322 
  1323 @wait_for_connection
  1324 def delete(task_id: TaskId, versions: bool = False) -> TaskInfo:

Lines 1364-1377

  1364     progress_callback : Callable[[float], None] = None
  1365         Optional callback function called when downloading file with ``bytes_in_chunk`` as argument.
  1366 
  1367     """
! 1368     task = TaskFactory.get(task_id, verbose=False)
! 1369     if isinstance(task, BatchTask):
! 1370         raise NotImplementedError("Operation not implemented for modeler batches.")
! 1371     info = get_info(task_id, verbose=False)
  1372     remote_sim_file = SIM_FILE_HDF5_GZ
! 1373     if info.taskType == "MODE_SOLVER":
  1374         remote_sim_file = MODE_FILE_HDF5_GZ
  1375     task.get_simulation_hdf5(
  1376         path,
  1377         verbose=verbose,

Lines 1471-1495

  1471 
  1472     task = TaskFactory.get(task_id, verbose=False)
  1473     detail = task.detail()
  1474     if isinstance(task, BatchTask):
! 1475         check_task_type = "FDTD" if detail.taskType == "MODAL_CM" else "RF_FDTD"
! 1476         task.check(solver_version=solver_version, check_task_type=check_task_type)
! 1477         detail = task.detail()
! 1478         status = detail.status.lower()
  1479         while status not in ALL_POST_VALIDATE_STATES:
  1480             time.sleep(REFRESH_TIME)
! 1481             detail = task.detail()
! 1482             status = detail.status.lower()
! 1483         if status in ERROR_STATES:
! 1484             _batch_detail_error(resource_id=task_id)
! 1485         est_flex_unit = detail.estFlexUnit
! 1486         if verbose:
! 1487             console.log(
  1488                 f"Maximum FlexCredit cost: {est_flex_unit:1.3f}. Minimum cost depends on "
  1489                 "task execution details. Use 'web.real_cost(task_id)' after run."
  1490             )
! 1491         return est_flex_unit
  1492 
  1493     # simulation path
  1494     task.estimate_cost(solver_version=solver_version)
  1495     task_info = get_info(task_id)

Lines 1496-1520

  1496     status = task_info.metadataStatus
  1497 
  1498     # Wait for a termination status
  1499     while status not in ALL_POST_VALIDATE_STATES:
! 1500         time.sleep(REFRESH_TIME)
! 1501         task_info = get_info(task_id)
! 1502         status = task_info.metadataStatus
  1503 
  1504     if status in ERROR_STATES:
! 1505         try:
  1506             # Try to obtain the error message
! 1507             task = SimulationTask(taskId=task_id)
! 1508             with tempfile.NamedTemporaryFile(suffix=".json") as tmp_file:
! 1509                 task.get_error_json(to_file=tmp_file.name, validation=True)
! 1510                 with open(tmp_file.name) as f:
! 1511                     error_content = json.load(f)
! 1512                     error_msg = error_content["validation_error"]
! 1513         except Exception:
  1514             # If the error message could not be obtained, raise a generic error message
! 1515             error_msg = "Error message could not be obtained, please contact customer support."
! 1516         raise WebError(f"Error estimating cost for task {task_id}! {error_msg}")
  1517     if verbose:
  1518         console.log(
  1519             f"Estimated FlexCredit cost: {task_info.estFlexUnit:1.3f}. Minimum cost depends on "
  1520             "task execution details. Use 'web.real_cost(task_id)' to get the billed FlexCredit "

Lines 1522-1532

  1522         )
  1523         fc_mode = task_info.estFlexCreditMode
  1524         fc_post = task_info.estFlexCreditPostProcess
  1525         if fc_mode:
! 1526             console.log(f"  {fc_mode:1.3f} FlexCredit of the total cost from mode solves.")
  1527         if fc_post:
! 1528             console.log(f"  {fc_post:1.3f} FlexCredit of the total cost from post-processing.")
  1529     return task_info.estFlexUnit
  1530 
  1531 
  1532 @wait_for_connection

tidy3d/web/core/task_core.py

Lines 199-207

  199 
  200         folder = Folder.get(folder_name, create=True)
  201 
  202         if task_type in ["RF", "TERMINAL_CM", "MODAL_CM"]:
! 203             payload = {
  204                 "groupName": task_name,
  205                 "folderId": folder.folder_id,
  206                 "fileType": file_type,
  207                 "taskType": task_type,

Lines 205-213

  205                 "folderId": folder.folder_id,
  206                 "fileType": file_type,
  207                 "taskType": task_type,
  208             }
! 209             resp = http.post("rf/task", payload)
  210         else:
  211             payload = {
  212                 "taskName": task_name,
  213                 "taskType": task_type,

Lines 222-238

  222 
  223     def get_url(self) -> str:
  224         base = str(config.web.website_endpoint or "")
  225         if isinstance(self, BatchTask):
! 226             return "/".join([base.rstrip("/"), f"rf?taskId={self.task_id}"])
  227         return "/".join([base.rstrip("/"), f"workbench?taskId={self.task_id}"])
  228 
  229     def get_folder_url(self) -> Optional[str]:
! 230         folder_id = getattr(self, "folder_id", None)
! 231         if not folder_id:
! 232             return None
! 233         base = str(config.web.website_endpoint or "")
! 234         return "/".join([base.rstrip("/"), f"folders/{folder_id}"])
  235 
  236     def get_log(
  237         self,
  238         to_file: PathLike,

Lines 255-268

  255         path: pathlib.Path
  256             Path to saved file.
  257         """
  258 
! 259         if not self.task_id:
! 260             raise WebError("Expected field 'task_id' is unset.")
  261 
! 262         target_path = pathlib.Path(to_file)
  263 
! 264         return download_file(
  265             self.task_id,
  266             SIM_LOG_FILE,
  267             to_file=target_path,
  268             verbose=verbose,

Lines 294-302

  294         pathlib.Path
  295             Saved local path.
  296         """
  297         if not self.task_id:
! 298             raise WebError("Expected field 'task_id' is unset.")
  299         target_path = pathlib.Path(to_file)
  300         file = None
  301         try:
  302             file = download_gz_file(

Lines 307-316

  307                 progress_callback=progress_callback,
  308             )
  309         except ClientError:
  310             if verbose:
! 311                 console = get_logger_console()
! 312                 console.log(f"Unable to download '{remote_data_file_gz}'.")
  313         if not file:
  314             try:
  315                 file = download_file(
  316                     resource_id=self.task_id,

Lines 318-327

  318                     to_file=target_path,
  319                     verbose=verbose,
  320                     progress_callback=progress_callback,
  321                 )
! 322             except Exception as e:
! 323                 raise WebError(
  324                     "Failed to download the data file from the server. "
  325                     "Please confirm that the task completed successfully."
  326                 ) from e
  327         return file

Lines 349-358

  349             resp = http.get(
  350                 f"rf/task/{resource_id}/statistics",
  351                 suppress_404=True,
  352             )
! 353             status = bool(resp and isinstance(resp, dict) and "status" in resp)
! 354             return status
  355         except Exception:
  356             return False
  357 
  358     def delete(self, versions: bool = False) -> None:

Lines 364-384

  364             If ``True``, delete all versions of the task in the task group. Otherwise, delete only
  365             the version associated with the current task ID.
  366         """
  367         if not self.task_id:
! 368             raise ValueError("Task id not found.")
  369 
  370         task_details = self.detail().dict()
  371 
  372         if task_details and "groupId" in task_details:
! 373             group_id = task_details["groupId"]
! 374             if versions:
! 375                 http.delete("tidy3d/group", json={"groupIds": [group_id]})
! 376                 return
! 377             elif "version" in task_details:
! 378                 version = task_details["version"]
! 379                 http.delete(f"tidy3d/group/{group_id}/versions", json={"versions": [version]})
! 380                 return
  381 
  382         # Fallback to old method if we can't get the groupId and version
  383         http.delete(f"tidy3d/tasks/{self.task_id}")

Lines 778-786

  778         if not self.task_id:
  779             raise WebError("Expected field 'task_id' is unset.")
  780 
  781         target_path = pathlib.Path(to_file)
! 782         target_file = SIM_ERROR_FILE if not validation else SIM_VALIDATION_FILE
  783 
  784         return download_file(
  785             self.task_id,
  786             target_file,

Lines 846-859

  846         :class:`.BatchTask` | None
  847             BatchTask object if found, otherwise None.
  848         """
  849         try:
! 850             resp = http.get(f"rf/task/{task_id}/statistics")
! 851         except WebNotFoundError as e:
! 852             td.log.error(f"The requested batch ID '{task_id}' does not exist.")
! 853             raise e
  854         # We only need to validate existence; store id on the instance.
! 855         return BatchTask(taskId=task_id) if resp else None
  856 
  857     def detail(self) -> BatchDetail:
  858         """Fetches the detailed information and status of the batch.

Lines 927-940

  927             The server's response to the submit request.
  928         """
  929 
  930         # TODO: add support for pay_type and priority arguments
! 931         if pay_type != PayType.AUTO:
! 932             raise NotImplementedError(
  933                 "The 'pay_type' argument is not yet supported and will be ignored."
  934             )
! 935         if priority is not None:
! 936             raise NotImplementedError(
  937                 "The 'priority' argument is not yet supported and will be ignored."
  938             )
  939 
  940         if protocol_version is None:

Lines 949-959

  949         )
  950 
  951     def abort(self) -> requests.Response:
  952         """Abort the current task on the server."""
! 953         if not self.task_id:
! 954             raise ValueError("Batch id not found.")
! 955         return http.put(f"rf/task/{self.task_id}/abort", {})
  956 
  957 
  958 class TaskFactory:
  959     """Factory for obtaining the correct task subclass."""

Lines 962-970

  962 
  963     @classmethod
  964     def reset(cls) -> None:
  965         """Clear the cached task kind registry (used in tests)."""
! 966         cls._REGISTRY.clear()
  967 
  968     @classmethod
  969     def register(cls, task_id: str, kind: str) -> None:
  970         cls._REGISTRY[task_id] = kind

Lines 972-986

  972     @classmethod
  973     def get(cls, task_id: str, verbose: bool = True) -> WebTask:
  974         kind = cls._REGISTRY.get(task_id)
  975         if kind == "batch":
! 976             return BatchTask.get(task_id, verbose=verbose)
  977         if kind == "simulation":
  978             task = SimulationTask.get(task_id, verbose=verbose)
  979             return task
  980         if WebTask.is_batch(task_id):
! 981             cls.register(task_id, "batch")
! 982             return BatchTask.get(task_id, verbose=verbose)
  983         task = SimulationTask.get(task_id, verbose=verbose)
  984         if task:
  985             cls.register(task_id, "simulation")
  986         return task

@daquinteroflex
Copy link
Collaborator

So wanted to just check a few final things before merging:

image

So we're not giving them a way to check a batch sub-simulation log if an error occurs out of curiosity? Maybe this could be a future thing? I guess the expectation is that this does not happen in general if it passed the validation, but do you think it will propagate cases where it diverges etc?

It looks like it's handling properly a failing initial simulation in terms of the monitoring, but this is maybe something worth discussing
image

image It looks really nice other than those questions, and probably good to merge

Copy link
Collaborator

@yaugenst-flex yaugenst-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have this branching between regular and component modeler tasks, wouldn't all the webapi-related functions now need to gate behind _is_modeler_batch, e.g. download_simulation(), get_run_info(), etc, to then call the RF endpoints? Currently only download() and upload() seem to be handled correctly?

@momchil-flex
Copy link
Collaborator Author

Since we have this branching between regular and component modeler tasks, wouldn't all the webapi-related functions now need to gate behind _is_modeler_batch, e.g. download_simulation(), get_run_info(), etc, to then call the RF endpoints? Currently only download() and upload() seem to be handled correctly?

Yeah you are right.

Maybe before making any changes in that direction, we can discuss what this should look like? Either in a perfect world or, and more relevant, given what we currently have as backend and endpoints.

On one hand, it feels like we shouldn't be working with task id-s directly but rather with something like the existing SimulationTask and BatchTask objects, i.e. passing around something like a WebTask subclass. However, for the user-facing functions, using a task_id is quite deeply ingrained, and one could argue is the right approach as e.g. the user can copy the task id directly from GUI? On the other hand wrapping it in a SimulationTask(task_id) before using all the web functions is maybe not that much to ask for - but definitely a breaking change.

The other thing I'm wondering about is avoiding all the api calls just to determine if something is a batch or not based on the task id. One option that comes to mind is to use the prefixes (e.g. currently modeler tasks are sid-***) but that doesn't strike me as a particularly good choice.

@momchil-flex momchil-flex force-pushed the momchil/rf_refactor branch 2 times, most recently from ab7d945 to 24fe116 Compare November 18, 2025 13:34
@yaugenst-flex
Copy link
Collaborator

So apart from addressing all review comments, I added some big updates along the lines of unifying SimulationTask and BatchTask and avoiding too many api calls to determine if something is a batch or not. These were introduced in this commit, notably a TaskFactory getter that creates the appropriate SimulationTask or BatchTask, and stores a registry mapping of task id-s to whether they are batch or simulation, so the web api should only be accessed once for task.

I'm done with chaages as far as I'm concerned, but the problem is that some web tests seem to pretty reliably fail with an API key or connection error, indicating that for some reason something is trying to use the server rather than the mock. What's confusing though is that I can't reproduce this locally, even after turning of my internet and removing the config file with my API key: locally tests still pass for me. Any idea @yaugenst-flex ?

left a couple of comments which are what i suspect is breaking. as to why this passes locally for you, i suspect this has to do with how tests are executed because i think the TaskFactory makes things depend on how tests are scheduled. so if TaskFactory.get() is called before anything has registered the TASK_ID="1234" then this will do the http call, but if an earlier test like test_start runs (which do use mock_get_info) then later tests will skip the http.

this is bad 😄 we now have (even more) stateful tests. i think cleanest would be to itnroduce a fixture that is automatically used that clears the task factory registry

Copy link
Collaborator

@yaugenst-flex yaugenst-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you go

@momchil-flex
Copy link
Collaborator Author

I could resolve most failing tests easily in the end (thanks @yaugenst-flex) but there is one that's pointing to a bit of a problem, the local cache one. The problem is that to use the local cache, the user would still need internet connection currently because the check of whether a task is a simulation or a batch task is done through a web api call. The problem existed already since rc2 due to the _is_modeler_batch call in web.load, but it is only caught by the test now because TaskFactory calls SimulationTask.get, which calls the detail api directly. The local cache tests mock the webapi.get_info function which ws previously used in the _is_modeler_batch.

I can of course mock the SimulationTask.get response too, but it seems to me that it's good that this was caught because do we want to have the local cache run fine without internet connection? @marcorudolphflex what do you think?

@momchil-flex
Copy link
Collaborator Author

Fixed the tests (let's see the actions...), added a fixture to clear the TaskFactory registry.

I did end up monkeypatching SimulationTask.get to go around the cache load failure - this was already an issue in the existing code and maybe not needed to fix right away, but if we don't fix now then I'll make a ticket to follow up.

@marcorudolphflex
Copy link
Contributor

I could resolve most failing tests easily in the end (thanks @yaugenst-flex) but there is one that's pointing to a bit of a problem, the local cache one. The problem is that to use the local cache, the user would still need internet connection currently because the check of whether a task is a simulation or a batch task is done through a web api call. The problem existed already since rc2 due to the _is_modeler_batch call in web.load, but it is only caught by the test now because TaskFactory calls SimulationTask.get, which calls the detail api directly. The local cache tests mock the webapi.get_info function which ws previously used in the _is_modeler_batch.

I can of course mock the SimulationTask.get response too, but it seems to me that it's good that this was caught because do we want to have the local cache run fine without internet connection? @marcorudolphflex what do you think?

@momchil-flex I agree that this should run without internet connection. Perhaps we should store the information in the cache metadata if a task is a modeler batch? Unfortunately, this would produce ugly code add-ons as we have pass around the cache entry or is_modeler_batch information.

@momchil-flex
Copy link
Collaborator Author

I could resolve most failing tests easily in the end (thanks @yaugenst-flex) but there is one that's pointing to a bit of a problem, the local cache one. The problem is that to use the local cache, the user would still need internet connection currently because the check of whether a task is a simulation or a batch task is done through a web api call. The problem existed already since rc2 due to the _is_modeler_batch call in web.load, but it is only caught by the test now because TaskFactory calls SimulationTask.get, which calls the detail api directly. The local cache tests mock the webapi.get_info function which ws previously used in the _is_modeler_batch.
I can of course mock the SimulationTask.get response too, but it seems to me that it's good that this was caught because do we want to have the local cache run fine without internet connection? @marcorudolphflex what do you think?

@momchil-flex I agree that this should run without internet connection. Perhaps we should store the information in the cache metadata if a task is a modeler batch? Unfortunately, this would produce ugly code add-ons as we have pass around the cache entry or is_modeler_batch information.

This sounds related to the other discussion about storing a task type to get a default file name. It's actually 100% related because if we have the task type we can know whether it's a batch or not.

@marcorudolphflex
Copy link
Contributor

This sounds related to the other discussion about storing a task type to get a default file name. It's actually 100% related because if we have the task type we can know whether it's a batch or not.

Yup. Should I handle this in a separate PR?

@momchil-flex
Copy link
Collaborator Author

Yeah I think separate PR makes sense!

@momchil-flex
Copy link
Collaborator Author

@daquinteroflex I added one commit to remove compose_modeler in favor of the new Tidy3dBaseModel.from_file (another place where we had something like that implemented locally). I also added Tidy3dBaseModel to the td namespace because it's awkward to use otherwise if it has to be imported from components.base every time. Does that sound good @yaugenst-flex ?

@yaugenst-flex
Copy link
Collaborator

@daquinteroflex I added one commit to remove compose_modeler in favor of the new Tidy3dBaseModel.from_file (another place where we had something like that implemented locally). I also added Tidy3dBaseModel to the td namespace because it's awkward to use otherwise if it has to be imported from components.base every time. Does that sound good @yaugenst-flex ?

Well is Tidy3dBaseModel public or not? 😄 It's fine for now, at least until we better define that

@momchil-flex
Copy link
Collaborator Author

@daquinteroflex I added one commit to remove compose_modeler in favor of the new Tidy3dBaseModel.from_file (another place where we had something like that implemented locally). I also added Tidy3dBaseModel to the td namespace because it's awkward to use otherwise if it has to be imported from components.base every time. Does that sound good @yaugenst-flex ?

Well is Tidy3dBaseModel public or not? 😄 It's fine for now, at least until we better define that

I think it's convenient for users to be able to do td.Tidy3dBaseModel.from_file() on anything. Whether this means making Tidy3dBaseModel public or that we should move this somewhere else can be discussed?

@yaugenst-flex
Copy link
Collaborator

yeah it's fine. i guess as a user facing thing it's a bit cumbersome (naming) but still nice to have

@momchil-flex
Copy link
Collaborator Author

@copilot what do you think

Copy link

Copilot AI commented Nov 20, 2025

@momchil-flex I've opened a new pull request, #3016, to work on those changes. Once the pull request is ready, I'll request review from you.

@yaugenst-flex
Copy link
Collaborator

@greptile review again

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

28 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@momchil-flex
Copy link
Collaborator Author

@daquinteroflex just pushed two more commits with further cleanup and generalization and looking ahead towards properly handling taskType again. Take a look if you want.

@momchil-flex momchil-flex added this pull request to the merge queue Nov 21, 2025
@momchil-flex momchil-flex removed this pull request from the merge queue due to a manual request Nov 21, 2025
unifying

add not implemented errors

introduce WebTask from which SimulationTask and BatchTask subclass

fix to taskType

Test fixes and fixture to clear TaskFactory registry

Remove compose_modeler in favor of Tidy3dBaseModel.from_file

Unifying detail and delete methods

Passing taskType, cleanup and using MODAL_CM and TERMINAL_CM

cleanup
@momchil-flex momchil-flex added this pull request to the merge queue Nov 21, 2025
Merged via the queue into develop with commit 1379f16 Nov 21, 2025
19 checks passed
@momchil-flex momchil-flex deleted the momchil/rf_refactor branch November 21, 2025 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2.10 rc3 3rd pre-release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants