-
Notifications
You must be signed in to change notification settings - Fork 66
feat(tidy3d): FXC-3722 better file path handling #2915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (8)
-
tidy3d/plugins/smatrix/run.py, line 31 (link)syntax: docstring type should be PathLike, not str
-
tidy3d/plugins/dispersion/fit.py, line 738 (link)logic: Cast
fnameto string withstr(fname)before passing tonp.loadtxt, as numpy may not accept all PathLike objects. Does np.loadtxt handle all PathLike types correctly in the numpy version used, or should explicit conversion be added? -
tidy3d/updater.py, line 97 (link)syntax: Using
inoperator on PathLike will fail. Convert to string first: -
tidy3d/updater.py, line 103 (link)syntax: Using
inoperator on PathLike will fail. Convert to string first: -
tidy3d/updater.py, line 105 (link)syntax: Using
inoperator on PathLike will fail. Convert to string first: -
tidy3d/log.py, line 427 (link)style: Pass
fnamedirectly sinceopen()accepts PathLike objects natively in Python 3.6+; no need to convert to string first. -
tidy3d/material_library/material_library.py, line 66-67 (link)logic: PathLike parameter needs explicit conversion to string when passed to
open(). Usestr(fname)oros.fspath(fname)to ensure compatibility. -
tidy3d/web/api/autograd/autograd.py, line 844-849 (link)logic: Inconsistency:
path_dir_adjis a Path object but converted to string when passed to_run_async_tidy3d. Line 712 passespath_dir_adjdirectly as a Path. Check if_run_async_tidy3dconsistently handles both Path and string. Does_run_async_tidy3daccept PathLike objects for itspath_dirparameter, or does it require strings?
27 files reviewed, 35 comments
Diff CoverageDiff: origin/develop...HEAD, staged and unstaged changes
Summary
tidy3d/components/base.pyLines 118-126 118 valid_extensions = [".json", ".yaml", ".hdf5", ".h5", ".hdf5.gz"]
119 path = Path(fname)
120 extensions = [s.lower() for s in path.suffixes[-2:]]
121 if len(extensions) == 0:
! 122 raise FileError(f"File '{path}' missing extension.")
123 single_extension = extensions[-1]
124 if single_extension in valid_extensions:
125 return single_extension
126 double_extension = "".join(extensions)tidy3d/components/data/data_array.pyLines 262-275 262
263 @classmethod
264 def from_file(cls, fname: PathLike, group_path: str) -> Self:
265 """Load an DataArray from an hdf5 file with a given path to the group."""
! 266 path = pathlib.Path(fname)
! 267 if not any(suffix.lower() == ".hdf5" for suffix in path.suffixes):
268 raise FileError(
269 f"'DataArray' objects must be written to '.hdf5' format. Given filename of {path}."
270 )
! 271 return cls.from_hdf5(fname=path, group_path=group_path)
272
273 def __hash__(self) -> int:
274 """Generate hash value for a :class:`.DataArray` instance, needed for custom components."""
275 import dasktidy3d/updater.pyLines 92-108 92
93 @classmethod
94 def from_file(cls, fname: PathLike) -> Updater:
95 """Dictionary representing the simulation loaded from file."""
! 96 path = Path(fname)
97 # TODO: fix this, it broke
! 98 if path.suffix in {".hdf5", ".gz"}:
! 99 sim_dict = Tidy3dBaseModel.from_file(fname=str(path)).dict()
100 else:
! 101 with path.open(encoding="utf-8") as f:
! 102 if path.suffix == ".json":
103 sim_dict = json.load(f)
! 104 elif path.suffix == ".yaml":
105 sim_dict = yaml.safe_load(f)
106 else:
107 raise FileError('file extension must be ".json", ".yaml", ".hdf5", or ".gz"')
108 return cls(sim_dict=sim_dict)tidy3d/web/api/autograd/engine.pyLines 28-39 28 upload_sim_fields_keys(run_kwargs["sim_fields_keys"], task_id=job.task_id, verbose=verbose)
29 path = Path(run_kwargs.get("path", DEFAULT_DATA_PATH))
30 priority = run_kwargs.get("priority")
31 if task_name.endswith("_adjoint"):
! 32 suffixes = "".join(path.suffixes)
! 33 base_name = path.name
! 34 base_without_suffix = base_name[: -len(suffixes)] if suffixes else base_name
! 35 path = path.with_name(f"{base_without_suffix}_adjoint{suffixes}")
36 data = job.run(path, priority=priority)
37 return data, job.task_id
38 Lines 62-70 62 for task_name, sim_fields_keys in run_kwargs["sim_fields_keys_dict"].items():
63 task_id = task_ids[task_name]
64 upload_sim_fields_keys(sim_fields_keys, task_id=task_id, verbose=verbose)
65
! 66 if path_dir is not None:
67 batch_data = batch.run(path_dir, priority=priority)
68 else:
69 batch_data = batch.run(priority=priority)tidy3d/web/api/container.pyLines 593-604 593 :class:`BatchData`
594 Contains Union[:class:`.SimulationData`, :class:`.HeatSimulationData`, :class:`.EMESimulationData`]
595 for each Union[:class:`.Simulation`, :class:`.HeatSimulation`, :class:`.EMESimulation`] in :class:`Batch`.
596 """
! 597 base_dir = Path(path_dir)
! 598 batch_file = Batch._batch_path(path_dir=base_dir)
599 batch = Batch.from_file(batch_file)
! 600 return batch.load(path_dir=base_dir, replace_existing=replace_existing)
601
602
603 class Batch(WebContainer):
604 """Lines 981-989 981 f"File '{job_path}' already exists. Skipping download "
982 "(set `replace_existing=True` to overwrite)."
983 )
984 return
! 985 log.info(f"File '{job_path}' already exists. Overwriting.")
986
987 downloads_started.add(task_id)
988 download_futures[task_id] = download_executor.submit(job.download, job_path)Lines 1192-1201 1192 self.to_file(self._batch_path(path_dir=path_dir))
1193
1194 num_existing = 0
1195 for _, job in self.jobs.items():
! 1196 job_path = self._job_data_path(task_id=job.task_id, path_dir=path_dir)
! 1197 if job_path.exists():
1198 num_existing += 1
1199 if num_existing > 0:
1200 files_plural = "files have" if num_existing > 1 else "file has"
1201 log.warning(Lines 1207-1220 1207
1208 with ThreadPoolExecutor(max_workers=self.num_workers) as executor:
1209 fns = []
1210 for task_name, job in self.jobs.items():
! 1211 job_path = self._job_data_path(task_id=job.task_id, path_dir=path_dir)
! 1212 if job_path.exists():
1213 if replace_existing:
! 1214 log.info(f"File '{job_path}' already exists. Overwriting.")
1215 else:
! 1216 log.info(f"File '{job_path}' already exists. Skipping.")
1217 continue
1218 if "error" in job.status:
1219 log.warning(f"Not downloading '{task_name}' as the task errored.")
1220 continueLines 1218-1227 1218 if "error" in job.status:
1219 log.warning(f"Not downloading '{task_name}' as the task errored.")
1220 continue
1221
! 1222 def fn(job=job, job_path=job_path) -> None:
! 1223 return job.download(path=job_path)
1224
1225 fns.append(fn)
1226
1227 futures = [executor.submit(fn) for fn in fns]tidy3d/web/api/tidy3d_stub.pyLines 98-108 98 extension = _get_valid_extension(path)
99 if extension == ".json":
100 json_str = read_simulation_from_json(path)
101 elif extension == ".hdf5":
! 102 json_str = read_simulation_from_hdf5(path)
103 elif extension == ".hdf5.gz":
! 104 json_str = read_simulation_from_hdf5_gz(path)
105
106 data = json.loads(json_str)
107 type_ = data["type"]Lines 239-251 239 """
240 path = Path(file_path)
241 extension = _get_valid_extension(path)
242 if extension == ".json":
! 243 json_str = read_simulation_from_json(path)
244 elif extension == ".hdf5":
245 json_str = read_simulation_from_hdf5(path)
246 elif extension == ".hdf5.gz":
! 247 json_str = read_simulation_from_hdf5_gz(path)
248
249 data = json.loads(json_str)
250 type_ = data["type"]tidy3d/web/api/webapi.pyLines 1064-1073 1064
1065 if _is_modeler_batch(task_id):
1066 # Use a more descriptive default filename for component modeler downloads.
1067 # If the caller left the default as 'simulation_data.hdf5', prefer 'cm_data.hdf5'.
! 1068 if path.name == "simulation_data.hdf5":
! 1069 path = path.with_name("cm_data.hdf5")
1070
1071 def _download_cm() -> bool:
1072 try:
1073 BatchTask(task_id).get_data_hdf5(Lines 1244-1255 1244 # For component modeler batches, default to a clearer filename if the default was used.
1245 path = Path(path)
1246
1247 if _is_modeler_batch(task_id):
! 1248 if path.name == "simulation_data.hdf5":
! 1249 path = path.with_name("cm_data.hdf5")
! 1250 elif path.name == "simulation_data.hdf5.gz":
! 1251 path = path.with_name("cm_data.hdf5.gz")
1252
1253 if not path.exists() or replace_existing:
1254 download(
1255 task_id=task_id,tidy3d/web/core/s3utils.pyLines 202-210 202 -------
203 _S3STSToken
204 The S3 STS token.
205 """
! 206 file_name = str(Path(file_name).as_posix())
207 cache_key = f"{resource_id}:{file_name}"
208 if cache_key not in _s3_sts_tokens or _s3_sts_tokens[cache_key].is_expired():
209 method = f"tidy3d/py/tasks/{resource_id}/file?filename={file_name}"
210 if extra_arguments is not None:Lines 240-248 240 extra_arguments : Mapping[str, str]
241 Additional arguments used to specify the upload bucket.
242 """
243
! 244 path = Path(path)
245 token = get_s3_sts_token(resource_id, remote_filename, extra_arguments)
246
247 def _upload(_callback: Callable) -> None:
248 """Perform the upload with a callback function.Lines 252-260 252 _callback : Callable[[float], None]
253 Callback function for upload, accepts ``bytes_in_chunk``
254 """
255
! 256 with path.open("rb") as data:
257 token.get_client().upload_fileobj(
258 data,
259 Bucket=token.get_bucket(),
260 Key=token.get_s3_key(),Lines 269-278 269 _upload(progress_callback)
270 else:
271 if verbose:
272 with _get_progress(_S3Action.UPLOADING) as progress:
! 273 total_size = path.stat().st_size
! 274 task_id = progress.add_task(
275 "upload", filename=str(remote_filename), total=total_size
276 )
277
278 def _callback(bytes_in_chunk):Lines 318-326 318 remote_basename = Path(remote_filename).name
319
320 # set to_file if None
321 if to_file is None:
! 322 to_path = Path(resource_id) / remote_basename
323 else:
324 to_path = Path(to_file)
325
326 # make the leading directories in the 'to_path', if anyLines 401-418 401 User-supplied callback function with ``bytes_in_chunk`` as argument.
402 """
403
404 # If to_file is a gzip extension, just download
! 405 if to_file is None:
! 406 remote_basename = Path(remote_filename).name
! 407 if remote_basename.endswith(".gz"):
! 408 remote_basename = remote_basename[:-3]
! 409 to_path = Path(resource_id) / remote_basename
410 else:
! 411 to_path = Path(to_file)
412
! 413 suffixes = "".join(to_path.suffixes).lower()
! 414 if suffixes.endswith(".gz"):
415 return download_file(
416 resource_id,
417 remote_filename,
418 to_file=to_path,Lines 435-443 435 verbose=verbose,
436 progress_callback=progress_callback,
437 )
438 if os.path.exists(tmp_file_path_str):
! 439 extract_gzip_file(Path(tmp_file_path_str), to_path)
440 else:
441 raise WebError(f"Failed to download and extract '{remote_filename}'.")
442 finally:
443 os.unlink(tmp_file_path_str)tidy3d/web/core/task_core.pyLines 308-316 308 size, credits of task and others.
309 """
310 resp = http.get("tidy3d/py/tasks")
311 if not resp:
! 312 return []
313 return parse_obj_as(list[SimulationTask], resp)
314
315 def delete(self, versions: bool = False):
316 """Delete current task from server.Lines 687-695 687 """
688 if not self.task_id:
689 raise WebError("Expected field 'task_id' is unset.")
690
! 691 target_path = pathlib.Path(to_file)
692
693 return download_file(
694 self.task_id,
695 SIM_ERROR_FILE, |
45920fb to
c69c8d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (2)
-
tidy3d/web/api/container.py, line 1280 (link)style: Inconsistency -
task_pathsdictionary values should be stored asstrfor JSON serialization compatibility, but line 1280 stores as Path object. Consider usingstr(self._job_data_path(...))for serialization safety. -
tidy3d/material_library/material_library.py, line 66 (link)logic: When
fnameis a PathLike object (not a string or Path),open()may not handle it correctly. Cast to Path first:with open(Path(fname), "w") as f:to ensure compatibility with all PathLike implementations.
27 files reviewed, 11 comments
c69c8d2 to
9960390
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (3)
-
tidy3d/material_library/material_library.py, line 66-67 (link)logic:
open()should receive string path for compatibility. Although Python'sopen()acceptsPathLikeprotocol, explicitly converting ensures compatibility with all PathLike implementations including custom ones. AddPath(fname)orstr(fname)conversion. -
tidy3d/web/api/autograd/autograd.py, line 707-709 (link)style: Path construction duplicates parent/child logic - consider consolidating to
path_dir_adj = Path(run_kwargs.pop("path")).parent / LOCAL_ADJOINT_DIRto be more explicit -
tidy3d/web/api/autograd/autograd.py, line 844-846 (link)style: Same pattern as line 707 - consider consolidating Path construction for consistency
27 files reviewed, 6 comments
9960390 to
9f67634
Compare
yaugenst-flex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @marcorudolphflex this is great! Some much needed path handling cleanup!
d1c7261 to
c8e4cfd
Compare
yaugenst-flex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @marcorudolphflex LGTM! Could you go over the greptile comments too and decide whether any of them need attention and mark them as resolved? We can merge then.
c8e4cfd to
e78475e
Compare
e78475e to
583e827
Compare
refers to #2148
We currently use a lot of string wrangling for paths such as in the run functions. It’d be great to have a pass over this and clean it up using pathlib and use that consistently throughout the codebase.
Implementation
We now support PathLike paths on all levels (expect internal dev functions like doc building). Paths are casted if necessary.
Open Question
Do we want to use os.PathLike or our own definition? Some editors give warnings if passing strings to PathLike arguments.
If we want to define out own type, where to define it? Alternatively, we could just use "str | Path"
Greptile Overview
Updated On: 2025-10-21 18:14:06 UTC
Greptile Summary
This PR modernizes file path handling across the tidy3d codebase by replacing string-based path manipulation with
pathlib.Pathandos.PathLiketype hints. The refactoring affects 28 files spanning core components, web API, plugins, and tests. The implementation follows a consistent pattern: acceptPathLikeat API boundaries (enabling both string and Path object inputs for backward compatibility), convert to Path objects early for internal operations, and leverage pathlib methods like.parent,.mkdir(),.with_name(), and.suffixesfor robust cross-platform path handling. This eliminates fragile string concatenation/splitting withos.path.join,os.path.basename, andos.path.dirname. The changes integrate with Pydantic models, web API clients, file I/O utilities, geometry export methods, and the autograd plugin.Important Files Changed
Changed Files
strtoPathLike- these represent URL paths not filesystem paths, will cause runtime failures when Path objects are passed to URL construction.hdf5.gzPathLikedoesn't include plain strings but default value is a string literal, will trigger type warningsPathLikewith proper string conversion but doesn't includestrin annotation despite accepting empty stringscompress_file_to_gzipdoesn't check output type before Path conversion, breaking in-memory buffer supportPathLikebut represent string identifiers, not filesystem paths.suffixesvalidationPathLike- concrete implementations handle casting appropriatelyPath('.')truthiness checkrun_asyncto accept PathLike with explicit Path conversion for downstream callsConfidence score: 2/5
http_util.pycausing URL construction failures, (2) breaking logic change inupdater.pythat breaks compound extension support like.hdf5.gz, (3) type mismatch inmaterial_library.pywith string default but PathLike-only annotation, (4) inverted isinstance logic inbase.pyline 830 breaking BytesIO handling, (5) BytesIO handling bug infile_util.py, (6) extensionless file edge case inengine.py, (7) semantic type errors intask_core.pyfor remote S3 keys.tidy3d/web/core/http_util.py(URL vs filesystem path confusion),tidy3d/updater.py(extension matching logic break),tidy3d/material_library/material_library.py(type annotation fix needed),tidy3d/components/base.py(isinstance condition inversion),tidy3d/components/file_util.py(BytesIO type checking),tidy3d/web/api/autograd/engine.py(empty suffix edge case),tidy3d/web/core/task_core.py(remote parameter semantics)Context used (5)
dashboard- When modifying a piece of logic, ensure the change is propagated to all independent functions or cal... (source)dashboard- Use explicitis Nonechecks instead of generic truthiness checks when distinguishing between None ... (source)dashboard- Remove temporary debugging code (print() calls), commented-out code, and other workarounds before fi... (source)dashboard- Prefer pytest.mark.parametrize over manual for loops to define and run distinct test cases, reducing... (source)dashboard- In docstrings, use the formatname : type = defaultfor parameter documentation instead of numpydo... (source)