Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

supports upload task with gz #999

Merged
merged 1 commit into from
Aug 16, 2023
Merged

supports upload task with gz #999

merged 1 commit into from
Aug 16, 2023

Conversation

magiWei
Copy link

@magiWei magiWei commented Jul 13, 2023

No description provided.

@magiWei magiWei force-pushed the qingeng/support-gz branch 7 times, most recently from da5934e to 5371b8a Compare July 17, 2023 07:50
@lucas-flexcompute
Copy link
Collaborator

Currently, we only use hdf5 when there are custom datasets in the simulation, otherwise we upload the json version. With this change, we'll always use a compressed hdf5, is that correct? Are there any downsides to this option?

Other than that, the use of os.system to call gzip/gunzip is non-portable. It's better to use the gzip module.

@magiWei magiWei force-pushed the qingeng/support-gz branch 3 times, most recently from 3d6e077 to d121071 Compare July 17, 2023 13:06
@tylerflex tylerflex requested a review from lei-flex July 17, 2023 13:58
@tylerflex
Copy link
Collaborator

Currently, we only use hdf5 when there are custom datasets in the simulation, otherwise we upload the json version. With this change, we'll always use a compressed hdf5, is that correct? Are there any downsides to this option?

I forget why we decided to upload the json if no data was present (instead of just all hdf5). There must have been some reason, maybe @momchil-flex remembers.

Anyway, what is the main advantage of this PR? I'm reluctant to make potentially breaking changes to the webapi so want to make sure it is worth changing things. Also I think it would be helpful for @lei-flex to take a look because I'm not sure how this affects our backend pipeline

@momchil-flex
Copy link
Collaborator

GUI is transitioning to working with hdf5.gz files so they can support custom data, and we can do the same here. The reason we used to upload .json files was for GUI.

We need to remove or modify the download_json and download_hdf5 functions.

tidy3d/tidy3d/web/webapi.py

Lines 464 to 513 in d121071

@wait_for_connection
def download_json(
task_id: TaskId,
path: str = SIM_FILE_JSON,
verbose: bool = True,
progress_callback: Callable[[float], None] = None,
) -> None:
"""Download the `.json` file associated with the :class:`.Simulation` of a given task.
Parameters
----------
task_id : str
Unique identifier of task on server. Returned by :meth:`upload`.
path : str = "simulation.json"
Download path to .json file of simulation (including filename).
verbose : bool = True
If `True`, will print progressbars and status, otherwise, will run silently.
progress_callback : Callable[[float], None] = None
Optional callback function called when downloading file with ``bytes_in_chunk`` as argument.
"""
task = SimulationTask(taskId=task_id)
task.get_simulation_json(path, verbose=verbose, progress_callback=progress_callback)
@wait_for_connection
def download_hdf5(
task_id: TaskId,
path: str = SIM_FILE_HDF5,
verbose: bool = True,
progress_callback: Callable[[float], None] = None,
) -> None:
"""Download the `.hdf5` file associated with the :class:`.Simulation` of a given task.
Parameters
----------
task_id : str
Unique identifier of task on server. Returned by :meth:`upload`.
path : str = "simulation.hdf5"
Download path to .hdf5 file of simulation (including filename).
verbose : bool = True
If `True`, will print progressbars and status, otherwise, will run silently.
progress_callback : Callable[[float], None] = None
Optional callback function called when downloading file with ``bytes_in_chunk`` as argument.
"""
task = SimulationTask(taskId=task_id)
task.get_simulation_hdf5(path, verbose=verbose, progress_callback=progress_callback)

Modifying would mean e.g. trying to download the file directly, and if failing trying to donwload the hdf5.gz file and then repackaging as json or hdf5.

Currently the mode solver does not support hdf5.gz. In general I think we will wait a little with merging this PR.

@magiWei magiWei force-pushed the qingeng/support-gz branch 2 times, most recently from 3afe5a3 to e5a6740 Compare July 19, 2023 02:13
@magiWei
Copy link
Author

magiWei commented Jul 19, 2023

GUI is transitioning to working with hdf5.gz files so they can support custom data, and we can do the same here. The reason we used to upload .json files was for GUI.

We need to remove or modify the download_json and download_hdf5 functions.

tidy3d/tidy3d/web/webapi.py

Lines 464 to 513 in d121071

@wait_for_connection
def download_json(
task_id: TaskId,
path: str = SIM_FILE_JSON,
verbose: bool = True,
progress_callback: Callable[[float], None] = None,
) -> None:
"""Download the `.json` file associated with the :class:`.Simulation` of a given task.
Parameters
----------
task_id : str
Unique identifier of task on server. Returned by :meth:`upload`.
path : str = "simulation.json"
Download path to .json file of simulation (including filename).
verbose : bool = True
If `True`, will print progressbars and status, otherwise, will run silently.
progress_callback : Callable[[float], None] = None
Optional callback function called when downloading file with ``bytes_in_chunk`` as argument.
"""
task = SimulationTask(taskId=task_id)
task.get_simulation_json(path, verbose=verbose, progress_callback=progress_callback)
@wait_for_connection
def download_hdf5(
task_id: TaskId,
path: str = SIM_FILE_HDF5,
verbose: bool = True,
progress_callback: Callable[[float], None] = None,
) -> None:
"""Download the `.hdf5` file associated with the :class:`.Simulation` of a given task.
Parameters
----------
task_id : str
Unique identifier of task on server. Returned by :meth:`upload`.
path : str = "simulation.hdf5"
Download path to .hdf5 file of simulation (including filename).
verbose : bool = True
If `True`, will print progressbars and status, otherwise, will run silently.
progress_callback : Callable[[float], None] = None
Optional callback function called when downloading file with ``bytes_in_chunk`` as argument.
"""
task = SimulationTask(taskId=task_id)
task.get_simulation_hdf5(path, verbose=verbose, progress_callback=progress_callback)

Modifying would mean e.g. trying to download the file directly, and if failing trying to donwload the hdf5.gz file and then repackaging as json or hdf5.

Currently the mode solver does not support hdf5.gz. In general I think we will wait a little with merging this PR.

download_json function has removed. Can the original versions download reslult.hdf5 and download simulation.hdf5 source file through download_hdf5 function? if so, need to change the download_hdf5 logic.

@magiWei magiWei force-pushed the qingeng/support-gz branch 2 times, most recently from 6f064fe to 4ec5d8a Compare July 19, 2023 04:38
@momchil-flex
Copy link
Collaborator

download_json function has removed. Can the original versions download reslult.hdf5 and download simulation.hdf5 source file through download_hdf5 function? if so, need to change the download_hdf5 logic.

Actually, looking at the code again, I propose something else: modifying SimulationTask.get_simulation.json and SimulationTask.get_simulation_hdf5 to download the hdf5.gz file. Then, SimulationTask.get_simulation_hdf5 can just unzip that to the provided path. SimulationTask.get_simulation_json will need to unzip it to a temporary hdf5 file, then read the json string from that file and write it to the provided path. This also won't break backwards compatibility for users who may be using the download_json and download_hdf5 web functions.

Note that there are two pylint errors in the tests. The one about protected access can be avoided by placing a comment # pylint:disable=protected-access on that line or right above it.

@magiWei
Copy link
Author

magiWei commented Jul 20, 2023

download_json function has removed. Can the original versions download reslult.hdf5 and download simulation.hdf5 source file through download_hdf5 function? if so, need to change the download_hdf5 logic.

Actually, looking at the code again, I propose something else: modifying SimulationTask.get_simulation.json and SimulationTask.get_simulation_hdf5 to download the hdf5.gz file. Then, SimulationTask.get_simulation_hdf5 can just unzip that to the provided path. SimulationTask.get_simulation_json will need to unzip it to a temporary hdf5 file, then read the json string from that file and write it to the provided path. This also won't break backwards compatibility for users who may be using the download_json and download_hdf5 web functions.

Note that there are two pylint errors in the tests. The one about protected access can be avoided by placing a comment # pylint:disable=protected-access on that line or right above it.

SimulationTask.get_simulation_hdf5 is just for downloading the output/monitor_data.hdf5 result file, so it no need to change.

SimulationTask.get_simulation_json will need to unzip it to a temporary hdf5 file, then read the json string from that file and write it to the provided path

@magiWei magiWei force-pushed the qingeng/support-gz branch 6 times, most recently from 09c2978 to 27a356d Compare July 20, 2023 12:10
@tylerflex tylerflex marked this pull request as draft July 21, 2023 14:27
@tylerflex tylerflex added the 2.4 label Jul 21, 2023
@magiWei magiWei force-pushed the qingeng/support-gz branch 3 times, most recently from 2b9105a to 52a0e89 Compare July 26, 2023 03:23
@magiWei magiWei marked this pull request as ready for review July 26, 2023 03:45
@tylerflex tylerflex marked this pull request as draft July 26, 2023 18:59
@momchil-flex
Copy link
Collaborator

momchil-flex commented Jul 27, 2023

Ok, actually I see that there is a bit of a mess that needs to be cleaned up. There are two different HDF5 files. The simulation.hdf5 that defines the initial simulation, and the simulation_data.hdf5 (called output/monitor_data.hdf5 on the server) that contains the results. The SimulationTask.get_simulation_hdf5 gets the monitor data file indeed. This method is used in both webapi.load and in webapi.download_hdf5. The second of those methods sounds like it should be downloading simulation.hdf5, but it is downloading the data file instead. As part of this PR, I think what we need to do is fix webapi.download_hdf5 to return simulation.hdf5 and make sure that webapi.download_json can also get the simulation.json successfully.

@magiWei I can help take care of this part once SimulationTask is updated properly. However, I think some changes are still needed there. Currently, it seems that you continue to upload both a simulation.json and a simulation.hdf5.gz, and the json file is only uploaded so that it can be downloaded later on. This doesn't make much sense. Instead we should change things to:

  • Only upload simulation.hdf5.gz, do not upload simulation.json
  • Change SimulationTask.get_simulation_hdf5 to download SIM_FILE_HDF5_GZ and unzip it
  • Rename the current SimulationTask.get_simulation_hdf5 to SimulationTask.get_sim_data_hdf5 and use that in webapi.download but not in webapi.download_hdf5
  • Change SimulationTask.get_simulation_json to call SimulationTask.get_simulation_hdf5, then load the json from the hdf5 file
  • Change SimulationTask.get_simulation to call SimulationTask.get_simulation_hdf5 and load the simulation from the hdf5 file directly, not from json.

Does that make sense?

@magiWei magiWei force-pushed the qingeng/support-gz branch 5 times, most recently from b64c87f to c0c16ed Compare July 28, 2023 04:32
@magiWei
Copy link
Author

magiWei commented Jul 28, 2023

Ok, actually I see that there is a bit of a mess that needs to be cleaned up. There are two different HDF5 files. The simulation.hdf5 that defines the initial simulation, and the simulation_data.hdf5 (called output/monitor_data.hdf5 on the server) that contains the results. The SimulationTask.get_simulation_hdf5 gets the monitor data file indeed. This method is used in both webapi.load and in webapi.download_hdf5. The second of those methods sounds like it should be downloading simulation.hdf5, but it is downloading the data file instead. As part of this PR, I think what we need to do is fix webapi.download_hdf5 to return simulation.hdf5 and make sure that webapi.download_json can also get the simulation.json successfully.

@magiWei I can help take care of this part once SimulationTask is updated properly. However, I think some changes are still needed there. Currently, it seems that you continue to upload both a simulation.json and a simulation.hdf5.gz, and the json file is only uploaded so that it can be downloaded later on. This doesn't make much sense. Instead we should change things to:

  • Only upload simulation.hdf5.gz, do not upload simulation.json
  • Change SimulationTask.get_simulation_hdf5 to download SIM_FILE_HDF5_GZ and unzip it
  • Rename the current SimulationTask.get_simulation_hdf5 to SimulationTask.get_sim_data_hdf5 and use that in webapi.download but not in webapi.download_hdf5
  • Change SimulationTask.get_simulation_json to call SimulationTask.get_simulation_hdf5, then load the json from the hdf5 file
  • Change SimulationTask.get_simulation to call SimulationTask.get_simulation_hdf5 and load the simulation from the hdf5 file directly, not from json.

Does that make sense?

Changed!And there is a unit test failure, I don't know the specific reason, can you help fix it?

@tylerflex
Copy link
Collaborator

tylerflex commented Jul 28, 2023

Hi @magiWei , the failing unit tests was from our jax updating and breaking things. I fixed it in #1038 and will release a patch 2.3.3. I restarted the tests for this PR and they should be passing now.

@tylerflex
Copy link
Collaborator

tylerflex commented Jul 28, 2023

Looks like tests fail but this time due to an issue related to this PR

@magiWei
Copy link
Author

magiWei commented Jul 29, 2023

Looks like tests fail but this time due to an issue related to this PR

Fixed.

@magiWei magiWei marked this pull request as ready for review July 29, 2023 01:21
@momchil-flex
Copy link
Collaborator

Thanks @magiWei I think this looks good now. I will just make a few cosmetic changes before merging (next week).

@momchil-flex momchil-flex changed the base branch from develop to pre/2.4 August 16, 2023 20:42
@momchil-flex momchil-flex force-pushed the qingeng/support-gz branch 2 times, most recently from 84c9f15 to 90c1252 Compare August 16, 2023 21:37
@momchil-flex
Copy link
Collaborator

@tylerflex this is the last frontend PR apart from the pydantic one. I rebased it and fixed eveyrthing up as it was somewhat stale, and tested and it seems to work. Note that with this, hdf5.gz will always be uploaded instead of json/hdf5. The solver already supports that and GUI team is working on supporting this when they switch to 2.4.

The one thing I'm wondering about is these types of statements where a regular "print" is used. https://github.com/flexcompute/tidy3d/pull/999/files#diff-9ef8b43458a5c52262aae2f54d04b0ca7d2f8e141fded282948a9f2413cc3f80R328-R329

Is this handled correctly or should it be some sort of e.g. console.log?

@tylerflex
Copy link
Collaborator

It should be like this
https://github.com/flexcompute/tidy3d/blob/develop/tidy3d/web/webapi.py#L187-L189

console = rich.Console()
console.log("message")

@momchil-flex momchil-flex merged commit 13d747b into pre/2.4 Aug 16, 2023
16 checks passed
@momchil-flex momchil-flex deleted the qingeng/support-gz branch August 16, 2023 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants