Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batch refactoring / multi-threading #1630

Merged
merged 2 commits into from
May 1, 2024
Merged

Conversation

tylerflex
Copy link
Collaborator

@tylerflex tylerflex commented Apr 22, 2024

Goals:

  1. Simplifies how Batch and Job work internally.
  2. Improves the visual output from Batch.
  3. Implements multi-threading for batch operations (upload, download, etc).
image

Refactor

Previously, Job objects were defined like this

class Job:
    ...
    task_id = None

    @pd.root_validator()
    def _upload(cls, values) -> None:
          # upload task
          # set task id internally

This meant that when you created a Job, the task is immediately uploaded.
Also when you save a Job.to_file() the task info is contained.

This design made it a bit annoying to do normal multi-threading over job upload.

In the refactored version, I set it up to have upload in its own method.

class Job:
    ...
    task_id_cached = None

    @cached_property
    def task_id(self):
        # return self.task_id_cached or call `self.upload()` 

    def upload(self):
         ...

    def json(self):
          # extend json to add `cached_properties["task_id"] to json file for backwards compatibility when loading from file

Now a similar refactoring was done for Batch to move the upload() bits to their own method.

Before, the Batch created all Job objects and uploaded them at initialization time in a validator.

class Batch:
    ...
    jobs : Dict[str, Job] = None

    @pd.root_validator
    def _upload():
        # for each simulation in the batch, make a job (which automatically uploads it).

instead, now that we have the Job with its own upload method, I simplified this to.

class Batch:
    ...
    jobs : Dict[str, Job] = None

    @cached_property
    def jobs():
        # construct all of the Jobs for this Batch

    def upload(self):
        # upload each Job in a for loop

Multithreading

Implemented basic multi-threading for Batch.upload() and Batch.download(). Works as expected on the Design plugin notebook.

Also made it so that Batch.run() will download by default.

Output / Logging

  • The batch upload, download, and BatchData loading all just log a single output instead of one line for every file.

@tylerflex tylerflex marked this pull request as draft April 22, 2024 17:12
tidy3d/web/api/container.py Show resolved Hide resolved
tidy3d/web/api/container.py Outdated Show resolved Hide resolved
tidy3d/web/api/container.py Outdated Show resolved Hide resolved
tidy3d/web/api/container.py Outdated Show resolved Hide resolved
tidy3d/web/api/container.py Outdated Show resolved Hide resolved
@tylerflex tylerflex changed the base branch from develop to pre/2.7 April 24, 2024 15:35
@tylerflex tylerflex added the 2.7 will go into version 2.7.* label Apr 24, 2024
@tylerflex tylerflex marked this pull request as ready for review April 24, 2024 15:36
@tylerflex tylerflex requested a review from lei-flex April 24, 2024 15:36
tidy3d/web/api/container.py Show resolved Hide resolved
tidy3d/web/api/container.py Show resolved Hide resolved
tidy3d/web/api/container.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@daquinteroflex daquinteroflex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Looks great, excited to test this!

@tylerflex tylerflex force-pushed the tyler/web/batch_multiproc branch 2 times, most recently from e71b355 to b2004f2 Compare April 25, 2024 15:10
@tylerflex
Copy link
Collaborator Author

Thanks @lei-flex @daquinteroflex @lucas-flexcompute for the comments.
I think this is ready to go. Did a bit of testing with WebAPI.ipynb, Design.ipynb and the adjoint batching tutorial. All looks good. Seems faster than before and also the output is much more reasonable.

Let me know if you think this needs more review or work on any particular aspects, thanks!

@tylerflex tylerflex marked this pull request as draft May 1, 2024 00:14
@tylerflex tylerflex marked this pull request as ready for review May 1, 2024 12:59
Copy link
Collaborator

@lei-flex lei-flex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ship it.

@tylerflex tylerflex merged commit 7f5a82b into pre/2.7 May 1, 2024
16 checks passed
@tylerflex tylerflex deleted the tyler/web/batch_multiproc branch May 1, 2024 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.7 will go into version 2.7.*
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants