
On the fly vtk import #1300

Merged
merged 1 commit into from Dec 7, 2023

Conversation

dbochkov-flexcompute
Contributor

For some reason I couldn't make it work with plain global variables, so I ended up using a dict that contains the relevant info. One issue with it (and probably with this on-the-fly approach in general) is that we can't use imported modules' members in the definition of function signatures. For this particular case that's not critical, but it could be for others.

@momchil-flex
Collaborator

Yeah, so we definitely need this. I tested a server task with the latest pre/2.5 vs. this modification, and there was a 5-6 s difference (faster with this change), probably because we load vtk more than once in subprocesses. The task doesn't need it at all.

I am just not sure if this is the implementation we want to go with.

@tylerflex @daquinteroflex what do you think?

@daquinteroflex
Collaborator

daquinteroflex commented Dec 7, 2023

Writing a few notes as I was having a look through it on a first pass:

  • Wrapping a requirements-verification function around every method implementation that requires vtk is a more advanced form of a try-except `import vtk`, and I like it; it makes sense.
  • I'm concerned about the timing cost of accessing the vtk modules through the vtk dictionary. How about accessing them directly, since we know the try-except `import vtk` already guarantees they are available in the `sys.modules` import cache? The `@requires_vtk` decorator would then be applied wherever the vtk variable instances are used (and something similar in tests), and underneath the function the code could be written the same as without the dictionary. Or did this not work before and have I missed something?
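The `sys.modules` point can be checked directly: once a module has been imported, a repeated `import` statement is just a cache lookup and a name binding, not a reload. A small stdlib-only demonstration:

```python
import sys
import time

import json  # first import: the module is loaded and cached in sys.modules
assert "json" in sys.modules

start = time.perf_counter()
for _ in range(100_000):
    import json  # cached: each repeat is a sys.modules lookup plus a name bind
elapsed = time.perf_counter() - start
print(f"100k repeated imports took {elapsed:.4f}s")
```

On a typical machine the loop completes in a small fraction of a second, so re-importing inside each method is effectively free after the first call.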

How did you reproduce the timing test? I assume on the cloud, on something like a reproducible hardware instance. This may sound stupid, because I believe you are getting a dependency-import speedup (and I know it might not be necessarily helpful), but I'm not able to reproduce it reliably locally, so let me share how I tested the timing. I thought having a benchmark test that can be reproduced on multiple machines would be good for comparing against previous and future versions, plus any further changes on this import front. I had a look at this and this as references.

I'm assuming we're testing the root tidy3d import in a common virtual environment with dev installed, which includes vtk. To look into this properly, and to verify that whatever changes we make can be quantified as improvements, I attempted to reproduce the same test setup on my machine.

I run the following to get a breakdown of the imports and to verify that vtk is not loaded at the root:

python -X importtime import_tidy3d_script.py

where import_tidy3d_script.py just does import tidy3d.

This prints a breakdown of the import timing as a hierarchy. Note that the cumulative timing is relative to that printed hierarchy.

On pre/2.5, before merging this PR:

import time: self [us] | cumulative | imported package
import time:        98 |         98 |   _io
...
import time:        83 |         83 |         trimesh
import time:        39 |         39 |         vtk
import time:       875 |     182477 |       tidy3d.components.types
import time:       130 |        130 |         tidy3d.components.data
...
import time:       114 |        114 |     tidy3d.components.heat
import time:      2187 |       2187 |     tidy3d.components.heat.boundary
import time:       125 |        125 |       tidy3d.components.heat.viz
import time:      1354 |       1479 |     tidy3d.components.heat.source
import time:      1681 |       1681 |     tidy3d.components.heat.monitor
import time:      1545 |       1545 |     tidy3d.components.heat.grid
import time:      3676 |      10679 |   tidy3d.components.heat.simulation
import time:       108 |        108 |     tidy3d.components.heat.data
import time:      2150 |       2150 |     tidy3d.components.heat.data.monitor_data
import time:      1023 |       3280 |   tidy3d.components.heat.data.sim_data
import time:      2073 |     927203 | tidy3d

With this PR, the same test does not import vtk, so it's not included in the timing breakdown:

import time: self [us] | cumulative | imported package
...
import time:       126 |        126 |       tidy3d.components.heat.viz
import time:      1369 |       1494 |     tidy3d.components.heat.source
import time:      1814 |       1814 |     tidy3d.components.heat.monitor
import time:      1610 |       1610 |     tidy3d.components.heat.grid
import time:      3654 |      10834 |   tidy3d.components.heat.simulation
import time:       120 |        120 |     tidy3d.components.heat.data
import time:      2158 |       2158 |     tidy3d.components.heat.data.monitor_data
import time:      1029 |       3306 |   tidy3d.components.heat.data.sim_data
import time:      2071 |     923162 | tidy3d

However, I noticed that locally the test is not quite reproducible, since it depends on what else the machine is running and how much compute is dedicated to the task, so I suspect your cloud process may be more reproducible in this case. For example, here is how I run the same test multiple times, in case you're interested:

import subprocess
import time

n = 100
python_load_time = 0.0
tidy3d_load_time = 0.0

for i in range(n):
    # time a fresh interpreter that imports tidy3d
    s = time.time()
    subprocess.call(["python", "-c", "import tidy3d"])
    tidy3d_load_time += time.time() - s

    # time a fresh interpreter doing nothing, to subtract startup cost
    s = time.time()
    subprocess.call(["python", "-c", "pass"])
    python_load_time += time.time() - s

print("average tidy3d load time = {}".format((tidy3d_load_time - python_load_time) / n))

This took a while to run and output the following (I'd personally be interested in seeing how this is quantified in the cloud subprocess):
Pre 2.5

(tidy3d_pre2) daquintero@dxps:~/flexcompute/debug/tidy3d_pre25$ python time_import_2.py 
average tidy3d load time = 1.1593191528320312 s
(tidy3d_pre2) daquintero@dxps:~/flexcompute/debug/tidy3d_pre25$ git branch
  daniil/onthefly-vtk-import
  develop
* pre/2.5

With this PR:

(tidy3d_onthefly) daquintero@dxps:~/flexcompute/debug/tidy3d_onthefly$ python time_import_2.py 
average tidy3d load time = 1.1516077351570129 s
(tidy3d_onthefly) daquintero@dxps:~/flexcompute/debug/tidy3d_onthefly$ ^C
(tidy3d_onthefly) daquintero@dxps:~/flexcompute/debug/tidy3d_onthefly$ git branch
* daniil/onthefly-vtk-import

I'm rather interested in whether this is reproduced in the cloud, or whether I'm not testing this properly. I wonder if, more specifically, we want to test importing a class that does not use a `requires_vtk` method, rather than import tidy3d? I'm also assuming we are all using the dev environment?

@tylerflex tylerflex requested review from daquinteroflex and removed request for tylerflex December 7, 2023 14:43
@daquinteroflex
Collaborator

daquinteroflex commented Dec 7, 2023

I was looking into this again in more detail. For the sake of brainstorming, what if we had the following instead of the wrapper, since it solves the issue Daniil was mentioning about the module imports?

Sorry, I was being slow on this function-wrapper idea; for a moment I thought it would be different, but the sequence is the same. What I've proposed is the same as the wrapper: the only difference is that rather than going through the dictionary, you call the import directly within the function. (Same sequence as the wrapper; I short-circuited for a moment when writing this.)

def verify_time_import():
    try:
        import time
    except ImportError:
        raise Warning("That didn't work")
        

def test_this_works():
    verify_time_import()
    import time # Here we import and write code and imports as usual?
    print(time.time())
    
test_this_works()

I think the timing principle of not importing unless the method requires it will still work; the import cache means the module doesn't get reloaded, and you can still do module imports and use the library's internals as usual within the methods, without needing the vtk dictionary.

@dbochkov-flexcompute
Contributor Author

@daquinteroflex thanks for looking into the timings so carefully. When I run these scripts locally on my side, I get the following results:

For python -X importtime import_tidy3d_script.py:

  1. Before this PR:
import time:       259 |     218696 |         trimesh
...
import time:      2427 |     245991 |         vtk
import time:      2284 |     671884 |       tidy3d.components.types
...
import time:      2302 |    1411318 | tidy3d
  2. After this PR:
import time:       258 |     222313 |         trimesh
import time:      2345 |     433283 |       tidy3d.components.types
...
import time:      2420 |    1224365 | tidy3d

For python time_import_2.py:

  1. Before this PR: average tidy3d load time = 1.571178252696991
  2. After this PR: average tidy3d load time = 1.2565033292770387

So, it seems these results are consistent with each other: importing vtk adds about 0.25-0.3 s on my machine. However, this differs significantly from your results, which basically show no difference. Can you double-check that your environment actually has vtk installed? When I do pip uninstall vtk and try python -X importtime import_tidy3d_script.py I get the following

import time:        81 |         81 |         vtk

which is similar to what you get.

Anyway, 0.25-0.3 s still seems quite far from what @momchil observed.

@dbochkov-flexcompute
Contributor Author

I was looking again into this with more detail. For the sake of brainstorming, what if we had this instead rather than the wrapper as this solves the issue Dannil was mentioning on the module imports accordingly?


def verify_time_import():
    try:
        import time
    except ImportError:
        raise Warning("That didn't work")
        

def test_this_works():
    verify_time_import()
    import time # Here we import and write code and imports as usual?
    print(time.time())
    
test_this_works()

I think the timing principle of not importing unless the method is required will still work, whilst the import cache should mean that it doesn't get reloaded, whilst you're still able to do module imports and use the internal library functionality as usual within the methods accordingly without the need of the vtk dictionary.

Yeah, I don't really have a preference between the wrapper way and the helper-function way. I do agree the dictionary thing is kind of awkward. On the other hand, using a helper function would require explicitly calling import vtk and the other imports in each method that requires them. If we decide on the latter, we could maybe do something like this:

import importlib

def get_vtk():
    try:
        return importlib.import_module("vtk")
    except ImportError:
        raise Warning("That didn't work")

def get_vtk_numpy():
    try:
        # importlib.import_module returns the submodule itself, whereas
        # __import__("vtk.util.numpy_support") would return the top-level vtk package
        return importlib.import_module("vtk.util.numpy_support")
    except ImportError:
        raise Warning("That didn't work")
        
def test_this_works():
    vtk = get_vtk()
    vtk_numpy = get_vtk_numpy()
    ...

In case we decide to stick with the wrapper approach, we could maybe improve it slightly by replacing the vtk dictionary with a vtk config similar to the tidy3d config: https://github.com/flexcompute/tidy3d/blob/pre/2.5/tidy3d/config.py.
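A rough sketch of what such a vtk "config" object could look like; all names are hypothetical (modeled loosely on the linked config pattern, not on actual tidy3d code), and no import happens until an attribute is first accessed:

```python
import importlib

class VtkConfig:
    """Resolve modules lazily on attribute access and cache them,
    instead of passing a raw dict around."""

    # attribute name -> import path (hypothetical entries)
    _modules = {"vtk": "vtk", "numpy_support": "vtk.util.numpy_support"}

    def __init__(self):
        self._cache = {}

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name not in self._modules:
            raise AttributeError(name)
        if name not in self._cache:
            try:
                self._cache[name] = importlib.import_module(self._modules[name])
            except ImportError as err:
                raise ImportError(
                    f"vtk is required ('{self._modules[name]}' is missing)"
                ) from err
        return self._cache[name]

vtk_config = VtkConfig()
# methods would then write e.g. vtk_config.vtk.vtkPoints()
# instead of indexing into a bare dict
```

Compared to the dict, this gives attribute-style access, a single place for the ImportError message, and the same one-time import cost.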

@daquinteroflex
Collaborator

daquinteroflex commented Dec 7, 2023

Hi Daniil,

Thanks for this, you were right: there was something wrong in my setup. I didn't do pip install -r requirements/dev.txt; instead I did what I've gotten used to more recently in the new setup, which was pip install .[dev]. It turns out that, for some reason, that didn't install vtk, and I should have checked. For reference:

These were the commands I ran, on both this PR and pre-PR; they have worked in my poetry setup and I expected the same on the older version.

git checkout daniil/onthefly-vtk-import
micromamba create -n tidy3d_pre2.5 python=3.10 -c conda-forge
micromamba activate tidy3d_pre2.5
python3 -m pip install -e .[dev]

and

micromamba create -n tidy3d_onthefly python=3.10 -c conda-forge
micromamba activate tidy3d_onthefly
python3 -m pip install -e .[dev]
python time_import.py

However, to my surprise (because I had checked that the requirements/dev.txt installation also includes vtk.txt), vtk did not get installed in either virtual environment when I ran the above, despite appearing as "imported" in those import-timing breakdowns earlier.

Once I did the installation properly by running pip install .[vtk], I reproduced the same timings you mention above:

(tidy3d_pre2) daquintero@dxps:~/flexcompute/debug/tidy3d_pre25$ python time_import_2.py 
average tidy3d load time = 1.3655261087417603
(tidy3d_onthefly) daquintero@dxps:~/flexcompute/debug/tidy3d_onthefly$ python time_import_2.py 
average tidy3d load time = 1.0845233035087585

@daquinteroflex
Collaborator

daquinteroflex commented Dec 7, 2023

Edit: I've moved the corrections for my previous comment into that comment directly. On further thought:

I see what you're proposing. Yeah that could work as well. I am wondering if we have any particular timing costs in between these and am testing that atm.

@daquinteroflex
Collaborator

daquinteroflex commented Dec 7, 2023

Apologies for the earlier edits; I think I needed a break after being connected to the computer for a while without thinking too clearly.

So I've done a bit of testing. Mainly I was curious about the time cost of a function call wrapping the import; the overhead is minimal, if we assume the test setup to be valid.

In a directory, create get_vtk_module.py:

def get_vtk():
    try:
        vtk = __import__("vtk")
        return vtk
    except ImportError:
        raise Warning("That didn't work")

This gets imported as a function call in the call_get_vtk.py:

from get_vtk_module import get_vtk

get_vtk()

and then the timing test setup over 100 runs:

import subprocess
import time

n = 100
get_vtk_call_time = 0
vtk_import_time = 0

for i in range(n):
    # Timing get_vtk function call
    start_time = time.time()
    subprocess.call(["python", "call_get_vtk.py"])
    get_vtk_call_time += time.time() - start_time

    # Timing simple import vtk
    start_time = time.time()
    subprocess.call(["python", "-c", "import vtk"])
    vtk_import_time += time.time() - start_time

print("Average call time for get_vtk: {}".format(get_vtk_call_time / n))
print("Average import time for vtk: {}".format(vtk_import_time / n))

when I run it:

(tidy3d_onthefly) daquintero@dxps:~/flexcompute/debug/import_time_test$ python test_compare.py 
Average call time for get_vtk: 0.18276306867599487
Average import time for vtk: 0.18201354026794433

So the cost is minimal, but maybe re-importing within the actual functions, rather than going through the dictionary of variables, is fine alongside the current function wrapper?

Apologies for the mess of messages earlier! I'm going to call it a day as I think I need to unplug from the terminal.

@momchil-flex
Collaborator

In case we decide to stick with the wrapper approach, we could maybe slightly improve that by replacing vtk dictionary with a vtk config similar to tidy3d config https://github.com/flexcompute/tidy3d/blob/pre/2.5/tidy3d/config.py.

All things considered maybe yeah the current approach with this improvement is good enough? I'm even fine with merging as is for now - I just want to get this fix in, even if it's not the prettiest under the hood.

@dbochkov-flexcompute
Contributor Author

Edit: I've moved the corrections to my previous comment to that comment directly. On further thought:

I see what you're proposing. Yeah that could work as well. I am wondering if we have any particular timing costs in between these and am testing that atm.

Not exactly that, but I did a quick test executing a method that requires vtk 10000 times in a loop, with this PR and without, and I didn't see any performance difference. We probably don't need to worry about performance either way, especially since methods that require vtk are typically expected to be on the heavy side, so their execution time will strongly dominate the overhead of reading a dict, creating additional variables, etc.

All things considered maybe yeah the current approach with this improvement is good enough? I'm even fine with merging as is for now - I just want to get this fix in, even if it's not the prettiest under the hood.

Yeah, we can probably just merge it as is and then refine the implementation in a separate PR. Perhaps we should take a look at other heavy imports, like trimesh, gdspy, gdstk, jax, etc., and figure out a unified solution that makes sense for all of them.
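One possible shape for such a unified solution (a sketch only; `LazyModule` and its use here are hypothetical, not existing tidy3d code) is a small proxy that defers the import of each heavy dependency until its first real use:

```python
import importlib

class LazyModule:
    """Proxy that defers the real import until first attribute access.
    One instance would stand in for each heavy optional dependency
    (trimesh, gdstk, jax, ...)."""

    def __init__(self, name):
        self._name = name
        self._module = None  # real module, filled in on first use

    def __getattr__(self, attr):
        # only called for attributes not set in __init__
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

trimesh = LazyModule("trimesh")  # nothing actually imported at module load
# first real use, e.g. trimesh.Trimesh(...), would trigger the actual import
```

Code elsewhere keeps writing `trimesh.something` as usual, so a root `import tidy3d` pays none of the import cost until a method that genuinely needs the dependency runs.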

@momchil-flex momchil-flex merged commit 7fc2c2d into pre/2.5 Dec 7, 2023
14 checks passed
@momchil-flex momchil-flex deleted the daniil/onthefly-vtk-import branch December 7, 2023 19:23
@daquinteroflex
Collaborator

All sounds good!
