Environment Details
Please indicate the following details about the environment in which you found the bug:
- SDGym version: v0.14.2
- PyTorch version: 2.11.0
Error Description
When running RealTabFormerSynthesizer on the GCP benchmark environment, the benchmark fails during GPU initialization with the latest PyTorch version (2.11.0).
The observed error is:
ERROR:sdgym.benchmark:Synthesis failed for RealTabFormerSynthesizer on dataset adult;
Traceback (most recent call last):
  File "/root/env/lib/python3.10/site-packages/sdgym/benchmark.py", line 438, in _synthesize
    fitted_synthesizer = get_synthesizer(data, metadata)
  File "/root/env/lib/python3.10/site-packages/sdgym/synthesizers/base.py", line 120, in get_trained_synthesizer
    return self._get_trained_synthesizer(data, metadata)
  File "/root/env/lib/python3.10/site-packages/sdgym/synthesizers/base.py", line 98, in _get_trained_synthesizer
    synthesizer._fit(data, metadata)
  File "/root/env/lib/python3.10/site-packages/sdgym/synthesizers/realtabformer.py", line 42, in _fit
    model.fit(data)
  File "/root/env/lib/python3.10/site-packages/realtabformer/realtabformer.py", line 458, in fit
    trainer = self._train_with_sensitivity(
  File "/root/env/lib/python3.10/site-packages/realtabformer/realtabformer.py", line 697, in _train_with_sensitivity
    trainer = self._fit_tabular(
  File "/root/env/lib/python3.10/site-packages/realtabformer/realtabformer.py", line 1092, in _fit_tabular
    self.model.cuda()
  File "/root/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3668, in cuda
    return super().cuda(*args, **kwargs)
  File "/root/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1097, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/root/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 934, in _apply
    module._apply(fn)
  File "/root/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 934, in _apply
    module._apply(fn)
  File "/root/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 965, in _apply
    param_applied = fn(param)
  File "/root/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1097, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/root/env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 478, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 12080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/env/lib/python3.10/site-packages/sdgym/benchmark.py", line 622, in _score
    synthetic_data, train_time, sample_time, synthesizer_size, peak_memory = _synthesize(
  File "/root/env/lib/python3.10/site-packages/sdgym/benchmark.py", line 478, in _synthesize
    raise BenchmarkError(
sdgym.errors.BenchmarkError: The NVIDIA driver on your system is too old (found version 12080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
/root/env/lib/python3.10/site-packages/sdgym/benchmark.py:909: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  scores = pd.concat(scores, ignore_index=True)
RealTabFormerSynthesizer crashes because it directly forces GPU usage through self.model.cuda(). SDV synthesizers that use PyTorch didn't crash in this environment because they check whether GPU usage is available and fall back to CPU if it is not.
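For illustration, the fallback behaviour described above can be sketched as follows. This is a minimal sketch, not the code SDV or REaLTabFormer actually uses: `resolve_device` is a hypothetical helper name, and the small test allocation is an extra precaution assumed here to catch initialization errors such as the one in the traceback.

```python
import warnings


def resolve_device(prefer_gpu: bool = True) -> str:
    """Return 'cuda' only if CUDA can actually be initialized, else 'cpu'.

    Hypothetical helper for illustration; not part of SDGym, SDV or REaLTabFormer.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # No PyTorch installed at all: nothing to run on the GPU.
        return "cpu"

    # is_available() may emit a warning (rather than raise) when CUDA
    # cannot be initialized, so silence warnings for this check only.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        if not torch.cuda.is_available():
            return "cpu"

    try:
        # Force CUDA initialization with a tiny allocation so that a
        # RuntimeError surfaces here instead of in the middle of training.
        torch.zeros(1, device="cuda")
    except RuntimeError:
        return "cpu"
    return "cuda"


if __name__ == "__main__":
    print(f"Selected device: {resolve_device()}")
```

A model would then be moved with `model.to(resolve_device())` instead of an unconditional `model.cuda()`. Note that a silent CPU fallback hides the driver problem rather than fixing it, so the benchmark would still not use the GPU.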
The GCP compute configuration used is defined in SDGym/sdgym/_benchmark/config_utils.py (line 21 in 782b845).
This seems to indicate that our current GCP image / driver / CUDA stack is not compatible with the latest PyTorch version we install in the benchmark environment.
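The `found version 12080` value in the error is the CUDA version supported by the installed driver, encoded as `1000 * major + 10 * minor`, so it corresponds to CUDA 12.8. The sketch below decodes that value and compares it with the CUDA version a PyTorch build reports (for example via `torch.version.cuda`). The helper names are hypothetical, and the comparison is deliberately conservative: CUDA minor-version compatibility can allow some builds with a newer minor version to run on an older driver of the same major version.

```python
def decode_cuda_version(code: int) -> tuple[int, int]:
    """Decode a CUDA version integer (1000 * major + 10 * minor)."""
    return code // 1000, (code % 1000) // 10


def driver_supports(build_cuda: str, driver_code: int) -> bool:
    """Conservative check: is the build's CUDA version <= the driver's?

    build_cuda is a string such as "12.8", in the format reported by
    torch.version.cuda. Hypothetical helper for illustration only.
    """
    major, minor = (int(part) for part in build_cuda.split(".")[:2])
    return (major, minor) <= decode_cuda_version(driver_code)


if __name__ == "__main__":
    print(decode_cuda_version(12080))       # (12, 8)
    print(driver_supports("12.8", 12080))   # True
    # "13.0" is a hypothetical build version used only to show a mismatch.
    print(driver_supports("13.0", 12080))   # False
```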
Expected Behavior
The GCP benchmark environment should be able to use the configured GPU with the latest supported PyTorch version, so that RealTabFormerSynthesizer runs successfully and SDV synthesizers can also use the GPU as expected.
Steps to reproduce
Running the following snippet requires credentials to be saved in your environment:
from sdgym._benchmark_launcher import BenchmarkLauncher
from sdgym.run_benchmark.run_benchmark import _get_config

config = _get_config('single_table')
config.instance_jobs = [
    {
        'output_destination': 's3://sdgym-benchmark/Debug/v0.14.2_RealTabFormer/',
        'synthesizers': ['RealTabFormerSynthesizer'],
        'datasets': ['adult']
    }
]
launcher = BenchmarkLauncher(config)
launcher.launch()
Additional context
Investigate whether the incompatibility comes from the current PyTorch version, the GCP image/driver configuration, or the interaction between both. Based on that, either pin PyTorch to a compatible version or update the benchmark machine/image configuration accordingly.
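As a starting point for that investigation, a small script along these lines could be run on the benchmark machine to record the relevant versions side by side. It is a sketch under assumptions: `collect_gpu_environment` is a hypothetical helper, and it assumes `nvidia-smi` is on the `PATH` of the GCP image (any field that cannot be read is left as `None`).

```python
import shutil
import subprocess


def collect_gpu_environment() -> dict:
    """Gather PyTorch, CUDA build, and NVIDIA driver versions if available."""
    report = {"torch": None, "torch_cuda_build": None, "driver": None}

    try:
        import torch

        report["torch"] = torch.__version__
        # CUDA version the installed PyTorch wheel was compiled against
        # (None for CPU-only builds).
        report["torch_cuda_build"] = torch.version.cuda
    except ImportError:
        pass

    if shutil.which("nvidia-smi"):
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
                capture_output=True,
                text=True,
                check=False,
                timeout=30,
            )
        except (OSError, subprocess.TimeoutExpired):
            return report
        if result.returncode == 0 and result.stdout.strip():
            # One line per GPU; the driver version is the same for all of them.
            report["driver"] = result.stdout.strip().splitlines()[0]

    return report


if __name__ == "__main__":
    for key, value in collect_gpu_environment().items():
        print(f"{key}: {value}")
```

Comparing `torch_cuda_build` with the CUDA version the driver supports should show whether pinning PyTorch or updating the image is the smaller change.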