Out-of-Memory PAR #1952
Comments
Hi there @prupireddy, PARSynthesizer isn't well optimized yet from a performance standpoint. To help us prioritize this, I've created a collection thread and added your situation to it: #1965. I put a suggested workaround there that I recommend trying to see if it works for you: #1965 (comment). I'm closing this issue out for now to centralize our discussion there. Thanks!

Duplicate of #1965
Hi Srini,
FYI, Milliman (the organization for which I am doing PAR modelling) is now an enterprise client.

Anyways, sorry for the delay. I wanted to try out a few possible fixes before getting back to you (e.g. using segment_size, decreasing the number of columns). None of these suggestions have worked. Unfortunately, decreasing the number of rows used for training (as you suggested in the collection thread) isn't a possibility either.
Data Context:
- # rows: 4,242,950
- # unique sequences: 115,742
- # columns: 17
- Metadata object: attached
Code Context:
```python
from sdv.metadata import SingleTableMetadata
from sdv.sequential import PARSynthesizer

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(df_input)
metadata.update_column('Member_ID', sdtype='id')
metadata.set_sequence_key('Member_ID')
metadata.set_sequence_index('DateIndex')

synthesizer = PARSynthesizer(
    metadata=metadata,
    epochs=1,
    context_columns=['Gender', 'Age'],
    verbose=True,
    segment_size=10,
)
synthesizer.fit(df_input)
```
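The reporter notes that reducing training rows is not an option in their case, but for other readers hitting the same memory wall, one common mitigation is to fit on a subsample of *whole sequences* rather than raw rows, so no sequence gets split. A minimal sketch, assuming a pandas DataFrame keyed by the sequence column (`subsample_sequences` is my own hypothetical helper, not SDV API; `df_input` and `'Member_ID'` are the names from the snippet above):

```python
import pandas as pd

def subsample_sequences(df: pd.DataFrame, key: str, n_sequences: int,
                        seed: int = 42) -> pd.DataFrame:
    """Keep every row belonging to a random subset of sequence keys."""
    keys = df[key].drop_duplicates()
    # Sample sequence keys, not rows, so each kept sequence stays intact.
    kept = keys.sample(n=min(n_sequences, len(keys)), random_state=seed)
    return df[df[key].isin(kept)].copy()

# Example: train on roughly 10% of the 115,742 sequences.
# df_small = subsample_sequences(df_input, 'Member_ID', 11_574)
# synthesizer.fit(df_small)
```

Sampling by key rather than by row matters for sequential models: dropping arbitrary rows would truncate sequences and change their dynamics, while dropping whole sequences only reduces how many examples the model sees.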
Best,
*Pranav Rupireddy*
Quantitative/Machine Learning Research Intern
*Milliman*
8500 Normandale Lake Blvd, Suite 1850
Minneapolis, MN 55437
USA
*milliman.com <https://us.milliman.com/en/>*
…On Fri, Apr 26, 2024 at 10:02 AM Srini Kadamati ***@***.***> wrote:
Hi there @prupireddy <https://github.com/prupireddy>, can you share a bit more about the characteristics of your data and some of the code you wrote? This will help isolate what the issue is!
Data Context:
- How many rows of data do you have?
- How many unique sequences / sequence_key values do you have?
- Number of columns
- Your metadata object (optional, but would be nice to have)
Code Context:
- The relevant parts of the SDV-specific code you wrote & ran (especially when instantiating the PARSynthesizer object)
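The data-context stats requested above can be gathered with plain pandas. A small sketch (`data_context` is a hypothetical helper of my own, not SDV API; `df_input` and `'Member_ID'` are assumed from this thread):

```python
import pandas as pd

def data_context(df: pd.DataFrame, sequence_key: str) -> dict:
    """Rows, unique sequences, and column count for a sequential dataset."""
    return {
        'n_rows': len(df),
        'n_sequences': df[sequence_key].nunique(),
        'n_columns': df.shape[1],
    }

# Usage with the names from this thread (assumed):
# data_context(df_input, 'Member_ID')
```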
Hi @prupireddy, thanks for the feedback and details. I think it is best to discuss this on Discourse. Since GitHub is the open-source forum, we have a slightly different system here for triaging and collecting requests. As a licensed SDV Enterprise user, you will get prioritized responses, troubleshooting, etc. on Discourse. Thanks, and apologies for any confusion.
Environment Details
Error Description
I have a PAR model running on a health dataset. The line `synthesizer.fit()` throws the following error: `RuntimeError: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 683656 bytes.` I find this particularly surprising given that I am running this on a machine with 128 GB of RAM and I just restarted it.
Steps to reproduce
For privacy reasons, I cannot send the full data and code.
Here is the traceback, though:
```
Traceback (most recent call last):
  File "C:\Users\Pranav.Rupireddy\Documents\MillimanSynthetic\par\par.py", line 261, in <module>
    synthesizer.fit(df_input)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\sdv\single_table\base.py", line 405, in fit
    self.fit_processed_data(processed_data)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\sdv\single_table\base.py", line 386, in fit_processed_data
    self._fit(processed_data)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\sdv\sequential\par.py", line 317, in _fit
    self._fit_sequence_columns(processed_data)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\sdv\sequential\par.py", line 303, in _fit_sequence_columns
    self._model.fit_sequences(sequences, context_types, data_types)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\deepecho\models\par.py", line 315, in fit_sequences
    X.append(self._data_to_tensor(sequence['data']))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\deepecho\models\par.py", line 203, in _data_to_tensor
    x = torch.zeros(self._data_dims)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 683656 bytes.
```
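A note on why the failing allocation can be so small: 683,656 bytes is well under 1 MB, which usually means the process had already consumed nearly all available memory building earlier per-sequence tensors before this last `torch.zeros` call tipped it over. A back-of-envelope sketch of the footprint, under the assumption (suggested but not confirmed by the traceback) that each sequence is materialized as a dense float32 tensor; the helper and the example numbers for padded length and encoded width are hypothetical:

```python
def padded_tensor_gb(n_sequences: int, max_seq_len: int, encoded_width: int,
                     bytes_per_value: int = 4) -> float:
    """Dense tensor footprint in GiB if every sequence is padded to the
    longest length, ignoring intermediate copies made during training."""
    return n_sequences * max_seq_len * encoded_width * bytes_per_value / 1024**3

# With the thread's 115,742 sequences, a hypothetical padded length of
# 1,000 steps, and a hypothetical encoded width of 100 values per step:
# padded_tensor_gb(115_742, 1_000, 100)  # roughly 43 GiB before copies
```

Under these assumptions, a few long outlier sequences inflate the padded length for every sequence, so even 128 GB of RAM can be exhausted once training copies and gradients are added on top.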