Out-of-Memory PAR #1952
Comments
Hi there @prupireddy, PARSynthesizer isn't well optimized yet from a performance standpoint. To help us prioritize this, I've created a collection thread and added your situation to it: #1965. I put a suggested workaround there that I recommend trying to see if it works for you: #1965 (comment). I'm closing this issue out for now to centralize our discussion there. Thanks!

Duplicate of #1965
Hi Srini,
FYI, Milliman (the organization for which I am doing PAR modelling) is now an enterprise client.

Anyways, sorry for the delay. I wanted to try out a few possible fixes before getting back to you (e.g. using segment_size, decreasing the number of columns). None of these suggestions have worked. Unfortunately, decreasing the number of rows used for training (as you suggested in the collection thread) isn't a possibility either.
Data Context:
- # rows: 4,242,950
- # unique sequences: 115,742
- # columns: 17
- Metadata object: attached
Code Context:
```python
from sdv.metadata import SingleTableMetadata
from sdv.sequential import PARSynthesizer

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(df_input)
metadata.update_column('Member_ID', sdtype='id')
metadata.set_sequence_key('Member_ID')
metadata.set_sequence_index('DateIndex')

synthesizer = PARSynthesizer(
    metadata=metadata,
    epochs=1,
    context_columns=['Gender', 'Age'],
    verbose=True,
    segment_size=10,
)
synthesizer.fit(df_input)
```
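The reporter notes that reducing training rows is not an option in their case, but for other readers hitting the same memory wall, one common mitigation is to fit on a subsample of *whole sequences* rather than raw rows, so no sequence gets split. A minimal sketch, assuming a pandas DataFrame keyed by the sequence column (`subsample_sequences` is my own hypothetical helper, not SDV API; `df_input` and `'Member_ID'` are the names from the snippet above):

```python
import pandas as pd

def subsample_sequences(df: pd.DataFrame, key: str, n_sequences: int,
                        seed: int = 42) -> pd.DataFrame:
    """Keep every row belonging to a random subset of sequence keys."""
    keys = df[key].drop_duplicates()
    # Sample sequence keys, not rows, so each kept sequence stays intact.
    kept = keys.sample(n=min(n_sequences, len(keys)), random_state=seed)
    return df[df[key].isin(kept)].copy()

# Example: train on roughly 10% of the 115,742 sequences.
# df_small = subsample_sequences(df_input, 'Member_ID', 11_574)
# synthesizer.fit(df_small)
```

Sampling by key rather than by row matters for sequential models: dropping arbitrary rows would truncate sequences and change their dynamics, while dropping whole sequences only reduces how many examples the model sees.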
Best,
*Pranav Rupireddy*
Quantitative/Machine Learning Research Intern
*Milliman*
8500 Normandale Lake Blvd, Suite 1850
Minneapolis, MN 55437
USA
*milliman.com <https://us.milliman.com/en/>*
…On Fri, Apr 26, 2024 at 10:02 AM Srini Kadamati ***@***.***> wrote:
Hi there @prupireddy <https://github.com/prupireddy>, can you share a bit more about the characteristics of your data and some of the code you wrote? This will help isolate what the issue is!
Data Context:
- How many rows of data do you have?
- How many unique sequences / sequence_key values do you have?
- Number of columns
- Your metadata object (optional, but would be nice to have)
Code Context:
- The relevant parts of the SDV-specific code you wrote & ran (especially when instantiating the PARSynthesizer object)
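The data-context stats requested above can be gathered with plain pandas. A small sketch (`data_context` is a hypothetical helper of my own, not SDV API; `df_input` and `'Member_ID'` are assumed from this thread):

```python
import pandas as pd

def data_context(df: pd.DataFrame, sequence_key: str) -> dict:
    """Rows, unique sequences, and column count for a sequential dataset."""
    return {
        'n_rows': len(df),
        'n_sequences': df[sequence_key].nunique(),
        'n_columns': df.shape[1],
    }

# Usage with the names from this thread (assumed):
# data_context(df_input, 'Member_ID')
```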
Hi @prupireddy, thanks for the feedback and details. I think it is best to discuss this on Discourse. Since GitHub is the open-source forum, we have a slightly different system here for triaging and collecting requests. As a licensed SDV Enterprise user, you will get prioritized responses, troubleshooting, etc. on Discourse. Thanks, and apologies for any confusion.
Environment Details
Error Description
I have a PAR model running on a health dataset. The line `synthesizer.fit()` throws the following error: `RuntimeError: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 683656 bytes.` I find this particularly surprising given that I am running this on a machine with 128 GB of RAM and I just restarted it.
Steps to reproduce
For privacy reasons, I cannot send the full data and code.
Here is the traceback, though:
```
Traceback (most recent call last):
  File "C:\Users\Pranav.Rupireddy\Documents\MillimanSynthetic\par\par.py", line 261, in <module>
    synthesizer.fit(df_input)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\sdv\single_table\base.py", line 405, in fit
    self.fit_processed_data(processed_data)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\sdv\single_table\base.py", line 386, in fit_processed_data
    self._fit(processed_data)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\sdv\sequential\par.py", line 317, in _fit
    self._fit_sequence_columns(processed_data)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\sdv\sequential\par.py", line 303, in _fit_sequence_columns
    self._model.fit_sequences(sequences, context_types, data_types)
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\deepecho\models\par.py", line 315, in fit_sequences
    X.append(self._data_to_tensor(sequence['data']))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Pranav.Rupireddy\AppData\Local\anaconda3\Lib\site-packages\deepecho\models\par.py", line 203, in _data_to_tensor
    x = torch.zeros(self._data_dims)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 683656 bytes.
```
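A note on why the failing allocation can be so small: 683,656 bytes is well under 1 MB, which usually means the process had already consumed nearly all available memory building earlier per-sequence tensors before this last `torch.zeros` call tipped it over. A back-of-envelope sketch of the footprint, under the assumption (suggested but not confirmed by the traceback) that each sequence is materialized as a dense float32 tensor; the helper and the example numbers for padded length and encoded width are hypothetical:

```python
def padded_tensor_gb(n_sequences: int, max_seq_len: int, encoded_width: int,
                     bytes_per_value: int = 4) -> float:
    """Dense tensor footprint in GiB if every sequence is padded to the
    longest length, ignoring intermediate copies made during training."""
    return n_sequences * max_seq_len * encoded_width * bytes_per_value / 1024**3

# With the thread's 115,742 sequences, a hypothetical padded length of
# 1,000 steps, and a hypothetical encoded width of 100 values per step:
# padded_tensor_gb(115_742, 1_000, 100)  # roughly 43 GiB before copies
```

Under these assumptions, a few long outlier sequences inflate the padded length for every sequence, so even 128 GB of RAM can be exhausted once training copies and gradients are added on top.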