This repository has been archived by the owner on Oct 12, 2021. It is now read-only.
We need to:
a) Start sampling training data when the dataset is too large (e.g. it can barely fit in RAM)
b) Automatically disable the transformed cache for lightwood when the dataset is too large (e.g. uses ~5-10% or more of the total RAM)
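A minimal sketch of how (a) and (b) could be wired together. The `config` key and the sampling fraction are hypothetical placeholders, not the actual lightwood interface:

```python
import pandas as pd

def apply_memory_policy(df: pd.DataFrame, config: dict, ram_fraction: float):
    """Apply the two proposed behaviours based on the fraction of RAM the
    dataframe uses. Thresholds and config key are illustrative only."""
    if ram_fraction > 0.5:
        # (a) dataset barely fits in RAM: sample the training data
        df = df.sample(frac=0.1, random_state=0)
    if ram_fraction > 0.05:
        # (b) ~5-10% or more of total RAM: disable the transformed cache
        config['cache_transformed_data'] = False  # hypothetical key
    return df, config
```

The exact thresholds would need benchmarking; the point is only that both toggles key off the same RAM-usage measurement.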
For now, let's say we have four potential states for the data:
small
medium
big
huge
Determined by how much of the available RAM the pandas dataframe uses.
If small --- do nothing
If medium --- use the current sampling algorithm to sample the data used for data analysis
If big --- same as medium + disable the transform cache in the lightwood configuration for training
If huge --- same as big + use the current sampling algorithm to sample data for the training, testing and validation sets