Automatic sampling & transform cache disabling #4

George3d6 · 2020-06-24T16:44:14Z

We need to:

a) Start sampling training data when the dataset is too large (e.g. can barely fit in RAM)
b) Automatically disable the transform cache for lightwood when the dataset is too large (e.g. ~5-10% or more of the total RAM)

For now let's say we have 4 potential states for the data:

small
medium
big
huge

The state is determined by how much of the available RAM the pandas dataframe uses.

If small --- do nothing
If medium --- use the current sampling algorithm to sample the data used for data analysis
If big --- same as medium + disable the transform cache in the lightwood configuration for training
If huge --- same as big + use the current sampling algorithm to sample the data for the testing, training and validation sets.
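
A rough sketch of how the four states and their cumulative actions could be expressed. Everything here is an assumption for illustration: the names `classify_size`, `plan_for` and `DataPlan` are hypothetical, and the threshold fractions are placeholders (only the ~5% value for "big" is loosely taken from the 5-10% figure above). It is not the existing sampling code or the lightwood configuration API.

```python
from dataclasses import dataclass

# Hypothetical thresholds, expressed as fractions of available RAM.
# Only BIG_FRACTION is loosely based on the "~5-10%" figure in this issue;
# the other two are placeholders that would need tuning.
MEDIUM_FRACTION = 0.01
BIG_FRACTION = 0.05
HUGE_FRACTION = 0.5

SIZE_ORDER = ["small", "medium", "big", "huge"]


@dataclass(frozen=True)
class DataPlan:
    """Actions to take for a given size state (hypothetical structure)."""
    sample_for_analysis: bool
    disable_transform_cache: bool
    sample_for_training: bool


def classify_size(df_bytes: int, available_bytes: int) -> str:
    """Map the dataframe's memory footprint to one of the four states."""
    if available_bytes <= 0:
        raise ValueError("available_bytes must be positive")
    fraction = df_bytes / available_bytes
    if fraction >= HUGE_FRACTION:
        return "huge"
    if fraction >= BIG_FRACTION:
        return "big"
    if fraction >= MEDIUM_FRACTION:
        return "medium"
    return "small"


def plan_for(size: str) -> DataPlan:
    """Each state applies the actions of the previous one plus one more."""
    level = SIZE_ORDER.index(size)  # raises ValueError on an unknown state
    return DataPlan(
        sample_for_analysis=level >= 1,      # medium and above
        disable_transform_cache=level >= 2,  # big and above
        sample_for_training=level >= 3,      # huge only
    )
```

In practice the two inputs could be measured with `df.memory_usage(deep=True).sum()` for the pandas dataframe and, for example, `psutil.virtual_memory().available` for the RAM; the choice of psutil is an assumption, not something this issue specifies.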

George3d6 added the enhancement New feature or request label Jun 24, 2020

George3d6 assigned btseytlin Jun 24, 2020

George3d6 mentioned this issue Jun 24, 2020

New sampling interface #5

Closed

George3d6 added current priority and removed current priority labels Jun 25, 2020

btseytlin mentioned this issue Jun 29, 2020

New sampling interface #24

Merged

btseytlin linked a pull request Jun 29, 2020 that will close this issue

New sampling interface #24

Merged

George3d6 added this to the Release 2.0 milestone Jul 6, 2020

George3d6 closed this as completed Jul 6, 2020
