This repository has been archived by the owner on Oct 12, 2021. It is now read-only.

Automatic sampling & transform cache disabling #4

Closed
George3d6 opened this issue Jun 24, 2020 · 0 comments · Fixed by #24

Comments

@George3d6
Contributor

We need to:

a) Start sampling training data when the dataset is too large (e.g. can barely fit in RAM)
b) Automatically disable the transform cache for lightwood when the dataset is too large (e.g. ~5-10% or more of the total RAM)

For now, let's say the data can be in one of 4 potential states:

  • small
  • medium
  • big
  • huge

The state is determined by how much of the available RAM the pandas dataframe uses.

If small --- do nothing
If medium --- use the current sampling algorithm to sample the data used for data analysis
If big --- same as medium + disable the transform cache in the lightwood configuration for training
If huge --- same as big + use the current sampling algorithm to sample the data for the training, validation and testing sets
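
To make the proposal concrete, here is a rough sketch of how the four states and their cumulative actions could be expressed. Everything in it is hypothetical: the names `DataState`, `Plan`, `classify` and `plan_for` are made up for illustration, and the cut-off fractions are placeholders. Only the ~5-10% figure for disabling the transform cache comes from the text above; the other two thresholds would need to be chosen. The sampling algorithm and the lightwood configuration are deliberately not touched here.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataState(IntEnum):
    """The four proposed states, ordered so they can be compared."""
    SMALL = 0
    MEDIUM = 1
    BIG = 2
    HUGE = 3


# Placeholder cut-offs: fraction of RAM taken up by the dataframe.
# Only BIG_FRACTION (~5-10%) is suggested by the issue; the other two
# values are assumptions made for this sketch.
MEDIUM_FRACTION = 0.01
BIG_FRACTION = 0.05
HUGE_FRACTION = 0.5


@dataclass(frozen=True)
class Plan:
    """What to do for a given state."""
    sample_for_analysis: bool
    disable_transform_cache: bool
    sample_for_training: bool


def classify(df_bytes: int, ram_bytes: int) -> DataState:
    """Map dataframe size relative to RAM onto one of the four states."""
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    fraction = df_bytes / ram_bytes
    if fraction >= HUGE_FRACTION:
        return DataState.HUGE
    if fraction >= BIG_FRACTION:
        return DataState.BIG
    if fraction >= MEDIUM_FRACTION:
        return DataState.MEDIUM
    return DataState.SMALL


def plan_for(state: DataState) -> Plan:
    """Each state does everything the previous one does, plus one more step."""
    return Plan(
        sample_for_analysis=state >= DataState.MEDIUM,
        disable_transform_cache=state >= DataState.BIG,
        sample_for_training=state >= DataState.HUGE,
    )
```

In practice `df_bytes` could come from something like `df.memory_usage(deep=True).sum()` and `ram_bytes` from a call such as `psutil.virtual_memory().total`; whether to measure against total or currently available RAM is left open here.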

@George3d6 George3d6 added the enhancement New feature or request label Jun 24, 2020
@btseytlin btseytlin linked a pull request Jun 29, 2020 that will close this issue
@George3d6 George3d6 added this to the Release 2.0 milestone Jul 6, 2020