Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resampling with label uniformity and user uniformity #39

Open
qa-aleksejs-fomins opened this issue Dec 1, 2022 · 0 comments
Open

Resampling with label uniformity and user uniformity #39

qa-aleksejs-fomins opened this issue Dec 1, 2022 · 0 comments

Comments

@qa-aleksejs-fomins
Copy link

Hi,

I have a regression problem, so the label is a single floating-point number within a well-defined range (e.g. [0, 1]). The label distribution is non-uniform: namely, there is markedly less data at the edges, but also in the very middle of the range. So far, a classical problem for SMOGN. However, I sample data from multiple users, and there is also a huge imbalance in amount of data among users. I would prefer that all users are well-represented in the training set in addition to balancing the label range distribution. Thus, I would prefer that the algorithm is aware of user labels, and tries to undersample users with a lot of data and preserve or oversample users with little data. Is this currently possible? Do you have suggestions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant