Functions that produce nan/inf values #84

Closed
granawkins opened this issue Aug 6, 2022 · 7 comments

Comments

@granawkins
Collaborator

granawkins commented Aug 6, 2022

Some of the operators we support will produce unusable values (nan or inf) in the course of normal use:

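| Operator | X == 0 | X > 1e3 | X < 0 | X > 1 |
|----------|--------|---------|-------|-------|
| /        | nan*   |         |       |       |
| **       |        | inf     |       |       |
| sqrt     | nan*   |         | nan   |       |
| log      | -inf   |         | nan   |       |
| log1p    |        |         | nan   |       |
| arcsin   |        |         |       | nan   |
| arccos   |        |         |       | nan   |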

*We currently use helper functions for division and square root which ignore 0s.

What to do?

Here are 3 ideas:

  1. Deal with them case-by-case.

    • / and sqrt seem ok for now.
    • log1p is a built-in function that computes log(1 + x), so it is finite at 0 (log1p(0) = 0) where log gives -inf. We could add a helper that computes sign(x) * log1p(abs(x)), which is finite for every finite x.
    • arccos and arcsin are probably rare enough that we could add a check in karoo.fit(), whenever they are used, that -1 <= X <= 1, and raise a ValueError otherwise.
    • That leaves **. X > 1e3 happens frequently, even with small inputs, when ** is combined with other operators, e.g. 2 ** (1 / .0001). Replacing inf with 0 is the simplest option, but it introduces a large discontinuity (as X increases, outputs grow exponentially and then drop to 0).
  2. Accept a kwarg with a replacement value (e.g. 0) to use whenever a nan and/or inf is produced. This is essentially what the helpers marked * above do, applied to every operator.

  3. If and when a tree produces a nan or inf, just remove it from the gene pool and don't bother scoring it. This is essentially the method swim already uses, i.e. eliminating trees with fewer than the minimum number of nodes.
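The two concrete helpers from idea 1 could be sketched in plain Python as below. This is a minimal sketch, not Karoo GP code: the names `symmetric_log` and `check_unit_interval` are hypothetical, and they work on lists of floats rather than the arrays trees are actually evaluated on. Where the check would be called (e.g. inside karoo.fit()) is left open.

```python
import math
from typing import Iterable, List


def symmetric_log(values: Iterable[float]) -> List[float]:
    """sign(x) * log1p(abs(x)): 0 at x == 0 and finite for every finite x."""
    # copysign attaches the sign of x to log1p(|x|), matching sign(x) * log1p(abs(x)).
    return [math.copysign(math.log1p(abs(x)), x) for x in values]


def check_unit_interval(values: Iterable[float], op_name: str) -> None:
    """Raise ValueError if any input is outside [-1, 1], the domain of arcsin/arccos."""
    # NaN fails both comparisons, so it is reported as outside the interval too.
    outside = [x for x in values if not -1.0 <= x <= 1.0]
    if outside:
        raise ValueError(
            f"{op_name} requires -1 <= X <= 1; "
            f"{len(outside)} value(s) fall outside, e.g. {outside[0]}"
        )
```

For example, symmetric_log([-1.0, 0.0, 1.0]) gives roughly [-0.693, 0.0, 0.693], and check_unit_interval([0.5, 2.0], "arccos") raises a ValueError. Note that the inclusive bounds matter: arcsin and arccos are defined at exactly -1 and 1.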

I lean toward 3.
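For comparison with idea 3, idea 2 (a replacement value for every operator) might look like the wrapper below. It is a sketch under stated assumptions: `apply_with_fill` and its `fill` keyword are hypothetical names, and because plain-Python `math` raises exceptions in the cases where NumPy would return nan or inf, the wrapper treats those exceptions exactly like a non-finite result.

```python
import math
from typing import Callable, Iterable, List


def apply_with_fill(
    op: Callable[..., float], *columns: Iterable[float], fill: float = 0.0
) -> List[float]:
    """Apply `op` element-wise; any nan/inf result (or math error) becomes `fill`."""
    out = []
    for args in zip(*columns):
        try:
            value = op(*args)
        except (ZeroDivisionError, ValueError, OverflowError):
            # Division by zero, domain errors (log(0), sqrt(-1), asin(2)) and overflow.
            value = math.nan
        out.append(value if math.isfinite(value) else fill)
    return out
```

For example, apply_with_fill(lambda a, b: a / b, [1.0, 2.0], [0.0, 4.0]) returns [0.0, 0.5]. The drawback raised under idea 1 applies directly: with fill=0, math.pow(2.0, x) grows with x until it overflows and then jumps to 0.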

@asksak

asksak commented Aug 6, 2022

If idea 3 is applied consistently in all cases, then I think it would be the best available resolution.

@granawkins
Collaborator Author

The best approach seems to be:

  • Keep the helper functions for / and sqrt, and add one for log: sign(x) * log1p(abs(x))
  • Add an unfit=False attribute to Trees. After predicting each tree, if the output contains nan or inf, set unfit=True.
  • Skip unfit trees when scoring
  • Remove unfit trees from gene_pool
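The plan above can be sketched in plain Python as follows. Only the `unfit` flag and the skip/remove steps come from the list; the `Tree` class, `predict`, `evaluate_population`, and the sum-of-absolute-errors score are hypothetical stand-ins, not Karoo GP's actual classes or fitness function.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Tree:
    # Hypothetical stand-in for a Karoo GP tree: an expression of one input.
    expr: Callable[[float], float]
    unfit: bool = False


def predict(tree: Tree, xs: Sequence[float]) -> List[float]:
    """Evaluate the tree on every input and set tree.unfit if any output is nan/inf."""
    outputs = []
    for x in xs:
        try:
            y = tree.expr(x)
        except (ZeroDivisionError, ValueError, OverflowError):
            # Plain-Python math raises where NumPy would return nan/inf.
            y = math.nan
        outputs.append(y)
    if not all(math.isfinite(y) for y in outputs):
        tree.unfit = True
    return outputs


def evaluate_population(
    trees: Sequence[Tree], xs: Sequence[float], ys: Sequence[float]
) -> Tuple[List[Tree], List[float]]:
    """Score only fit trees; unfit trees are left out of the returned gene pool."""
    gene_pool: List[Tree] = []
    scores: List[float] = []
    for tree in trees:
        preds = predict(tree, xs)
        if tree.unfit:
            continue  # skip scoring and drop it from the gene pool
        gene_pool.append(tree)
        # Assumed fitness: sum of absolute errors (lower is better).
        scores.append(sum(abs(p - y) for p, y in zip(preds, ys)))
    return gene_pool, scores
```

For example, with inputs [0.0, 1.0, 2.0], a tree for 1 / x is flagged unfit by the division by zero and never scored, while a tree for x * 2 survives into the gene pool.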

@kstaats
Owner

kstaats commented Aug 20, 2022 via email

@granawkins
Collaborator Author

granawkins commented Aug 20, 2022 via email

@kstaats
Owner

kstaats commented Aug 20, 2022 via email

@granawkins
Collaborator Author

This was implemented in #85

@granawkins
Collaborator Author

granawkins commented Oct 11, 2022 via email
