Decay functions should allow to specify a value in case a field is missing #18892

Open
FlorianWilhelm opened this Issue Jun 15, 2016 · 13 comments

@FlorianWilhelm

Describe the feature:

When using a decay function in a function_score query, the official documentation says: "If the numeric field is missing in the document, the function will return 1." In many cases this is not intended, since documents missing the field then receive the highest possible score.

Similar to field_value_factor, decay functions should provide a missing parameter that defines the score when the field is missing.
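
To make the problem concrete, here is a minimal Python sketch of the Gaussian decay formula as described in the function_score documentation. The function name gauss_decay and its missing argument are hypothetical; missing stands for the parameter requested in this issue, not an existing option.

```python
import math

def gauss_decay(value, origin, scale, decay=0.5, offset=0.0, missing=None):
    """Gaussian decay score in (0, 1], following the documented formula."""
    if value is None:
        # Documented behaviour: a missing field scores 1, the maximum.
        # `missing` is the hypothetical override proposed in this issue.
        return 1.0 if missing is None else missing
    # sigma^2 is chosen so that the score equals `decay` at distance `scale`.
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2.0 * sigma_sq))
```

With origin 22 and scale 5, a document with no age scores the same as a document whose age is exactly 22, which is the behaviour this issue would like to make configurable.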

@clintongormley

clintongormley Jun 15, 2016

Member

I think this makes sense. @brwe what do you think?

@brwe

brwe Jun 16, 2016

Contributor

Last time we discussed this, we decided not to do anything because there is a workaround: #7788. We can re-evaluate this decision. It might make sense to make it consistent with field_value_factor. I have no strong opinion though.

@clintongormley

clintongormley Jun 16, 2016

Member

+1 to consistency

@FlorianWilhelm

FlorianWilhelm Jun 16, 2016

@brwe Thank you, I hadn't thought of that option, but I guess I am not the only one, and consistency is always better. For completeness, I am posting your solution here:

{
  "query": {
    "function_score": {
      "score_mode": "first",
      "functions": [
        {
          "filter": {
            "exists": {
              "field": "age"
            }
          },
          "gauss": {
            "age": {
              "origin": 22,
              "scale": 5,
              "decay": 0.5
            }
          }
        },
        {
          "script_score": {
            "script": "0"
          }
        }
      ]
    }
  }
}

@magicleo

magicleo Jun 17, 2016

what if I want to use other functions and need "score_mode": "sum"?

@FlorianWilhelm

FlorianWilhelm Jun 20, 2016

@magicleo I would say that score_mode: first is only an optimization at that point. If you use sum, you will just add 0 sometimes.
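
A small Python simulation of the workaround query can illustrate this point. It is only a sketch of how the two score modes combine the functions, not Elasticsearch code; FUNCTIONS and combine are hypothetical names, and the gauss parameters are the ones from the query posted above.

```python
import math

def gauss(age, origin=22, scale=5, decay=0.5):
    # Gaussian decay as documented: the score equals `decay` at distance `scale`.
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-(age - origin) ** 2 / (2.0 * sigma_sq))

# Each entry is (filter, score function), in the same order as in the query.
FUNCTIONS = [
    (lambda doc: "age" in doc, lambda doc: gauss(doc["age"])),  # exists filter + gauss
    (lambda doc: True, lambda doc: 0.0),                        # unfiltered constant 0
]

def combine(doc, score_mode):
    # Only functions whose filter matches the document contribute a score.
    scores = [fn(doc) for matches, fn in FUNCTIONS if matches(doc)]
    if score_mode == "first":
        return scores[0]
    if score_mode == "sum":
        return sum(scores)
    raise ValueError(f"unsupported score_mode: {score_mode}")
```

In this model both modes give the same result for every document: the constant function only ever contributes 0, so first merely skips evaluating it when the exists filter matches.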

@kaka19ace

kaka19ace Sep 28, 2016

In most use cases we have scripting disabled for security reasons, so we cannot use a decay function when the field does not exist in the document.

A missing value is a good idea :)

@matthuhiggins

matthuhiggins Jan 2, 2017

kaka19ace - I'm having the same pain. Scripting is not enabled, so the suggested workaround is not available. Makes it tough to use the feature on coordinate fields.

@tuzz

tuzz Feb 17, 2017

I wasn't able to use the workaround above because I have more than one function in function_score. Instead, I found another workaround which is to copy the nullable field to a new field in the index mapping to guarantee a value:

"mappings": {
  "name_of_type": {
    "properties": {
      "field_that_might_be_null": {
        "type": "float",
        "copy_to": "field_that_definitely_wont_be_null"
      },
      "field_that_definitely_wont_be_null": {
        "type": "float",
        "null_value": 0
      }
    }
  }
}

Depending on the type of decay, you may need to pick a default value that's far enough out of range to result in a value of 0. Hopefully that helps someone.
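
How far "far enough" is depends on the decay shape. The following Python sketch uses the decay formulas from the function_score documentation to estimate it; the helper names are hypothetical and distance means the absolute difference between the default value and origin.

```python
import math

def linear_zero_distance(scale, decay=0.5, offset=0.0):
    """Distance from origin at which linear decay first reaches exactly 0."""
    return offset + scale / (1.0 - decay)

def gauss_score(distance, scale, decay=0.5, offset=0.0):
    """Gaussian decay: approaches 0 quickly but never reaches it."""
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    d = max(0.0, distance - offset)
    return math.exp(-d ** 2 / (2.0 * sigma_sq))

def exp_score(distance, scale, decay=0.5, offset=0.0):
    """Exponential decay: approaches 0 more slowly than gauss."""
    lam = math.log(decay) / scale
    return math.exp(lam * max(0.0, distance - offset))
```

For example, with scale 5 and decay 0.5, linear decay is exactly 0 for any default at least 10 away from origin, while gauss and exp only get arbitrarily close to 0, so the default has to be chosen with the acceptable residual score in mind.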

@brooks

brooks Jun 8, 2017

+1 for missing 👍

@adamdunkley

adamdunkley Jul 1, 2017

There is actually a workaround that does not need a null value or scripting to be enabled.

If you make the second score function something that will always yield 0, for example:

{
  "query": {
    "function_score": {
      "score_mode": "first",
      "functions": [
        {
          "filter": {
            "exists": {
              "field": "age"
            }
          },
          "gauss": {
            "age": {
              "origin": 22,
              "scale": 5,
              "decay": 0.5
            }
          }
        },
        {
          "gauss": {
            "age": {
              "origin": "0",
              "offset": "0",
              "scale": "100"
            }
          }
        }
      ]
    }
  }
}

(where age is never going to be 0)

This still does not solve the case where you want multiple functions, as it will screw with averages (if using average as the rollup function), but at least it doesn't require scripting or changes to mappings :)

@kemcon

kemcon Nov 15, 2017

I would like to have the option to add 1 to the result of the decay score (this should be an easy change), so that the results fall between 1 and 2 rather than between 0 and 1. With an exists filter (if this is necessary), documents missing the field would not be scored (this is similar to a factor of 1) and all others would score better (1-2).
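
A minimal Python sketch of the scoring described here, assuming a Gaussian decay on a field called age (the function name shifted_gauss and the parameter values are hypothetical, borrowed from the workaround query earlier in the thread):

```python
import math

def shifted_gauss(doc, field="age", origin=22, scale=5, decay=0.5):
    """Decay score shifted by 1; documents missing the field get a neutral 1."""
    if field not in doc:
        # Filtered out by the exists filter: behaves like a factor of 1.
        return 1.0
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    distance = abs(doc[field] - origin)
    # The decay score lies between 0 and 1, so the shifted score lies between 1 and 2.
    return 1.0 + math.exp(-distance ** 2 / (2.0 * sigma_sq))
```

Under this scheme a document missing the field never outranks one that has it, no matter how far that document's value is from origin.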

@javanna
