doc: remove relative_number as superseded by regular number (#1228)
shuttie committed Jan 25, 2024
1 parent 78ef29d commit 46fffb7
Showing 2 changed files with 1 addition and 41 deletions.
1 change: 0 additions & 1 deletion doc/configuration/feature-extractors.md
@@ -67,7 +67,6 @@ The `scope: item` means that the extracted popularity field from item metadata s
* [boolean](features/scalar.md#boolean-and-numerical-extractors): uses a raw boolean field as a 1 or 0 feature value.
* [string](features/scalar.md#string-extractors): uses a raw string or list<string> field as an input and does a one-hot encoding of it.
* [word_count](features/generic.md#word-count): how many words are in a string field.
* [relative_number](features/generic.md#relative-number): scales a numerical field to fit the 0..1 range.
* [list_size](features/generic.md#list-size): size of string or numerical list.
* [time_diff](features/generic.md#time-difference): difference in seconds between current timestamp and the numerical field value.
* [field_match](features/text.md#field_match): match ranking field over item fields.
41 changes: 1 addition & 40 deletions doc/configuration/features/generic.md
@@ -13,47 +13,8 @@ You can use the `word_count` feature extractor to get the length of a string fie
```

## Relative number
A more advanced feature type that can scale a numerical feature using different methods. Example config for static scaling with predefined min and max values (use `type: log_minmax` to also log-transform the value):
```yaml
- name: price
  type: relative_number
  method:
    type: minmax
    min: 0
    max: 100
  field: price
  source: item
  scope: item
```

Supported methods:
* *minmax*: scales the value linearly between the `min` and `max` fields.
* *log_minmax*: same as `minmax`, but the value is log-transformed first.
* *estimate_minmax*: estimates the min and max values used for scaling from a sample of the latest `pool_size` events (sampled at `sample_rate`).
* *estimate_histogram*: uses histogram scaling over `bucket_count` buckets, built from a sample of the latest `pool_size` events (sampled at `sample_rate`). For the price field in the example above, histogram scaling translates the absolute value into a percentile over the sampled pool of values.
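
To make the two static methods concrete, here is a minimal Python sketch of what they compute. It assumes `minmax` maps a value as `(value - min) / (max - min)` and clamps the result to [0, 1], and that `log_minmax` applies `log1p` to the value and both bounds before doing the same (`log1p` keeps `min: 0` finite, where a plain `log` would not). The clamping, the `log1p` choice, and the function names are assumptions, not Metarank's implementation.

```python
import math


def minmax(value: float, lo: float, hi: float) -> float:
    """Scale value linearly from [lo, hi] to [0, 1]; clamping out-of-range input is an assumption."""
    if hi <= lo:
        raise ValueError("max must be greater than min")
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


def log_minmax(value: float, lo: float, hi: float) -> float:
    """Log-transform value and bounds (log1p is an assumed choice), then apply minmax."""
    return minmax(math.log1p(value), math.log1p(lo), math.log1p(hi))


# Using the bounds from the example config (min: 0, max: 100):
print(minmax(25, 0, 100))      # 0.25
print(minmax(150, 0, 100))     # 1.0 (clamped)
print(log_minmax(25, 0, 100))  # about 0.706
```

The log variant stretches the low end of the range (a price of 25 lands near 0.71 instead of 0.25), which can help when the field is heavy-tailed.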

Estimate methods are useful for rough scaling when you cannot easily define min and max:
* `estimate_minmax` should be used when the value can be scaled linearly and has no outliers.
* `estimate_histogram` can handle skewed distributions and outliers, but its output is quantized: there are only `bucket_count` possible output values.
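
The doc does not spell out how the estimate methods sample and bucket values, so the following Python sketch is only one plausible reading, under these assumptions: every `sample_rate`-th event is added to a pool capped at the latest `pool_size` values, `estimate_minmax` scales against the pool's min and max, and `estimate_histogram` uses equal-frequency buckets cut at pool quantiles and returns `bucket_index / (bucket_count - 1)`. The `EstimatePool` class and its methods are hypothetical.

```python
from bisect import bisect_right
from collections import deque


class EstimatePool:
    """Hypothetical sampler: keeps every sample_rate-th value, retaining only the latest pool_size."""

    def __init__(self, pool_size: int, sample_rate: int) -> None:
        self.pool: deque = deque(maxlen=pool_size)  # oldest samples are evicted automatically
        self.sample_rate = sample_rate
        self.seen = 0

    def observe(self, value: float) -> None:
        self.seen += 1
        if self.seen % self.sample_rate == 0:  # sample every sample_rate-th event
            self.pool.append(value)

    def estimate_minmax(self, value: float) -> float:
        """Linear scaling between the pool's min and max, clamped to [0, 1]."""
        lo, hi = min(self.pool), max(self.pool)
        if hi == lo:
            return 0.0
        return min(1.0, max(0.0, (value - lo) / (hi - lo)))

    def estimate_histogram(self, value: float, bucket_count: int) -> float:
        """Map value to one of bucket_count equal-frequency buckets, scaled to [0, 1]."""
        if bucket_count < 2:
            raise ValueError("bucket_count must be at least 2")
        if not self.pool:
            raise ValueError("pool is empty")
        ordered = sorted(self.pool)
        n = len(ordered)
        # Inner boundaries at the 1/k, 2/k, ..., (k-1)/k quantiles of the sampled pool.
        bounds = [ordered[(i * n) // bucket_count] for i in range(1, bucket_count)]
        return bisect_right(bounds, value) / (bucket_count - 1)


pool = EstimatePool(pool_size=100, sample_rate=10)
for price in range(1, 1001):  # 1000 events; the pool ends up holding 10, 20, ..., 1000
    pool.observe(price)
print(pool.estimate_minmax(505))          # 0.5
print(pool.estimate_histogram(500, 5))    # 0.5
print(pool.estimate_histogram(10**9, 5))  # 1.0: an outlier just lands in the top bucket
```

If a single outlier such as `10**9` enters the pool, `estimate_minmax` squeezes every normal price toward 0, while the histogram boundaries barely move; the cost is that the histogram output here is limited to five values (0, 0.25, 0.5, 0.75, 1.0).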

Example config for `estimate_histogram`:
```yaml
- name: price
  type: relative_number
  method:
    type: estimate_histogram
    pool_size: 100    # keep the latest 100 sampled values in the pool
    sample_rate: 10   # sample every 10th event into the pool
    bucket_count: 5   # values are mapped to 0-20-40-60-80-100 percentile buckets
  field: price
  source: item
  scope: item
```
> Update, v0.7.x: `relative_number` is deprecated and removed. Both XGBoost and LightGBM split numeric features on value thresholds, so they handle unscaled numeric features out of the box; please use the [number](scalar.md#numerical-extractor) feature instead.
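
The reasoning behind this note: a tree split is a threshold on a single feature, so any strictly increasing transform (linear min-max scaling, a log) keeps the order of the values and therefore the set of partitions a split can produce. The Python sketch below uses an exact greedy single-split stump, a simplification of what XGBoost and LightGBM do (both also bin features into histograms), on toy values; the function names are hypothetical. It finds the same partition for raw, min-max scaled and log-transformed prices.

```python
import math


def sse(values: list[float]) -> float:
    """Sum of squared errors around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs: list[float], ys: list[float]) -> frozenset:
    """Exact greedy search for one threshold on xs minimizing SSE of ys; returns left-side indices."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_err, best_left = math.inf, frozenset()
    for cut in range(1, len(order)):
        if xs[order[cut - 1]] == xs[order[cut]]:
            continue  # equal values cannot be separated by a threshold
        left = [ys[i] for i in order[:cut]]
        right = [ys[i] for i in order[cut:]]
        err = sse(left) + sse(right)
        if err < best_err:
            best_err, best_left = err, frozenset(order[:cut])
    return best_left


prices = [5.0, 12.0, 40.0, 75.0, 99.0, 180.0]  # toy feature values
clicks = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]        # toy targets
raw = best_split(prices, clicks)
scaled = best_split([p / 200 for p in prices], clicks)        # minmax with min=0, max=200
logged = best_split([math.log1p(p) for p in prices], clicks)  # log transform
print(raw == scaled == logged)  # True
```
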
## List size
