
v0.16.0

@OnlyDeniko released this 20 Mar 10:01
  • Added support for dataframes from the Polars package. It is available in the following modules: data (Dataset, SequenceTokenizer, SequentialDataset) for working with transformers, metrics, preprocessing, and splitters. The new format can speed up calculations several-fold compared to Pandas and PySpark dataframes. You can see more details about usage in the examples and in the metrics sketch after these notes.
  • Removed the dependencies on seaborn and matplotlib, along with the functions replay.utils.distributions.plot_item_dist and replay.utils.distributions.plot_user_dist.
  • Added functions to get and set embeddings in transformers: get_all_embeddings, set_item_embeddings_by_size, set_item_embeddings_by_tensor, append_item_embeddings. You can see more details about their usage in the examples and in the sketch after these notes.
  • Added QueryEmbeddingsPredictionCallback to get query embeddings at the inference stage in transformers. You can see more details about usage in the examples and in the sketch after these notes.
  • Added support for numerical features in SequenceTokenizer and TorchSequentialDataset, making it possible to use numerical features inside transformers.
  • Added automatic padding at the inference stage of transformer-based models in single-user mode.
  • Added a new KL-UCB model based on https://arxiv.org/pdf/1102.2490.pdf.
  • Added a callback to calculate cardinality in TensorSchema. It is no longer necessary to pass the cardinality parameter; the value is calculated automatically.
  • Added the core_count parameter to replay.utils.session_handler.get_spark_session (see the sketch after these notes). If it is not specified, the environment variables REPLAY_SPARK_CORE_COUNT and REPLAY_SPARK_MEMORY are taken into account; if they are not set either, the value defaults to -1.
  • Corrected the behavior of the item_count parameter in ValidationMetricsCallback: if you are not going to calculate the Coverage metric, you no longer need to pass this parameter.
  • The calculation of the Coverage metric on Pandas and PySpark has been aligned.
  • Removed the conversion from PySpark to Pandas in some models and added the allow_collect_to_master parameter (False by default).
  • 100% test coverage has been achieved.
  • Fixed handling of undetectable types during fit in LabelEncoder. The problem occurred when a column contained multiple tuples with null values; see the sketch after these notes.
  • Changes in the experimental part:
    • Python 3.10 is supported
    • Interface updates due to the d3rlpy version update
    • Added a DecisionTransformer
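
Below is a minimal sketch of the Polars support using the metrics module, referenced from the Polars note above. The default column names ("query_id", "item_id", "rating") and the NDCG(top_k)(recommendations, ground_truth) call pattern are assumptions based on the RePlay metrics API, not verbatim from this release.

```python
# A minimal sketch, assuming the default "query_id"/"item_id"/"rating" column
# names and the metric call pattern NDCG(top_k)(recommendations, ground_truth).
import polars as pl

from replay.metrics import NDCG

recommendations = pl.DataFrame(
    {
        "query_id": [1, 1, 2, 2],
        "item_id": [10, 11, 10, 12],
        "rating": [0.9, 0.7, 0.8, 0.6],
    }
)
ground_truth = pl.DataFrame(
    {
        "query_id": [1, 2],
        "item_id": [11, 12],
    }
)

# Polars frames are accepted directly, just like Pandas or PySpark ones.
print(NDCG([2])(recommendations, ground_truth))
```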
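
The embedding helpers mentioned above could be used roughly as follows. This sketch assumes they are methods of a fitted transformer model (such as SasRec or Bert4Rec) and that get_all_embeddings returns a dict of tensors keyed by feature name; the argument shapes are illustrative, not exact signatures.

```python
# A rough sketch; the return type of get_all_embeddings and the argument
# shapes below are assumptions, not exact signatures.
import torch


def tweak_item_embeddings(model, new_item_count: int) -> None:
    # Inspect the current embeddings (assumed to be a dict of tensors).
    embeddings = model.get_all_embeddings()
    item_embeddings = embeddings["item_embedding"]

    # Replace item embeddings with a tensor of the same shape ...
    model.set_item_embeddings_by_tensor(torch.randn_like(item_embeddings))

    # ... resize the item embedding table to a new item count ...
    model.set_item_embeddings_by_size(new_item_count)

    # ... or append embeddings for newly added items.
    model.append_item_embeddings(torch.randn(5, item_embeddings.shape[1]))
```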
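
The query-embedding callback could be wired into a prediction run roughly as shown below. The import path, the Lightning Trainer wiring, and the result accessor are assumptions; check the examples for the exact API.

```python
# A sketch under assumptions: the callback import path, Trainer wiring, and
# the result accessor may differ from the actual API.
from lightning import Trainer

from replay.models.nn.sequential.callbacks import QueryEmbeddingsPredictionCallback


def predict_query_embeddings(model, prediction_dataloader):
    callback = QueryEmbeddingsPredictionCallback()
    trainer = Trainer(callbacks=[callback])
    trainer.predict(model, dataloaders=prediction_dataloader)
    # The callback accumulates query embeddings produced during predict
    # (accessor name is an assumption).
    return callback.get_result()
```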
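
The new core_count parameter for the Spark session helper, passed by keyword (an assumption about the signature):

```python
# A short sketch of the core_count parameter; passing it by keyword is an
# assumption about the signature.
from replay.utils.session_handler import get_spark_session

# Explicit core count for the local Spark session.
spark = get_spark_session(core_count=4)

# With nothing specified, REPLAY_SPARK_CORE_COUNT and REPLAY_SPARK_MEMORY are
# read from the environment; if they are unset, the value defaults to -1.
spark = get_spark_session()
```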
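
The LabelEncoder fix concerns fitting on columns of tuples that contain nulls; a toy reproduction could look like this. The LabelEncodingRule-based setup, the fit_transform call, and the column names are assumptions about the preprocessing API used for illustration.

```python
# A minimal sketch, assuming the LabelEncoder / LabelEncodingRule API from
# replay.preprocessing; "item_tags" is a hypothetical column holding tuples
# with nulls, whose type previously could not be detected during fit.
import pandas as pd

from replay.preprocessing import LabelEncoder, LabelEncodingRule

interactions = pd.DataFrame(
    {
        "user_id": [1, 1, 2],
        "item_tags": [(10, None), (None, 20), (10, 20)],
    }
)

encoder = LabelEncoder([LabelEncodingRule("item_tags")])
encoded = encoder.fit_transform(interactions)
print(encoded)
```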