All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `m` and `u` probabilities are now reset to `None` rather than `0` in EM iteration when they cannot be estimated
- Bug whereby Splink lowercased case expressions, see here
- Improve estimate comparison charts, including tooltips and better labels
- Added mousewheel zoom to bayes factor chart
- Added mousewheel zoom to splink score histogram
- Update estimate comparison chart to use different shapes for different estimates, making it possible to distinguish overlapping symbols
- m and u history charts now display barchart correctly
- Charts now feature improved tooltips, and have a cleaner appearance. Many are now zoomable
- Charts now display better in Jupyter Lab, especially the html file produced by `all_charts_write_html_file()`
- `m` and `u` probabilities charts can now be produced from `Settings` objects
- The user can now combine settings objects using `ModelCombiner` from `splink.combine_models`
A number of backwards incompatible changes have been made for Splink 1.0.
- The main `Splink` API is different. Instead of `Splink(...,df=df)` for dedupe and `Splink(...,df_l=df_l,df_r=df_r)` for linking, the user provides an argument `df_or_dfs`, which is either a single DataFrame or a list of DataFrames. This allows linking n>2 datasets.
- When linking multiple dataframes, the user must now include a `source_dataset` column (default name `source_dataset`, configurable via `source_dataset_column_name` in the settings dict)
- The `Params` class is now called `Model` in the `model.py` module.
- The on-disk (json) format of the `Model` object has changed and is incompatible with `Params`
- The new `Model` class now uses the same representation for parameters as the Settings object, reducing duplicate code. Internal functions now have `settings` or `model` as function arguments, never both.
- Vega lite chart definitions now stored in json files in splink/files/chart_defs
- All case statement generation functions are now consistently named, with all names starting `sql_gen_case_stmt_`
- Fixed `case_statements.sql_gen_case_smnt_strict_equality_2` which previously behaved differently to all other case functions
- All case statements now have a default threshold of exact equality on their top gamma level
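
For readers migrating to the `df_or_dfs` argument introduced in Splink 1.0, the following is a minimal sketch of the "single DataFrame or list of DataFrames" convention, including the `source_dataset` column requirement when several frames are linked. It is not Splink's implementation: `FrameStub` and `as_frame_list` are hypothetical names invented for this illustration, and the stub only models the column names of a DataFrame.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FrameStub:
    """Hypothetical stand-in for a DataFrame; only column names are modelled."""
    columns: List[str]


def as_frame_list(
    df_or_dfs: Union[FrameStub, List[FrameStub]],
    source_dataset_column_name: str = "source_dataset",
) -> List[FrameStub]:
    """Hypothetical helper: accept one frame (dedupe) or a list (linking)."""
    # A single frame is wrapped so that later code can always treat it as a list.
    dfs = df_or_dfs if isinstance(df_or_dfs, list) else [df_or_dfs]

    # When several frames are linked, each one needs the source dataset column.
    if len(dfs) > 1:
        for position, df in enumerate(dfs):
            if source_dataset_column_name not in df.columns:
                raise ValueError(
                    f"Frame {position} has no '{source_dataset_column_name}' column"
                )
    return dfs


# Dedupe: one frame, no source dataset column required in this sketch.
single = as_frame_list(FrameStub(["unique_id", "first_name"]))

# Linking three datasets: every frame carries the source dataset column.
three = as_frame_list(
    [FrameStub(["unique_id", "first_name", "source_dataset"]) for _ in range(3)]
)
```

Passing a list in which any frame lacks the configured column raises a `ValueError` in this sketch; how Splink itself reports that situation is not stated in this changelog.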