Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iteration charts are missing final values #152

Closed
RobinL opened this issue Dec 14, 2020 · 1 comment · Fixed by #153
Closed

Iteration charts are missing final values #152

RobinL opened this issue Dec 14, 2020 · 1 comment · Fixed by #153

Comments

@RobinL
Copy link
Member

RobinL commented Dec 14, 2020

At the moment, the parameter history is added to during the maximisation step, not during the expectation step.

This doesn't make sense because the match probabilities are computed during the expectation step, and therefore can be considered 'used' at that point (and therefore added to the iteratino history.

This currently causes odd issues such as there being no iteratino history if you only run an expectation step

@RobinL
Copy link
Member Author

RobinL commented Dec 14, 2020

Robin Linacre 14 days ago
could it make sense to save params to iteration history when you run the expectation step?

Robin Linacre 14 days ago

  • yes - i think that does make sense because that's the point at which they're 'used'

Robin Linacre 14 days ago
and then don't save them out as part of the maximisation step

Robin Linacre 14 days ago
that would also mean that they get saved out 'one final time' i.e. the charts properly show the final values of parameters

Robin Linacre 14 days ago
at the moment i think the iteration charts are missing the final values of params (edited)

Robin Linacre 14 days ago
(when you iterate once, you want two items in the iteration chart) (edited)

Sam Lindsay 14 days ago
Yes. I had mistakenly thought the missing one was the first

Robin Linacre 14 days ago
me too!

Sam Lindsay 14 days ago
So to move the saving params to the expectation, do you mean moving the following:
new_lambda = _get_new_lambda(df_intermediate, spark)
pi_df_collected = _get_new_pi_df(df_intermediate, spark, params)
params._update_params(new_lambda, pi_df_collected)
With df_e replacing df_intermediate?

Sam Lindsay 14 days ago

new_lambda = _get_new_lambda(df_intermediate, spark)

splink/maximisation_step.py:113
new_lambda = _get_new_lambda(df_intermediate, spark)
https://github.com/moj-analytical-services/splink|moj-analytical-services/splinkmoj-analytical-services/splink | Added by GitHub

Robin Linacre 14 days ago
so i think it's deleting

Robin Linacre 14 days ago

self._save_params_to_iteration_history()

splink/params.py:323
self._save_params_to_iteration_history()
https://github.com/moj-analytical-services/splink|moj-analytical-services/splinkmoj-analytical-services/splink | Added by GitHub

Robin Linacre 14 days ago
and probably putting it here instead?

Robin Linacre 14 days ago

splink/expectation_step.py:64

https://github.com/moj-analytical-services/splink|moj-analytical-services/splinkmoj-analytical-services/splink | Added by GitHub

Sam Lindsay 14 days ago
Right, gotcha

Sam Lindsay 14 days ago
Yes, that looks exactly right

Sam Lindsay 14 days ago
So initial values are saved, with or without iterations. And final values are picked up by the expectation step outside the iteration loop
👍
1

Robin Linacre 14 days ago
💥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant