Benchmark gap encoder early stopping #681

LeoGrin · 2023-07-28T14:02:16Z

Benchmark to evaluate the early stopping change made in #680, compared to the previous version of the gap encoder.

Based on #663 by @simonamaggio and #593 by @LilianBoulard

Compared to #663, we don't compute the full score at regular intervals, as this operation is very slow. Instead, we use an exponentially weighted average of the batch scores. This is based on the code of sklearn's MiniBatchNMF.
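To make the idea concrete, here is a minimal sketch of early stopping driven by an exponentially weighted average (EWA) of per-batch scores. It is not the code from this PR or from sklearn: the class name `EWAEarlyStopper` and the parameters `alpha` and `max_no_improvement` are hypothetical and chosen for illustration, and lower scores are assumed to be better.

```python
class EWAEarlyStopper:
    """Illustrative early stopping on a smoothed per-batch score.

    Hypothetical helper, not part of skrub or scikit-learn.
    Assumes that a lower score is better.
    """

    def __init__(self, alpha=0.1, max_no_improvement=5):
        self.alpha = alpha  # weight given to the newest batch score
        self.max_no_improvement = max_no_improvement
        self.ewa = None  # current smoothed score
        self.best = None  # lowest smoothed score seen so far
        self.no_improvement = 0  # consecutive batches without a new best

    def update(self, batch_score):
        """Record one batch score; return True when fitting should stop."""
        if self.ewa is None:
            # The first batch initializes the average.
            self.ewa = batch_score
        else:
            self.ewa = (1 - self.alpha) * self.ewa + self.alpha * batch_score

        if self.best is None or self.ewa < self.best:
            self.best = self.ewa
            self.no_improvement = 0
        else:
            self.no_improvement += 1

        return self.no_improvement >= self.max_no_improvement


if __name__ == "__main__":
    stopper = EWAEarlyStopper(alpha=0.5, max_no_improvement=3)
    for step, score in enumerate([10, 6, 4, 8, 8, 8], start=1):
        if stopper.update(score):
            print(f"stopping after batch {step}")
            break
```

Each update only needs the score of the current batch, so no pass over the full dataset is required. Note that a score which keeps decreasing, even slowly, never triggers the stop in this sketch.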

Results

It seems that the new version is faster, while the KL score and balanced accuracy don't change. I think the speed can be improved further by tuning hyperparameters; this is the subject of another benchmark in #680. Furthermore, ID columns like seqid take a long time to fit even with early stopping, as the score keeps decreasing, so #585 is still relevant even after this speedup.

LilianBoulard

Benchmark looks good, thanks for the contribution!
Maybe one minor modification before merging (ping me when you think this is ready): remove the print statements (and pass verbose=False).

benchmarks/bench_gap_es_score.py

Co-authored-by: Lilian <lilian@boulard.fr>

LeoGrin · 2023-07-31T13:30:32Z

@LilianBoulard Running with the same batch size, the speedup is smaller, but I think it's still useful.

LeoGrin · 2023-08-04T17:31:18Z

@LilianBoulard I think it's ready, WDYT?

jovan-stojanovic

LGTM, thanks!

* benchmark
* fix bug due to mixed type
* verbose
* test
* fix bug
* add benchmark results
* add balenced accuracy
* Update benchmarks/bench_gap_es_score.py (Co-authored-by: Lilian <lilian@boulard.fr>)
* remove prints
* run with the same batch size
* benchmark results with the same batch size

Co-authored-by: Lilian <lilian@boulard.fr>

LeoGrin added 7 commits July 27, 2023 19:54

benchmark

111cc8e

fix bug due to mixed type

cfb91df

verbose

000ff97

test

0fde022

fix bug

47a5f9a

add benchmark results

1c7d337

add balenced accuracy

7cc15fb

LeoGrin requested a review from LilianBoulard July 28, 2023 14:05

LeoGrin mentioned this pull request Jul 28, 2023

Gap encoder speedups #680

Merged

LilianBoulard reviewed Jul 31, 2023

LeoGrin and others added 3 commits July 31, 2023 12:45

Update benchmarks/bench_gap_es_score.py

baf17b5

Co-authored-by: Lilian <lilian@boulard.fr>

remove prints

65e1ed0

run with the same batch size

dc9b16b

benchmark results with the same batch size

f2ecdd7

jovan-stojanovic approved these changes Aug 8, 2023

jovan-stojanovic merged commit 82d31c9 into skrub-data:main Aug 8, 2023
22 checks passed

LeoGrin mentioned this pull request Aug 18, 2023

Benchmark Early Stopping for the GapEncoder #663

Closed
