ENH - More efficient `B.dot` and `B.T.dot` in Cox datafit #168

Badr-MOUFAD · 2023-06-14T15:18:42Z

In the Cox datafit, it is possible to perform the operations involving the matrix B in linear time, without explicitly forming the matrix.

Up to a sorting permutation of tm, the matrix B has a block triangular structure whose blocks have all entries equal to 1.
Hence applying B to a vector v (slightly) resembles performing a cumulative sum on the coordinates of v.
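
To make the cumulative-sum analogy concrete, here is a minimal sketch, not the code of this PR. It assumes B is defined as in the removed lines of the diff, `B = (tm >= tm.reshape((-1, 1)))`, so that `B[i, j] = 1` when `tm[j] >= tm[i]`. The function name `B_dot` and the use of `np.searchsorted` to handle tied times are illustrative choices.

```python
import numpy as np


def B_dot(tm, v):
    """Compute B @ v, with B[i, j] = 1 if tm[j] >= tm[i], without forming B."""
    order = np.argsort(tm)                    # ascending sort of the times
    tm_sorted = tm[order]
    # suffix[k] = sum of v over the k-th smallest time and all larger ones
    suffix = np.cumsum(v[order][::-1])[::-1]
    # with tied times, the sum must start at the first member of the tie group
    first = np.searchsorted(tm_sorted, tm, side="left")
    return suffix[first]


rng = np.random.default_rng(0)
tm = rng.integers(0, 10, size=50).astype(float)   # integer-valued times, so ties occur
v = rng.standard_normal(50)
B = (tm >= tm.reshape((-1, 1))).astype(float)     # dense reference, O(n²) memory
print(np.max(np.abs(B @ v - B_dot(tm, v))))
```

In this sketch the sort and the `searchsorted` lookup cost O(n log n), but they depend only on tm and could be computed once at initialization, leaving a single O(n) cumulative sum per call.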

Advantages

- Reduce the memory footprint of the Cox datafit from O(n²) to O(n), as we no longer store B
- Evaluate the gradient and Hessian in O(n) instead of O(n²)
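
The transposed product can be handled the same way. The following is a sketch under the assumption that `B[i, j] = 1` when `tm[j] >= tm[i]` (the definition in the removed lines of the diff), so that `(B.T @ v)[j]` sums `v[i]` over all `i` with `tm[i] <= tm[j]`. The name `B_T_dot` is hypothetical, and the tie handling via `np.searchsorted` is one possible choice, not necessarily the one used in this PR.

```python
import numpy as np


def B_T_dot(tm, v):
    """Compute B.T @ v, with B[i, j] = 1 if tm[j] >= tm[i], without forming B."""
    order = np.argsort(tm)                    # ascending sort of the times
    tm_sorted = tm[order]
    # prefix[k] = sum of v over the k-th smallest time and all smaller ones
    prefix = np.cumsum(v[order])
    # with tied times, the sum must end at the last member of the tie group
    last = np.searchsorted(tm_sorted, tm, side="right") - 1
    return prefix[last]


rng = np.random.default_rng(1)
tm = rng.integers(0, 10, size=50).astype(float)   # integer-valued times, so ties occur
v = rng.standard_normal(50)
B = (tm >= tm.reshape((-1, 1))).astype(float)     # dense reference, O(n²) memory
print(np.max(np.abs(B.T @ v - B_T_dot(tm, v))))
```

Only vectors of length n are allocated here, which is where the O(n) memory footprint comes from.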

Benchmarks

A link to benchmark results to showcase the timing improvements compared to the main branch.

Riding "la trottinette 🛴 à la @agramfort"

Badr-MOUFAD · 2023-06-14T15:21:25Z

Need to merge #167 beforehand

Badr-MOUFAD · 2023-06-14T15:57:31Z

The test fails because the gradient and Hessian disagree with scipy check_grad at atol=1e-3.
I'm not familiar enough with truncation and round-off errors to blame numerical error for that.

However, I pushed a debug_script that independently checks the matrix-vector operations. It shows that, for both operations, the results are the same (difference of 1e-11).

@mathurinm, any clues?
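
For context on the check_grad comparison mentioned above, here is a small generic sketch. It does not use the Cox datafit: the least-squares objective and all variable names are illustrative assumptions. `scipy.optimize.check_grad` returns the 2-norm of the difference between the analytic gradient and a finite-difference approximation, which is an absolute quantity. Dividing it by the gradient norm gives a relative error that does not depend on the scale of the objective.

```python
import numpy as np
from scipy.optimize import check_grad

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
w = rng.standard_normal(5)


def f(w):
    # illustrative objective: least squares, not the Cox datafit
    return 0.5 * np.sum((X @ w - y) ** 2)


def grad(w):
    # analytic gradient of f
    return X.T @ (X @ w - y)


# absolute error: 2-norm of (analytic gradient - finite-difference gradient)
abs_err = check_grad(f, grad, w)
# relative error: insensitive to the overall scale of the gradient
rel_err = abs_err / np.linalg.norm(grad(w))
print(abs_err, rel_err)
```

When the gradient entries are large, a fixed absolute tolerance can fail even though the relative error is tiny, which is one reason a relative tolerance may be preferable in such tests.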

mathurinm · 2023-06-16T13:29:04Z

skglm/datafits/single_task.py

@@ -654,38 +658,53 @@ def initialize(self, X, y):
        """Initialize the datafit attributes."""
        tm, s = y

-        tm_as_col = tm.reshape((-1, 1))
-        self.B = (tm >= tm_as_col).astype(X.dtype)
+        self.T_indices = np.argsort(tm)


Badr-MOUFAD added 6 commits June 13, 2023 13:47

add grad

1d7de4c

add unittest

5eb7ab8

fix normalization

2fdf5b3

clean ups

51dc2b5

implement EnhancedCox

d2e6cbb

clean ups

911cbab

Badr-MOUFAD added the Work In Progress label Jun 14, 2023

debug script to check ops

3d74243

Badr-MOUFAD added 5 commits June 15, 2023 09:55

stale commit

c249c7f

refactor B_dot to avoid cancellation errors

eb42b6a

clean ups

944e0e9

mm remarks

6ae23fa

use rtol

2279fdd

Badr-MOUFAD added Ready for review and removed Work In Progress labels Jun 15, 2023

Badr-MOUFAD added 3 commits June 15, 2023 15:47

Merge branch 'cox-l2' of https://github.com/Badr-MOUFAD/skglm into cox-df-enhanced

5a52e4f

remove OldCox

15d20c4

Merge branch 'main' of https://github.com/scikit-learn-contrib/skglm into cox-df-enhanced

10d47ce

Badr-MOUFAD requested review from mathurinm and QB3 June 15, 2023 14:22

more on docs

6f81ed9

mathurinm reviewed Jun 16, 2023

Badr-MOUFAD added 2 commits June 16, 2023 17:20

mm remarks

4e87789

fix bug indices

f376e5f

mathurinm approved these changes Jun 20, 2023

mathurinm merged commit 189d21e into scikit-learn-contrib:main Jun 20, 2023
4 checks passed

Badr-MOUFAD mentioned this pull request Jun 21, 2023

ENH improve speed of fitting Cox model by relying on the fast skglm solver CamDavidsonPilon/lifelines#1531

Open
