Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oracle solver not converging on Scheetz2006 #44

Closed
jolars opened this issue Aug 18, 2022 · 3 comments
Closed

Oracle solver not converging on Scheetz2006 #44

jolars opened this issue Aug 18, 2022 · 3 comments

Comments

@jolars
Copy link
Owner

jolars commented Aug 18, 2022

Now I am seeing weird issues with Scheetz2006 and the oracle solver.

Check out the following example.

import matplotlib.pyplot as plt
import numpy as np
from benchopt.datasets import make_correlated_data
from scipy import stats

from slope.data import get_data
from slope.solvers import hybrid_cd, oracle_cd
from slope.utils import dual_norm_slope

dataset = "Scheetz2006"
if dataset == "simulated":
    X, y, _ = make_correlated_data(n_samples=10, n_features=10, random_state=0)
    # X = csc_matrix(X)
else:
    X, y = get_data(dataset)

fit_intercept = False

randnorm = stats.norm(loc=0, scale=1)
q = 0.1
reg = 0.01

alphas_seq = randnorm.ppf(1 - np.arange(1, X.shape[1] + 1) * q / (2 * X.shape[1]))

alpha_max = dual_norm_slope(X, (y - fit_intercept * np.mean(y)) / len(y), alphas_seq)

alphas = alpha_max * alphas_seq * reg

max_epochs = 10000
max_time = 60
tol = 1e-4

beta_cd, intercept_cd, primals_cd, gaps_cd, time_cd = hybrid_cd(
    X,
    y,
    alphas,
    fit_intercept=fit_intercept,
    max_epochs=max_epochs,
    verbose=True,
    tol=tol,
    max_time=max_time,
    cluster_updates=True,
)

beta_oracle, intercept_oracle, primals_oracle, gaps_oracle, time_oracle = oracle_cd(
    X,
    y,
    alphas,
    fit_intercept=fit_intercept,
    max_epochs=max_epochs,
    verbose=True,
    tol=tol,
    max_time=max_time,
    w_star=beta_cd
)

primals_star = np.min(np.hstack((np.array(primals_cd), np.array(primals_oracle))))

plt.clf()

plt.semilogy(time_cd, primals_cd - primals_star, label="cd")
plt.semilogy(time_oracle, primals_oracle - primals_star, label="cd_oracle")

plt.xlabel("Time (s)")

plt.ylabel("suboptimality")
plt.legend()
plt.title(dataset)
plt.show(block=False)

image

@Klopfe
Copy link
Collaborator

Klopfe commented Aug 19, 2022

I have tried to look a bit. My guess is that 1e-4 as tol is not enough to identify the true clusters and giving it as w_star does not work. But I have tried fixing it and haven't succeeded yet...

@jolars
Copy link
Owner Author

jolars commented Aug 25, 2022

As @mathurinm notes in #48, there is actually no problem. The oracle just doesn't have the right clusters in the example above.

@jolars jolars closed this as completed Aug 25, 2022
@jolars
Copy link
Owner Author

jolars commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants