# Handling Sparse Data with LogisticNet

This notebook shows how `LogisticNet` handles sparse input data, such as text features or high-dimensional datasets.

## Setup
We generate a synthetic classification dataset and store it in SciPy's CSR sparse format for testing.

In [1]:
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from glmpynet import LogisticNet

# Generate a synthetic binary classification dataset and store it in CSR format
X, y = make_classification(n_samples=200, n_features=100, n_classes=2, random_state=42)
X_sparse = csr_matrix(X)
X_train, X_test, y_train, y_test = train_test_split(X_sparse, y, test_size=0.2, random_state=42)

# Fit with an L1 penalty and predict directly on the sparse matrices
model = LogisticNet(penalty='l1')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Sparse Data Accuracy: {accuracy:.2f}")

# Count the coefficients that the L1 penalty did not shrink to exactly zero
print(f"Number of non-zero coefficients: {np.count_nonzero(model.coef_)}")


Sparse Data Accuracy: 0.85

## Explanation
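- `make_classification` returns a dense array, so converting it with `csr_matrix` changes only the storage format: almost every entry is still non-zero. The example therefore exercises the sparse-input path of `LogisticNet` rather than genuinely sparse data such as text features.
- `LogisticNet` with `penalty='l1'` promotes sparsity in the coefficients, because an L1 penalty can shrink some of them to exactly zero.
- Accuracy on the CSR input should match accuracy on the same data in dense form, since only the storage format differs. The number of non-zero coefficients is the more useful quantity for a comparison with `glmnet`.
- With `glmnet`, the coefficients are expected to be sparser, and performance on sparse data may improve. This notebook does not measure either.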
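
The points about storage format and coefficient sparsity can be checked with a short sketch. It is a sketch under stated assumptions, not part of `glmpynet`: the `density` helper and the cutoff of 1.5 are hypothetical, and scikit-learn's `LogisticRegression` with an L1 penalty is used as a stand-in for `LogisticNet`, so the counts it prints say nothing about `LogisticNet` or `glmnet` themselves.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def density(matrix):
    """Fraction of entries stored as non-zero in a SciPy sparse matrix (hypothetical helper)."""
    n_rows, n_cols = matrix.shape
    return matrix.nnz / (n_rows * n_cols)


X, y = make_classification(n_samples=200, n_features=100, n_classes=2, random_state=42)

# Same values in CSR storage: the matrix is sparse in format only
X_csr = csr_matrix(X)

# Assumed cutoff of 1.5: zero out small entries to create genuinely sparse features
X_thin = csr_matrix(np.where(np.abs(X) > 1.5, X, 0.0))

print(f"Density of csr_matrix(X): {density(X_csr):.2f}")
print(f"Density after thresholding: {density(X_thin):.2f}")

# Stand-in for LogisticNet: scikit-learn's L1-penalized logistic regression
stand_in = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, random_state=42)
stand_in.fit(X_thin, y)

# Count the coefficients that were not shrunk to exactly zero
n_nonzero = np.count_nonzero(stand_in.coef_)
print(f"Non-zero coefficients: {n_nonzero} of {stand_in.coef_.size}")
```

Lowering `C` strengthens the penalty in scikit-learn and typically leaves fewer non-zero coefficients. Whether `LogisticNet` exposes an equivalent parameter is not shown in this notebook.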