# Classify Forest Cover Types with Regularized Logistic Regression

---

**Dataset**: *Forest Covertypes* (`sklearn.datasets.fetch_covtype`)
- Multi-class target: 7 forest cover types (classes 1–7)
- 54 numerical features: 10 quantitative cartographic variables, 4 binary wilderness-area indicators, and 40 binary soil-type indicators
- Real-world dataset used in remote sensing & ecology

---

1. **Load the Forest CoverType Dataset**
   - Use `from sklearn.datasets import fetch_covtype`
   - Load the data and take a random sample of ~10,000 instances for faster training
   - Print the number of samples, features, and target classes

In [1]:
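from sklearn.datasets import fetch_covtype

# Downloads the dataset on first use, then reads it from the local cache
data = fetch_covtype()
X, y = data.data, data.target  # X: (581012, 54), y: labels 1-7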
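
The cell above loads the full dataset but does not yet subsample it or print the requested counts. A minimal sketch of those two steps follows; the helper names `subsample` and `describe`, the default seed, and the use of NumPy's random generator are illustrative assumptions, not part of the exercise.

```python
import numpy as np


def subsample(X, y, n_samples=10_000, seed=42):
    """Return a random subset of rows, drawn without replacement."""
    rng = np.random.default_rng(seed)
    n = min(n_samples, X.shape[0])  # never ask for more rows than exist
    idx = rng.choice(X.shape[0], size=n, replace=False)
    return X[idx], y[idx]


def describe(X, y):
    """Print the number of samples, features, and distinct target classes."""
    classes = np.unique(y)
    print(f"samples: {X.shape[0]}, features: {X.shape[1]}, "
          f"classes ({len(classes)}): {classes.tolist()}")
```

With the arrays from the cell above, this could be used as `X, y = subsample(X, y)` followed by `describe(X, y)`. Because some cover types are rare, a stratified sample may be preferable if a class ends up with too few rows for 5-fold cross-validation.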

2. **Preprocess the Data**
   - Standardize features using `StandardScaler`
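
One possible way to standardize the features is sketched below. The helper name `standardize` is an assumption made for illustration. Note that fitting the scaler on all rows before cross-validation lets test-fold statistics influence training, so wrapping the scaler in a `Pipeline` is often the safer choice for the later steps.

```python
from sklearn.preprocessing import StandardScaler


def standardize(X):
    """Scale every column to zero mean and unit variance.

    Returns the scaled array and the fitted scaler, so the same
    transformation can be applied to new data later.
    """
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    return X_scaled, scaler
```

Columns that are constant in the sample (for example a soil type that never occurs in it) are mapped to all zeros rather than causing a division by zero.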

3. **Train Multi-class Logistic Regression**
   - Use `LogisticRegression()`
   - Train with 5-fold cross-validation
   - Report accuracy and confusion matrix
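
A sketch of this step, under a few assumptions: the function name `evaluate_cv` is hypothetical, `max_iter=1000` is chosen only to reduce convergence warnings, and out-of-fold predictions from `cross_val_predict` are used so that a single confusion matrix covers every sample.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_cv(X, y, n_splits=5, seed=42):
    """Return (accuracy, confusion matrix) from out-of-fold predictions."""
    # Scaling lives inside the pipeline, so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_pred = cross_val_predict(model, X, y, cv=cv)
    return accuracy_score(y, y_pred), confusion_matrix(y, y_pred)
```

Calling `acc, cm = evaluate_cv(X, y)` on the unscaled sample gives one accuracy value and a matrix whose rows are true classes and whose columns are predicted classes.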

4. **Apply Regularization**
   - Try different `C` values (e.g., `[0.01, 0.1, 1, 10]`)
   - Compare **L1** vs **L2** penalties (the default `lbfgs` solver does not support L1; use `solver="saga"` or `"liblinear"`)
   - Review differences in model sparsity and accuracy
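
The comparison could be organized roughly as follows. This is a sketch, not a prescribed solution: the function name `compare_penalties`, the `saga` solver, and the `max_iter` value are assumptions, and sparsity is measured here simply as the number of coefficients that are exactly zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_penalties(X, y, Cs=(0.01, 0.1, 1, 10), max_iter=2000):
    """Return rows of (penalty, C, mean CV accuracy, number of zero coefficients)."""
    rows = []
    for penalty in ("l1", "l2"):
        for C in Cs:
            # saga supports both L1 and L2 penalties for multi-class problems.
            model = make_pipeline(
                StandardScaler(),
                LogisticRegression(penalty=penalty, C=C, solver="saga",
                                   max_iter=max_iter),
            )
            acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
            # Refit on all rows to inspect the coefficient matrix.
            coef = model.fit(X, y)[-1].coef_
            rows.append((penalty, C, float(acc), int(np.sum(coef == 0))))
    return rows
```

Smaller `C` means stronger regularization, so the L1 rows at small `C` are where zeroed coefficients are most likely to appear, while L2 shrinks coefficients without usually setting them exactly to zero.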

5. **Use GridSearchCV**
   - Perform grid search over `C` and `penalty` with 5-fold CV
   - Output best model and hyperparameters
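
A grid search over both hyperparameters might look like the sketch below. The step names `scale` and `clf`, the function name `grid_search`, and the choice of `saga` (so that one solver handles both penalties) are illustrative assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def grid_search(X, y, Cs=(0.01, 0.1, 1, 10), max_iter=2000):
    """Fit a 5-fold grid search over C and penalty; return the fitted search."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(solver="saga", max_iter=max_iter)),
    ])
    # Parameters of a pipeline step are addressed as <step name>__<parameter>.
    param_grid = {"clf__C": list(Cs), "clf__penalty": ["l1", "l2"]}
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    return search
```

After `search = grid_search(X, y)`, the attributes `search.best_params_`, `search.best_score_`, and `search.best_estimator_` hold the selected hyperparameters, their mean cross-validated accuracy, and the pipeline refit on all rows.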

6. **(Bonus) Analyze Feature Importance**
   - For L1-regularized models, which features are selected (non-zero coefficients)?
   - What do those features mean?
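
One way to list the features an L1 model keeps is sketched here. The function name `selected_features`, the default `C=0.01`, and the ranking by largest absolute coefficient across classes are assumptions chosen for illustration; other importance measures are equally defensible.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def selected_features(X, y, feature_names, C=0.01, max_iter=2000):
    """Return (name, strength) pairs for features with a non-zero coefficient.

    Strength is the largest absolute coefficient across classes; the list
    is sorted from strongest to weakest.
    """
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", C=C, solver="saga", max_iter=max_iter),
    )
    model.fit(X, y)
    coef = model[-1].coef_            # shape: (n_classes, n_features)
    keep = np.any(coef != 0, axis=0)  # non-zero for at least one class
    strength = np.abs(coef).max(axis=0)
    order = np.argsort(-strength)
    return [(feature_names[i], float(strength[i])) for i in order if keep[i]]
```

Recent scikit-learn versions expose column names as `data.feature_names` on the object returned by `fetch_covtype`; if that attribute is unavailable, a list such as `[f"x{i}" for i in range(X.shape[1])]` can stand in. Since the features are standardized inside the pipeline, coefficient magnitudes are comparable across columns.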

---

