from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Encode the categorical target variable
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)

# Initialize the linear regression model
model = LinearRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display results
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")



Mean Squared Error: 1.47
R^2 Score: 0.00


### Explanation of the Code

#### 1. **Encoding the Target Variable**
   - The `LabelEncoder` was used to convert the categorical target variable (`y`) into numerical labels, which is a necessary step since `LinearRegression` requires numerical inputs for both features and the target.
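   - For example, if `y` contained categories like `["Hand1", "Hand2", "Hand3"]`, these would be encoded as `[0, 1, 2]`. `LabelEncoder` assigns the codes in sorted order of the unique labels (stored in `encoder.classes_`), so the numbers reflect that ordering rather than any quantity.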
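
A minimal sketch of the encoding step on hypothetical string labels (the list `labels` stands in for the real `y`) shows the sorted mapping and how `inverse_transform` maps codes back to class names:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels standing in for the real target `y`
labels = ["Hand2", "Hand1", "Hand3", "Hand1"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

print(encoder.classes_.tolist())  # ['Hand1', 'Hand2', 'Hand3'] (sorted unique labels)
print(codes.tolist())             # [1, 0, 2, 0]

# The mapping is reversible, so integer codes can be turned back into names
print(encoder.inverse_transform([0, 2]).tolist())  # ['Hand1', 'Hand3']
```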

#### 2. **Splitting the Data**
   - The dataset was split into training and testing sets using `train_test_split` with `test_size=0.2`:
     - `80%` of the data was used to train the model.
     - `20%` was held out to evaluate its performance on unseen data.
   - Setting `random_state=42` makes the split reproducible across runs.
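
A sketch of the split on a hypothetical toy dataset (`X_demo` and `y_demo` are made-up random data, not the notebook's `X` and `y`) confirms the 80/20 proportions and the effect of a fixed `random_state`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 3 features, integer class codes 0-2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.integers(0, 3, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
print(X_tr.shape, X_te.shape)  # (80, 3) (20, 3)

# Calling again with the same random_state reproduces exactly the same split
X_tr2, X_te2, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
print(np.array_equal(X_te, X_te2))  # True
```

For a classification target, `train_test_split` also accepts a `stratify` argument (for example `stratify=y_encoded`), which keeps the class proportions similar in both sets.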

#### 3. **Fitting the Linear Regression Model**
   - A `LinearRegression` model was trained using the encoded numerical labels (`y_encoded`) as the target.
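   - Linear regression (ordinary least squares) finds the intercept and coefficients that minimize the sum of squared differences between predicted and true values. With several features, the fitted model is a hyperplane rather than a single line.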
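
To make "best fit" concrete, here is a sketch on hypothetical synthetic data (the true intercept 3.0 and coefficients 1.5 and -2.0 are arbitrary choices) that solves the least-squares problem directly with `np.linalg.lstsq` and compares the result with `LinearRegression`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: a known linear relationship plus small noise
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 + X_demo @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=200)

# Ordinary least squares by hand: prepend a column of ones for the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

model = LinearRegression().fit(X_demo, y_demo)

# Both minimize the sum of squared residuals, so the estimates agree
print("lstsq:           ", beta)  # [intercept, coef_1, coef_2]
print("LinearRegression:", model.intercept_, model.coef_)
```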

#### 4. **Making Predictions**
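   - Predictions (`y_pred`) were made on the test set. They are continuous values that can fall between the integer class codes or outside their range, so they do not correspond directly to any class.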
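
The following toy example (hypothetical data: one feature and three classes coded 0, 1, 2) illustrates why regression outputs are not class labels:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature, three classes encoded as 0, 1, 2
X_toy = np.array([[0], [1], [2], [3]])
y_toy = np.array([0, 2, 1, 2])

model = LinearRegression().fit(X_toy, y_toy)
pred = model.predict(np.array([[0], [1], [2], [3], [10]]))
print(pred)  # approximately [0.5 1.  1.5 2.  5.5]
```

The outputs 0.5 and 1.5 fall between two class codes, and 5.5 lies outside the range 0 to 2 entirely.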

#### 5. **Evaluating the Model**
   - The model's performance was evaluated using two metrics:
     - **Mean Squared Error (MSE):** Measures the average squared difference between predicted and true values. Lower values are better.
     - **R² Score:** Measures the proportion of the variance in the target variable that the model explains. A value close to 1 indicates a good fit, 0 means the model does no better than always predicting the mean, and negative values mean it does worse.
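
Both metrics can be computed by hand. This sketch uses four made-up values, checks the results against scikit-learn, and shows that always predicting the mean gives an R² of exactly 0, following $R^2 = 1 - SS_{res}/SS_{tot}$:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions
y_true = np.array([0.0, 1.0, 2.0, 1.0])
y_hat = np.array([0.5, 1.0, 1.5, 2.0])

# MSE: mean of the squared residuals
mse = np.mean((y_true - y_hat) ** 2)

# R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, mean_squared_error(y_true, y_hat))  # 0.375 0.375
print(r2, r2_score(y_true, y_hat))             # 0.25 0.25

# Always predicting the mean makes SS_res equal SS_tot, so R^2 is 0
r2_mean_baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))
print(r2_mean_baseline)  # 0.0
```

Dividing both sums by the number of samples gives the equivalent form $R^2 = 1 - \mathrm{MSE} / \mathrm{Var}(y)$, which links the two metrics directly.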

---

### Results and Interpretation

- **Mean Squared Error (MSE):** 1.47  
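  This means the average squared difference between the predicted and actual encoded labels is 1.47. The number is hard to interpret here: the target consists of categorical classes, and the error's scale depends on the arbitrary integer codes assigned to them, so re-encoding the classes would change it. Because R² is about 0, the MSE is also roughly equal to the variance of the encoded test labels, which is what predicting their mean would achieve.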

- **R² Score:** 0.00  
  An R² score of `0.00` (rounded to two decimals) means the model explains essentially none of the variance in the target variable. In other words, the linear regression model performs no better than always predicting the mean value of the target.
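
One way to check the "no better than the mean" reading is to compare against scikit-learn's `DummyRegressor`, which always predicts the training mean. The sketch below uses hypothetical random data in which the class codes are unrelated to the features, so both models are expected to score near zero; it illustrates the comparison and does not reproduce the results above:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: integer class codes drawn independently of the features
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(2000, 5))
y_demo = rng.integers(0, 5, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
linear = LinearRegression().fit(X_tr, y_tr)

baseline_r2 = r2_score(y_te, baseline.predict(X_te))
linear_r2 = r2_score(y_te, linear.predict(X_te))
print(f"Mean baseline R^2: {baseline_r2:.3f}")
print(f"Linear model R^2:  {linear_r2:.3f}")
```

If the linear model does not clearly beat the baseline, its features carry no linear signal about the encoded target.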

---

### Why It Didn't Work

Linear regression is not suitable for this problem because:
1. **Categorical Target:** The target variable represents categories (e.g., poker hand types). Linear regression is designed for continuous numerical targets, not categorical ones.
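2. **Model Assumptions:** Linear regression assumes the target changes linearly with the features. Integer-encoded categories impose an artificial order and equal spacing (for example, class `1` is treated as lying halfway between classes `0` and `2`), so this assumption has no meaning for them.
3. **Nature of the Problem:** Predicting poker hand types is inherently a classification problem, not a regression problem. Because the encoded labels are codes rather than quantities, the model produces fractional predictions that do not map to any class, and the error metrics depend on how the classes happened to be numbered.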
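
The dependence on an arbitrary encoding can be shown with a toy example (hypothetical categories `A`, `B`, `C` and two hand-picked encodings). The features and categories are identical in both runs; only the numbering of the classes changes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical toy data: one feature value per category
X_toy = np.array([[0], [1], [2]])
categories = ["A", "B", "C"]

# Two equally valid integer encodings of the same three categories
encodings = [
    {"A": 0, "B": 1, "C": 2},
    {"A": 0, "B": 2, "C": 1},  # codes for B and C swapped
]

r2_values = []
for encoding in encodings:
    y_codes = np.array([encoding[c] for c in categories])
    model = LinearRegression().fit(X_toy, y_codes)
    r2_values.append(r2_score(y_codes, model.predict(X_toy)))
    print(encoding, f"R^2 = {r2_values[-1]:.2f}")
```

With the first encoding the fit is perfect (R² = 1.00); swapping the codes for `B` and `C` drops it to 0.25, even though nothing about the underlying data changed.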

---

