# Interview Questions:

# 1. What are some common hyperparameters of decision tree models, and how do they affect the model's performance?

1) max_depth: Specifies the maximum depth of the tree.
* Effect: Limits overfitting by preventing the tree from growing too deep, which reduces model complexity but may increase bias.

2) min_samples_split: The minimum number of samples required to split an internal node.
* Effect: A higher value prevents small splits, increasing model robustness but potentially missing finer patterns.

3) min_samples_leaf: The minimum number of samples required to be in a leaf node.
* Effect: Ensures leaves have a minimum number of samples, preventing overfitting and smoothing the decision boundaries.

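4) max_features: The number of features to consider when looking for the best split.
* Effect: Restricts each split to a random subset of features, which adds randomness and can reduce overfitting to noisy features. The variance reduction matters most when many such trees are combined in an ensemble (e.g., a random forest).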

5) criterion: Determines the function to measure the quality of a split (e.g., gini or entropy for classification).
* Effect: Affects how splits are evaluated and may lead to different decision boundaries.

6) max_leaf_nodes: Limits the number of leaf nodes in the tree.
* Effect: Simplifies the model by capping the number of leaves, reducing overfitting.

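7) random_state: Seeds the randomness of the estimator (e.g., the order in which features are permuted at each split, and the feature subset drawn when max_features is smaller than the number of features).
* Effect: Ensures reproducible results for the same dataset and hyperparameters. It does not control model complexity.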
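
The effect of these hyperparameters can be illustrated with a minimal sketch. It assumes scikit-learn's `DecisionTreeClassifier` and the bundled Iris dataset, neither of which is named above; the specific values (`max_depth=3`, `min_samples_leaf=5`, and so on) are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: Iris is used only as a small, convenient example dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Baseline: no limits on growth, so the tree can keep splitting until leaves are pure.
unconstrained = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Constrained: each hyperparameter caps complexity in a different way.
constrained = DecisionTreeClassifier(
    max_depth=3,           # at most 3 levels of splits
    min_samples_split=10,  # a node needs >= 10 samples to be split
    min_samples_leaf=5,    # every leaf keeps >= 5 samples
    max_features=None,     # consider all features at each split
    criterion="gini",      # impurity measure used to score splits
    max_leaf_nodes=8,      # at most 8 leaves in total
    random_state=42,       # fixed seed for reproducibility
).fit(X_train, y_train)

# Compare tree size and train/test accuracy for the two models.
for name, model in [("unconstrained", unconstrained), ("constrained", constrained)]:
    print(
        f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
        f"train_acc={model.score(X_train, y_train):.3f}, "
        f"test_acc={model.score(X_test, y_test):.3f}"
    )
```

A large gap between training and test accuracy for the unconstrained tree, compared with a smaller gap for the constrained one, is the usual sign that the limits are reducing overfitting. On a dataset as small and clean as Iris the difference may be slight.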

# 2. What is the difference between label encoding and one-hot encoding?

1) Label Encoding

- Converts categories into integers.
- Single column with integer values.
- Imposes an ordinal relationship between categories, which may be misleading when the categories have no natural order.
- Does not increase dimensionality.
- Suitable for ordinal data, or for tree-based models, which split on thresholds and are less sensitive to an arbitrary integer ordering.

2) One-Hot Encoding

- Converts categories into binary vectors.
- Multiple columns, one for each category.
- No ordinality; each category is treated independently.
- Increases dimensionality significantly when a feature has many categories.
- Preferred for nominal data with models that interpret input magnitudes numerically (e.g., linear regression, neural networks).
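
The contrast between the two encodings can be sketched as follows. This assumes scikit-learn's `LabelEncoder` and `OneHotEncoder`, and the `colors` list is a made-up example; note that `LabelEncoder` is intended for target labels, so `OrdinalEncoder` would be the usual choice for feature columns in a real pipeline.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical nominal feature with three categories.
colors = ["red", "green", "blue", "green"]

# Label encoding: one integer per category, assigned in sorted order
# (blue=0, green=1, red=2), which implies an order the data does not have.
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(colors)
print(label_encoder.classes_, labels)

# One-hot encoding: one binary column per category, with no implied order.
# The encoder expects a 2D array, so the list is reshaped into a single column.
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(np.array(colors).reshape(-1, 1)).toarray()
print(one_hot_encoder.categories_[0])
print(one_hot)
```

The label-encoded result is a single integer column, while the one-hot result has one column per distinct category, which is where the dimensionality cost comes from.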