# Machine Learning Assignment

This notebook contains answers and code for the Machine Learning assignment questions related to Feature Engineering.

**Q1. What is a parameter?**

*Answer:*

In machine learning, a parameter is an internal variable of the model whose value is learned from the data during training, as opposed to a hyperparameter, which is set before training. For example, in linear regression, the slope and intercept are parameters. Together, the parameters define how the model maps input data to the desired output.
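
As a minimal sketch of the linear-regression example, the snippet below fits scikit-learn's `LinearRegression` to a few made-up points lying on y = 2x + 1 and reads back the learned parameters. The data and variable names are illustrative assumptions, not part of the assignment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data generated from y = 2x + 1 (an assumption for this sketch)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X.ravel() + 1

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # learned parameter: slope
intercept = model.intercept_  # learned parameter: intercept
print("slope:", slope, "intercept:", intercept)
```

The values stored in `coef_` and `intercept_` are estimated from the data, which is what makes them parameters.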

**Q2. What is correlation?**

*Answer:*

Correlation is a statistical measure that expresses the extent to which two variables are linearly related. It ranges from -1 to +1. A value of +1 means a perfect positive relationship, -1 indicates a perfect negative relationship, and 0 implies no linear relationship.
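
The three reference values can be checked on small hand-made arrays. This is only a sketch using NumPy's `corrcoef`; the arrays are invented for illustration.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_pos = 3 * x + 2    # rises with x: perfect positive linear relationship
y_neg = -2 * x + 10  # falls as x rises: perfect negative linear relationship
y_sq = x ** 2        # depends on x, but not linearly

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
r_sq = np.corrcoef(x, y_sq)[0, 1]
print(r_pos, r_neg, r_sq)
```

The last case shows why a correlation near 0 only rules out a linear relationship: `y_sq` is fully determined by `x`, yet its Pearson correlation with `x` is zero.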

**Q3. What does negative correlation mean?**

*Answer:*

Negative correlation means that as one variable increases, the other decreases. For example, as the number of hours spent watching TV increases, the grades of a student may decrease, showing a negative correlation.

**Q4. Define Machine Learning. What are the main components in Machine Learning?**

*Answer:*

Machine Learning (ML) is a field of artificial intelligence that enables systems to learn patterns from data and improve from experience without being explicitly programmed. Key components:
- Data
- Features
- Model
- Loss function
- Optimizer

**Q5. How does loss value help in determining whether the model is good or not?**

*Answer:*

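The loss value measures how far the predicted values are from the actual values, so a lower loss on a given dataset means better predictions on that dataset. Training minimizes the loss directly, and comparing the loss on training data with the loss on held-out data indicates how well the model generalizes: a low training loss paired with a much higher validation loss suggests overfitting.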
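
To make this concrete, here is a small sketch that computes mean squared error (one common loss, chosen here as an assumption) for two hypothetical sets of predictions against the same targets.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]
good_pred = [3.1, 4.9, 7.2]  # hypothetical predictions close to the targets
poor_pred = [1.0, 8.0, 4.0]  # hypothetical predictions far from the targets

loss_good = mean_squared_error(y_true, good_pred)
loss_poor = mean_squared_error(y_true, poor_pred)
print("good:", loss_good, "poor:", loss_poor)
```

The predictions that sit closer to the targets produce the smaller loss.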

**Q6. What are continuous and categorical variables?**

*Answer:*

- **Continuous variables**: Can take any numerical value within a range (e.g., height, weight).
- **Categorical variables**: Represent categories or labels (e.g., gender, color).
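
One rough way to separate the two kinds of columns in pandas is by dtype. The sketch below uses a made-up DataFrame; note that dtype is only a heuristic, since integer codes can also represent categories.

```python
import pandas as pd

# Hypothetical data: two continuous columns and one categorical column
df = pd.DataFrame({
    "height_cm": [170.2, 165.5, 180.1],
    "weight_kg": [68.0, 59.3, 82.4],
    "color": ["red", "blue", "red"],
})

continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("continuous:", continuous_cols)
print("categorical:", categorical_cols)
```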

**Q7. How do we handle categorical variables in Machine Learning? What are the common techniques?**

*Answer:*

Common techniques include:
- **Label Encoding**: Assigns a unique integer to each category, without implying any meaningful order.
- **One-Hot Encoding**: Creates one binary column per category.
- **Ordinal Encoding**: Assigns integers that follow the natural order of the categories (e.g., small < medium < large).
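
The three techniques can be sketched with scikit-learn's encoders. The category values below are invented for illustration. scikit-learn documents `LabelEncoder` as intended for target labels, so it is used here only to show the idea.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

colors = np.array(["red", "green", "blue", "green"])
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])

# Label encoding: integers follow the sorted order of the labels
label_codes = LabelEncoder().fit_transform(colors)

# One-hot encoding: one binary column per category (columns in sorted order)
one_hot = OneHotEncoder().fit_transform(colors.reshape(-1, 1)).toarray()

# Ordinal encoding: the order is stated explicitly
ordinal = OrdinalEncoder(
    categories=[["small", "medium", "large"]]
).fit_transform(sizes)

print(label_codes)
print(one_hot)
print(ordinal.ravel())
```

Passing `categories` to `OrdinalEncoder` matters: without it the encoder falls back to sorted order, which would place "large" before "medium".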

**Q8. What do you mean by training and testing a dataset?**

*Answer:*

- **Training dataset**: Used to train the model.
- **Testing dataset**: Used to evaluate the performance of the trained model on unseen data.

**Q9. What is sklearn.preprocessing?**

*Answer:*

`sklearn.preprocessing` is a module in Scikit-learn that provides classes and functions for preprocessing data before modeling. This includes scaling, normalization, encoding of categorical features, and other feature transformations.

**Q10. What is a Test set?**

*Answer:*

A test set is a portion of the dataset that is not used during training and is used to evaluate the performance and generalization of the trained model.

**Q11. How do you approach a Machine Learning problem?**

*Answer:*

Steps include:
1. Understand the problem
2. Collect and explore the data
3. Preprocess the data
4. Choose the model
5. Train the model
6. Evaluate the model
7. Tune hyperparameters
8. Deploy the model
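
Steps 2 to 7 can be sketched end to end on the Iris dataset. The model, the hyperparameter grid, and the split settings below are assumptions chosen for brevity, not a recommended recipe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Collect the data and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocess and choose a model, combined in one pipeline
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Train and tune a hyperparameter with cross-validation on the training set
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate once on the held-out test set
test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("test accuracy:", test_accuracy)
```

Keeping the scaler inside the pipeline means it is refit on each cross-validation fold, so no information from the validation folds leaks into preprocessing.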

**Q12. Why do we have to perform EDA before fitting a model to the data?**

*Answer:*

Exploratory Data Analysis (EDA) helps understand data distribution, identify outliers, detect missing values, and find relationships between features. This ensures better model selection and preprocessing.
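
A few pandas calls cover most of these checks. The sketch below uses the Iris data bundled with scikit-learn (an arbitrary choice) to look at summary statistics, missing values, and feature correlations.

```python
from sklearn.datasets import load_iris

# Load Iris as a pandas DataFrame (four feature columns plus 'target')
iris = load_iris(as_frame=True)
df = iris.frame

summary = df.describe()        # distribution: mean, std, quartiles, min/max
missing_counts = df.isna().sum()  # missing values per column
feature_corr = df.drop(columns="target").corr()  # relationships between features

print(summary)
print(missing_counts)
print(feature_corr)
```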

**Q13. What is causation? Explain difference between correlation and causation with an example.**

*Answer:*

Causation means that a change in one variable directly produces a change in another. Correlation only shows that two variables move together, not why. For example, ice cream sales and drowning incidents are correlated because both rise in summer, but neither causes the other: hot weather drives both.
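
The ice cream example can be simulated. In the sketch below all numbers are made up: temperature drives both quantities, which produces a strong correlation between them, and that correlation largely disappears once the effect of temperature is removed from each.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: temperature is the common cause of both variables
temperature = rng.normal(25, 5, n)
ice_cream_sales = 2.0 * temperature + rng.normal(0, 3, n)
drownings = 0.5 * temperature + rng.normal(0, 1, n)

raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(values, confounder):
    """Remove the linear effect of the confounder from values."""
    slope, intercept = np.polyfit(confounder, values, 1)
    return values - (slope * confounder + intercept)

# Correlation after controlling for temperature (a partial correlation)
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print("raw correlation:", raw_corr)
print("after controlling for temperature:", partial_corr)
```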

**Q14. What is an Optimizer? What are different types of optimizers? Explain each with an example.**

*Answer:*

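An optimizer is the algorithm that adjusts the model parameters to minimize the loss function. Common optimizers:
- **SGD (Stochastic Gradient Descent)**: Updates the weights using the gradient computed on a single sample or a small batch rather than on the full dataset.
- **Adam**: Combines momentum (a moving average of gradients) with RMSprop-style adaptive learning rates for each parameter.
- **RMSprop**: Divides each update by the root of a moving average of squared gradients, so each parameter gets its own effective step size.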
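
As an example of each, the update rules can be written out in NumPy for a toy one-parameter loss, f(w) = (w - 3)^2, whose minimum is at w = 3. This is a sketch under simplifying assumptions: the gradient is exact rather than estimated from a mini-batch, so the SGD function shows only the update rule, and the learning rates and step counts are arbitrary.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def run_sgd(w=0.0, lr=0.1, steps=2000):
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

def run_rmsprop(w=0.0, lr=0.01, beta=0.9, eps=1e-8, steps=2000):
    v = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        v = beta * v + (1 - beta) * g ** 2
        w -= lr * g / (np.sqrt(v) + eps)  # normalize the step by sqrt(v)
    return w

def run_adam(w=0.0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    m, v = 0.0, 0.0  # moving averages of gradients and squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_sgd = run_sgd()
w_rmsprop = run_rmsprop()
w_adam = run_adam()
print("SGD:", w_sgd, "RMSprop:", w_rmsprop, "Adam:", w_adam)
```

All three runs should end close to w = 3. In practice these optimizers are used through a deep learning framework rather than written by hand.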

**Q15. What is sklearn.linear_model?**

*Answer:*

`sklearn.linear_model` is a module in Scikit-learn that includes linear models such as Linear Regression, Logistic Regression, Ridge, and Lasso, used for regression and classification tasks.
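
The sketch below fits three of these models to the same synthetic data to show how the regularized variants shrink coefficients. The data and the `alpha` values are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Only the first two features influence y; the third is pure noise
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)  # no regularization
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty can set some to zero

print("OLS:  ", ols.coef_)
print("Ridge:", ridge.coef_)
print("Lasso:", lasso.coef_)
```

All three classes share the same `fit` and `predict` interface, so swapping one linear model for another usually takes a one-line change.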

**Q16. What does model.fit() do? What arguments must be given?**

*Answer:*

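`model.fit(X, y)` trains the model on the input features `X` (typically an array of shape `(n_samples, n_features)`) and the target values `y` (one per sample). Supervised estimators require both arguments; unsupervised estimators such as clusterers and transformers need only `X`. Fitting sets the model's internal parameters, usually by minimizing a loss function.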

**Q17. What does model.predict() do? What arguments must be given?**

*Answer:*

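`model.predict(X)` uses the trained model to predict target values for the input features `X`, which must have the same number of columns as the data passed to `fit`. It returns one prediction per row of `X`.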
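
A minimal sketch of the fit-then-predict pattern, using a 1-nearest-neighbor classifier on made-up one-dimensional data (the classifier and the data are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training data: X has shape (n_samples, n_features), y has one label per row
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# New data must have the same number of features (columns) as X_train
X_new = np.array([[0.4], [10.6]])
predictions = model.predict(X_new)
print(predictions)
```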

**Q18. What is feature scaling? How does it help in Machine Learning?**

*Answer:*

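Feature scaling transforms features so that they are on a similar scale. It often speeds up convergence for models trained with gradient-based methods and improves the performance of distance-based models (e.g., KNN, SVM), where features with large ranges would otherwise dominate.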
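
Standardization, one common form of scaling, subtracts each feature's mean and divides by its standard deviation. The sketch below applies it by hand to made-up data and compares the result with scikit-learn's `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: the second feature is about 1000 times larger than the first
data = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]])

# Manual z-score using the population standard deviation (ddof=0)
manual = (data - data.mean(axis=0)) / data.std(axis=0)

scaled = StandardScaler().fit_transform(data)

print(scaled)
print("matches manual computation:", np.allclose(manual, scaled))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates a Euclidean distance.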

**Q19. Explain data encoding.**

*Answer:*

Data encoding converts categorical variables into numerical format so that machine learning models can process them. Techniques include label encoding, one-hot encoding, and ordinal encoding.

**Q20. How can you find correlation between variables in Python?**

*Answer:*

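Use the `corr()` method of a pandas DataFrame, which computes pairwise Pearson correlations by default. A heatmap makes the resulting matrix easier to read.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Sample dataset (seaborn downloads it on first use)
df = sns.load_dataset('iris')

# Correlate only the numeric columns; 'species' is text and would raise an
# error in recent pandas versions (numeric_only requires pandas 1.5 or later)
correlation_matrix = df.corr(numeric_only=True)

# Plot heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
```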

**Q21. How do we perform scaling in Python?**

*Answer:*

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample data
data = np.array([[1, 2], [3, 6], [5, 10]])

# Perform scaling
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
print("Scaled Data:\n", scaled_data)
```

**Q22. How do we split data for model fitting (training and testing) in Python?**

*Answer:*

```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
```