# Feature Engineering

1. What is a parameter?
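    A parameter is an internal value of a model (e.g., the weights and bias of a linear model) that defines its behavior in Machine Learning (ML) or statistics.
    It is learned from the training data during model training, unlike a hyperparameter (e.g., the learning rate), which is set before training.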
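A minimal scikit-learn sketch of learned parameters: after `fit()`, a `LinearRegression` model stores its learned weights in `coef_` and its bias in `intercept_`. The data is made up and generated exactly from y = 3·x1 - 2·x2 + 5, so the learned values recover those numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated from y = 3*x1 - 2*x2 + 5
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

# Constructor settings such as fit_intercept are chosen before training
model = LinearRegression(fit_intercept=True)
model.fit(X, y)          # the parameters are learned here

print(model.coef_)       # learned weights, about [ 3. -2.]
print(model.intercept_)  # learned bias, about 5.0
```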


2. What is correlation?
    Correlation measures the strength and direction of the relationship between two variables, indicating how one variable
    tends to change as the other changes.

Types of Correlation:
Positive Correlation: When one variable increases, the other also increases (e.g., height vs. weight).
Negative Correlation: When one variable increases, the other decreases (e.g., speed vs. travel time).
No Correlation: When there is no (linear) relationship between the variables.
Mathematical Measure:
The correlation coefficient (r) ranges from -1 to +1:
r = +1 → Perfect positive correlation
r = -1 → Perfect negative correlation
r = 0 → No linear correlation
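The coefficient usually meant by r is Pearson's: r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). A minimal NumPy sketch that computes it from this definition and checks it against `np.corrcoef`; the speed and travel-time values are made up (a fixed 120 km trip, so time = 120 / speed).

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

speed = [40, 50, 60, 80, 100]            # km/h (made up)
travel_time = [3.0, 2.4, 2.0, 1.5, 1.2]  # hours for a 120 km trip

r = pearson_r(speed, travel_time)
print(round(r, 3))                                            # negative, close to -1
print(np.isclose(r, np.corrcoef(speed, travel_time)[0, 1]))  # True
print(pearson_r([1, 2, 3], [2, 4, 6]))                        # perfect positive, about 1.0
```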

3. What does negative correlation mean?
    A negative correlation means that as one variable increases, the other decreases.

4. Define Machine Learning. What are the main components in Machine Learning?
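    Machine Learning (ML) is a branch of artificial intelligence (AI) that enables systems to learn
    patterns from data and make decisions without being explicitly programmed for each task.

    Main components: data, features, a model (algorithm), a loss function, an optimizer, and an evaluation step.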

5. How does loss value help in determining whether the model is good or not?
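    The loss value represents the difference between the actual and predicted outputs. A lower loss means the
    predictions are closer to the actual values, so a model with a low loss on unseen (test) data is generally better;
    a low training loss combined with a high test loss indicates overfitting.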
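As a small illustration, Mean Squared Error (MSE), one common loss for regression, can be computed for two hypothetical models on the same targets; the model with the lower loss fits these points better (all numbers are made up).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

y_true = [3.0, 5.0, 7.0]
model_a = [3.1, 4.9, 7.2]   # predictions from a hypothetical model A
model_b = [2.0, 6.0, 9.0]   # predictions from a hypothetical model B

print(mse(y_true, model_a))  # about 0.02 (better)
print(mse(y_true, model_b))  # 2.0 (worse)
```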

6. What are continuous and categorical variables?
Continuous Variables:
Numerical values that can take any value within a range, i.e., infinitely many possible values (e.g., height, temperature, price).
Example:
temperature = [23.5, 27.8, 30.1, 35.0]
Categorical Variables:
Discrete groups or categories (e.g., colors, cities, gender).
Example:
color = ['Red', 'Blue', 'Green']

7. How do we handle categorical variables in Machine Learning? What are the common techniques?
    Since ML models work with numbers, categorical variables must be encoded.

    Common Encoding Techniques:

a) Label Encoding
Assigns an integer label to each category.
Example: {'Male': 0, 'Female': 1}
Good for: Ordinal data (ordered categories) and binary categories.

b) One-Hot Encoding (OHE)
Creates one binary column per category.
Example: Color = [Red, Blue, Green] →
Red:   [1, 0, 0]
Blue:  [0, 1, 0]
Green: [0, 0, 1]
Good for: Nominal data (unordered categories).

c) Target Encoding
Replaces each category with the mean of the target variable for that category.
Example:
City: [NY, LA, SF]
Purchase Rate: [NY = 0.6, LA = 0.7, SF = 0.4]

d) Frequency Encoding
Replaces each category with how often it appears in the dataset (its count or proportion).
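A minimal pandas sketch of the four techniques on a made-up dataset (the `city` and `purchased` columns are hypothetical). Label encoding via `.cat.codes` assigns codes in sorted category order, which does not reflect any real ordering; for genuinely ordinal data the category order should be specified explicitly.

```python
import pandas as pd

# Made-up data: each customer's city and whether they purchased (1) or not (0)
df = pd.DataFrame({
    "city": ["NY", "LA", "SF", "NY", "LA", "NY"],
    "purchased": [1, 0, 1, 0, 1, 1],
})

# Label encoding: integer code per category (sorted order here: LA=0, NY=1, SF=2)
df["city_label"] = df["city"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Frequency encoding: share of rows belonging to each category
df["city_freq"] = df["city"].map(df["city"].value_counts(normalize=True))

# Target encoding: mean of the target for each category
df["city_target"] = df["city"].map(df.groupby("city")["purchased"].mean())

print(df)
print(one_hot)
```

Computing target encoding on the full dataset, as here, leaks the target into the features; in practice the category means should be computed on the training split only.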

8. What do you mean by training and testing a dataset?
    When building an ML model, we split the dataset into training and testing sets:

a) Training Set (70-80% of data)
Used to train the model (learn patterns from data).
The model adjusts its parameters using this data.
b) Testing Set (20-30% of data)
Used to evaluate how well the model generalizes to unseen data.
If the model performs well on training data but poorly on testing data, it is likely overfitting.
# Example of Train-Test Split in Python:

from sklearn.model_selection import train_test_split

# Sample dataset
X = [[10], [20], [30], [40], [50]]  # Features
y = [1, 0, 1, 0, 1]  # Labels

# Split into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training Data:", X_train)
print("Testing Data:", X_test)


9. What is sklearn.preprocessing?
    sklearn.preprocessing is a module in Scikit-Learn used for data preprocessing 
    (scaling, encoding, feature transformation).

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()   # rescales each feature to the [0, 1] range
scaled_data = scaler.fit_transform([[200], [400], [600]])   # → [[0.0], [0.5], [1.0]]

10. What is a Test Set?
    A Test Set is a portion of the dataset that is held out from training and used only to evaluate model performance on unseen data.

#Typical Split:

Training Set: 70-80% (Used for learning)
Testing Set: 20-30% (Used for evaluation)

11. How do we split data for model fitting (training and testing) in Python?
    Use train_test_split() from sklearn.model_selection:

from sklearn.model_selection import train_test_split
# X: feature matrix, y: target labels (assumed to be already defined)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


12. How do you approach a Machine Learning problem?
    Step-by-step approach:
a) Define the problem (Regression / Classification).
b) Perform EDA (data exploration & visualization).
c) Preprocess data (handle missing values, scaling, encoding).
d) Split data (train_test_split()).
e) Choose a model (Logistic Regression, Decision Tree, etc.).
f) Train the model (model.fit(X_train, y_train)).
g) Evaluate performance (model.score(X_test, y_test)).
h) Optimize (hyperparameter tuning).
i) Make predictions & deploy.
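The workflow above can be sketched end to end with scikit-learn. This is a minimal illustration under several assumptions: the built-in Iris dataset stands in for real data, EDA is skipped, `LogisticRegression` is an arbitrary model choice, and the grid of `C` values is illustrative. Wrapping the scaler and model in a `Pipeline` means the scaler is fitted only on training data, which avoids leaking test information.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Problem: classify iris flowers into 3 species (classification)
X, y = load_iris(return_X_y=True)

# Split before fitting any preprocessing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocessing + model chained together
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Optimize: small grid search over the regularization strength C (5-fold CV)
search = GridSearchCV(pipe, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate on the untouched test set
test_accuracy = search.score(X_test, y_test)
print("Best params:", search.best_params_)
print("Test accuracy:", test_accuracy)

# Predict on new samples (here: the first two test rows)
print(search.predict(X_test[:2]))
```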
13. Why do we have to perform EDA before fitting a model to the data?
    Exploratory Data Analysis (EDA) helps:
a) Detect missing values & outliers
b) Identify correlations between features
c) Choose the right features & transformations

#Example (df is a pandas DataFrame holding the dataset):
df.isnull().sum()            # Count missing values per column
df.corr(numeric_only=True)   # Correlation between numeric columns
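A small, self-contained sketch of these EDA checks on a made-up DataFrame (the column names, values and the 1.5 * IQR outlier rule are illustrative choices, not taken from a real dataset):

```python
import numpy as np
import pandas as pd

# Made-up dataset with one missing value per column and one suspicious age
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29, 120],
    "income": [40, 52, 48, np.nan, 45, 60],   # thousands (made up)
})

missing = df.isnull().sum()   # missing values per column
print(missing)
print(df.describe())          # count, mean, std, min/max, quartiles

# Flag outliers in "age" with the 1.5 * IQR rule (NaNs are skipped by quantile)
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)
outliers = df.loc[mask, "age"]
print("Outliers:", outliers.tolist())

print(df.corr(numeric_only=True))   # pairwise correlation of numeric columns
```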

14. What is correlation?
Correlation measures the relationship between two variables.

a) Types:

Positive Correlation (r > 0): Both tend to increase together (Height vs. Weight).
Negative Correlation (r < 0): One increases, the other decreases (Temperature vs. Jacket Sales).
No Correlation (r = 0): No linear relationship.

15. What does negative correlation mean?
    A negative correlation means that as one variable increases, the other decreases.

#Example:

Temperature vs. Jacket Sales (Higher temp = Fewer sales).
Study Time vs. TV Time (More study = Less TV).
#Mathematically: If r is negative, the correlation is negative.

16. How can you find correlation between variables in Python?
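    Using Pandas corr() (df is a pandas DataFrame):
df.corr(numeric_only=True)
#Using Seaborn Heatmap:

import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()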
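`df.corr()` computes Pearson correlation by default, which only captures linear relationships. As an illustrative sketch with made-up data, the `method` parameter can switch to Spearman rank correlation, which detects any monotonic relationship:

```python
import pandas as pd

# y grows with x, but not along a straight line (y = x**3)
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["y"] = df["x"] ** 3

pearson = df.corr(method="pearson").loc["x", "y"]    # linear association
spearman = df.corr(method="spearman").loc["x", "y"]  # rank (monotonic) association

print(round(pearson, 3))   # below 1: the relationship is not linear
print(round(spearman, 3))  # 1.0: the relationship is perfectly monotonic
```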


17. What is Causation?
    Causation means that one variable directly affects another (a cause-and-effect relationship).
#Difference Between Correlation & Causation:

Concept	Meaning	Example
Correlation	Two variables move together, but one does not necessarily cause the other	Ice cream sales & drowning incidents (both increase in summer)
Causation	One variable directly influences the other	More studying → Higher exam scores
                                                                
18. What is an Optimizer? What are different types of optimizers?
    An optimizer is an algorithm that adjusts model parameters to minimize loss and improve accuracy.

#Types of Optimizers:

Optimizer	How it works	Notes
Gradient Descent	Updates weights in the direction of the negative gradient of the loss, computed on the full training set	Can be used to fit Linear Regression
SGD (Stochastic Gradient Descent)	Updates weights using one random sample (or a small mini-batch) per step	Faster per step but noisier
Adam (Adaptive Moment Estimation)	Combines momentum with per-parameter adaptive learning rates	Works well for deep learning
#Example (Using Adam Optimizer in TensorFlow):

import tensorflow as tf
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
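The Gradient Descent row can be made concrete with a short NumPy sketch: full-batch gradient descent fitting y = w·x + b under MSE loss. The data, learning rate and number of steps are illustrative choices; using one randomly chosen sample per step instead of the whole array would turn it into SGD.

```python
import numpy as np

# Toy data from y = 2x + 1 (made up for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

w, b = 0.0, 0.0        # parameters, initialised at zero
learning_rate = 0.05   # step size (a hyperparameter)
n = len(x)

for step in range(2000):
    error = (w * x + b) - y
    # Gradients of the MSE loss L = mean((w*x + b - y)**2)
    grad_w = (2 / n) * np.sum(error * x)
    grad_b = (2 / n) * np.sum(error)
    # Move against the gradient to reduce the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))   # approximately 2.0 1.0
```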

19. What is sklearn.linear_model?
    sklearn.linear_model is a module in Scikit-Learn that provides algorithms for linear-based models.

#Common Models:

Linear Regression (LinearRegression())
Logistic Regression (LogisticRegression())
# Example (Linear Regression in Python):

from sklearn.linear_model import LinearRegression

model = LinearRegression()
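A minimal sketch of fitting `LinearRegression` (the house-size data is made up and perfectly linear, so the fit is exact). Note the `reshape(-1, 1)`: scikit-learn expects `X` as a 2-D array of shape (n_samples, n_features).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: house size (m^2) -> price (thousands); here price = 3 * size exactly
sizes = np.array([50, 70, 90, 110])
prices = np.array([150, 210, 270, 330])

# A single feature stored as a 1-D array is reshaped into one column
X = sizes.reshape(-1, 1)

model = LinearRegression()
model.fit(X, prices)

print(model.coef_, model.intercept_)   # about [3.] and 0.0
print(model.predict([[100]]))          # about [300.]
```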

20. What does model.fit() do? What arguments must be given?
    The fit() method trains the model on given data.

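Arguments:
model.fit(X_train, y_train)
X_train: Features (input data), shape (n_samples, n_features).
y_train: Labels (target variable), shape (n_samples,).
(Unsupervised models such as KMeans take only the features: model.fit(X).)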

21. What does model.predict() do? What arguments must be given?
    The predict() method makes predictions using a trained model.

Arguments:

predictions = model.predict(X_test)
X_test: Input features with the same columns (in the same order) as the training data. Returns one prediction per row.
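As an illustrative sketch with made-up data, a classifier such as `LogisticRegression` returns labels from `predict()` and, additionally, class probabilities from `predict_proba()`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training data: hours studied -> fail (0) / pass (1)
X_train = np.array([[1], [2], [3], [7], [8], [9]])
y_train = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

# New inputs need the same number of feature columns as X_train
X_test = np.array([[1.5], [8.5]])

predictions = model.predict(X_test)          # one label per row
probabilities = model.predict_proba(X_test)  # one row per sample, one column per class

print(predictions)          # [0 1]
print(probabilities.shape)  # (2, 2)
```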
22. What are continuous and categorical variables?
Type	Meaning	Examples
Continuous	Can take any value within a range (infinitely many values)	Height, Salary, Temperature
Categorical	Fixed categories or groups	Gender (Male/Female), Colors (Red/Blue)

23. What is Feature Scaling? How does it help in ML?
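    Feature Scaling transforms numerical features onto a comparable range (normalization or standardization) to improve model performance.

    Prevents features with large values (e.g., salary) from dominating features with small values (e.g., age).
    Helps distance-based and gradient-based models such as SVM, KNN, and Neural Networks perform better; tree-based models are largely insensitive to scaling.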
# Example:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_data = scaler.fit_transform(X)
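One detail worth showing: the scaler should learn its statistics (mean and standard deviation) from the training data only and then apply them to the test data; otherwise information from the test set leaks into training. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up features on very different scales: age (years) and income (dollars)
X = np.array([[25, 40000], [32, 52000], [47, 61000], [51, 58000],
              [38, 45000], [29, 39000], [44, 70000], [36, 50000]], dtype=float)
y = np.array([0, 0, 1, 1, 0, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)         # reuse those statistics on the test data

print(X_train_scaled.mean(axis=0).round(6))   # about [0. 0.]
print(X_train_scaled.std(axis=0).round(6))    # about [1. 1.]
```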

24. How do we perform scaling in Python?
    Two common methods:

Scaling Method	Purpose	Example
Min-Max Scaling	Scales values between 0 and 1	MinMaxScaler()
Standard Scaling	Rescales data to mean 0 and standard deviation 1	StandardScaler()
# Example (Min-Max Scaling in Python):

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(X)
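To make the two formulas in the table concrete, this sketch applies them by hand with NumPy and checks the results against scikit-learn's scalers (the input values are arbitrary):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[200.0], [400.0], [600.0]])

# Min-Max: x' = (x - min) / (max - min)
minmax_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standard: z = (x - mean) / std  (population std, ddof=0, as StandardScaler uses)
standard_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(minmax_manual.ravel())                                            # [0.  0.5 1. ]
print(np.allclose(minmax_manual, MinMaxScaler().fit_transform(X)))      # True
print(np.allclose(standard_manual, StandardScaler().fit_transform(X)))  # True
```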

25. What is sklearn.preprocessing?
    sklearn.preprocessing is a module in Scikit-Learn that provides functions for data preprocessing (scaling, encoding, transformation).

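    Common Classes:

StandardScaler(): Standardizes data (mean 0, standard deviation 1).
MinMaxScaler(): Scales data between 0 and 1.
LabelEncoder(): Converts categories to numbers (intended for target labels; OrdinalEncoder is the equivalent for input features).
OneHotEncoder(): Converts categorical variables into binary columns.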

26. How do we split data for model fitting (training and testing) in Python?
    Use train_test_split() from Scikit-Learn:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Parameters:

test_size=0.2: 20% of data goes to testing.
random_state=42: Ensures reproducibility.
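For classification with imbalanced classes, `train_test_split` also accepts a `stratify` argument that keeps the class proportions the same in both splits. A sketch with made-up labels (80% class 0, 20% class 1):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 80 samples of class 0, 20 of class 1
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))      # 80 20
print(y_train.mean(), y_test.mean())  # 0.2 0.2: class 1 share preserved in both splits
```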

27. Explain Data Encoding.
    Data encoding converts categorical data into numerical values for ML models.

Common Techniques:

Encoding Type	Purpose	Example
Label Encoding	Assigns numbers to categories	{'Male': 0, 'Female': 1}
One-Hot Encoding	Creates one binary column per category	Color [Red, Blue, Green]: Red → [1, 0, 0]
Target Encoding	Uses mean of target variable	City → [NY = 0.6, LA = 0.7]
#Example (One-Hot Encoding in Python):


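from sklearn.preprocessing import OneHotEncoder

data = [['Red'], ['Blue'], ['Green']]   # one categorical column (2-D, as scikit-learn expects)
encoder = OneHotEncoder()
encoded_data = encoder.fit_transform(data).toarray()   # sparse by default; .toarray() gives a dense array
# Categories are sorted (Blue, Green, Red), so 'Red' → [0, 0, 1]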
