# Assignment Questions

1. **What is a parameter?**
- A parameter is a numerical or symbolic value that defines characteristics within a system, model, or function. Parameters help describe and control behavior in various contexts:
  - Mathematics & Statistics
  - Programming & Functions
  - Machine Learning & AI
  - Science & Engineering

2. **What is correlation?**
- Correlation is a statistical measure that describes the strength and direction of a relationship between two variables. It helps determine whether changes in one variable are associated with changes in another.

 **What does negative correlation mean?**
- Negative correlation means that as one variable increases, the other decreases. In other words, there is an inverse relationship between the two variables.

3. **Define Machine Learning. What are the main components in Machine Learning?**
- Machine Learning (ML) is a branch of artificial intelligence (AI) that enables computers to learn from data and make predictions or decisions without explicit programming. Instead of being manually coded for every task, ML models identify patterns and improve their performance over time.
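- Main Components of Machine Learning:
  - Data: the examples the model learns from, split into training and testing sets.
  - Model: the function, with adjustable parameters, that maps inputs to predictions.
  - Loss function: measures how far the predictions are from the actual values.
  - Optimizer: adjusts the parameters to reduce the loss.
  - Evaluation: checks performance on unseen data.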

4. **How does loss value help in determining whether the model is good or not?**
- The loss value is a crucial metric in determining the performance of a machine learning model. It represents the difference between the model’s predictions and the actual target values. A lower loss value indicates that the model's predictions are closer to the true values, whereas a higher loss value means the model is making larger errors.
- Loss quantifies how far off the predictions are from the actual values.
- The model adjusts its parameters using loss to improve performance.
- Overfitting: very low training loss but high validation loss (the model memorizes the training data but generalizes poorly).
- Underfitting: loss remains high on both training and validation sets (the model fails to learn the patterns).
- Loss provides a more granular view of errors than accuracy.
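
As a rough illustration of the points above, the sketch below computes mean squared error for two sets of predictions. MSE is only one common loss, and the numbers are made up for this example.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [2, 4, 6, 8]
good_pred = [2.1, 3.9, 6.2, 7.8]   # hypothetical predictions close to the targets
bad_pred = [5, 5, 5, 5]            # hypothetical predictions that ignore the input

good_loss = mse(y_true, good_pred)
bad_loss = mse(y_true, bad_pred)
print("good model loss:", good_loss)
print("bad model loss:", bad_loss)
```

The predictions that track the targets closely get the smaller loss, which is the sense in which a lower loss indicates a better fit.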

5. **What are continuous and categorical variables?**
- Continuous Variables: Variables that can take an infinite range of values within a given interval. They represent measurable quantities, can be fractional or decimal (not limited to whole numbers), and can be analyzed using statistical measures like mean and standard deviation, and with regression.
- Categorical Variables: Variables that represent distinct groups or categories, often with no inherent numerical meaning. Can be nominal (unordered categories) or ordinal (ordered categories). Typically analyzed using frequency counts, chi-square tests, and classification models.
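
A small sketch of how the two kinds of variables might be told apart in pandas. The DataFrame and its column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [170.2, 165.5, 180.1, 175.0],        # continuous
    "blood_group": ["A", "B", "A", "O"],              # categorical (nominal)
    "size": ["small", "large", "medium", "small"],    # categorical (ordinal)
})

# Numeric columns are treated as continuous here, everything else as categorical
continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(df[continuous_cols].mean())          # mean makes sense for continuous data
print(df["blood_group"].value_counts())    # frequency counts for categorical data
```

Note that a numeric dtype does not always mean a continuous variable (for example, integer category codes), so this split is only a first pass.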


6. **How do we handle categorical variables in Machine Learning? What are the common techniques?**
- Handling categorical variables in Machine Learning is crucial because models require numerical input. There are several techniques used to convert categorical data into numerical representations while preserving relationships.
- Common Techniques for Handling Categorical Variables:
  - Label Encoding.
  - One-Hot Encoding (OHE).
  - Target Guided Ordinal Encoding
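
The three techniques above can be sketched with pandas as follows. The city names and the `bought` target are hypothetical data, and ranking categories by their mean target value is one common way to do target-guided ordinal encoding, assumed here.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai", "Pune"],
    "bought": [1, 0, 1, 0, 1, 0],   # hypothetical binary target
})

# Label encoding: map each category to an integer (alphabetical order here)
categories = sorted(df["city"].unique())
label_map = {cat: i for i, cat in enumerate(categories)}
df["city_label"] = df["city"].map(label_map)

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df["city"], prefix="city")

# Target-guided ordinal encoding: rank categories by their mean target value
target_means = df.groupby("city")["bought"].mean().sort_values()
target_map = {cat: rank for rank, cat in enumerate(target_means.index)}
df["city_target_ordinal"] = df["city"].map(target_map)

print(df)
print(one_hot)
```

In practice the target-guided mapping should be learned from the training data only, so that information from the test set does not leak into the features.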

7. **What do you mean by training and testing a dataset?**
- In Machine Learning, data is split into training and testing datasets to evaluate model performance. This ensures the model learns properly and generalizes well to unseen data.
- 1. Training Dataset:
  - Used to teach the model by adjusting its parameters.
  - The model learns patterns, relationships, and structures from this data.
  - Example: A spam classifier is trained using thousands of labeled emails (spam vs. non-spam).
- 2. Testing Dataset:
  - Never used during training—it evaluates how well the model generalizes.
  - Helps identify overfitting (when the model memorizes training data but fails on new data).
  - Example: After training a spam classifier, it is tested on new, unseen emails.
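
To make the overfitting point concrete, here is a minimal sketch that compares accuracy on the training and testing sets. It uses synthetic data from `make_classification` with noisy labels and an unrestricted decision tree instead of a real spam classifier; both choices are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 randomly reassigns about 20% of the labels
X, y = make_classification(n_samples=300, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A tree with no depth limit can memorize the training set
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

A large gap between the two scores is the usual sign that the model has memorized the training data rather than learned patterns that generalize.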

8. **What is sklearn.preprocessing?**
- sklearn.preprocessing is a module in scikit-learn that provides various techniques to transform and scale data for better machine learning performance. It ensures datasets are properly formatted and prepared before feeding them into models.
- It handles categorical data efficiently for machine learning algorithms and reduces bias from varying scales in datasets.

9. **What is a Test set?**
- A test set is a subset of a dataset that is used to evaluate the performance of a trained machine learning model. It is separate from the training set, ensuring that the model is tested on unseen data to check how well it generalizes.
- Characteristics of a Test Set:
  - Should represent the actual distribution of data.
  - Used for final performance validation.
  - Never used during training.
  - Helps measure real-world accuracy.

10. **How do we split data for model fitting (training and testing) in Python?**
- In Python, we typically split data into training and testing sets using train_test_split from scikit-learn. This ensures the model learns patterns from the training data and is evaluated on unseen testing data to check generalization.

In [None]:
# Example
from sklearn.model_selection import train_test_split
import numpy as np

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training Set (X):", X_train)
print("Training Labels (y):", y_train)
print("Testing Set (X):", X_test)
print("Testing Labels (y):", y_test)

Training Set (X): [[ 6]
 [ 1]
 [ 8]
 [ 3]
 [10]
 [ 5]
 [ 4]
 [ 7]]
Training Labels (y): [12  2 16  6 20 10  8 14]
Testing Set (X): [[9]
 [2]]
Testing Labels (y): [18  4]


  **How do you approach a Machine Learning problem?**
- We approach a Machine Learning problem in the following steps:
  - Step1: Understanding the Problem Statement.
  - Step2: Extraction of Data.
  - Step3: Exploratory Data Analysis.
  - Step4: Data Preprocessing and Feature Engineering.
  - Step5: Choose a Model & Train.
  - Step6: Testing of the Model.
  - Step7: Deployment of the Model
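
The steps above can be sketched end to end on a small bundled dataset. The iris dataset, the scaler and the logistic regression model are assumptions chosen to keep the example short; deployment (Step 7) is left out.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-2: the problem is classifying iris species; the data ships with scikit-learn
data = load_iris(as_frame=True)
X, y = data.data, data.target

# Step 3: quick exploratory checks
print(X.describe())
print(X.isna().sum())

# Step 4: split first, then let the pipeline learn the scaling from training data only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 5: choose a model and train it
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 6: test the model on unseen data
test_accuracy = model.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

Putting the scaler inside the pipeline keeps the preprocessing and the model together, so the same transformation is applied at training and prediction time.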

11. **Why do we have to perform EDA before fitting a model to the data?**
- Exploratory Data Analysis (EDA) is a critical step before fitting a machine learning model because it helps uncover patterns, detect issues, and optimize model performance.
- It helps in:
  - Understanding the Dataset.
  - Detecting Data Quality Issues.
  - Identifying Feature Correlations.
  - Choosing the Right Preprocessing Techniques.
  - Selecting the Best Model Type.
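
A few pandas calls cover most of the checks listed above. The DataFrame below is hypothetical and deliberately contains a missing value and an outlier.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 51],                      # one missing value
    "income": [30000, 42000, 80000, 52000, 1_000_000],    # last value is an outlier
    "segment": ["a", "b", "a", "b", "a"],
})

summary = df.describe()                         # understanding the dataset
missing = df.isna().sum()                       # detecting data quality issues
correlations = df.corr(numeric_only=True)       # identifying feature correlations
segment_counts = df["segment"].value_counts()   # distribution of a categorical column

print(summary)
print(missing)
print(correlations)
print(segment_counts)
```

Findings like the missing `age` value or the extreme `income` then guide the preprocessing choices, for example imputation or a scaler that is less sensitive to outliers.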

12. **What is correlation?**
- Correlation is a statistical measure that expresses the strength and direction of a relationship between two variables. It helps determine whether changes in one variable are associated with changes in another.

13. **What does negative correlation mean?**
- Negative correlation occurs when one variable increases while the other decreases. It indicates an inverse relationship between the two variables.
- For Example:
  - Speed & Travel Time → The faster you drive, the less time it takes to reach your destination.
  - Exercise & Body Fat Percentage → More frequent workouts tend to lower body fat.

14. **How can you find correlation between variables in Python?**

In [1]:
#Using Pandas
import pandas as pd
data = {'A': [1, 2, 3, 4, 5], 'B': [2, 4, 6, 8, 10], 'C': [10, 9, 7, 6, 5]}
df = pd.DataFrame(data)
df.corr()

|   | A | B | C |
|---|---|---|---|
| A | 1.0 | 1.0 | -0.991241 |
| B | 1.0 | 1.0 | -0.991241 |
| C | -0.991241 | -0.991241 | 1.0 |


In [2]:
# using numpy
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([10, 9, 7, 6, 5])
np.corrcoef(x,y)[0,1]

np.float64(-0.9912407071619304)

15. **What is causation? Explain difference between correlation and causation with an example.**
- Causation refers to a direct cause-and-effect relationship between two variables, meaning that a change in one variable directly produces a change in the other. Correlation, on the other hand, only indicates that two variables move together, without proving that one causes the other.
- For example:
  - Observation: More ice cream sales occur in summer, and drowning incidents also increase.
  - Correlation: Ice cream sales and drowning incidents are positively correlated (both increase together).
  - Causation: Ice cream doesn’t cause drowning! The actual cause is warm weather—more people swim in summer, increasing the risk of drowning.
-  Correlation does NOT imply causation—additional research is needed to establish direct cause-and-effect.
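
The ice cream example can be simulated to show how a shared cause produces correlation without causation. All numbers below are invented: temperature drives both quantities, and neither quantity affects the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Warm weather is the common cause of both variables
temperature = rng.uniform(10, 35, size=500)
ice_cream_sales = 20 * temperature + rng.normal(0, 50, size=500)
drownings = 0.3 * temperature + rng.normal(0, 1, size=500)

# The raw correlation is strong even though neither variable causes the other
raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(values, confounder):
    """Remove the part of `values` explained by a straight-line fit on `confounder`."""
    slope, intercept = np.polyfit(confounder, values, deg=1)
    return values - (slope * confounder + intercept)

# After removing the effect of temperature, the relationship largely disappears
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print("raw correlation:", raw_corr)
print("correlation after controlling for temperature:", partial_corr)
```

Controlling for a suspected common cause like this is one simple check; it does not by itself establish causation.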

16. **What is an Optimizer? What are different types of optimizers? Explain each with an example.**
- An optimizer in machine learning is an algorithm that adjusts model parameters (weights and biases) to minimize the loss function. The choice of optimizer affects how quickly and how reliably training converges.
- Types:
  - Gradient Descent:
    - Basic optimization algorithm used to minimize the loss function.
    - Updates weights in the direction of negative gradient to reduce error.
    - Suitable for simple models but can be slow for large datasets.

In [3]:
import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

  - Momentum:
    - Extends SGD with a momentum term that accumulates past gradients to smooth updates.
    - Can help the optimizer move past shallow local minima and flat regions.
    - Often converges faster than vanilla SGD.

In [4]:
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

  - Adam (Adaptive Moment Estimation)
    -  Combines momentum and adaptive learning rates, making it one of the most popular optimizers.
    - Works well with noisy or sparse gradients and non-stationary objectives, which are common when training deep neural networks.
    - Adjusts learning rates dynamically for each parameter.

In [5]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
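
To show what these optimizers do under the hood, here is a plain-Python sketch of the gradient descent and momentum update rules, without TensorFlow. The quadratic loss, the starting point and the hyperparameters are assumptions picked for illustration.

```python
def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.01, steps=100):
    # Plain update: step against the gradient
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def momentum_descent(w, lr=0.01, beta=0.9, steps=100):
    # Momentum update: keep a running velocity built from past gradients
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * grad(w)
        w = w + velocity
    return w

w_gd = gradient_descent(0.0)
w_momentum = momentum_descent(0.0)
print("gradient descent:", w_gd)
print("with momentum:", w_momentum)
```

With this small learning rate, the momentum version ends up closer to the minimum after the same number of steps. That is specific to this toy problem; with other settings momentum can overshoot or oscillate.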

17. **What is sklearn.linear_model?**
- sklearn.linear_model is a module in Scikit-Learn that provides various linear models for regression and classification tasks. It includes algorithms that assume a linear relationship between input features and the target variable.
- Some Models:
  - Linear Regression (for continuous target variables).
  - Logistic Regression (for classification tasks).
  - Ridge & Lasso Regression (for regularization).
  - Perceptron (for binary classification).

18. **What does model.fit() do? What arguments must be given?**
- In Scikit-Learn and TensorFlow/Keras, .fit() is the method used to train a machine learning model on a given dataset. It adjusts the model parameters (weights) using the provided data and learns patterns to make predictions.
- Arguments Required for model.fit():
  - In Scikit-Learn (for regression and classification): the feature matrix X_train and, for supervised models, the target values y_train.

In [7]:
# model.fit(X_train, y_train)

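  - In TensorFlow/Keras (for deep learning): the training inputs and targets. epochs and batch_size are optional; Keras defaults to 1 epoch and a batch size of 32.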

In [8]:
# model.fit(X_train, y_train, epochs=10, batch_size=32)

19. **What does model.predict() do? What arguments must be given?**
- model.predict() is used to generate predictions from a trained machine learning model. After training a model using .fit(), we can use .predict() to make forecasts based on unseen input data.
- Arguments Required for model.predict():
  - In Scikit-Learn (for traditional ML models): only the feature matrix to predict on, with the same columns used during training.

In [9]:
#predictions = model.predict(X_test)

  - In TensorFlow/Keras (for deep learning models): the input data; batch_size is optional and defaults to 32.

In [11]:
#predictions = model.predict(X_test, batch_size=32)
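
Since the calls above are commented out, here is a runnable Scikit-Learn sketch of fit() followed by predict(). The "hours studied" data and the choice of LogisticRegression are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> failed (0) or passed (1)
X_train = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X_train, y_train)          # learns the parameters from features and targets

X_new = np.array([[1.5], [7.5]])     # unseen inputs, same number of columns as X_train
predictions = model.predict(X_new)   # takes only the features and returns class labels
print(predictions)
```

Calling predict() before fit() raises an error in Scikit-Learn, because the model has no learned parameters yet.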

20. **What are continuous and categorical variables?**
- Continuous Variables (Numerical):
  - Variables that can take an infinite range of values within a given interval.
  - Represent measurable quantities (e.g., age, income, temperature).
  - Can be fractional or decimal (not limited to whole numbers).
  - Used in regression analysis to predict numerical outcomes.
- Categorical Variables (Qualitative):
  - Variables that represent distinct groups or categories, often with no inherent numerical meaning.
  -  Can be nominal (unordered categories) or ordinal (ordered categories).
  - Used in classification models to group items into predefined categories.
  - Typically analyzed using frequency counts, chi-square tests, and encoding techniques in ML.



21. **What is feature scaling? How does it help in Machine Learning?**
- Feature scaling is the process of normalizing or standardizing numerical data so that all features have a similar range of values.
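- It prevents features with larger magnitudes from dominating distance calculations and gradient updates, which helps distance-based models (KNN, K-Means) work as intended and gradient-based training converge more reliably.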
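
For reference, the two most common forms of feature scaling can be written as follows. The symbols are assumptions introduced here: x is a feature value, x_min and x_max are the feature's minimum and maximum, and mu and sigma are its mean and standard deviation.

```latex
% Min-max normalization: rescales a feature to the range [0, 1]
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

% Standardization (z-score): rescales a feature to mean 0 and standard deviation 1
z = \frac{x - \mu}{\sigma}
```

For example, with x_min = 10 and x_max = 50, the value x = 20 becomes x' = (20 - 10) / (50 - 10) = 0.25.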

22. **How do we perform scaling in Python?**
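- Scaling in Python is usually done with the scaler classes in sklearn.preprocessing. Common methods and where they are typically used:
  - Min-Max scaling (MinMaxScaler): neural networks and distance-based models (KNN, K-Means).
  - Standardization (StandardScaler): linear models (regression, SVM).
  - Robust scaling (RobustScaler): data with outliers.
  - Power transformation (PowerTransformer): skewed data that should be made closer to normal.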
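
A minimal sketch of three of these scaling methods using sklearn.preprocessing. The single-column array with one outlier is made-up data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])   # the last value is an outlier

minmax = MinMaxScaler().fit_transform(X)       # rescales to the range [0, 1]
standard = StandardScaler().fit_transform(X)   # mean 0, standard deviation 1
robust = RobustScaler().fit_transform(X)       # centers on the median, scales by the IQR

print("min-max:", minmax.ravel())
print("standard:", standard.ravel())
print("robust:", robust.ravel())
```

In a real workflow the scaler is fitted on the training set only (`fit` or `fit_transform`) and then applied to the test set with `transform`, so that no information from the test data leaks into training.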

23. **What is sklearn.preprocessing?**
- sklearn.preprocessing is a module in Scikit-Learn that provides various techniques to transform and scale data for better machine learning performance. It ensures datasets are properly formatted and prepared before feeding them into models.

24. **How do we split data for model fitting (training and testing) in Python?**
- Splitting data into training and testing sets is a crucial step in machine learning to ensure models generalize well to unseen data. In Python, we typically use train_test_split from Scikit-Learn to divide data efficiently.
- Steps:
  - Import Necessary Libraries.
  - Create Sample Data.
  -  Split Data into Training & Testing Sets.

25. **Explain data encoding.**
- Data encoding is the process of transforming categorical data into numerical formats so that machine learning models can process it effectively. Since most ML algorithms work with numerical data, encoding is crucial for handling text-based features.