# **THEORY QUESTIONS**

1. What is a parameter?
  - In the context of machine learning, a parameter is a configuration variable that is internal to the model and whose value can be estimated from data. For example, in linear regression, the coefficients of the independent variables are parameters. These are learned during training by minimizing the loss function, enabling the model to make accurate predictions.
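
As a small illustration of learned parameters, the sketch below fits a linear regression to synthetic data and reads back the coefficient and intercept. The data-generating line y = 3x + 2 is an assumption chosen only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from y = 3x + 2 (values assumed for illustration only).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

model = LinearRegression()
model.fit(X, y)

# The parameters estimated from the data: one coefficient per feature
# plus an intercept.
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Because the data has no noise, the fitted values should recover the slope and intercept used to generate it.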


2. What is correlation?
  - Correlation is a statistical measure that expresses the extent to which two variables are linearly related. It is measured using the correlation coefficient, which ranges from -1 to 1. A value close to 1 implies a strong positive relationship, while a value close to -1 indicates a strong negative relationship. A value near 0 means no linear correlation.
  
  
3. What does negative correlation mean?
  - Negative correlation means that as one variable increases, the other variable tends to decrease. This inverse relationship is quantified with a correlation coefficient that lies between 0 and -1. For instance, there may be a negative correlation between the amount of exercise and body weight, where more exercise is associated with lower body weight.
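
The correlation coefficient described above can be computed by hand as the covariance of two variables divided by the product of their spreads. The following is a minimal sketch with made-up arrays; `pearson_r` is a helper name introduced here for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2.0 * x + 1.0     # rises exactly in step with x
y_neg = -0.5 * x + 10.0   # falls exactly as x rises


def pearson_r(a, b):
    """Pearson correlation: centered cross-product over the product of norms."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))


r_pos = pearson_r(x, y_pos)   # perfect positive relationship
r_neg = pearson_r(x, y_neg)   # perfect negative relationship
print(r_pos, r_neg)
```

The same values can be read from `np.corrcoef(x, y_pos)[0, 1]`, which is the usual choice in practice.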

4. Define Machine Learning. What are the main components in Machine Learning?
  - Machine Learning is a subset of artificial intelligence that allows systems to learn and improve from experience without being explicitly programmed. The main components of ML include the dataset (input data), model (algorithm used for learning), loss function (measures model error), optimizer (reduces the error), and evaluation metrics (assess performance).



5. How does loss value help in determining whether the model is good or not?
 - The loss value indicates how far the predicted outputs of the model are from the actual values. A lower loss value generally signifies a better fit. During training, the model iteratively adjusts its parameters to minimize this loss. If the loss remains high even after several iterations, it may suggest underfitting or issues with the model design. A low training loss alone is not enough, however: if the loss on held-out data is much higher than the training loss, the model is likely overfitting.
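
To make the idea concrete, here is a minimal sketch that uses mean squared error (one common loss, chosen here as an assumption) to compare two sets of predictions against the same targets. The numbers are invented for illustration.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))


y_true = np.array([3.0, 5.0, 7.0, 9.0])
pred_a = np.array([2.5, 5.5, 6.5, 9.5])   # every prediction is off by 0.5
pred_b = np.array([0.0, 0.0, 0.0, 0.0])   # a model that always predicts 0

loss_a = mse(y_true, pred_a)
loss_b = mse(y_true, pred_b)
print(loss_a, loss_b)
```

The predictions in `pred_a` give the smaller loss, which matches the intuition that they sit closer to the targets.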

6. What are continuous and categorical variables?
  - Continuous variables are numerical values that can take an infinite number of values within a range, such as height, weight, or temperature. These are typically real numbers. Categorical variables, on the other hand, represent discrete groups or categories, such as gender, color, or blood type. Categorical data can be nominal (no natural order) or ordinal (with a natural order).


7. How do we handle categorical variables in Machine Learning? What are the common techniques?
 -  Categorical variables are transformed into numerical formats so that machine learning models can process them. Common techniques include Label Encoding (assigns each unique category an integer), One-Hot Encoding (creates binary columns for each category), and Ordinal Encoding (used when categories have an order). These transformations are often done using sklearn.preprocessing.


8. What do you mean by training and testing a dataset?
 -  Training means fitting a machine learning model on known input-output pairs so that it can learn patterns in the data. Testing means evaluating the trained model on data it did not see during training. The main goal is to assess how well the model generalizes to new data.


9. What is sklearn.preprocessing?
  - sklearn.preprocessing is a module in Scikit-learn used for preparing data before feeding it into a machine learning model. It includes tools for normalization, standardization, encoding categorical variables, and more. These preprocessing steps are crucial for ensuring consistent data quality and improving model performance.


10. What is a Test set?
  - A test set is a subset of the dataset that is used to evaluate the final performance of a trained model. It contains data that the model has never seen before. The purpose of the test set is to provide an unbiased estimate of the model’s ability to generalize to new data.

11. How do we split data for model fitting (training and testing) in Python?
 -  In Python, data is typically split using the train_test_split() function from Scikit-learn's sklearn.model_selection module. This function randomly divides the dataset into training and testing subsets, often with an 80:20 or 70:30 ratio.

 12. How do you approach a Machine Learning problem?
 -  Approaching an ML problem involves several steps:

    1. Understand the problem and gather data.
    2. Clean and preprocess the data.
    3. Perform Exploratory Data Analysis (EDA).
    4. Select appropriate features and models.
    5. Train the model.
    6. Evaluate performance using metrics.
    7. Tune hyperparameters and validate.
    8. Deploy and monitor the model.
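
The middle steps of this workflow (preprocess, train, tune, evaluate) can be sketched in a few lines of Scikit-learn. Everything specific here is an assumption for illustration: the synthetic dataset from `make_classification`, the choice of logistic regression, and the small grid of `C` values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Gather data: a synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out a test set before any fitting happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Preprocess and model in one pipeline so scaling is learned on training folds only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune a hyperparameter with 5-fold cross-validation on the training set.
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate the tuned model once on the untouched test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, test_accuracy)
```

Problem definition, EDA, deployment, and monitoring are not shown, since they depend on the project.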



13. Why do we have to perform EDA before fitting a model to the data?
 -  Exploratory Data Analysis (EDA) helps understand the underlying structure, patterns, and anomalies in the data. It includes visualization, statistical summaries, and correlation analysis. EDA helps in making informed decisions about feature selection, model choice, and data preprocessing, ultimately leading to better model accuracy.

 14. How can you find correlation between variables in Python?
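 -  Correlation between variables in Python can be found using the corr() method of a pandas DataFrame. For example:  
import pandas as pd  
df.corr()  
Here df is a DataFrame that has already been loaded. By default, corr() returns the pairwise Pearson correlation matrix of the columns; in recent pandas versions, pass numeric_only=True if some columns are non-numeric.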
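
A self-contained version of the `corr()` call might look like the sketch below. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical study data; the numbers are invented for this example.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 71, 78],
    "hours_of_tv": [5, 4, 4, 2, 1],
})

# Pairwise Pearson correlation between all columns.
corr_matrix = df.corr()
print(corr_matrix)

# Correlation between two specific columns.
r = df["hours_studied"].corr(df["exam_score"])
print(r)
```

Passing `method="spearman"` to `corr()` gives a rank-based coefficient instead, which is useful when the relationship is monotonic but not linear.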

15. What is causation? Explain the difference between correlation and causation with an example.
 -  Causation means that one event is the direct result of another. In contrast, correlation only shows that two variables move together, not necessarily that one causes the other. For example, there might be a strong correlation between ice cream sales and drowning incidents, but eating ice cream does not cause drowning; both increase during hot weather, which acts as a common cause. Hence, while correlation is about relationships, causation implies a cause-effect link.
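
The ice cream example can be simulated. In the sketch below, temperature drives both quantities and there is no direct link between them; all coefficients and noise levels are assumptions made up for the simulation. The two series are strongly correlated, but the correlation largely vanishes once the effect of temperature is removed from each.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Temperature is the common cause (confounder) of both series.
temperature = rng.normal(25.0, 5.0, size=n)
ice_cream_sales = 10.0 * temperature + rng.normal(0.0, 20.0, size=n)
drownings = 0.5 * temperature + rng.normal(0.0, 1.0, size=n)

# Raw correlation between the two effects.
raw_r = np.corrcoef(ice_cream_sales, drownings)[0, 1]


def residuals(target, driver):
    """Remove the straight-line effect of `driver` from `target`."""
    slope, intercept = np.polyfit(driver, target, deg=1)
    return target - (slope * driver + intercept)


# Correlation after controlling for temperature (a partial correlation).
partial_r = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw: {raw_r:.2f}, after controlling for temperature: {partial_r:.2f}")
```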




16. What is an Optimizer? What are different types of optimizers? Explain each with an example.
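 -  An optimizer is an algorithm that adjusts the model’s parameters (like weights) to minimize the loss function. Common types of optimizers include:

    1. SGD (Stochastic Gradient Descent): Updates the weights using the gradient computed on a single sample or a small batch, scaled by a fixed learning rate.

    2. Adam (Adaptive Moment Estimation): Combines momentum (a running average of past gradients) with RMSProp-style per-parameter scaling; widely used.

    3. RMSProp: Maintains per-parameter learning rates by dividing each gradient by a running average of its recent squared magnitudes.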
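
As an example of each, the sketch below applies the three update rules to a one-parameter toy loss, f(w) = (w - 3)^2, whose minimum is at w = 3. The loss, learning rates, and step counts are assumptions chosen for illustration. The gradient is exact here; in real training it would be estimated from a mini-batch, which is what makes SGD stochastic.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def run_sgd(w, lr=0.1, steps=200):
    # Plain gradient step with a fixed learning rate.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def run_rmsprop(w, lr=0.01, decay=0.9, eps=1e-8, steps=2000):
    # Scale each step by a running average of squared gradients.
    avg_sq = 0.0
    for _ in range(steps):
        g = grad(w)
        avg_sq = decay * avg_sq + (1.0 - decay) * g * g
        w = w - lr * g / (math.sqrt(avg_sq) + eps)
    return w


def run_adam(w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    # Momentum (m) plus squared-gradient scaling (v), both bias-corrected.
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


w_sgd = run_sgd(0.0)
w_rmsprop = run_rmsprop(0.0)
w_adam = run_adam(0.0)
print(w_sgd, w_rmsprop, w_adam)
```

All three should end close to 3. With a constant learning rate, RMSProp tends to hover around the minimum at a distance comparable to the learning rate rather than settling exactly on it.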


17. What is sklearn.linear_model?
 - sklearn.linear_model is a module in Scikit-learn that provides classes for linear models such as Linear Regression, Logistic Regression, Ridge, and Lasso. These models are used for regression and classification tasks. For instance, LinearRegression() is used to model relationships between continuous variables.


18. What does model.fit() do? What arguments must be given?
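 -  The model.fit() method trains the machine learning model on the training data, learning the relationship between the input features (X) and the target values (y). For a supervised Scikit-learn estimator, the required arguments are the features and the targets: model.fit(X_train, y_train). Unsupervised estimators, such as clustering models, need only X. In deep learning frameworks such as Keras, fit() also accepts optional arguments like epochs, batch_size, and validation_data.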


19. What does model.predict() do? What arguments must be given?
 -  The model.predict() method uses the trained model to make predictions on new, unseen input data. It requires the input features as its argument: predictions = model.predict(X_test). It returns the predicted values based on the patterns learned during training.
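
Here is a minimal sketch of the fit-then-predict pattern with a Scikit-learn estimator. The tiny arrays are invented for illustration; note that the feature matrix is two-dimensional (rows are samples, columns are features) while the targets are one-dimensional.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data: 4 samples, 1 feature, following y = 2x.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])

# New inputs with the same number of columns as X_train.
X_test = np.array([[5.0], [6.0]])

model = LinearRegression()
fitted = model.fit(X_train, y_train)   # fit() returns the estimator itself

predictions = model.predict(X_test)    # one prediction per row of X_test
print(predictions)
```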

20. What is feature scaling? How does it help in Machine Learning?
  - Feature scaling is a technique to normalize or standardize the range of independent variables. It helps distance-based algorithms such as KNN and SVM, and it speeds up convergence for models trained with gradient descent. Without scaling, such models may give undue weight to variables with larger ranges.

21. How do we perform scaling in Python?
 - Scaling in Python is commonly performed using Scikit-learn’s StandardScaler or MinMaxScaler:  
from sklearn.preprocessing import StandardScaler  
scaler = StandardScaler()  
X_scaled = scaler.fit_transform(X)  
StandardScaler transforms each feature so that it has a mean of 0 and a standard deviation of 1. MinMaxScaler instead rescales each feature to a fixed range, [0, 1] by default.
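
A runnable sketch of both scalers follows, using small made-up arrays. It also shows a common practice that is an addition to the answer above: the scaler is fitted on the training data only and then reused on the test data, so no information from the test set leaks into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (values invented for illustration).
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

# Standardization: learn mean and standard deviation from the training data.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)      # reuse training statistics

# Min-max scaling: map each training feature onto [0, 1].
minmax = MinMaxScaler()
X_train_mm = minmax.fit_transform(X_train)

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
print(X_train_mm.min(axis=0), X_train_mm.max(axis=0))
```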

22. What is sklearn.preprocessing?
  - sklearn.preprocessing is a module in Scikit-learn used to convert raw data into a suitable format before model training. It includes utilities for feature scaling, encoding categorical variables, binarizing, and normalizing data. Imputation of missing values is handled by the separate sklearn.impute module.


23. How do we split data for model fitting (training and testing) in Python?
 -  As mentioned earlier, use train_test_split() from sklearn.model_selection. Example:  
from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)  
Here test_size=0.3 holds out 30% of the rows for testing, and random_state=42 makes the split reproducible.
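
A self-contained version of the split is sketched below with a small made-up dataset. The `stratify=y` argument is an optional addition not used in the answer above; it keeps the class proportions similar in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 samples with 2 features each, and alternating binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Rows of `X` and entries of `y` are shuffled together, so each feature row stays paired with its label.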


24. Explain data encoding.
 -  Data encoding is the process of converting categorical data into a numerical format that machine learning algorithms can interpret. Techniques include:

    1. Label Encoding: Assigns each unique category a number.
    2. One-Hot Encoding: Converts categories into binary columns.
    3. Ordinal Encoding: Used when categories have a defined order.

    These techniques convert text data into machine-readable form.
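
The three techniques can be sketched with Scikit-learn's encoders as follows. The `color` and `size` columns are made up for illustration. Note that Scikit-learn intends `LabelEncoder` for target labels; it is applied to a feature column here only to show the integer mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# Label encoding: one integer per category, assigned in sorted order
# (blue=0, green=1, red=2).
label_codes = LabelEncoder().fit_transform(df["color"])

# One-hot encoding: one binary column per category.
one_hot = OneHotEncoder().fit_transform(df[["color"]]).toarray()

# Ordinal encoding: integers that follow an order we specify explicitly.
size_order = [["small", "medium", "large"]]
ordinal = OrdinalEncoder(categories=size_order).fit_transform(df[["size"]])

print(label_codes)
print(one_hot)
print(ordinal.ravel())
```

One-hot encoding avoids implying an order between categories, at the cost of one extra column per category.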

25. What is feature engineering?
  - Feature engineering is the process of using domain knowledge to create new features or modify existing ones to improve model performance. It includes operations like encoding, scaling, feature selection, and creating interaction terms. Well-engineered features often result in better predictive models and improved accuracy.
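
As a small illustration of creating new features from existing ones, the sketch below derives a ratio feature and a date component with pandas. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "height_m": [1.6, 1.8],
    "weight_kg": [64.0, 81.0],
    "signup_date": pd.to_datetime(["2024-01-15", "2024-06-30"]),
})

# Domain knowledge: body mass index combines two raw measurements.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Date decomposition: expose the month as its own numeric feature.
df["signup_month"] = df["signup_date"].dt.month

print(df[["bmi", "signup_month"]])
```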













