**What is the difference between Logic and Fuzzy Logic?**

- **Logic** (often called classical or Boolean logic) is based on precise, binary reasoning. In classical logic, statements are <span style="color: red;">__either completely true or completely false (i.e., values are 1 or 0).__</span> This approach works well for problems that can be clearly defined with exact rules and boundaries.

- **Fuzzy Logic** extends classical logic by allowing for degrees of truth. Instead of only true or false, statements can have a <span style="color: red;">__value anywhere between 0 and 1, representing partial truth.__</span> Fuzzy logic is useful for dealing with uncertainty, vagueness, and situations where information is imprecise—such as describing something as "warm" or "tall." It is widely used in control systems, decision-making, and artificial intelligence to model real-world situations more naturally.
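A minimal Python sketch of the "warm" example. It contrasts a crisp (Boolean) definition with a fuzzy one. The 20 to 30 °C crisp range and the triangular membership function are arbitrary assumptions chosen only for illustration. The membership function is 0 at 15 °C, rises to 1 at 25 °C, and falls back to 0 at 35 °C. Real fuzzy systems tune these shapes to their domain.

```python
def crisp_is_warm(temp_c: float) -> bool:
    # Classical logic: "warm" is either completely true or completely false.
    # The 20-30 °C range is an illustrative assumption.
    return 20.0 <= temp_c <= 30.0


def fuzzy_warm(temp_c: float) -> float:
    # Fuzzy logic: degree of "warm" in [0, 1] using an assumed triangular
    # membership function (0 at 15 °C, 1 at 25 °C, 0 again at 35 °C).
    if temp_c <= 15.0 or temp_c >= 35.0:
        return 0.0
    if temp_c <= 25.0:
        return (temp_c - 15.0) / 10.0
    return (35.0 - temp_c) / 10.0


for t in (10, 18, 25, 29, 40):
    print(t, crisp_is_warm(t), round(fuzzy_warm(t), 2))
```

At 18 °C the crisp rule simply says "not warm". The fuzzy rule instead says "warm to degree 0.3".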


**What is Deterministic and Stochastic?**
- **Deterministic** refers to processes or systems where the outcome is completely determined by the initial conditions and rules. <span style="color: red;">__Given the same input, a deterministic system will always produce the same output.__</span> There is no randomness involved. For example, traditional computer programs and mathematical equations are deterministic.
- **Stochastic** refers to processes or systems that <span style="color: red;">__involve randomness or probability.__</span> The outcome is not fully predictable, even if the initial conditions are known, because there is some element of chance. Examples include rolling dice, weather patterns, and many machine learning algorithms that use random sampling or probabilistic models.
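A small sketch of the distinction; the function names are hypothetical. Computer "randomness" usually comes from a pseudo-random number generator, which is itself a deterministic algorithm. Fixing its seed therefore makes a stochastic procedure reproducible, which is why machine learning code often sets a seed.

```python
import random
from typing import List, Optional


def square(x: int) -> int:
    # Deterministic: the same input always produces the same output.
    return x * x


def roll_dice(n: int, seed: Optional[int] = None) -> List[int]:
    # Stochastic: each roll of a fair six-sided die is left to chance.
    # With seed=None the sequence changes from run to run; with a fixed
    # seed the pseudo-random generator replays the same sequence.
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]


print(square(4), square(4))                            # identical every time
print(roll_dice(5))                                    # differs between runs
print(roll_dice(5, seed=42) == roll_dice(5, seed=42))  # True: seeded, reproducible
```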


**What is Soft and Hard Computing?**

- **Hard Computing** refers to traditional computing methods that are based on precise, <span style="color: red;">__deterministic logic and algorithms.__</span> It requires exact input data and produces exact outputs. Examples include classical logic, binary arithmetic, and conventional programming.

- **Soft Computing** is often <span style="color: red;">__stochastic__</span> or approximate. It is an approach that works with approximate models and gives solutions to complex real-world problems where precision is not always possible. It incorporates techniques like fuzzy logic, neural networks, genetic algorithms, and probabilistic reasoning, which allow for tolerance of imprecision and uncertainty. Not every soft computing technique is random, though. Fuzzy logic, for instance, is deterministic: the same inputs always give the same degrees of truth.


**What is Artificial Intelligence (AI)?**

Artificial Intelligence (AI) is a branch of computer science focused on creating systems or machines that can perform tasks that typically require human intelligence. These tasks include learning from experience, understanding natural language, recognizing patterns, solving problems, and making decisions.

AI encompasses a variety of techniques and approaches, such as machine learning, deep learning, natural language processing, robotics, and expert systems. The goal of AI is to develop systems that can <span style="color: red;">__mimic or simulate intelligent behavior, enabling computers to perform complex tasks in a way that appears "smart" or human-like.__</span>


**What is Machine Learning?**

Machine Learning is a subset of artificial intelligence (AI) that focuses on building systems that can learn from data and improve their performance over time without being explicitly programmed. Instead of following fixed rules, machine learning algorithms identify patterns and relationships in data to make predictions, classifications, or decisions.

There are different types of machine learning, including:
- **Supervised Learning:** The algorithm learns from labeled data, where the correct output is provided for each example.
- **Unsupervised Learning:** The algorithm finds patterns or groupings in data without labeled outputs.
- **Reinforcement Learning:** The algorithm learns by interacting with an environment and receiving feedback in the form of rewards or penalties.

Machine learning is widely used in applications such as image and speech recognition, recommendation systems, fraud detection, and more.
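As a minimal illustration of supervised learning, the sketch below "learns" a straight line from labeled (x, y) pairs instead of following hand-coded rules. It uses ordinary least squares. The toy data is made up and the helper names are hypothetical. Libraries such as scikit-learn provide production versions of this, for example `LinearRegression`.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Learn slope and intercept from labeled examples via ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x: float, slope: float, intercept: float) -> float:
    # Apply the learned parameters to a new, unseen input.
    return slope * x + intercept


# Toy labeled data (made up): each input x comes with its correct output y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)                # 2.0 1.0
print(predict(5.0, slope, intercept))  # 11.0
```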


**What is Data Science?**

Data Science is an interdisciplinary field that uses scientific methods, algorithms, processes, and systems to extract knowledge and insights from structured and unstructured data. It combines techniques from statistics, computer science, mathematics, and domain expertise to analyze and interpret complex data.

Data Science involves several key steps, including data collection, cleaning, exploration, analysis, modeling, and visualization. The goal is to uncover patterns, make predictions, and support decision-making in a wide range of applications such as business, healthcare, finance, and more.

Data Science often leverages tools and techniques from machine learning, artificial intelligence, and big data technologies to handle large and complex datasets.


**Difference Between Data Science and Machine Learning**

- **Data Science** is a broad, interdisciplinary field that focuses on extracting knowledge and insights from data using a combination of statistics, mathematics, programming, and domain expertise. It covers the entire data pipeline, including data collection, cleaning, exploration, analysis, visualization, and interpretation to support decision-making.

- **Machine Learning** is a subset of artificial intelligence and a specialized area within data science. It involves developing algorithms and models that enable computers to learn from data and make predictions or decisions without being explicitly programmed.

**Key Differences:**
- Data science encompasses the whole process of working with data, while machine learning specifically focuses on building predictive models and algorithms.
- Data science uses a variety of tools and techniques (including machine learning) for data analysis, whereas machine learning is mainly concerned with model training, evaluation, and prediction.
- Data scientists may use machine learning as one of many tools to analyze data, but their work also involves tasks like data wrangling, visualization, and communicating results.

In summary, machine learning is a core component of data science, but data science is a broader field that includes many other skills and processes beyond just machine learning.


**Churn rate**

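Customer Churn Rate = (Customers at beginning of month - Customers at end of month) / Customers at beginning of month

This simple form assumes no new customers were added during the month. Otherwise, new sign-ups would offset the customers who left.

For example, with 500 customers at the start of the month and 450 at the end: (500 - 450) / 500 = 50 / 500 = 10%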
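The same calculation as a small Python helper. The name `churn_rate` is hypothetical. Like the formula above, it assumes no new customers joined during the month.

```python
def churn_rate(customers_start: int, customers_end: int) -> float:
    """Monthly churn rate, assuming no new customers joined during the month."""
    if customers_start <= 0:
        raise ValueError("customers_start must be positive")
    return (customers_start - customers_end) / customers_start


rate = churn_rate(500, 450)
print(f"{rate:.0%}")  # 10%
```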

**What is a categorical variable?**

Categorical variables hold qualitative rather than quantitative values. Their values are also called labels.

There are broadly two types of categorical variables:
1. **Nominal Variable:** A nominal variable has <span style="color: red;">__no natural ordering to its categories.__</span> It has two or more categories. For example, Marital Status (Single, Married, Divorced); Gender (Male, Female, Transgender), etc.
2. **Ordinal Variable:** A variable whose categories <span style="color: red;">__can be placed in an order.__</span> For example, Customer Satisfaction (Excellent, Very Good, Good, Average, Bad), and so on.
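The ordering difference matters when categories are turned into numbers for a model. One common approach is assumed here, though it is not the only option. Ordinal variables get ordinal encoding, which keeps the order. Nominal variables get one-hot encoding, which avoids implying a false order. A plain-Python sketch with hypothetical helper names:

```python
from typing import Dict, List, Tuple

# Assumed order for the Customer Satisfaction example, from worst to best.
SATISFACTION_ORDER = ["Bad", "Average", "Good", "Very Good", "Excellent"]


def ordinal_encode(values: List[str], order: List[str]) -> List[int]:
    # Ordinal variable: map each category to its rank so the order is preserved.
    rank: Dict[str, int] = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values: List[str]) -> Tuple[List[str], List[List[int]]]:
    # Nominal variable: one 0/1 column per category, so no order is implied.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


print(ordinal_encode(["Good", "Excellent", "Bad"], SATISFACTION_ORDER))  # [2, 4, 0]
print(one_hot_encode(["Single", "Married", "Single"]))
# (['Married', 'Single'], [[0, 1], [1, 0], [0, 1]])
```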


**What is cardinality?**

Cardinality refers to the <span style="color: red;">__uniqueness of data values contained in a column.__</span> High cardinality means that the column contains a large percentage of totally unique values. Low cardinality means that the column contains a lot of “repeats” in its data range.
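A quick way to measure this is to count distinct values and compare that count with the number of rows. The sketch below does so. The helper names and toy columns are hypothetical, and the "ratio" is just one convenient way to express uniqueness.

```python
from typing import Hashable, Sequence


def cardinality(column: Sequence[Hashable]) -> int:
    # Number of distinct values in the column.
    return len(set(column))


def uniqueness_ratio(column: Sequence[Hashable]) -> float:
    # Share of rows holding a distinct value: near 1.0 means high cardinality,
    # near 0.0 means low cardinality (many repeats).
    return cardinality(column) / len(column) if column else 0.0


customer_id = ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8"]
marital_status = ["Single", "Married", "Single", "Divorced",
                  "Married", "Single", "Married", "Single"]
print(cardinality(customer_id), uniqueness_ratio(customer_id))        # 8 1.0
print(cardinality(marital_status), uniqueness_ratio(marital_status))  # 3 0.375
```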

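**What are outliers?**

An outlier is a data point that differs significantly from the other observations in a dataset. It can be the result of measurement or recording errors, or the unintended and truthful outcome resulting from the set's definition.

You can find outliers visually using a scatter plot.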
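Besides plotting, outliers are often flagged numerically. The sketch below uses the interquartile-range (IQR) rule, a common heuristic that the notes above do not mention. It flags values more than 1.5 × IQR below the first quartile or above the third quartile. The 1.5 multiplier is a convention, and the data is made up.

```python
from statistics import quantiles
from typing import List


def find_outliers_iqr(values: List[float], k: float = 1.5) -> List[float]:
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 11, 13, 12, 11, 100]
print(find_outliers_iqr(data))  # [100]
```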

**What is an imbalanced dataset?**

An imbalanced dataset is a dataset where the distribution of examples across the <span style="color: red;">__different classes is not approximately equal.__</span> In other words, one class (or a few classes) has significantly more samples than the others.

This is common in classification problems, such as fraud detection, disease diagnosis, or rare event prediction, where the event of interest (positive class) is much less frequent than the negative class.

Imbalanced datasets can cause machine learning models to be biased toward the majority class, leading to poor performance on the minority class. Special techniques such as resampling, using different evaluation metrics, or applying specialized algorithms are often needed to address this issue.
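One simple resampling technique is random oversampling, which duplicates minority-class rows until the classes are balanced. The plain-Python sketch below is a hypothetical helper, not a library API (the separate `imbalanced-learn` package offers maintained versions). Apply it to the training split only, after splitting. Otherwise copies of the same row can end up in both the training and test sets.

```python
import random
from collections import Counter
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(X: List[T], y: List[int], seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(rows)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Toy fraud-style data (made up): 8 legitimate (0) vs 2 fraudulent (1) transactions.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
print(Counter(y))      # Counter({0: 8, 1: 2})
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({0: 8, 1: 8})
```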


**What is data leakage and its types?**

Data leakage (or "leakage") occurs when information from outside the training dataset is used to create the model, leading to overly optimistic performance estimates and poor generalization to new data. It happens when data that would not be available at prediction time is inadvertently used during model training.

**Types of Data Leakage:**

1. **Target Leakage (or Label Leakage):**
   - This occurs when the training data includes information about the target variable that would not be available at prediction time, for example <span style="color: red;">__a feature that is a future value or a direct proxy of the target.__</span>
   - *Example:* Using a "took antibiotics" flag to predict whether a patient has pneumonia, when patients typically receive antibiotics only after being diagnosed.

2. **Train-Test Contamination:**
   - This happens when information from the test set leaks into the training set, often due to improper data splitting or preprocessing before splitting.
   - *Example:* Normalizing the entire dataset before splitting into train and test sets, causing information from the test set to influence the training process.

3. **Feature Leakage:**
   - Occurs when features are constructed using information that would <span style="color: red;">__not be available at prediction time, or when features are highly correlated with the target__</span> due to data collection or processing errors.
   - *Example:* Including a feature that is derived from the target variable.

**How to Prevent Data Leakage:**
- Always split your data into training and test sets before any preprocessing, and fit preprocessing steps on the training set only.
- Carefully review features to ensure none contain information that would not be available at prediction time.
- Be cautious with time-series data and avoid using future information.

Data leakage can lead to models that perform well in validation but fail in real-world scenarios, so it is important to identify and prevent it during the machine learning workflow.
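To make train-test contamination concrete, here is a plain-Python sketch with made-up data and hypothetical helper names. It compares standardization statistics computed before the split (leaky) with statistics computed on the training rows only (correct). In scikit-learn, the correct version corresponds to calling `fit` on `X_train` and `transform` on both sets.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_standardizer(values: List[float]) -> Tuple[float, float]:
    # "fit": learn the mean and (population) standard deviation.
    return mean(values), pstdev(values)


def standardize(values: List[float], mu: float, sigma: float) -> List[float]:
    # "transform": apply previously learned statistics.
    return [(v - mu) / sigma for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = data[:4], data[4:]  # split FIRST

# Leaky: statistics computed on every row, so the test value 100.0 shapes them.
mu_leaky, sd_leaky = fit_standardizer(data)

# Correct: statistics learned from the training rows only, then reused on test.
mu, sd = fit_standardizer(train)
train_scaled = standardize(train, mu, sd)
test_scaled = standardize(test, mu, sd)

print(mu_leaky, mu)  # 22.0 2.5
```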


**What is the difference between `fit` and `transform` methods in scikit-learn?**

In scikit-learn, many preprocessing and transformation classes (like `StandardScaler`, `PCA`, etc.) use the `fit` and `transform` methods as part of their workflow.

- **`fit`**: The `fit` method is used to learn or estimate parameters from the training data. For example, when you call `scaler.fit(X_train)`, the scaler computes the mean and standard deviation of the features in `X_train` and stores them internally.

- **`transform`**: The `transform` method uses the parameters learned during `fit` to modify the data. For example, `scaler.transform(X_test)` will use the mean and standard deviation computed from `X_train` to scale `X_test`.

- **`fit_transform`**: This is a convenience method that combines both steps: it first fits the transformer to the data, then transforms it.
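A short sketch with scikit-learn's `StandardScaler`, using made-up toy arrays. It shows that `transform` on test data reuses the statistics learned by `fit` on the training data. It also shows that `fit_transform` gives the same result as `fit` followed by `transform` on the same data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()
scaler.fit(X_train)                       # learns mean_ and scale_ from X_train only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuses the training mean and std

print(scaler.mean_, scaler.scale_)
print(X_test_scaled)

# fit_transform: fit and transform in one call on the same data.
X_train_scaled_2 = StandardScaler().fit_transform(X_train)
print(np.allclose(X_train_scaled, X_train_scaled_2))  # True
```

Never call `fit` or `fit_transform` on the test set. Doing so would compute new statistics from test data, which is the train-test contamination described in the data leakage section.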


![alt text](<Untitled picture.png>)