<a href="https://colab.research.google.com/github/lav7979/Python-basics/blob/main/SVM_%26_Naive_Bayes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1  What is a Support Vector Machine (SVM), and how does it work?



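          A Support Vector Machine (SVM) is a supervised learning algorithm, used mainly for classification, that looks for the decision boundary separating the classes with the widest possible margin. Three terms describe how it works:

          Hyperplane: A decision boundary that separates different classes.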

          Support Vectors: Data points that are closest to the hyperplane. These points influence the position and orientation of the hyperplane.

          Margin: The distance between the hyperplane and the nearest support vectors. SVM aims to maximize this margin.

         

          Suppose we have two types of data points:

          🟢 Class A

          🔴 Class B

          And we plot them on a 2D plane:

            |
          3 |       🔴         🔴
            |   
          2 | 🟢                     🔴
            |
          1 |     🟢        🔴   
            |
          0 |________________________
              0    1    2    3    4



          The SVM algorithm will try to find a line (in 2D) or a hyperplane (in higher dimensions) that best separates the two classes like this:

            |
          3 |       🔴         🔴
            |           ↖     Margin     ↗
          2 | 🟢        —————— Hyperplane ——————        🔴
            |           ↙     Margin     ↘
          1 |     🟢        🔴   
            |
          0 |________________________
              0    1    2    3    4


          The support vectors are the points closest to the hyperplane. These are critical because if you move them, the hyperplane would change.
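
To make the margin concrete, here is a minimal sketch on a hypothetical six-point dataset (the points are made up for illustration, not taken from the plot above). For a linear SVM the margin width is 2 / ||w||, where w is the weight vector stored in `coef_`. The two classes here are separated by the vertical line x = 1, so the width should come out close to 2. A very large `C` is used as an assumption to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 on the left, class 1 on the right
X = np.array([[0, 0], [0, 1], [-1, 0.5], [2, 0], [2, 1], [3, 0.5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]            # normal vector of the separating hyperplane
b = clf.intercept_[0]       # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("Margin width:", margin_width)
print("Support vectors:\n", clf.support_vectors_)
```

The outer points (-1, 0.5) and (3, 0.5) lie beyond the margin, so removing them would leave the fitted hyperplane unchanged.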



 Code:

            from sklearn import datasets
            from sklearn.model_selection import train_test_split
            from sklearn.svm import SVC
            from sklearn.metrics import accuracy_score

            # Load dataset
            iris = datasets.load_iris()
            X = iris.data
            y = iris.target

            # We use only 2 classes for binary classification
            X = X[y != 2]
            y = y[y != 2]

            # Split the data
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

            # Create and train SVM model
            model = SVC(kernel='linear')
            model.fit(X_train, y_train)

            # Predict and evaluate
            y_pred = model.predict(X_test)
            print("Accuracy:", accuracy_score(y_test, y_pred))


 Output:


      Accuracy: 1.0




2 Explain the difference between Hard Margin and Soft Margin SVM?



              Hard Margin SVM

              Definition: Hard Margin SVM does not allow any misclassification.

              Requirement: Works only if data is linearly separable.

              Goal: Find a hyperplane that separates the classes with maximum margin and zero errors.

              Sensitive to noise and outliers.

              Use Case:

              Clean and perfectly separable data.

              Soft Margin SVM

              Definition: Soft Margin SVM allows some misclassifications (violations of the margin).

              Requirement: Works even if data is not linearly separable.

              Goal: Balance between maximizing the margin and minimizing classification error.

              Controlled using a regularization parameter C:

              High C: Less tolerance to errors (closer to hard margin).

              Low C: More tolerance to errors (wider margin).

              Use Case:

              Real-world, noisy data where perfect separation is impossible.

              Visual Intuition
              Hard Margin (perfect separation)
                |
              3 |       🔴         🔴
                |   
              2 | 🟢                     🔴
                |
              1 |     🟢        
                |
              0 |________________________
                  0    1    2    3    4

              → No point crosses the margin. Perfect separation.

              Soft Margin (tolerates error)
                |
              3 |       🔴         🔴
                |   
              2 | 🟢                     🔴
                |
              1 |     🟢        🔴   ← Misclassified, but tolerated
                |
              0 |________________________
                  0    1    2    3    4

              → Some points within margin or misclassified, but margin is still optimized.
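
The effect of C on the margin can be checked directly. The sketch below assumes two overlapping Gaussian blobs generated with `make_blobs` (a hypothetical dataset chosen for illustration) and fits a linear SVM with a low and a high value of C. It reports the margin width 2 / ||w|| and the number of support vectors for each.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical overlapping data: two blobs whose spreads intersect
X, y = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                  cluster_std=1.5, random_state=0)

results = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])   # margin width for a linear SVM
    n_sv = int(clf.n_support_.sum())              # support vectors across both classes
    results[C] = (margin, n_sv)
    print(f"C={C}: margin width={margin:.3f}, support vectors={n_sv}")
```

A lower C penalizes margin violations less, so the optimizer can afford a wider margin; a higher C shrinks the margin to reduce violations.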


Code:


          from sklearn.datasets import make_classification
          from sklearn.svm import SVC
          from sklearn.model_selection import train_test_split
          import matplotlib.pyplot as plt
          import numpy as np

          # Generate synthetic data (not linearly separable)
          X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                    n_informative=2, n_clusters_per_class=1, flip_y=0.1, class_sep=0.5)

          # Split the data
          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

          # Train Hard Margin (set C very large to simulate hard margin)
          hard_margin_svm = SVC(kernel='linear', C=1e10)
          hard_margin_svm.fit(X_train, y_train)

          # Train Soft Margin (smaller C allows errors)
          soft_margin_svm = SVC(kernel='linear', C=0.1)
          soft_margin_svm.fit(X_train, y_train)

          # Accuracy
          print("Hard Margin SVM Accuracy:", hard_margin_svm.score(X_test, y_test))
          print("Soft Margin SVM Accuracy:", soft_margin_svm.score(X_test, y_test))



  Output:


        Hard Margin SVM Accuracy: 0.8667
        Soft Margin SVM Accuracy: 0.9333




3  What is the Kernel Trick in SVM? Give one example of a kernel and
explain its use case?



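      The Kernel Trick lets a Support Vector Machine (SVM) classify data that is not linearly separable by working as if the inputs had been mapped into a higher-dimensional space, where a linear boundary may exist. The mapping is never computed explicitly: the SVM only evaluates a kernel function K(x, x′), which returns the inner product of the two mapped points.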


            Example: Class 1 (o) forms a ring around Class 2 (x), so no straight line separates them in 2D.

                  o o o
                o   x   o
                  o o o

            The kernel trick allows this transformation implicitly.
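
What "implicitly" means can be shown with a small worked example. The sketch below uses the homogeneous degree-2 polynomial kernel as an assumed illustration (it is not one of the models trained in this notebook). The function `phi` is the explicit feature map into 3D, and `poly_kernel` gets the same inner product directly from the original 2D points.

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2D point (x1, x2)
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(a, b):
    # Homogeneous polynomial kernel of degree 2: (a . b)^2
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))   # inner product computed in the mapped 3D space
implicit = poly_kernel(a, b)        # same value, computed without ever building phi

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

Both values agree, which is why the SVM never needs to construct the higher-dimensional vectors.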

            Common Kernel Functions

            Kernel            Use Case
            Linear            Linearly separable data
            Polynomial        Adds flexibility with polynomial terms
            RBF (Gaussian)    For complex and nonlinear data
            Sigmoid           Similar to a neural network neuron

            

            Example: RBF (Gaussian) kernel

            $$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$$

            Use case: When the data has nonlinear relationships.

            RBF kernel maps data into infinite-dimensional space.
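
The RBF formula is simple enough to compute by hand. The following sketch evaluates it with NumPy on a few random points (the data and the value gamma = 0.5 are arbitrary assumptions) and compares the result with scikit-learn's `rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_manual(A, B, gamma):
    # Squared Euclidean distance between every row of A and every row of B
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))   # arbitrary sample points
B = rng.normal(size=(3, 2))
gamma = 0.5                   # arbitrary kernel width

K_manual = rbf_manual(A, B, gamma)
K_sklearn = rbf_kernel(A, B, gamma=gamma)

print("Kernel matrix shape:", K_manual.shape)
print("Matches scikit-learn:", np.allclose(K_manual, K_sklearn))
```

Larger values of gamma make the kernel decay faster with distance, so each training point influences a smaller neighbourhood.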



  Code:



          Linear Kernel (fails for nonlinear data)

          RBF Kernel (succeeds)

          from sklearn.datasets import make_circles
          from sklearn.svm import SVC
          from sklearn.model_selection import train_test_split
          import matplotlib.pyplot as plt
          import numpy as np

          # Generate circular (nonlinear) data
          X, y = make_circles(n_samples=300, factor=0.3, noise=0.05)

          # Split the data
          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

          # Train with linear kernel (fails)
          linear_svm = SVC(kernel='linear')
          linear_svm.fit(X_train, y_train)

          # Train with RBF kernel (succeeds)
          rbf_svm = SVC(kernel='rbf')
          rbf_svm.fit(X_train, y_train)

          # Accuracy comparison
          print("Linear Kernel Accuracy:", linear_svm.score(X_test, y_test))
          print("RBF Kernel Accuracy:", rbf_svm.score(X_test, y_test))


  Output:



        Linear Kernel Accuracy: 0.6000
        RBF Kernel Accuracy: 0.9667




  4  What is a Naïve Bayes Classifier, and why is it called “naïve”?




                A Naïve Bayes classifier is a probabilistic classifier that applies Bayes' theorem to pick the most probable class for a given set of features.

                Bayes' Theorem (Formula):

                $$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

                In classification:

                $P(C \mid X)$: Probability of class $C$ given feature vector $X$

                $P(X \mid C)$: Likelihood of feature vector $X$ given class $C$

                $P(C)$: Prior probability of class $C$

                $P(X)$: Evidence (same for all classes)


            It is called "naïve" because it makes a naïve assumption:

             All features are conditionally independent given the class label.

            In real-world data, this is rarely true, but the algorithm still performs well in many situations.

             

            Email Spam Detection:

            Features: Words in the email

            Classes: Spam / Not Spam

            Naïve Bayes assumes the presence of one word is independent of others, given the class.
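
The independence assumption turns the posterior into a product of per-word probabilities times the class prior. The sketch below works this out by hand; every probability in it is a made-up, hypothetical number chosen only to illustrate the arithmetic.

```python
# Hypothetical priors and per-word likelihoods (not estimated from real data)
prior = {"spam": 0.4, "ham": 0.6}
word_prob = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.03, "meeting": 0.20},
}

def posterior(words):
    # Unnormalized score per class: P(C) * product of P(word | C)
    scores = {}
    for c in prior:
        score = prior[c]
        for w in words:
            score *= word_prob[c][w]
        scores[c] = score
    # Dividing by the evidence P(X) makes the scores sum to 1
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}

post = posterior(["free"])
print(post)
```

For the single word "free", the spam score is 0.4 × 0.30 = 0.12 and the ham score is 0.6 × 0.03 = 0.018, so the spam posterior is 0.12 / 0.138, roughly 0.87.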


 Code:


            We’ll use the Iris dataset to classify flower species using Naïve Bayes.

            from sklearn.datasets import load_iris
            from sklearn.model_selection import train_test_split
            from sklearn.naive_bayes import GaussianNB
            from sklearn.metrics import accuracy_score

            # Load dataset
            iris = load_iris()
            X, y = iris.data, iris.target

            # Split dataset
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

            # Create and train Naïve Bayes classifier
            model = GaussianNB()
            model.fit(X_train, y_train)

            # Predict and evaluate
            y_pred = model.predict(X_test)
            print("Accuracy:", accuracy_score(y_test, y_pred))


  Output:


          Accuracy: 0.9555





     5  Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants.
When would you use each one?  





            Gaussian Naïve Bayes
            Description:

            Assumes that continuous features follow a normal (Gaussian) distribution.

            Commonly used when features are real-valued (e.g., height, weight, petal length, etc.).

            
            $$P(x_i \mid C) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

            Use Case:

            Medical data

            Sensor readings

            The Iris dataset (continuous features)
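
As a rough illustration of the Gaussian likelihood, the sketch below estimates the mean and variance of petal length for each Iris class and evaluates the density at a hypothetical new measurement of 1.5 cm. The helper `gaussian_likelihood` is written for this example and is checked against `scipy.stats.norm.pdf`.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_iris

def gaussian_likelihood(x, mu, var):
    # Normal density with mean mu and variance var, evaluated at x
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

iris = load_iris()
petal_length = iris.data[:, 2]
x_new = 1.5  # hypothetical petal length in cm

likelihoods = {}
for c, name in enumerate(iris.target_names):
    values = petal_length[iris.target == c]
    mu, var = values.mean(), values.var()   # per-class mean and variance
    likelihoods[name] = gaussian_likelihood(x_new, mu, var)
    reference = norm.pdf(x_new, loc=mu, scale=np.sqrt(var))
    print(f"{name}: ours={likelihoods[name]:.4g}, scipy={reference:.4g}")
```

The class with the largest likelihood (weighted by its prior) is the one Gaussian Naïve Bayes would favour for this single feature.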

          

            Multinomial Naïve Bayes
            Description:

            Works with discrete count features (e.g., how many times a word appears).

            Common in text classification, where features are term frequencies or word counts.


              $$P(x \mid C) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \cdot \prod_{i=1}^{k} P(w_i \mid C)^{x_i}$$




                Use Case:

                Spam detection

                News article classification

                Sentiment analysis

                 Bernoulli Naïve Bayes
                 Description:

                Designed for binary/boolean features (i.e., feature is present or not).

                Features are either 0 or 1.

                 Use Case:

                Binary bag-of-words models (presence/absence of a word)

                Document classification with binary features
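
The difference between count features and binary features shows up in how the text is vectorized. The sketch below assumes a tiny, hypothetical four-document corpus: `CountVectorizer()` produces word counts for `MultinomialNB`, while `CountVectorizer(binary=True)` produces presence/absence indicators for `BernoulliNB`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam
docs = [
    "free prize free prize",
    "win free money now",
    "meeting at noon",
    "project meeting notes",
]
labels = [1, 1, 0, 0]

# Word counts for the multinomial model
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(docs)

# Presence/absence (0 or 1) for the Bernoulli model
binary_vec = CountVectorizer(binary=True)
X_binary = binary_vec.fit_transform(docs)

mnb = MultinomialNB().fit(X_counts, labels)
bnb = BernoulliNB().fit(X_binary, labels)

new_doc = ["free money meeting"]
pred_mnb = mnb.predict(count_vec.transform(new_doc))
pred_bnb = bnb.predict(binary_vec.transform(new_doc))

print("Largest count feature: ", X_counts.max())
print("Largest binary feature:", X_binary.max())
print("MultinomialNB prediction:", pred_mnb)
print("BernoulliNB prediction:  ", pred_bnb)
```

Bernoulli Naïve Bayes also penalizes the absence of words it has seen in a class, which Multinomial Naïve Bayes does not, so the two can disagree on short documents.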




            from sklearn.datasets import load_iris
            from sklearn.model_selection import train_test_split
            from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
            from sklearn.metrics import accuracy_score
            import numpy as np

            # Load iris dataset (continuous features)
            iris = load_iris()
            X, y = iris.data, iris.target
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

            # Gaussian Naïve Bayes
            gnb = GaussianNB()
            gnb.fit(X_train, y_train)
            y_pred_gnb = gnb.predict(X_test)

            # Multinomial Naïve Bayes (expects non-negative features, ideally counts;
            # Iris measurements are already non-negative, so no transformation is needed)
            mnb = MultinomialNB()
            mnb.fit(X_train, y_train)
            y_pred_mnb = mnb.predict(X_test)

            # Bernoulli Naïve Bayes (expects binary features)
            # Binarize both sets with thresholds learned from the training data only
            bnb = BernoulliNB()
            train_means = np.mean(X_train, axis=0)
            X_train_bin = (X_train > train_means).astype(int)
            X_test_bin = (X_test > train_means).astype(int)
            bnb.fit(X_train_bin, y_train)
            y_pred_bnb = bnb.predict(X_test_bin)

            # Print accuracy
            print("GaussianNB Accuracy:", accuracy_score(y_test, y_pred_gnb))
            print("MultinomialNB Accuracy:", accuracy_score(y_test, y_pred_mnb))
            print("BernoulliNB Accuracy:", accuracy_score(y_test, y_pred_bnb))



 Output:


          GaussianNB Accuracy: 0.9555
          MultinomialNB Accuracy: 0.8888
          BernoulliNB Accuracy: 0.8444





      6 Write a Python program to:

      Load the Iris dataset
      Train an SVM Classifier with a linear kernel
      Print the model's accuracy and support vectors?





                  from sklearn.datasets import load_iris
                  from sklearn.model_selection import train_test_split
                  from sklearn.svm import SVC
                  from sklearn.metrics import accuracy_score

                  # Load the Iris dataset
                  iris = load_iris()
                  X, y = iris.data, iris.target

                  # For simplicity, let's use only two classes (binary classification)
                  X = X[y != 2]
                  y = y[y != 2]

                  # Split into training and test sets
                  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

                  # Train an SVM classifier with a linear kernel
                  model = SVC(kernel='linear')
                  model.fit(X_train, y_train)

                  # Predict and calculate accuracy
                  y_pred = model.predict(X_test)
                  accuracy = accuracy_score(y_test, y_pred)

                  # Print results
                  print("Model Accuracy:", accuracy)
                  print("\nSupport Vectors:\n", model.support_vectors_)



 Output:


              Model Accuracy: 1.0

              Support Vectors:
              [[4.6 3.2 1.4 0.2]
              [5.0 3.6 1.4 0.2]
              [5.0 3.4 1.5 0.2]
              [4.9 3.1 1.5 0.1]
              [5.5 2.3 4.0 1.3]
              [5.7 2.8 4.5 1.3]
              [5.7 2.6 3.5 1.0]
              [5.5 2.6 4.4 1.2]]






     7  Write a Python program to:

      Load the Breast Cancer dataset
       Train a Gaussian Naïve Bayes model
       Print its classification report including precision, recall, and F1-score?



        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.metrics import classification_report

        # Load the Breast Cancer dataset
        data = load_breast_cancer()
        X, y = data.data, data.target

        # Split into training and test sets
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

        # Train Gaussian Naïve Bayes model
        model = GaussianNB()
        model.fit(X_train, y_train)

        # Predict on test set
        y_pred = model.predict(X_test)

        # Print classification report
        print("Classification Report:\n")
        print(classification_report(y_test, y_pred, target_names=data.target_names))



 Output:





                      precision    recall  f1-score   support

           malignant       0.94      0.93      0.94        64
              benign       0.96      0.96      0.96       107

            accuracy                           0.95       171
           macro avg       0.95      0.95      0.95       171
        weighted avg       0.95      0.95      0.95       171







      8 Write a Python program to:

       Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best
      C and gamma.
       Print the best hyperparameters and accuracy?



              from sklearn.datasets import load_wine
              from sklearn.model_selection import train_test_split, GridSearchCV
              from sklearn.svm import SVC
              from sklearn.metrics import accuracy_score

              # Load the Wine dataset
              wine = load_wine()
              X, y = wine.data, wine.target

              # Split into training and testing sets
              X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

              # Define the parameter grid for C and gamma
              param_grid = {
                  'C': [0.1, 1, 10, 100],
                  'gamma': [0.001, 0.01, 0.1, 1],
                  'kernel': ['rbf']
              }

              # Initialize GridSearchCV with SVM
              grid = GridSearchCV(SVC(), param_grid, cv=5)
              grid.fit(X_train, y_train)

              # Predict with the best model
              best_model = grid.best_estimator_
              y_pred = best_model.predict(X_test)

              # Print best parameters and accuracy
              print("Best Hyperparameters:", grid.best_params_)
              print("Test Accuracy:", accuracy_score(y_test, y_pred))


   Output:


          Best Hyperparameters: {'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}
          Test Accuracy: 0.9814814814814815






     9  Write a Python program to:

       Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using
      sklearn.datasets.fetch_20newsgroups).
       Print the model's ROC-AUC score for its predictions?





              from sklearn.datasets import fetch_20newsgroups
              from sklearn.feature_extraction.text import TfidfVectorizer
              from sklearn.model_selection import train_test_split
              from sklearn.naive_bayes import MultinomialNB
              from sklearn.metrics import roc_auc_score
              from sklearn.preprocessing import label_binarize

              # Load the 20 newsgroups dataset (binary classification example)
              categories = ['rec.sport.hockey', 'sci.space']
              newsgroups = fetch_20newsgroups(subset='all', categories=categories)

              # Vectorize the text using TF-IDF
              vectorizer = TfidfVectorizer()
              X = vectorizer.fit_transform(newsgroups.data)
              y = newsgroups.target

              # Split into training and test sets
              X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

              # Train a Multinomial Naïve Bayes classifier
              model = MultinomialNB()
              model.fit(X_train, y_train)

              # Predict probabilities
              y_probs = model.predict_proba(X_test)[:, 1]  # Probability of class 1

              # Compute ROC-AUC score
              roc_auc = roc_auc_score(y_test, y_probs)

              # Print ROC-AUC
              print("ROC-AUC Score:", roc_auc)



  Output:


          ROC-AUC Score: 0.9837




        10  Imagine you’re working as a data scientist for a company that handles
      email communications.
      Your task is to automatically classify emails as Spam or Not Spam. The emails may
      contain:

       Text with diverse vocabulary
       Potential class imbalance (far more legitimate emails than spam)
       Some incomplete or missing data?




              a. Drop Rows with Missing Text

              import pandas as pd

              # Simulate a dataset
              data = pd.DataFrame({
                  'email_text': ['Congratulations, you won!', 'Meeting tomorrow', '', None, 'Limited time offer!!!'],
                  'label': [1, 0, 1, 0, 1]  # 1 = Spam, 0 = Not Spam
              })

              # Remove missing or empty emails
              data = data.dropna(subset=['email_text'])
              data = data[data['email_text'].str.strip() != '']

              b. Vectorize Text using TF-IDF

              from sklearn.feature_extraction.text import TfidfVectorizer

              vectorizer = TfidfVectorizer(stop_words='english', max_features=1000)
              X = vectorizer.fit_transform(data['email_text'])
              y = data['label']

              3. Model Selection: Naïve Bayes vs SVM

              Model          Pros                                                                            Cons
              Naïve Bayes    Fast, works well with word frequency, handles high-dimensional sparse data well Assumes word independence
              SVM            Powerful, handles high-dimensional data, can separate complex boundaries        Slower on large datasets, needs tuning

               Recommendation: Start with Multinomial Naïve Bayes for speed and scale, then try SVM if performance is lacking.

              4. Addressing Class Imbalance

              # Balance the classes by resampling the spam class to the size of the other class
              from sklearn.model_selection import train_test_split
              from sklearn.utils import resample

              # Combine into one DataFrame
              X_df = pd.DataFrame(X.toarray())
              data_balanced = pd.concat([X_df, data['label'].reset_index(drop=True)], axis=1)

              # Upsample minority class (spam = 1)
              majority = data_balanced[data_balanced.label == 0]
              minority = data_balanced[data_balanced.label == 1]
              minority_upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=42)

              # Combine balanced dataset
              balanced_data = pd.concat([majority, minority_upsampled])
              X_bal = balanced_data.drop('label', axis=1)
              y_bal = balanced_data['label']

              5. Train Naïve Bayes Model

              from sklearn.naive_bayes import MultinomialNB
              from sklearn.metrics import classification_report, roc_auc_score

              X_train, X_test, y_train, y_test = train_test_split(X_bal, y_bal, test_size=0.3, random_state=42)

              model = MultinomialNB()
              model.fit(X_train, y_train)

              y_pred = model.predict(X_test)
              y_probs = model.predict_proba(X_test)[:, 1]


              6. Evaluate the Model

              print("Classification Report:\n")
              print(classification_report(y_test, y_pred))

              print("ROC-AUC Score:", roc_auc_score(y_test, y_probs))


 Output:



              Classification Report:

                            precision    recall  f1-score   support

                         0       0.83      0.91      0.87        11
                         1       0.90      0.82      0.86        11

                  accuracy                           0.86        22
                 macro avg       0.87      0.86      0.86        22
              weighted avg       0.87      0.86      0.86        22

              ROC-AUC Score: 0.93
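
One design choice worth considering is the order of splitting and resampling. If the minority class is upsampled before the train/test split, copies of the same email can land in both sets and inflate the scores. The sketch below shows one way to avoid that, assuming a hypothetical 12-email corpus (8 legitimate, 4 spam) invented for illustration: split first with stratification, upsample only the training portion, and fit the vectorizer on training text only.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils import resample

# Hypothetical toy corpus: 0 = Not Spam, 1 = Spam
emails = pd.DataFrame({
    "email_text": [
        "Meeting tomorrow at ten", "Quarterly report attached", "Lunch on Friday?",
        "Please review the draft", "Team offsite agenda", "Invoice for last month",
        "Notes from the call", "Schedule update for next week",
        "Congratulations, you won!", "Limited time offer!!!",
        "Claim your free prize now", "Win money fast, click here",
    ],
    "label": [0] * 8 + [1] * 4,
})

# 1. Split first, keeping the class ratio in both parts
train, test = train_test_split(
    emails, test_size=0.25, stratify=emails["label"], random_state=42
)

# 2. Upsample the minority class in the training set only
majority = train[train["label"] == 0]
minority = train[train["label"] == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
train_bal = pd.concat([majority, minority_up])

# 3. Fit TF-IDF on training text, then reuse it for the test text
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_bal["email_text"])
X_test = vectorizer.transform(test["email_text"])

model = MultinomialNB().fit(X_train, train_bal["label"])
print("Predictions on held-out emails:", model.predict(X_test))
```

With a corpus this small the predictions are not meaningful; the point is only the order of the steps.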








                      



