# Support Vector Machines (SVMs)
Support Vector Machines (SVMs) are supervised learning algorithms used for classification and regression. In classification, the goal is to find a decision boundary that separates data points into different classes. SVMs do this by finding the maximum margin hyperplane, which is the hyperplane that maximizes the distance between the closest data points of each class.
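
The maximum-margin idea can be checked numerically. The sketch below is a minimal illustration, assuming scikit-learn is installed; the four toy points and the very large `C` (used to approximate a hard margin) are hypothetical choices, not part of the original example.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two points per class, separable by the vertical line x = 1
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard-margin SVM (an assumption for illustration)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                          # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)    # distance between the two margin lines

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:\n", clf.support_vectors_)
```

The margin width comes out close to 2, the gap between the two columns of points, and the support vectors are the training points that lie on the margin.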



**Here are some real-world examples of how SVMs are used**:

1. Image recognition: SVMs are used to classify images into different categories, such as cats, dogs, and cars.
2. Text classification: SVMs are used to classify text documents, such as spam emails, news articles, and social media posts.
3. Fraud detection: SVMs are used to detect fraudulent transactions, such as credit card fraud and insurance fraud.

The effectiveness of an SVM for **classification and regression** can be significantly influenced by the choice of **kernel function**, which defines how the similarity between two data points is measured. Here's a breakdown of various SVM kernels along with their strengths and weaknesses:

**1. Linear Kernel:**

Description: Uses the plain dot product of the feature vectors, so the decision boundary is a hyperplane in the original feature space.
Strengths: Good for linearly separable data, interpretable decision boundaries, computationally efficient.
Weaknesses: Not suited for complex non-linear relationships, potentially underfits data with multiple overlapping classes.
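
As a small sketch of what the linear kernel computes, the example below builds the Gram matrix of pairwise dot products with scikit-learn's `linear_kernel` and compares it with a direct matrix product. The three sample points are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # hypothetical samples

K = linear_kernel(X, X)   # Gram matrix: K[i, j] is the dot product of X[i] and X[j]

print(K)
print("matches X @ X.T:", np.allclose(K, X @ X.T))
```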

**2. Radial Basis Function (RBF) Kernel:**


Description: Maps data to a higher-dimensional space via the radial basis function, creating smoother decision boundaries for non-linear relationships.
Strengths: Highly flexible for handling non-linear data, a common default choice when little is known about the shape of the decision boundary.
Weaknesses: Requires tuning the 'gamma' parameter, less interpretable compared to linear kernel, higher computational cost.
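
To make the role of 'gamma' concrete, the sketch below evaluates the RBF kernel, exp(-gamma * ||x - y||^2), for one pair of points at several gamma values and checks the result against scikit-learn's `rbf_kernel`. The points and gamma values are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
y = np.array([[1.0, 1.0]])  # squared Euclidean distance to x is 2

similarity = {}
for gamma in (0.1, 1.0, 10.0):
    by_hand = np.exp(-gamma * np.sum((x - y) ** 2))
    similarity[gamma] = rbf_kernel(x, y, gamma=gamma)[0, 0]
    assert np.isclose(by_hand, similarity[gamma])
    print(f"gamma={gamma}: k(x, y) = {similarity[gamma]:.6f}")
```

Larger gamma values make the similarity fall off faster with distance, so each training point influences a smaller neighbourhood and the decision boundary can become more wiggly.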

**3. Polynomial Kernel:**


Description: Raises the dot product of feature vectors to a specified degree, creating increasingly complex decision boundaries with higher polynomial degrees.
Strengths: Can capture feature interactions and curved decision boundaries; low degrees such as 2 or 3 are often sufficient.
Weaknesses: Requires careful tuning of the 'degree' and potentially the 'coefficient' parameter, prone to overfitting with high degrees, interpretability decreases with higher degrees.
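
The sketch below evaluates the polynomial kernel in the form scikit-learn uses, (gamma * <x, y> + coef0)^degree, for a few degrees. The two points and the choices gamma=1 and coef0=1 are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[3.0, 4.0]])  # dot product with x is 11

value = {}
for degree in (1, 2, 3):
    by_hand = (1.0 * (x @ y.T)[0, 0] + 1.0) ** degree
    value[degree] = polynomial_kernel(x, y, degree=degree, gamma=1.0, coef0=1.0)[0, 0]
    assert np.isclose(by_hand, value[degree])
    print(f"degree={degree}: k(x, y) = {value[degree]:.1f}")
```

The kernel value grows quickly with the degree, which is one reason high-degree polynomial kernels tend to be numerically awkward and prone to overfitting.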

**4. Sigmoid Kernel:**


Description: Applies the tanh function to a scaled dot product of the feature vectors, tanh(gamma * <x, y> + coef0), which resembles the activation of a neural network unit.
Strengths: Introduces non-linearity using only dot products, and offers a link between SVMs and simple neural networks.
Weaknesses: Sensitive to parameter tuning, less commonly used than RBF or polynomial kernels due to lower performance compared to RBF on most tasks.
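
One concrete reason for the parameter sensitivity is that the sigmoid kernel is not a valid (positive semi-definite) kernel for every parameter setting. The sketch below shows this for a hypothetical pair of one-dimensional points with coef0=-1; the values are chosen only to make the effect visible.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[0.0], [1.0]])  # hypothetical one-dimensional samples

# Sigmoid kernel: tanh(gamma * <x, y> + coef0)
K = sigmoid_kernel(X, X, gamma=1.0, coef0=-1.0)
eigenvalues = np.linalg.eigvalsh(K)  # K is symmetric, so eigvalsh applies

print(K)
print("eigenvalues:", eigenvalues)
print("positive semi-definite:", bool(eigenvalues.min() >= 0))
```

A negative eigenvalue means the Gram matrix does not correspond to an inner product in any feature space, so the usual SVM guarantees do not apply for these settings.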

**5. Laplace RBF Kernel:**


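Description: Similar to the RBF kernel but based on the L1 (Manhattan) distance instead of the squared Euclidean distance: exp(-gamma * ||x - y||_1). The similarity drops sharply near zero distance and decays more slowly for distant points.
Strengths: Often described as less sensitive to the width parameter than the RBF kernel; available in scikit-learn as sklearn.metrics.pairwise.laplacian_kernel for use as a custom kernel.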
Weaknesses: Requires tuning the 'gamma' parameter like the RBF kernel, decision boundaries might be less smooth compared to regular RBF.
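
The difference from the RBF kernel can be sketched by comparing both kernels at a few distances. The example assumes scikit-learn's `laplacian_kernel` and `rbf_kernel`; the gamma value and distances are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import laplacian_kernel, rbf_kernel

x = np.array([[0.0, 0.0]])
gamma = 0.5

laplace, rbf = {}, {}
for distance in (0.5, 1.0, 3.0):
    y = np.array([[distance, 0.0]])
    laplace[distance] = laplacian_kernel(x, y, gamma=gamma)[0, 0]  # exp(-gamma * L1 distance)
    rbf[distance] = rbf_kernel(x, y, gamma=gamma)[0, 0]            # exp(-gamma * squared L2 distance)
    print(f"distance={distance}: laplacian={laplace[distance]:.4f}, rbf={rbf[distance]:.4f}")
```

Below a distance of 1 the Laplacian value is smaller than the RBF value, and above 1 it is larger, which reflects its sharper peak and heavier tail.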

**6. ANOVA RBF Kernel:**


Description: Sums a one-dimensional RBF-style term over each feature and raises each term to a power d; one common form is k(x, y) = sum_k exp(-sigma * (x_k - y_k)^2)^d. It is not built into scikit-learn's SVC, so it has to be supplied as a custom kernel.
Strengths: Reported to work well on multidimensional regression problems, models each feature's contribution separately.
Weaknesses: Requires careful feature selection and parameter tuning, potentially computationally expensive.
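
Since scikit-learn has no built-in ANOVA kernel, one way to try it is to pass a callable to `SVC`. The sketch below assumes the form k(x, y) = sum_k exp(-sigma * (x_k - y_k)^2)^d; the function name `anova_kernel`, its default parameters, and the `make_moons` data are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

def anova_kernel(X, Y, sigma=1.0, d=2):
    """Assumed ANOVA RBF form: sum over features of exp(-sigma * (x_k - y_k)^2)^d."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    diff = X[:, None, :] - Y[None, :, :]               # shape (n_X, n_Y, n_features)
    return np.sum(np.exp(-sigma * diff ** 2) ** d, axis=2)

# Illustrative two-class data, not the data used elsewhere in this notebook
X, y = make_moons(n_samples=100, noise=0.2, random_state=0)

model = SVC(kernel=anova_kernel).fit(X, y)             # SVC accepts a callable kernel
train_accuracy = model.score(X, y)
print("training accuracy:", train_accuracy)
```

The callable receives two data matrices and must return a Gram matrix of shape (n_X, n_Y). Computing it in Python for every pair of samples is one source of the extra computational cost.
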

# Choosing the right kernel

**The optimal kernel for your SVM depends on various factors like:**

1. Data dimensionality and complexity: Linear kernels work well for linearly separable data, while RBF and polynomial kernels handle non-linearity effectively.
2. Computational efficiency: Linear kernels are usually the fastest to train; RBF, polynomial, sigmoid, and custom kernels such as ANOVA cost more because they evaluate the kernel between pairs of samples.
3. Interpretability: Linear kernels offer readily interpretable decision boundaries, while non-linear kernels can be less transparent.

**Experimentation is key:**

Try different kernels with parameter tuning to find the best fit for your specific data and task. There's no one-size-fits-all solution, and the best kernel depends on your unique data and goals.
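
One way to organise this experimentation is a cross-validated grid search over kernels and their parameters. The sketch below is a minimal example, assuming scikit-learn's `GridSearchCV`; the `make_moons` data and the parameter grid are illustrative assumptions, not recommended values.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative non-linear data
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Scale features first, then fit the SVM; "svc__" addresses the SVC step
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": [0.1, 1, 10]},
    {"svc__kernel": ["poly"], "svc__C": [0.1, 1, 10], "svc__degree": [2, 3]},
]

search = GridSearchCV(pipeline, param_grid, cv=5)  # 5-fold cross-validation
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```

Using a list of dictionaries keeps kernel-specific parameters (such as `gamma` or `degree`) from being searched for kernels that ignore them.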



=============================================================

# Periodic and cyclical relationships
**Periodic and cyclical relationships** refer to patterns in data that repeat over time or space at regular intervals. These patterns can be expressed in various ways:

**Periodic:**

**Occurs at fixed intervals**. Think of a sine wave, where the same peak and trough repeat consistently throughout the cycle.

Examples: Day and night cycle, lunar phases, seasonal variations in temperature.

**Cyclical:**

**Similar to periodic, but without perfectly fixed intervals.** The pattern tends to repeat, but the time or distance between repetitions can vary.

Examples: Stock market fluctuations, economic cycles, heart rate.

Here are some specific characteristics of periodic and cyclical relationships:

**Repetition:** The key feature is the repeating pattern. Data points follow a similar trend as they move through the cycle.

**Predictability:** Based on the established pattern, future values can be predicted with some degree of accuracy.

**Frequency:** The rate at which the pattern repeats can be measured in various units, such as cycles per second, per year, or per rotation.

**Amplitude:** The magnitude of the variations within the cycle can be quantified and analyzed.

**Applications of understanding periodic and cyclical relationships:**

**Forecasting:** Identifying and analyzing these patterns allows us to predict future trends in various fields, from weather forecasting to financial market analysis.

**Signal processing:** Filtering out noise and extracting relevant information from signals often involves understanding their underlying periodic or cyclical components.

**Data compression:** Exploiting the repetitive nature of these patterns can lead to efficient data compression techniques.

**Pattern recognition:** Many machine learning algorithms rely on identifying and learning from recurring patterns in data, and understanding periodic and cyclical relationships is crucial for their effectiveness.
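
The frequency and amplitude of a periodic signal, described above, can be estimated with a discrete Fourier transform. The sketch below is a minimal example using NumPy; the sampling rate and the 5 Hz, amplitude-2 sine wave are assumptions chosen so the result is easy to verify.

```python
import numpy as np

sampling_rate = 100                              # samples per second (assumed)
t = np.arange(sampling_rate) / sampling_rate     # one second of sample times
signal = 2.0 * np.sin(2 * np.pi * 5 * t)         # amplitude 2, frequency 5 Hz

spectrum = np.fft.rfft(signal)                   # spectrum of a real-valued signal
freqs = np.fft.rfftfreq(len(signal), d=1 / sampling_rate)

peak = np.argmax(np.abs(spectrum))               # index of the strongest component
frequency = freqs[peak]
amplitude = 2 * np.abs(spectrum[peak]) / len(signal)

print("dominant frequency (Hz):", frequency)
print("estimated amplitude:", amplitude)
```

The estimate is exact here because the signal completes a whole number of cycles within the sampled window; real cyclical data with drifting periods would give a broader, less precise peak.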

In [8]:
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Create a dictionary of fruit data
fruit_data = {
    "type": ["apple", "apple", "orange", "orange", "banana", "banana", "pineapple", "pineapple"],
    "weight": [150, 180, 120, 140, 100, 120, 200, 250],
    "diameter": [7, 8, 7.5, 8, 5, 6, 10, 12],
    "sweetness": [3, 4, 2, 3, 5, 4, 1, 2],
    "acidity": [1, 2, 3, 4, 1, 2, 0.5, 1],
}

# Create a Pandas DataFrame from the dictionary
fruits_df = pd.DataFrame(fruit_data)

# Separate features and target labels
features = fruits_df[["weight", "diameter", "sweetness", "acidity"]]
labels = fruits_df["type"]








In [9]:
fruits_df.head()

     type  weight  diameter  sweetness  acidity
0   apple     150       7.0          3      1.0
1   apple     180       8.0          4      2.0
2  orange     120       7.5          2      3.0
3  orange     140       8.0          3      4.0
4  banana     100       5.0          5      1.0


In [10]:
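# Split data into training and testing sets.
# No random_state is set, so the split (and the accuracy below) changes between runs;
# with 8 samples and test_size=0.2, only 2 samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2)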

# Train an SVM model
model = SVC(kernel="linear", C=1.0)
model.fit(X_train, y_train)
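
The fruit features are on very different scales (weight in the hundreds, acidity near 1), and SVMs are sensitive to feature scale. The sketch below shows one common remedy, standardising the features inside a pipeline. It repeats the fruit data so that it runs on its own; the names `fruits` and `scaled_model` are hypothetical, and this is a suggestion rather than part of the original workflow.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

fruits = pd.DataFrame({
    "type": ["apple", "apple", "orange", "orange", "banana", "banana", "pineapple", "pineapple"],
    "weight": [150, 180, 120, 140, 100, 120, 200, 250],
    "diameter": [7, 8, 7.5, 8, 5, 6, 10, 12],
    "sweetness": [3, 4, 2, 3, 5, 4, 1, 2],
    "acidity": [1, 2, 3, 4, 1, 2, 0.5, 1],
})
X = fruits[["weight", "diameter", "sweetness", "acidity"]]
y = fruits["type"]

# StandardScaler rescales each feature to zero mean and unit variance before the SVM sees it
scaled_model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scaled_model.fit(X, y)

new_fruit = pd.DataFrame([[145, 7, 3, 2]], columns=X.columns)
prediction = scaled_model.predict(new_fruit)
print(prediction)
```

Putting the scaler in the pipeline means the same scaling learned from the training data is applied automatically at prediction time.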



In [11]:
# Predict fruit types for test data
y_pred = model.predict(X_test)

# Evaluate model accuracy
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

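# Predict the type of a new fruit (a one-row DataFrame with the training column names
# avoids scikit-learn's feature-name warning)
new_fruit = pd.DataFrame([[145, 7, 3, 2]], columns=features.columns)
predicted_type = model.predict(new_fruit)
print(f"Predicted type for new fruit: {predicted_type}")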

Accuracy: 0.5000
Predicted type for new fruit: ['apple']




In [13]:
# Define kernel options
kernels = ["linear", "rbf", "poly"]

# Loop through kernels and create models
models = []
for kernel in kernels:
    if kernel == "rbf":
        model = SVC(kernel=kernel, gamma=0.1, C=1.0)  # Tune gamma for RBF
    elif kernel == "poly":
        model = SVC(kernel=kernel, degree=2, C=1.0)  # Tune degree for Polynomial
    else:
        model = SVC(kernel=kernel, C=1.0)  # Use default parameters for linear
    models.append(model)

# Fit each model on the same training data and classify the same new fruit
new_fruit = pd.DataFrame([[145, 7, 3, 2]], columns=features.columns)
for model in models:
    model.fit(X_train, y_train)
    predicted_type = model.predict(new_fruit)
    print(f"Predicted type for new fruit: {predicted_type}")


# Evaluate and compare models (e.g., accuracy, ROC AUC)
# Select the best-performing model or analyze differences between results

Predicted type for new fruit: ['apple']
Predicted type for new fruit: ['apple']
Predicted type for new fruit: ['banana']
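
A single prediction says little about which kernel is better. The evaluation step left as a comment above could be sketched with cross-validation, as below. Because the fruit table has only two samples per class, this sketch uses scikit-learn's built-in iris dataset as a stand-in; that substitution and the `mean_accuracy` name are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in dataset, not the fruit data

candidates = {
    "linear": SVC(kernel="linear", C=1.0),
    "rbf": SVC(kernel="rbf", gamma=0.1, C=1.0),
    "poly": SVC(kernel="poly", degree=2, C=1.0),
}

mean_accuracy = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # stratified 5-fold for classifiers
    mean_accuracy[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Averaging over several folds gives each kernel the same set of evaluation samples and reduces the effect of one lucky or unlucky split.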




In [2]:
import pandas as pd
import numpy as np

np.random.seed(1)

def generate_data(n_samples):
    x1 = np.random.uniform(0, 2, size=n_samples)
    x2 = np.sin(5 * np.pi * x1) + np.random.rand(n_samples)
    y = np.where(x2 > 1, "green", "blue")
    return pd.DataFrame({"x1": x1, "x2": x2, "label": y})

df = generate_data(100)
print(df.head())


         x1        x2 label
0  0.834044  0.836281  blue
1  1.440649 -0.068944  blue
2  0.000229  0.889535  blue
3  0.604665  0.284055  blue
4  0.293512 -0.086276  blue


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   x1      100 non-null    float64
 1   x2      100 non-null    float64
 2   label   100 non-null    object 
dtypes: float64(2), object(1)
memory usage: 2.5+ KB


The code below splits the data, trains a model for each kernel, calculates test accuracy, and prints the results for each kernel. You can further analyze the results by visualizing decision boundaries and adjusting hyperparameters.

In [6]:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

kernels = ["linear", "rbf", "poly", "sigmoid"]
models = []
accuracies = []

# Split once so every kernel is trained and tested on the same samples
# (no random_state is set, so the accuracies change between runs)
X_train, X_test, y_train, y_test = train_test_split(df[["x1", "x2"]], df["label"], test_size=0.2)

for kernel in kernels:

    # Create and train an SVM model with specific kernel
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    models.append(model)

    # Evaluate accuracy on test data
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)

    # Print information for each kernel
    print(f"\nKernel: {kernel}")
    print(f"Accuracy: {accuracy:.4f}")





Kernel: linear
Accuracy: 0.9000

Kernel: rbf
Accuracy: 0.9500

Kernel: poly
Accuracy: 0.9500

Kernel: sigmoid
Accuracy: 0.7000
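
To visualize a decision boundary, a trained model can be evaluated on a regular grid of points covering the data range. The sketch below regenerates similar synthetic data so that it runs on its own (using NumPy's `default_rng`, so the samples differ from the ones above) and only computes the grid predictions; the grid resolution and variable names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic data in the same style as above (different random generator, so different samples)
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 2, size=100)
x2 = np.sin(5 * np.pi * x1) + rng.random(100)
labels = np.where(x2 > 1, "green", "blue")
X = np.column_stack([x1, x2])

model = SVC(kernel="rbf").fit(X, labels)

# Regular grid over the range of the data: 50 steps along x1, 40 along x2
g1, g2 = np.meshgrid(np.linspace(0, 2, 50), np.linspace(-1, 2, 40))
grid = np.column_stack([g1.ravel(), g2.ravel()])

# Predicted class for every grid point, reshaped back to the grid layout
Z = model.predict(grid).reshape(g1.shape)
green_share = np.mean(Z == "green")
print(Z.shape, green_share)
```

The array `(Z == "green").astype(int)` could then be passed, together with `g1` and `g2`, to a contour plotting function such as matplotlib's `contourf` to draw the regions assigned to each class.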


# Kernel Examples:

1. Linear Kernel:

In [21]:
# The labels depend only on whether x2 > 1, so the true boundary is a straight line
# and a linear kernel is a reasonable fit for this dataset
model = SVC(kernel="linear")
model.fit(X_train, y_train)


In [23]:
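# (2, 4) lies outside the range of the generated data (x2 stays below 2), so this is an extrapolation
pred = model.predict([[2, 4]])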
print(pred[0])

green




2. RBF Kernel:

In [24]:
# Introduces non-linearity; compare its accuracy with the linear kernel rather than assuming it is higher
model = SVC(kernel="rbf", gamma=0.5)  # Tune gamma parameter
model.fit(X_train, y_train)


In [25]:
pred = model.predict([[2, 4]])
print(pred[0])

blue




3. Polynomial Kernel:

In [26]:
# Higher degree might overfit, tune degree and coefficient
model = SVC(kernel="poly", degree=2, coef0=1)
model.fit(X_train, y_train)


In [27]:
pred = model.predict([[2, 4]])
print(pred[0])

green




4. Sigmoid Kernel:

In [28]:
# Sensitive to parameter tuning, consider alternatives
model = SVC(kernel="sigmoid", gamma=0.1)
model.fit(X_train, y_train)


In [29]:
pred = model.predict([[2, 4]])
print(pred[0])

green




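5. Precomputed (Custom) Kernel:

In [None]:
# kernel="precomputed" expects a square Gram matrix instead of raw features, and gamma is ignored.
# The Laplacian kernel is used here as an illustrative choice of custom kernel.
from sklearn.metrics.pairwise import laplacian_kernel
gram_train = laplacian_kernel(X_train, X_train, gamma=0.5)
model = SVC(kernel="precomputed")
model.fit(gram_train, y_train)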
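
With `kernel="precomputed"`, both training and prediction take kernel matrices rather than raw features: a square train-by-train matrix for `fit`, and a test-by-train matrix for `predict`. The sketch below illustrates this with the RBF kernel so the result can be compared against the built-in `kernel="rbf"`; the `make_moons` data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)  # illustrative data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

gamma = 0.5
gram_train = rbf_kernel(X_tr, X_tr, gamma=gamma)  # shape (n_train, n_train)
gram_test = rbf_kernel(X_te, X_tr, gamma=gamma)   # shape (n_test, n_train)

precomputed_model = SVC(kernel="precomputed").fit(gram_train, y_tr)
builtin_model = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)

# Fraction of test points on which the two models give the same prediction
agreement = np.mean(precomputed_model.predict(gram_test) == builtin_model.predict(X_te))
print(gram_train.shape, gram_test.shape, agreement)
```

Passing raw features of shape (n_samples, n_features) to a precomputed-kernel model raises an error, because the matrix is not square.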
