# Machine Learning Algorithms and Their Examples (Part 2)

This notebook covers more machine learning algorithms, each with a short usage example. The examples use popular Python libraries: scikit-learn, XGBoost, LightGBM, and CatBoost.

## 8. Principal Component Analysis (PCA)

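PCA is a dimensionality reduction technique that projects a dataset onto a small number of orthogonal directions (the principal components), chosen so that the reduced features retain as much of the original variance as possible.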

In [1]:
# Example: Principal Component Analysis using scikit-learn
from sklearn.decomposition import PCA
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [5, 4, 3, 2, 1]
}
df = pd.DataFrame(data)

# Apply PCA
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(df)
print(f'Reduced data: {reduced_data}')

Reduced data: [[ 4.89897949e+00  3.84592537e-16]
 [ 2.44948974e+00 -1.28197512e-16]
 [-0.00000000e+00 -0.00000000e+00]
 [-2.44948974e+00  1.28197512e-16]
 [-4.89897949e+00  2.56395025e-16]]
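
The second column of the output above is zero up to floating-point error because the three sample features are exact linear functions of one another (`feature2 = 2 * feature1`, `feature3 = 6 - feature1`). As a minimal sketch, assuming the same five rows re-entered as a NumPy array (the names `X_demo` and `pca_demo` are introduced here only for illustration), `explained_variance_ratio_` shows how the variance is shared between the two components:

```python
import numpy as np
from sklearn.decomposition import PCA

# Same five rows as the sample data: columns are feature1, feature2, feature3
X_demo = np.array([
    [1, 2, 5],
    [2, 4, 4],
    [3, 6, 3],
    [4, 8, 2],
    [5, 10, 1],
], dtype=float)

pca_demo = PCA(n_components=2)
pca_demo.fit(X_demo)

# Fraction of the total variance captured by each principal component
ratios = pca_demo.explained_variance_ratio_
print(f'Explained variance ratio: {ratios}')

# Direction of the first component in the original feature space
print(f'First component: {pca_demo.components_[0]}')
```

On data like this, one component is enough. Checking `explained_variance_ratio_` is a common way to decide how many components to keep on real datasets.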


## 9. Gradient Boosting

Gradient Boosting is an ensemble technique that builds decision trees sequentially, fitting each new tree to correct the errors made by the trees before it.

In [6]:
# Example: Gradient Boosting using scikit-learn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample data
data = {
    'feature': [1, 2, 3, 4, 5],
    'target': [0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)

# Split data (with 5 rows and test_size=0.2, the test set holds a single row)
X = df[['feature']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')
print(f'Predictions: {predictions}')

Accuracy: 1.0
Predictions: [0]
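
To make "sequentially" concrete, here is a hand-rolled sketch of boosting for regression with squared-error loss: each round fits a shallow tree to the current residuals and adds a damped copy of its prediction to the running total. It is an illustration on synthetic data, not a description of how `GradientBoostingClassifier` works internally. The data and the `learning_rate` and `n_rounds` values are arbitrary assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem (assumed for illustration)
rng = np.random.default_rng(0)
X_gb = np.linspace(0, 6, 80).reshape(-1, 1)
y_gb = np.sin(X_gb).ravel() + rng.normal(scale=0.1, size=80)

learning_rate = 0.1
n_rounds = 50

# Start from a constant prediction: the mean of the targets
running_pred = np.full_like(y_gb, y_gb.mean())
mse_history = [np.mean((y_gb - running_pred) ** 2)]

for _ in range(n_rounds):
    # For squared error, the residuals are the negative gradient of the loss
    residuals = y_gb - running_pred
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X_gb, residuals)
    # Add a damped correction from the new tree
    running_pred += learning_rate * tree.predict(X_gb)
    mse_history.append(np.mean((y_gb - running_pred) ** 2))

print(f'Training MSE before boosting: {mse_history[0]:.4f}')
print(f'Training MSE after {n_rounds} rounds: {mse_history[-1]:.4f}')
```

The training error shrinks as trees are added. In practice the learning rate and the number of rounds are tuned on held-out data to avoid overfitting.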


## 10. Naive Bayes

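Naive Bayes classifiers are simple probabilistic classifiers that apply Bayes' theorem under the "naive" assumption that the features are conditionally independent given the class. `GaussianNB`, used below, additionally models each feature as normally distributed within each class.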

In [7]:
# Example: Naive Bayes using scikit-learn
from sklearn.naive_bayes import GaussianNB

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'target': [0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)

# Split data
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = GaussianNB()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')
print(f'Predictions: {predictions}')

Accuracy: 1.0
Predictions: [0]
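
The calculation behind `GaussianNB` can be sketched by hand: for each class, add the log prior to the sum of per-feature Gaussian log-likelihoods, then pick the class with the larger total. The small dataset and the query point `x_new` below are made up for illustration. The sketch also omits the small variance smoothing that scikit-learn applies, which should not change the predicted class on well-separated data like this.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical two-class dataset with two features
X_nb = np.array([[1.0, 2.1], [1.5, 1.8], [2.0, 2.5],
                 [6.0, 7.2], [6.5, 6.8], [7.0, 7.5]])
y_nb = np.array([0, 0, 0, 1, 1, 1])
x_new = np.array([2.0, 2.0])

log_posteriors = []
for label in (0, 1):
    X_c = X_nb[y_nb == label]
    prior = len(X_c) / len(X_nb)      # P(class)
    mean = X_c.mean(axis=0)           # per-feature mean within the class
    var = X_c.var(axis=0)             # per-feature variance within the class
    # Independence assumption: sum the per-feature Gaussian log-densities
    log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * var) + (x_new - mean) ** 2 / var)
    log_posteriors.append(np.log(prior) + log_likelihood)

manual_prediction = int(np.argmax(log_posteriors))
sklearn_prediction = int(GaussianNB().fit(X_nb, y_nb).predict(x_new.reshape(1, -1))[0])

print(f'Manual prediction: {manual_prediction}')
print(f'GaussianNB prediction: {sklearn_prediction}')
```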


## 11. XGBoost

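XGBoost is an optimized gradient boosting library designed for efficiency and flexibility. It adds regularization to the boosting objective and provides a scikit-learn-compatible estimator, `XGBClassifier`.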

In [8]:
# Example: XGBoost using the xgboost library
import xgboost as xgb
from sklearn.metrics import accuracy_score

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'target': [0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)

# Split data
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = xgb.XGBClassifier()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')
print(f'Predictions: {predictions}')

Accuracy: 1.0
Predictions: [0]


## 12. AdaBoost

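AdaBoost is an ensemble technique that trains weak classifiers (in scikit-learn, depth-1 decision trees by default) in sequence, re-weighting the training samples so that each new classifier focuses on the examples misclassified so far. The weak classifiers are then combined by weighted vote to form a strong classifier.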

In [9]:
# Example: AdaBoost using scikit-learn
from sklearn.ensemble import AdaBoostClassifier

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'target': [0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)

# Split data
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = AdaBoostClassifier()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')
print(f'Predictions: {predictions}')

Accuracy: 1.0
Predictions: [0]
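
The re-weighting step can be sketched for a single round using the classic discrete AdaBoost update with labels in {-1, +1}. The toy labels and the stump threshold of 3.5 are assumptions chosen for illustration, and scikit-learn's `AdaBoostClassifier` differs in implementation details.

```python
import numpy as np

# Toy 1-D data with labels in {-1, +1} (assumed for illustration)
x_demo = np.arange(1, 11)
y_demo = np.array([1, 1, 1, -1, -1, -1, 1, -1, -1, -1])

# Weak classifier: a decision stump with a hypothetical threshold of 3.5
stump_pred = np.where(x_demo < 3.5, 1, -1)

# Round 1 starts with uniform sample weights
weights = np.full(len(x_demo), 1 / len(x_demo))

# Weighted error of the stump and its vote strength (alpha)
missed = stump_pred != y_demo
error = weights[missed].sum()
alpha = 0.5 * np.log((1 - error) / error)

# Increase the weight of misclassified samples, decrease the rest, renormalize
new_weights = weights * np.exp(-alpha * y_demo * stump_pred)
new_weights /= new_weights.sum()

print(f'Weighted error: {error:.3f}, alpha: {alpha:.3f}')
print(f'Updated weights: {np.round(new_weights, 3)}')
```

After the update, the misclassified samples together hold half of the total weight, so the next weak classifier cannot do well by repeating the same mistakes.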


## 13. LightGBM

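LightGBM is a gradient boosting framework built on tree-based learning algorithms. It grows trees leaf-wise and bins feature values into histograms for speed. Its defaults assume far more data than the five-row sample below: the training log reports zero used features, so the accuracy printed for this toy example is not meaningful.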

In [10]:
# Example: LightGBM using the lightgbm library
import lightgbm as lgb
from sklearn.metrics import accuracy_score

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'target': [0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)

# Split data
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = lgb.LGBMClassifier()
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')
print(f'Predictions: {predictions}')

[LightGBM] [Info] Number of positive: 2, number of negative: 2
[LightGBM] [Info] Total Bins 0
[LightGBM] [Info] Number of data points in the train set: 4, number of used features: 0
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000
Accuracy: 1.0
Predictions: [0]


## 14. CatBoost

CatBoost is a high-performance open-source library for gradient boosting on decision trees, with built-in handling of categorical features.

In [12]:
# Example: CatBoost using the catboost library
# Install the package first (notebook shell command; skip if already installed)
!pip install catboost
from catboost import CatBoostClassifier

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'target': [0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)

# Split data
X = df[['feature1', 'feature2']]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = CatBoostClassifier(verbose=0)
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')
print(f'Predictions: {predictions}')

Collecting catboost
  Downloading catboost-1.2.5-cp310-cp310-manylinux2014_x86_64.whl.metadata (1.2 kB)
Downloading catboost-1.2.5-cp310-cp310-manylinux2014_x86_64.whl (98.2 MB)
Installing collected packages: catboost
Successfully installed catboost-1.2.5
Accuracy: 1.0
Predictions: [0]
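
Every classifier example in this notebook is scored on a single held-out row, so an accuracy of 1.0 says very little about the model. As a hedged sketch of a more informative comparison, the snippet below runs 5-fold cross-validation on a synthetic dataset from `make_classification`. The dataset size, the choice of three scikit-learn models, and all variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data (assumed for illustration)
X_cv, y_cv = make_classification(n_samples=200, n_features=5, random_state=0)

candidates = {
    'GradientBoosting': GradientBoostingClassifier(random_state=0),
    'GaussianNB': GaussianNB(),
    'AdaBoost': AdaBoostClassifier(random_state=0),
}

cv_results = {}
for name, estimator in candidates.items():
    # For classifiers, cv=5 uses stratified folds, so each fold keeps the class balance
    scores = cross_val_score(estimator, X_cv, y_cv, cv=5, scoring='accuracy')
    cv_results[name] = scores
    print(f'{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})')
```

Estimators with a scikit-learn-compatible interface, such as `xgb.XGBClassifier`, `lgb.LGBMClassifier`, and `CatBoostClassifier`, can typically be added to the `candidates` dictionary in the same way.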
