# More Commonly Used Functions in scikit-learn

In this notebook, we cover more commonly used tools from the scikit-learn library that were not covered in the previous notebooks: preprocessing transformers, clustering estimators, and evaluation metrics.

## 1. RobustScaler

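The `RobustScaler` class scales features using statistics that are robust to outliers. By default it subtracts each feature's median and divides by its interquartile range (IQR, the 75th minus the 25th percentile), so a few extreme values have little influence on the scaling.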

```python
# Example: RobustScaler
from sklearn.preprocessing import RobustScaler
import pandas as pd

# Sample data: feature1 contains an outlier (100)
data = {
    'feature1': [1, 2, 3, 4, 100],
    'feature2': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Center each column on its median and scale it by its IQR
scaler = RobustScaler()
scaled_data = scaler.fit_transform(df)

print(f'Scaled data:\n{scaled_data}')
```
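
To see what `RobustScaler` computes, the sketch below reproduces it with NumPy, assuming the default settings (`with_centering=True`, `with_scaling=True`, `quantile_range=(25.0, 75.0)`). It illustrates the formula rather than scikit-learn's internal implementation.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Same values as the example above, as a float NumPy array
X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [100, 50]], dtype=float)

# Median and interquartile range of each column
median = np.median(X, axis=0)
q1, q3 = np.percentile(X, [25, 75], axis=0)
iqr = q3 - q1

# Manual robust scaling: (x - median) / IQR
X_manual = (X - median) / iqr

X_sklearn = RobustScaler().fit_transform(X)
print(X_manual)
print('Matches RobustScaler:', np.allclose(X_manual, X_sklearn))
```

The outlier stays large after scaling (48.5), but the other four values of `feature1` land on the same scale as `feature2`, because the median and IQR are not pulled toward the extreme value.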

## 2. PowerTransformer

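The `PowerTransformer` class applies a feature-wise power transform to make data more Gaussian-like. By default it uses the Yeo-Johnson method, which accepts both positive and negative values, and then standardizes the output to zero mean and unit variance. The Box-Cox method (`method='box-cox'`) requires strictly positive data.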

```python
# Example: PowerTransformer
from sklearn.preprocessing import PowerTransformer
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Apply the Yeo-Johnson power transform (the default), then standardize
transformer = PowerTransformer()
transformed_data = transformer.fit_transform(df)

# Fitted lambda parameter for each feature
print(f'Lambdas: {transformer.lambdas_}')
print(f'Transformed data:\n{transformed_data}')
```
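
As an illustration of the Box-Cox option and of inverting the transform, the sketch below applies `PowerTransformer(method='box-cox')` to synthetic right-skewed data. The log-normal sample is an assumption made for the example; with the default `standardize=True`, the output should have a mean close to 0 and a standard deviation close to 1.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed, strictly positive data (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 1))

# Box-Cox only accepts strictly positive input
pt = PowerTransformer(method='box-cox')
X_t = pt.fit_transform(X)

print('Fitted lambda:', pt.lambdas_)
print('Mean / std after transform:', X_t.mean(), X_t.std())

# inverse_transform undoes the standardization and the power transform
X_back = pt.inverse_transform(X_t)
print('Round trip ok:', np.allclose(X, X_back))
```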

## 3. QuantileTransformer

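The `QuantileTransformer` class maps each feature through its empirical cumulative distribution so that it follows a uniform or normal distribution. Because it works on ranks, it also reduces the influence of outliers. For small datasets, set `n_quantiles` (default 1000) to at most the number of samples; otherwise scikit-learn warns and lowers it automatically.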

```python
# Example: QuantileTransformer
from sklearn.preprocessing import QuantileTransformer
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Apply quantile transform; n_quantiles must not exceed the number of
# samples (5 here), otherwise scikit-learn warns and lowers it
transformer = QuantileTransformer(n_quantiles=5, output_distribution='normal')
transformed_data = transformer.fit_transform(df)

print(f'Transformed data:\n{transformed_data}')
```
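
Because `QuantileTransformer` works on ranks, an outlier ends up just one step above the next-largest value. The sketch below uses made-up values and `output_distribution='uniform'` with `n_quantiles` equal to the number of samples; in that case the training values map to evenly spaced positions in [0, 1].

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# One feature with an extreme value (illustrative data)
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# n_quantiles equals the number of samples, so no warning is raised
qt = QuantileTransformer(n_quantiles=5, output_distribution='uniform')
X_t = qt.fit_transform(X)

# The outlier (100) becomes 1.0, evenly spaced from its neighbors
print(X_t.ravel())
```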

## 4. Binarizer

The `Binarizer` class binarizes data based on a threshold: values strictly greater than the threshold become 1, and all other values become 0. The same threshold is applied to every feature.

```python
# Example: Binarizer
from sklearn.preprocessing import Binarizer
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Binarize features: 1 where value > 25, else 0. The threshold applies to
# every column, so all of feature1 (maximum 5) maps to 0
binarizer = Binarizer(threshold=25)
binarized_data = binarizer.fit_transform(df)

print(f'Binarized data:\n{binarized_data}')
```
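
The rule `Binarizer` applies can be written as a NumPy comparison. The sketch below, with illustrative values, checks that equivalence and shows that a value equal to the threshold maps to 0 because the comparison is strict. A separate threshold per column is not a `Binarizer` option; the last lines show one possible plain-NumPy alternative using broadcasting (the `thresholds` array is hypothetical), not a scikit-learn feature.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[24.0, 25.0, 26.0],
              [1.0, 30.0, 5.0]])

# Binarizer: 1 where value > threshold, else 0 (same threshold for all columns)
binarized = Binarizer(threshold=25).fit_transform(X)

# Equivalent NumPy expression
manual = (X > 25).astype(X.dtype)
print(binarized)
print('Equivalent:', np.array_equal(binarized, manual))

# Hypothetical per-column thresholds, applied with NumPy broadcasting
thresholds = np.array([0.0, 25.0, 10.0])
per_column = (X > thresholds).astype(X.dtype)
print(per_column)
```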

## 5. FeatureHasher

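The `FeatureHasher` class implements the hashing trick: it maps feature names (for example, categorical values or words) to column indices with a hash function and returns a SciPy sparse matrix with a fixed number of columns (`n_features`). It stores no vocabulary, so different features can collide in the same column. By default (`alternate_sign=True`), the sign of each value is also chosen by the hash, so that inner products are approximately preserved even when collisions occur.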

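```python
# Example: FeatureHasher
from sklearn.feature_extraction import FeatureHasher

# Sample data: each sample is a dict of feature name -> value
data = [{'dog': 1, 'cat': 2}, {'dog': 2, 'run': 5}]

# Hash feature names into 10 columns; FeatureHasher is stateless,
# so transform can be called without fitting
hasher = FeatureHasher(n_features=10, input_type='dict')
hashed_features = hasher.transform(data)

# The result is a SciPy sparse matrix; convert it to dense for display
print(f'Hashed features:\n{hashed_features.toarray()}')
```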
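
For categorical data given as tokens rather than dictionaries, `FeatureHasher` also accepts `input_type='string'`, where each string in a sample counts as a feature with value 1. The sketch below, with illustrative tokens, sets `alternate_sign=False` so that the output holds non-negative counts; each row then sums to the number of tokens in that sample, even if some tokens collide.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher

# Each sample is a list of string tokens (illustrative data)
samples = [['dog', 'cat', 'dog'], ['run']]

# alternate_sign=False keeps all hashed values non-negative (plain counts)
hasher = FeatureHasher(n_features=8, input_type='string', alternate_sign=False)
hashed = hasher.transform(samples)  # stateless: no fit needed

print(hashed.shape)
print(hashed.toarray())

# Row sums equal the number of tokens per sample
row_sums = np.asarray(hashed.sum(axis=1)).ravel()
print(row_sums)
```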

## 6. Normalizer

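The `Normalizer` class rescales each sample (row) individually to unit norm. The L2 norm is the default (`norm='l2'`); `'l1'` and `'max'` are also available. Unlike the scalers above, it operates row by row rather than column by column.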

```python
# Example: Normalizer
from sklearn.preprocessing import Normalizer
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Divide each row by its own L2 norm. Because feature2 = 10 * feature1
# in every row, all rows become identical after normalization
normalizer = Normalizer()
normalized_data = normalizer.fit_transform(df)

print(f'Normalized data:\n{normalized_data}')
```
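
The sketch below, with illustrative values, checks what `Normalizer` does row by row: with `norm='l2'` each row is divided by its Euclidean length, and with `norm='l1'` by the sum of its absolute values.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0],
              [1.0, -1.0]])

l2 = Normalizer(norm='l2').fit_transform(X)
l1 = Normalizer(norm='l1').fit_transform(X)

# Manual equivalents: divide each row by its own norm
l2_manual = X / np.linalg.norm(X, axis=1, keepdims=True)
l1_manual = X / np.abs(X).sum(axis=1, keepdims=True)

print(l2)  # first row becomes [0.6, 0.8]
print(l1)
```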

## 7. KMeans

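The `KMeans` class partitions data into K clusters by assigning each sample to its nearest cluster center (centroid) and iteratively updating the centroids to minimize the within-cluster sum of squared distances.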

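```python
# Example: KMeans
from sklearn.cluster import KMeans
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Apply KMeans clustering; n_init and random_state are set explicitly
# because the default n_init has changed across scikit-learn versions
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(df)

print(f'Cluster labels: {kmeans.labels_}')
print(f'Cluster centers:\n{kmeans.cluster_centers_}')
```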
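
After fitting, `KMeans` also exposes `inertia_` (the within-cluster sum of squared distances) and a `predict` method that assigns new points to the nearest center. The sketch below uses two made-up, well-separated groups so that the expected grouping is easy to check, and recomputes the inertia by hand.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two illustrative groups: one near (1, 1), one near (8.5, 8.5)
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print('Labels:', kmeans.labels_)

# Inertia: squared distance of each point to its assigned center, summed
manual_inertia = ((X - kmeans.cluster_centers_[kmeans.labels_]) ** 2).sum()
print('Inertia:', kmeans.inertia_, 'manual:', manual_inertia)

# Assign new points to the nearest learned center
new_labels = kmeans.predict(np.array([[0.0, 0.0], [10.0, 10.0]]))
print('New point labels:', new_labels)
```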

## 8. DBSCAN

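The `DBSCAN` class clusters data based on density. A point with at least `min_samples` points (including itself) within distance `eps` is a core point; clusters grow from neighboring core points, and points that belong to no cluster are labeled `-1` (noise). The number of clusters does not need to be specified in advance.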

```python
# Example: DBSCAN
from sklearn.cluster import DBSCAN
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5, 8, 8, 25],
    'feature2': [10, 20, 30, 40, 50, 80, 85, 25]
}
df = pd.DataFrame(data)

# Apply DBSCAN clustering. eps is measured in feature units, so unscaled
# features with different ranges affect distances unequally
dbscan = DBSCAN(eps=10, min_samples=2)
clusters = dbscan.fit_predict(df)

# Label -1 marks noise points
print(f'Clusters: {clusters}')
```
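
To make the roles of `eps`, `min_samples`, and the noise label concrete, the sketch below runs `DBSCAN` on made-up data with two tight groups and one isolated point. `core_sample_indices_` lists the core points; the isolated point has no neighbor within `eps`, so it is labeled `-1`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups and one isolated point (illustrative data)
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [50, 50]], dtype=float)

db = DBSCAN(eps=1.5, min_samples=2).fit(X)

print('Labels:', db.labels_)  # -1 marks noise
print('Core sample indices:', db.core_sample_indices_)

# Count clusters, ignoring the noise label
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print('Number of clusters:', n_clusters)
```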

## 9. Mean Squared Error (MSE)

The `mean_squared_error` function evaluates a regression model by averaging the squared differences between true and predicted values. Lower is better, and 0 means a perfect fit.

```python
# Example: Mean Squared Error
from sklearn.metrics import mean_squared_error

# Sample data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# Compute MSE
mse = mean_squared_error(y_true, y_pred)
print(f'Mean Squared Error: {mse}')
```
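
The metric is the mean of the squared residuals, MSE = (1/n) Σ (y_i - ŷ_i)². The sketch below recomputes the example by hand with NumPy and takes the square root to get the root mean squared error (RMSE), which is in the same units as the target.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Residuals: 0.5, -0.5, 0.0, -1.0 -> squares 0.25, 0.25, 0.0, 1.0
residuals = y_true - y_pred
mse_manual = np.mean(residuals ** 2)  # 1.5 / 4 = 0.375
rmse = np.sqrt(mse_manual)

print('Manual MSE:', mse_manual)
print('scikit-learn MSE:', mean_squared_error(y_true, y_pred))
print('RMSE:', rmse)
```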

## 10. Silhouette Score

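The `silhouette_score` function evaluates clustering quality by comparing, for each sample, its mean distance to the other points in its own cluster with its mean distance to the points in the nearest other cluster. The score ranges from -1 to 1, where higher values indicate dense, well-separated clusters. It requires at least 2 and at most n_samples - 1 distinct labels.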

```python
# Example: Silhouette Score
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
import pandas as pd

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Apply KMeans clustering (n_init and random_state set for reproducibility)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(df)

# Compute Silhouette Score
score = silhouette_score(df, labels)
print(f'Silhouette Score: {score}')
```
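
`silhouette_score` is the mean of the per-sample silhouette coefficients, which `silhouette_samples` returns individually. For a sample, the coefficient is (b - a) / max(a, b), where a is its mean distance to the other points in its cluster and b is its mean distance to the points of the nearest other cluster. The sketch below, on made-up data, checks that relationship and compares scores for a few values of K, a common heuristic for choosing the number of clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

# Two illustrative, well-separated groups
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Per-sample coefficients, each in [-1, 1]
per_sample = silhouette_samples(X, labels)
score = silhouette_score(X, labels)
print('Per-sample:', per_sample)
print('Mean of per-sample:', per_sample.mean(), 'score:', score)

# Compare several K values; K must be at most n_samples - 1
for k in range(2, 5):
    k_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, k_labels))
```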