# Additional Commonly Used Functions in scikit-learn

This notebook covers additional commonly used classes and functions in the scikit-learn library that were not covered in the previous notebook.

## 1. LabelEncoder

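`LabelEncoder` is a transformer class that encodes target labels as integers from 0 to n_classes-1. Codes are assigned in sorted order of the unique labels, so here `'A'`, `'B'`, and `'C'` become 0, 1, and 2. It is intended for the target `y` rather than for input features.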

In [None]:
# Example: LabelEncoder
import pandas as pd  # used by the DataFrame examples throughout this notebook
from sklearn.preprocessing import LabelEncoder

# Sample data
data = {'category': ['A', 'B', 'A', 'C']}
df = pd.DataFrame(data)

# Encode target labels
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(df['category'])

print(f'Encoded labels: {encoded_labels}')
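
As a small follow-up sketch, the fitted encoder can also be inspected and reversed. The variable names below (`labels`, `classes`, `decoded`) are illustrative and not part of the original example; `classes_` and `inverse_transform` are standard `LabelEncoder` members.

```python
from sklearn.preprocessing import LabelEncoder

# Same labels as the example above (illustrative variable name)
labels = ['A', 'B', 'A', 'C']

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the sorted unique labels; a label's position is its integer code
classes = list(encoder.classes_)

# inverse_transform maps integer codes back to the original labels
decoded = list(encoder.inverse_transform(encoded))

print(f'Classes: {classes}')
print(f'Encoded: {list(encoded)}')
print(f'Decoded: {decoded}')
```

Keeping the fitted encoder around is what makes it possible to translate model predictions back into the original label names.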

## 2. MinMaxScaler

`MinMaxScaler` is a transformer class that rescales each feature independently to a given range, `[0, 1]` by default.

In [None]:
# Example: MinMaxScaler
from sklearn.preprocessing import MinMaxScaler

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Scale features
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)

print(f'Scaled data: {scaled_data}')
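
To make the scaling rule concrete, the sketch below compares the default output with a manual per-column `(x - min) / (max - min)` computation, and then uses the `feature_range` parameter to target `[-1, 1]` instead. The array `X` and the other variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on different scales (illustrative data)
X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]], dtype=float)

# Default range is [0, 1]
scaled = MinMaxScaler().fit_transform(X)

# Manual equivalent: (x - min) / (max - min), computed per column
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# feature_range changes the target interval
scaled_custom = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)

print(f'Matches manual computation: {np.allclose(scaled, manual)}')
print(f'Custom range min/max: {scaled_custom.min()}, {scaled_custom.max()}')
```

In a real workflow the scaler would typically be fitted on the training split only and then applied to the test split with `transform`, so that test-set statistics do not leak into preprocessing.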

## 3. PolynomialFeatures

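`PolynomialFeatures` is a transformer class that generates polynomial and interaction features up to a given degree. With `degree=2` and a single input feature `x`, the output columns are `1` (the bias term), `x`, and `x^2`.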

In [None]:
# Example: PolynomialFeatures
from sklearn.preprocessing import PolynomialFeatures

# Sample data
data = {'feature': [1, 2, 3]}
df = pd.DataFrame(data)

# Generate polynomial features
poly = PolynomialFeatures(degree=2)
poly_features = poly.fit_transform(df)

print(f'Polynomial features: {poly_features}')

## 4. TfidfVectorizer

`TfidfVectorizer` is a class that converts a collection of raw documents into a sparse matrix of TF-IDF features, with one row per document and one column per vocabulary term.

In [None]:
# Example: TfidfVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample data
documents = ["This is the first document.", "This is the second document."]

# Convert to TF-IDF features
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)

print(f'TF-IDF matrix: {tfidf_matrix.toarray()}')

## 5. SelectKBest

`SelectKBest` is a class that performs feature selection by keeping the `k` features with the highest scores under a given scoring function, such as `f_classif` (the ANOVA F-value) for classification.

In [None]:
# Example: SelectKBest
from sklearn.feature_selection import SelectKBest, f_classif

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'target': [0, 0, 1, 1, 1]
}
df = pd.DataFrame(data)

# Select the best features
X = df[['feature1', 'feature2']]
y = df['target']
selector = SelectKBest(score_func=f_classif, k=1)
best_features = selector.fit_transform(X, y)

print(f'Best features: {best_features}')
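
`fit_transform` returns only the selected values, not which columns they came from. As a sketch, the fitted selector's `scores_` attribute and `get_support()` method can recover that information. The data below is made up for illustration: `informative` separates the classes cleanly, while `noise` does not.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Illustrative data: one feature tracks the target, the other does not
X = pd.DataFrame({
    'informative': [1, 2, 3, 10, 11, 12],
    'noise': [5, 1, 4, 2, 6, 3],
})
y = [0, 0, 0, 1, 1, 1]

selector = SelectKBest(score_func=f_classif, k=1)
selector.fit(X, y)

# scores_ holds one F-value per input column
scores = dict(zip(X.columns, selector.scores_))

# get_support() is a boolean mask over the input columns
selected = list(X.columns[selector.get_support()])

print(f'Scores: {scores}')
print(f'Selected: {selected}')
```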

## 6. StratifiedKFold

`StratifiedKFold` is a cross-validator class that splits data into folds while keeping the class proportions in each fold approximately equal to those in the full dataset.

In [None]:
# Example: StratifiedKFold
from sklearn.model_selection import StratifiedKFold

# Sample data
data = {
    'feature1': [1, 2, 3, 4, 5, 6],
    'feature2': [2, 4, 6, 8, 10, 12],
    'target': [0, 0, 1, 1, 0, 1]
}
df = pd.DataFrame(data)

# Stratified K-Fold cross-validation
X = df[['feature1', 'feature2']]
y = df['target']
skf = StratifiedKFold(n_splits=3)
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    print(f'Train indices: {train_index}, Test indices: {test_index}')
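
Stratification is easier to see on imbalanced labels. The sketch below uses a made-up dataset with six samples of class 0 and three of class 1, and counts the classes in each test fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative imbalanced labels: 6 samples of class 0, 3 of class 1
X = np.arange(9).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])

skf = StratifiedKFold(n_splits=3)

# Count how many samples of each class land in every test fold
fold_counts = [np.bincount(y[test_index], minlength=2).tolist()
               for _, test_index in skf.split(X, y)]

print(f'Class counts per test fold: {fold_counts}')
```

Each test fold keeps the 2:1 ratio of the full dataset, which a plain `KFold` without shuffling would not do for labels sorted like these.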

## 7. Mean Absolute Error (MAE)

The `mean_absolute_error` function computes the average absolute difference between the true and predicted values of a regression model; lower values indicate a better fit.

In [None]:
# Example: Mean Absolute Error
from sklearn.metrics import mean_absolute_error

# Sample data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# Compute MAE
mae = mean_absolute_error(y_true, y_pred)
print(f'Mean Absolute Error: {mae}')
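
For reference, the same number can be computed by hand as the mean of the absolute errors. This sketch reuses the values above and compares the manual result with scikit-learn's.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Absolute errors are 0.5, 0.5, 0.0 and 1.0
abs_errors = np.abs(y_true - y_pred)
manual_mae = abs_errors.mean()

sklearn_mae = mean_absolute_error(y_true, y_pred)

print(f'Manual MAE: {manual_mae}, sklearn MAE: {sklearn_mae}')
```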

## 8. ROC Curve

The `roc_curve` function computes the Receiver Operating Characteristic (ROC) curve of a binary classifier: the false positive rate and true positive rate at each score threshold.

In [None]:
# Example: ROC Curve
from sklearn.metrics import roc_curve

# Sample data
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# Compute ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(f'FPR: {fpr}, TPR: {tpr}, Thresholds: {thresholds}')
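
The curve is often summarized by the area under it (AUC). As a sketch, the area can be obtained either by integrating the curve points with `auc` or directly from the scores with `roc_auc_score`; both functions live in `sklearn.metrics`.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# Trapezoidal area under the (fpr, tpr) points
area_from_curve = auc(fpr, tpr)

# Same quantity computed directly from labels and scores
area_direct = roc_auc_score(y_true, y_scores)

print(f'AUC from curve: {area_from_curve}, AUC direct: {area_direct}')
```

An AUC of 0.75 here means that three of the four positive-negative pairs are ranked correctly: only the positive scored 0.35 falls below the negative scored 0.4.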