# Random Forest Classifiers in Scikit-learn

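In this notebook, we'll explore how to use Random Forest Classifiers from the scikit-learn library. A Random Forest is an ensemble learning method. It trains many decision trees, each on a bootstrap sample of the training data and considering only a random subset of the features at each split, and then combines the trees' outputs into a single prediction.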
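To make the idea concrete, here is a minimal hand-rolled sketch of bagging with decision trees. It is an illustration under simplifying assumptions, not scikit-learn's implementation. Each tree is fit on a bootstrap sample with max_features="sqrt", and the trees are combined by a hard majority vote. scikit-learn's RandomForestClassifier instead averages the trees' predicted class probabilities. The names n_trees, trees and ensemble_pred are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Small synthetic binary problem (same generator the notebook uses below)
X, y = make_classification(n_samples=500, n_features=20, n_informative=2,
                           n_redundant=0, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd number, so a binary majority vote cannot tie
trees = []
for i in range(n_trees):
    # Bootstrap sample: draw len(X) row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt": each split only considers a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Hard majority vote across trees (labels are 0/1 here)
votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
print("Training accuracy of the hand-rolled forest:", (ensemble_pred == y).mean())
```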

First, let's import the necessary libraries. We'll use scikit-learn for the Random Forest Classifier, the synthetic dataset and the train/test split, and matplotlib for visualization.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

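Now, let's create a synthetic dataset for classification using scikit-learn's make_classification function. It has 1,000 samples and 20 features, but only 2 of the features are informative; because n_redundant=0, the other 18 are random noise.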

In [None]:
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=0, random_state=42)
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")
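Because the evaluation below uses plain accuracy, it can be worth checking that the classes are roughly balanced. With its default weights=None, make_classification generates balanced classes, but the default flip_y=0.01 randomly reassigns a small fraction of labels, so the counts need not be exactly equal. A quick check, as a sketch:

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=0, random_state=42)

# np.bincount counts how many samples carry each integer label (0 and 1 here)
counts = np.bincount(y)
print("Samples per class:", counts)
print("Class fractions:  ", counts / counts.sum())
```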

Let's split our dataset into training and testing sets, holding out 20% of the samples (test_size=0.2). This lets us evaluate our model's performance on data it has not seen during training.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training set shape: {X_train.shape}")
print(f"Testing set shape: {X_test.shape}")
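A plain random split can leave the class proportions slightly different between the two sets. One optional variant, assuming you want both sets to mirror the overall class balance, is to pass stratify=y to train_test_split:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=0, random_state=42)

# stratify=y keeps the class proportions of y in both the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

train_frac = np.bincount(y_train) / len(y_train)
test_frac = np.bincount(y_test) / len(y_test)
print("Train class fractions:", train_frac)
print("Test class fractions: ", test_frac)
```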

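Now, let's create and train our Random Forest Classifier with 100 trees (n_estimators=100, which is also scikit-learn's default). Setting random_state makes the bootstrap sampling and random feature selection reproducible.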

In [None]:
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)
print("Random Forest Classifier trained.")
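Since each tree sees only a bootstrap sample, roughly a third of the training rows (about 37%) are left out of any given tree. Setting oob_score=True asks scikit-learn to score each training row using only the trees that did not see it, giving an accuracy estimate without a separate validation set. A sketch, reusing the notebook's data setup (rf_oob is an illustrative name):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# oob_score=True evaluates each training sample with the trees whose bootstrap
# sample did not include it (requires bootstrap=True, which is the default)
rf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf_oob.fit(X_train, y_train)
print(f"Out-of-bag accuracy estimate: {rf_oob.oob_score_:.2f}")
print(f"Test accuracy:                {rf_oob.score(X_test, y_test):.2f}")
```

The out-of-bag estimate and the test accuracy will usually be close but not identical, since they are computed on different samples.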

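Let's evaluate our model's performance on the test set. For a classifier, the score method returns the mean accuracy: the fraction of test samples whose predicted label matches the true label.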

In [None]:
accuracy = rf_classifier.score(X_test, y_test)
print(f"Accuracy on test set: {accuracy:.2f}")
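Accuracy alone does not show which kinds of mistakes the model makes. As a sketch, the same evaluation can be broken down with sklearn.metrics: accuracy_score should match score(), confusion_matrix counts correct and incorrect predictions per class, and classification_report gives per-class precision, recall and F1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

y_pred = rf.predict(X_test)
# For classifiers, score() is the mean accuracy of predict(), so these agree
acc = accuracy_score(y_test, y_pred)
print(f"accuracy_score: {acc:.2f}, score(): {rf.score(X_test, y_test):.2f}")

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1
print(classification_report(y_test, y_pred))
```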

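A fitted Random Forest exposes impurity-based feature importances through its feature_importances_ attribute: the total reduction in impurity each feature brings across all splits, averaged over the trees and normalized to sum to 1. Let's visualize the importance of each feature.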

In [None]:
importances = rf_classifier.feature_importances_
plt.bar(range(len(importances)), importances)
plt.title("Feature Importances")
plt.xlabel("Feature Index")
plt.ylabel("Importance")
plt.show()
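Because make_classification shuffles features by default, the plot above does not reveal which columns are the informative ones. Passing shuffle=False keeps the informative features in the first columns (indices 0 and 1 here), so we can check whether the forest ranks them highest. scikit-learn's documentation also notes that impurity-based importances can be misleading for high-cardinality features and points to sklearn.inspection.permutation_importance as an alternative, which measures how much the held-out score drops when one column is shuffled. A sketch comparing the two (top_impurity and top_perm are illustrative names):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# shuffle=False keeps the 2 informative features in columns 0 and 1
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Impurity-based importances (computed from the training data) sum to 1
print("Sum of feature_importances_:", rf.feature_importances_.sum())
top_impurity = np.argsort(rf.feature_importances_)[::-1][:3]
print("Top 3 features (impurity):   ", top_impurity)

# Permutation importance: mean drop in test accuracy over 10 shuffles per column
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
top_perm = np.argsort(perm.importances_mean)[::-1][:3]
print("Top 3 features (permutation):", top_perm)
```

If both methods put columns 0 and 1 near the top, the forest has picked up the informative features rather than the noise.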

Finally, let's make predictions on a few samples from our test set and compare them with the true labels.

In [None]:
predictions = rf_classifier.predict(X_test[:5])
print("Predictions:", predictions)
print("True labels:", y_test[:5])
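Beyond hard labels, the forest can report class probabilities with predict_proba. In scikit-learn, a forest's probability for a sample is the average of its trees' predicted probabilities, and predict returns the class with the highest average. A sketch that checks both facts on the same five test samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shape (5, n_classes); each row sums to 1
proba = rf.predict_proba(X_test[:5])
print(np.round(proba, 2))

# predict() picks the class with the highest averaged probability
from_proba = rf.classes_[np.argmax(proba, axis=1)]
print("From probabilities:", from_proba)
print("From predict():    ", rf.predict(X_test[:5]))

# The forest's probabilities are the mean of the individual trees' probabilities
tree_mean = np.mean([t.predict_proba(X_test[:5]) for t in rf.estimators_], axis=0)
print("Matches mean of trees:", np.allclose(tree_mean, proba))
```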