# Random Forest-iris

### Key Features:
- Random Forest Classifier: A forest trained with 10,000 estimators (trees) on all four Iris features to estimate feature importance.
- Feature Selection: `SelectFromModel` retains only the features whose importance meets a chosen threshold.
- Performance Comparison: Test accuracy is compared before and after feature selection to evaluate the impact of using fewer features.
- Cross-validation and Model Tuning: Optional cross-validation and hyperparameter tuning to optimize the model’s performance.


### Libraries Used:
- scikit-learn
- pandas
- numpy



### How to Run:
1. Load the Iris dataset and split it into training and test sets.
2. Train a Random Forest Classifier and calculate feature importances.
3. Select important features based on a set threshold and retrain the model.
4. Evaluate and compare model performance before and after feature selection.

Feel free to explore different thresholds, hyperparameters, and cross-validation strategies for better accuracy!
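
As a starting point for exploring thresholds, the loop below is a minimal sketch of how different `threshold` values change the number of retained features and the test accuracy. It reuses the notebook's split (`test_size=0.4`, `random_state=14`), but `n_estimators=200`, the list of thresholds, and the `results` dictionary are assumptions chosen only to keep the example quick; they are not part of the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=14
)

results = {}
# "mean" and "median" are string thresholds that SelectFromModel accepts.
for threshold in (0.05, 0.15, "mean", "median"):
    # Assumption: 200 trees instead of 10,000 so the sketch runs quickly.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=14),
        threshold=threshold,
    )
    x_train_sel = selector.fit_transform(x_train, y_train)
    x_test_sel = selector.transform(x_test)

    # Retrain on the reduced feature set and score on the held-out data.
    model = RandomForestClassifier(n_estimators=200, random_state=14)
    model.fit(x_train_sel, y_train)
    results[threshold] = (x_train_sel.shape[1], model.score(x_test_sel, y_test))

for threshold, (n_features, accuracy) in results.items():
    print(f"threshold={threshold!r}: {n_features} features, accuracy={accuracy:.3f}")
```

Because the test set holds only 60 samples, small accuracy differences between thresholds should be read with caution.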


In [33]:
#Importing the necessary libraries:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.feature_selection import SelectFromModel

In [34]:
#Loading the Iris dataset and extracting the feature names, feature matrix, and labels:
iris=datasets.load_iris()
features=iris.feature_names
x=iris.data
y=iris.target


In [35]:
features

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [36]:
#Splitting the data into training and test sets:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.4,random_state=14)

In [37]:
#Training the RandomForestClassifier on all features:
clf = RandomForestClassifier(n_estimators=10000, n_jobs=-1, random_state = 14)

In [38]:

clf.fit(x_train, y_train)

In [39]:
clf.score(x_test, y_test)

0.95

In [40]:
#This part computes the feature importance and presents it in a DataFrame.
feature_importances = pd.DataFrame(clf.feature_importances_, index = features, columns=['importance']).sort_values('importance', ascending=False)
feature_importances

| feature | importance |
| --- | --- |
| petal width (cm) | 0.459124 |
| petal length (cm) | 0.400622 |
| sepal length (cm) | 0.11811 |
| sepal width (cm) | 0.022145 |
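
The importances reported by a random forest are normalized, so they sum to 1 (the four values above add up to 1 within rounding). The sketch below checks that property and also reports the spread of each importance across the individual trees, which gives a rough sense of how stable the ranking is. The smaller `n_estimators=200` and the `std` column are assumptions made for illustration and do not appear in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=14
)

# Assumption: 200 trees instead of 10,000 to keep the sketch fast.
clf = RandomForestClassifier(n_estimators=200, random_state=14)
clf.fit(x_train, y_train)

# Forest-level importances (normalized to sum to 1).
importances = clf.feature_importances_
# Standard deviation of the per-tree importances for each feature.
std = np.std([tree.feature_importances_ for tree in clf.estimators_], axis=0)

summary = pd.DataFrame(
    {"importance": importances, "std": std}, index=iris.feature_names
).sort_values("importance", ascending=False)
print(summary)
print("sum of importances:", importances.sum())
```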


In [41]:
#Feature selection using SelectFromModel:
sfm = SelectFromModel(clf, threshold = 0.15)

In [42]:
sfm.fit(x_train, y_train)
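
Note that calling `fit` on a `SelectFromModel` created this way clones the estimator and trains it again, so the 10,000-tree forest is fitted a second time. Since the forest has already been trained, one alternative is to pass `prefit=True` and call `transform` or `get_support` directly. The following is a sketch under the assumption of a smaller forest (`n_estimators=200`); the names `sfm_refit` and `sfm_prefit` are hypothetical and used only for this comparison.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=14
)

# Assumption: 200 trees instead of 10,000 to keep the sketch fast.
clf = RandomForestClassifier(n_estimators=200, random_state=14)
clf.fit(x_train, y_train)

# Variant 1: the selector clones clf and fits the clone again.
sfm_refit = SelectFromModel(clf, threshold=0.15)
sfm_refit.fit(x_train, y_train)

# Variant 2: reuse the already fitted forest; no second training pass.
sfm_prefit = SelectFromModel(clf, threshold=0.15, prefit=True)

# With the same data and random_state, both variants keep the same features.
mask_refit = sfm_refit.get_support()
mask_prefit = sfm_prefit.get_support()
print("refit mask: ", mask_refit)
print("prefit mask:", mask_prefit)

# The mask matches thresholding the importances by hand.
manual_mask = clf.feature_importances_ >= 0.15
X_important_train = sfm_prefit.transform(x_train)
print(X_important_train.shape)
```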

In [43]:

X_important_train = sfm.transform(x_train)
X_important_test = sfm.transform(x_test)

In [44]:
X_important_train.shape

(90, 2)

In [45]:
X_important_test.shape

(60, 2)
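
The shapes confirm that two of the four columns were kept, but not which ones. A short sketch of how the retained feature names could be recovered with `get_support()` follows. It assumes a smaller forest (`n_estimators=200`) for speed, and `selected_names` is a hypothetical variable that does not appear in the notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

iris = load_iris()
feature_names = np.array(iris.feature_names)
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=14
)

# Assumption: 200 trees instead of 10,000 to keep the sketch fast.
sfm = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=14), threshold=0.15
)
sfm.fit(x_train, y_train)

# Boolean mask with one entry per input feature: True means "kept".
mask = sfm.get_support()
selected_names = feature_names[mask]

for name, importance, kept in zip(
    feature_names, sfm.estimator_.feature_importances_, mask
):
    print(f"{name:20s} importance={importance:.3f} kept={kept}")
print("selected:", list(selected_names))
```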

In [47]:
#Training a new RandomForestClassifier on the selected features:


clf_important = RandomForestClassifier(n_estimators=10000, n_jobs=-1, random_state = 14)

In [48]:
clf_important.fit(X_important_train, y_train)

In [49]:
clf_important.score(X_important_test, y_test)

0.9666666666666667
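
On a 60-sample test set, 0.95 and 0.9667 correspond to 57 and 58 correct predictions, so the two models differ by a single sample. A cross-validated comparison gives a steadier picture. The sketch below is one possible way to do this, not the notebook's method: it wraps selection and classification in a pipeline so that the selector is refitted inside each fold. The 5-fold `StratifiedKFold` and `n_estimators=200` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=14)

# Baseline: forest trained on all four features.
# Assumption: 200 trees instead of 10,000 to keep the sketch fast.
all_features = RandomForestClassifier(n_estimators=200, random_state=14)

# Selection + retraining, performed inside each fold to avoid leakage.
selected = make_pipeline(
    SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=14),
        threshold=0.15,
    ),
    RandomForestClassifier(n_estimators=200, random_state=14),
)

scores_all = cross_val_score(all_features, X, y, cv=cv)
scores_sel = cross_val_score(selected, X, y, cv=cv)

print(f"all features:      {scores_all.mean():.3f} +/- {scores_all.std():.3f}")
print(f"selected features: {scores_sel.mean():.3f} +/- {scores_sel.std():.3f}")
```

If the two means fall within each other's spread, the reduced model can be considered comparable to the full one rather than strictly better.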

In [50]:
#Counting the number of estimators
len(clf_important.estimators_)

10000