# Random Forest Algorithm - Iris Dataset
## Classification

## Step: Loading the Iris Dataset
The Iris dataset is a well-known dataset in the machine learning community. It consists of 150 samples of iris flowers, with 50 samples from each of three species (setosa, versicolor, and virginica).

In [2]:
from sklearn.datasets import load_iris
iris_df = load_iris()
print("iris dataset is {}".format(iris_df.DESCR))
print("iris data size is {}".format(iris_df.data.shape))

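ModuleNotFoundError: No module named 'sklearn'

This error means scikit-learn was not installed in the environment where the cell was run. Installing it (for example with `pip install scikit-learn`) and re-running the cell prints the dataset description and the data shape, `(150, 4)`.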

In [2]:
X, y = iris_df.data, iris_df.target
print("iris target size is {}".format(iris_df.target.shape))

iris target size is (150,)


__Remarks__:
1. `iris_df.data` accesses the features (input variables) of the Iris dataset: four measurements per flower (sepal length, sepal width, petal length, and petal width), stored as a 150 x 4 array. Despite the `_df` suffix, `iris_df` is a scikit-learn `Bunch` object (a dictionary-like container), not a pandas DataFrame.
2. `iris_df.target` accesses the target labels (output variable) of the Iris dataset, which correspond to the species of the flower (setosa, versicolor, or virginica). These are stored in the target attribute.
3. Thus, the line `X, y = iris_df.data, iris_df.target` assigns:
   1. `X`: the feature matrix containing the flower measurements (input data).
   2. `y`: the target array containing the species labels (output data).
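
To make the remarks above concrete, the following sketch inspects the object returned by `load_iris()`. It reuses the notebook's `iris_df`, `X`, and `y` names; the `feature_names` and `target_names` attributes are part of the returned `Bunch` but are not used elsewhere in this notebook.

```python
from sklearn.datasets import load_iris

# load_iris() returns a Bunch: a dict-like object with attribute access
iris_df = load_iris()
X, y = iris_df.data, iris_df.target

# Names of the four feature columns in X
print(iris_df.feature_names)

# Species names; the integer labels in y index into this array
print(iris_df.target_names)

# X is a (samples, features) matrix; y is a 1-D label array
print(X.shape, y.shape)

# First sample: its four measurements and the species it belongs to
print(X[0], "->", iris_df.target_names[y[0]])
```

Each row of `X` lines up with the label at the same position in `y`, which is why the two arrays can be passed together to the splitting and training functions that follow.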

## Step: Splitting Data into Training and Testing Sets
To evaluate a machine learning model fairly, it must be tested on data it did not see during training. The following code splits the data into a training set and a testing set:

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

__Remarks__:
1. `test_size=0.3` specifies that 30% of the dataset will be used for testing, while the remaining 70% will be used for training.
2. `random_state=42` seeds the random number generator used to shuffle the data before splitting. Fixing `random_state` to any integer (42 is an arbitrary choice) produces the same split every time the code is run.
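
The two remarks above can be checked directly. This is a minimal sketch: it repeats the split with the same seed to confirm that the result is identical, and it also shows the optional `stratify` argument, which the original code does not use, as one way to keep the class proportions equal in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 30% of 150 samples go to the test set, the remaining 70% to training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)

# Repeating the call with the same seed reproduces the same split
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
print(np.array_equal(X_test, X_test2))  # True

# Assumption: stratify=y is an addition, not part of the original notebook.
# It forces each species to appear in the same proportion in both sets.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print("per-class test counts (plain):     ", np.bincount(y_test))
print("per-class test counts (stratified):", np.bincount(y_test_strat))
```

With only 150 samples, a plain random split can leave the classes slightly unbalanced in the test set; stratifying removes that source of variation.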

## Step: Building Random Forest Classifier

In [4]:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

In [5]:
y_pred = model.predict(X_test)
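
The classifier above is created without a `random_state`, so the trees it grows can differ from run to run. The sketch below is one possible variation, not the notebook's original code: it fixes the forest's seed (the value 0 is an arbitrary assumption) and then inspects two things the fitted model exposes, the per-feature importances and the class probabilities behind each prediction.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris_df = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_df.data, iris_df.target, test_size=0.3, random_state=42
)

# n_estimators=100 builds 100 decision trees; random_state=0 (an arbitrary
# choice, not in the original) makes the fitted forest reproducible
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importance of each feature; the values sum to 1
for name, importance in zip(iris_df.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")

# predict() returns the most probable class; predict_proba() shows the
# averaged per-class probabilities across the trees
print(model.predict(X_test[:3]))
print(model.predict_proba(X_test[:3]))
```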

## Step: Evaluating Model Performance

In [6]:
from sklearn.metrics import accuracy_score
print(f'Accuracy: {accuracy_score(y_test, y_pred)}')

Accuracy: 1.0
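
A single accuracy figure on 45 test samples is a coarse measure. As a hedged sketch of how the evaluation could be extended, the code below adds a confusion matrix and 5-fold cross-validation. The fixed `random_state=0` on the classifier and the choice of `cv=5` are assumptions made for illustration, so the numbers it prints may not match the output above exactly.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Hold-out accuracy:", accuracy_score(y_test, y_pred))

# Rows are true classes, columns are predicted classes;
# off-diagonal entries count misclassified samples
cm = confusion_matrix(y_test, y_pred)
print(cm)

# 5-fold cross-validation: train and score on five different splits
# of the full dataset to see how much the accuracy varies
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
)
print("CV accuracy: mean={:.3f}, std={:.3f}".format(scores.mean(), scores.std()))
```

If the cross-validated mean falls noticeably below the hold-out accuracy, the hold-out score likely reflects a favorable split rather than the model's typical performance.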
