### Classifying iris flowers into three species based on their features

This code builds a simple machine learning model to classify iris flowers into one of three species based on their features. 

The Iris dataset is a well-known dataset in machine learning, containing measurements of four features (sepal length, sepal width, petal length, and petal width) of iris flowers from three different species: Setosa, Versicolor, and Virginica.

Here's a breakdown of what the code does:

1. It imports the required pieces of scikit-learn (a popular machine learning library) for dataset loading, data splitting, model training, and accuracy evaluation.

2. It loads the Iris dataset from scikit-learn using `load_iris()`.

3. It splits the dataset into features (X) and target labels (y). `X` contains the feature measurements, and `y` contains the target species labels (0 for Setosa, 1 for Versicolor, and 2 for Virginica).

4. It splits the data into training and testing sets using the `train_test_split` function. This is done to evaluate the model's performance on unseen data.

5. It creates a machine learning model, in this case, a Random Forest Classifier, and trains the model using the training data.

6. It uses the trained model to make predictions on the testing data.

7. It calculates the accuracy of the model by comparing the predicted labels to the actual labels in the testing data and prints the accuracy score.

The primary goal of this code is to demonstrate a basic example of a machine learning classification task.

In [2]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [3]:
iris = datasets.load_iris()

The Iris dataset is a well-known dataset in machine learning and statistics. It was introduced by the British biologist and statistician Ronald A. Fisher in 1936 and has become a common dataset for teaching and practicing machine learning and data analysis techniques.

*The Iris dataset consists of the following components:*

1. **Features**:
   - Sepal Length (in centimeters)
   - Sepal Width (in centimeters)
   - Petal Length (in centimeters)
   - Petal Width (in centimeters)

2. **Target**:
   - Species
     - Setosa
     - Versicolor
     - Virginica

The dataset contains a total of 150 samples (iris flowers), with each species having 50 samples. The features represent measurements of the sepal and petal of each iris flower, and the target represents the species of the iris flower.
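The counts described above can be checked directly on the loaded dataset. The following is a minimal sketch, not part of the original notebook; the variable name `class_counts` is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix: one row per flower, one column per measurement
print(iris.data.shape)          # (150, 4)
print(iris.feature_names)
print(list(iris.target_names))

# Number of samples for each class label (0, 1, 2)
class_counts = np.bincount(iris.target)
print(class_counts)             # [50 50 50]
```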

In [6]:
from sklearn.datasets import load_iris

iris = load_iris()

# Accessing the first 10 rows of the feature data
feature_data = iris.data[:10]

# Accessing the corresponding target labels
target_labels = iris.target[:10]

print("Feature data (first 10 rows):")
print(feature_data)

print("Target labels (first 10 elements):")
print(target_labels)


Feature data (first 10 rows):
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]]
Target labels (first 10 elements):
[0 0 0 0 0 0 0 0 0 0]


In [7]:
X, y = iris.data, iris.target

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
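
To make the effect of `test_size=0.2` concrete, the sketch below repeats the split and prints the resulting shapes. The second call, with `stratify=y`, is an optional variation that the original notebook does not use; it is shown only as one way to keep the class proportions identical in both subsets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same call as in the notebook: 20% of the 150 samples are held out
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Optional variation (assumption, not in the notebook): stratified split
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_s))        # [10 10 10]
```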

In [9]:
model = RandomForestClassifier()
model.fit(X_train, y_train)

*Feature Importances: the importance scores computed by the Random Forest model. They show which features have the most impact on the model's predictions, and in scikit-learn they are normalized to sum to 1.*

In [11]:
# List of feature names 
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

feature_importances = model.feature_importances_
print("Feature Importances:")
for feature, importance in zip(feature_names, feature_importances):
    print(f"{feature}: {importance}")


Feature Importances:
sepal_length: 0.080221304519472
sepal_width: 0.03229055951394592
petal_length: 0.43326923749471485
petal_width: 0.4542188984718673


*Number of Estimators (Trees): the number of decision trees in the Random Forest ensemble. More trees generally give more stable predictions at the cost of longer training and prediction time, so it is worth experimenting with different ensemble sizes to find a good balance for the task. Because the model was created without arguments, the value shown here is scikit-learn's default.*

In [12]:
n_estimators = model.n_estimators
print(f"Number of Estimators (Trees): {n_estimators}")

Number of Estimators (Trees): 100
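
One way to experiment with the ensemble size is to compare cross-validated accuracy for a few values of `n_estimators`. This is a rough sketch rather than part of the original notebook: the candidate sizes (10, 50, 100), the 5-fold setting, and `random_state=0` are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

mean_scores = {}
for n_trees in (10, 50, 100):  # hypothetical candidate sizes
    candidate = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    # 5-fold cross-validation returns one accuracy value per fold
    scores = cross_val_score(candidate, X, y, cv=5)
    mean_scores[n_trees] = scores.mean()
    print(f"{n_trees} trees: mean CV accuracy = {scores.mean():.3f}")
```

On a dataset as small and clean as Iris, the differences between these sizes are likely to be minor, so the comparison is mainly useful as a template for larger problems.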


*Model Accuracy: the fraction of test samples that the model classifies correctly.*

In [13]:
accuracy = model.score(X_test, y_test)
print(f"Model Accuracy: {accuracy}")

Model Accuracy: 1.0


*Model Parameters: the parameters that were used to initialize the model, such as `n_estimators`, `max_depth`, and others. Values that were not set explicitly appear with their defaults.*

In [14]:
model_parameters = model.get_params()
print("Model Parameters:")
for param, value in model_parameters.items():
    print(f"{param}: {value}")

Model Parameters:
bootstrap: True
ccp_alpha: 0.0
class_weight: None
criterion: gini
max_depth: None
max_features: sqrt
max_leaf_nodes: None
max_samples: None
min_impurity_decrease: 0.0
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0.0
n_estimators: 100
n_jobs: None
oob_score: False
random_state: None
verbose: 0
warm_start: False
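
Note that `random_state` is `None` in the listing above, so retraining the model can produce slightly different trees and importances each time. The sketch below, which is an illustration and not part of the original notebook, shows that fixing `random_state` makes two fits identical, and that `set_params` can change a parameter on an existing estimator. The seed 42 and `max_depth=3` are arbitrary example values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Two forests built with the same seed are trained identically
model_a = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
model_b = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
same = np.allclose(model_a.feature_importances_, model_b.feature_importances_)
print(f"Identical importances: {same}")

# set_params updates parameters in place; get_params reads them back
tuned = RandomForestClassifier()
tuned.set_params(max_depth=3, random_state=42)
params = tuned.get_params()
print(params["max_depth"], params["random_state"])  # 3 42
```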


In [15]:
predictions = model.predict(X_test)

In [16]:
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')

Accuracy: 1.0
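
The two accuracy cells above compute the same quantity: for a classifier, `model.score` returns the mean accuracy, which is what `accuracy_score` reports for the predictions. The sketch below checks this and adds a confusion matrix to show the per-class breakdown. It is a self-contained illustration that assumes `random_state=0` for the model, so its numbers need not match the notebook's run exactly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: fixed seed for repeatability (the notebook leaves it unset)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
predictions = model.predict(X_test)

# Three equivalent ways to compute accuracy
acc_metric = accuracy_score(y_test, predictions)
acc_score = model.score(X_test, y_test)
acc_manual = np.mean(predictions == y_test)
print(acc_metric, acc_score, acc_manual)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, predictions, labels=[0, 1, 2])
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the number of test samples equals the accuracy.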
