# Iris Dataset Encoding Script

This script preprocesses the Iris dataset by encoding the `species` column as numerical values. It loads the dataset from a CSV file, derives the codes from the `target` column, which stores numerical representations of the species, and saves the encoded data to a new CSV file.

## Features

1. **Loading the Dataset**  
   We load the Iris dataset using the `pandas` library. The dataset should include two key columns:
   - `species`: Contains the names of the species in the dataset.
   - `target`: Contains numerical codes representing species. This column will be used to encode the species names.

2. **Checking for Target Column**  
   The script checks if the `target` column exists in the dataset. If it is found, we proceed with the encoding. If it is not present, the script notifies us that the column is missing.

3. **Label Encoding**  
   The script overwrites the `species` column with integer codes derived from the `target` column: each distinct `target` value is assigned a code, numbered from 0 in sorted order. Once the encoding is complete, the `target` column is no longer needed and is dropped from the dataset.

4. **Saving the Encoded Dataset**  
   After encoding, the updated dataset is saved as `encoded_iris.csv`. The file will contain all the original columns, except the `target` column, and the `species` column will now hold the encoded numerical values.

5. **Error Handling**  
   The save step is wrapped in a `try`/`except` block, so any error raised while writing the file is caught and printed for debugging. A missing `target` column is reported with a message instead of raising an exception.
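
The check-then-encode logic described in the features above can be tried without a CSV file. The following is a minimal sketch, assuming a small hand-built DataFrame (the rows and the name `df` are illustrative, not taken from the real dataset):

```python
import pandas as pd

# Hypothetical in-memory stand-in for the CSV, using column names from the example tables.
df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0, 1.3],
    "species": ["setosa", "versicolor", "virginica", "setosa"],
    "target": [0, 1, 2, 0],
})

if "target" in df.columns:
    # Category codes number the sorted unique values of `target` starting from 0.
    df["species"] = df["target"].astype("category").cat.codes
    # `target` is redundant once `species` holds the codes.
    df = df.drop(columns=["target"])
else:
    print("The 'target' column does not exist in the dataset.")

print(df)
```

Because the codes come from `target`, the original text in `species` plays no part in the result; it is simply overwritten.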

## Steps to Run the Script

1. Ensure the dataset includes the required columns (`species` and `target`).
2. Replace `"iris_dataset.csv"` in the `pd.read_csv` call with the actual path to the dataset.
3. Run the script, and the output will display the first few rows of the original and encoded datasets.
4. The encoded dataset will be saved as `encoded_iris.csv`.
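
If editing the path inside the script is inconvenient, the same steps could be wrapped in a function that takes the input and output paths as arguments. This is only a sketch: `encode_iris` and its parameters are hypothetical names, and the demonstration writes a throwaway two-row file to a temporary directory instead of using the real dataset.

```python
import os
import tempfile

import pandas as pd


def encode_iris(input_path, output_path):
    """Hypothetical wrapper around the script's steps; returns the encoded frame or None."""
    data = pd.read_csv(input_path)
    if "target" not in data.columns:
        print("The 'target' column does not exist in the dataset.")
        return None
    # Same encoding as the script: category codes derived from `target`.
    data["species"] = data["target"].astype("category").cat.codes
    data = data.drop(columns=["target"])
    data.to_csv(output_path, index=False)
    return data


# Demonstration on a temporary file so nothing in the working directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "iris_dataset.csv")
    result_path = os.path.join(tmp, "encoded_iris.csv")
    pd.DataFrame({
        "sepal_length": [5.1, 4.9],
        "species": ["setosa", "setosa"],
        "target": [0, 0],
    }).to_csv(source, index=False)

    encoded = encode_iris(source, result_path)
    saved = pd.read_csv(result_path)
    print(saved)
```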

## Example Output

### Original Dataset (Before Encoding):

| sepal_length | sepal_width | petal_length | petal_width | species    | target |
|--------------|-------------|--------------|-------------|------------|--------|
| 5.1          | 3.5         | 1.4          | 0.2         | setosa     | 0      |
| 4.9          | 3.0         | 1.4          | 0.2         | setosa     | 0      |

### Encoded Dataset (After Encoding):

| sepal_length | sepal_width | petal_length | petal_width | species |
|--------------|-------------|--------------|-------------|---------|
| 5.1          | 3.5         | 1.4          | 0.2         | 0       |
| 4.9          | 3.0         | 1.4          | 0.2         | 0       |
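
After encoding, the table no longer shows which code stands for which species. When the column being encoded holds text labels, one way to keep a key is to read it from the categorical before discarding it. The sketch below assumes a hypothetical `labels` series with made-up rows:

```python
import pandas as pd

# Hypothetical label column; the values are illustrative only.
labels = pd.Series(["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"])

categorical = labels.astype("category")
codes = categorical.cat.codes

# Categories are stored in sorted order, so position i is the label for code i.
code_to_label = dict(enumerate(categorical.cat.categories))

print(codes.tolist())
print(code_to_label)
```

Note that the codes follow the alphabetical order of the labels, not the order in which they first appear in the data.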

## Requirements

- **pandas**: Ensure you have the `pandas` library installed by running the following command:

```bash
pip install pandas
```

import pandas as pd

# Path to the Iris dataset CSV file
# Ensure the dataset has columns like 'species' and 'target' for encoding to proceed
iris_data = pd.read_csv("iris_dataset.csv")

# Display basic information about the dataset
print("Original Iris Dataset:\n")
print(iris_data.head())  # Display the first five rows of the dataset
print("\nColumns in the dataset:", iris_data.columns.tolist())  # Show all column names

# Check if the 'target' column exists
# The 'target' column typically stores numerical representations of species in datasets
if 'target' in iris_data.columns:
    # Overwrite (or create) the 'species' column with integer category codes
    # derived from 'target'; codes follow the sorted order of its unique values
    iris_data['species'] = iris_data['target'].astype('category').cat.codes

    # Remove the 'target' column as it is no longer needed after encoding
    iris_data.drop(columns=['target'], inplace=True)

    # Save the newly encoded dataset into a new CSV file
    try:
        # 'encoded_iris.csv' is the new file that will contain the updated dataset
        iris_data.to_csv("encoded_iris.csv", index=False)
        print("\nEncoded Iris Dataset saved successfully.")
    except Exception as e:
        # In case there's an error saving the file, we catch and display the error message
        print(f"Error saving the file: {e}")

    # Display the first few rows of the newly encoded dataset
    print("\nEncoded Iris Dataset:\n")
    print(iris_data.head())
else:
    # If the 'target' column doesn't exist, print a message
    print("The 'target' column does not exist in the dataset.")


Original Iris Dataset:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

        target  
0  Iris-setosa  
1  Iris-setosa  
2  Iris-setosa  
3  Iris-setosa  
4  Iris-setosa  

Columns in the dataset: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)', 'target']

Encoded Iris Dataset saved successfully.

Encoded Iris Dataset:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4