
# Iris Species Classifier

This code classifies three iris species, using the length and width of their petals and sepals as parameters.

This notebook focuses on the practical application of basic classification ideas without following an established methodology. It uses an empirical approach loosely based on `weights` and `comparisons`.

For the same reason, this exercise does not use specialized libraries such as pandas or scikit-learn for data handling or algorithms.


**Team 2:**
- Dilan González Castañeda A00831905
- André Ulises Zenteno Ruiz A00835044
- Carolina Murillo Guajardo A00834868
- Karim Omar Martínez Bazaldúa  A00832999

## Importance of each parameter

We derive weights so that the parameter that accounts for the largest number of correct results carries the most value.
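
In the code that follows, a value counts as correctly classified when it lies within one sample standard deviation of its species mean for that parameter. Written out, with $x_1, \dots, x_n$ the values of one parameter for one species (the symbols are introduced here for illustration; the code uses `values`, `mean` and `std_dev`):

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
\text{range} = \left[\bar{x} - s,\ \bar{x} + s\right]
$$

For example, a parameter with mean 5.0 and standard deviation 0.4 would give the range [4.6, 5.4] (illustrative numbers, not taken from the dataset).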

In [None]:
import csv
import math
import random

def get_species(df):
    species = []
    for row in df:
        if row[4] not in species:
            species.append(row[4])
    return species

def calculate_mean(values):
    return sum(values) / len(values)

def calculate_std_dev(values, mean):
    variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)
    return math.sqrt(variance)

def calculate_mean_std(values):
    mean = calculate_mean(values)
    std_dev = calculate_std_dev(values, mean)
    return mean, std_dev

def range_per_species(df, species_list):
    ranges = {}
    for species in species_list:
        species_ranges = {}
        for i in range(4):  # 4 parameters (1: sepal length, 2: sepal width, 3: petal length, 4: petal width)
            values = [float(row[i]) for row in df if row[4] == species]
            mean, std_dev = calculate_mean_std(values)
            species_ranges[f'param_{i+1}'] = (mean - std_dev, mean + std_dev)
        ranges[species] = species_ranges
    return ranges

def evaluate_classification(df, species_ranges):
    results = {species: {f'param_{i+1}': {"correct": 0, "incorrect": 0} for i in range(4)} for species in species_ranges}
    for row in df:
        actual_species = row[4]
        for i in range(4):  # 4 parameters
            param_range = species_ranges[actual_species][f'param_{i+1}']
            if param_range[0] <= float(row[i]) <= param_range[1]:
                results[actual_species][f'param_{i+1}']["correct"] += 1
            else:
                results[actual_species][f'param_{i+1}']["incorrect"] += 1
    return results

def print_results(results):
    for species, params in results.items():
        print(f"Species: {species}")
        for param, counts in params.items():
            print(f"  {param}:")
            print(f"    Correctly classified: {counts['correct']}")
            print(f"    Incorrectly classified: {counts['incorrect']}")
        total_correct = sum(counts['correct'] for counts in params.values())
        total_incorrect = sum(counts['incorrect'] for counts in params.values())
        print(f"  Total Correctly classified: {total_correct}")
        print(f"  Total Incorrectly classified: {total_incorrect}")
        print(f"  Total: {total_correct + total_incorrect}\n")

if __name__ == '__main__':
    df = []

    with open('/content/Iris.csv', newline='') as csvfile:
        reader = csv.reader(csvfile)  # comma-delimited by default
        next(reader)  # Skip the header row
        for row in reader:
            df.append(row[1:])  # Drop the first column, keeping the 4 measurements and the species

    species_list = get_species(df)
    species_ranges = range_per_species(df, species_list)
    results = evaluate_classification(df, species_ranges)
    print_results(results)


Species: Iris-setosa
  param_1:
    Correctly classified: 31
    Incorrectly classified: 19
  param_2:
    Correctly classified: 32
    Incorrectly classified: 18
  param_3:
    Correctly classified: 40
    Incorrectly classified: 10
  param_4:
    Correctly classified: 35
    Incorrectly classified: 15
  Total Correctly classified: 138
  Total Incorrectly classified: 62
  Total: 200

Species: Iris-versicolor
  param_1:
    Correctly classified: 35
    Incorrectly classified: 15
  param_2:
    Correctly classified: 33
    Incorrectly classified: 17
  param_3:
    Correctly classified: 37
    Incorrectly classified: 13
  param_4:
    Correctly classified: 35
    Incorrectly classified: 15
  Total Correctly classified: 140
  Total Incorrectly classified: 60
  Total: 200

Species: Iris-virginica
  param_1:
    Correctly classified: 35
    Incorrectly classified: 15
  param_2:
    Correctly classified: 35
    Incorrectly classified: 15
  param_3:
    Correctly classified: 35
    Incorrectl

Here we see that the parameters with the most correct classifications repeat across species. Ordered from most to least correct, they are:

```
'param_3'
'param_4'
'param_1'
'param_2'
```

We define arbitrary weights. In more advanced algorithms such weights have a mathematical meaning; here, the logic is to set them in proportion to the number of correct classifications each parameter achieved relative to the total:
```
weights = {
        'param_1': 0.17,
        'param_2': 0.13,
        'param_3': 0.38,
        'param_4': 0.32
    }
```
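
As a rough illustration of the "proportion of correct classifications" idea, the sketch below normalizes the correct counts printed earlier into weights that sum to 1. It is not the procedure used to obtain the values above: `proportional_weights` is a hypothetical helper, and only the setosa and versicolor counts are used because the virginica output is cut off.

```python
# Correct counts copied from the printed output above (setosa and versicolor only;
# the virginica output is truncated, so it is left out of this sketch).
correct_counts = {
    'Iris-setosa': {'param_1': 31, 'param_2': 32, 'param_3': 40, 'param_4': 35},
    'Iris-versicolor': {'param_1': 35, 'param_2': 33, 'param_3': 37, 'param_4': 35},
}

def proportional_weights(counts_by_species):
    """Hypothetical helper: weight each parameter by its share of all correct counts."""
    totals = {}
    for params in counts_by_species.values():
        for param, correct in params.items():
            totals[param] = totals.get(param, 0) + correct
    grand_total = sum(totals.values())
    return {param: total / grand_total for param, total in totals.items()}

example_weights = proportional_weights(correct_counts)

# Rank parameters from most to least correct classifications.
ranking = sorted(example_weights, key=example_weights.get, reverse=True)
print(ranking)
for param in ranking:
    print(f"{param}: {example_weights[param]:.2f}")
```

With these two species the ranking comes out as `param_3`, `param_4`, `param_1`, `param_2`, but the normalized values are much flatter (roughly 0.28, 0.25, 0.24, 0.23) than the hand-picked weights, which deliberately widen the gap between parameters.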

We set an arbitrary threshold of 65%, which we expect to improve as we learn better techniques.
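
The notebook does not spell out how the threshold is meant to be applied. One possible reading, sketched here under that assumption, is to accept the best-scoring species only when its score reaches the threshold and to flag the record otherwise. `apply_threshold` and the `'uncertain'` label are hypothetical names, and the scores are illustrative.

```python
def apply_threshold(scores, umbral=0.65, fallback='uncertain'):
    """Hypothetical helper: return the top species only if its score reaches `umbral`."""
    best_species = max(scores, key=scores.get)
    best_score = scores[best_species]
    if best_score >= umbral:
        return best_species, best_score
    # Below the threshold the prediction is flagged instead of trusted.
    return fallback, best_score

# Illustrative scores, not taken from the dataset.
confident = apply_threshold({'Iris-setosa': 0.87, 'Iris-versicolor': 0.13, 'Iris-virginica': 0.0})
doubtful = apply_threshold({'Iris-setosa': 0.17, 'Iris-versicolor': 0.17, 'Iris-virginica': 0.0})
print(confident)
print(doubtful)
```

A design note: with a rule like this, low-score predictions would be reported as uncertain instead of being counted as right or wrong, so any accuracy figure would need to state how flagged records are treated.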

## Testing our algorithm
To test our algorithm, we draw 100 random records from the same dataset and report the accuracy, along with a CSV that adds 3 extra columns:

*   Actual classification
*   Predicted classification
*   Confidence percentage of the prediction



In [None]:
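def test_random_records(df, species_ranges, num_records=100, umbral=0.65):
    # Note: `umbral` (the 65% threshold) is accepted here but is not applied anywhere in this function.
    random_records = random.sample(df, min(num_records, len(df)))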

    # Define weights
    weights = {
        'param_1': 0.17,
        'param_2': 0.13,
        'param_3': 0.38,
        'param_4': 0.32
    }

    results = []

    # Iterate through random records
    for record in random_records:  # num_records records
        scores = {}

        # Calculate scores for each species
        for species in species_ranges:  # 3 species
            weighted_score = 0
            total_weight = sum(weights.values())

            # Iterate through parameters
            for i in range(4):  # 4 parameters
                param_key = f'param_{i+1}'
                param_value = float(record[i])
                param_range = species_ranges[species][param_key]

                # Check if the parameter is within the specified range
                if param_range[0] <= param_value <= param_range[1]:
                    weighted_score += weights[param_key]

            # Calculate the score percentage
            score_percentage = weighted_score / total_weight
            scores[species] = score_percentage

        # Find the species with the highest score
        predicted_species = max(scores, key=scores.get)
        highest_score = scores[predicted_species]

        results.append([record, predicted_species, highest_score])


    # Calculate accuracy
    correct_predictions = sum(1 for result in results if result[0][4] == result[1])
    accuracy = (correct_predictions / len(results)) * 100
    print(f"\nAccuracy: {accuracy:.2f}%")
    print(f"Correct classifications: {correct_predictions} out of {len(results)}\n")

    # Print results
    print("SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,ActualSpecies,PredictedSpecies,Score")
    for result in results:
        print(f"{result[0][0]},{result[0][1]},{result[0][2]},{result[0][3]},{result[0][4]},{result[1]},{result[2]:.2f}")

    with open('results.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'ActualSpecies', 'PredictedSpecies', 'Score'])
        for result in results:
            writer.writerow([result[0][0], result[0][1], result[0][2], result[0][3], result[0][4], result[1], result[2]])


if __name__ == '__main__':
    df = []

    with open('/content/Iris.csv', newline='') as csvfile:
        reader = csv.reader(csvfile)  # comma-delimited by default
        next(reader)  # Skip the header row
        for row in reader:
            df.append(row[1:])  # Drop the first column, keeping the 4 measurements and the species

    species_list = get_species(df)
    species_ranges = range_per_species(df, species_list)

    # Test random records
    test_random_records(df, species_ranges)


Accuracy: 92.00%
Correct classifications: 92 out of 100

SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,ActualSpecies,PredictedSpecies,Score
6.7,3.3,5.7,2.5,Iris-virginica,Iris-virginica,0.55
5.5,2.4,3.7,1.0,Iris-versicolor,Iris-versicolor,0.17
6.0,2.9,4.5,1.5,Iris-versicolor,Iris-versicolor,1.00
6.4,3.2,4.5,1.5,Iris-versicolor,Iris-versicolor,0.87
4.8,3.4,1.6,0.2,Iris-setosa,Iris-setosa,1.00
5.1,2.5,3.0,1.1,Iris-versicolor,Iris-setosa,0.17
5.1,3.8,1.6,0.2,Iris-setosa,Iris-setosa,0.87
4.5,2.3,1.3,0.3,Iris-setosa,Iris-setosa,0.70
6.2,2.9,4.3,1.3,Iris-versicolor,Iris-versicolor,1.00
6.7,3.1,5.6,2.4,Iris-virginica,Iris-virginica,0.68
6.3,2.8,5.1,1.5,Iris-virginica,Iris-virginica,0.68
5.2,3.5,1.5,0.2,Iris-setosa,Iris-setosa,1.00
5.0,3.5,1.6,0.6,Iris-setosa,Iris-setosa,0.68
4.3,3.0,1.1,0.1,Iris-setosa,Iris-versicolor,0.13
5.6,2.8,4.9,2.0,Iris-virginica,Iris-virginica,0.45
7.2,3.0,5.8,1.6,Iris-virginica,Iris-virginica,0.68
6.3,2.7,4.9,1.8,Iris-virginica,Iris-virginica,0.62
4.9,3.1,1.