## Metadata Exploration Insights

In [73]:
import pandas as pd
metadata = pd.read_csv("data/taxonomy.csv")

In [74]:
metadata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 206 entries, 0 to 205
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   primary_label    206 non-null    object
 1   inat_taxon_id    206 non-null    int64 
 2   scientific_name  206 non-null    object
 3   common_name      206 non-null    object
 4   class_name       206 non-null    object
dtypes: int64(1), object(4)
memory usage: 8.2+ KB


In [75]:
metadata.shape

(206, 5)

In [76]:
metadata.head()

|   | primary_label | inat_taxon_id | scientific_name | common_name | class_name |
|---|---------------|---------------|-----------------|-------------|------------|
| 0 | 1139490 | 1139490 | Ragoniella pulchella | Ragoniella pulchella | Insecta |
| 1 | 1192948 | 1192948 | Oxyprora surinamensis | Oxyprora surinamensis | Insecta |
| 2 | 1194042 | 1194042 | Copiphora colombiae | Copiphora colombiae | Insecta |
| 3 | 126247 | 126247 | Leptodactylus insularum | Spotted Foam-nest Frog | Amphibia |
| 4 | 1346504 | 1346504 | Neoconocephalus brachypterus | Neoconocephalus brachypterus | Insecta |


In [77]:
metadata["class_name"].value_counts()

class_name
Aves        146
Amphibia     34
Insecta      17
Mammalia      9
Name: count, dtype: int64

In [78]:
# Rows where primary_label, compared as a string, differs from inat_taxon_id
diffs = metadata[metadata["primary_label"].astype(str) != metadata["inat_taxon_id"].astype(str)]
diffs["class_name"].value_counts()

class_name
Aves    146
Name: count, dtype: int64

In [79]:
# Rows where the common name is not just a copy of the scientific name
diffs_name = metadata[metadata["scientific_name"].astype(str) != metadata["common_name"].astype(str)]
diffs_name["class_name"].value_counts()

class_name
Aves        146
Amphibia     34
Mammalia      9
Insecta       3
Name: count, dtype: int64

## Metadata Summary – Key Findings

- Total entries in metadata: **206 species**
- Metadata contains 5 main columns:
  - `primary_label` – unique identifier used in prediction
  - `inat_taxon_id` – iNaturalist reference ID
  - `scientific_name` – Latin binomial species name
  - `common_name` – human-friendly name
  - `class_name` – biological class

### Class Distribution:
- **Aves** (Birds): 146 species
- **Amphibia** (Amphibians): 34 species
- **Insecta** (Insects): 17 species
- **Mammalia** (Mammals): 9 species
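
The counts above can also be read as shares of the whole dataset. The sketch below is illustrative only: it rebuilds the counts by hand from the `value_counts()` output instead of loading `data/taxonomy.csv`, and the variable names `class_counts` and `class_share` are not part of the original notebook.

```python
import pandas as pd

# Class counts copied from the value_counts() output in the notebook.
class_counts = pd.Series(
    {"Aves": 146, "Amphibia": 34, "Insecta": 17, "Mammalia": 9},
    name="count",
)

# Share of each class as a percentage of all species, rounded to one decimal.
class_share = (class_counts / class_counts.sum() * 100).round(1)
print(class_share)
```

On the real DataFrame, `metadata["class_name"].value_counts(normalize=True)` should give the same proportions directly. Birds account for roughly 71% of the species, so the classes are clearly imbalanced.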

### Label Consistency Check:
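- `primary_label` and `inat_taxon_id` look identical at a glance: they match in all five rows shown by `head()`
- Comparing both columns as `str` shows they differ in **146 rows**, all of them in **Aves**
- Since Aves has 146 species in total, **every bird entry** is affected and no other class is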
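
One way to express this check as a per-class mismatch rate is sketched below. It is a minimal illustration on a toy frame, not the notebook's own code: the three non-bird rows are taken from the `head()` output, while the two Aves rows (`birdA`, `birdB`, IDs 111 and 222) are hypothetical placeholders, because the notebook does not show any real bird labels.

```python
import pandas as pd

# Toy frame: the first three rows come from metadata.head();
# the two Aves rows are hypothetical placeholders, not real entries.
toy = pd.DataFrame(
    {
        "primary_label": ["1139490", "126247", "1346504", "birdA", "birdB"],
        "inat_taxon_id": [1139490, 126247, 1346504, 111, 222],
        "class_name": ["Insecta", "Amphibia", "Insecta", "Aves", "Aves"],
    }
)

# True where the label, read as a string, differs from the taxon ID.
mismatch = toy["primary_label"].astype(str) != toy["inat_taxon_id"].astype(str)

# Fraction of mismatching rows per class: 1.0 means every row in the class differs.
mismatch_rate = mismatch.groupby(toy["class_name"]).mean()
print(mismatch_rate)
```

Applied to the full `metadata` frame, a rate of 1.0 for Aves and 0.0 for the other classes would confirm the finding in a single table, without comparing two separate `value_counts()` outputs by eye.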

### Naming Check:
- `scientific_name` and `common_name` differ in 192 of the 206 species, so they are **identical in 14 species**
- All 14 of these cases belong to the **Insecta** class: only 3 of the 17 insects have a distinct common name
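
The figure of 14 is derived by subtraction instead of being read off a cell output, so the arithmetic is worth spelling out. The sketch below uses only the counts printed in cells 77 and 79; the names `total`, `differing` and `identical` are illustrative.

```python
import pandas as pd

# Species per class (cell 77) and species whose two names differ (cell 79).
total = pd.Series({"Aves": 146, "Amphibia": 34, "Insecta": 17, "Mammalia": 9})
differing = pd.Series({"Aves": 146, "Amphibia": 34, "Mammalia": 9, "Insecta": 3})

# Subtraction aligns on the index labels, so the differing order is harmless.
identical = total - differing

# Keep only classes that have at least one species with identical names.
print(identical[identical > 0])
```

On the real data, the same result could be obtained directly with `(metadata["scientific_name"] == metadata["common_name"]).groupby(metadata["class_name"]).sum()`, which avoids the manual subtraction.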

