# Ant Community Census – Object‑Oriented CSV Processing

This notebook works with two related CSV tables from the annual ant community census (1977–2009). The first table, `ant_species.csv`, is a reference list that stores taxonomic information and the short species codes used in the field. The second table, `ant_bait.csv`, contains the bait‑pile observations: for each bait station (identified by plot and stake), in a given month and year, the field crew recorded which species visited the bait and how many individuals were counted.

The goal here is not only to read the files, but to model them in an object‑oriented way. Each CSV structure is mapped to a Python class. Every row becomes an instance of the corresponding class, letting us use methods and attributes instead of raw dictionaries. This provides three benefits. First, it makes the code easier to understand because each concept in the data (a species record or a bait observation) has a clear home. Second, it allows us to attach behavior to the data, such as printing a nicely formatted label. Third, it supports safer downstream analyses, because type conversion and linkage (e.g., abundance is converted to an integer; a species code is resolved to an `AntSpecies` object when one exists) happen in one place.

The work is divided into small, modular steps. Each step appears in its own cell so you can execute and inspect them independently:

1. Import libraries.
2. Define the `AntSpecies` class and its constructor/printing logic.
3. Define the `AntBait` class and its constructor/printing logic.
4. Write loader functions for each CSV to create lists of objects.
5. Load the provided data, so that each bait's `species` code is replaced with a real `AntSpecies` object.
6. Write a function to compute average abundance for a chosen species code.
7. Outline optional extensions and checks.

Throughout the notebook, markdown explanations describe what each code block does and why. The explanations are detailed so that someone new to Python or to object‑oriented data modeling can follow the reasoning step by step.

In [None]:
# Step 1: Imports

import csv
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Union

## Step 2 – Modeling `ant_species.csv` as a Class

The file `ant_species.csv` is essentially a lookup table. Each line describes a species that might appear in the bait census, and the most important key is `speciescode`. Field observers used these codes in their notebooks because they are short and standardized (for example, `cono bico`). The remaining columns store taxonomic context (genus, species epithet, tribe, subfamily) and sometimes alternative names or identification notes.

To represent this structure in Python, we create an `AntSpecies` class with one attribute per CSV column. The constructor (`__init__`) receives all attributes as parameters so that an instance can be built directly from a CSV row. Even if some values are missing in the raw file (for example, many rows have no `altgenus`), the constructor still accepts them; we normalize empty strings to `None` when reading.

The class also needs a method that allows easy printing of a species object. In Python, the conventional way to support “pretty printing” is to implement `__str__`. When `print(species_obj)` is called, Python uses `__str__` to produce a human‑readable label. The requirement asks for the format “genus species”, e.g., `Camponotus festinatus`. That is exactly what `__str__` returns here. If any part is missing, the method falls back to the species code so that printing still yields something informative.

We implement the constructor explicitly (instead of relying on an automatic generator) to satisfy the assignment requirement that all attributes are received as parameters. The class stays lightweight: it is a clean container for the data plus a small amount of behavior (string formatting). Later, bait observations will reference these objects instead of raw strings, giving us a strongly connected dataset.
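
To make the difference between `__str__` and `__repr__` concrete, here is a minimal sketch. `SpeciesLabel` is a hypothetical three-field stand-in used only for illustration (the full class follows in the next cell), and the code `camp fest` is made up rather than taken from the census files.

```python
class SpeciesLabel:
    """Hypothetical stand-in for AntSpecies, reduced to three fields."""

    def __init__(self, speciescode, genus, species):
        self.speciescode = speciescode
        self.genus = genus
        self.species = species

    def __str__(self):
        # Human-readable form, with the fallback rule described above.
        if self.genus and self.species:
            return f"{self.genus} {self.species}"
        return self.speciescode

    def __repr__(self):
        # Unambiguous form, used when the object sits inside a container.
        return f"SpeciesLabel({self.speciescode!r})"


full = SpeciesLabel("camp fest", "Camponotus", "festinatus")  # made-up code
partial = SpeciesLabel("unkn", "", "")                        # no genus or epithet

print(full)             # print() calls __str__
print(partial)          # falls back to the species code
print([full, partial])  # a list shows each element's __repr__
```

Printing a single object gives the readable label, while printing a list of objects shows the `__repr__` form, which is why the class defines both.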

In [None]:
# Step 2: AntSpecies class

class AntSpecies:
    def __init__(
        self,
        speciescode: str,
        genus: str,
        altgenus: Optional[str],
        species: str,
        altspecies: Optional[str],
        tribe: Optional[str],
        subfamily: Optional[str],
        IDissues: Optional[str],
    ) -> None:
        self.speciescode = speciescode
        self.genus = genus
        self.altgenus = altgenus
        self.species = species
        self.altspecies = altspecies
        self.tribe = tribe
        self.subfamily = subfamily
        self.IDissues = IDissues

    def __str__(self) -> str:
        # Print as "Genus species" (e.g., Camponotus festinatus)
        if self.genus and self.species:
            return f"{self.genus} {self.species}".strip()
        return self.speciescode

    def __repr__(self) -> str:
        return f"AntSpecies({self.speciescode!r})"

## Step 3 – Modeling `ant_bait.csv` as a Class

The bait dataset records observations at bait piles left for ants to forage. Each row in `ant_bait.csv` corresponds to one species detected at one bait station in a given survey period. The columns are: `month`, `year`, `plot`, `stake`, `species`, and `abundance`. The first four locate the bait in space and time; the last two describe which ant species was present and how many individuals were counted.

We map this structure to an `AntBait` class. Again, the constructor receives all attributes as parameters. We convert numeric fields (`year`, `plot`, `stake`, `abundance`) into integers so that they behave correctly in calculations. The `month` field is left as a string because it is categorical.
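
The short sketch below shows how `int()` treats the kind of text a CSV reader delivers. Whether the census files actually contain blank or non-integer entries is not something this notebook establishes, so the failing inputs are only illustrative.

```python
# csv.DictReader always hands back strings, so conversion is explicit.
print(int("2009"))   # a clean integer string
print(int(" 12 "))   # surrounding whitespace is tolerated

# Blank or non-integer text raises ValueError instead of passing silently.
for raw in ["", "2.0", "n/a"]:
    try:
        int(raw)
    except ValueError:
        print(f"int({raw!r}) raises ValueError")
```

Because the conversion happens in the constructor, a malformed value surfaces when the row is loaded instead of later in an analysis.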

The printing requirement for bait objects is slightly different. We need a label like “species id - month, year: abundance”, for example `cono bico - July, 2009: 2`. The key detail is that “species id” refers to the field code, not the full Latin binomial. This matters because bait data uses the codes, and many ecological analyses group by those codes. Therefore, `__str__` checks whether `self.species` is an `AntSpecies` object (after we perform the replacement step). If so, it prints `self.species.speciescode`. If for some reason the mapping is missing, it falls back to the raw string.

At this moment we allow the `species` attribute to be either a string or an `AntSpecies`. This is a pragmatic choice: it lets the class be constructed directly from the CSV before we link species codes to their reference objects. Right after loading, we will replace the string with the actual `AntSpecies` instance using a lookup dictionary. Once that replacement is done, every bait observation has a rich object reference, allowing us to access taxonomic details when needed.

In [None]:
# Step 3: AntBait class

class AntBait:
    def __init__(
        self,
        month: str,
        year: Union[int, str],
        plot: Union[int, str],
        stake: Union[int, str],
        species: Union[str, AntSpecies],
        abundance: Union[int, str],
    ) -> None:
        self.month = month
        self.year = int(year)
        self.plot = int(plot)
        self.stake = int(stake)
        self.species = species
        self.abundance = int(abundance)

    def __str__(self) -> str:
        # Print as "species id - month, year: abundance"
        if isinstance(self.species, AntSpecies):
            species_id = self.species.speciescode
        else:
            species_id = str(self.species)
        return f"{species_id} - {self.month}, {self.year}: {self.abundance}"

    def __repr__(self) -> str:
        return (
            f"AntBait({self.month!r}, {self.year}, {self.plot}, "
            f"{self.stake}, {self.species!r}, {self.abundance})"
        )

## Step 4 – Loader Functions for Each CSV

With the data models in place, the next step is to read each CSV and turn rows into objects. The assignment allows the loaders to be either methods on the classes or standalone functions. Here we choose standalone functions because they keep I/O concerns separate from the core data models. This makes the classes easier to reuse in other contexts.

### Loading species

The `load_species` function takes a filename, opens the file with Python’s built‑in `csv.DictReader`, and iterates over rows. Each row is a dictionary mapping column names to strings. We pass those strings into the `AntSpecies` constructor, cleaning whitespace and converting empty strings to `None` for optional fields. Every `AntSpecies` instance is appended to a list called `species_list`. At the end, the list is returned.
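
The "empty string becomes `None`" rule can be isolated in a small helper. `clean_optional` is a hypothetical name introduced here for illustration; the loader in this notebook writes the same expression inline.

```python
from typing import Optional


def clean_optional(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: strip whitespace, return None if nothing is left."""
    # `value or ""` guards against None (e.g., a column missing from the row).
    # The trailing `or None` turns an empty result into None.
    return (value or "").strip() or None


print(repr(clean_optional("  Formicinae ")))  # 'Formicinae'
print(repr(clean_optional("   ")))            # None
print(repr(clean_optional(None)))             # None
```

Normalizing to `None` means later code can test `if sp.altgenus:` without caring whether the raw cell was empty, blank, or absent.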

### Loading bait observations

The bait loader has one extra responsibility: linking each bait record to its species description. The CSV contains only the short species code, so we pass a `species_lookup` dictionary into `load_baits`. This dictionary maps species codes to `AntSpecies` objects (it is built immediately after loading the species file). While iterating through bait rows, we look up the code in the dictionary. If a match exists, we pass the object into the `AntBait` constructor. This means the resulting `AntBait.species` attribute is already the correct `AntSpecies` object—fulfilling the “replace species attribute” requirement in one step.

If a code is missing from the lookup, we keep the raw code string. The fallback avoids crashes and makes problems visible: if you print such an observation, you will see the unknown code. In a production project we might also log a warning, but for this educational task the fallback is sufficient.

The overall pattern is consistent for both files: read a line, create an object, add it to a list, and return the list.
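
The lookup-with-fallback step can be tried without touching the real files. The sketch below feeds `csv.DictReader` an in-memory CSV with the `ant_bait.csv` column layout; the rows are made up, and a plain string stands in for the `AntSpecies` object.

```python
import csv
import io

# Made-up rows in the ant_bait.csv column layout (not real census data).
sample = io.StringIO(
    "month,year,plot,stake,species,abundance\n"
    "July,2009,1,11,cono bico,2\n"
    "July,2009,1,13,xxxx yyyy,5\n"
)

# Stand-in lookup: a string plays the role of an AntSpecies object.
species_lookup = {"cono bico": "<AntSpecies for cono bico>"}

unmatched = []
for row in csv.DictReader(sample):
    code = row["species"].strip()
    linked = species_lookup.get(code, code)  # raw code if there is no match
    if code not in species_lookup:
        unmatched.append(code)
    print(row["month"], row["year"], "->", linked)

print("Codes without a species record:", unmatched)
```

Collecting the unmatched codes in a list, as done here, is one simple way to make the fallback visible instead of silent.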

In [None]:
# Step 4: CSV loader functions

def load_species(filename: Union[str, Path]) -> List[AntSpecies]:
    '''Read ant_species.csv and return a list of AntSpecies objects.'''
    species_list: List[AntSpecies] = []
    filename = Path(filename)

    with filename.open(newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            sp = AntSpecies(
                speciescode=row['speciescode'].strip(),
                genus=row['genus'].strip(),
                altgenus=(row.get('altgenus') or '').strip() or None,
                species=row['species'].strip(),
                altspecies=(row.get('altspecies') or '').strip() or None,
                tribe=(row.get('tribe') or '').strip() or None,
                subfamily=(row.get('subfamily') or '').strip() or None,
                IDissues=(row.get('IDissues') or '').strip() or None,
            )
            species_list.append(sp)

    return species_list


def load_baits(
    filename: Union[str, Path],
    species_lookup: Dict[str, AntSpecies],
) -> List[AntBait]:
    '''Read ant_bait.csv and return a list of AntBait objects.

    The species field is replaced by the AntSpecies object from species_lookup.
    '''
    bait_list: List[AntBait] = []
    filename = Path(filename)

    with filename.open(newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            code = row['species'].strip()
            species_obj = species_lookup.get(code, code)  # fallback to raw code

            bait = AntBait(
                month=row['month'].strip(),
                year=row['year'],
                plot=row['plot'],
                stake=row['stake'],
                species=species_obj,
                abundance=row['abundance'],
            )
            bait_list.append(bait)

    return bait_list

## Step 5 – Loading the Provided Data

Now we use the loader functions on the actual files. The workflow is:

1. Load the species reference table.
2. Build a dictionary mapping codes to `AntSpecies` objects.
3. Load the bait observations using that dictionary.

After these steps, `species_list` contains one object per species record, and `bait_list` contains one object per bait observation. Every bait whose code appears in the species table now has a `species` attribute that points to the matching `AntSpecies` instance; baits with unknown codes keep the raw string. We can verify this with three simple checks. First, printing a species should display its Latin name. Second, printing a bait should display its code, month, year, and abundance. Third, to show that the linkage works, we can take a bait’s species and print its genus and species epithet. These small sanity checks build confidence that later analysis will be using well‑structured objects rather than brittle strings.

The data set is large, so in the demo we only print the first few items from each list. The point is to see formatting and types, not to review the entire census.

In [None]:
# Step 5: Load data and build lookup

species_list = load_species('ant_species.csv')
species_by_code = {sp.speciescode: sp for sp in species_list}

bait_list = load_baits('ant_bait.csv', species_by_code)

# Quick sanity checks
print('First 5 species objects:')
for sp in species_list[:5]:
    print('  ', sp)

print()  # blank line between sections
print('First 5 bait objects:')
for bait in bait_list[:5]:
    print('  ', bait)

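# Demonstrate that bait.species is an AntSpecies object.
# Use the first linked bait: unmapped codes stay as plain strings
# and have no .genus attribute.
example_bait = next(
    (b for b in bait_list if isinstance(b.species, AntSpecies)), None
)
print()
if example_bait is None:
    print('No bait record could be linked to a species.')
else:
    print('Example bait species object:', example_bait.species)
    print('Genus:', example_bait.species.genus)
    print('Species epithet:', example_bait.species.species)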

## Step 6 – Computing Average Abundance for a Species

Ecological analyses often ask how common a species is across samples. A simple summary statistic is the average abundance: across all bait observations for one species, what is the mean number of individuals recorded? Because each bait pile can contain multiple species, and because abundance varies across space and time, this metric gives a first‑pass view of dominance or rarity.

We implement a function `average_abundance` that accepts (1) a collection of baits and (2) the target species code. The collection can be a list, tuple, or dictionary. To support all three types, we normalize the input: if it is a dictionary, we iterate over its values; otherwise we iterate over it directly. This design follows Python’s “duck typing” philosophy—anything iterable of `AntBait` objects will work.

Inside the loop, we determine the species code for each bait. If `bait.species` is an `AntSpecies` object, the code is `bait.species.speciescode`. If it is a raw string (fallback case), the code is the string itself. When the code matches the target, we add the abundance to a running total and increment a counter. Finally, we compute `total / count`. If the species is absent from the collection, the function returns `nan` (not a number). Returning `nan` is useful because it signals “no data” without crashing; many numerical libraries can handle it gracefully.
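
One caveat when consuming the `nan` result: it cannot be detected with `==`. The sketch below uses the standard `math.isnan` check; the averages in the dictionary are made-up values, not census results.

```python
import math

missing = float("nan")

print(missing == missing)   # False: nan never compares equal, even to itself
print(math.isnan(missing))  # True: the reliable test
print(f"{missing:.2f}")     # formats as the text "nan"

# Hypothetical results keyed by species code; drop the "no data" entries.
averages = {"cono bico": 2.5, "xxxx yyyy": missing}
present = {code: avg for code, avg in averages.items() if not math.isnan(avg)}
print(present)
```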

The example below computes the mean abundance for three species codes. You can replace any of them with another code from the species table to explore different taxa.

In [None]:
# Step 6: Average abundance function

def average_abundance(
    baits: Union[Iterable[AntBait], Dict[object, AntBait]],
    species_code: str,
) -> float:
    '''Return the average abundance for a given species code.'''
    if isinstance(baits, dict):
        iterable = baits.values()
    else:
        iterable = baits

    total = 0
    count = 0

    for bait in iterable:
        code = (
            bait.species.speciescode
            if isinstance(bait.species, AntSpecies)
            else str(bait.species)
        )
        if code == species_code:
            total += bait.abundance
            count += 1

    return total / count if count else float('nan')


# Demo: average abundance for three species codes
for code in ['cono bico', 'novo cock', 'phei sita']:
    avg = average_abundance(bait_list, code)
    print(f"Average abundance for {code}: {avg:.2f}")

## Step 7 – Optional Extensions and Checks

Because the bait observations now carry a full `AntSpecies` object, we can build richer ecological summaries with very little extra code. For example, you could compute average abundance not only by species code but also by higher taxonomic ranks. A simple loop over `bait_list` could group observations by `bait.species.subfamily` or `bait.species.tribe` (skipping or separately labeling any record whose species is still a raw string), revealing whether certain lineages dominate the community in particular years.
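
A minimal sketch of grouping by subfamily follows. `Species` and `Bait` are hypothetical stand-ins carrying only the attributes used here, the function name is an assumption, and the observations are made up. With the real classes, the type check would be `isinstance(bait.species, AntSpecies)`.

```python
from collections import defaultdict
from typing import NamedTuple, Optional, Union


class Species(NamedTuple):
    """Stand-in for AntSpecies with only the fields used here."""
    speciescode: str
    subfamily: Optional[str]


class Bait(NamedTuple):
    """Stand-in for AntBait with only the fields used here."""
    species: Union[str, Species]
    abundance: int


def mean_abundance_by_subfamily(baits):
    """Average abundance per subfamily, keeping unmapped records separate."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for bait in baits:
        # Unmapped codes are plain strings and have no subfamily attribute.
        if isinstance(bait.species, str):
            key = "(unmapped)"
        else:
            key = bait.species.subfamily or "(unknown)"
        totals[key] += bait.abundance
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up observations, not census data.
a = Species("aaaa", "Myrmicinae")
b = Species("bbbb", "Formicinae")
demo = [Bait(a, 4), Bait(a, 2), Bait(b, 10), Bait("zzzz", 1)]
print(mean_abundance_by_subfamily(demo))
```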

Another natural extension is to study temporal change. The census spans more than three decades, so one might want to plot yearly abundance trends for a focal species. With the current classes, that becomes a matter of filtering baits by `speciescode`, then aggregating `abundance` by `year`. You could also compare pre‑ and post‑monsoon years or detect long‑term increases and declines.
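
The filter-then-aggregate idea might look like the sketch below. It sums abundance per year, which is only one possible aggregation (a yearly mean would work the same way). `yearly_totals` is a hypothetical name, and `SimpleNamespace` objects with made-up values stand in for `AntBait` and `AntSpecies` instances.

```python
from collections import defaultdict
from types import SimpleNamespace


def yearly_totals(baits, species_code):
    """Sum abundance per year for one species code (sketch)."""
    totals = defaultdict(int)
    for bait in baits:
        # Linked objects expose .speciescode; raw-string fallbacks are the code.
        code = getattr(bait.species, "speciescode", bait.species)
        if code == species_code:
            totals[bait.year] += bait.abundance
    return dict(sorted(totals.items()))


# Stand-in objects with made-up values.
sp = SimpleNamespace(speciescode="cono bico")
demo = [
    SimpleNamespace(year=2008, species=sp, abundance=3),
    SimpleNamespace(year=2009, species=sp, abundance=2),
    SimpleNamespace(year=2009, species=sp, abundance=4),
    SimpleNamespace(year=2009, species="xxxx yyyy", abundance=9),
]
print(yearly_totals(demo, "cono bico"))
```

The returned dictionary is sorted by year, so its keys and values can be passed directly to a plotting function.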

Spatial patterns are equally accessible. Each bait record includes `plot` and `stake`, which together identify the bait station within the study area. Grouping by plot could show whether some sites consistently support higher richness or abundance. Combining plot‑level summaries with the species metadata may help explain habitat preferences or competitive dynamics.
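
As one possible plot-level summary, the sketch below counts distinct species codes per plot. The function name `richness_by_plot` and the sample records are assumptions for illustration; `SimpleNamespace` objects stand in for `AntBait` instances.

```python
from collections import defaultdict
from types import SimpleNamespace


def richness_by_plot(baits):
    """Count distinct species codes observed in each plot (sketch)."""
    codes_seen = defaultdict(set)
    for bait in baits:
        # Accept both linked species objects and raw code strings.
        code = getattr(bait.species, "speciescode", bait.species)
        codes_seen[bait.plot].add(code)
    return {plot: len(codes) for plot, codes in sorted(codes_seen.items())}


# Made-up records, not census data.
demo = [
    SimpleNamespace(plot=1, stake=11, species="aaaa", abundance=3),
    SimpleNamespace(plot=1, stake=13, species="bbbb", abundance=1),
    SimpleNamespace(plot=1, stake=15, species="aaaa", abundance=7),
    SimpleNamespace(plot=2, stake=11, species="aaaa", abundance=2),
]
print(richness_by_plot(demo))
```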

Finally, it is good practice to validate object mappings. In an exploratory setting you might check that every bait species code exists in `ant_species.csv`, or count how many unmapped codes were kept as strings. Adding such checks guards against silent data issues and makes the analysis reproducible. The loader keeps unknown codes as raw strings instead of failing, so a rare typo will not stop the data from loading. However, any code that reads attributes such as `bait.species.subfamily` should first confirm that `bait.species` is an `AntSpecies`, because a raw string has no such attributes.
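
A validation pass along these lines could be as small as the sketch below. `unmapped_code_counts` is a hypothetical helper, and the records (including the deliberately misspelled `cono bcio`) are made up.

```python
from collections import Counter
from types import SimpleNamespace


def unmapped_code_counts(baits):
    """Tally bait records whose species is still a raw string (sketch)."""
    return Counter(
        bait.species for bait in baits if isinstance(bait.species, str)
    )


# Stand-in records: one linked species, three unmapped codes.
sp = SimpleNamespace(speciescode="cono bico")
demo = [
    SimpleNamespace(species=sp),
    SimpleNamespace(species="xxxx yyyy"),
    SimpleNamespace(species="xxxx yyyy"),
    SimpleNamespace(species="cono bcio"),  # simulated typo
]

report = unmapped_code_counts(demo)
print(f"{sum(report.values())} unmapped records, {len(report)} distinct codes")
for code, n in report.most_common():
    print(f"  {code!r}: {n}")
```

An empty report means every bait record was linked; a non-empty one lists the codes to investigate, most frequent first.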

## Conclusion

This notebook demonstrates a clean, object‑oriented workflow for working with the ant community census data. We mapped each CSV schema to a dedicated Python class, wrote explicit constructors that accept every attribute, and implemented printing behavior tailored to the requirements. Then we built loader functions that read CSV rows, instantiate objects, store them in lists, and return those lists. By passing a species lookup into the bait loader, we replaced the raw species codes in `ant_bait.csv` with full `AntSpecies` objects wherever a match exists, linking observation data to taxonomic metadata. Finally, we added a reusable function to compute average abundance for any species code across an arbitrary collection of baits.

With these pieces in place, many further analyses become straightforward: filtering baits by subfamily, plotting abundance over time, or studying co‑occurrence patterns among species. The key takeaway is that thoughtful data modeling at the start makes later ecological questions easier and safer to answer.