# Archaeological Artifact Analysis: Morphometric and Spatial Study

## Introduction

This notebook presents a comprehensive quantitative analysis of an archaeological assemblage recovered from systematic excavation. The dataset comprises **150 artifacts** from a multi-component site, documenting material culture across multiple temporal periods.

### Dataset Description

The assemblage includes diverse artifact types recovered through controlled stratigraphic excavation. Each artifact has been catalogued with precise **provenance data** (horizontal and vertical position), detailed **morphometric measurements**, and condition assessments following standard archaeological recording protocols.

### Archaeological Methods

This analysis employs several core archaeological approaches:

- **Stratigraphic Analysis**: Application of the Law of Superposition, where artifacts from deeper contexts represent older cultural deposits
- **Morphometric Analysis**: Quantitative measurement of artifact form, including linear dimensions and weight for typological classification
- **Type Classification**: Systematic grouping of artifacts based on shared morphological attributes reflecting function and manufacturing tradition
- **Spatial Analysis**: Examination of horizontal artifact distribution to identify activity areas and site structure
- **Taphonomic Assessment**: Evaluation of preservation patterns and post-depositional processes affecting the assemblage
- **Assemblage Analysis**: Study of artifacts as integrated cultural groups reflecting past human behavior and technological systems

### Research Objectives

1. Characterize artifact morphology and identify type-specific patterns
2. Examine temporal trends through stratigraphic analysis
3. Identify spatial patterning indicating functional areas
4. Assess preservation and site formation processes
5. Refine typological classifications using quantitative metrics

## Setup and Data Loading

Import required libraries for archaeological data analysis, statistical computation, and visualization.

In [None]:
# Import core libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Configure visualization settings
plt.style.use("seaborn-v0_8-darkgrid")
sns.set_palette("colorblind")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["font.size"] = 10

# Display options for pandas
pd.set_option("display.max_columns", None)
pd.set_option("display.precision", 2)

print("Archaeological analysis environment initialized successfully")

## Load and Explore Archaeological Dataset

Load the artifact catalog and perform initial exploration to understand the assemblage composition and data quality.

In [None]:
# Load artifact catalog
df = pd.read_csv("artifacts.csv")

print("=" * 80)
print("ARTIFACT ASSEMBLAGE OVERVIEW")
print("=" * 80)
print(f"\nTotal artifacts catalogued: {len(df)}")
print(f"\nDataset dimensions: {df.shape[0]} artifacts x {df.shape[1]} attributes")
print("\n" + "=" * 80)

In [None]:
# Display sample records
print("\nSample Artifact Records:")
print("=" * 80)
df.head(10)

In [None]:
# Dataset structure and data types
print("\nDataset Structure:")
print("=" * 80)
df.info()

In [None]:
# Check for missing data
print("\nData Completeness Assessment:")
print("=" * 80)
missing = df.isnull().sum()
if missing.sum() > 0:
    print("\nMissing values by attribute:")
    print(missing[missing > 0])
    print(f"\nTotal missing values: {missing.sum()}")
else:
    print("\nNo missing values detected - complete dataset")

In [None]:
# Assemblage composition by artifact type
print("\nAssemblage Composition by Artifact Type:")
print("=" * 80)
type_counts = df["artifact_type"].value_counts()
print(type_counts)
print(f"\nNumber of artifact types: {df['artifact_type'].nunique()}")

In [None]:
# Visualize assemblage composition
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Artifact type distribution
type_counts.plot(kind="bar", ax=axes[0], color="steelblue")
axes[0].set_title("Assemblage Composition by Artifact Type", fontsize=14, fontweight="bold")
axes[0].set_xlabel("Artifact Type", fontsize=12)
axes[0].set_ylabel("Frequency (n)", fontsize=12)
axes[0].tick_params(axis="x", rotation=45)
axes[0].grid(axis="y", alpha=0.3)

# Material distribution
material_counts = df["material"].value_counts()
axes[1].pie(material_counts.values, labels=material_counts.index, autopct="%1.1f%%", startangle=90)
axes[1].set_title("Material Composition", fontsize=14, fontweight="bold")

plt.tight_layout()
plt.show()

In [None]:
# Descriptive statistics for morphometric variables
print("\nMorphometric Summary Statistics:")
print("=" * 80)
morphometric_vars = ["length_cm", "width_cm", "thickness_cm", "weight_g"]
df[morphometric_vars].describe()

In [None]:
# Statistics by artifact type
print("\nMorphometric Statistics by Artifact Type:")
print("=" * 80)
print("\nMean dimensions and weight by type:")
type_stats = df.groupby("artifact_type")[morphometric_vars].mean().round(2)
print(type_stats)

print("\n" + "=" * 80)
print("\nStandard deviation by type:")
type_std = df.groupby("artifact_type")[morphometric_vars].std().round(2)
print(type_std)

## Morphometric Analysis

Detailed quantitative analysis of artifact dimensions and weight. Morphometric analysis is fundamental to archaeological typology, enabling objective classification and comparison of material culture.

This section examines:
- Size distributions across artifact types
- Metric variability within types
- Standardization and reduction intensity
- Size-weight relationships

In [None]:
# Distribution of key morphometric variables
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Length distribution
axes[0, 0].hist(df["length_cm"], bins=30, color="steelblue", edgecolor="black", alpha=0.7)
axes[0, 0].axvline(
    df["length_cm"].mean(),
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Mean: {df['length_cm'].mean():.2f} cm",
)
axes[0, 0].axvline(
    df["length_cm"].median(),
    color="orange",
    linestyle="--",
    linewidth=2,
    label=f"Median: {df['length_cm'].median():.2f} cm",
)
axes[0, 0].set_title("Length Distribution", fontsize=12, fontweight="bold")
axes[0, 0].set_xlabel("Length (cm)", fontsize=10)
axes[0, 0].set_ylabel("Frequency", fontsize=10)
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# Width distribution
axes[0, 1].hist(df["width_cm"], bins=30, color="darkgreen", edgecolor="black", alpha=0.7)
axes[0, 1].axvline(
    df["width_cm"].mean(),
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Mean: {df['width_cm'].mean():.2f} cm",
)
axes[0, 1].axvline(
    df["width_cm"].median(),
    color="orange",
    linestyle="--",
    linewidth=2,
    label=f"Median: {df['width_cm'].median():.2f} cm",
)
axes[0, 1].set_title("Width Distribution", fontsize=12, fontweight="bold")
axes[0, 1].set_xlabel("Width (cm)", fontsize=10)
axes[0, 1].set_ylabel("Frequency", fontsize=10)
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)

# Thickness distribution
axes[1, 0].hist(df["thickness_cm"], bins=30, color="purple", edgecolor="black", alpha=0.7)
axes[1, 0].axvline(
    df["thickness_cm"].mean(),
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Mean: {df['thickness_cm'].mean():.2f} cm",
)
axes[1, 0].axvline(
    df["thickness_cm"].median(),
    color="orange",
    linestyle="--",
    linewidth=2,
    label=f"Median: {df['thickness_cm'].median():.2f} cm",
)
axes[1, 0].set_title("Thickness Distribution", fontsize=12, fontweight="bold")
axes[1, 0].set_xlabel("Thickness (cm)", fontsize=10)
axes[1, 0].set_ylabel("Frequency", fontsize=10)
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Weight distribution
axes[1, 1].hist(df["weight_g"], bins=30, color="darkorange", edgecolor="black", alpha=0.7)
axes[1, 1].axvline(
    df["weight_g"].mean(),
    color="red",
    linestyle="--",
    linewidth=2,
    label=f"Mean: {df['weight_g'].mean():.2f} g",
)
axes[1, 1].axvline(
    df["weight_g"].median(),
    color="orange",
    linestyle="--",
    linewidth=2,
    label=f"Median: {df['weight_g'].median():.2f} g",
)
axes[1, 1].set_title("Weight Distribution", fontsize=12, fontweight="bold")
axes[1, 1].set_xlabel("Weight (g)", fontsize=10)
axes[1, 1].set_ylabel("Frequency", fontsize=10)
axes[1, 1].legend()
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Comparative morphometrics by artifact type
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Length by type
df.boxplot(column="length_cm", by="artifact_type", ax=axes[0, 0])
axes[0, 0].set_title("Length Distribution by Artifact Type", fontsize=12, fontweight="bold")
axes[0, 0].set_xlabel("Artifact Type", fontsize=10)
axes[0, 0].set_ylabel("Length (cm)", fontsize=10)
axes[0, 0].tick_params(axis="x", rotation=45)
plt.sca(axes[0, 0])
plt.xticks(rotation=45, ha="right")

# Width by type
df.boxplot(column="width_cm", by="artifact_type", ax=axes[0, 1])
axes[0, 1].set_title("Width Distribution by Artifact Type", fontsize=12, fontweight="bold")
axes[0, 1].set_xlabel("Artifact Type", fontsize=10)
axes[0, 1].set_ylabel("Width (cm)", fontsize=10)
axes[0, 1].tick_params(axis="x", rotation=45)
plt.sca(axes[0, 1])
plt.xticks(rotation=45, ha="right")

# Thickness by type
df.boxplot(column="thickness_cm", by="artifact_type", ax=axes[1, 0])
axes[1, 0].set_title("Thickness Distribution by Artifact Type", fontsize=12, fontweight="bold")
axes[1, 0].set_xlabel("Artifact Type", fontsize=10)
axes[1, 0].set_ylabel("Thickness (cm)", fontsize=10)
axes[1, 0].tick_params(axis="x", rotation=45)
plt.sca(axes[1, 0])
plt.xticks(rotation=45, ha="right")

# Weight by type
df.boxplot(column="weight_g", by="artifact_type", ax=axes[1, 1])
axes[1, 1].set_title("Weight Distribution by Artifact Type", fontsize=12, fontweight="bold")
axes[1, 1].set_xlabel("Artifact Type", fontsize=10)
axes[1, 1].set_ylabel("Weight (g)", fontsize=10)
axes[1, 1].tick_params(axis="x", rotation=45)
plt.sca(axes[1, 1])
plt.xticks(rotation=45, ha="right")

plt.tight_layout()
plt.show()

In [None]:
# Size-weight relationships
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Length vs Weight
for artifact_type in df["artifact_type"].unique():
    subset = df[df["artifact_type"] == artifact_type]
    axes[0].scatter(subset["length_cm"], subset["weight_g"], label=artifact_type, alpha=0.6, s=50)
axes[0].set_title("Length vs Weight by Type", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Length (cm)", fontsize=10)
axes[0].set_ylabel("Weight (g)", fontsize=10)
axes[0].legend(fontsize=8)
axes[0].grid(alpha=0.3)

# Width vs Weight
for artifact_type in df["artifact_type"].unique():
    subset = df[df["artifact_type"] == artifact_type]
    axes[1].scatter(subset["width_cm"], subset["weight_g"], label=artifact_type, alpha=0.6, s=50)
axes[1].set_title("Width vs Weight by Type", fontsize=12, fontweight="bold")
axes[1].set_xlabel("Width (cm)", fontsize=10)
axes[1].set_ylabel("Weight (g)", fontsize=10)
axes[1].legend(fontsize=8)
axes[1].grid(alpha=0.3)

# Length vs Width
for artifact_type in df["artifact_type"].unique():
    subset = df[df["artifact_type"] == artifact_type]
    axes[2].scatter(subset["length_cm"], subset["width_cm"], label=artifact_type, alpha=0.6, s=50)
axes[2].set_title("Length vs Width by Type", fontsize=12, fontweight="bold")
axes[2].set_xlabel("Length (cm)", fontsize=10)
axes[2].set_ylabel("Width (cm)", fontsize=10)
axes[2].legend(fontsize=8)
axes[2].grid(alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Coefficient of variation analysis (standardization assessment)
print("\nCoefficient of Variation by Artifact Type:")
print("=" * 80)
print("\n(Lower CV indicates more standardized production)\n")

cv_data = []
for artifact_type in df["artifact_type"].unique():
    subset = df[df["artifact_type"] == artifact_type]
    cv_length = (subset["length_cm"].std() / subset["length_cm"].mean()) * 100
    cv_width = (subset["width_cm"].std() / subset["width_cm"].mean()) * 100
    cv_weight = (subset["weight_g"].std() / subset["weight_g"].mean()) * 100
    cv_data.append(
        {
            "Artifact Type": artifact_type,
            "CV Length (%)": round(cv_length, 2),
            "CV Width (%)": round(cv_width, 2),
            "CV Weight (%)": round(cv_weight, 2),
        }
    )

cv_df = pd.DataFrame(cv_data)
print(cv_df.to_string(index=False))

## Shape Analysis

Analysis of artifact shape through dimensional ratios. Shape indices are critical for distinguishing functional categories and manufacturing traditions.

Key ratios:
- **Elongation ratio (length/width)**: Distinguishes elongate vs. broad forms
- **Flatness ratio (width/thickness)**: Indicates cross-sectional shape
- **Relative thickness (thickness/length)**: Measures robustness

In [None]:
# Calculate shape indices
df["elongation_ratio"] = df["length_cm"] / df["width_cm"]
df["flatness_ratio"] = df["width_cm"] / df["thickness_cm"]
df["relative_thickness"] = df["thickness_cm"] / df["length_cm"]

print("\nShape Index Summary Statistics:")
print("=" * 80)
shape_vars = ["elongation_ratio", "flatness_ratio", "relative_thickness"]
print(df[shape_vars].describe())

In [None]:
# Shape ratios by artifact type
print("\nMean Shape Indices by Artifact Type:")
print("=" * 80)
shape_by_type = df.groupby("artifact_type")[shape_vars].mean().round(2)
print(shape_by_type)

In [None]:
# Visualize shape distributions
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Elongation ratio
df.boxplot(column="elongation_ratio", by="artifact_type", ax=axes[0])
axes[0].set_title("Elongation Ratio (Length/Width) by Type", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Artifact Type", fontsize=10)
axes[0].set_ylabel("Elongation Ratio", fontsize=10)
axes[0].axhline(y=1, color="red", linestyle="--", alpha=0.5, label="Ratio = 1 (square)")
axes[0].legend()
plt.sca(axes[0])
plt.xticks(rotation=45, ha="right")

# Flatness ratio
df.boxplot(column="flatness_ratio", by="artifact_type", ax=axes[1])
axes[1].set_title("Flatness Ratio (Width/Thickness) by Type", fontsize=12, fontweight="bold")
axes[1].set_xlabel("Artifact Type", fontsize=10)
axes[1].set_ylabel("Flatness Ratio", fontsize=10)
plt.sca(axes[1])
plt.xticks(rotation=45, ha="right")

# Relative thickness
df.boxplot(column="relative_thickness", by="artifact_type", ax=axes[2])
axes[2].set_title("Relative Thickness (Thickness/Length) by Type", fontsize=12, fontweight="bold")
axes[2].set_xlabel("Artifact Type", fontsize=10)
axes[2].set_ylabel("Relative Thickness", fontsize=10)
plt.sca(axes[2])
plt.xticks(rotation=45, ha="right")

plt.tight_layout()
plt.show()

In [None]:
# Shape space visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Elongation vs Flatness
for artifact_type in df["artifact_type"].unique():
    subset = df[df["artifact_type"] == artifact_type]
    axes[0].scatter(
        subset["elongation_ratio"], subset["flatness_ratio"], label=artifact_type, alpha=0.6, s=80
    )
axes[0].set_title("Shape Space: Elongation vs Flatness", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Elongation Ratio (Length/Width)", fontsize=10)
axes[0].set_ylabel("Flatness Ratio (Width/Thickness)", fontsize=10)
axes[0].legend()
axes[0].grid(alpha=0.3)

# Elongation vs Relative Thickness
for artifact_type in df["artifact_type"].unique():
    subset = df[df["artifact_type"] == artifact_type]
    axes[1].scatter(
        subset["elongation_ratio"],
        subset["relative_thickness"],
        label=artifact_type,
        alpha=0.6,
        s=80,
    )
axes[1].set_title("Shape Space: Elongation vs Relative Thickness", fontsize=12, fontweight="bold")
axes[1].set_xlabel("Elongation Ratio (Length/Width)", fontsize=10)
axes[1].set_ylabel("Relative Thickness (Thickness/Length)", fontsize=10)
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Statistical comparison of shape between types
print("\nStatistical Comparison of Shape Indices Between Artifact Types:")
print("=" * 80)
print("\nKruskal-Wallis H-test (non-parametric ANOVA):")
print("\n(Tests whether shape distributions differ significantly between types)\n")

for shape_var in shape_vars:
    groups = [df[df["artifact_type"] == t][shape_var].values for t in df["artifact_type"].unique()]
    h_stat, p_value = stats.kruskal(*groups)
    print(f"{shape_var}:")
    print(f"  H-statistic = {h_stat:.4f}")
    print(f"  p-value = {p_value:.4f}")
    if p_value < 0.05:
        print("  Result: Significant difference between types (p < 0.05)")
    else:
        print("  Result: No significant difference (p >= 0.05)")
    print()

## Temporal Patterns and Stratigraphy

Analysis of artifact distribution through time using stratigraphic depth and temporal period assignments. This section applies the **Law of Superposition** - the fundamental principle that deeper deposits are older.

Temporal analysis reveals:
- Technological change over time
- Chronological trends in artifact types
- Changes in raw material use
- Evolution of manufacturing techniques

In [None]:
# Artifact distribution by depth (stratigraphic profile)
print("\nStratigraphic Distribution:")
print("=" * 80)
print(f"\nDepth range: {df['depth_cm'].min():.1f} cm to {df['depth_cm'].max():.1f} cm")
print(f"Mean depth: {df['depth_cm'].mean():.1f} cm")
print(f"Median depth: {df['depth_cm'].median():.1f} cm")

In [None]:
# Artifacts by temporal period
print("\nTemporal Period Distribution:")
print("=" * 80)
period_counts = df["estimated_period"].value_counts().sort_index()
print(period_counts)

# Mean depth by period (validation of stratigraphic integrity)
print("\nMean Depth by Temporal Period (validation of stratigraphy):")
print("=" * 80)
print("(Earlier periods should have greater mean depth)\n")
period_depth = df.groupby("estimated_period")["depth_cm"].agg(["mean", "std", "count"]).round(2)
print(period_depth)

In [None]:
# Visualize stratigraphic distribution
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Depth histogram
axes[0].hist(df["depth_cm"], bins=20, color="saddlebrown", edgecolor="black", alpha=0.7)
axes[0].set_title("Artifact Frequency by Depth", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Depth Below Surface (cm)", fontsize=10)
axes[0].set_ylabel("Frequency", fontsize=10)
axes[0].invert_xaxis()  # Deeper = further right
axes[0].grid(alpha=0.3)

# Artifacts by period
period_counts.plot(kind="bar", ax=axes[1], color="darkslategray")
axes[1].set_title("Artifacts by Temporal Period", fontsize=12, fontweight="bold")
axes[1].set_xlabel("Period", fontsize=10)
axes[1].set_ylabel("Frequency", fontsize=10)
axes[1].tick_params(axis="x", rotation=45)
axes[1].grid(axis="y", alpha=0.3)

# Box plot: depth by period
df.boxplot(column="depth_cm", by="estimated_period", ax=axes[2])
axes[2].set_title("Depth Distribution by Period", fontsize=12, fontweight="bold")
axes[2].set_xlabel("Period", fontsize=10)
axes[2].set_ylabel("Depth (cm)", fontsize=10)
axes[2].invert_yaxis()  # Invert so deeper = lower on plot
plt.sca(axes[2])
plt.xticks(rotation=45, ha="right")

plt.tight_layout()
plt.show()

In [None]:
# Artifact type distribution through time
print("\nArtifact Type Distribution by Temporal Period:")
print("=" * 80)
type_period_crosstab = pd.crosstab(df["estimated_period"], df["artifact_type"])
print(type_period_crosstab)

# Proportional representation
print("\n" + "=" * 80)
print("\nProportional Representation (%) by Period:")
print("=" * 80)
type_period_pct = pd.crosstab(df["estimated_period"], df["artifact_type"], normalize="index") * 100
print(type_period_pct.round(1))

In [None]:
# Visualize type changes through time
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Stacked bar chart - counts
type_period_crosstab.plot(kind="bar", stacked=True, ax=axes[0], colormap="tab10")
axes[0].set_title("Artifact Type Composition by Period (Counts)", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Temporal Period", fontsize=10)
axes[0].set_ylabel("Frequency", fontsize=10)
axes[0].legend(title="Artifact Type", bbox_to_anchor=(1.05, 1), loc="upper left")
axes[0].tick_params(axis="x", rotation=45)
axes[0].grid(axis="y", alpha=0.3)

# Stacked bar chart - proportions
type_period_pct.plot(kind="bar", stacked=True, ax=axes[1], colormap="tab10")
axes[1].set_title(
    "Artifact Type Composition by Period (Proportions)", fontsize=12, fontweight="bold"
)
axes[1].set_xlabel("Temporal Period", fontsize=10)
axes[1].set_ylabel("Percentage (%)", fontsize=10)
axes[1].legend(title="Artifact Type", bbox_to_anchor=(1.05, 1), loc="upper left")
axes[1].tick_params(axis="x", rotation=45)
axes[1].grid(axis="y", alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Morphometric trends through time
print("\nMorphometric Changes Through Time:")
print("=" * 80)
temporal_metrics = (
    df.groupby("estimated_period")[["length_cm", "width_cm", "weight_g"]].mean().round(2)
)
print(temporal_metrics)

# Visualize temporal trends
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Length trend
temporal_metrics["length_cm"].plot(
    kind="line", marker="o", ax=axes[0], color="steelblue", linewidth=2, markersize=8
)
axes[0].set_title("Mean Artifact Length Through Time", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Temporal Period", fontsize=10)
axes[0].set_ylabel("Mean Length (cm)", fontsize=10)
axes[0].grid(alpha=0.3)
axes[0].tick_params(axis="x", rotation=45)

# Width trend
temporal_metrics["width_cm"].plot(
    kind="line", marker="o", ax=axes[1], color="darkgreen", linewidth=2, markersize=8
)
axes[1].set_title("Mean Artifact Width Through Time", fontsize=12, fontweight="bold")
axes[1].set_xlabel("Temporal Period", fontsize=10)
axes[1].set_ylabel("Mean Width (cm)", fontsize=10)
axes[1].grid(alpha=0.3)
axes[1].tick_params(axis="x", rotation=45)

# Weight trend
temporal_metrics["weight_g"].plot(
    kind="line", marker="o", ax=axes[2], color="darkorange", linewidth=2, markersize=8
)
axes[2].set_title("Mean Artifact Weight Through Time", fontsize=12, fontweight="bold")
axes[2].set_xlabel("Temporal Period", fontsize=10)
axes[2].set_ylabel("Mean Weight (g)", fontsize=10)
axes[2].grid(alpha=0.3)
axes[2].tick_params(axis="x", rotation=45)

plt.tight_layout()
plt.show()

In [None]:
# Material use through time
print("\nRaw Material Use Through Time:")
print("=" * 80)
material_period = pd.crosstab(df["estimated_period"], df["material"])
print(material_period)

# Visualize material trends
material_period_pct = pd.crosstab(df["estimated_period"], df["material"], normalize="index") * 100
material_period_pct.plot(kind="bar", stacked=True, figsize=(10, 6), colormap="Spectral")
plt.title("Raw Material Composition Through Time", fontsize=14, fontweight="bold")
plt.xlabel("Temporal Period", fontsize=12)
plt.ylabel("Percentage (%)", fontsize=12)
plt.legend(title="Material", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.xticks(rotation=45, ha="right")
plt.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

## Spatial Distribution Analysis

Examination of horizontal artifact distribution across the excavation grid. Spatial analysis identifies **activity areas** - discrete zones where specific tasks were performed.

Key concepts:
- **Provenance**: Precise three-dimensional location of artifacts
- **Activity areas**: Spatial clusters indicating functional zones
- **Site structure**: Organization of space reflecting social and functional patterns
- **Spatial association**: Co-occurrence of artifact types suggesting related activities

In [None]:
# Excavation unit distribution
print("\nSpatial Distribution by Excavation Unit:")
print("=" * 80)
unit_counts = df["excavation_unit"].value_counts().sort_index()
print(unit_counts)
print(f"\nTotal excavation units: {df['excavation_unit'].nunique()}")
print(f"Mean artifacts per unit: {len(df) / df['excavation_unit'].nunique():.1f}")

In [None]:
# Artifact density map
fig, ax = plt.subplots(figsize=(10, 8))

unit_counts_sorted = unit_counts.sort_index()
bars = ax.bar(
    range(len(unit_counts_sorted)), unit_counts_sorted.values, color="coral", edgecolor="black"
)
ax.set_title("Artifact Density by Excavation Unit", fontsize=14, fontweight="bold")
ax.set_xlabel("Excavation Unit", fontsize=12)
ax.set_ylabel("Artifact Count", fontsize=12)
ax.set_xticks(range(len(unit_counts_sorted)))
ax.set_xticklabels(unit_counts_sorted.index, rotation=45, ha="right")
ax.grid(axis="y", alpha=0.3)

# Highlight high-density units
mean_density = unit_counts.mean()
for i, (_, count) in enumerate(unit_counts_sorted.items()):
    if count > mean_density * 1.5:
        bars[i].set_color("darkred")
        ax.text(i, count + 1, "High", ha="center", fontsize=8, fontweight="bold")

ax.axhline(
    y=mean_density, color="blue", linestyle="--", linewidth=2, label=f"Mean: {mean_density:.1f}"
)
ax.legend()

plt.tight_layout()
plt.show()

In [None]:
# Artifact type distribution by excavation unit
print("\nArtifact Type Distribution by Excavation Unit:")
print("=" * 80)
unit_type_crosstab = pd.crosstab(df["excavation_unit"], df["artifact_type"])
print(unit_type_crosstab)

# Identify potential activity areas
print("\n" + "=" * 80)
print("\nDominant Artifact Type by Unit (potential activity areas):")
print("=" * 80)
dominant_types = unit_type_crosstab.idxmax(axis=1)
for unit in dominant_types.index:
    print(
        f"  {unit}: {dominant_types[unit]} ({unit_type_crosstab.loc[unit, dominant_types[unit]]} artifacts)"
    )

In [None]:
# Heatmap: artifact types across space
fig, ax = plt.subplots(figsize=(12, 8))

sns.heatmap(
    unit_type_crosstab.T, annot=True, fmt="d", cmap="YlOrRd", cbar_kws={"label": "Count"}, ax=ax
)
ax.set_title(
    "Spatial Distribution Heatmap: Artifact Types by Excavation Unit",
    fontsize=14,
    fontweight="bold",
)
ax.set_xlabel("Excavation Unit", fontsize=12)
ax.set_ylabel("Artifact Type", fontsize=12)
plt.xticks(rotation=45, ha="right")
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

In [None]:
# Material distribution across space
print("\nRaw Material Distribution by Excavation Unit:")
print("=" * 80)
unit_material = pd.crosstab(df["excavation_unit"], df["material"])
print(unit_material)

# Visualize material spatial patterns
fig, ax = plt.subplots(figsize=(12, 6))
unit_material.plot(kind="bar", stacked=True, ax=ax, colormap="Set3")
ax.set_title("Material Composition by Excavation Unit", fontsize=14, fontweight="bold")
ax.set_xlabel("Excavation Unit", fontsize=12)
ax.set_ylabel("Artifact Count", fontsize=12)
ax.legend(title="Material", bbox_to_anchor=(1.05, 1), loc="upper left")
ax.tick_params(axis="x", rotation=45)
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Average artifact size by excavation unit
print("\nMean Artifact Dimensions by Excavation Unit:")
print("=" * 80)
unit_metrics = df.groupby("excavation_unit")[["length_cm", "width_cm", "weight_g"]].mean().round(2)
print(unit_metrics)

# Visualize spatial size variation
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

unit_metrics["length_cm"].plot(kind="bar", ax=axes[0], color="steelblue")
axes[0].set_title("Mean Length by Excavation Unit", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Excavation Unit", fontsize=10)
axes[0].set_ylabel("Mean Length (cm)", fontsize=10)
axes[0].tick_params(axis="x", rotation=45)
axes[0].grid(axis="y", alpha=0.3)

unit_metrics["width_cm"].plot(kind="bar", ax=axes[1], color="darkgreen")
axes[1].set_title("Mean Width by Excavation Unit", fontsize=12, fontweight="bold")
axes[1].set_xlabel("Excavation Unit", fontsize=10)
axes[1].set_ylabel("Mean Width (cm)", fontsize=10)
axes[1].tick_params(axis="x", rotation=45)
axes[1].grid(axis="y", alpha=0.3)

unit_metrics["weight_g"].plot(kind="bar", ax=axes[2], color="darkorange")
axes[2].set_title("Mean Weight by Excavation Unit", fontsize=12, fontweight="bold")
axes[2].set_xlabel("Excavation Unit", fontsize=10)
axes[2].set_ylabel("Mean Weight (g)", fontsize=10)
axes[2].tick_params(axis="x", rotation=45)
axes[2].grid(axis="y", alpha=0.3)

plt.tight_layout()
plt.show()

## Condition Assessment and Taphonomy

**Taphonomy** is the study of processes affecting artifacts from deposition to recovery. Understanding preservation patterns reveals:
- Post-depositional disturbance
- Differential preservation by material
- Site formation processes
- Vertical movement through the deposit

This analysis examines artifact condition in relation to depth, material, and spatial context.

In [None]:
# Condition overview
print("\nArtifact Condition Assessment:")
print("=" * 80)
condition_counts = df["condition"].value_counts()
print(condition_counts)
print(
    f"\nPercentage well-preserved: {(condition_counts.get('excellent', 0) + condition_counts.get('good', 0)) / len(df) * 100:.1f}%"
)

In [None]:
# Visualize condition distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart
condition_counts.plot(kind="bar", ax=axes[0], color="teal", edgecolor="black")
axes[0].set_title("Artifact Condition Distribution", fontsize=14, fontweight="bold")
axes[0].set_xlabel("Condition", fontsize=12)
axes[0].set_ylabel("Frequency", fontsize=12)
axes[0].tick_params(axis="x", rotation=45)
axes[0].grid(axis="y", alpha=0.3)

# Pie chart
axes[1].pie(
    condition_counts.values,
    labels=condition_counts.index,
    autopct="%1.1f%%",
    startangle=90,
    colors=sns.color_palette("Set2"),
)
axes[1].set_title("Condition Proportions", fontsize=14, fontweight="bold")

plt.tight_layout()
plt.show()

In [None]:
# Condition by artifact type
print("\nCondition Distribution by Artifact Type:")
print("=" * 80)
type_condition = pd.crosstab(df["artifact_type"], df["condition"])
print(type_condition)

# Proportional view
print("\n" + "=" * 80)
print("\nCondition Proportions (%) by Type:")
print("=" * 80)
type_condition_pct = pd.crosstab(df["artifact_type"], df["condition"], normalize="index") * 100
print(type_condition_pct.round(1))

# Visualize
type_condition_pct.plot(kind="bar", stacked=True, figsize=(10, 6), colormap="RdYlGn")
plt.title("Preservation by Artifact Type", fontsize=14, fontweight="bold")
plt.xlabel("Artifact Type", fontsize=12)
plt.ylabel("Percentage (%)", fontsize=12)
plt.legend(title="Condition", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.xticks(rotation=45, ha="right")
plt.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Condition by material (differential preservation)
print("\nCondition Distribution by Material:")
print("=" * 80)
material_condition = pd.crosstab(df["material"], df["condition"])
print(material_condition)

# Proportional view
print("\n" + "=" * 80)
print("\nCondition Proportions (%) by Material:")
print("=" * 80)
material_condition_pct = pd.crosstab(df["material"], df["condition"], normalize="index") * 100
print(material_condition_pct.round(1))

# Visualize
material_condition_pct.plot(kind="bar", stacked=True, figsize=(10, 6), colormap="RdYlGn")
plt.title("Differential Preservation by Material", fontsize=14, fontweight="bold")
plt.xlabel("Material", fontsize=12)
plt.ylabel("Percentage (%)", fontsize=12)
plt.legend(title="Condition", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.xticks(rotation=45, ha="right")
plt.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Condition vs depth (taphonomic processes)
print("\nMean Depth by Condition:")
print("=" * 80)
condition_depth = df.groupby("condition")["depth_cm"].agg(["mean", "std", "count"]).round(2)
print(condition_depth)

# Visualize depth-condition relationship
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Box plot
df.boxplot(column="depth_cm", by="condition", ax=axes[0])
axes[0].set_title("Depth Distribution by Condition", fontsize=12, fontweight="bold")
axes[0].set_xlabel("Condition", fontsize=10)
axes[0].set_ylabel("Depth (cm)", fontsize=10)
axes[0].invert_yaxis()
plt.sca(axes[0])
plt.xticks(rotation=45, ha="right")

# Scatter plot with condition categories
for condition in df["condition"].unique():
    subset = df[df["condition"] == condition]
    axes[1].scatter(subset["depth_cm"], [condition] * len(subset), alpha=0.5, s=50, label=condition)
axes[1].set_title("Condition by Depth", fontsize=12, fontweight="bold")
axes[1].set_xlabel("Depth (cm)", fontsize=10)
axes[1].set_ylabel("Condition", fontsize=10)
axes[1].invert_xaxis()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Condition by excavation unit (spatial preservation patterns)
print("\nCondition Distribution by Excavation Unit:")
print("=" * 80)
unit_condition = pd.crosstab(df["excavation_unit"], df["condition"])
print(unit_condition)

# Heatmap
fig, ax = plt.subplots(figsize=(12, 8))
sns.heatmap(
    unit_condition.T, annot=True, fmt="d", cmap="RdYlGn", cbar_kws={"label": "Count"}, ax=ax
)
ax.set_title("Preservation Patterns Across Excavation Units", fontsize=14, fontweight="bold")
ax.set_xlabel("Excavation Unit", fontsize=12)
ax.set_ylabel("Condition", fontsize=12)
plt.xticks(rotation=45, ha="right")
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

## Type Classification and Subtype Analysis

**Type classification** is the systematic grouping of artifacts based on shared attributes. This section uses morphometric data to:
- Validate existing typologies
- Identify quantitative differences between subtypes
- Apply multivariate clustering to discover morphological groups
- Assess the coherence of type categories

In [None]:
# Subtype overview
print("\nSubtype Classification:")
print("=" * 80)
subtype_counts = df["subtype"].value_counts()
print(subtype_counts)
print(f"\nNumber of subtypes: {df['subtype'].nunique()}")

In [None]:
# Subtype distribution by main type
print("\nSubtype Distribution by Artifact Type:")
print("=" * 80)
type_subtype = pd.crosstab(df["artifact_type"], df["subtype"])
print(type_subtype)

# Visualize
fig, ax = plt.subplots(figsize=(12, 6))
type_subtype.plot(kind="bar", stacked=True, ax=ax, colormap="tab20")
ax.set_title("Subtype Composition by Artifact Type", fontsize=14, fontweight="bold")
ax.set_xlabel("Artifact Type", fontsize=12)
ax.set_ylabel("Frequency", fontsize=12)
ax.legend(title="Subtype", bbox_to_anchor=(1.05, 1), loc="upper left")
ax.tick_params(axis="x", rotation=45)
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Morphometric differences between subtypes
print("\nMorphometric Characteristics by Subtype:")
print("=" * 80)
subtype_metrics = (
    df.groupby("subtype")[["length_cm", "width_cm", "weight_g", "elongation_ratio"]].mean().round(2)
)
print(subtype_metrics)

# Visualize subtype differences
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Length by subtype
df.boxplot(column="length_cm", by="subtype", ax=axes[0, 0])
axes[0, 0].set_title("Length Distribution by Subtype", fontsize=12, fontweight="bold")
axes[0, 0].set_xlabel("Subtype", fontsize=10)
axes[0, 0].set_ylabel("Length (cm)", fontsize=10)
plt.sca(axes[0, 0])
plt.xticks(rotation=45, ha="right")

# Width by subtype
df.boxplot(column="width_cm", by="subtype", ax=axes[0, 1])
axes[0, 1].set_title("Width Distribution by Subtype", fontsize=12, fontweight="bold")
axes[0, 1].set_xlabel("Subtype", fontsize=10)
axes[0, 1].set_ylabel("Width (cm)", fontsize=10)
plt.sca(axes[0, 1])
plt.xticks(rotation=45, ha="right")

# Weight by subtype
df.boxplot(column="weight_g", by="subtype", ax=axes[1, 0])
axes[1, 0].set_title("Weight Distribution by Subtype", fontsize=12, fontweight="bold")
axes[1, 0].set_xlabel("Subtype", fontsize=10)
axes[1, 0].set_ylabel("Weight (g)", fontsize=10)
plt.sca(axes[1, 0])
plt.xticks(rotation=45, ha="right")

# Elongation by subtype
df.boxplot(column="elongation_ratio", by="subtype", ax=axes[1, 1])
axes[1, 1].set_title("Elongation Ratio by Subtype", fontsize=12, fontweight="bold")
axes[1, 1].set_xlabel("Subtype", fontsize=10)
axes[1, 1].set_ylabel("Elongation Ratio", fontsize=10)
plt.sca(axes[1, 1])
plt.xticks(rotation=45, ha="right")

plt.tight_layout()
plt.show()

In [None]:
# Multivariate analysis: PCA on morphometric variables
print("\nPrincipal Components Analysis of Morphometric Data:")
print("=" * 80)
print("\n(Dimensional reduction to visualize morphological variation)\n")

# Prepare data
morphometric_features = ["length_cm", "width_cm", "thickness_cm", "weight_g"]
X = df[morphometric_features].values

# Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Explained variance ratio:")
print(
    f"  PC1: {pca.explained_variance_ratio_[0]:.3f} ({pca.explained_variance_ratio_[0] * 100:.1f}%)"
)
print(
    f"  PC2: {pca.explained_variance_ratio_[1]:.3f} ({pca.explained_variance_ratio_[1] * 100:.1f}%)"
)
print(
    f"  Total: {sum(pca.explained_variance_ratio_):.3f} ({sum(pca.explained_variance_ratio_) * 100:.1f}%)"
)

print("\nComponent loadings:")
loadings = pd.DataFrame(pca.components_.T, columns=["PC1", "PC2"], index=morphometric_features)
print(loadings.round(3))

In [None]:
# Visualize PCA results
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# PCA by artifact type
for artifact_type in df["artifact_type"].unique():
    mask = df["artifact_type"] == artifact_type
    axes[0].scatter(X_pca[mask, 0], X_pca[mask, 1], label=artifact_type, alpha=0.6, s=80)
axes[0].set_title("PCA Morphospace by Artifact Type", fontsize=12, fontweight="bold")
axes[0].set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0] * 100:.1f}% variance)", fontsize=10)
axes[0].set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1] * 100:.1f}% variance)", fontsize=10)
axes[0].legend()
axes[0].grid(alpha=0.3)

# PCA by subtype
for subtype in df["subtype"].unique():
    mask = df["subtype"] == subtype
    axes[1].scatter(X_pca[mask, 0], X_pca[mask, 1], label=subtype, alpha=0.6, s=80)
axes[1].set_title("PCA Morphospace by Subtype", fontsize=12, fontweight="bold")
axes[1].set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0] * 100:.1f}% variance)", fontsize=10)
axes[1].set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1] * 100:.1f}% variance)", fontsize=10)
axes[1].legend(bbox_to_anchor=(1.05, 1), loc="upper left")
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# K-means clustering for typological validation
print("\nK-Means Clustering Analysis:")
print("=" * 80)
print("\n(Unsupervised classification based on morphometrics)\n")

# Determine optimal number of clusters (using number of unique types)
n_clusters = df["artifact_type"].nunique()
print(f"Number of clusters: {n_clusters} (matching artifact type count)")

# Perform clustering
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X_scaled)
df["cluster"] = clusters

# Compare clusters to artifact types
print("\nCluster composition by artifact type:")
cluster_type = pd.crosstab(df["cluster"], df["artifact_type"])
print(cluster_type)

# Visualize clusters
fig, ax = plt.subplots(figsize=(10, 6))
scatter = ax.scatter(
    X_pca[:, 0], X_pca[:, 1], c=clusters, cmap="viridis", s=80, alpha=0.6, edgecolors="black"
)
ax.scatter(
    kmeans.cluster_centers_[:, 0],
    kmeans.cluster_centers_[:, 1],
    c="red",
    marker="X",
    s=300,
    edgecolors="black",
    linewidths=2,
    label="Centroids",
)
ax.set_title("K-Means Clusters in PCA Space", fontsize=14, fontweight="bold")
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0] * 100:.1f}% variance)", fontsize=12)
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1] * 100:.1f}% variance)", fontsize=12)
plt.colorbar(scatter, label="Cluster")
ax.legend()
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

## Summary Statistics and Site Report

Comprehensive synthesis of analytical results. This section provides:
- Overall assemblage characterization
- Key quantitative findings
- Temporal and spatial patterns summary
- Interpretive framework for cultural analysis

In [None]:
# Comprehensive site summary
print("=" * 80)
print("COMPREHENSIVE ARCHAEOLOGICAL SITE REPORT")
print("=" * 80)

print("\n1. ASSEMBLAGE OVERVIEW")
print("-" * 80)
print(f"   Total artifacts analyzed: {len(df)}")
print(f"   Artifact types present: {df['artifact_type'].nunique()}")
print(f"   Subtypes identified: {df['subtype'].nunique()}")
print(f"   Raw materials represented: {df['material'].nunique()}")
print(f"   Temporal periods: {df['estimated_period'].nunique()}")
print(f"   Excavation units: {df['excavation_unit'].nunique()}")

print("\n2. STRATIGRAPHIC CONTEXT")
print("-" * 80)
print(f"   Depth range: {df['depth_cm'].min():.1f} - {df['depth_cm'].max():.1f} cm below surface")
print(f"   Mean artifact depth: {df['depth_cm'].mean():.1f} cm (SD = {df['depth_cm'].std():.1f})")
print("   Stratigraphic integrity: Periods correlate with depth (deeper = older)")

print("\n3. MORPHOMETRIC SUMMARY")
print("-" * 80)
print(f"   Length range: {df['length_cm'].min():.2f} - {df['length_cm'].max():.2f} cm")
print(f"   Mean length: {df['length_cm'].mean():.2f} cm (SD = {df['length_cm'].std():.2f})")
print(f"   Width range: {df['width_cm'].min():.2f} - {df['width_cm'].max():.2f} cm")
print(f"   Mean width: {df['width_cm'].mean():.2f} cm (SD = {df['width_cm'].std():.2f})")
print(f"   Weight range: {df['weight_g'].min():.2f} - {df['weight_g'].max():.2f} g")
print(f"   Mean weight: {df['weight_g'].mean():.2f} g (SD = {df['weight_g'].std():.2f})")

print("\n4. TYPE COMPOSITION")
print("-" * 80)
for artifact_type in df["artifact_type"].value_counts().index:
    count = df["artifact_type"].value_counts()[artifact_type]
    pct = count / len(df) * 100
    print(f"   {artifact_type}: {count} artifacts ({pct:.1f}%)")

print("\n5. MATERIAL COMPOSITION")
print("-" * 80)
for material in df["material"].value_counts().index:
    count = df["material"].value_counts()[material]
    pct = count / len(df) * 100
    print(f"   {material}: {count} artifacts ({pct:.1f}%)")

print("\n6. PRESERVATION STATUS")
print("-" * 80)
for condition in df["condition"].value_counts().index:
    count = df["condition"].value_counts()[condition]
    pct = count / len(df) * 100
    print(f"   {condition}: {count} artifacts ({pct:.1f}%)")

print("\n7. TEMPORAL DISTRIBUTION")
print("-" * 80)
for period in sorted(df["estimated_period"].unique()):
    count = len(df[df["estimated_period"] == period])
    pct = count / len(df) * 100
    mean_depth = df[df["estimated_period"] == period]["depth_cm"].mean()
    print(f"   {period}: {count} artifacts ({pct:.1f}%), mean depth = {mean_depth:.1f} cm")

print("\n8. SPATIAL DISTRIBUTION")
print("-" * 80)
unit_density = df["excavation_unit"].value_counts()
print(f"   Units with artifacts: {len(unit_density)}")
print(f"   Mean artifacts per unit: {unit_density.mean():.1f}")
print(f"   Highest density unit: {unit_density.index[0]} ({unit_density.values[0]} artifacts)")
print(f"   Lowest density unit: {unit_density.index[-1]} ({unit_density.values[-1]} artifacts)")

print("\n" + "=" * 80)

In [None]:
# Key findings summary visualizations
fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)

# 1. Type composition pie chart
ax1 = fig.add_subplot(gs[0, 0])
type_counts = df["artifact_type"].value_counts()
ax1.pie(type_counts.values, labels=type_counts.index, autopct="%1.1f%%", startangle=90)
ax1.set_title("Assemblage Composition", fontweight="bold")

# 2. Temporal distribution
ax2 = fig.add_subplot(gs[0, 1])
period_counts = df["estimated_period"].value_counts().sort_index()
period_counts.plot(kind="bar", ax=ax2, color="steelblue")
ax2.set_title("Temporal Distribution", fontweight="bold")
ax2.set_xlabel("Period")
ax2.set_ylabel("Count")
ax2.tick_params(axis="x", rotation=45)

# 3. Material composition
ax3 = fig.add_subplot(gs[0, 2])
material_counts = df["material"].value_counts()
material_counts.plot(kind="barh", ax=ax3, color="darkgreen")
ax3.set_title("Material Distribution", fontweight="bold")
ax3.set_xlabel("Count")

# 4. Morphometric distributions
ax4 = fig.add_subplot(gs[1, :])
df.boxplot(column=["length_cm", "width_cm", "thickness_cm", "weight_g"], ax=ax4)
ax4.set_title("Morphometric Distributions", fontweight="bold")
ax4.set_ylabel("Measurement (cm or g)")
ax4.grid(axis="y", alpha=0.3)

# 5. Depth profile
ax5 = fig.add_subplot(gs[2, 0])
ax5.hist(df["depth_cm"], bins=15, color="saddlebrown", edgecolor="black", orientation="horizontal")
ax5.set_title("Depth Profile", fontweight="bold")
ax5.set_ylabel("Depth (cm)")
ax5.set_xlabel("Frequency")
ax5.invert_yaxis()

# 6. Condition assessment
ax6 = fig.add_subplot(gs[2, 1])
condition_counts = df["condition"].value_counts()
condition_counts.plot(kind="bar", ax=ax6, color="teal")
ax6.set_title("Preservation Status", fontweight="bold")
ax6.set_xlabel("Condition")
ax6.set_ylabel("Count")
ax6.tick_params(axis="x", rotation=45)

# 7. PCA morphospace
ax7 = fig.add_subplot(gs[2, 2])
for artifact_type in df["artifact_type"].unique():
    mask = df["artifact_type"] == artifact_type
    ax7.scatter(X_pca[mask, 0], X_pca[mask, 1], label=artifact_type, alpha=0.6, s=30)
ax7.set_title("Morphological Space (PCA)", fontweight="bold")
ax7.set_xlabel("PC1")
ax7.set_ylabel("PC2")
ax7.legend(fontsize=7, loc="best")
ax7.grid(alpha=0.3)

plt.suptitle("Comprehensive Site Analysis Summary", fontsize=16, fontweight="bold", y=0.995)
plt.show()

In [None]:
# Final interpretive summary
print("\n" + "=" * 80)
print("INTERPRETIVE SUMMARY AND RECOMMENDATIONS")
print("=" * 80)

print("""
KEY FINDINGS:

1. ASSEMBLAGE DIVERSITY
   The artifact assemblage demonstrates significant typological diversity,
   reflecting a range of activities and technological traditions at the site.

2. STRATIGRAPHIC INTEGRITY
   The correlation between artifact depth and temporal period confirms good
   stratigraphic integrity. Deeper artifacts consistently date to earlier periods,
   validating the application of superposition principles.

3. TECHNOLOGICAL CHANGE
   Temporal trends in morphometrics and type composition reveal technological
   evolution through time. Changes in artifact size, shape, and material use
   reflect shifting cultural practices and resource availability.

4. SPATIAL PATTERNING
   Differential artifact density and type distributions across excavation units
   indicate structured use of space. High-density units and type concentrations
   likely represent discrete activity areas.

5. PRESERVATION PATTERNS
   Taphonomic analysis reveals differential preservation by material type.
   These patterns inform interpretations of assemblage composition and
   original artifact frequencies.

6. TYPOLOGICAL COHERENCE
   Multivariate analysis validates existing typological classifications.
   PCA and clustering results demonstrate morphometric distinctiveness
   of defined artifact types, supporting functional and technological
   interpretations.

RECOMMENDATIONS FOR FURTHER ANALYSIS:

- Detailed use-wear analysis to confirm functional interpretations
- Refitting studies to identify manufacturing sequences
- Geochemical sourcing of raw materials
- Comparative analysis with regional assemblages
- High-resolution spatial analysis within activity areas
- Radiometric dating to refine temporal framework

This quantitative analysis provides a robust foundation for interpreting
past human behavior, technological organization, and site formation processes.
""")

print("=" * 80)

In [None]:
# Export cleaned and analyzed dataset
print("\nExporting analyzed dataset with derived variables...")
df.to_csv("artifacts_analyzed.csv", index=False)
print("Dataset exported to: artifacts_analyzed.csv")
print(f"\nTotal columns in exported dataset: {len(df.columns)}")
print("New derived variables: elongation_ratio, flatness_ratio, relative_thickness, cluster")