# Archaeological Site Analysis with AWS

This notebook demonstrates a complete archaeological data analysis pipeline using AWS services:

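1. **Generate Sample Data** - Create a synthetic artifact dataset
2. **Upload to S3** - Store the artifact data in the cloud
3. **Lambda Processing** - Classify and analyze artifacts
4. **Query DynamoDB** - Retrieve the artifact catalog
5. **Processing Summary** - Download the analysis summary from S3
6. **Visualization** - Create publication-quality figures
7. **Spatial Analysis** - Analyze artifact distribution across the site
8. **Morphometric Analysis** - Examine artifact measurements and shape indices
9. **Chronological Analysis** - Examine temporal patterns
10. **Summary Statistics** - Summarize the artifact assemblage
11. **Export Results** - Save results as CSV and JSON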

**Duration:** 60-90 minutes  
**Cost:** ~$5-10

## Setup

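Import the required libraries and create the AWS clients. boto3 reads credentials from your environment (for example, a profile created with `aws configure` or an attached IAM role), so none are set in the notebook itself.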

In [None]:
# Import libraries
import boto3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import json
from io import StringIO
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Configure pandas display
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

print("✓ Libraries imported successfully")

In [None]:
# Configuration
BUCKET_NAME = 'archaeology-data-xxxx'  # Replace with your bucket name
TABLE_NAME = 'ArtifactCatalog'
LAMBDA_FUNCTION = 'classify-artifacts'
SITE_ID = 'SITE_A'
AWS_REGION = 'us-east-1'

# Initialize AWS clients
s3 = boto3.client('s3', region_name=AWS_REGION)
dynamodb = boto3.resource('dynamodb', region_name=AWS_REGION)
lambda_client = boto3.client('lambda', region_name=AWS_REGION)

print(f"Configuration:")
print(f"  S3 Bucket: {BUCKET_NAME}")
print(f"  DynamoDB Table: {TABLE_NAME}")
print(f"  Lambda Function: {LAMBDA_FUNCTION}")
print(f"  Site ID: {SITE_ID}")
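
Before calling AWS, it can help to confirm that the placeholder in `BUCKET_NAME` was replaced and that the name is syntactically valid. The sketch below checks only a subset of the S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no adjacent dots). The helper name `check_bucket_name` and the test for a trailing `xxxx` are illustrative assumptions, not part of the original setup.

```python
import re

# Subset of the S3 bucket naming rules: 3-63 chars, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r'^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$')

def check_bucket_name(name):
    """Return a list of problems found in a bucket name (empty if none)."""
    problems = []
    if not _BUCKET_RE.match(name):
        problems.append('must be 3-63 chars of a-z, 0-9, "." or "-", '
                        'starting and ending with a letter or digit')
    if '..' in name:
        problems.append('must not contain two adjacent dots')
    if name.endswith('xxxx'):
        # Hypothetical check for this notebook's placeholder value
        problems.append('still contains the "xxxx" placeholder')
    return problems

print(check_bucket_name('archaeology-data-xxxx'))
print(check_bucket_name('archaeology-data-2024'))
```

In the notebook, `check_bucket_name(BUCKET_NAME)` could be called right after the configuration block; an empty list means none of the checked rules were violated.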

## 1. Generate Sample Artifact Data

Create a realistic archaeological dataset with multiple artifact types and periods.

In [None]:
def generate_artifacts(num_artifacts=200, site_id='SITE_A'):
    """Generate sample archaeological artifacts."""
    np.random.seed(42)
    
    artifact_types = {
        'pottery': ['ceramic', 'terracotta', 'glazed ceramic'],
        'lithic': ['flint', 'obsidian', 'chert', 'quartzite'],
        'bone': ['mammal bone', 'bird bone', 'fish bone'],
        'coin': ['bronze', 'silver', 'gold'],
        'architecture': ['brick', 'stone', 'tile']
    }
    
    periods = ['Neolithic', 'Bronze Age', 'Iron Age', 'Classical', 'Medieval']
    
    artifacts = []
    
    for i in range(num_artifacts):
        artifact_type = np.random.choice(list(artifact_types.keys()))
        material = np.random.choice(artifact_types[artifact_type])
        
        # Baseline measurements per type (length/width/thickness in mm, weight in g)
        if artifact_type == 'pottery':
            length, width, thickness = 150, 120, 8
            weight = 300
        elif artifact_type == 'lithic':
            length, width, thickness = 50, 35, 10
            weight = 40
        elif artifact_type == 'bone':
            length, width, thickness = 80, 25, 15
            weight = 50
        elif artifact_type == 'coin':
            length, width, thickness = 20, 20, 2
            weight = 5
        else:  # architecture
            length, width, thickness = 250, 150, 50
            weight = 2000
        
        # Add variation
        length += np.random.normal(0, length * 0.2)
        width += np.random.normal(0, width * 0.2)
        thickness += np.random.normal(0, thickness * 0.2)
        weight += np.random.normal(0, weight * 0.3)
        
        # GPS coordinates
        gps_lat = 40.0 + np.random.normal(0, 0.05)
        gps_lon = 20.0 + np.random.normal(0, 0.05)
        
        # Stratigraphy: layers map one-to-one onto periods
        # (Layer_1 = Neolithic, ..., Layer_5 = Medieval)
        layer = np.random.randint(1, 6)
        strat_unit = f"Layer_{layer}"
        period = periods[min(layer - 1, len(periods) - 1)]
        
        # Dating
        period_dates = {
            'Neolithic': (-8000, -4000),
            'Bronze Age': (-3300, -1200),
            'Iron Age': (-1200, -500),
            'Classical': (-800, 300),
            'Medieval': (500, 1500)
        }
        date_min, date_max = period_dates[period]
        dating_value = np.random.randint(date_min, date_max)
        
        artifact = {
            'artifact_id': f"ART_{site_id}_{i+1:04d}",
            'site_id': site_id,
            'artifact_type': artifact_type,
            'material': material,
            'length': round(max(1, length), 2),
            'width': round(max(1, width), 2),
            'thickness': round(max(0.5, thickness), 2),
            'weight': round(max(0.1, weight), 2),
            'gps_lat': round(gps_lat, 6),
            'gps_lon': round(gps_lon, 6),
            'stratigraphic_unit': strat_unit,
            'period': period,
            'dating_method': 'relative' if layer > 3 else 'radiocarbon',
            'dating_value': dating_value,
            'excavation_date': (datetime.now() - timedelta(days=np.random.randint(0, 180))).strftime('%Y-%m-%d'),
            'notes': f'{artifact_type.capitalize()} fragment, {material}'
        }
        
        artifacts.append(artifact)
    
    return pd.DataFrame(artifacts)

# Generate dataset
df_artifacts = generate_artifacts(200, SITE_ID)

print(f"Generated {len(df_artifacts)} artifacts")
print(f"\nArtifact types:")
print(df_artifacts['artifact_type'].value_counts())
print(f"\nPeriods:")
print(df_artifacts['period'].value_counts())

df_artifacts.head(10)
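
A quick consistency check can confirm that every generated date falls inside the range of its assigned period. The sketch below copies the period ranges from `generate_artifacts()`; the helper name `find_dating_errors` and the two sample records are hypothetical and exist only to illustrate the check.

```python
# Period date ranges copied from generate_artifacts() above
# (years; negative values are BCE).
PERIOD_DATES = {
    'Neolithic': (-8000, -4000),
    'Bronze Age': (-3300, -1200),
    'Iron Age': (-1200, -500),
    'Classical': (-800, 300),
    'Medieval': (500, 1500),
}

def find_dating_errors(records):
    """Return IDs of records whose date lies outside their period's range.

    np.random.randint excludes the upper bound, so ranges are treated
    as half-open: date_min <= dating_value < date_max.
    """
    bad = []
    for rec in records:
        date_min, date_max = PERIOD_DATES[rec['period']]
        if not (date_min <= rec['dating_value'] < date_max):
            bad.append(rec['artifact_id'])
    return bad

# Two hand-written records for illustration; the second is deliberately wrong.
sample = [
    {'artifact_id': 'ART_SITE_A_0001', 'period': 'Iron Age', 'dating_value': -900},
    {'artifact_id': 'ART_SITE_A_0002', 'period': 'Medieval', 'dating_value': -50},
]
print(find_dating_errors(sample))
```

With the notebook's data, the same function could be applied as `find_dating_errors(df_artifacts.to_dict('records'))`, which should return an empty list for the generated dataset.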

## 2. Upload Data to S3

Upload the artifact dataset to S3 for Lambda processing.

In [None]:
# Save locally first
csv_filename = f"{SITE_ID}_artifacts.csv"
df_artifacts.to_csv(csv_filename, index=False)
print(f"Saved locally: {csv_filename}")

# Upload to S3
s3_key = f"raw/{csv_filename}"
s3.upload_file(csv_filename, BUCKET_NAME, s3_key)
print(f"✓ Uploaded to s3://{BUCKET_NAME}/{s3_key}")

# Verify upload
response = s3.head_object(Bucket=BUCKET_NAME, Key=s3_key)
file_size = response['ContentLength'] / 1024
print(f"  File size: {file_size:.2f} KB")
print(f"  Last modified: {response['LastModified']}")

## 3. Invoke Lambda for Processing

Trigger the Lambda function to classify and analyze the artifacts.

In [None]:
# Prepare Lambda payload
payload = {
    'bucket': BUCKET_NAME,
    'key': s3_key
}

print(f"Invoking Lambda function: {LAMBDA_FUNCTION}")
print(f"Payload: {json.dumps(payload, indent=2)}")
print("\nProcessing... (this may take 30-60 seconds)\n")

# Invoke Lambda
response = lambda_client.invoke(
    FunctionName=LAMBDA_FUNCTION,
    InvocationType='RequestResponse',
    Payload=json.dumps(payload)
)

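# Parse response
response_payload = json.loads(response['Payload'].read())
print("Lambda Response:")
print(json.dumps(response_payload, indent=2))

# An unhandled exception inside the function sets 'FunctionError', and the
# payload then carries 'errorMessage' instead of 'statusCode' and 'body'.
if not isinstance(response_payload, dict):
    response_payload = {}
body = response_payload.get('body', {})
if isinstance(body, str):
    # Some handlers return the body as a JSON-encoded string
    body = json.loads(body)

if 'FunctionError' not in response and response_payload.get('statusCode') == 200:
    print(f"\n✓ Successfully processed {body['artifacts_processed']} artifacts")
    print(f"  Summary file: {body['summary_file']}")
    print(f"  Processed file: {body['processed_file']}")
else:
    error = response_payload.get('errorMessage') or body.get('error', 'Unknown error')
    print(f"\n✗ Processing failed: {error}")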
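
The `classify-artifacts` function itself is not shown in this notebook. As a rough illustration of how a handler could produce the response shape read above (`statusCode`, plus a `body` with `artifacts_processed`, `summary_file` and `processed_file`), here is a minimal sketch. Everything in it is hypothetical: the function names, the `counts_by_type` field, the `processed/<SITE_ID>_processed.csv` key, and the absence of any real classification logic.

```python
import csv
import io
import json

def summarize_csv(csv_text):
    """Count artifacts by type and build a response body.

    The output keys are assumptions modelled on the
    'processed/<SITE_ID>_summary.json' key used later in this notebook.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_type = {}
    for row in rows:
        by_type[row['artifact_type']] = by_type.get(row['artifact_type'], 0) + 1
    site_id = rows[0]['site_id'] if rows else 'UNKNOWN'
    return {
        'artifacts_processed': len(rows),
        'summary_file': f'processed/{site_id}_summary.json',
        'processed_file': f'processed/{site_id}_processed.csv',
        'counts_by_type': by_type,
    }

def lambda_handler(event, context):
    """Hypothetical handler: read the CSV named in the event and summarize it."""
    import boto3  # imported here so summarize_csv can be tried without AWS
    try:
        obj = boto3.client('s3').get_object(Bucket=event['bucket'], Key=event['key'])
        body = summarize_csv(obj['Body'].read().decode('utf-8'))
        return {'statusCode': 200, 'body': body}
    except Exception as exc:
        return {'statusCode': 500, 'body': {'error': str(exc)}}

# Local check of the pure-Python part, no AWS access needed
demo = ('artifact_id,site_id,artifact_type\n'
        'A1,SITE_A,pottery\nA2,SITE_A,coin\nA3,SITE_A,pottery\n')
print(json.dumps(summarize_csv(demo), indent=2))
```

Keeping the summarizing logic in a plain function separate from the handler makes it possible to test locally before deploying.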

## 4. Query Results from DynamoDB

Retrieve classified artifacts from DynamoDB.

In [None]:
# Query DynamoDB table
table = dynamodb.Table(TABLE_NAME)

print(f"Querying DynamoDB table: {TABLE_NAME}")

# Scan all items (for small datasets)
response = table.scan()
items = response['Items']

# Handle pagination
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response['Items'])

print(f"✓ Retrieved {len(items)} artifacts from DynamoDB")

# Convert to DataFrame
df_results = pd.DataFrame(items)

# Convert numeric columns
numeric_cols = ['length', 'width', 'thickness', 'weight', 'gps_lat', 'gps_lon', 
                'dating_value', 'classification_confidence', 'l_w_ratio', 
                'thickness_index', 'shape_index']
for col in numeric_cols:
    if col in df_results.columns:
        df_results[col] = pd.to_numeric(df_results[col], errors='coerce')

df_results.head()
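
The boto3 DynamoDB resource returns numeric attributes as `decimal.Decimal`, which is why the numeric columns are converted above. When the raw items are needed outside pandas (for example, to serialize them with `json`), a recursive converter is one option. The sketch below is a minimal version; the name `from_decimal` is an assumption, and converting to `float` can lose precision for values with many digits.

```python
from decimal import Decimal

def from_decimal(value):
    """Recursively convert Decimal values to int or float."""
    if isinstance(value, Decimal):
        # Whole numbers become int, everything else float
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {k: from_decimal(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        # Sequences and sets are all returned as lists
        return [from_decimal(v) for v in value]
    return value

item = {
    'artifact_id': 'ART_SITE_A_0001',
    'length': Decimal('152.37'),
    'dating_value': Decimal('-3100'),
}
print(from_decimal(item))
```

Applied to the scan results, this would read `items = [from_decimal(i) for i in items]` before building the DataFrame.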

## 5. Download Processing Summary

Download and examine the comprehensive analysis summary from S3.

In [None]:
# Download summary JSON
summary_key = f"processed/{SITE_ID}_summary.json"

try:
    response = s3.get_object(Bucket=BUCKET_NAME, Key=summary_key)
    summary = json.loads(response['Body'].read())
    
    print("Processing Summary:")
    print("=" * 80)
    print(f"Site ID: {summary['site_id']}")
    print(f"Processing Date: {summary['processing_date']}")
    print(f"Artifacts Processed: {summary['artifacts_processed']}")
    print(f"\nSpatial Analysis:")
    print(f"  Center: {summary['spatial_analysis']['center_of_mass']['latitude']:.6f}, "
          f"{summary['spatial_analysis']['center_of_mass']['longitude']:.6f}")
    print(f"\nChronological Range:")
    print(f"  {summary['chronological_analysis']['date_range']['earliest']} to "
          f"{summary['chronological_analysis']['date_range']['latest']} "
          f"(span: {summary['chronological_analysis']['date_range']['span_years']} years)")
    print(f"\nArtifact Counts:")
    for artifact_type, count in summary['artifact_counts']['by_type'].items():
        print(f"  {artifact_type}: {count}")
        
except Exception as e:
    print(f"Could not download summary: {e}")

## 6. Visualization: Artifact Distribution

Create visualizations of artifact distributions by type and period.

In [None]:
# Create figure with subplots
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Artifact type distribution
type_counts = df_results['artifact_type'].value_counts()
axes[0, 0].bar(type_counts.index, type_counts.values, color='steelblue')
axes[0, 0].set_xlabel('Artifact Type')
axes[0, 0].set_ylabel('Count')
axes[0, 0].set_title('Artifact Type Distribution')
axes[0, 0].tick_params(axis='x', rotation=45)

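# 2. Period distribution (ordered chronologically, not alphabetically)
period_order = ['Neolithic', 'Bronze Age', 'Iron Age', 'Classical', 'Medieval']
period_counts = df_results['period'].value_counts().reindex(period_order, fill_value=0)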
axes[0, 1].bar(range(len(period_counts)), period_counts.values, color='coral')
axes[0, 1].set_xticks(range(len(period_counts)))
axes[0, 1].set_xticklabels(period_counts.index, rotation=45)
axes[0, 1].set_xlabel('Period')
axes[0, 1].set_ylabel('Count')
axes[0, 1].set_title('Chronological Distribution')

# 3. Material distribution
material_counts = df_results['material'].value_counts().head(10)
axes[1, 0].barh(material_counts.index, material_counts.values, color='seagreen')
axes[1, 0].set_xlabel('Count')
axes[1, 0].set_ylabel('Material')
axes[1, 0].set_title('Material Distribution (Top 10)')

# 4. Classification confidence
axes[1, 1].hist(df_results['classification_confidence'], bins=20, color='mediumpurple', edgecolor='black')
axes[1, 1].set_xlabel('Classification Confidence')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Classification Confidence Distribution')
axes[1, 1].axvline(df_results['classification_confidence'].mean(), 
                   color='red', linestyle='--', label=f"Mean: {df_results['classification_confidence'].mean():.2f}")
axes[1, 1].legend()

plt.tight_layout()
plt.savefig('artifact_distribution.png', dpi=300, bbox_inches='tight')
print("✓ Saved: artifact_distribution.png")
plt.show()

## 7. Spatial Analysis: Geographic Distribution

Visualize the spatial distribution of artifacts across the site.

In [None]:
# Create spatial distribution plot
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# 1. All artifacts spatial distribution
type_cat = pd.Categorical(df_results['artifact_type'])
scatter1 = axes[0].scatter(df_results['gps_lon'], df_results['gps_lat'], 
                          c=type_cat.codes,
                          cmap='viridis', alpha=0.6, s=50)
axes[0].set_xlabel('Longitude')
axes[0].set_ylabel('Latitude')
axes[0].set_title('Spatial Distribution of Artifacts')
axes[0].grid(True, alpha=0.3)

# Add legend: use the same category order and color scaling as the scatter
# (codes 0..n-1 are normalized to 0..1 by the colormap)
n_types = len(type_cat.categories)
legend_elements = [plt.Line2D([0], [0], marker='o', color='w', 
                             markerfacecolor=plt.cm.viridis(i / max(n_types - 1, 1)),
                             markersize=8, label=atype)
                  for i, atype in enumerate(type_cat.categories)]
axes[0].legend(handles=legend_elements, loc='upper right')

# 2. Density heatmap
from scipy.stats import gaussian_kde

x = df_results['gps_lon'].values
y = df_results['gps_lat'].values

# Calculate point density
xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)

scatter2 = axes[1].scatter(x, y, c=z, s=50, cmap='hot', alpha=0.6)
axes[1].set_xlabel('Longitude')
axes[1].set_ylabel('Latitude')
axes[1].set_title('Artifact Density Heatmap')
axes[1].grid(True, alpha=0.3)
plt.colorbar(scatter2, ax=axes[1], label='Density')

plt.tight_layout()
plt.savefig('spatial_distribution.png', dpi=300, bbox_inches='tight')
print("✓ Saved: spatial_distribution.png")
plt.show()

# Calculate spatial statistics
print("\nSpatial Statistics:")
print(f"  Center of mass: {df_results['gps_lat'].mean():.6f}, {df_results['gps_lon'].mean():.6f}")
print(f"  Latitude range: {df_results['gps_lat'].min():.6f} to {df_results['gps_lat'].max():.6f}")
print(f"  Longitude range: {df_results['gps_lon'].min():.6f} to {df_results['gps_lon'].max():.6f}")
print(f"  Spatial spread (std): Lat={df_results['gps_lat'].std():.6f}, Lon={df_results['gps_lon'].std():.6f}")
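
The statistics above are in decimal degrees. To express the site extent in meters, one option is the great-circle distance from the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, which is an approximation; the function name `haversine_m` and the example bounding box are illustrative, not values from the dataset.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

# Extent of a hypothetical bounding box around the site center (40.0 N, 20.0 E)
south, north, west, east = 39.9, 40.1, 19.9, 20.1
height_m = haversine_m(south, 20.0, north, 20.0)
width_m = haversine_m(40.0, west, 40.0, east)
print(f"North-south extent: {height_m / 1000:.1f} km")
print(f"East-west extent:   {width_m / 1000:.1f} km")
```

At this latitude a degree of longitude covers less ground than a degree of latitude, so equal spreads in degrees do not mean equal spreads on the ground.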

## 8. Morphometric Analysis

Analyze artifact measurements and morphological indices.

In [None]:
# Create morphometric analysis plots
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# 1. Length vs Width scatter
for artifact_type in df_results['artifact_type'].unique():
    data = df_results[df_results['artifact_type'] == artifact_type]
    axes[0, 0].scatter(data['length'], data['width'], label=artifact_type, alpha=0.6)
axes[0, 0].set_xlabel('Length (mm)')
axes[0, 0].set_ylabel('Width (mm)')
axes[0, 0].set_title('Length vs Width by Type')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Length/Width ratio distribution
for artifact_type in df_results['artifact_type'].unique():
    data = df_results[df_results['artifact_type'] == artifact_type]
    axes[0, 1].hist(data['l_w_ratio'], alpha=0.5, label=artifact_type, bins=15)
axes[0, 1].set_xlabel('Length/Width Ratio')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('L/W Ratio Distribution')
axes[0, 1].legend()

# 3. Thickness index
axes[0, 2].boxplot([df_results[df_results['artifact_type'] == t]['thickness_index'].dropna() 
                    for t in df_results['artifact_type'].unique()],
                   labels=df_results['artifact_type'].unique())
axes[0, 2].set_xlabel('Artifact Type')
axes[0, 2].set_ylabel('Thickness Index')
axes[0, 2].set_title('Thickness Index by Type')
axes[0, 2].tick_params(axis='x', rotation=45)

# 4. Weight distribution
for artifact_type in df_results['artifact_type'].unique():
    data = df_results[df_results['artifact_type'] == artifact_type]
    axes[1, 0].hist(data['weight'], alpha=0.5, label=artifact_type, bins=15)
axes[1, 0].set_xlabel('Weight (g)')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_title('Weight Distribution')
axes[1, 0].set_yscale('log')
axes[1, 0].legend()

# 5. Shape index
axes[1, 1].boxplot([df_results[df_results['artifact_type'] == t]['shape_index'].dropna() 
                    for t in df_results['artifact_type'].unique()],
                   labels=df_results['artifact_type'].unique())
axes[1, 1].set_xlabel('Artifact Type')
axes[1, 1].set_ylabel('Shape Index')
axes[1, 1].set_title('Shape Index by Type')
axes[1, 1].tick_params(axis='x', rotation=45)

# 6. PCA of morphometric variables
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

morph_cols = ['length', 'width', 'thickness', 'weight', 'l_w_ratio', 'thickness_index']
X = df_results[morph_cols].dropna()
X_types = df_results.loc[X.index, 'artifact_type']

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

for artifact_type in X_types.unique():
    mask = X_types == artifact_type
    axes[1, 2].scatter(X_pca[mask, 0], X_pca[mask, 1], label=artifact_type, alpha=0.6)
axes[1, 2].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%} variance)')
axes[1, 2].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%} variance)')
axes[1, 2].set_title('PCA of Morphometric Variables')
axes[1, 2].legend()
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('morphometric_analysis.png', dpi=300, bbox_inches='tight')
print("✓ Saved: morphometric_analysis.png")
plt.show()
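
The `l_w_ratio`, `thickness_index` and `shape_index` columns come from the Lambda function, and its formulas are not shown in this notebook. As a sketch of what such indices can look like, the code below computes two simple ratios under assumed definitions: length divided by width, and thickness divided by width. These are illustrative conventions only and may differ from what the Lambda actually computes; `shape_index` is left out because its definition cannot be inferred from the name.

```python
def elongation(length, width):
    """Length-to-width ratio; None when width is missing or zero."""
    if not width:
        return None
    return length / width

def relative_thickness(thickness, width):
    """Thickness-to-width ratio (assumed definition); None when width is missing or zero."""
    if not width:
        return None
    return thickness / width

# Baseline dimensions (mm) from generate_artifacts() for two artifact types
baselines = {'lithic': (50, 35, 10), 'coin': (20, 20, 2)}
for name, (length, width, thickness) in baselines.items():
    print(name,
          round(elongation(length, width), 2),
          round(relative_thickness(thickness, width), 2))
```

Returning `None` for a zero or missing width avoids a division error on incomplete records; in pandas the equivalent result would be `NaN`.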

## 9. Chronological Analysis

Analyze temporal patterns in the artifact assemblage.

In [None]:
# Create chronological analysis plots
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Timeline of all artifacts
for artifact_type in df_results['artifact_type'].unique():
    data = df_results[df_results['artifact_type'] == artifact_type]
    axes[0, 0].scatter(data['dating_value'], [artifact_type] * len(data), alpha=0.6)
axes[0, 0].set_xlabel('Date (calendar year; negative = BCE)')
axes[0, 0].set_ylabel('Artifact Type')
axes[0, 0].set_title('Chronological Distribution of Artifacts')
axes[0, 0].axvline(0, color='red', linestyle='--', alpha=0.5, label='BCE/CE boundary')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

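# 2. Artifact counts by period (periods in chronological order)
period_order = ['Neolithic', 'Bronze Age', 'Iron Age', 'Classical', 'Medieval']
period_type = (df_results.groupby(['period', 'artifact_type']).size()
               .unstack(fill_value=0).reindex(period_order, fill_value=0))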
period_type.plot(kind='bar', stacked=True, ax=axes[0, 1])
axes[0, 1].set_xlabel('Period')
axes[0, 1].set_ylabel('Artifact Count')
axes[0, 1].set_title('Artifact Types by Period')
axes[0, 1].tick_params(axis='x', rotation=45)
axes[0, 1].legend(title='Type', bbox_to_anchor=(1.05, 1), loc='upper left')

# 3. Stratigraphic sequence
strat_period = df_results.groupby(['stratigraphic_unit', 'period']).size().unstack(fill_value=0)
strat_period.plot(kind='bar', stacked=True, ax=axes[1, 0])
axes[1, 0].set_xlabel('Stratigraphic Unit')
axes[1, 0].set_ylabel('Artifact Count')
axes[1, 0].set_title('Periods by Stratigraphic Unit')
axes[1, 0].tick_params(axis='x', rotation=45)
axes[1, 0].legend(title='Period')

# 4. Dating value histogram
axes[1, 1].hist(df_results['dating_value'], bins=30, color='skyblue', edgecolor='black')
axes[1, 1].set_xlabel('Date (calendar year; negative = BCE)')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Temporal Distribution')
axes[1, 1].axvline(0, color='red', linestyle='--', alpha=0.5, label='BCE/CE boundary')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.savefig('chronological_analysis.png', dpi=300, bbox_inches='tight')
print("✓ Saved: chronological_analysis.png")
plt.show()

# Print chronological summary
print("\nChronological Summary:")
print(f"  Date range: {df_results['dating_value'].min()} to {df_results['dating_value'].max()}")
print(f"  Time span: {df_results['dating_value'].max() - df_results['dating_value'].min()} years")
print(f"\nPeriod counts:")
for period, count in df_results['period'].value_counts().items():
    print(f"  {period}: {count}")
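
The `dating_value` column stores signed years, with negative values meaning BCE (as in the period ranges used to generate the data). For reports, a small formatter makes the summary easier to read. This is a minimal sketch; the name `format_year` and the way it displays year 0 are assumptions.

```python
def format_year(year):
    """Format a signed year as a BCE/CE label (negative values are BCE)."""
    year = int(year)
    if year < 0:
        return f"{-year} BCE"
    # The synthetic data can contain 0, which has no counterpart in the
    # BCE/CE calendar; it is shown as "0 CE" here purely for display.
    return f"{year} CE"

for value in (-3300, -500, 0, 1500):
    print(f"{value:>6} -> {format_year(value)}")
```

For example, the date range line could be printed as `format_year(df_results['dating_value'].min())` to `format_year(df_results['dating_value'].max())`.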

## 10. Summary Statistics

Generate comprehensive summary statistics for the artifact assemblage.

In [None]:
print("ARCHAEOLOGICAL SITE ANALYSIS SUMMARY")
print("=" * 80)
print(f"\nSite ID: {SITE_ID}")
print(f"Total Artifacts: {len(df_results)}")
print(f"Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

print("\n" + "=" * 80)
print("TYPOLOGICAL SUMMARY")
print("=" * 80)
print("\nArtifact Types:")
for atype, count in df_results['artifact_type'].value_counts().items():
    pct = count / len(df_results) * 100
    print(f"  {atype:15s}: {count:4d} ({pct:5.1f}%)")

print("\nMaterials (top 5):")
for material, count in df_results['material'].value_counts().head(5).items():
    pct = count / len(df_results) * 100
    print(f"  {material:15s}: {count:4d} ({pct:5.1f}%)")

print("\n" + "=" * 80)
print("CHRONOLOGICAL SUMMARY")
print("=" * 80)
print(f"\nDate Range: {df_results['dating_value'].min()} to {df_results['dating_value'].max()}")
print(f"Time Span: {df_results['dating_value'].max() - df_results['dating_value'].min():,} years")
print("\nPeriods:")
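period_order = ['Neolithic', 'Bronze Age', 'Iron Age', 'Classical', 'Medieval']  # chronological, not alphabetical
for period, count in df_results['period'].value_counts().reindex(period_order, fill_value=0).items():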
    pct = count / len(df_results) * 100
    print(f"  {period:15s}: {count:4d} ({pct:5.1f}%)")

print("\n" + "=" * 80)
print("MORPHOMETRIC SUMMARY")
print("=" * 80)
morph_summary = df_results[['length', 'width', 'thickness', 'weight']].describe()
print("\n", morph_summary.round(2))

print("\n" + "=" * 80)
print("SPATIAL SUMMARY")
print("=" * 80)
print(f"\nCenter of Mass:")
print(f"  Latitude:  {df_results['gps_lat'].mean():.6f} (std: {df_results['gps_lat'].std():.6f})")
print(f"  Longitude: {df_results['gps_lon'].mean():.6f} (std: {df_results['gps_lon'].std():.6f})")
print(f"\nBounding Box:")
print(f"  North: {df_results['gps_lat'].max():.6f}")
print(f"  South: {df_results['gps_lat'].min():.6f}")
print(f"  East:  {df_results['gps_lon'].max():.6f}")
print(f"  West:  {df_results['gps_lon'].min():.6f}")

print("\n" + "=" * 80)
print("CLASSIFICATION QUALITY")
print("=" * 80)
print(f"\nMean Confidence: {df_results['classification_confidence'].mean():.3f}")
print(f"Std Confidence:  {df_results['classification_confidence'].std():.3f}")
print(f"Min Confidence:  {df_results['classification_confidence'].min():.3f}")
print(f"Max Confidence:  {df_results['classification_confidence'].max():.3f}")

high_conf = (df_results['classification_confidence'] > 0.8).sum()
print(f"\nHigh confidence (>0.8): {high_conf} ({high_conf/len(df_results)*100:.1f}%)")

print("\n" + "=" * 80)

## 11. Export Results

Export analysis results for further use or publication.

In [None]:
# Export to various formats

# 1. Export full dataset to CSV
output_csv = f"{SITE_ID}_analysis_results.csv"
df_results.to_csv(output_csv, index=False)
print(f"✓ Exported to CSV: {output_csv}")

# 2. Export summary to JSON
summary_dict = {
    'site_id': SITE_ID,
    'analysis_date': datetime.now().isoformat(),
    'total_artifacts': len(df_results),
    'type_distribution': df_results['artifact_type'].value_counts().to_dict(),
    'period_distribution': df_results['period'].value_counts().to_dict(),
    'date_range': {
        'min': int(df_results['dating_value'].min()),
        'max': int(df_results['dating_value'].max())
    },
    'morphometric_summary': df_results[['length', 'width', 'thickness', 'weight']].describe().to_dict(),
    'spatial_center': {
        'latitude': float(df_results['gps_lat'].mean()),
        'longitude': float(df_results['gps_lon'].mean())
    }
}

output_json = f"{SITE_ID}_analysis_summary.json"
with open(output_json, 'w') as f:
    json.dump(summary_dict, f, indent=2)
print(f"✓ Exported to JSON: {output_json}")

# 3. Export by type to separate CSVs
for artifact_type in df_results['artifact_type'].unique():
    type_df = df_results[df_results['artifact_type'] == artifact_type]
    output_file = f"{SITE_ID}_{artifact_type}_artifacts.csv"
    type_df.to_csv(output_file, index=False)
    print(f"✓ Exported {artifact_type}: {output_file}")

print("\n✓ All exports complete!")

## 12. Next Steps

Congratulations! You've completed a full archaeological data analysis pipeline using AWS.

### What You've Accomplished
- Generated and uploaded artifact data to S3
- Processed the data with a serverless Lambda function
- Stored the artifact catalog in DynamoDB
- Performed comprehensive archaeological analysis
- Created publication-quality visualizations

### Recommended Next Steps

1. **Extend the analysis**:
   - Add more sites for comparative analysis
   - Include artifact images and computer vision
   - Integrate with GIS software (QGIS, ArcGIS)
   - Add radiocarbon dating calibration

2. **Scale up**:
   - Process 10,000+ artifacts from real excavations
   - Use Step Functions for complex workflows
   - Add real-time processing during excavation
   - Create team collaboration features

3. **Move to Tier 3**:
   - Production infrastructure with CloudFormation
   - High availability and disaster recovery
   - API for external data access
   - Advanced monitoring and alerting

4. **Clean up resources**:
   - Follow `cleanup_guide.md` to delete AWS resources
   - Download all results before cleanup
   - Verify costs have stopped

### Resources
- [Open Context](https://opencontext.org/) - Open archaeological data
- [tDAR](https://www.tdar.org/) - The Digital Archaeological Record
- [Archaeological Data Service](https://archaeologydataservice.ac.uk/)
- AWS Documentation: [S3](https://docs.aws.amazon.com/s3/), [Lambda](https://docs.aws.amazon.com/lambda/), [DynamoDB](https://docs.aws.amazon.com/dynamodb/)

### Questions or Issues?
- GitHub Issues: https://github.com/research-jumpstart/research-jumpstart/issues
- Tag: `archaeology`, `tier-2`, `aws`