# Military Base Centroids Calculation 🏰📍

In this notebook, we'll:
- Load the military bases CSV file.
- Parse the "Geo Shape" column, which contains a JSON string representing the polygon.
- Compute each polygon's centroid, approximated as the mean latitude and longitude of its outer-ring vertices.
- Save the updated dataset with two new columns: `center_lat` and `center_lon`.

> [!WARNING]  
> Download the data before running this notebook. See the project documentation for download instructions.

## Data Loading 📥

In this cell, we load:

- **U.S. Military Bases Data:** Loaded from a CSV file.

In [1]:
import os
import pandas as pd


# ============================================================
# SETUP: Define the raw and processed data directories relative to this script
# ============================================================
# Get the absolute path of the directory where this script is located
# In a notebook, __file__ is not defined so we use os.getcwd() as a fallback.
try:
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    BASE_DIR = os.getcwd()

# Define the folder where raw data is stored (assumed to be "../data/raw")
RAW_DIR = os.path.join(BASE_DIR, "..", "data", "raw")

# Define the folder where processed data will be saved (assumed to be "../data/processed")
PROCESSED_DIR = os.path.join(BASE_DIR, "..", "data", "processed")
os.makedirs(PROCESSED_DIR, exist_ok=True)  # Create the folder if it doesn't exist

# Build the absolute path to the raw military bases CSV
input_path = os.path.join(RAW_DIR, "military_bases.csv")

# Load the military bases CSV (delimiter is semicolon)
military_df = pd.read_csv(input_path, delimiter=";")
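
Because the raw CSV has to be downloaded beforehand, a small guard before `pd.read_csv` can turn a missing file into a clearer message. The following is a minimal sketch; `require_file` is a hypothetical helper and not part of the original notebook.

```python
import os


def require_file(path):
    """Return `path` unchanged, or raise a descriptive error if it is missing.

    Hypothetical helper: the name and the message are illustrative only.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Expected raw data at {path!r}. "
            "Download the data before running this notebook."
        )
    return path
```

It could wrap the path passed to the loader, for example `pd.read_csv(require_file(os.path.join(RAW_DIR, "military_bases.csv")), delimiter=";")`.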

## Calculating Centroids 📍

<div class="alert alert-block alert-info">
    ⚠️ We use centroids instead of the Geo Point column because a later step will associate UFO occurrences with military bases by radial distance. For that, the reference point should sit at the center of the base rather than at its entrance.
</div>
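
To make "radial distance" concrete, here is a minimal sketch of a great-circle (haversine) distance between two latitude/longitude points. It is an illustration only: the notebook does not say which distance formula the later step uses, and `haversine_km` and `EARTH_RADIUS_KM` are assumed names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Centroid vs. Geo Point for the White Sands Missile Range row,
# using the values shown in the preview printed later in this notebook.
offset_km = haversine_km(33.073976, -106.374950, 33.1594636742, -106.425696182)
```

For a large installation such as White Sands, the two reference points are roughly 10 km apart under this formula, which is enough to change which sightings fall inside a given radius.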

In [7]:
import json
import numpy as np

# Function to compute centroid from a Geo Shape JSON string
def compute_centroid(geo_shape_str):
    """
    Given a Geo Shape string (JSON formatted) for a Polygon, 
    compute the centroid (mean of latitudes and longitudes) 
    of the first ring.
    
    Example input as stored in the raw CSV (the doubled quotes are CSV
    escaping, which pandas removes when loading the file):
    "{""coordinates"": [[[-85.6546, 31.2341], [-85.6528, 31.2350], ...]], ""type"": ""Polygon""}"
    """
    try:
        # Parse the JSON string into a dict.
        geo_obj = json.loads(geo_shape_str)
        # Assume the first element in "coordinates" is the outer ring
        coords = geo_obj["coordinates"][0]
        # Separate latitudes and longitudes:
        # Note: Geo Shape has format [lon, lat]. 
        # Note: Geo Point has format [lat, lon]. 
        lons = [pt[0] for pt in coords]
        lats = [pt[1] for pt in coords]
        # Compute simple arithmetic mean:
        centroid_lat = np.mean(lats)
        centroid_lon = np.mean(lons)
        return centroid_lat, centroid_lon
    except Exception:
        # On any parsing or indexing error (e.g. a missing value), return NaN values
        return np.nan, np.nan
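
The vertex mean used above is a simple approximation. It is pulled toward stretches of the boundary that have many vertices, and the closing vertex of a ring is counted twice. As a point of comparison, here is a sketch of the area-weighted centroid based on the shoelace formula. It treats degrees as planar coordinates, which is itself an approximation, and `area_centroid` and `vertex_mean` are hypothetical names, not functions from this notebook.

```python
def vertex_mean(ring):
    """Mean of the ring's vertices, returned as (lat, lon)."""
    lons = [pt[0] for pt in ring]
    lats = [pt[1] for pt in ring]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def area_centroid(ring):
    """Area-weighted centroid of a ring of [lon, lat] points, as (lat, lon).

    Uses the shoelace formula on planar coordinates (an assumption that
    is reasonable only for small shapes away from the poles).
    """
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring if the input left it open
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for a, b in zip(pts, pts[1:]):
        x0, y0 = a[0], a[1]
        x1, y1 = b[0], b[1]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate ring with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cy / (3 * twice_area), cx / (3 * twice_area)


# A 4 x 2 rectangle whose left edge carries extra vertices.
ring = [[0, 0], [4, 0], [4, 2], [0, 2], [0, 1.5], [0, 1], [0, 0.5], [0, 0]]
mean_lat, mean_lon = vertex_mean(ring)
area_lat, area_lon = area_centroid(ring)
```

On this rectangle the area-weighted centroid is the geometric center (lat 1, lon 2), while the vertex mean is dragged toward the densely sampled left edge (lat 0.875, lon 1). For compact base outlines the difference is probably small, but it can matter for large or irregular sites.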

In [8]:
# Apply the function to the "Geo Shape" column and create new columns.
military_df[['center_lat', 'center_lon']] = military_df['Geo Shape'].apply(
    lambda x: pd.Series(compute_centroid(x))
)

# Let's preview the updated DataFrame
print(military_df[['Site Name', 'center_lat', 'center_lon', 'Geo Point']].head())

                      Site Name  center_lat  center_lon  \
0           Allen Stagefield AL   31.231056  -85.651145   
1      Louisville Stagefield AL   31.815442  -85.651380   
2  White Sands Missile Range NM   33.073976 -106.374950   
3                   Fort Monroe  -19.630117  -19.630106   
4                MCB Camp Smith   21.386671 -157.905343   

                       Geo Point  
0  31.2309993833, -85.6506347178  
1  31.8157331822, -85.6497984957  
2  33.1594636742, -106.425696182  
3  37.0130203962, -76.3043760544  
4  21.3866284869, -157.905641308  
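
The Fort Monroe row stands out: its `center_lat` and `center_lon` are both about -19.63, far from its Geo Point of 37.01, -76.30. One possible explanation, which has not been checked against the raw data, is that this row's Geo Shape is a `MultiPolygon`. In that case `coordinates[0]` is a whole polygon rather than a ring, so `pt[0]` and `pt[1]` are `[lon, lat]` pairs and `np.mean` averages longitudes and latitudes together. That is consistent with (37.013 + (-76.304)) / 2 ≈ -19.65. Below is a minimal sketch that handles both geometry types by pooling the vertices of every outer ring; `compute_centroid_any` is a hypothetical name, and pooling is only one of several reasonable choices.

```python
import json
import math


def compute_centroid_any(geo_shape_str):
    """Vertex-mean centroid for a GeoJSON Polygon or MultiPolygon string.

    Returns (lat, lon), or (nan, nan) if the input cannot be handled.
    """
    try:
        geo_obj = json.loads(geo_shape_str)
        kind = geo_obj["type"]
        if kind == "Polygon":
            outer_rings = [geo_obj["coordinates"][0]]
        elif kind == "MultiPolygon":
            # One extra nesting level: a list of polygons, each a list of rings
            outer_rings = [poly[0] for poly in geo_obj["coordinates"]]
        else:
            return math.nan, math.nan
        points = [pt for ring in outer_rings for pt in ring]
        lons = [pt[0] for pt in points]  # Geo Shape order is [lon, lat]
        lats = [pt[1] for pt in points]
        return sum(lats) / len(lats), sum(lons) / len(lons)
    except (TypeError, ValueError, KeyError, IndexError, ZeroDivisionError):
        return math.nan, math.nan
```

Before relying on this, it is worth inspecting the `"type"` field of the affected rows, for example with `json.loads(military_df.loc[3, "Geo Shape"])["type"]`, to confirm whether the assumption holds.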


## Saving the Processed U.S. Military Base Data 💾

In [9]:
# Saving the updated DataFrame to a new CSV for later use
centroid_output = os.path.join(PROCESSED_DIR, "military_bases_processed.csv")

military_df.to_csv(centroid_output, index=False)

print("✅ Military bases with centroids saved to:", centroid_output)

✅ Military bases with centroids saved to: /home/lferr10/code/leovsferreira/ufo-conspiracy/notebooks/../data/processed/military_bases_processed.csv
