## C5. Geographically Weighted Regression 

**Description**  
This section conducts geographically weighted regression (GWR) between hospital/community characteristics and AI implementation level.

**Purpose**  
To model and analyze spatial relationships between hospital/community characteristics and AI implementation level.



### 1 Load necessary libraries, functions, and pre-processed data 

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import contextily as ctx
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import pdist, squareform

In [None]:
# Import functions
import sys
sys.path.append('../')
from calculate_scores import create_union_aipred_row, apply_ai_scores_to_dataframe

# Load data
AHA_master = pd.read_csv('../../data/AHA_master_external_data.csv', low_memory=False)

# Create aipred_it_union separately
AHA_master['aipred_it_union'] = AHA_master.apply(create_union_aipred_row, axis=1)

# Use the apply function for all other scores
AHA_IT = apply_ai_scores_to_dataframe(AHA_master)
AHA_IT = AHA_IT[AHA_IT['id_it'].notna()]


In [None]:
import os
os.environ['SHAPE_RESTORE_SHX'] = 'YES'
states = gpd.read_file('../../temp_shp/cb_2018_us_state_500k.shp')

In [None]:
# Remove rows with missing or invalid coordinates
AHA_IT = AHA_IT.dropna(subset=['lat_as', 'long_as'])

# Filter out invalid coordinates
# (the mask is built on AHA_IT so its index aligns with the rows being filtered)
valid_coords = (
    (AHA_IT['lat_as'] != 0) & 
    (AHA_IT['long_as'] != 0) &
    (AHA_IT['lat_as'] >= -90) & 
    (AHA_IT['lat_as'] <= 90) &
    (AHA_IT['long_as'] >= -180) & 
    (AHA_IT['long_as'] <= 180)
)
AHA_IT = AHA_IT[valid_coords]

print(f"Number of hospitals with valid coordinates: {len(AHA_IT)}")

# Create GeoDataFrame
hospitals = gpd.GeoDataFrame(
    AHA_IT, 
    geometry=gpd.points_from_xy(AHA_IT.long_as, AHA_IT.lat_as),
    crs="EPSG:4326"
)
print(f"Successfully created GeoDataFrame with {len(hospitals)} hospitals")

### 2 Data engineering 

These hospital characteristics were selected based on investigator consensus, and we used LASSO regression analysis to explore and identify additional variables that predict AI/ML implementation and reflect hospital resource levels.

- **rural_urban_type** : collected from the AHA survey; categorized into {1: rural, 2: micro, 3: metro} based on the location of the hospital ('CBSATYPE')
- **system_member** : hospital belonging to a corporate body that owns or manages health provider facilities or health-related subsidiaries ('MHSMEMB')
- **delivery_system** : delivery system identified using existing theory and AHA Annual Survey data {1: Centralized Health System, 2: Centralized Physician/Insurance Health System, 3: Moderately Centralized Health System, 4: Decentralized Health System, 5: Independent Hospital System, 6/Missing: Insufficient data to determine} ('CLUSTER')
- **community_hospital** : all nonfederal, short-term general, and special hospitals whose facilities and services are available to the public {0: No, 1: Yes} ('CHC')
- **subsidary_hospital** : hospital itself operates a subsidiary corporation {0: No, 1: Yes} ('SUBS')
- **frontline_hospital** : frontline facility {0: No, 1: Yes} ('FRTLN')
- **joint_commission_accreditation** : accreditation by The Joint Commission {0: No, 1: Yes} ('MAPP1')
- **center_quality** : Center for Improvement in Healthcare Quality accreditation {0: No, 1: Yes} ('MAPP22')
- **teaching_hospital** : major teaching hospital ('MAPP8'), minor teaching hospital ('MAPP3' or 'MAPP5')
- **critical_access** : critical access hospital {0: No, 1: Yes} ('MAPP18')
- **rural_referral** : rural referral center {0: No, 1: Yes} ('MAPP19')
- **ownership_type** : type of organization responsible for establishing policy concerning overall operation {federal_government, nonfederal_government, non_profit_nongovernment, for_profit, other} ('CNTRL')
- **bedsize** : bed-size category, ordinal variable ('BSC')
- **medicare_ipd_percentage** : Medicare inpatient days / total inpatient days. Proxy variable for the proportion of Medicare patients
- **medicaid_ipd_percentage** : Medicaid inpatient days / total inpatient days. Proxy variable for the proportion of Medicaid patients
- **core_index** : summary measure to track the interoperability of US hospitals (https://doi.org/10.1093/jamia/ocae289)
- **friction_index** : summary measure to track the barriers or difficulties in interoperability between hospitals (https://doi.org/10.1093/jamia/ocae289)
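
The LASSO screening mentioned above can be illustrated with a small, self-contained sketch. This is not the notebook's actual selection code: the data are synthetic, the function name `lasso_coordinate_descent` and the penalty `alpha=0.2` are assumptions made for illustration, and the solver is a plain NumPy coordinate-descent routine rather than any particular library implementation. It shows only the core idea, namely that an L1 penalty shrinks weak predictors exactly to zero so the surviving columns form the candidate set.

```python
import numpy as np

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-column mean of squares
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            # Soft-thresholding: small correlations are set exactly to zero
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return beta

# Synthetic stand-in for standardized hospital characteristics (hypothetical data)
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([1.5, 0.0, -2.0, 0.0, 0.0, 0.8])
y = X @ true_beta + rng.normal(scale=0.5, size=n)
y = y - y.mean()

beta_hat = lasso_coordinate_descent(X, y, alpha=0.2)
selected = np.flatnonzero(beta_hat != 0)  # indices of retained predictors
print("Selected column indices:", selected)
```

In practice the penalty strength would normally be chosen by cross-validation rather than fixed by hand.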


In [None]:
## rural_urban_type
# Continue with CBSA type and other variables
AHA_IT['rural_urban_type'] = AHA_IT['cbsatype_as'].map({
    'Rural': 1,      # Rural = 1 (lowest)
    'Micro': 2,      # Micropolitan = 2 (middle)
    'Metro': 3       # Metropolitan = 3 (highest)
})

## system_member
# Create new column 'system_member' based on the conditions
AHA_IT['system_member'] = AHA_IT['mhsmemb_as'].copy()
# Set to 1 where sysid_as is not null and mhsmemb_as is null
AHA_IT.loc[(AHA_IT['sysid_as'].notna()) & (AHA_IT['mhsmemb_as'].isna()), 'system_member'] = 1
# Convert all remaining null values to 0
AHA_IT['system_member'] = AHA_IT['system_member'].fillna(0)

## AHA System Cluster Code - delivery_system
AHA_IT['delivery_system'] = AHA_IT['cluster_as']

## community_hospital
AHA_IT['community_hospital'] = AHA_IT['chc_as'].replace(2, 0)

## subsidary_hospital
AHA_IT['subsidary_hospital'] = AHA_IT['subs_as']

## frontline_hospital
AHA_IT['frontline_hospital'] = AHA_IT['frtln_as'].replace('.', 0)

## joint_commission_accreditation
AHA_IT['joint_commission_accreditation'] = AHA_IT['mapp1_as'].replace(2,0)

## center_quality
AHA_IT['center_quality'] = AHA_IT['mapp22_as'].replace(2,0)

# teaching hospitals 
AHA_IT['teaching_hospital'] = ((AHA_IT['mapp5_as'] == 1) | (AHA_IT['mapp3_as'] == 1) | (AHA_IT['mapp8_as'] == 1)).astype(int)
AHA_IT['major_teaching_hospital'] = ((AHA_IT['mapp8_as'] == 1)).astype(int)
AHA_IT['minor_teaching_hospital'] = (((AHA_IT['mapp5_as'] == 1) | (AHA_IT['mapp3_as'] == 1))&~(AHA_IT['mapp8_as'] == 1)).astype(int)

# critical access hospital
AHA_IT['critical_access'] = (AHA_IT['mapp18_as'] == 1).astype(int)


# rural referral center 
AHA_IT['rural_referral'] = (AHA_IT['mapp19_as'] == 1).astype(int)

# medicare medicaid percentage
AHA_IT['medicare_ipd_percentage'] = AHA_IT['mcripd_as'] / AHA_IT['ipdtot_as'] * 100
AHA_IT['medicaid_ipd_percentage'] = AHA_IT['mcdipd_as'] / AHA_IT['ipdtot_as'] * 100

# bed size 
AHA_IT['bedsize'] = AHA_IT['bsc_as'].astype(int)


In [None]:
# hospital ownership type 
# Binary indicators by AHA control code ('cntrl_as'); every mask is built on AHA_IT
# so the index stays aligned (column name spellings are kept as in the original)
AHA_IT['nonfederal_governement'] = AHA_IT['cntrl_as'].isin([12, 13, 14, 15, 16]).astype(int)
AHA_IT['non_profit_nongovernment'] = AHA_IT['cntrl_as'].isin([21, 23]).astype(int)
AHA_IT['for_profit'] = AHA_IT['cntrl_as'].isin([31, 32, 33]).astype(int)
AHA_IT['federal_governement'] = AHA_IT['cntrl_as'].isin([40, 44, 45, 46, 47, 48]).astype(int)
# Create a categorical column for hospital ownership types
def create_ownership_category(row):
    if row['cntrl_as'] in [12, 13, 14, 15, 16]:
        return 'nonfederal_government'
    elif row['cntrl_as'] in [21, 23]:
        return 'non_profit_nongovernment'
    elif row['cntrl_as'] in [31, 32, 33]:
        return 'for_profit'
    elif row['cntrl_as'] in [40, 44, 45, 46, 47, 48]:
        return 'federal_government'
    else:
        return 'other'

# Create the categorical column
AHA_IT['ownership_type'] = AHA_IT.apply(create_ownership_category, axis=1)


In [None]:

valid_geo_hospitals = AHA_IT.dropna(subset=['long_as', 'lat_as'])
# Create a GeoDataFrame
hospitals_gdf = gpd.GeoDataFrame(
    valid_geo_hospitals, 
    geometry=gpd.points_from_xy(valid_geo_hospitals.long_as, valid_geo_hospitals.lat_as),
    crs="EPSG:4326" #geographic coordinate system using latitude and longitude
)

# Convert to a projected CRS for accurate distance calculations
geo_hospitals_gdf_projected = hospitals_gdf.to_crs(epsg=5070)  # projected coordinate system (NAD83 / Conus Albers): flat 2D plane representing Earth's surface
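
To illustrate why distances measured directly in degrees can mislead, the sketch below compares the ground distance spanned by the same two-degree longitude difference at two latitudes. It is an illustration only: it uses the haversine formula on a spherical Earth with an assumed mean radius of 6371 km, and the helper name `haversine_km` and the sample coordinates are hypothetical, not taken from the notebook.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Same 2-degree east-west separation at a southern and a northern latitude
km_south = haversine_km(-100.0, 30.0, -98.0, 30.0)
km_north = haversine_km(-100.0, 45.0, -98.0, 45.0)

print(f"2 degrees of longitude at 30N: {km_south:.1f} km")
print(f"2 degrees of longitude at 45N: {km_north:.1f} km")
```

Because a degree of longitude covers less ground farther north, a kernel bandwidth expressed in degrees corresponds to a different physical radius depending on latitude, whereas a bandwidth in projected units (meters in EPSG:5070) does not.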

### 3 Run GWR 

In [None]:
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler
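# 1. Prepare coordinates and target variable
# Note: long_as/lat_as are the raw longitude/latitude columns (degrees), not the
# projected geometry, so the GWR bandwidth used below is expressed in degrees
coords = np.array(geo_hospitals_gdf_projected[['long_as', 'lat_as']])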
y = geo_hospitals_gdf_projected['ai_base_score_imputed'].values

X_data = pd.DataFrame(index=geo_hospitals_gdf_projected.index)

X_data['subsidary_hospital'] = geo_hospitals_gdf_projected['subsidary_hospital'].astype(float)
X_data['frontline_hospital'] = geo_hospitals_gdf_projected['frontline_hospital'].astype(float)
X_data['joint_commission_accreditation'] = geo_hospitals_gdf_projected['joint_commission_accreditation'].astype(float)
X_data['delivery_system'] = geo_hospitals_gdf_projected['delivery_system'].astype(float)
X_data['teaching_hospital'] = geo_hospitals_gdf_projected['teaching_hospital'].astype(float)
X_data['critical_access'] = geo_hospitals_gdf_projected['critical_access'].astype(float)
X_data['rural_referral'] = geo_hospitals_gdf_projected['rural_referral'].astype(float)
X_data['for_profit'] = (geo_hospitals_gdf_projected['ownership_type'] == 'for_profit').astype(float)
X_data['bedsize'] = geo_hospitals_gdf_projected['bedsize'].astype(float)
X_data['medicare_ipd_percentage'] = geo_hospitals_gdf_projected['medicare_ipd_percentage'].astype(float)
X_data['medicaid_ipd_percentage'] = geo_hospitals_gdf_projected['medicaid_ipd_percentage'].astype(float)
X_data['core_index'] = geo_hospitals_gdf_projected['core_index'].astype(float)
X_data['friction_index'] = geo_hospitals_gdf_projected['friction_index'].astype(float)

# 2. Apply MICE imputation
imputer = IterativeImputer(max_iter=10, random_state=42, sample_posterior=False)
X_imputed = pd.DataFrame(imputer.fit_transform(X_data), columns=X_data.columns, index=X_data.index)

# 3. Standardize continuous variables (not binary ones)
continuous_vars = [
    'delivery_system', 'bedsize',
    'medicare_ipd_percentage', 'medicaid_ipd_percentage',
    'core_index', 'friction_index'
]
scaler = StandardScaler()
X_imputed[continuous_vars] = scaler.fit_transform(X_imputed[continuous_vars])

# Final feature matrix for GWR
X_data = X_imputed

In [None]:
def simple_gwr_robust(coords, y, X, bandwidth):
    """
    Simple GWR with better numerical stability
    """
    n = len(coords)
    p = X.shape[1]
    results = np.zeros((n, p))
    
    for i in range(n):
        # Calculate distances
        dists = np.sqrt(np.sum((coords - coords[i])**2, axis=1))
        
        # Calculate weights - use larger sigma to prevent numerical issues
        weights = np.exp(-0.5 * (dists / bandwidth)**2)
        
        # Only use observations with reasonable weights
        threshold = 0.001  # Minimum meaningful weight
        mask = weights >= threshold
        
        if np.sum(mask) < p:  # Not enough observations
            results[i] = np.nan
            continue
        
        # Subset data
        X_local = X[mask]
        y_local = y[mask]
        w_local = weights[mask]
        
        # Weighted least squares with regularization
        W = np.diag(w_local)
        XtW = X_local.T @ W
        XtWX = XtW @ X_local
        
        # Add small regularization to prevent singularity
        reg_param = 1e-6 * np.trace(XtWX) / p
        XtWX_reg = XtWX + reg_param * np.eye(p)
        
        try:
            beta = np.linalg.solve(XtWX_reg, XtW @ y_local)
            results[i] = beta
        except np.linalg.LinAlgError:
            results[i] = np.nan
    
    return results

# Updated workflow
def run_gwr(X_data, continuous_vars, selected_bandwidth):
    """Complete improved GWR workflow with data preparation"""
    
    scaler = StandardScaler()
    X_data[continuous_vars] = scaler.fit_transform(X_data[continuous_vars])
    
    # Add constant for intercept
    X_data['const'] = 1.0
    
    print(f"Feature matrix shape: {X_data.shape}")
    print(f"Variables: {list(X_data.columns)}")
    
    # Convert to array
    X_array = X_data.values.astype(float)
    
    
    # 5. Run final GWR with the selected (user-supplied) bandwidth
    print(f"\n  Running GWR with selected bandwidth ({selected_bandwidth:.3f})...")
    final_params = simple_gwr_robust(coords, y, X_array, selected_bandwidth)
    
    # 6. Create results DataFrame
    key_vars = [col for col in X_data.columns if col != 'const']
    var_coeffs = pd.DataFrame()
    
    for var in key_vars:
        col_idx = list(X_data.columns).index(var)
        var_coeffs[var] = final_params[:, col_idx]
    
    # Add coordinates
    var_coeffs['longitude'] = coords[:, 0]
    var_coeffs['latitude'] = coords[:, 1]
    
    # Print final results summary
    print(f"\n Final Results Summary:")
    print(f"   Selected bandwidth: {selected_bandwidth:.3f} degrees")
    total_valid = np.sum(np.all(np.isfinite(final_params), axis=1))
    print(f"   Valid observations: {total_valid}/{len(coords)} ({100*total_valid/len(coords):.1f}%)")
    
    for var in key_vars:
        coeffs = var_coeffs[var].dropna()
        if len(coeffs) > 0:
            print(f"   {var}: Range=[{coeffs.min():.4f}, {coeffs.max():.4f}], SD={coeffs.std():.4f}")
    
    return var_coeffs, selected_bandwidth, X_data




In [None]:
def simple_gwr_robust(coords, y, X, bandwidth):
    n = len(coords)
    p = X.shape[1]
    results = np.full((n, p), np.nan, dtype=float)

    y = np.asarray(y, dtype=float).ravel()    # <-- ensure shape (n,)
    X = np.asarray(X, dtype=float)
    coords = np.asarray(coords, dtype=float)

    for i in range(n):
        dists = np.sqrt(np.sum((coords - coords[i])**2, axis=1))
        weights = np.exp(-0.5 * (dists / bandwidth)**2)

        threshold = 1e-3
        mask = weights >= threshold
        if np.sum(mask) < p:
            continue

        X_local = X[mask]
        y_local = y[mask]
        w_local = weights[mask]

        W = np.diag(w_local)
        XtW = X_local.T @ W
        XtWX = XtW @ X_local

        reg_param = 1e-6 * (np.trace(XtWX) / p + 1e-12)
        XtWX_reg = XtWX + reg_param * np.eye(p)

        try:
            beta = np.linalg.solve(XtWX_reg, XtW @ y_local.reshape(-1, 1)) 
            results[i] = beta.ravel() 
        except np.linalg.LinAlgError:
            pass

    return results
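
Read from the function above, each local fit appears to solve a kernel-weighted least squares problem with a small ridge term. The notation below is introduced here for illustration and does not appear in the original notebook: $c_i$ are the coordinates of hospital $i$, $h$ is the bandwidth, $p$ is the number of columns in the design matrix, and $X_i$, $y_i$, $W_i$ denote the rows, outcomes, and diagonal weight matrix restricted to observations that pass the weight threshold.

$$
w_{ij} = \exp\!\left(-\tfrac{1}{2}\left(\frac{d_{ij}}{h}\right)^{2}\right),
\qquad d_{ij} = \lVert c_i - c_j \rVert_2
$$

$$
\hat{\beta}_i = \left(X_i^{\top} W_i X_i + \lambda_i I_p\right)^{-1} X_i^{\top} W_i\, y_i,
\qquad
\lambda_i = 10^{-6}\left(\frac{\operatorname{tr}\!\left(X_i^{\top} W_i X_i\right)}{p} + 10^{-12}\right)
$$

Only observations with $w_{ij} \ge 10^{-3}$ enter the fit, which is equivalent to a hard cutoff in distance:

$$
d_{ij} \le h\sqrt{2\ln 10^{3}} \approx 3.72\,h
$$

With a bandwidth of $h = 2$ (in degrees, as used in this notebook), the cutoff is roughly $7.4$ degrees. Locations with fewer than $p$ neighbors inside that radius are left as NaN.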


In [None]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm
import matplotlib.patches as mpatches

def plot_gwr_results_large(var_coeffs, states_gdf, figsize=(25, 30), point_size=80):
    """
    Large format GWR plotting with continental US state boundaries - SINGLE FIGURE ONLY
    
    Parameters:
    -----------
    var_coeffs : DataFrame
        GWR results with coefficient columns + longitude/latitude
    states_gdf : GeoDataFrame
        US state boundaries (e.g. the `states` GeoDataFrame loaded above)
    figsize : tuple
        Figure size - make it large for better visibility
    point_size : int
        Size of scatter points
    """
    
    # Filter for continental US only
    exclude_states = ['AK', 'HI', 'Alaska', 'Hawaii', 'PR', 'Puerto Rico', 
                     'VI', 'Virgin Islands', 'GU', 'Guam', 'AS', 'American Samoa', 
                     'MP', 'Northern Mariana Islands']
    
    state_col = None
    for col in ['STUSPS', 'STATE_ABBR', 'ABBR', 'NAME', 'STATE_NAME', 'STATE']:
        if col in states_gdf.columns:
            state_col = col
            break
    
    if state_col:
        continental_states = states_gdf[~states_gdf[state_col].isin(exclude_states)]
        print(f"Filtering states using column '{state_col}', kept {len(continental_states)} states")
    else:
        continental_states = states_gdf.cx[-130:-65, 20:50]
        print("Filtering by geographic bounds (no state name column found)")
    
    # Get variable names (exclude coordinates)
    variables = [col for col in var_coeffs.columns if col not in ['longitude', 'latitude']]
    n_vars = len(variables)
    
    # Remove rows with NaN coefficients
    plot_data = var_coeffs.dropna()
    
    print(f"Plotting {len(plot_data)} valid observations")
    
    # Use 2 columns for better visibility
    n_cols = 2
    n_rows = (n_vars + n_cols - 1) // n_cols
    
    # Create ONE single figure
    # squeeze=False keeps `axes` as a 2D array regardless of the grid shape
    fig, axes = plt.subplots(n_rows, n_cols, figsize=figsize, squeeze=False)
    
    fig.suptitle('GWR Spatial Coefficients - Local Variation Analysis', 
                 fontsize=24, fontweight='bold', y=0.98)
    
    for i, var in enumerate(variables):
        row = i // n_cols
        col = i % n_cols
        ax = axes[row, col]  # axes is always a 2D array at this point
        
        # Get data
        coeffs = plot_data[var].values
        x = plot_data['longitude'].values
        y = plot_data['latitude'].values
        
        # Color normalization: fixed symmetric range [-1, 1] centered at 0 so that
        # colors are comparable across panels (values outside the range saturate)
        norm = TwoSlopeNorm(vmin=-1, vcenter=0, vmax=1)
        
        # Add continental US state boundaries only
        continental_states.boundary.plot(ax=ax, color='black', linewidth=0.8, alpha=0.6)
        
        # Set axis limits to continental US
        ax.set_xlim(-130, -65)
        ax.set_ylim(20, 50)

        # Scatter the local coefficients at each hospital location
        scatter = ax.scatter(x, y, c=coeffs, cmap='RdBu', norm=norm, 
                    s=point_size, alpha=0.8, edgecolors='black', linewidth=0.8, zorder=5)
        
        # Clean formatting - no overlapping text
        ax.set_title(f'Spatial Variation: {var}', fontweight='bold', fontsize=18, pad=20)
        ax.set_xlabel('Longitude', fontsize=14)
        ax.set_ylabel('Latitude', fontsize=14)
        ax.grid(True, alpha=0.3, linewidth=0.5)
        ax.tick_params(labelsize=12)
        
        # Add colorbar
        cbar = plt.colorbar(scatter, ax=ax, shrink=0.8, aspect=30, pad=0.02)
        cbar.set_label('Coefficient Value', fontsize=12, labelpad=15)
        cbar.ax.tick_params(labelsize=10)
        
    
    # Hide empty subplots if any
    if n_vars < n_rows * n_cols:
        for i in range(n_vars, n_rows * n_cols):
            row = i // n_cols
            col = i % n_cols
            ax = axes[row, col]  # axes is always a 2D array at this point
            ax.set_visible(False)
    
    plt.tight_layout(rect=[0, 0, 1, 0.96])
    return fig

In [None]:
continuous_vars = ['delivery_system', 'bedsize', 'medicare_ipd_percentage', 'medicaid_ipd_percentage', 
                      'core_index', 'friction_index']
# Load state boundaries (os is imported in an earlier cell)
os.environ['SHAPE_RESTORE_SHX'] = 'YES'  
states = gpd.read_file('../../../../../data/map_data/state_boundary.shp')

# Run GWR for hospital characteristics with a fixed bandwidth of 2 (degrees)
var_coeffs, selected_bandwidth, X_data = run_gwr(X_data, continuous_vars, 2)
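
The bandwidth in the call above is fixed by hand. One common way to choose it from the data is leave-one-out cross-validation, sketched below under stated assumptions: the data are synthetic, the function name `loo_cv_score`, the candidate grid, and the ridge constant are hypothetical, and the kernel and regularization only mirror the Gaussian weights used in `simple_gwr_robust` without the weight threshold. It is not the procedure the notebook actually ran.

```python
import numpy as np

def loo_cv_score(coords, y, X, bandwidth, ridge=1e-6):
    """Mean squared leave-one-out prediction error of a Gaussian-kernel GWR."""
    n, p = X.shape
    errors = np.full(n, np.nan)
    for i in range(n):
        d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        w[i] = 0.0  # leave observation i out of its own local fit
        XtW = X.T * w  # equivalent to X.T @ diag(w) without building diag(w)
        XtWX = XtW @ X
        XtWX = XtWX + ridge * (np.trace(XtWX) / p + 1e-12) * np.eye(p)
        try:
            beta = np.linalg.solve(XtWX, XtW @ y)
        except np.linalg.LinAlgError:
            continue  # leave this location as NaN
        errors[i] = y[i] - X[i] @ beta
    return np.nanmean(errors ** 2)

# Synthetic example: the slope of x1 drifts from west to east (hypothetical data)
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([x1, np.ones(n)])  # one predictor plus intercept
slope = 1.0 + 0.3 * coords[:, 0]
y = slope * x1 + rng.normal(scale=0.3, size=n)

candidates = [0.5, 1.0, 2.0, 4.0, 8.0, 50.0]
scores = {h: loo_cv_score(coords, y, X, h) for h in candidates}
best_bandwidth = min(scores, key=scores.get)

for h in candidates:
    print(f"bandwidth={h:>5}: LOO MSE={scores[h]:.4f}")
print("Selected bandwidth:", best_bandwidth)
```

A very large bandwidth approaches a single global regression, so comparing its score against the smaller candidates gives a rough check of whether local fitting helps at all.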

In [None]:

fig = plot_gwr_results_large(var_coeffs, states, figsize=(25, 30))
plt.show()


####  3.2 Conduct GWR for geospatial/community characteristics 

In [None]:
# Replace invalid SVI values with median of valid values
svi_columns = ['svi_themes_median', 'svi_theme1_median', 'svi_theme2_median', 
              'svi_theme3_median', 'svi_theme4_median']

for col in svi_columns:
    # Identify valid values (0-1 range)
    valid_mask = (geo_hospitals_gdf_projected[col] >= 0) & (geo_hospitals_gdf_projected[col] <= 1)
    
    # Calculate median of valid values
    valid_median = geo_hospitals_gdf_projected.loc[valid_mask, col].median()
    
    # Replace invalid values with the median
    invalid_mask = ~valid_mask
    geo_hospitals_gdf_projected.loc[invalid_mask, col] = valid_median
    
    print(f"Fixed {col}:")
    print(f"  Replaced {invalid_mask.sum()} invalid values with median {valid_median:.4f}")
    print(f"  New range: {geo_hospitals_gdf_projected[col].min():.4f} to {geo_hospitals_gdf_projected[col].max():.4f}")

In [None]:
# 1. Prepare coordinates and target variable
coords = np.array(geo_hospitals_gdf_projected[['long_as', 'lat_as']])
y = geo_hospitals_gdf_projected['ai_base_score_imputed'].values
    
# 2. Prepare feature matrix
print("Preparing feature matrix...")
X_data2 = pd.DataFrame(index=geo_hospitals_gdf_projected.index)
    
# Community characteristics: missing value imputation (median or zero fill)
X_data2['svi_themes_median'] = geo_hospitals_gdf_projected['svi_themes_median'].fillna(geo_hospitals_gdf_projected['svi_themes_median'].median())
X_data2['svi_theme1_median'] = geo_hospitals_gdf_projected['svi_theme1_median'].fillna(geo_hospitals_gdf_projected['svi_theme1_median'].median())
X_data2['svi_theme2_median'] = geo_hospitals_gdf_projected['svi_theme2_median'].fillna(geo_hospitals_gdf_projected['svi_theme2_median'].median())    
X_data2['svi_theme3_median'] = geo_hospitals_gdf_projected['svi_theme3_median'].fillna(geo_hospitals_gdf_projected['svi_theme3_median'].median())
X_data2['svi_theme4_median'] = geo_hospitals_gdf_projected['svi_theme4_median'].fillna(geo_hospitals_gdf_projected['svi_theme4_median'].median())
X_data2['national_adi_median'] = geo_hospitals_gdf_projected['national_adi_median'].fillna(geo_hospitals_gdf_projected['national_adi_median'].median())
X_data2['mean_primary_hpss'] = geo_hospitals_gdf_projected['mean_primary_hpss'].fillna(0)
X_data2['mean_dental_hpss'] = geo_hospitals_gdf_projected['mean_dental_hpss'].fillna(0)
X_data2['mean_mental_hpss'] = geo_hospitals_gdf_projected['mean_mental_hpss'].fillna(0)
X_data2['mean_mua_shortage'] = geo_hospitals_gdf_projected['mean_mua_shortage'].fillna(0)
X_data2['mean_mua_elders_shortage'] = geo_hospitals_gdf_projected['mean_mua_elders_shortage'].fillna(0)
X_data2['mean_mua_infant_shortage'] = geo_hospitals_gdf_projected['mean_mua_infant_shortage'].fillna(0)
X_data2['rural_urban_type'] = geo_hospitals_gdf_projected['rural_urban_type'].fillna(geo_hospitals_gdf_projected['rural_urban_type'].median())
X_data2['Device_Percent'] = geo_hospitals_gdf_projected['Device_Percent'].fillna(geo_hospitals_gdf_projected['Device_Percent'].median())
X_data2['Broadband_Percent'] = geo_hospitals_gdf_projected['Broadband_Percent'].fillna(geo_hospitals_gdf_projected['Broadband_Percent'].median())
X_data2['Internet_Percent'] = geo_hospitals_gdf_projected['Internet_Percent'].fillna(geo_hospitals_gdf_projected['Internet_Percent'].median())
    
# Standardize continuous variables (names must match the X_data2 columns created above)
continuous_vars = [
    'svi_themes_median', 'svi_theme1_median', 'svi_theme2_median',
    'svi_theme3_median', 'svi_theme4_median', 'national_adi_median',
    'Device_Percent', 'Broadband_Percent', 'Internet_Percent',
    'mean_primary_hpss', 'mean_dental_hpss', 'mean_mental_hpss',
    'mean_mua_shortage', 'mean_mua_elders_shortage', 'mean_mua_infant_shortage',
    'rural_urban_type'
]
scaler = StandardScaler()
X_data2[continuous_vars] = scaler.fit_transform(X_data2[continuous_vars])

# 3. Apply MICE imputation
imputer = IterativeImputer(max_iter=10, random_state=42, sample_posterior=False)
X_data2 = pd.DataFrame(imputer.fit_transform(X_data2), columns=X_data2.columns, index=X_data2.index)

# 4. Run GWR
var_coeffs, selected_bandwidth, X_data2 = run_gwr(X_data2, continuous_vars, 2)