## NASA Center Analysis
<details>
<summary>VARIABLES</summary>

| Variable Name       | Long Name                                          | Variable Category | Units     | Description                                                                                                                                                          |
| ------------------- | -------------------------------------------------- | ----------------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| tmax_days_35C       | \# days Tmax ≥35°C                                 | extreme index     | \# days   | Number of days, per year, with Tmax >=35C                                                                                                                            |
| tmax_days_90th      | \# days Tmax ≥90th Percentile                      | extreme index     | \# days   | Number of days, per year, with Tmax >=90th percentile. 90th percentile calculated using all daily tmax values from 1995-2014.                                        |
| tmax_days_95th      | \# days Tmax ≥95th Percentile                      | extreme index     | \# days   | Number of days, per year, with Tmax >=95th percentile. 95th percentile calculated using all daily tmax values from 1995-2014.                                        |
| tmax_days_99th      | \# days Tmax ≥99th Percentile                      | extreme index     | \# days   | Number of days, per year, with Tmax >=99th percentile. 99th percentile calculated using all daily tmax values from 1995-2014.                                        |
| Hottest_Tmax        | Hottest Tmax of the Year (°C)                      | extreme index     | degrees C | Hottest Tmax value every year                                                                                                                                        |
| Max_DTR             | Largest Diurnal Temperature Range of the Year (°C) | extreme index     | degrees C | largest diurnal temperature range (tmax minus tmin) each year                                                                                                        |
| tmin_tropnights_20C | \# days Tmin ≥20°C                                 | extreme index     | \# days   | Number of days, per year with Tmin >=20C                                                                                                                             |
| tmin_frostdays_0C   | \# days Tmin ≤0°C                                  | extreme index     | \# days   | Number of days per year with Tmin <=0C                                                                                                                               |
| Coldest_Tmin        | Coldest Tmin of the Year (°C)                      | extreme index     | degrees C | Coldest minimum temperature each year                                                                                                                                |
| prec_days_dry       | \# days with precipitation ≤0.001 in               | extreme index     | \# days   | Number of days, per year, where precipitation <=1e-3 inches                                                                                                          |
| prec_days_oneinch   | \# days with precipitation ≥1 in                   | extreme index     | \# days   | Number of days, per year, where precipitation >=1 inch                                                                                                               |
| prec_days_90th      | \# days with precipitation ≥90th Percentile        | extreme index     | \# days   | Number of days, per year, where precipitation >=90th percentile. 90th percentile calculated using all daily precipitation values (dry days EXCLUDED) from 1995-2014  |
| prec_days_95th      | \# days with precipitation ≥95th Percentile        | extreme index     | \# days   | Number of days, per year, where precipitation >=95th percentile. 95th percentile calculated using all daily precipitation values (dry days EXCLUDED) from 1995-2014  |
| prec_days_99th      | \# days with precipitation ≥99th Percentile        | extreme index     | \# days   | Number of days, per year, where precipitation >=99th percentile. 99th percentile calculated using all daily precipitation values (dry days EXCLUDED) from 1995-2014  |
| tmax_annave         | Annual Average Tmax (°C)                           | annual average    | degrees C | Annual average maximum daily temperature                                                                                                                             |
| tmin_annave         | Annual Average Tmin (°C)                           | annual average    | degrees C | Annual average minimum daily temperature                                                                                                                             |
| prec_annave         | Annual Total Precipitation (mm)                    | annual SUM        | mm        | Annual SUM of precipitation                                                                                                                                          |

</details>
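
To make the table's definitions concrete, here is a minimal sketch of how threshold-count indices such as `tmax_days_35C` and `tmax_days_90th` could be derived from daily data. The synthetic series, the helper name `count_days_at_or_above`, and the use of `numpy.percentile` with its default interpolation are assumptions for illustration only; the notebook does not show how the provided CSV files were actually computed.

```python
import numpy as np
import pandas as pd

def count_days_at_or_above(daily: pd.Series, threshold: float) -> pd.Series:
    """Count, per calendar year, the days whose value is >= threshold."""
    return (daily >= threshold).groupby(daily.index.year).sum()

# Synthetic daily Tmax (degrees C) for the 1995-2014 baseline; NOT real data.
rng = np.random.default_rng(0)
dates = pd.date_range("1995-01-01", "2014-12-31", freq="D")
day_of_year = dates.dayofyear.to_numpy()
seasonal = 20 + 12 * np.sin(2 * np.pi * (day_of_year - 100) / 365.25)
tmax = pd.Series(seasonal + rng.normal(0, 3, len(dates)), index=dates)

# Fixed-threshold index: days per year with Tmax >= 35 C
tmax_days_35C = count_days_at_or_above(tmax, 35.0)

# Percentile-threshold index: the 90th percentile is taken over ALL daily
# values in the baseline, then days at or above it are counted per year.
p90 = np.percentile(tmax.to_numpy(), 90)
tmax_days_90th = count_days_at_or_above(tmax, p90)

print(tmax_days_90th.head())
```

For the precipitation percentiles, the table states that dry days are excluded before the percentile is taken, so under the same assumptions the series would be filtered first (for example `prec[prec > dry_threshold]`, with `dry_threshold` a hypothetical name).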

In [1]:
# Imports
import os
import warnings
import zipfile
import numpy as np
import pandas as pd
import pandasql as psql

# Suppress warnings
warnings.filterwarnings('ignore')

## Initialization

In [2]:
path = 'updated_extremes.zip'  # zip archive containing the CSV files
center = 'LARC'.upper()    # NASA center to analyze
only_future = True         # flag to use only 2020-2099
ssp = ['ssp126', 'ssp245', 'ssp370'] # scenarios to use

In [3]:
# DO NOT CHANGE THIS CELL
# File name convention: variable.ssp###.CENTER.csv

# NASA Centers
centers = ['AMES', 'GSFC', 'JPL', 'KSC', 'MSFC', 'MAF', 'GISS',
           'LARC', 'SSC', 'GRC', 'WFF', 'JSC', 'WSTF', 'AFRC']

# Check if the provided center is valid
if center not in centers:
    raise ValueError(f'{center} not in {centers}')

# Substrings that mark day-count variables; all other variables are treated as real-valued (degrees C or mm)
day_unit = ['days', 'tropnights']

# Time periods: 30-year windows centered on a decade (10 years before and after)
time_periods = {'short': (2020, 2049),  # 2030's: 2020-2029, 2030-2039, 2040-2049
                'mid':   (2040, 2069),  # 2050's: 2040-2049, 2050-2059, 2060-2069
                'long':  (2070, 2099),  # 2080's: 2070-2079, 2080-2089, 2090-2099
                }
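
The three windows are not disjoint: `short` and `mid` share 2040-2049, while `mid` and `long` do not overlap. The stdlib-only sketch below illustrates this with the same dictionary; the helper name `labels_for` is hypothetical and simply mirrors the `label_term` function defined in the next cell.

```python
time_periods = {'short': (2020, 2049),
                'mid':   (2040, 2069),
                'long':  (2070, 2099)}

def labels_for(year: int) -> list:
    """Return every time-period label whose inclusive range contains `year`."""
    return [name for name, (start, end) in time_periods.items()
            if start <= year <= end]

print(labels_for(2035))  # ['short']
print(labels_for(2045))  # ['short', 'mid']  (overlap decade)
print(labels_for(2075))  # ['long']
print(labels_for(2015))  # []  (before any window)

# Each window spans 30 years, bounds included.
years_per_period = {name: end - start + 1
                    for name, (start, end) in time_periods.items()}
print(years_per_period)  # {'short': 30, 'mid': 30, 'long': 30}
```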

## Get Files/Data

In [4]:
def get_files(path: str, center: str):
    '''Returns list of all csv files in the zip archive whose names contain the center and one of the selected ssp scenarios'''
    # return [os.path.join(path, f) for f in os.listdir(path) 
    #          if center in f and any(s in f for s in ssp) and f.endswith('.csv')]
    with zipfile.ZipFile(path, 'r') as zip_ref:
        files = [f.filename for f in zip_ref.filelist 
                 if center in f.filename and any(s in f.filename for s in ssp) and f.filename.endswith('.csv')]
        return files

def check_df_consistency(df_list: list[pd.DataFrame]):
    '''Returns True if all dataframes in the list have the same column names and index values (False for an empty list)'''
    if not df_list:
        return False
    
    # Get reference column names and index values from the first dataframe
    ref_cols, ref_index = list(df_list[0].columns), list(df_list[0].index)
    
    # Check if all other dataframes have the same column names and index values
    return all(list(df.columns) == ref_cols and list(df.index) == ref_index 
               for df in df_list[1:])

def label_term(year: int):
    '''Returns list of time period labels for a given year'''
    return [t for t, (s, e) in time_periods.items() if s <= year <= e]


def preprocess(filename: str, only_future: bool=True):
    '''Returns a preprocessed pandas DataFrame from a csv file'''
    # df = pd.read_csv(filename)
    with zipfile.ZipFile(path, 'r') as zip_ref:
        with zip_ref.open(filename) as file:
            df = pd.read_csv(file)
    name = filename.split('/')[-1][:-4].split('.')
    
    # Add new columns: term, scenario, and variable
    df.insert(0, 'term', df.Years.apply(label_term))
    df.insert(0, 'scenario', name[1])
    df.insert(0, 'variable', name[0] + ('_days' if any(d in filename for d in day_unit) else '_real'))
    
    # Explode the 'term' column (in case a year belongs to multiple terms)
    df = df.explode('term')

    # Remove rows with NaN terms if only_future is True, otherwise return all rows
    return df.dropna(subset=['term']) if only_future else df # nan's (unlabeled) assumed to be past data


def calculate_statistics(df: pd.DataFrame):
    """Recomputes per-year ensemble statistics across models and raises if they differ from the provided MME-* columns"""
    MME = list(df.filter(regex='MME-').columns)
    models = list(df.columns.drop(['variable', 'scenario', 'term', 'Years'] + MME))
    
    df1 = df.set_index(['variable', 'scenario', 'term', 'Years'])
    df1['calc_mean'] = df1[models].mean(axis=1)
    df1['calc_median'] = df1[models].median(axis=1)
    df1['calc_pct25'] = df1[models].quantile(0.25, axis=1)
    df1['calc_pct75'] = df1[models].quantile(0.75, axis=1)
    df1['calc_pct05'] = df1[models].quantile(0.05, axis=1)
    df1['calc_pct95'] = df1[models].quantile(0.95, axis=1)
    
    df_mme = df1.filter(regex='MME-').round(5)
    df_cal = df1.filter(regex='calc_').round(5)
    df_cal.columns = df_mme.columns
    
    err = df_mme.compare(df_cal, result_names=('mme', 'recalc')).dropna(how='all')
    if len(err) > 0:
        display(err)
        raise ValueError('Calculation mismatch')

def aggregate_data(df: pd.DataFrame):
    """Returns, per (variable, term), the min and max across scenarios of the term-level MME statistic, plus rounded values and their difference"""
    MME = list(df.filter(regex='MME-').columns)
    cols = ['variable', 'scenario', 'term']

    # Calculate term-wise statistics
    # For variables ending with '_days', use median
    # For variables ending with '_real', use mean
    term_MME = pd.concat([df[df.variable.str.endswith('_days')].groupby(cols)[MME].median(),
                          df[df.variable.str.endswith('_real')].groupby(cols)[MME].mean()
                         ]).reset_index().sort_values(cols, ascending=[1, 1, 0],
                                                      ignore_index=True)

    round_up_half = lambda x: np.ceil(x) if x % 1 == 0.5 else round(x)
    agg_real = (term_MME[term_MME.variable.str.endswith('_real')]
                .groupby(['variable', 'term'])
                .agg({'MME-mean': ['min', 'max']})
                .sort_values(['variable', 'term'], ascending=[1, 0]))
    agg_real.columns = ['min', 'max']
    agg_real['rounded_min'], agg_real['rounded_max'] = agg_real['min'].round(1), agg_real['max'].round(1)
    agg_real['diff'] = (agg_real['rounded_max'] - agg_real['rounded_min']).round(1)
    
    agg_days = (term_MME[term_MME.variable.str.endswith('_days')]
                .groupby(['variable', 'term'])
                .agg({'MME-median': ['min', 'max']})
                .sort_values(by=['variable', 'term'], ascending=[1, 0]))
    agg_days.columns = ['min', 'max']
    agg_days['rounded_min'] = agg_days['min'].apply(round_up_half).astype(int)
    agg_days['rounded_max'] = agg_days['max'].apply(round_up_half).astype(int)
    agg_days['diff'] = (agg_days['rounded_max'] - agg_days['rounded_min']).astype(int)
    
    return pd.concat([agg_real, agg_days], axis=0)

def get_results(path:str, center: str):
    """Returns the preprocessed DataFrame and the aggregated results for a given center"""
    files = sorted(get_files(path, center))
    df_list = [preprocess(f, only_future) for f in files]
    
    if not check_df_consistency(df_list):
        raise ValueError('DataFrames are inconsistent')
    
    df = pd.concat(df_list).reset_index(drop=True)
    
    # Check the number of years per time period
    years_per_term = df.groupby(['variable', 'scenario', 'term']).size().unique()
    if len(years_per_term) != 1 or years_per_term[0] != 30:
        raise ValueError(f'# of years per time period is incorrect: {years_per_term}')
    
    calculate_statistics(df)
    print(f'{len(files)} {center} files')
    # print(files[:5], '\n')
    return df, aggregate_data(df)
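
The naming convention `variable.ssp###.CENTER.csv` is what lets `preprocess` fill in the `variable` and `scenario` columns. The sketch below reproduces that parsing in isolation using only the standard library. The helper name `parse_member` and the archive member paths are hypothetical examples, not files confirmed to exist in the archive.

```python
from pathlib import PurePosixPath

DAY_UNIT = ['days', 'tropnights']  # same substrings as `day_unit` above

def parse_member(member: str):
    """Split 'dir/variable.ssp###.CENTER.csv' into (variable, scenario, center)."""
    stem = PurePosixPath(member).name[:-len('.csv')]
    variable, scenario, center = stem.split('.')
    # Like the notebook, test the whole member path for a day-count marker.
    suffix = '_days' if any(d in member for d in DAY_UNIT) else '_real'
    return variable + suffix, scenario, center

print(parse_member('updated_extremes/tmax_days_35C.ssp245.LARC.csv'))
# ('tmax_days_35C_days', 'ssp245', 'LARC')
print(parse_member('updated_extremes/Hottest_Tmax.ssp126.LARC.csv'))
# ('Hottest_Tmax_real', 'ssp126', 'LARC')
```

Because the marker test looks at the full member path rather than only the variable name, a directory name containing `days` would tag every variable as a day count; checking just the variable part would be a safer variant.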

In [5]:
df, results = get_results(path, center)

51 LARC files
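
The 30-years-per-term check in `get_results` only passes because `explode` duplicates the years that fall in two windows and `dropna` removes the unlabeled ones. A minimal sketch with a synthetic frame shows the row counts involved; the `value` column and the 2015 start year are invented for illustration.

```python
import pandas as pd

time_periods = {'short': (2020, 2049), 'mid': (2040, 2069), 'long': (2070, 2099)}

def label_term(year):
    return [t for t, (s, e) in time_periods.items() if s <= year <= e]

# Synthetic frame: one row per year, 2015-2099 (85 rows); values are placeholders.
demo = pd.DataFrame({'Years': range(2015, 2100)})
demo['value'] = 0.0
demo['term'] = demo['Years'].apply(label_term)

# One row per (year, term); years with an empty label list become NaN.
exploded = demo.explode('term')
# Dropping NaN terms removes 2015-2019, as with only_future=True.
future = exploded.dropna(subset=['term'])

print(len(demo), len(exploded), len(future))  # 85 95 90
print(future.groupby('term').size())          # 30 rows in each of long, mid, short
```

The overlap decade 2040-2049 contributes to both `short` and `mid`, which is why 80 future years yield 90 labeled rows.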


## Quality Check

In [6]:
df.groupby('variable').max()

variable,scenario,term,Years,ACCESS-CM2,ACCESS-ESM1-5,BCC-CSM2-MR,CESM2,CMCC-ESM2,CNRM-CM6-1,CNRM-ESM2-1,...,MPI-ESM1-2-LR,MRI-ESM2-0,NorESM2-LM,NorESM2-MM,MME-mean,MME-median,MME-pct25,MME-pct75,MME-pct05,MME-pct95
Coldest_Tmin_real,ssp370,short,2099.0,-0.931519,-3.823822,-2.503448,,-2.908722,-3.291687,-4.559418,...,-3.282806,-1.653656,-3.980804,-3.33139,-6.561266,-6.352448,-7.718903,-4.259308,-10.060699,-1.363556
Hottest_Tmax_real,ssp370,short,2099.0,47.414276,45.155914,47.664703,,45.756561,46.238922,47.965546,...,43.820068,45.541718,44.691162,47.79422,42.553859,42.776978,41.717285,43.820068,39.927338,47.414276
Max_DTR_real,ssp370,short,2099.0,34.059143,29.701355,39.230255,,31.971222,37.473969,33.203094,...,29.767059,38.054535,28.248413,32.30127,26.489332,27.029785,24.990265,29.414246,23.054901,37.861099
annmean_tmax_real,ssp370,short,2099.0,27.312395,26.693493,27.229473,,26.42729,27.77725,26.785118,...,25.850698,25.449072,25.747143,27.040831,25.558043,25.747143,25.033554,26.176758,24.429266,27.306044
annmean_tmin_real,ssp370,short,2099.0,15.580885,14.985955,15.211082,,15.626901,16.496094,15.355649,...,15.402845,14.750004,14.672589,15.639556,14.736029,14.672589,14.199707,15.153758,13.651192,16.911816
annsum_prec_real,ssp370,short,2099.0,1685.417893,1801.064932,1786.310981,1808.596818,1860.659329,1643.07245,1657.399828,...,1816.360349,1836.667256,1742.415865,1749.63661,1426.768415,1447.683173,1304.76108,1579.149605,1218.777148,1775.081698
prec_days_90th_days,ssp370,short,2099.0,42.0,48.0,40.0,41.0,39.0,35.0,39.0,...,57.0,41.0,47.0,42.0,29.5,30.5,27.25,35.0,22.05,47.65
prec_days_95th_days,ssp370,short,2099.0,28.0,27.0,22.0,25.0,29.0,21.0,25.0,...,32.0,27.0,30.0,22.0,18.0,18.5,14.0,21.75,11.0,28.7
prec_days_99th_days,ssp370,short,2099.0,9.0,12.0,10.0,8.0,9.0,8.0,9.0,...,13.0,9.0,10.0,7.0,5.5,5.0,3.25,7.5,2.05,12.0
prec_days_dry_days,ssp370,short,2099.0,122.0,107.0,165.0,192.0,168.0,189.0,183.0,...,113.0,161.0,126.0,206.0,128.590909,135.5,108.0,156.75,88.2,182.7


In [7]:
df.groupby('variable').min()

variable,scenario,term,Years,ACCESS-CM2,ACCESS-ESM1-5,BCC-CSM2-MR,CESM2,CMCC-ESM2,CNRM-CM6-1,CNRM-ESM2-1,...,MPI-ESM1-2-LR,MRI-ESM2-0,NorESM2-LM,NorESM2-MM,MME-mean,MME-median,MME-pct25,MME-pct75,MME-pct05,MME-pct95
Coldest_Tmin_real,ssp126,long,2020.0,-20.428391,-20.41095,-30.617416,,-19.131744,-24.557144,-22.94371,...,-19.054031,-24.631653,-15.746368,-21.732437,-13.303314,-13.215698,-15.710175,-10.944702,-23.594238,-9.756409
Hottest_Tmax_real,ssp126,long,2020.0,36.378784,34.746796,36.579468,,35.086365,34.750946,35.739655,...,34.268738,36.197693,34.3815,36.164886,37.705517,37.221222,35.866425,38.633484,33.838257,39.813721
Max_DTR_real,ssp126,long,2020.0,19.484528,20.47287,20.711517,,19.935699,20.811005,19.885498,...,19.211548,20.189728,18.38623,20.936829,23.0599,22.423767,21.014496,23.827087,19.194702,25.837036
annmean_tmax_real,ssp126,long,2020.0,21.218201,21.42486,20.746437,,19.891321,21.214272,21.130535,...,20.588789,21.272921,21.530184,21.282082,22.055873,21.920818,21.448803,22.300543,20.588789,22.887661
annmean_tmin_real,ssp126,long,2020.0,9.779553,9.857891,9.466441,,9.082039,9.659132,10.215306,...,9.791513,9.776453,9.917004,10.140264,10.833823,10.774098,10.34732,11.001023,9.539014,11.684646
annsum_prec_real,ssp126,long,2020.0,915.26863,836.970968,865.837182,886.217967,990.703695,771.679358,843.144565,...,876.891344,872.979374,758.049076,776.496225,1153.638787,1128.614293,1028.393563,1234.365352,885.312174,1307.05774
prec_days_90th_days,ssp126,long,2020.0,16.0,16.0,11.0,11.0,10.0,9.0,12.0,...,12.0,13.0,13.0,8.0,22.272727,20.5,16.5,24.75,11.2,29.0
prec_days_95th_days,ssp126,long,2020.0,5.0,4.0,3.0,3.0,4.0,4.0,4.0,...,7.0,5.0,6.0,2.0,10.863636,10.5,7.25,13.0,3.25,15.0
prec_days_99th_days,ssp126,long,2020.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,2.0,2.0,1.0,3.0,0.0,4.0
prec_days_dry_days,ssp126,long,2020.0,60.0,45.0,100.0,127.0,88.0,120.0,114.0,...,49.0,96.0,64.0,123.0,109.363636,102.5,78.5,128.75,52.75,143.9
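
When reading the two quality-check tables, keep in mind that `groupby(...).max()` and `.min()` reduce each column independently, so a row of the output is not a single record. The `term` column shows `short` for the maximum and `long` for the minimum simply because those are the alphabetical extremes. A small synthetic sketch (all values invented):

```python
import pandas as pd

toy = pd.DataFrame({
    'variable': ['v', 'v', 'v'],
    'term':     ['short', 'mid', 'long'],
    'Years':    [2020, 2050, 2099],
    'model_a':  [3.0, 1.0, 2.0],
})

# Column-wise maxima: 'short' and 3.0 come from the first row,
# 2099 comes from the last row.
col_max = toy.groupby('variable').max()
print(col_max)

# To see the actual record holding the largest model_a value, use idxmax.
hottest_row = toy.loc[toy.groupby('variable')['model_a'].idxmax()]
print(hottest_row)
```

The tables are therefore best read as per-column range checks (for example, that `Years` spans 2020 to 2099), not as descriptions of particular years.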


## Calculate Change Per (variable, scenario)
- short to mid (mid minus short)
- short to long (long minus short)
- mid to long (long minus mid)

In [8]:
# query = """
# SELECT
#     a.variable,
#     a.scenario,
#     CASE
#         WHEN a.term = 'short' AND b.term = 'mid' THEN 'short-mid'
#         WHEN a.term = 'short' AND b.term = 'long' THEN 'short-long'
#         WHEN a.term = 'mid' AND b.term = 'long' THEN 'mid-long'
#     END AS term_diff,
#     b.'MME-mean' - a.'MME-mean' AS 'MME-mean',
#     b.'MME-median' - a.'MME-median' AS 'MME-median',
#     b.'MME-pct25' - a.'MME-pct25' AS 'MME-pct25',
#     b.'MME-pct75' - a.'MME-pct75' AS 'MME-pct75'
# FROM term_MME a
# JOIN term_MME b
#     ON a.variable = b.variable
#     AND a.scenario = b.scenario
#     AND (
#         (a.term = 'short' AND b.term = 'mid') OR
#         (a.term = 'short' AND b.term = 'long') OR
#         (a.term = 'mid' AND b.term = 'long')
#     )
# ORDER BY 1, 2, 3 DESC
# """

# change = psql.sqldf(query, locals())

# display(change.head(2))
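
The query above is commented out and refers to `term_MME`, which only exists inside `aggregate_data`. As a hedged alternative, the same later-minus-earlier differences can be sketched directly in pandas. The `term_mme` frame below is a hypothetical stand-in with invented values, and only `MME-mean` is shown.

```python
from itertools import combinations

import pandas as pd

term_order = ['short', 'mid', 'long']

# Hypothetical term-level table shaped like `term_MME`; values are invented.
term_mme = pd.DataFrame({
    'variable': ['Hottest_Tmax_real'] * 3,
    'scenario': ['ssp245'] * 3,
    'term':     ['short', 'mid', 'long'],
    'MME-mean': [38.0, 39.0, 40.5],
})

# One row per (variable, scenario), one column per term.
wide = term_mme.pivot(index=['variable', 'scenario'],
                      columns='term', values='MME-mean')

# Later term minus earlier term for each ordered pair, as in the SQL.
change = pd.DataFrame({
    f'{a}-{b}': wide[b] - wide[a]
    for a, b in combinations(term_order, 2)
})
print(change)
```

Repeating the pivot for `MME-median` or the percentile columns would cover the remaining columns of the original query.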

## Results

In [9]:
display(results)

variable,term,min,max,rounded_min,rounded_max,diff
Coldest_Tmin_real,short,-11.475012,-11.276643,-11.5,-11.3,0.2
Coldest_Tmin_real,mid,-10.754145,-10.200959,-10.8,-10.2,0.6
Coldest_Tmin_real,long,-10.46319,-7.920878,-10.5,-7.9,2.6
Hottest_Tmax_real,short,38.696575,39.078562,38.7,39.1,0.4
Hottest_Tmax_real,mid,39.179674,39.696653,39.2,39.7,0.5
Hottest_Tmax_real,long,39.126319,41.202436,39.1,41.2,2.1
Max_DTR_real,short,24.576354,24.833572,24.6,24.8,0.2
Max_DTR_real,mid,24.486292,24.680787,24.5,24.7,0.2
Max_DTR_real,long,24.232885,24.449889,24.2,24.4,0.2
annmean_tmax_real,short,22.712597,22.799994,22.7,22.8,0.1


## All

In [10]:
with pd.ExcelWriter('center_casi_projections.xlsx') as writer:
    # Write one sheet per center; `path` is the zip archive set in In [2].
    # A separate loop variable avoids overwriting the `center` chosen above.
    for c in centers:
        _, center_results = get_results(path, c)
        center_results.to_excel(writer, sheet_name=c)

51 AMES files
51 GSFC files
51 JPL files
51 KSC files
51 MSFC files
51 MAF files
51 GISS files
51 LARC files
51 SSC files
51 GRC files
51 WFF files
51 JSC files
51 WSTF files
51 AFRC files
