# Combining GISS
While the GISS datasets found through ESGF-NCI do cover the past 1000 years, the files are split into 50-year segments because of their size. As we are only looking at summary statistics, we can compute these for each segment and combine the results without merging the files themselves.

This notebook shows the process of combining one such dataset (the r1i1p121 experiment), which was used to develop the combo.py function.

Because of the large number of files, they have not been downloaded and are instead accessed by URL. However, the netCDF (.nc) files cannot be read from the plain download link: '#mode=bytes' must be appended to the end of the URL. This does not appear to cause any other issues when loading the dataset.
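To make sure every URL passed to xarray carries the suffix exactly once, a small helper can append it. This is a minimal sketch; `with_byte_mode` is a hypothetical name, not part of xarray or ncsumstat. As we understand it, the `#mode=bytes` fragment is read by the underlying netCDF-C library as a request to access the remote file through HTTP byte-range requests, which is why the plain link alone is not enough.

```python
def with_byte_mode(url: str) -> str:
    """Return url with the '#mode=bytes' fragment appended exactly once (hypothetical helper)."""
    suffix = "#mode=bytes"
    if url.endswith(suffix):
        # Already in the readable form; do not append a second copy
        return url
    return url + suffix


plain = ("https://ds.nccs.nasa.gov/thredds/fileServer/CMIP5/NASA/GISS/past1000/"
         "E2-R_past1000_r1i1p121/sos_Omon_GISS-E2-R_past1000_r1i1p121_085001-089912.nc")
print(with_byte_mode(plain))
```

The helper only checks for the suffix at the end of the string, so it assumes the URL has no other fragment.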

In [15]:
import xarray as xr
url = 'https://ds.nccs.nasa.gov/thredds/fileServer/CMIP5/NASA/GISS/past1000/E2-R_past1000_r1i1p121/sos_Omon_GISS-E2-R_past1000_r1i1p121_085001-089912.nc#mode=bytes'
ds = xr.load_dataset(url)
ds

All of the download files share a standardized URL, with only the final section giving the time span (YYYYMM-YYYYMM). For legibility, each URL below is built by concatenating strings rather than written out in full.
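Because only the time span changes, the URL list could also be generated instead of typed by hand. The sketch below assumes the segment boundaries used by the file names in this notebook: three 50-year files from 0850 to 0999, one 51-year file covering 1000-1050 (which is why one file has 612 monthly values rather than 600), then 50-year files from 1051 to 1850. The names `segment_years` and `segment_urls` are hypothetical.

```python
EXPERIMENT = "E2-R_past1000_r1i1p121"
FILE_BASE = ("https://ds.nccs.nasa.gov/thredds/fileServer/CMIP5/NASA/GISS/past1000/"
             + EXPERIMENT + "/sos_Omon_GISS-")


def segment_years():
    """(start_year, end_year) pairs matching the file names listed in this notebook."""
    spans = [(start, start + 49) for start in (850, 900, 950)]         # 0850-0999
    spans.append((1000, 1050))                                         # irregular 51-year file
    spans += [(start, start + 49) for start in range(1051, 1802, 50)]  # 1051-1850
    return spans


def segment_urls():
    """Build the '#mode=bytes' URL for each segment, oldest first."""
    return [
        f"{FILE_BASE}{EXPERIMENT}_{start:04d}01-{end:04d}12.nc#mode=bytes"
        for start, end in segment_years()
    ]


urls = segment_urls()
print(len(urls))
print(urls[0])
```

The list is ordered oldest first, the same order used for the `URL` column later in the notebook.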

In [1]:
experiment = "E2-R_past1000_r1i1p121" 
file_base = "https://ds.nccs.nasa.gov/thredds/fileServer/CMIP5/NASA/GISS/past1000/"+experiment+"/sos_Omon_GISS-"

file_1800 = file_base + experiment + '_180101-185012.nc#mode=bytes'
file_1750 = file_base + experiment + '_175101-180012.nc#mode=bytes'
file_1700 = file_base + experiment + '_170101-175012.nc#mode=bytes'
file_1650 = file_base + experiment + '_165101-170012.nc#mode=bytes'
file_1600 = file_base + experiment + '_160101-165012.nc#mode=bytes'
file_1550 = file_base + experiment + '_155101-160012.nc#mode=bytes'
file_1500 = file_base + experiment + '_150101-155012.nc#mode=bytes'
file_1450 = file_base + experiment + '_145101-150012.nc#mode=bytes'
file_1400 = file_base + experiment + '_140101-145012.nc#mode=bytes'
file_1350 = file_base + experiment + '_135101-140012.nc#mode=bytes'
file_1300 = file_base + experiment + '_130101-135012.nc#mode=bytes'
file_1250 = file_base + experiment + '_125101-130012.nc#mode=bytes'
file_1200 = file_base + experiment + '_120101-125012.nc#mode=bytes'
file_1150 = file_base + experiment + '_115101-120012.nc#mode=bytes'
file_1100 = file_base + experiment + '_110101-115012.nc#mode=bytes'
file_1050 = file_base + experiment + '_105101-110012.nc#mode=bytes'
file_1000 = file_base + experiment + '_100001-105012.nc#mode=bytes'
file_0950 = file_base + experiment + '_095001-099912.nc#mode=bytes'
file_0900 = file_base + experiment + '_090001-094912.nc#mode=bytes'
file_0850 = file_base + experiment + '_085001-089912.nc#mode=bytes'

In [19]:
ds = xr.load_dataset(file_0850)
ds

A DataFrame is created from the URLs, with columns for the maximum, minimum, average, and count, filled in using the functions of the **ncsumstat** module. The overall maximum and minimum are simply the maximum of the Max column and the minimum of the Min column, but the overall average must be weighted by the number of values in each file, using the formula:
> total average = sum(average * number of values) / total number of values
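A quick check with made-up numbers shows why the weighting matters: averaging the per-file averages directly gives the wrong result whenever files hold different numbers of values, as with the 612-value file in this run. The arrays below are illustrative only.

```python
import numpy as np

# Two illustrative "files" of different lengths
seg_a = np.array([1.0, 2.0, 3.0])
seg_b = np.array([4.0, 5.0])

avgs = np.array([seg_a.mean(), seg_b.mean()])    # 2.0 and 4.5
counts = np.array([seg_a.size, seg_b.size])      # 3 and 2

weighted = (avgs * counts).sum() / counts.sum()  # (6 + 9) / 5 = 3.0
naive = avgs.mean()                              # (2.0 + 4.5) / 2 = 3.25
direct = np.concatenate([seg_a, seg_b]).mean()   # 15 / 5 = 3.0

print(weighted, naive, direct)
```

The weighted result matches the mean of all values pooled together, while the unweighted mean of means does not.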

In [2]:
import pandas as pd
import numpy as np
import ncsumstat as ns  # summary-statistics module used throughout this notebook

# One row per file; the statistic columns start as NaN and are filled in below
df = pd.DataFrame(np.nan, index=range(20), columns=['URL', 'Max', 'Min','Avg','Count','Avg*Count'])
df['URL']=[file_0850,file_0900,file_0950,file_1000,file_1050,file_1100,file_1150, file_1200,file_1250,file_1300,file_1350,
           file_1400,file_1450,file_1500,file_1550,file_1600,file_1650,file_1700,file_1750,file_1800]

# Grid point to summarize
lon = 90
lat = 0
for x in range(df.shape[0]):
    url = df.loc[x,'URL']
    df.loc[x,'Max'] = ns.max(lon,lat,url)
    df.loc[x,'Min'] = ns.min(lon,lat,url)
    df.loc[x,'Avg'] = ns.avg(lon,lat,url)
    df.loc[x,'Count'] = ns.count(lon,lat,url)
    # This file's weighted contribution to the overall average
    df.loc[x,'Avg*Count'] = df.loc[x,'Avg'] * df.loc[x,'Count']
df

| | URL | Max | Min | Avg | Count | Avg*Count |
|---|---|---|---|---|---|---|
| 0 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 34.203995 | 32.53685 | 33.340626 | 600.0 | 20004.375458 |
| 1 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 34.051292 | 32.527519 | 33.279259 | 600.0 | 19967.555237 |
| 2 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 34.100899 | 32.564892 | 33.324093 | 600.0 | 19994.455719 |
| 3 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 34.147537 | 32.54353 | 33.328716 | 612.0 | 20397.174362 |
| 4 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 34.100441 | 32.541012 | 33.288685 | 600.0 | 19973.210907 |
| 5 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 34.239113 | 32.619923 | 33.306366 | 600.0 | 19983.81958 |
| 6 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 33.979759 | 32.513836 | 33.179916 | 600.0 | 19907.949829 |
| 7 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 33.879238 | 32.489872 | 33.2234 | 600.0 | 19934.04007 |
| 8 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 34.01886 | 32.495705 | 33.254539 | 600.0 | 19952.723694 |
| 9 | https://ds.nccs.nasa.gov/thredds/fileServer/CM... | 34.016018 | 32.440399 | 33.271431 | 600.0 | 19962.858582 |
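Since each ncsumstat function takes the URL itself, each file is presumably opened four times in the loop above. If the time series at the grid point were already in memory, all four statistics could come from a single pass. The sketch below is hypothetical: `point_stats` is not part of ncsumstat, and it assumes Count means the number of non-missing monthly values (with missing values stored as NaN), which may differ from how ncsumstat defines it.

```python
import numpy as np


def point_stats(values):
    """Max, min, mean and count of the non-NaN values in a 1-D series (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    valid = values[~np.isnan(values)]  # drop missing months
    count = valid.size
    if count == 0:
        return np.nan, np.nan, np.nan, 0
    return valid.max(), valid.min(), valid.mean(), count


# Illustrative monthly values with one missing month
mx, mn, avg, n = point_stats([33.1, 33.4, np.nan, 32.9])
print(mx, mn, avg, n, avg * n)  # the last value corresponds to the Avg*Count column
```

With xarray, the series could be taken from an opened dataset with something like `ds['sos'].sel(lat=lat, lon=lon, method='nearest').values`, assuming the coordinates are named `lat` and `lon` in these files.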


In [16]:
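totalmax = df['Max'].max()              # overall maximum
totalmin = df['Min'].min()              # overall minimum
totalavgcount = df['Avg*Count'].sum()   # sum of all values across every file
totalcount = df['Count'].sum()          # total number of values
totalavg = totalavgcount / totalcount   # count-weighted overall average

totalmax, totalmin, totalavg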

(34.239112854003906, 32.42214584350586, 33.27420974182677)
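The combination step can also be written as a plain function over per-file results, which makes the weighting explicit and works for any number of files. This is a sketch only: `combine` and its (max, min, avg, count) tuple input are hypothetical and are not the interface of combo.py. The example numbers are made up.

```python
def combine(stats):
    """Combine per-file (max, min, avg, count) tuples into overall statistics.

    The overall max and min are taken directly; the overall average is the
    count-weighted mean of the per-file averages.
    """
    stats = list(stats)
    total_count = sum(count for _, _, _, count in stats)
    if total_count == 0:
        raise ValueError("no values to combine")
    total_max = max(mx for mx, _, _, _ in stats)
    total_min = min(mn for _, mn, _, _ in stats)
    total_avg = sum(avg * count for _, _, avg, count in stats) / total_count
    return total_max, total_min, total_avg


# Made-up input: two 600-month files and one 612-month file
example = [(34.2, 32.5, 33.3, 600), (34.0, 32.4, 33.2, 600), (34.1, 32.6, 33.4, 612)]
print(combine(example))
```

Applied to the DataFrame above, the same function could be called as `combine(df[['Max', 'Min', 'Avg', 'Count']].itertuples(index=False))`, since each row unpacks into the four values in that order.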