Clickable Histogram of Atmospheric Data (CHAD)

Author: Matthew Niznik (mailto:matthew.niznik9@gmail.com)<br>
Post-Doctoral Associate, RSMAS, University of Miami

The Clickable Histogram of Atmospheric Data (CHAD) loads a 2-D histogram together with the data used to generate it and draws a 2-D scatterplot of that data. Clicking a data point in the plot lets the user load the location and time that point came from in Unidata's Integrated Data Viewer (IDV) (http://www.unidata.ucar.edu/software/idv/).

Input needed:

CHAD(hist,xBinEdges,yBinEdges,lons,lats,times,startDatetime,<br>
     xData,yData,xDataBinned,yDataBinned,maxPlottedInBin,<br>
     figXPixelsReq,figYPixelsReq,figDPIReq,xFmtStr,yFmtStr)<br>

hist: Binned 2-D histogram data<br>
xBinEdges,yBinEdges: 1-D arrays containing the edges of each histogram bin (e.g. for a 10x10 histogram, each array has 11 entries)<br>
lons,lats: 1-D arrays containing the longitudes and latitudes for xData and yData<br>
times,startDatetime: a 1-D array containing the time elapsed (in minutes?) since startDatetime, and startDatetime itself (a datetime object)<br>
xData,yData: 3-D [F(x,y,t)] variables for the scatter plot (Ex: Precipitation and W(z=500 hPa))<br>
xDataBinned,yDataBinned: 3-D [F(x,y,t)] variables containing only the bin values in X and Y (saves sorting time)<br>
maxPlottedInBin: maximum number of data points plotted per bin, which keeps the point count low enough to find a close match via click<br>
figXPixelsReq,figYPixelsReq,figDPIReq: specify the requested size (in pixels) and resolution (DPI) of the CHAD figure<br>
xFmtStr,yFmtStr: Python format strings (e.g. "%.1f") specifying a reasonable formatting for click output<br>
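
The relationship between the array inputs listed above can be prototyped with synthetic data. The sketch below is not taken from CHAD itself: the random fields, the grid sizes, and the reading of "binned" as the zero-based bin index of each point are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-ins for real model output, shaped (time, lat, lon).
rng = np.random.default_rng(0)
nt, ny, nx = 4, 3, 5
xData = rng.uniform(0.0, 10.0, size=(nt, ny, nx))
yData = rng.uniform(-1.0, 1.0, size=(nt, ny, nx))

# Ten bins per axis means eleven edges per axis.
xBinEdges = np.linspace(0.0, 10.0, 11)
yBinEdges = np.linspace(-1.0, 1.0, 11)

# 2-D histogram of all (x, y) pairs.
hist, _, _ = np.histogram2d(xData.ravel(), yData.ravel(),
                            bins=[xBinEdges, yBinEdges])

# Assumed meaning of "binned": the zero-based bin index of every point,
# stored with the same (time, lat, lon) shape as the data.
xDataBinned = np.clip(np.digitize(xData, xBinEdges) - 1, 0, len(xBinEdges) - 2)
yDataBinned = np.clip(np.digitize(yData, yBinEdges) - 1, 0, len(yBinEdges) - 2)

print(hist.shape, xDataBinned.shape)
```

Precomputing the bin indices once means that selecting the points that belong to a given bin becomes a simple comparison rather than a repeated search through the edges.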

Creating an instance of the CHAD class generates the plot from the input data and then waits for user input clicks.
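
One way a click could be matched to a data point is a nearest-neighbour search over the plotted values, followed by a lookup of the longitude, latitude, and time at that index. The sketch below illustrates the idea only; it is not CHAD's actual implementation, the function name `nearest_point` is hypothetical, and the time axis is assumed to be in minutes.

```python
import datetime
import numpy as np

def nearest_point(xData, yData, xClick, yClick):
    """Return the (t, y, x) index of the data point closest to a click."""
    # Scale each axis by its range so neither variable dominates the distance.
    xRange = np.ptp(xData) or 1.0
    yRange = np.ptp(yData) or 1.0
    dist2 = ((xData - xClick) / xRange) ** 2 + ((yData - yClick) / yRange) ** 2
    return np.unravel_index(np.argmin(dist2), xData.shape)

# Tiny synthetic example: 2 times, 2 lats, 3 lons.
lons = np.array([0.0, 4.0, 8.0])
lats = np.array([-2.0, 2.0])
times = np.array([0.0, 180.0])          # assumed: minutes since startDatetime
startDatetime = datetime.datetime(2005, 6, 1)
xData = np.arange(12, dtype=float).reshape(2, 2, 3)
yData = xData * 10.0

t, j, i = nearest_point(xData, yData, xClick=7.2, yClick=69.0)
when = startDatetime + datetime.timedelta(minutes=float(times[t]))
print(lons[i], lats[j], when)
```

Here the click lands nearest the point at index (1, 0, 1), so the lookup returns the second longitude, the first latitude, and a time three hours after the start.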

(Note: IPython Notebook needs a few tweaks to work seamlessly with CHAD instances - those will be pointed out below as they come up.)

In [None]:
#--- User Changeable Parameters (and appropriate libraries) ---

#--- Figure Size and Resolution ---
#--- Set the figure x by y size in pixels, the DPI, and the max number of points to appear in a given bin ---
#--- (Plotting time and finding an individual event both become prohibitive for very large maxPlottedInBin values) ---
figureXSize = 800
figureYSize = 800
figDPI = 150
maxPlottedInBin = 25

#--- Formatting for Output ---
#--- Basic Help: The number after the decimal point sets the number of decimal places shown in output ---
#--- For more on Python string formatting, see the Python documentation on printf-style string formatting ---
xFmtStr = "%.2f"
yFmtStr = "%.2f"

#--- Start time is needed so that an appropriate time will be loaded upon calls to IDV ---
import datetime
startYear = 2005
startMonth = 6
startDay = 1
startDate = datetime.datetime(startYear,startMonth,startDay)

#--- Load Input from URL/Filepath ---
var1Name = 'Precipitation'
var2Name = 'W500'

urlToLoadHist = '/Users/niznik/Work/Data/CHAD_Data/histOutput_r90x45_3_V2.nc4'

histName = 'HIST'
var1EdgeName = 'PRECBIN'
var2EdgeName = 'W500BIN'
var1BinnedName = 'PRECBINNED'
var2BinnedName = 'W500BINNED'

urlToLoadValues = '/Users/niznik/Work/Data/CHAD_Data/allVars_r90x45_3.nc4'

var1ValueName = 'PREC'
var2ValueName = 'W'
var1ValueMult = 86400.
var2ValueMult = 1.

lonValueName = 'lon'
latValueName = 'lat'
timeValueName = 'time'
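
The format strings set above use Python's printf-style `%` operator. The short sketch below shows their effect on two made-up values; how CHAD assembles its click output is not shown in this notebook, so the `message` layout is an assumption.

```python
xFmtStr = "%.2f"   # two decimal places
yFmtStr = "%.1f"   # one decimal place

# Hypothetical clicked values.
xValue, yValue = 12.34567, -0.0512

message = "x = " + (xFmtStr % xValue) + ", y = " + (yFmtStr % yValue)
print(message)     # x = 12.35, y = -0.1
```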

In [None]:
import sys

from IPython.display import clear_output

import netCDF4
import numpy as np

import CHAD

In [None]:
#--- Fixing the output so it isn't buffered ---
#--- See: http://stackoverflow.com/questions/29772158/make-ipython-notebook-print-in-real-time ---

oldsysstdout = sys.stdout
class flushfile():
    def __init__(self, f):
        self.f = f
    def __getattr__(self, name):
        #--- Forward any other attribute lookup to the wrapped stream ---
        return getattr(self.f, name)
    def write(self, x):
        self.f.write(x)
        self.f.flush()
    def flush(self):
        self.f.flush()
sys.stdout = flushfile(sys.stdout)
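
The effect of the wrapper can be checked without touching `sys.stdout` by wrapping an in-memory stream that counts its flushes. This is a minimal sketch: `CountingStream` is a hypothetical helper written for the demonstration, and `FlushFile` restates the same idea as the class above.

```python
import io

class CountingStream(io.StringIO):
    """Hypothetical in-memory stream that counts how often flush() is called."""
    def __init__(self):
        self.flushCount = 0
        super().__init__()
    def flush(self):
        self.flushCount += 1
        super().flush()

class FlushFile(object):
    """Wrap a stream so that every write is followed by a flush."""
    def __init__(self, f):
        self.f = f
    def __getattr__(self, name):
        # Anything not defined here is forwarded to the wrapped stream.
        return getattr(self.f, name)
    def write(self, x):
        self.f.write(x)
        self.f.flush()

raw = CountingStream()
wrapped = FlushFile(raw)
wrapped.write("first\n")
wrapped.write("second\n")
print(raw.flushCount, repr(wrapped.getvalue()))
```

Each call to `write` triggers exactly one flush, which is what lets notebook output appear as it is produced rather than when the buffer fills.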

In [None]:
#--- Load the Data ---
#--- Note: this is where user knowledge of the local data is useful ---
#--- Data needs to be processed into the form that CHAD expects ---
lonBin = 0
latBin = 5
Nx = 90
Ny = 45
numLonBins = 10
numLatBins = 9
firstTimeValues = 8*16
lastTimeValues = -1*8*15

#--- Load Histogram and Bin Edges ---
cdfInHist = netCDF4.Dataset(urlToLoadHist,'r')
hist = np.sum(cdfInHist.variables[histName][:,lonBin,latBin,:,:,:,:,:,:],(0,3,4,5,6))
var1Edges = np.append(cdfInHist.variables[var1EdgeName+'MINS'][0],cdfInHist.variables[var1EdgeName+'MAXS'][:])
var2Edges = np.append(cdfInHist.variables[var2EdgeName+'MINS'][0],cdfInHist.variables[var2EdgeName+'MAXS'][:])

#--- Integer division (//) keeps these usable as slice indices in Python 3 ---
lonAVMin = lonBin*(Nx//numLonBins)
lonAVMax = lonAVMin+(Nx//numLonBins)
latAVMin = latBin*(Ny//numLatBins)
latAVMax = latAVMin+(Ny//numLatBins)

var1Binned = cdfInHist.variables[var1BinnedName][:,latAVMin:latAVMax,lonAVMin:lonAVMax]
var2Binned = cdfInHist.variables[var2BinnedName][:,latAVMin:latAVMax,lonAVMin:lonAVMax]
cdfInHist.close()

#--- Load Timeseries ---
#--- Note: var1ValueMult and var2ValueMult (set above) rescale the raw var1 and var2 values ---
cdfInValues = netCDF4.Dataset(urlToLoadValues,'r')
var1Values = cdfInValues.variables[var1ValueName][firstTimeValues:lastTimeValues,
                                                  latAVMin:latAVMax,lonAVMin:lonAVMax]*var1ValueMult
var2Values = cdfInValues.variables[var2ValueName][firstTimeValues:lastTimeValues,
                                                  latAVMin:latAVMax,lonAVMin:lonAVMax]*var2ValueMult
lonValues = cdfInValues.variables[lonValueName][lonAVMin:lonAVMax]
latValues = cdfInValues.variables[latValueName][latAVMin:latAVMax]
timeValues = cdfInValues.variables[timeValueName][firstTimeValues:lastTimeValues]
cdfInValues.close()

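#--- Offset the start date by half of one time step, converted to seconds ---
#--- (the factor of 60 assumes the time axis is in minutes) ---
startHourInS = float(timeValues[1]-timeValues[0])*(60./2.)
startDatetime = startDate+datetime.timedelta(seconds=startHourInS)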
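
The index and start-time arithmetic in the cell above can be checked by hand on small numbers. The sketch below reuses the latitude settings from that cell and uses integer division (`//`) so the bounds remain valid slice indices under Python 3. The 3-hourly time axis in minutes is a hypothetical example, not a property of the actual data files.

```python
import datetime
import numpy as np

# Sub-region index range: bin 5 of 9 latitude bins on a 45-point axis.
Ny, numLatBins, latBin = 45, 9, 5
latAVMin = latBin * (Ny // numLatBins)
latAVMax = latAVMin + (Ny // numLatBins)

# Assumed: time axis in minutes with a 3-hourly step (hypothetical values).
timeValues = np.array([0.0, 180.0, 360.0])
startDate = datetime.datetime(2005, 6, 1)

# Half of one time step, converted from minutes to seconds.
halfStepInS = float(timeValues[1] - timeValues[0]) * (60.0 / 2.0)
startDatetime = startDate + datetime.timedelta(seconds=halfStepInS)

print(latAVMin, latAVMax, startDatetime)
```

With these numbers the latitude slice covers indices 25 through 29, and the start time falls 90 minutes after midnight on the start date.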

In [None]:
#--- Create CHAD using a proper call ---
%qtconsole

CHAD1 = CHAD.CHAD(hist,var1Edges,var2Edges,lonValues,latValues,timeValues,
                  startDatetime,var1Values,var2Values,var1Binned,var2Binned,
                  maxPlottedInBin,figureXSize,figureYSize,figDPI,
                  xFmtStr,yFmtStr)
CHAD1.showPlot()