
implement transient moving-average thermal model #1081

Closed
mikofski opened this issue Oct 15, 2020 · 14 comments · Fixed by #1391

@mikofski (Member)

New Feature
Implement the transient moving-average model proposed by Matt Prilliman

Describe the solution you'd like
a new function pvlib.temperature.prilliman(times, ...)

Describe alternatives you've considered
see also #1080

Additional context
see NREL/ssc#261

@kandersolar (Member)

@mjprilliman FYI

@kandersolar (Member)

I've taken a look at implementing this but am not satisfied with what I've come up with so far. Here are my notes in case anyone else is working on this.

Option 1: cell_temp_ss.rolling('1200s').apply(_prilliman_average)

My natural starting point was to use pandas to get the offset-based windows and apply a custom averaging function. The problem is that tseries.rolling(...).apply(...) is way too slow -- ~1 second computation time for three days of 1-minute data, even without a custom averaging function:

In [50]: N=60*24*3  # three days of 1-min data
    ...: s = pd.Series(range(N), pd.date_range('2019-01-01', freq='1T', periods=N))
    ...: %time _=s.rolling('1200s').apply(np.mean)
    ...: 
CPU times: user 1.08 s, sys: 2.41 ms, total: 1.08 s
Wall time: 1.02 s

It's also a little clumsy to give the averaging function access to the P time series. The best way I found involved lots of extra indexing and made it even slower.

Option 2: pandas EWM

pandas does provide exponential-weighted functions via s.ewm, but as far as I can tell it considers the entire history of the time series; there is no way to do a moving window (pandas-dev/pandas#11288). That might be an acceptable error, since the exponential weighting would minimize the contribution of points outside the window. However, according to its docstring the weight must be a float, i.e. a constant, which is not the case for the Prilliman et al. model (Eq. 11; P depends on the wind speed time series). So I don't think the pandas EWM functions are flexible enough for this.
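For illustration, here is the EWM API with its single constant alpha (the series and decay value below are arbitrary):

```python
import numpy as np
import pandas as pd

# a toy steady-state cell temperature series at 1-minute resolution
idx = pd.date_range('2019-06-01', freq='1min', periods=5)
t_ss = pd.Series([25.0, 30.0, 35.0, 33.0, 31.0], index=idx)

# pandas EWM: alpha must be a single float applied to the whole series,
# whereas Prilliman et al. Eq. 11 needs a decay that varies with the
# wind speed at each timestamp.
smoothed = t_ss.ewm(alpha=0.5, adjust=True).mean()
```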

Option 3: constrain to regular time series

I think the inefficiency of tseries.rolling(...).apply(...) can be avoided if the input time series index is evenly spaced -- something like the Hankel matrix indexing approach of the pre-#1074 detect_clearsky. But it's a shame to require even spacing, especially when the paper specifically highlights the algorithm's ability to handle data with nonuniform sampling.
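As a rough sketch of this approach (assuming evenly spaced data; sliding_window_view needs numpy >= 1.20, and the function name here is made up):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_weighted_mean(t_ss, p, window):
    """Exponentially weighted trailing mean over a fixed-count window.

    Assumes regular sampling; ``p`` holds one decay parameter per
    timestamp (in the Prilliman model it depends on wind speed).
    The first ``window - 1`` outputs are NaN (no full window yet).
    """
    t_ss = np.asarray(t_ss, dtype=float)
    p = np.asarray(p, dtype=float)
    out = np.full(len(t_ss), np.nan)
    views = sliding_window_view(t_ss, window)  # shape (n - window + 1, window)
    timeback = np.arange(window - 1, -1, -1)   # samples back from "now"
    # one weight vector per window, varying with the local decay parameter
    weights = np.exp(-p[window - 1:, None] * timeback[None, :])
    out[window - 1:] = (views * weights).sum(axis=1) / weights.sum(axis=1)
    return out
```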

Option 4: accelerate it using numba, or maybe cython

Haven't tried it but I'm guessing this is the answer, in coordination with #1060. Like in #1037, I can't find the trick to get these transient models running efficiently with base pandas/numpy.

@mjprilliman

Hi Kevin,

I'd like to help with this implementation. I had previously written a standalone Python script that was running the model accurately but I will have to revisit to see how or if it can tie in with the existing framework. I am working on SAM features this week but I can start working on this next week (possibly late this week) if that works.

@mjprilliman
Copy link

import numpy as np


def pvl_MAmodel_2(Tmstamp, SS, WS, m_u,
                  a=np.array([0.0046, 4.5537e-4, -2.2586e-4, -1.5661e-5])):
    """Numpy moving-average model.

    Tmstamp : DataFrame with a 'Times' datetime column
    SS : array of steady-state temperature approximations
    WS : array of wind speeds
    m_u : unit mass of the module
    a : bilinear interpolation coefficients
    """
    # convert datetimes to seconds since the epoch
    times = Tmstamp['Times'].astype('int64').to_numpy() / 1e9

    # power parameter for the exponential weighting function,
    # P = f(wind speed, unit mass)
    a0, a1, a2, a3 = a
    P = np.asarray(a0 + a1*WS + a2*m_u + a3*m_u*WS)

    # initialize the moving average with the first steady-state value
    T_MA = [SS[0]]

    window_length = 20 * 60  # seconds
    i_b = 0  # back index

    for i in range(1, len(times)):
        i_f = i - 1  # front index is always one timestep behind the current value
        # time difference between the current time and the front index
        # (in case the time series isn't uniform)
        dt_front = times[i] - times[i_f]
        if dt_front > window_length:
            # if the front index is more than 20 minutes behind the current
            # value, just use the steady-state approximation
            T_MA.append(SS[i])
        else:
            # if the front index is within 20 minutes, bump up the back index
            # until it is within the window length, without letting it pass
            # the front index
            while (times[i] - times[i_b]) > window_length and i_b < i_f:
                i_b += 1
            # time in seconds back from the current value for each sample
            # between the back and front indices (front index included)
            time_back = times[i] - times[i_b:i_f+1]
            temps_in_window = SS[i_b:i_f+1]
            # weight each sample as a function of the power parameter
            # P = f(WS, unit mass) and its time back
            weight = np.exp(-P[i] * time_back)
            rel_weight = weight / np.sum(weight)
            # weighted average: the temperature prediction at this timestep
            T_MA.append(np.sum(temps_in_window * rel_weight))

    return np.asarray(T_MA)

For reference, this is the Python function I was using in my grad studies, although most of the model development and validation was done in Matlab.

@cwhanse (Member)

cwhanse commented Oct 27, 2020

@kanderso-nrel I didn't realize that rolling.apply() was so slow (compared to built-in methods). I'll take a second look at its use in #1074.

If performance is a roadblock, for irregular data on short intervals, resampling seems appropriate.

@kandersolar (Member)

Thanks @mjprilliman, it'll be interesting to see how your implementation compares. It's good to have a reference implementation too. I don't think this function is tied to a deadline, so of course whenever you can find the time for it is fine.

I didn't realize that rolling.apply() was so slow (compared to built-in methods)

@cwhanse I forgot to mention this earlier but you can specify raw=True in apply to get an order of magnitude or two speedup. The difference is whether the applied function is passed Series objects (raw=False) or numpy arrays (raw=True). In this application I needed to keep it as a Series for the timedelta calculations so I used the slow route. Maybe the fast path could work in #1074, not sure.
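For example (a small illustrative series; both paths give the same numbers, only the speed and the object passed to the function differ):

```python
import numpy as np
import pandas as pd

s = pd.Series(range(10),
              index=pd.date_range('2019-01-01', freq='1min', periods=10))

# raw=False (default): the applied function receives a pd.Series per
# window, so index-aware operations work, but the per-window Series
# construction overhead is large.
slow = s.rolling(3).apply(lambda w: w.mean(), raw=False)

# raw=True: the applied function receives a bare numpy array --
# typically an order of magnitude or two faster, but no index access.
fast = s.rolling(3).apply(np.mean, raw=True)
```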

@mikofski (Member, Author)

My natural starting point was to use pandas to get the offset-based windows and apply a custom averaging function. The problem is that tseries.rolling(...).apply(...) is way too slow

  • It may be because you are using apply. Have you tried tseries.rolling('1200s').mean()? It may be faster.

  • A few times I have also found Pandas operations to be convenient but slow compared to NumPy. AFAIK NumPy doesn't have rolling(), but I think I found this online somewhere:

import numpy as np


def moving_average(x, window=5):
    """
    Moving average of a numpy array.

    Parameters
    ----------
    x : numeric
        a numpy array to average
    window : int
        the window over which to average

    Returns
    -------
    an array of the same size, with the index at the beginning of the window

    If ``window <= 1``, do nothing and return ``x``.
    """
    if window <= 1:
        return x
    m = window - 1
    # pad the right edge so the trailing windows stay full
    y = np.pad(x, (0, m), 'edge')
    z = 0
    for n in range(window):
        z += np.roll(y, -n)
    return z[:-m] / window

It's right-bounded, so I'm not sure how to backfill or center it. It uses numpy.roll and numpy.pad.

@kandersolar (Member)

It may be because you are using apply. Have you tried tseries.rolling('1200s').mean()? It may be faster.

The built-in functions are definitely way faster. The problem is that I didn't see a way to implement the exponential weighting using only the built-in functions, so I went with apply and a custom function. It's tricky not only because it's a weighted average of all points in the last 1200s but also because the weights vary for each window based on wind speed (and also the index if the series is irregularly-sampled).

I think I found this online somewhere

The issue there, assuming we want to accommodate irregular sampling, is that the window size can vary based on how many timestamps fall into the last 1200s -- it's an offset-based window, not a count-based window. I couldn't see an elegant/efficient way to recreate pandas's offset-based window with numpy, but would love to see one!
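One partial answer might be np.searchsorted on the timestamp array to find each trailing window's left boundary, which tolerates irregular sampling (a sketch only; the helper name is made up, and the per-window weighting still needs a loop or padding on top of this):

```python
import numpy as np

def trailing_window_bounds(times_sec, window_sec=1200.0):
    """For each timestamp, the index of the first sample inside the
    trailing ``window_sec`` window (irregular sampling allowed).

    ``times_sec`` must be an increasing array of times in seconds.
    """
    times_sec = np.asarray(times_sec, dtype=float)
    # left boundary: first index j with times_sec[j] >= t_i - window_sec,
    # i.e. keep samples no more than window_sec older than the current one
    return np.searchsorted(times_sec, times_sec - window_sec, side='left')
```

Each window is then `slice(bounds[i], i + 1)`, against which both the steady-state temperatures and the wind-speed-dependent weights can be evaluated.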

@jforbess (Contributor)

jforbess commented Nov 30, 2020

I recently became aware of np.nditer, which sped up my rolling-window apply greatly. I think you should be able to use it for this exponential weighted function, basically applied as a for loop with an efficiently stored/accessed array. I was doing a simple calculation on 20 years of 525,600-point series, and it was quite snappy relative to the rolling-window apply, which took long enough that the full 20-year calculation was at least tens of minutes, if not an hour. Sorry I don't have more quantified values.

@kandersolar (Member)

Here are a few prototype implementations. Note that the output is very slightly different from the function @mjprilliman posted above -- I've not yet chased down the difference but it's good enough for proof of concept anyway. tl;dr: the numpy-based implementations are quite fast but assume regular sampling. https://gist.github.com/kanderso-nrel/1d6da384d7af8afc24c230f1f144eb57

I still think it's a shame to not support irregular sampling, but the performance penalty of using pandas here is so huge that I think we should just require regular sampling. It's certainly much better than nothing, which is what we've had since this issue was opened (October 2020...).

One question for @mjprilliman: the handful of temperatures near the start of the inputs don't have a full window of previous conditions to average across. The numpy implementations in my notebook linked above return NaN for those points, but your example implementation uses whatever values are available, even if they don't comprise a full 20-minute window. I think yours is more faithful to the reference's equations, but I wonder if mine is more consistent with its spirit? Which approach do you think makes sense here?

@mjprilliman

Hi Kevin,

Thanks for these examples. For your question, I would say the spirit of the model would be to include anything within the window even if there is not a full window to look back on. But taking model speed into account, I think your v3 and v4 approaches make the most sense even without the partial-window calculations. I would imagine that in most use cases the first 20 minutes of data would occur at thermal equilibrium (night time), so a steady-state model would be sufficient. There could always be edge cases where some of the thermal inertia is not captured in the first time steps of a model, but I think the speed-up is worth more. Just my thoughts; maybe @jsstein feels differently.

@adriesse (Member)

Nice work, Kevin. Based on a comment in the code, I wonder if this could help for the pandas version:

https://stackoverflow.com/questions/60736556/pandas-rolling-apply-using-multiple-columns/60918101#60918101

@kandersolar (Member)

the spirit of the model would be to include anything within the window even if there is not a full window to look back on.

I think the numpy functions could be modified to do this without a meaningful increase in runtime. I'll try that out before opening a PR. Thanks!

@adriesse I think that SO answer is essentially the same approach taken in the notebook: use the index of the moving subset to retrieve the corresponding subset of other columns. Maybe that notebook code could be made clearer, but I think the workaround to access multiple time series in a single rolling window is the same. Great minds think alike I guess ;)

@cwhanse (Member)

cwhanse commented Jan 24, 2022

I think we should just require regular sampling.

I don't see that as a major problem. This function operates on the output of a temperature model (applies a form of exponential smoothing). It is likely that a user calculated that output with a regular time index. Even if not, the smoothing basically justifies interpolation on the input cell temperature and wind speed to a regular index, in the following sense: output from applying the function to interpolated input is likely to be very similar to output when applying the function to the non-regular input then interpolating the output.
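That pre-processing step might look like this (a sketch with made-up timestamps; the 1-minute target frequency is just an example):

```python
import pandas as pd

# irregularly sampled steady-state cell temperature (example values)
idx = pd.to_datetime(['2019-06-01 12:00:00',
                      '2019-06-01 12:01:30',
                      '2019-06-01 12:04:00'])
cell_temp_ss = pd.Series([40.0, 42.0, 41.0], index=idx)

# interpolate onto a regular 1-minute index spanning the data
regular_index = pd.date_range(idx[0], idx[-1], freq='1min')
cell_temp_regular = (
    cell_temp_ss.reindex(cell_temp_ss.index.union(regular_index))
                .interpolate(method='time')
                .reindex(regular_index)
)
```

The same treatment would apply to the wind speed series before computing P, so that both inputs share the regular index the fast implementation requires.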
