Applied Statistics in Solar Energy: Why Invest in P90?
This post goes through the underpinnings of Exceedance Probabilities and what solar energy developers and enthusiasts alike stand to gain by knowing the basics.
Exceedance Probabilities in the context of solar energy are also referred to as P50/P90 analysis or simply P values. Among banks and investment firms, it's the staple statistical method for determining the economic risk associated with solar resource uncertainty.
Specifically, a P50/P90 analysis determines the likelihood that a solar plant will yield more than a specific amount of energy (and, by extension, revenue) in any given year of its life. For this reason, exceedance probabilities are paramount for a solar project to a) secure competitive financing and b) manage operational costs and debt obligations.
So, what exactly are they?
P as in Percentile
P values refer to the probability that a certain value will be exceeded. For example, a P90 value of 100 means there is a 90% probability of exceeding 100.
In our context, a P90 of 100 means there is a 90% likelihood that a solar plant will yield more than 100 units of energy in a given year.
Pro-tip: Notice that P90 is NOT a 90% probability of producing 100, but of exceeding 100 units (a subtle yet very different meaning, and a common misunderstanding).
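The same definition can be written compactly. As a notational sketch (E for a single year's energy yield and F for its cumulative distribution function are symbols introduced here, and F is assumed continuous and strictly increasing):

```latex
\Pr\left(E > P_X\right) = \frac{X}{100}
\quad\Longleftrightarrow\quad
P_X = F^{-1}\!\left(1 - \frac{X}{100}\right)
```

So P90 is the 10th percentile of the annual yield, P50 the median, and P10 the 90th percentile.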
Methodologies to calculate P values
Dealing with 'uncertainty' and 'statistical methods' may sound intimidating, but once we clear up a few concepts it should be straightforward to grasp.
Inputs and High-Level Process
To calculate statistically robust P values of energy estimates we need two inputs:
1) A long-term historical weather dataset: using multi-year data ensures that potential worst-case scenarios that could affect a project's financial terms are taken into account.
2) System performance modeling: an hourly simulation of system performance for every single year in the dataset provides a detailed expectation of the system's output.
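Going from the hourly simulation to the yearly values used in the rest of this post is a simple aggregation. A minimal Python sketch, assuming the simulation produces (timestamp, kWh) pairs; the function name `annual_energy_gwh` and that input format are hypothetical, not part of any specific modeling tool:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def annual_energy_gwh(hourly_kwh):
    """Sum hourly simulated output (kWh) into yearly totals (GWh).

    `hourly_kwh` is assumed (hypothetically) to be an iterable of
    (datetime, kWh) pairs, one per simulated hour.
    """
    totals = defaultdict(float)
    for timestamp, kwh in hourly_kwh:
        totals[timestamp.year] += kwh
    # 1 GWh = 1,000,000 kWh; return years in chronological order.
    return {year: kwh / 1e6 for year, kwh in sorted(totals.items())}


# Usage with a flat, made-up profile: 1,000 kWh in every hour of 2015.
start = datetime(2015, 1, 1)
flat = ((start + timedelta(hours=h), 1000.0) for h in range(8760))
print(annual_energy_gwh(flat))  # {2015: 8.76}
```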
All else being equal, yearly outputs are mainly driven by the weather conditions at the project's site. Let's not forget that the goal is to understand this variability and determine the asset's expected output on a year-to-year basis.
From a statistical standpoint, we do this by fitting the yearly outputs derived from the historical dataset to a function we understand, in order to make inferences from it.
In other words, we calculate P values through a two-step process:
1) Fit the previously simulated yearly plant outputs to a distribution function, and
2) Calculate the desired P value from the function's properties.
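A minimal Python sketch of the two steps, using the standard library's `statistics.NormalDist` as the fitted function. The normal fit is only a placeholder for step 1 (the post later questions that choice), the helper names are hypothetical, and the sample values are made up for illustration rather than taken from the site's simulated data:

```python
from statistics import NormalDist


def fit_distribution(yearly_outputs):
    """Step 1: fit a distribution to the simulated yearly outputs.

    A normal fit (sample mean and sample standard deviation) is used here
    purely as an example; any object exposing inv_cdf() could be swapped in.
    """
    return NormalDist.from_samples(yearly_outputs)


def p_value(dist, exceedance_pct):
    """Step 2: the value exceeded with probability exceedance_pct / 100,
    i.e. the inverse CDF evaluated at 1 - exceedance_pct / 100."""
    return dist.inv_cdf(1 - exceedance_pct / 100)


# Hypothetical yearly outputs in GWh (illustrative only, not the site data).
outputs = [17.2, 17.8, 18.1, 17.5, 18.4, 17.9, 17.6, 18.0]
dist = fit_distribution(outputs)
for x in (90, 50, 10):
    print(f"P{x}: {p_value(dist, x):.2f} GWh")
```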
Consider a fictional 10 MW utility-scale plant somewhere in California.
I've simulated yearly energy outputs at the site for 1998-2015; see the results in the following table:
Simulated Yearly Energy Outputs [GWh]
Plotting a histogram shows how the energy outputs are distributed, i.e., the density of our sample.
The most common path forward is to assume the data follow a Normal Probability Distribution (see Wikipedia) and to calculate its cumulative form (the integral). This works well if the data are normally distributed; however, that's arguably not the case with the solar resource.
Across the 20- to 30-year life of a solar project, outlier events such as cyclic weather patterns or volcanic eruptions may skew the data (in our example we potentially have one!).
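One quick way to check whether such outliers skew a sample is to compute its skewness. A minimal sketch using the moment coefficient g1 = m3 / m2^1.5 (a standard statistic that this post does not otherwise use; `sample_skewness` is a hypothetical helper). A value near 0 is consistent with symmetry, while a clearly negative value points to a longer tail of low-production years:

```python
def sample_skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5.

    m2 and m3 are the second and third central moments (population form).
    Assumes at least two distinct values, otherwise m2 is zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical yearly outputs (GWh) with one unusually poor year.
print(round(sample_skewness([17.2, 17.8, 18.1, 15.9, 17.9]), 2))
```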
An alternative approach is not to assume any particular distribution and instead build one directly from the data. Specifically, we want to build an Empirical Cumulative Distribution Function (ECDF).
For illustration purposes, I will calculate the Exceedance Probabilities with both methods and show why understanding the difference matters.
Let's start with the normal distribution.
Normal Distribution Approach
This approach is simple since we know what the function looks like (see Wikipedia). Note that it takes two parameters: the mean and the standard deviation of the dataset.
For our hypothetical plant, this is what it looks like with the calculated mean and standard deviation.
The integral of this curve is the cumulative distribution function (CDF), which is what we are interested in.
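For reference, these are the standard normal-distribution formulas behind the curves, where μ and σ are the mean and standard deviation, P_X denotes the value exceeded with probability X%, and Φ (a symbol introduced here) is the standard normal CDF:

```latex
\begin{aligned}
f(x) &= \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \\
F(x) &= \int_{-\infty}^{x} f(t)\,dt = \Phi\!\left(\frac{x-\mu}{\sigma}\right) \\
P_X  &= F^{-1}\!\left(1 - \frac{X}{100}\right) = \mu + \sigma\,\Phi^{-1}\!\left(1 - \frac{X}{100}\right)
\end{aligned}
```

Since Φ⁻¹(0.10) ≈ -1.2816, this gives P90 ≈ μ - 1.2816σ, P50 = μ and P10 ≈ μ + 1.2816σ.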
To interpret this plot, take the P10 value; it reads: 'The likelihood that the plant will yield more than 18.4 GWh in a given year is 10%.' That's it! We can evaluate the function to get the proportion of the population (the probability) that exceeds any given value, or invert it to find the value P that is exceeded with any chosen probability. Let's stop and think about this further.
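Before we do, here is a minimal sketch of that evaluation in Python, again assuming a normal fit via the standard library's `statistics.NormalDist`. The exceedance probability is simply the complement of the CDF; `exceedance_probability` is a hypothetical helper and the sample values are illustrative only:

```python
from statistics import NormalDist


def exceedance_probability(yearly_outputs, threshold):
    """Probability that a year's output exceeds `threshold` under a normal fit.

    Exceedance is the complement of the CDF: Pr(E > x) = 1 - F(x).
    """
    dist = NormalDist.from_samples(yearly_outputs)
    return 1 - dist.cdf(threshold)


# Hypothetical yearly outputs in GWh (illustrative only, not the site data).
outputs = [17.2, 17.8, 18.1, 17.5, 18.4, 17.9, 17.6, 18.0]
print(f"Pr(E > 18.4 GWh) = {exceedance_probability(outputs, 18.4):.3f}")
```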
If you were to invest in a solar plant with minimal risk exposure, would you want to calculate your Return on Investment using a high or a low P value? To answer that, notice that a P10 value is exceeded in only 10% of the yearly simulations. Another way to think about it: P values move inversely with production, so the higher the exceedance probability, the lower the production value it refers to.
For example, a high probability of exceedance, i.e., P90, references a relatively low production yield, and the reverse is true for low P values. That is why financial institutions and plant owners, each in their own best interest, plan according to at least the P50 (the median, which equals the mean only when the distribution is symmetric, as in the normal case) or a higher, more conservative P value: the latter to make sure they service debt obligations and manage operational costs, the former to reduce the risk of the borrower defaulting.
Empirical Distribution Approach
Now, let's look at the empirical procedure. Our hypothetical plant's dataset covers 18 years, so there are 18 production values. Each value contributes an equal share of the total probability, 1/18. Since the distribution is cumulative, we sort the values from lowest to highest and take a running (cumulative) sum of these contributions at each consecutive data point. The procedure is shown below:
| Energy Output [GWh] | Probability | Cumulative Probability (cumsum) |
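The table construction can be sketched in a few lines of Python. This is an illustration under assumptions: `empirical_cdf_table` is a hypothetical helper, and the input values are made up rather than taken from the plant's simulated outputs:

```python
from itertools import accumulate


def empirical_cdf_table(yearly_outputs):
    """Rows of (energy, probability, cumulative probability).

    Outputs are sorted from lowest to highest, each gets an equal
    probability of 1/n, and the probabilities are summed cumulatively.
    """
    values = sorted(yearly_outputs)
    n = len(values)
    probs = [1 / n] * n
    cumsum = list(accumulate(probs))  # running sum: 1/n, 2/n, ..., ~1
    return list(zip(values, probs, cumsum))


# Hypothetical outputs (GWh), not the site data.
for energy, prob, cum in empirical_cdf_table([18.1, 17.2, 17.9, 17.5]):
    print(f"{energy:5.1f}  {prob:.4f}  {cum:.4f}")
```

Because the running sum adds 1/n repeatedly, the last cumulative value can differ from 1 by floating-point round-off; computing (k + 1)/n directly avoids that.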
With that, we are ready to plot.
In contrast to the first method, we obtain specific P values by linear interpolation. For example, a P90 value can be computed by interpolating between the two table values whose cumulative probabilities bracket 0.1. This is crucial, since it means we need as many data points as possible to establish a representative cumulative curve and, consequently, reliable exceedance probabilities. So, this curve tells us a slightly different story than our first method:
The plot shows that the empirical distribution built from the 18-year dataset deviates from the calculated normal distribution in several sections. For this example, the P90 and P50 values calculated using the normal CDF differ from their empirical counterparts by 0.6% (on the optimistic side) and -0.3%, respectively.
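Reading a P value off the empirical curve, as described for the P90 above, amounts to interpolating the sorted outputs at cumulative probability 1 - X/100. A minimal sketch, assuming the k/n cumulative convention of the table above and clamping targets outside the tabulated range to the smallest or largest value (one choice among several); `empirical_p_value` is a hypothetical name:

```python
from bisect import bisect_left


def empirical_p_value(yearly_outputs, exceedance_pct):
    """P value by linear interpolation on the empirical CDF.

    The k-th smallest output (1-based) is assigned cumulative probability
    k/n; the energy is interpolated at cumulative probability
    1 - exceedance_pct / 100.
    """
    values = sorted(yearly_outputs)
    n = len(values)
    cum = [(k + 1) / n for k in range(n)]
    target = 1 - exceedance_pct / 100
    # Outside the tabulated range, clamp to the extreme observed values.
    if target <= cum[0]:
        return values[0]
    if target >= cum[-1]:
        return values[-1]
    i = bisect_left(cum, target)  # first index with cum[i] >= target
    lo_c, hi_c = cum[i - 1], cum[i]
    lo_v, hi_v = values[i - 1], values[i]
    return lo_v + (target - lo_c) * (hi_v - lo_v) / (hi_c - lo_c)


# Hypothetical yearly outputs in GWh (illustrative only, not the site data).
outputs = [17.2, 17.8, 18.1, 17.5, 18.4, 17.9, 17.6, 18.0, 17.3, 18.2, 17.7]
print(f"P90 = {empirical_p_value(outputs, 90):.2f} GWh")
```

With 10 or fewer years of data, 1/n is at least 0.1 and the P90 simply clamps to the lowest year, which illustrates why more data points make the empirical curve more useful.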
Pay particular attention to the deviations in the tails of the distributions. Assuming a normal distribution may lead to seemingly irrational conclusions: in this example, it would suggest that a production as extreme as the one simulated for 1998 has a likelihood of occurring 1 in 330 years, even though it already appears in our 18-year record.
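The '1 in 330 years' figure is a return period: the reciprocal of the probability, under the fitted normal distribution, of a year at least as extreme as the one observed. A minimal sketch, assuming the relevant tail is whichever side of the median the value falls on; `normal_return_period` is a hypothetical helper:

```python
from statistics import NormalDist


def normal_return_period(yearly_outputs, value):
    """Return period (years) implied by a normal fit for a year at least
    as extreme as `value`, measured in the tail where `value` lies."""
    dist = NormalDist.from_samples(yearly_outputs)
    p_low = dist.cdf(value)  # Pr(E <= value)
    tail_prob = min(p_low, 1 - p_low)  # probability of a year this extreme
    return 1 / tail_prob


# Hypothetical yearly outputs in GWh (illustrative only, not the site data).
outputs = [17.2, 17.8, 18.1, 17.5, 18.4, 17.9, 17.6, 18.0]
print(f"1 in {normal_return_period(outputs, 17.2):.0f} years")
```

For comparison, in the empirical table every one of the 18 years, extreme or not, carries a probability of 1/18.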
Invest in P90 from an empirical distribution if possible. As we've seen, to evaluate a project's financial risk we use P50 and P90 exceedance probabilities based on a multi-year historical dataset. Weather, just like stock markets and other phenomena, follows a fat-tailed distribution that does not fit the normal distribution very closely; thus, the empirical methodology yields more reliable results and more realistic estimates for managing your plant's expectations.