Parameter estimation for : Left skewed distributions #13071

digitalgemba · 2020-11-12T00:18:15Z

I was going through the code for def fit(self, data, *args, **kwds):, Does the optimisation differentiate between negative and positive skew distribution ?

if yes , please let me know the line link and ignore further.

If not , for a data as below,

temp=([0.035, 0.038, 0.054, 0.049, 0.052, 0.035, 0.035, 0.05 , 0.042, 0.034, 0.052, 0.035, 0.034, 0.037, 0.052, 0.035, 0.041, 0.035, 0.039, 0.04 , 0.04 , 0.043, 0.043, 0.035, 0.037, 0.034, 0.036, 0.037, 0.035, 0.032, 0.034, 0.038, 0.031, 0.05 , 0.042, 0.034, 0.036, 0.037, 0.055, 0.034, 0.036, 0.037, 0.035, 0.055, 0.044,0.04 , 0.031, 0.035, 0.038, 0.054])

print(skewtest(data)[0])
-3.4412103891299326

For Rayleigh Distribution,

rayleigh.fit(data)
(0.01122619903412441, 0.025047969058503067)

This does not make sense to me as this is case of 0 to loc , rather than 0 to Inf.

I have written the following, and I am struggling with ._fitstart .

def func(params):
     n_bad=[]
    loc,scale=params
    data1=(loc-data)/(scale)
    n_log_scale = (len(data)) *log(scale)
    logpdf = rayleigh._logpdf(data1)
    finite_logpdf = np.isfinite(logpdf)
    n_bad += np.sum(~finite_logpdf)
    if n_bad > 0:
        penalty = n_bad * log(np.finfo(float).max) * 100
        return -np.sum(logpdf[finite_logpdf], axis=0)+n_log_scale+penalty
    return -np.sum(logpdf, axis=0)+n_log_scale

x0=rayleigh._fitstart(max(data)*1.01+data)
res=optimize.fmin(func,x0)
array([0.0602275 , 0.01192598])

Please comment.

The text was updated successfully, but these errors were encountered:

mdhaber · 2020-11-25T23:08:35Z

@digitalgemba

Does the optimisation differentiate between negative and positive skew distribution

To my recollection, no. The generic optimization does not check the skew and change the procedure based on that, if that's what you're asking. Should it?

I'm not sure I can help with the rest, but I'll try.

There is data in a variable temp, but then there is rayleigh.fit(data). Is data the same as temp?

This does not make sense to me as this is case of 0 to loc , rather than 0 to Inf.

Can you elaborate on what you are expecting and why what you get does not match your expectation? I don't understand what is not making sense. We may not be as familiar with the distribution as you are, so it is not obvious from the numbers what is wrong.

I have written the following, and I am struggling with ._fitstart

_fitstart is a private method. You are welcome to use it, but we can't support its usage. If there is a bug, let us know.

You may be interested in gh-12097. It added an analytical formula for the scale parameter. It will be available when SciPy 1.6 is released, or you are welcome to build SciPy from master now. You might also be interested in this reference; see the fitting information in Chapter 39. The other first-order condition for Rayleigh, which you can use to get the location if you know the scale, is

In master, the fit returned by rayleigh.fit satisfies the first-order conditions very accurately.

Please also check how rayleigh is defined in scipy.stats. It might be different from what you are expecting.

mdhaber · 2020-11-30T01:50:21Z

@digitalgemba I'll go ahead and close this, but if you think there is a bug in master or if you're requesting some other feature, please let me know.

digitalgemba · 2020-12-03T07:20:09Z

The data below , is skewed

data=([0.038, 0.042, 0.012, 0.044, 0.049, 0.052, 0.054, 0.044, 0.058,0.039, 0.054, 0.043, 0.057, 0.058, 0.052, 0.051, 0.046, 0.049,0.052, 0.052, 0.051, 0.053, 0.044, 0.037, 0.04 , 0.032, 0.037,0.042, 0.045, 0.042, 0.041, 0.035, 0.052, 0.051, 0.052, 0.054,0.055, 0.052, 0.051, 0.042, 0.047, 0.032, 0.033, 0.036, 0.041,0.051, 0.046, 0.047, 0.042, 0.052])

skewtest(data)

SkewtestResult(statistic=-3.4412103891299326, pvalue=0.0005791180499256706)

The parameters does not reflect actual data .

How do we address such issues ?

mdhaber · 2020-12-03T07:30:05Z

I want to make sure I understand your issue. Let me try to restate it:
Your data is skewed. You try to fit a distribution to the data using the fit method, but the fitted distribution is not skewed.
Is that correct?

digitalgemba · 2020-12-03T07:31:42Z

Yes.

mdhaber · 2020-12-03T07:32:29Z

Ok, let me try fitting your data. Which distribution do you want me to try? Rayleigh, lognorm, or something else? In your original post you mentioned Rayleigh, but your recent post showed "lognorm" in the plot.

digitalgemba · 2020-12-03T08:07:09Z

Please check for lognormal, rest I will try after your feedback.

I measure distribution fitting by r^2.

temp=sorted(data)
xi=morestats._calc_uniform_order_statistic_medians(50)
xj=rayleigh.ppf(xi,loc,scale)
linregress(xj,temp)

mdhaber · 2020-12-03T16:36:15Z

lognorm's shape parameter doesn't support skew in that direction. Take a look at the plots on Wikipedia; they all have long tails to the right whereas your data has a longer tail on the left. If I reflect your data about the vertical axis, the shape parameter begins to be used.

import matplotlib.pyplot as plt
from scipy.stats import lognorm
import numpy as np

data=([0.038, 0.042, 0.012, 0.044, 0.049, 0.052, 0.054, 0.044, 0.058,0.039,
       0.054, 0.043, 0.057, 0.058, 0.052, 0.051, 0.046, 0.049,0.052, 0.052,
       0.051, 0.053, 0.044, 0.037, 0.04 , 0.032, 0.037,0.042, 0.045, 0.042,
       0.041, 0.035, 0.052, 0.051, 0.052, 0.054,0.055, 0.052, 0.051, 0.042,
       0.047, 0.032, 0.033, 0.036, 0.041,0.051, 0.046, 0.047, 0.042, 0.052])
data = -np.array(data)
p = lognorm.fit(data)
x = np.linspace(-0.1, 0.1, 300)
fig, ax = plt.subplots(1, 1)
ax.hist(data, density=True, histtype='stepfilled', alpha=0.2)
ax.plot(x, lognorm(*p).pdf(x))

But the other way of looking at this might be that lognormal may not be the appropriate distribution.

AtsushiSakai added scipy.stats query A question or suggestion that requires further information labels Nov 13, 2020

This was referenced Nov 25, 2020

ENH: stats: override stats.rayleigh.fit with analytical MLE #12097

Merged

A Solid Foundation for Statistics in Python with SciPy mdhaber/scipy#26

Closed

mdhaber closed this as completed Nov 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parameter estimation for : Left skewed distributions #13071

Parameter estimation for : Left skewed distributions #13071

digitalgemba commented Nov 12, 2020

mdhaber commented Nov 25, 2020 •

edited

mdhaber commented Nov 30, 2020

digitalgemba commented Dec 3, 2020

mdhaber commented Dec 3, 2020

digitalgemba commented Dec 3, 2020

mdhaber commented Dec 3, 2020 •

edited

digitalgemba commented Dec 3, 2020

mdhaber commented Dec 3, 2020 •

edited

Parameter estimation for : Left skewed distributions #13071

Parameter estimation for : Left skewed distributions #13071

Comments

digitalgemba commented Nov 12, 2020

mdhaber commented Nov 25, 2020 • edited

mdhaber commented Nov 30, 2020

digitalgemba commented Dec 3, 2020

mdhaber commented Dec 3, 2020

digitalgemba commented Dec 3, 2020

mdhaber commented Dec 3, 2020 • edited

digitalgemba commented Dec 3, 2020

mdhaber commented Dec 3, 2020 • edited

mdhaber commented Nov 25, 2020 •

edited

mdhaber commented Dec 3, 2020 •

edited

mdhaber commented Dec 3, 2020 •

edited