
Add features #32

Merged
merged 11 commits into from Nov 13, 2020

Conversation

anlava
Collaborator

@anlava anlava commented Nov 2, 2020

New features + tests

@hombit
Member

hombit commented Nov 2, 2020

Please run black . to reformat the code

@anlava anlava changed the title Add features: median, standard deviation Add features Nov 2, 2020
m_std = np.std(m, ddof=1)
m_new = np.cumsum(m - m_mean)
result = m_new / (len(m) * m_std)
return max(result) - min(result)
Member

Replace with np.ptp
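If helpful, the suggestion could look like this: np.ptp computes max - min ("peak to peak") in a single vectorized call. A minimal sketch with illustrative data, recomputing the cumulative-sum statistic inline:

```python
import numpy as np

# Illustrative data, not taken from the PR's tests.
m = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
m_mean = np.mean(m)
m_std = np.std(m, ddof=1)
result = np.cumsum(m - m_mean) / (len(m) * m_std)

manual = max(result) - min(result)  # current form in the PR
via_ptp = np.ptp(result)            # suggested replacement
assert np.isclose(manual, via_ptp)
```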

class Eta(BaseFeature):
def __call__(self, t, m, sigma=None, sorted=None, fill_value=None):
n = len(m)
m_std = np.std(m, ddof=1) ** 2
Member

np.var
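np.var(m, ddof=1) returns the sample variance directly, so squaring the standard deviation is unnecessary. A quick check with illustrative data:

```python
import numpy as np

m = np.array([1.0, 2.0, 4.0, 8.0])
var_direct = np.var(m, ddof=1)            # suggested form
var_squared_std = np.std(m, ddof=1) ** 2  # current form in the PR
assert np.isclose(var_direct, var_squared_std)
```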

def __call__(self, t, m, sigma=None, sorted=None, fill_value=None):
n = len(m)
m_std = np.std(m, ddof=1) ** 2
m_sum = sum([(m[i + 1] - m[i]) ** 2 for i in range(n - 1)])
Member

np.sum((m[1:] - m[:-1]) ** 2)
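The vectorized form is equivalent to the list-comprehension loop; np.diff expresses the same computation even more compactly. A sketch with illustrative data:

```python
import numpy as np

m = np.array([1.0, 2.5, 2.0, 4.0])
n = len(m)
loop_sum = sum([(m[i + 1] - m[i]) ** 2 for i in range(n - 1)])  # current
vec_sum = np.sum((m[1:] - m[:-1]) ** 2)                         # suggested
diff_sum = np.sum(np.diff(m) ** 2)                              # equivalent
assert np.isclose(loop_sum, vec_sum)
assert np.isclose(vec_sum, diff_sum)
```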

def __call__(self, t, m, sigma=None, sorted=None, fill_value=None):
m_mean = np.mean(m)
d_mean = np.mean(np.power(sigma, 2))
m_std = np.std(m, ddof=1)
Member

np.var

def __call__(self, t, m, sigma=None, sorted=None, fill_value=None):
n = len(m)
m_mean = np.mean(m)
m_st = np.std(m, ddof=1) ** 4
Member

np.var

from ._base import BaseFeature


class Kurtosis(BaseFeature):
Member

scipy.stats?
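scipy.stats provides a ready-made kurtosis. A minimal sketch with illustrative data; the fisher/bias flags would need to match the feature's intended definition:

```python
import numpy as np
from scipy import stats

m = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
# fisher=True returns excess kurtosis; bias=False applies the
# standard sample-size correction.
k = stats.kurtosis(m, fisher=True, bias=False)
assert np.isfinite(k)
```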

Comment on lines 8 to 10
m_span = [m[i + 1] - m[i] for i in range(len(m) - 1)]
t_span = [t[i + 1] - t[i] for i in range(len(t) - 1)]
div = [abs(i / j) for i, j in zip(m_span, t_span)]
Member

vectorize
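The loop over index pairs can be vectorized with slicing or np.diff. A sketch with illustrative data:

```python
import numpy as np

t = np.array([0.0, 1.0, 3.0, 6.0])
m = np.array([2.0, 5.0, 4.0, 10.0])

# Loop version from the diff:
m_span = [m[i + 1] - m[i] for i in range(len(m) - 1)]
t_span = [t[i + 1] - t[i] for i in range(len(t) - 1)]
div_loop = [abs(a / b) for a, b in zip(m_span, t_span)]

# Vectorized equivalent:
div_vec = np.abs(np.diff(m) / np.diff(t))
assert np.allclose(div_loop, div_vec)
```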

from ._base import BaseFeature


class Skew(BaseFeature):
Member

scipy.stats?
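As with Kurtosis, scipy.stats.skew could replace a hand-rolled implementation. A minimal sketch with illustrative data; the bias flag would need to match the intended definition:

```python
import numpy as np
from scipy import stats

m = np.array([1.0, 2.0, 2.0, 3.0, 9.0])
s = stats.skew(m, bias=False)  # sample skewness with bias correction
assert np.isfinite(s)
```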


class WeightedMean(BaseFeature):
def __call__(self, t, m, sigma=None, sorted=None, fill_value=None):
return np.average(m, weights=np.power(sigma, 2))
Member

-2
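The "-2" refers to the exponent: an inverse-variance weighted mean uses weights of sigma ** -2, so points with smaller uncertainty get larger weight. A sketch with illustrative data:

```python
import numpy as np

m = np.array([10.0, 12.0, 11.0])
sigma = np.array([1.0, 2.0, 0.5])

# Inverse-variance weighted mean: weights = sigma ** -2,
# not sigma ** 2 as in the current diff.
wmean = np.average(m, weights=sigma ** -2)
manual = np.sum(m / sigma ** 2) / np.sum(1.0 / sigma ** 2)
assert np.isclose(wmean, manual)
```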

@hombit
Member

hombit commented Nov 12, 2020

The check-style test failed; could you please run black on the code?

Member

@hombit hombit left a comment

Probably some random-number, statistics-motivated tests can be added. For example, we can test AndersonDarlingNormal using a sample drawn from a normal distribution, and test ReducedChi2 with a random vector drawn from a multidimensional normal distribution with mean mu = [x, x, ..., x] and Sigma = diag(sigma_1, sigma_2, ..., sigma_n)
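A sketch of such a statistical test for the reduced chi-square, computed directly here rather than through the PR's ReducedChi2 class (which is not reproduced in this thread): for observations m_i drawn from N(mu, sigma_i^2), the statistic should be close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
sigma = rng.uniform(0.5, 2.0, n)   # heteroscedastic errors
m = rng.normal(5.0, sigma)         # observations around a constant mean

# Reduced chi-square with an inverse-variance weighted mean:
m_wmean = np.average(m, weights=sigma ** -2)
chi2 = np.sum(((m - m_wmean) / sigma) ** 2) / (n - 1)

# For a correctly specified constant model, chi2 is ~ 1 with
# scatter of order sqrt(2 / n).
assert abs(chi2 - 1.0) < 0.05
```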

n = len(m)
m_wmean = np.average(m, weights=sigma)
s = ((m - m_wmean) / sigma) ** 2
return sum(s) / (n - 1)
Member

np.sum is faster, proof:

In [1]: a = np.arange(1024)

In [2]: %timeit sum(a)
275 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: %timeit np.sum(a)
6.21 µs ± 548 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

feature = ReducedChi2()
desired = feature(m, m, sigma)
actual = 10.666667
assert_allclose(actual, desired)
Member

Please add an additional test with non-constant sigma
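A non-constant-sigma case could be built the same way, with the expected value computed by hand from the inverse-variance weighted mean (values here are illustrative, not taken from the PR):

```python
import numpy as np

m = np.array([1.0, 2.0, 3.0, 4.0])
sigma = np.array([0.5, 1.0, 2.0, 4.0])  # non-constant uncertainties

m_wmean = np.average(m, weights=sigma ** -2)
expected = np.sum(((m - m_wmean) / sigma) ** 2) / (len(m) - 1)
# `expected` would then be compared against ReducedChi2()(m, m, sigma)
```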

class ReducedChi2(BaseFeature):
def __call__(self, t, m, sigma=None, sorted=None, fill_value=None):
n = len(m)
m_wmean = np.average(m, weights=sigma)
Member

weights=sigma ** -2?

@hombit hombit merged commit 77cbc86 into light-curve:pure-python Nov 13, 2020