
generalized linear_regression #1716

Closed
wants to merge 26 commits into from

Conversation


@ThomasLecocq ThomasLecocq commented Mar 13, 2017

This PR adds a generalized linear regression to obspy.signal; it supports OLS and WLS, with or without an intercept.

PR Checklist

  • All tests still pass.
  • Any new features or fixed regressions are covered via new tests.
  • Any new or changed features are fully documented.
  • Significant changes have been added to CHANGELOG.txt.
  • First-time contributors have added their name to CONTRIBUTORS.txt.

TODO

  • Test
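
For orientation, here is a minimal sketch of such a wrapper around scipy.optimize.curve_fit, following the signature and behaviour described in the review below (weights are 1/sigma, p0 defaults to zeros, intercept_origin=True forces the fit through the origin). This is an illustration, not the PR's actual code:

import numpy as np
import scipy.optimize


def linear_regression(xdata, ydata, weights=None, p0=None,
                      intercept_origin=True):
    # In SciPy vocabulary, the weights used here are 1/sigma.
    sigma = 1.0 / weights if weights is not None else None
    if intercept_origin:
        # fit y = a * x, i.e. force the line through the origin
        p, cov = scipy.optimize.curve_fit(
            lambda x, a: a * x, xdata, ydata,
            p0=p0 if p0 is not None else np.zeros(1), sigma=sigma)
        return p[0], np.sqrt(cov[0, 0])
    else:
        # fit y = a * x + b with a free intercept
        p, cov = scipy.optimize.curve_fit(
            lambda x, a, b: a * x + b, xdata, ydata,
            p0=p0 if p0 is not None else np.zeros(2), sigma=sigma)
        return p[0], p[1], np.sqrt(cov[0, 0]), np.sqrt(cov[1, 1])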

@Jollyfant Jollyfant commented Mar 13, 2017

I'm just curious, but does OLS stand for orthogonal least squares (which is for bivariate data) or ordinary least squares, i.e. the classical approach?


@ThomasLecocq ThomasLecocq commented Mar 13, 2017

Ordinary least squares and the weighted version.

The goal of this is to remove the need for statsmodels in my MSNoise routines that are coming down to obspy.


@barsch barsch commented on obspy/signal/tests/test_regression.py in 682042e Mar 13, 2017

weight


@ThomasLecocq ThomasLecocq commented on 682042e Mar 13, 2017

raaaah, indeed !

@ThomasLecocq ThomasLecocq mentioned this pull request Mar 13, 2017
@ThomasLecocq ThomasLecocq added this to the 1.1.0 milestone Mar 13, 2017

@ThomasLecocq ThomasLecocq commented Mar 14, 2017

good to go once reviewed by you guyz :)

1/sigma.
:param p0: Initial guess for the parameters. If None, then the initial
values will all be 0 (Different from SciPy where all are 1)
:param intercept: If False: solves y=a*x ; if True: solves y=a*x+b.

Jollyfant (Contributor) commented Mar 14, 2017

It is semantics, but maybe we should call this floating_intercept instead of intercept (or anchor_intercept, but then the boolean would need to flip), because you will be anchoring the regression to the origin. Both regressions naturally have an intercept, and this threw me off initially.

CodeCanary fails but that is unrelated. Other than that I can approve. Nice tests!

Jollyfant (Contributor) commented Mar 14, 2017

constant to me says y = b so it is not a good candidate IMHO.

ThomasLecocq (Author) commented Mar 14, 2017

or invert the arg: force_y_intercept_through_origin... I don't know which one makes it easier to understand...

Jollyfant (Contributor) commented Mar 14, 2017

Sure intercept_origin works if flipped.

ThomasLecocq (Author) commented Mar 14, 2017

Perfect! I agree on that, it'll be easier to understand!
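
For reference, the two call forms that result from this naming decision (a usage sketch assuming the linear_regression function this PR adds in obspy/signal/regression.py; not output from an actual run):

import numpy as np
from obspy.signal.regression import linear_regression

x = np.linspace(0.0, 10.0, 50)
y = 2.5 * x + 1.0 + np.random.normal(scale=0.1, size=x.size)

# anchored at the origin (default): fits y = a * x
slope, std_slope = linear_regression(x, y)

# free intercept: fits y = a * x + b
slope, intercept, std_slope, std_intercept = linear_regression(
    x, y, intercept_origin=False)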


@ThomasLecocq ThomasLecocq commented Mar 14, 2017

ah, and I don't know how to trigger the doc bot? @megies


@Jollyfant Jollyfant commented Mar 14, 2017

+DOCS


@ThomasLecocq ThomasLecocq commented Mar 24, 2017

all good, OK for you boyz ? @krischer @megies


def linear_regression(xdata, ydata, weights=None, p0=None,
                      intercept_origin=True):
    """ Use linear least squares to fit a function, f, to data. This method

QuLogic (Member) commented Mar 24, 2017

First line should be standalone:

"""
Use linear least squares to fit a function, f, to data.

This method is a ....

@megies megies left a comment

Some minor docstring changes, otherwise this is good to go, I think (without having it tried out).


def linear_regression(xdata, ydata, weights=None, p0=None,
                      intercept_origin=True):
    """ Use linear least squares to fit a function, f, to data.

megies (Member) commented Mar 28, 2017

Please stick to our docstring conventions.. :-)

  • three double quotes
  • short one-line (if possible) summary
  • blank line
  • more details
def xyz(...):
    """
    Short summary on first line

    More detailed information,
    multiple lines etc.
    """

def linear_regression(xdata, ydata, weights=None, p0=None,
                      intercept_origin=True):
    """ Use linear least squares to fit a function, f, to data.

megies (Member) commented Mar 28, 2017

-> Use least squares to fit a linear function to data. (the letter f as a function name is not used otherwise)

ThomasLecocq (Author) commented Mar 28, 2017

ok

:meth:`scipy.optimize.minpack.curve_fit`; allowing for Ordinary Least
Square and Weighted Least Square regressions:
* OLS with origin intercept : ``linear_regression(xdata, ydata)``

megies (Member) commented Mar 28, 2017

"origin intercept" sounds awkward, no? Maybe OLS through origin or OLS without intercept, .. or something else..?

ThomasLecocq (Author) commented Mar 28, 2017

+1 for through origin

uncertainties are assumed to be 1. In SciPy vocabulary, our weights are
1/sigma.
:param p0: Initial guess for the parameters. If None, then the initial
values will all be 0 (Different from SciPy where all are 1)

megies (Member) commented Mar 28, 2017

I guess there is a good reason for this deviation from the scipy defaults, with MSNoise in the back of your head..? But maybe in general it might be worth a thought to stick to the scipy defaults?

ThomasLecocq (Author) commented Mar 28, 2017

it's mostly because it's soooo complicated in scipy/statsmodels... if we allowed "sigma", then we would have to state "if you want to pass weights, you should pass 1/weights"... :(
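
To illustrate the point (a minimal sketch, not code from this PR): plain scipy.optimize.curve_fit expects uncertainties (sigma), i.e. the inverse of the weights used here, so a caller working with weights has to remember to invert them:

import numpy as np
import scipy.optimize

x = np.linspace(0.0, 10.0, 50)
y = 2.5 * x + np.random.normal(scale=0.1, size=x.size)
weights = np.full_like(x, 10.0)   # "weights" in this PR's sense

# curve_fit takes sigma, so the weights must be inverted before the call
popt, pcov = scipy.optimize.curve_fit(lambda xx, a: a * xx, x, y,
                                      sigma=1.0 / weights)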

:param xdata: The independent variable where the data is measured.
:param ydata: The dependent data - nominally f(xdata, ...)
:param weights: If not None, the uncertainties in the ydata array. These
are used as weights in the least-squares problem. If None, the

megies (Member) commented Mar 28, 2017

  • please use 4 spaces as indents on follow-up lines in parameter specs..
  • builtins / literals (like None) should be enclosed in two backticks or monospaced font in our docs
    :param weights: If not ``None``, the uncertainties in the ydata array. These
        are used as weights in the least-squares problem. If None, the

ThomasLecocq (Author) commented Mar 28, 2017

ok

                      intercept_origin=True):
    """ Use linear least squares to fit a function, f, to data.
    This method is a generalized version of
    :meth:`scipy.optimize.minpack.curve_fit`; allowing for Ordinary Least

ThomasLecocq (Author) commented Mar 28, 2017

ok

1/sigma.
:param p0: Initial guess for the parameters. If None, then the initial
values will all be 0 (Different from SciPy where all are 1)
:param intercept_origin: If True: solves y=a*x (default); if False:

megies (Member) commented Mar 28, 2017

Again, enclose formulas etc in two backticks, please.

ThomasLecocq (Author) commented Mar 28, 2017

ok

slope, intercept = p
std_slope = np.sqrt(cov[0, 0])
std_intercept = np.sqrt(cov[1, 1])
return slope, intercept, std_slope, std_intercept

megies (Member) commented Mar 28, 2017

Python modules in our repo usually have the doctest footer; even though there are no doctests currently, it would not hurt to add it.

ThomasLecocq (Author) commented Mar 28, 2017

euh ? what's that ?
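
(For context: the footer megies refers to is, in other obspy modules, typically something like the following; shown here as an illustration, it is not part of this PR's quoted diff.)

if __name__ == '__main__':
    import doctest
    doctest.testmod(exclude_empty=True)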

ThomasLecocq (Author) commented Mar 28, 2017

ok boss :)

:rtype: tuple
:returns: (slope, std_slope) if `intercept_origin` is `True`;
(slope, intercept, std_slope, std_intercept) if `False`.

megies (Member) commented Mar 28, 2017

again.. two backticks for monospace -> "... if intercept_origin=True ..."

ThomasLecocq (Author) commented Mar 28, 2017

ok



def linear_regression(xdata, ydata, weights=None, p0=None,
                      intercept_origin=True):

megies (Member) commented Mar 28, 2017

I find the naming intercept_origin not very intuitive, maybe rename that option to allow_intercept?

ThomasLecocq (Author) commented Mar 28, 2017

well, read above, we're coming from there and arrived at intercept_origin :(

+DOCS
…egression2

Conflicts:
	obspy/signal/regression.py

@ThomasLecocq ThomasLecocq commented Mar 29, 2017

@megies : all done


@krischer krischer left a comment

Some final comments - sorry for joining the discussion so late... I was a bit short on time. After these it's IMHO good to merge :)

If you have no time right now, please let me know and I'll do it.

# Email: Thomas.Lecocq@seismology.be
#
# Copyright (C) 2017 Thomas Lecocq
# --------------------------------------------------------------------

krischer (Member) commented Mar 29, 2017

Can you merge that information with the copyright header right below it? These two also kind of contradict each other.

ThomasLecocq (Author) commented Mar 29, 2017

ok, I actually followed the header in trigger.py :)
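
(As an aside, a merged header along the lines krischer suggests could look roughly like this; the module summary text and exact field layout are illustrative assumptions, not the file's actual content.)

# -*- coding: utf-8 -*-
"""
Linear regression tools (OLS/WLS, with or without an intercept).

:author:
    Thomas Lecocq (Thomas.Lecocq@seismology.be)
:copyright:
    The ObsPy Development Team (devs@obspy.org)
:license:
    GNU Lesser General Public License, Version 3
    (https://www.gnu.org/copyleft/lesser.html)
"""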

from future.builtins import * # NOQA

import scipy.optimize
import numpy as np

krischer (Member) commented Mar 29, 2017

Please put numpy before scipy.

ThomasLecocq (Author) commented Mar 29, 2017

ok
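
(That is, the imports from the excerpt above simply reordered as requested:)

from future.builtins import *  # NOQA

import numpy as np
import scipy.optimize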


def linear_regression(xdata, ydata, weights=None, p0=None,
                      intercept_origin=True):
    """ Use least squares to fit a linear function to data.

krischer (Member) commented Mar 29, 2017

Can you do

"""
Use least squares...

with a newline after the triple quotes

ThomasLecocq (Author) commented Mar 29, 2017

ok

(slope, intercept, std_slope, std_intercept) if ``False``.
"""
if weights is not None:
sigma = 1./weights

krischer (Member) commented Mar 29, 2017

Add spaces around / - not sure why flake8 does not complain.

ThomasLecocq (Author) commented Mar 29, 2017

ok

…egression

Conflicts:
	obspy/signal/regression.py

@megies megies commented Mar 30, 2017

@ThomasLecocq forgot to add a test file? http://tests.obspy.org/75817/#1

But on the other hand it sounds like the file is related to #1719? Or are you using the same test file in both PRs?


@ThomasLecocq ThomasLecocq commented Mar 30, 2017

F@#{|^|^[#{^@[[|@#{[@#^[{#[^# git stuff... my two PRs got messily mixed together, so this one here shouldn't have the MWCS stuff... pfffr...
What's easiest? Should I create a new branch off current master, cherry-pick the changes for linear_regression and PR those?


@krischer krischer commented Mar 30, 2017

Should I create a new branch off current master, cherry-pick the changes for linear_regression and PR those?

Sounds like a good way to resolve the mess.


@megies megies commented Mar 30, 2017

F@#{|^|^[#{^@[[|@#{[@#^[{#[^# git stuff... my two PRs got messily mixed together, so this one here shouldn't have the MWCS stuff... pfffr...
What's easiest? Should I create a new branch off current master, cherry-pick the changes for linear_regression and PR those?

Need help? I can fix it if need be, squashing everything together; I don't think we need any granularity here...


@ThomasLecocq ThomasLecocq commented Apr 1, 2017

ok I will do that tomorrow, easier...


@krischer krischer commented Apr 7, 2017

Continued in #1747.

@krischer krischer closed this Apr 7, 2017
@QuLogic QuLogic added the duplicate label Apr 12, 2017