This issue was moved to a discussion.

How to calculate the bootstrap error and its confidence interval of a time series data #726

Closed
geophysics91 opened this issue Sep 5, 2020 · 2 comments

geophysics91 commented Sep 5, 2020

Dear experts, I need to calculate the bootstrap error for the 5 time series stored together in one file. Inside the time_series file, the five series are separated by > > symbols. https://i.fluffy.cc/12NLsqHhTTcvR67btNjRzXZCkbpkfw9c.html Can anybody suggest a better way to do it? I tried http://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap/#example-1-bootstrapping-the-mean but it only handles a single time series.

pkaf commented Sep 15, 2020

Looking at the implementation of

bootstrap(x, func, num_rounds=1000, ci=0.95, ddof=1, seed=None)

it says x can be (n_samples, [n_columns]), perhaps you need to reshape your data to have this dimension?

rasbt (Owner) commented Sep 15, 2020

> it says x can be (n_samples, [n_columns]), perhaps you need to reshape your data to have this dimension?

Yes, @pkaf is correct: x can be either a 1D or a 2D array. Reshaping may not be necessary, though. It depends on what you pass as the func argument. E.g., the numpy mean function can compute the mean for both 1D and 2D arrays, so both

import numpy as np
from mlxtend.evaluate import bootstrap


rng = np.random.RandomState(123)
x = rng.normal(loc=5., size=100)
original, std_err, ci_bounds = bootstrap(x, num_rounds=1000, func=np.mean, ci=0.95, seed=123)
print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, 
                                                        std_err, 
                                                        ci_bounds[0],
                                                        ci_bounds[1]))

and

rng = np.random.RandomState(123)
x = rng.normal(loc=5., size=(100, 2))
original, std_err, ci_bounds = bootstrap(x, num_rounds=1000, func=np.mean, ci=0.95, seed=123)
print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original, 
                                                        std_err, 
                                                        ci_bounds[0],
                                                        ci_bounds[1]))

would work.
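One caveat about the 2D case, offered as a hedged aside: without an axis argument, np.mean averages over every element, so the second snippet gives a single pooled mean rather than one mean per column. The NumPy-only sketch below illustrates the difference. The per-column split at the end is an assumption about what the original question is after (one estimate per series), not something bootstrap does on its own.

```python
import numpy as np

rng = np.random.RandomState(123)
x = rng.normal(loc=5., size=(100, 2))

pooled = np.mean(x)              # scalar: all 200 values averaged together
per_column = np.mean(x, axis=0)  # array of shape (2,): one mean per column

# Each column as its own 1D series, ready to be passed to bootstrap() one at a time
columns = [x[:, j] for j in range(x.shape[1])]
```

Passing each entry of columns to bootstrap separately should then yield one standard error and confidence interval per series.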

You could also handle the reshaping yourself inside func if that is necessary, as in the example below:

from mlxtend.data import autompg_data
from mlxtend.evaluate import bootstrap

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X, y = autompg_data()


lr = LinearRegression()

def r2_fit(X, model=lr):
    # Use column 0 as the single feature and column 1 as the target
    x, y = X[:, 0].reshape(-1, 1), X[:, 1]
    # Fit on the bootstrap sample and score on the same sample
    pred = model.fit(x, y).predict(x)
    return r2_score(y, pred)


original, std_err, ci_bounds = bootstrap(X, num_rounds=1000,
                                         func=r2_fit,
                                         ci=0.95,
                                         seed=123)
print('R2: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original,
                                                      std_err,
                                                      ci_bounds[0],
                                                      ci_bounds[1]))
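
For the original question (several series in one file, separated by marker lines), one possible approach is to split the file into separate series and bootstrap each one on its own. The sketch below uses only the standard library so that it runs without mlxtend. It is a hand-rolled percentile bootstrap, not mlxtend's implementation. The helper names split_series and bootstrap_mean are hypothetical, the sample text is made-up toy data, and the file layout (a line starting with > begins a new series, the last column holds the value) is an assumption, since the linked file's exact format is not shown here.

```python
import random
import statistics


def split_series(text, sep=">"):
    """Split text into lists of floats; a line starting with `sep` begins a new series."""
    series, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(sep):
            if current:
                series.append(current)
            current = []
        else:
            # Assumption: the value of interest is the last whitespace-separated column
            current.append(float(line.split()[-1]))
    if current:
        series.append(current)
    return series


def bootstrap_mean(values, num_rounds=1000, ci=0.95, seed=None):
    """Return (mean, standard error, (lower, upper)) from a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample
    stats = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(num_rounds)
    )
    std_err = statistics.stdev(stats)
    alpha = (1.0 - ci) / 2.0
    lower = stats[int(alpha * (num_rounds - 1))]
    upper = stats[int((1.0 - alpha) * (num_rounds - 1))]
    return statistics.fmean(values), std_err, (lower, upper)


# Made-up toy data standing in for the real file contents
sample = """>
0 4.9
1 5.2
2 5.1
3 4.8
>
0 7.0
1 7.3
2 6.9
3 7.1
"""

results = [bootstrap_mean(s, seed=123) for s in split_series(sample)]
for i, (mean, se, (lo, hi)) in enumerate(results):
    print('Series %d: mean %.2f, SE +/- %.2f, CI95 [%.2f, %.2f]' % (i, mean, se, lo, hi))
```

Note that resampling individual points like this treats the observations as independent. For strongly autocorrelated time series, a block bootstrap may be more appropriate.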

rasbt added the Question label Sep 15, 2020
rasbt closed this as completed Feb 8, 2021
Repository owner locked and limited conversation to collaborators Feb 8, 2021
