Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add ffill checkpointing to blaze core loader #1276

Merged
merged 7 commits into from Jun 21, 2016

Conversation

@llllllllll
Copy link
Contributor

@llllllllll llllllllll commented Jun 13, 2016

Adds an optional argument to from_blaze for users to provide a ffill checkpoint table.

@coveralls
Copy link

@coveralls coveralls commented Jun 13, 2016

Coverage Status

Coverage decreased (-0.02%) to 79.751% when pulling f019e8d on blaze-loader-checkpoints into c83414e on master.

@llllllllll llllllllll force-pushed the blaze-loader-checkpoints branch from f2d4651 to 23abf5f Jun 15, 2016
:Release: 1.0.2
:Date: TBD

Enhancements

This comment has been minimized.

@richafrank

richafrank Jun 16, 2016
Member

Performance?

This comment has been minimized.

@llllllllll

llllllllll Jun 16, 2016
Author Contributor

eh, I think it is more of a feature because you will need to do work to take advantage of it.

checkpoints,
no_checkpoints_rule,
)
if {deltas, checkpoints} != {None}:

This comment has been minimized.

@richafrank

richafrank Jun 16, 2016
Member

Could you just explain this check?

This comment has been minimized.

@llllllllll

llllllllll Jun 16, 2016
Author Contributor

oh, this was not the full check that I wanted, I need to fix this. I think it should be: if 'auto' in (deltas, checkpoints)

@llllllllll llllllllll force-pushed the blaze-loader-checkpoints branch from 1360ca4 to cb67ee4 Jun 17, 2016
Returns
-------
deltas : Expr or None
metadata : Expr or None
The deltas table to use.

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

This needs an update to be deltas or checkpoints.

The deltas table to use.
"""
if isinstance(deltas, bz.Expr) or deltas != 'auto':
return deltas
if isinstance(metadata_expr, bz.Expr) or metadata_expr != 'auto':

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

I find this a little bit confusing. I think it's equivalent to if isinstance(metadata_expr, bz.Expr) or metadata_expr is None:, which is a bit clearer to me. What do you think?

return (
None
if expr is None else
bz.transform(expr, **{TS_FIELD_NAME: expr[AD_FIELD_NAME]})

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

Is this a mutation or a new construction?

This comment has been minimized.

@llllllllll

llllllllll Jun 21, 2016
Author Contributor

All blaze operations are pure

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

Ok great.

checkpoints_ts = odo(ts[ts <= lower_dt].max(), pd.Timestamp)
if pd.isnull(checkpoints_ts):
materialized_checkpoints = pd.DataFrame(columns=colnames)
lower = lower_dt

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

In this case, will the materialized_expr be missing forward-filled values at its lower bound when we collect_expr(expr, lower) below?

This comment has been minimized.

@llllllllll

llllllllll Jun 21, 2016
Author Contributor

Ah, I think this case needs to be lower = None because this means we have no checkpoints so we should just query everything.

This comment has been minimized.

@llllllllll

llllllllll Jun 21, 2016
Author Contributor

I will add a test for this case.

materialized_checkpoints,
materialized_expr,
),
ignore_index=True,

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

What's the thought - are the indices not consistent here?

This comment has been minimized.

@llllllllll

llllllllll Jun 21, 2016
Author Contributor

they don't matter, but I would rather they be a simple index instead of a non-unique index.

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

Oh, why's that?

This comment has been minimized.

@llllllllll

llllllllll Jun 21, 2016
Author Contributor

Some operations are not supported by structures if the index isn't unique. Also, in later versions of pandas a simple index can be a RangeIndex object which will be much more memory efficient.

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

Ok thanks.

deltas : Expr or None
The deltas table to use.
metadata : Expr or None
The the deltas or metadata table to use.

This comment has been minimized.

@richafrank

richafrank Jun 21, 2016
Member

typo - extra "the"

@coveralls
Copy link

@coveralls coveralls commented Jun 21, 2016

Coverage Status

Coverage increased (+0.06%) to 79.839% when pulling cb266b9 on blaze-loader-checkpoints into 7176bb7 on master.

@coveralls
Copy link

@coveralls coveralls commented Jun 21, 2016

Coverage Status

Coverage increased (+0.06%) to 79.839% when pulling b210acb on blaze-loader-checkpoints into 7176bb7 on master.

@llllllllll llllllllll merged commit d608e0a into master Jun 21, 2016
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@llllllllll llllllllll deleted the blaze-loader-checkpoints branch Jun 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.