New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Blaze core loader fixes and performance improvements #1866

Merged
merged 17 commits into from Jul 21, 2017

Conversation

Projects
None yet
4 participants
@llllllllll
Member

llllllllll commented Jun 29, 2017

Change the blaze core loader algorithm.

TODO: document the new algorithm before merging.

NOTE: in test_cols_with_some_missing_vals I have left the old expected values commented out for int_value and bool_value. Right now we treat the missing value differently for ffill purposes on those dtypes but idk why

@llllllllll llllllllll requested review from ehebert and ssanderson Jun 29, 2017

@llllllllll llllllllll changed the title from Blaze core loader fixes and performance improvements to WIP: Blaze core loader fixes and performance improvements Jun 29, 2017

@llllllllll llllllllll requested a review from kglowinski Jun 29, 2017

@coveralls

This comment has been minimized.

coveralls commented Jun 30, 2017

Coverage Status

Coverage decreased (-0.3%) to 87.401% when pulling a1b6e73 on blaze-core-loader-fixes into 5e6110e on master.

@llllllllll llllllllll force-pushed the blaze-core-loader-fixes branch from 3275631 to ea41b9a Jun 30, 2017

@coveralls

This comment has been minimized.

coveralls commented Jun 30, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.399% when pulling ea41b9a on blaze-core-loader-fixes into 5343344 on master.

@llllllllll llllllllll force-pushed the blaze-core-loader-fixes branch from ea41b9a to 9e1cc37 Jun 30, 2017

@coveralls

This comment has been minimized.

coveralls commented Jun 30, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.399% when pulling 9e1cc37 on blaze-core-loader-fixes into 5343344 on master.

@coveralls

This comment has been minimized.

coveralls commented Jun 30, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.399% when pulling 9e1cc37 on blaze-core-loader-fixes into 5343344 on master.

@coveralls

This comment has been minimized.

coveralls commented Jul 1, 2017

Coverage Status

Coverage decreased (-0.5%) to 87.046% when pulling 594c450 on blaze-core-loader-fixes into 5343344 on master.

@coveralls

This comment has been minimized.

coveralls commented Jul 1, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.407% when pulling 594c450 on blaze-core-loader-fixes into 5343344 on master.

@llllllllll llllllllll force-pushed the blaze-core-loader-fixes branch from 616ad67 to 4b06319 Jul 6, 2017

@coveralls

This comment has been minimized.

coveralls commented Jul 6, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.379% when pulling 1a17936 on blaze-core-loader-fixes into 9d01ba7 on master.

1 similar comment
@coveralls

This comment has been minimized.

coveralls commented Jul 6, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.379% when pulling 1a17936 on blaze-core-loader-fixes into 9d01ba7 on master.

@ehebert

First round of review. I am going to do a second round focused on _core.array_for_column_impl.

# ignore sids that are not requested
continue
column_ix = <object> column_ix_ob # cast to np.int64_t

This comment has been minimized.

@ehebert

ehebert Jul 7, 2017

Member

Why isn't this Py_ssize_t or np.intp_t?

This comment has been minimized.

@llllllllll

llllllllll Jul 7, 2017

Member

which expression are you talking about?

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

I think @ehebert is asking why column_ix is statically typed as int64_t instead of Py_ssize_t or np.intp_t since we're always using it as an array index with wraparound disabled?

This comment has been minimized.

@llllllllll

llllllllll Jul 13, 2017

Member

npy_intp is int*, using that as an integral value is weird. Py_ssize_t and npy_int64 are the same thing.

@@ -138,7 +138,7 @@
from __future__ import division, absolute_import
from abc import ABCMeta, abstractproperty
from collections import namedtuple, defaultdict

This comment has been minimized.

@ehebert

ehebert Jul 7, 2017

Member

👍 on removal of defaultdict.

@coveralls

This comment has been minimized.

coveralls commented Jul 7, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.379% when pulling 5f8435b on blaze-core-loader-fixes into 9d01ba7 on master.

@coveralls

This comment has been minimized.

coveralls commented Jul 8, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.379% when pulling 0b6e913 on blaze-core-loader-fixes into 9d01ba7 on master.

@ssanderson ssanderson force-pushed the blaze-core-loader-fixes branch from 0b6e913 to 14e685e Jul 11, 2017

@coveralls

This comment has been minimized.

coveralls commented Jul 11, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.379% when pulling 14e685e on blaze-core-loader-fixes into fb248ef on master.

@ssanderson

@llllllllll I posted what I have since I'm away at scipy for this week. I think the only remaining issue I didn't comment on is the issue we talked about in person re: forward-filling booleans.

@@ -128,6 +128,16 @@ def __ne__(self, other):
# Numexpr doesn't know how to use LabelArrays.
return ArrayPredicate(term=self, op=operator.ne, opargs=(other,))
def bad_compare(self, other):
raise TypeError('cannot compare values of type' % type(self).__name__)

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

The format string here doesn't match what we're formatting in. Probably could do with a quick test case for this.

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

Wondering if we should just add these dispatches to the base Term class and then have Factor actually implement them

This comment has been minimized.

@llllllllll

llllllllll Jul 14, 2017

Member

idk, what do you think is best here?

order=order,
casting=casting,
subok=subok,
copy=copy,

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

I think we can pass copy=False here since we know we already made a copy in as_string_array?

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

Actually, I don't think this matters either way because the coercion from object -> S is going to copy no matter what.

def astype(self,
dtype,
order='K',
casting='unsafe',

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

Is 'unsafe' the default for this in numpy?

This comment has been minimized.

@llllllllll

llllllllll Jul 13, 2017

Member
a.astype(dtype, order='K', casting='unsafe', subok=True, copy=True)
@@ -0,0 +1,607 @@
cdef class Adjustment:

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

What does this file do? I haven't seen .pyd before.

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

It seems like this is just a copy of adjustment.pyx?

This comment has been minimized.

@llllllllll

llllllllll Jul 13, 2017

Member

pxd files are like headers so that the functions can be cimported in other cython files. you need the function signatures and class member definitions but that's it.

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

This is a .pyd file though, not a .pxd...

This comment has been minimized.

@llllllllll

llllllllll Jul 13, 2017

Member

this was committed incorrectly, I have a pxd also. this makes me think it is not needed since the tests ran.

This comment has been minimized.

@llllllllll

llllllllll Jul 14, 2017

Member

this file is totally named incorrectly, it shouldn't have been committed at all.

cdef class Adjustment:
"""

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

Will Cython raise an error if the .pyd gets out of sync with the .pyx file?

This comment has been minimized.

@llllllllll

llllllllll Jul 13, 2017

Member

I doubt it

The keyword arguments to forward to the odo calls internally.
apply_deltas_adjustments : bool, optional
Whether or not deltas adjustments should be applied to the baseline
values. If False, only novel deltas will be applied.

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

I don't think this comment accurately describes what this flag does now. This basically says ignore_deltas, but if that's the case I think the consumer can get the same behavior by just passing deltas=False?

),
'int_value': np.array(
[1, 0, 3, 1, 0, 3, 1, 2, 3],
# [1, 0, 3, 1, 0, 0, 1, 2, 0],

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

What's up with the commented-out lines here?

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

I think you answered this for me when we were looking at this in person.

This comment has been minimized.

@llllllllll

llllllllll Jul 14, 2017

Member

these were the changes I had questions about, we talked in person and we discussed the new ffill algorithm to preserve the old behavior

asof_ix = asof_ixs[n]
if asof_ix == out_of_bounds_ix:
raise ValueError('asof_date newer than timestamp')

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

Should we put the asof/timestamp in this error message?

This comment has been minimized.

@llllllllll

llllllllll Jul 13, 2017

Member

if this happens at all it probably happens in a lot of rows. in practice when you hit this case you need to run through the debugger to look at the dataset and see what is broken

dt.date(),
data_query_time,
).tz_localize(data_query_tz).tz_convert('utc')
Py_INCREF(combined)

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

Given the amount of work we're doing in the .combine method above, it really doesn't seem worth the extra complexity and worry to be doing manual reference management on the values here.

else:
ts_dates = dates
cdef np.ndarray[np.int64_t] ts_ixs = ts_dates.searchsorted(

This comment has been minimized.

@ssanderson

ssanderson Jul 13, 2017

Member

Commentary on why we're using side='right' here might be helpful.

This comment has been minimized.

@llllllllll

llllllllll Jul 13, 2017

Member

added one for ts and one for asof because they are different reasons.

@coveralls

This comment has been minimized.

coveralls commented Jul 13, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.379% when pulling 28aab3a on blaze-core-loader-fixes into fb248ef on master.

@coveralls

This comment has been minimized.

coveralls commented Jul 14, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.37% when pulling 285f719 on blaze-core-loader-fixes into fb248ef on master.

1 similar comment
@coveralls

This comment has been minimized.

coveralls commented Jul 14, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.37% when pulling 285f719 on blaze-core-loader-fixes into fb248ef on master.

@llllllllll

This comment has been minimized.

Member

llllllllll commented Jul 15, 2017

any more comments on this or are we just waiting for the downstream consumers?

@ssanderson

Took a second pass at this. I just had one comment about a possibly-significant warning in the new cython module. Proposed fix here: #1891

return value == missing_value
cdef inline unsafe_setslice_column(np.ndarray[column_type, ndim=2] array,

This comment has been minimized.

@ssanderson

ssanderson Jul 18, 2017

Member

I'm seeing warnings from Cython here saying that "Buffer unpacking not optimized away". https://jakevdp.github.io/blog/2012/08/16/memoryview-benchmarks-2/ suggests that this is actually a pretty significant cost.

@coveralls

This comment has been minimized.

coveralls commented Jul 19, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.37% when pulling a20696e on blaze-core-loader-fixes into fb248ef on master.

@llllllllll llllllllll force-pushed the blaze-core-loader-fixes branch from a20696e to bd7db28 Jul 19, 2017

@coveralls

This comment has been minimized.

coveralls commented Jul 19, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.37% when pulling bd7db28 on blaze-core-loader-fixes into 15fe38c on master.

@llllllllll llllllllll force-pushed the blaze-core-loader-fixes branch from bd7db28 to c0ecbf5 Jul 20, 2017

@coveralls

This comment has been minimized.

coveralls commented Jul 21, 2017

Coverage Status

Coverage decreased (-0.1%) to 87.506% when pulling c0ecbf5 on blaze-core-loader-fixes into 7e1abe9 on master.

@llllllllll llllllllll merged commit 83c3ff4 into master Jul 21, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@llllllllll llllllllll deleted the blaze-core-loader-fixes branch Jul 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment