Add groupby aggregate tests, along with a few additions for cudf integration. #268

chinmaychandak · 2019-08-19T21:10:14Z

Most changes made to aggregations.py attempt to fix #266. As of now, cudf does not name the index implicitly the way Pandas does.

Other changes in the file add a temporary fallback to the Pandas Timedelta API so that window-over-time groupby aggregations can be performed.
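
To make the Pandas behavior mentioned above concrete, here is a minimal sketch that uses plain pandas only. The column names and the `ensure_index_name` helper are hypothetical and are not taken from this PR; the helper only illustrates one way a missing index name could be set explicitly, and it is not a description of what aggregations.py actually does.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "a"], "amount": [1, 2, 3]})

# pandas names the result index after the grouping key automatically.
totals = df.groupby("name")["amount"].sum()
print(totals.index.name)


def ensure_index_name(obj, name):
    """Hypothetical helper: name the index only if it was left unnamed."""
    if obj.index.name is None:
        return obj.rename_axis(name)
    return obj


# Stand-in for a result whose backend did not name the index.
unnamed = pd.Series([4, 2], index=["a", "b"])
named = ensure_index_name(unnamed, "name")
print(named.index.name)
```

Setting the name only when it is missing leaves results that already carry a name untouched.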


codecov-io · 2019-08-19T21:22:54Z

Codecov Report

Merging #268 into master will decrease coverage by 0.02%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master     #268      +/-   ##
==========================================
- Coverage   94.71%   94.69%   -0.03%     
==========================================
  Files          13       13              
  Lines        1609     1620      +11     
==========================================
+ Hits         1524     1534      +10     
- Misses         85       86       +1

| Impacted Files | Coverage Δ |
|---|---|
| streamz/dataframe/aggregations.py | `98.84% <93.75%> (-0.27%)` ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a00b6e3...e691394.

jsmaupin · 2019-08-20T18:16:18Z

streamz/dataframe/tests/test_cudf_dataframes.py



-@pytest.fixture(params=['core', 'dask'])
+@pytest.fixture(params=["core", "dask"])


Are we standardizing on double quotes? From what I've seen, Python code has always used single quotes for strings unless there was a reason to do otherwise.

If we are going to go this way we should commit to black and be done with it IMO.

jsmaupin · 2019-08-20T18:22:29Z

streamz/dataframe/aggregations.py

-            if hasattr(og, 'index'):
-                assert (o.index == og.index).all()
+#             if hasattr(og, 'index'):
+#                 assert (o.index == og.index).all()


I would rather not check in commented-out code. Just delete the code; if we need to restore it, it will be in the file history in Git.

jsmaupin · 2019-08-20T18:27:24Z

streamz/dataframe/aggregations.py

    old = []
-    while dfs[0].index.min() < mn:
+    while pd.Timestamp(dfs[0].index.min()) < mn:
        o = dfs[0].loc[:mn]


I'm curious as to how casting to a pandas timestamp helps here.

chinmaychandak · 2019-08-20

For now, cudf's df.index.min() (or max()) returns a numpy.datetime64, whereas a Pandas dataframe returns a pandas._libs.tslibs.timestamps.Timestamp. Hence the explicit cast, which makes the value compatible with Pandas Timedelta; that compatibility is required for these operations.

The statements modified for this purpose are redundant for Pandas, since its types are already compatible with Pandas Timedelta.
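
The cast discussed here can be sketched without cudf. In the snippet below, a `numpy.datetime64` value stands in for what cudf's `index.max()` is described as returning; the `as_timestamp` helper and all timestamps are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def as_timestamp(value):
    """Hypothetical helper: coerce an index extreme to pd.Timestamp."""
    return pd.Timestamp(value)


window = pd.Timedelta("5s")

# Stand-in for the numpy.datetime64 described as coming back from cudf.
raw_max = np.datetime64("2019-08-20T18:27:24")

# A pandas DatetimeIndex already returns a pd.Timestamp from .max().
pandas_max = pd.DatetimeIndex(
    ["2019-08-20 18:27:20", "2019-08-20 18:27:24"]
).max()

# After the cast, both backends yield the same type for the window bounds.
mx = as_timestamp(raw_max)
mn = mx - window

# The loop condition from the diff, applied to an older stand-in value.
older = np.datetime64("2019-08-20T18:27:10")
is_outside_window = as_timestamp(older) < mn
```

Casting at the point of use keeps the rest of the window arithmetic in pandas types regardless of which backend produced the index.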

jsmaupin · 2019-08-20T18:27:58Z

streamz/dataframe/aggregations.py

-    mx = max(df.index.max() for df in dfs)
-    mn = mx - window
+    mx = pd.Timestamp(max(df.index.max() for df in dfs))
+    mn = pd.Timestamp(mx) - window


Isn't mx already a pd.Timestamp? Why pass it into another pd.Timestamp(...) call?
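
A quick check with illustrative values (not taken from the PR) suggests that wrapping an existing `pd.Timestamp` a second time changes nothing, so the second cast in the diff looks redundant rather than harmful:

```python
import pandas as pd

window = pd.Timedelta("10s")
mx = pd.Timestamp("2019-08-20 18:27:58")

# Wrapping a Timestamp in pd.Timestamp again yields an equal value.
rewrapped = pd.Timestamp(mx)

# Both spellings of the lower window bound agree.
mn_double_cast = pd.Timestamp(mx) - window
mn_single_cast = mx - window
```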


chinmaychandak · 2019-08-22T00:46:44Z

@martindurant Could you please review my code and merge it soon, if possible? I'd be happy to make any changes you think would be necessary.

chinmaychandak · 2019-08-29T18:12:59Z

Hey @CJ-Wright, could you please have a look at this? I'd appreciate it if this could be merged soon! :)

CJ-Wright · 2019-08-31T17:50:45Z

I'll try to look at it soon (I need to read in more of the dataframe things)

chinmaychandak · 2019-09-03T15:45:18Z

That would be great, thanks @CJ-Wright!

chinmaychandak · 2019-09-04T19:09:30Z

@CJ-Wright Did you have a chance to look at this yet? This is a major blocker for a bigger project! :(

CJ-Wright · 2019-09-05T01:15:09Z

Seems reasonable to me. I wish we had codecov on this repo.

CJ-Wright

Next time, please don't make style changes (single vs. double quotes) to code that you aren't substantially changing. It makes review a bit more difficult, since I need to work out which lines are logic changes and which are style changes.

chinmaychandak · 2019-09-05T02:18:05Z

> Next time, please don't make style changes (single vs. double quotes) to code that you aren't substantially changing. It makes review a bit more difficult, since I need to work out which lines are logic changes and which are style changes.

Sure, will definitely keep this in mind moving forward.

@CJ-Wright Thanks a lot for reviewing and merging this, really appreciate it! Is it possible to upload the updated conda package with these changes?

CJ-Wright · 2019-09-05T02:38:03Z

We'd need to cut a release. Can you open up an issue for this?

chinmaychandak · 2019-09-05T03:00:29Z

Sure, created #271. Please let me know if I need to do anything else. Thanks!

Add groupby aggregate tests, along with a few additions for cudf integration for SDFs

14acd8a

jsmaupin reviewed Aug 20, 2019


Chinmay Chandak added 8 commits August 20, 2019 20:45

Add groupby aggregate tests, along with a few additions for cudf integration for SDFs

cf4801d

Merge branch 'master' of https://github.com/chinmaychandak/streamz

12f64fa

Merge branch 'master' of https://github.com/chinmaychandak/streamz

4f68af8

Merge branch 'master' of https://github.com/chinmaychandak/streamz

e1e2401

Merge branch 'master' of https://github.com/chinmaychandak/streamz

2bca156

Merge branch 'master' of https://github.com/chinmaychandak/streamz

69c12fb

Merge branch 'master' of https://github.com/chinmaychandak/streamz

5749ef0

Merge branch 'master' of https://github.com/chinmaychandak/streamz

e691394

CJ-Wright reviewed Sep 5, 2019


CJ-Wright merged commit d60a6e4 into python-streamz:master Sep 5, 2019

chinmaychandak mentioned this pull request Sep 5, 2019

Include updated cudf integration for SDFs in conda-forge package. #271

Closed


