
Parallelism granularity #48

Open
raulcf opened this issue Jul 20, 2016 · 3 comments

Comments

@raulcf
Contributor

raulcf commented Jul 20, 2016

Right now the parallelism granularity is per table, to avoid redundant reads.
By splitting the data in memory we could provide finer-grained parallelism, i.e. per column, while still avoiding redundant reads.
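The idea above could be sketched roughly as follows: read the file once, split the in-memory table into columns, and fan out one work unit per column. This is a hypothetical sketch, not code from this repository; the names (`read_table_once`, `profile_column`) and the inline toy data are illustrative assumptions.

```python
# Hypothetical sketch: per-column parallelism after a single shared read.
from concurrent.futures import ThreadPoolExecutor

def read_table_once():
    # Stand-in for one read of the source file; no worker re-reads it.
    rows = [
        ("alice", 3, 1.5),
        ("bob", 7, 2.5),
        ("carol", 5, 4.0),
    ]
    # Transpose the in-memory row table into columns.
    return list(zip(*rows))

def profile_column(col):
    # Per-column work unit, e.g. computing a column signature;
    # here just the distinct-value count.
    return len(set(col))

def profile_table(columns):
    # One task per column instead of one task per table.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(profile_column, columns))

columns = read_table_once()
print(profile_table(columns))  # one distinct-value count per column
```

With per-table granularity, the same input would be a single task; with the split, each column profile can proceed in parallel without touching the file again.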

@raulcf
Contributor Author

raulcf commented Oct 26, 2016

Some evidence: in data.gov, 30 out of ~10K files contain ~50% of the data.
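The kind of skew measurement behind a stat like this could be sketched as below. This is a hypothetical illustration with toy sizes, not the actual data.gov measurement; `top_k_share` is an assumed helper name.

```python
# Hypothetical sketch: what fraction of total bytes the k largest
# files in a corpus account for.
def top_k_share(sizes, k):
    total = sum(sizes)
    top = sum(sorted(sizes, reverse=True)[:k])
    return top / total

# Toy corpus: a few huge files dominate many small ones.
sizes = [10_000] * 3 + [10] * 97
print(top_k_share(sizes, 3))  # share of bytes held by the 3 largest files
```

If the share is high, table-granularity scheduling leaves most workers idle while a few grind through the large files, which is exactly where per-column splitting would help.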

@sgt101
Collaborator

sgt101 commented Oct 26, 2016

Interesting stat... What does it mean? Amount of data in MB, or number of rows?

We could find these stats for BT's warehouses if that helps?

Simon

@mansoure
Contributor

This is a very interesting problem. @raulcf how far are you with this issue? I think I can help here.

@raulcf raulcf added this to the v0.5 milestone Oct 16, 2017
3 participants