
Parallelism granularity #48

Open
raulcf opened this issue Jul 20, 2016 · 3 comments

Comments

@raulcf
Contributor

raulcf commented Jul 20, 2016

Right now the parallelism granularity is per table, to avoid redundant reads.
By splitting the data in memory we could provide finer-grained parallelism, i.e. per column, while still avoiding redundant reads.
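The idea above could be sketched roughly as follows: read the file once, split the in-memory table into columns, and fan out one work unit per column. This is a hypothetical sketch, not code from this repository; the names (`read_table_once`, `profile_column`) and the inline toy data are illustrative assumptions.

```python
# Hypothetical sketch: per-column parallelism after a single shared read.
from concurrent.futures import ThreadPoolExecutor

def read_table_once():
    # Stand-in for one read of the source file; no worker re-reads it.
    rows = [
        ("alice", 3, 1.5),
        ("bob", 7, 2.5),
        ("carol", 5, 4.0),
    ]
    # Transpose the in-memory row table into columns.
    return list(zip(*rows))

def profile_column(col):
    # Per-column work unit, e.g. computing a column signature;
    # here just the distinct-value count.
    return len(set(col))

def profile_table(columns):
    # One task per column instead of one task per table.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(profile_column, columns))

columns = read_table_once()
print(profile_table(columns))  # one distinct-value count per column
```

With per-table granularity, the same input would be a single task; with the split, each column profile can proceed in parallel without touching the file again.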

@raulcf
Contributor Author

raulcf commented Oct 26, 2016

Some evidence: in data.gov, 30 out of ~10K files contain ~50% of the data.
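The kind of skew measurement behind a stat like this could be sketched as below. This is a hypothetical illustration with toy sizes, not the actual data.gov measurement; `top_k_share` is an assumed helper name.

```python
# Hypothetical sketch: what fraction of total bytes the k largest
# files in a corpus account for.
def top_k_share(sizes, k):
    total = sum(sizes)
    top = sum(sorted(sizes, reverse=True)[:k])
    return top / total

# Toy corpus: a few huge files dominate many small ones.
sizes = [10_000] * 3 + [10] * 97
print(top_k_share(sizes, 3))  # share of bytes held by the 3 largest files
```

If the share is high, table-granularity scheduling leaves most workers idle while a few grind through the large files, which is exactly where per-column splitting would help.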

@sgt101
Collaborator

sgt101 commented Oct 26, 2016

Interesting stat... What does it mean? Amount of data in MB, or number of rows?

We could find these stats for BT's warehouses if that helps?

Simon

@mansoure
Contributor

This is a very interesting problem. @raulcf how far are you with this issue? I think I can help here.

@raulcf raulcf added this to the v0.5 milestone Oct 16, 2017
3 participants