Add dataframe groupby/aggregate + graph reindex operators #86

Merged
merged 1 commit into from
Mar 12, 2020

Conversation

slabasan
Collaborator

@slabasan slabasan commented Oct 28, 2019

  • Add dataframe groupby/aggregate operator
  • Add graph reindex operator
  • Add tests

@slabasan slabasan added the WIP label Oct 28, 2019
@slabasan slabasan force-pushed the features/groupby-aggregate branch 3 times, most recently from 6ddc191 to 74ecb6b Compare December 10, 2019 19:11
@slabasan
Collaborator Author

Should node be appended as an index level if a user wants to group by module, for example? The current graphframe implementation requires node to be an index level.

@slabasan slabasan force-pushed the features/groupby-aggregate branch 6 times, most recently from 266c443 to 91d47e0 Compare December 20, 2019 18:57
@slabasan slabasan force-pushed the features/groupby-aggregate branch 2 times, most recently from c04eb89 to b78872d Compare January 23, 2020 18:51
@slabasan
Collaborator Author

If I want to group by module, but a node has None for its module, do we drop that node?
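
For reference, a minimal sketch of how plain pandas treats this case, which may or may not be what the graphframe operator should do. The toy dataframe and its values are made up for illustration. By default `groupby` silently drops rows whose key is None/NaN; `dropna=False` (available in pandas 1.1 and later) keeps them as their own group.

```python
import pandas as pd

# Hypothetical toy data: node B has no module.
df = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "time": [0.0, 5.0, 1.0],
        "module": ["main", None, "main"],
    }
)

# Default behavior: the row with a missing module is excluded from every group.
dropped = df.groupby("module")["time"].sum()

# pandas >= 1.1: keep the missing key as a separate NaN group instead.
kept = df.groupby("module", dropna=False)["time"].sum()

print(dropped)
print(kept)
```

So with the default settings, node B's time would simply disappear from the aggregated result unless it is handled explicitly.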

@slabasan
Collaborator Author

slabasan commented Feb 13, 2020

I'm trying to address this test graph, but the connections aren't being set correctly. Assuming I want to groupby-aggregate by module, nodes B and F should be grouped into the same super node.

Original graph and associated dataframe:

0.000 A
├─ 5.000 B
│  └─ 5.000 C
│     └─ 1.000 D
└─ 10.000 E
   └─ 1.000 F

              name  time (inc)  time module
node                                       
{'name': 'A'}    A       130.0   0.0   main
{'name': 'B'}    B        20.0   5.0    foo
{'name': 'C'}    C         5.0   5.0   graz
{'name': 'D'}    D         8.0   1.0   graz
{'name': 'E'}    E        55.0  10.0    bar
{'name': 'F'}    F         1.0   1.0    foo

I want:

0.000 main
├─ 5.000 foo
│  └─ 5.000 graz
└─ 10.000 bar
   └─ 5.000 foo (same foo as above)
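
A rough sketch of the idea outside of hatchet, using only pandas and a plain edge list. The names `edges`, `module_of`, and `super_edges` are hypothetical and not part of the graphframe API, and summing is just one possible aggregation function, so the resulting values need not match the figures in the tree above. Each original edge is mapped to an edge between modules, edges inside a module are dropped, and duplicates are merged so that B and F share one `foo` super node.

```python
import pandas as pd

# Dataframe from the example above, indexed by node name for simplicity.
df = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D", "E", "F"],
        "time (inc)": [130.0, 20.0, 5.0, 8.0, 55.0, 1.0],
        "time": [0.0, 5.0, 5.0, 1.0, 10.0, 1.0],
        "module": ["main", "foo", "graz", "graz", "bar", "foo"],
    }
).set_index("name")

# Parent -> child edges of the original graph.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "F")]

# Groupby-aggregate step: one row per module (sum is an assumed choice).
agg = df.groupby("module")[["time (inc)", "time"]].sum()

# Reindex step: rewrite each edge in terms of modules (super nodes).
module_of = df["module"].to_dict()
super_edges = {}
for parent, child in edges:
    src, dst = module_of[parent], module_of[child]
    if src == dst:
        continue  # edge inside one super node, e.g. C -> D within graz
    children = super_edges.setdefault(src, [])
    if dst not in children:
        children.append(dst)  # reuse the same super node for repeated targets

print(agg)
print(super_edges)
```

Because `foo` appears once as a key, both `main -> foo` and `bar -> foo` point at the same super node, which is the "same foo as above" relationship in the desired graph.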

@slabasan slabasan removed the WIP label Feb 27, 2020
@slabasan slabasan changed the title [WIP] Add dataframe groupby/aggregate + graph reindex operators Add dataframe groupby/aggregate + graph reindex operators Feb 27, 2020
@slabasan slabasan added the WIP label Feb 27, 2020
@slabasan slabasan force-pushed the features/groupby-aggregate branch 6 times, most recently from ec0aa54 to eeaedd1 Compare March 3, 2020 16:18
@slabasan slabasan removed the WIP label Mar 3, 2020
@slabasan
Collaborator Author

slabasan commented Mar 3, 2020

This is ready for review.

This introduces a single graphframe API that performs a groupby-aggregate on a
dataframe, then reindexes the graph.
- add node column to resulting dataframe (based on groupby-aggregate)
- add reindex tests on different graphs
@slabasan slabasan merged commit cf11f5c into hatchet:master Mar 12, 2020