optimization #5

Open
3 of 4 tasks
rokroskar opened this issue Jul 25, 2016 · 1 comment


rokroskar commented Jul 25, 2016

Currently, several spots in the workflow take a very long time to run. Where are the bottlenecks, and can we easily solve them? One idea is to minimize (de)serialization overhead by keeping each partition as a full numpy array, but will this work? For some of the steps, serialization is still a big problem.
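To get a rough feel for the numpy-array idea, here is a minimal sketch that compares the pickled size of one partition stored as a single structured array against the same data stored as a list of per-particle tuples. It uses plain `pickle` as a stand-in for whatever serializer the workflow actually goes through, and `particle_dtype`, `make_partition` and `pickled_size` are hypothetical names invented for this illustration (the real particle layout is not reproduced here).

```python
import pickle

import numpy as np

# Hypothetical particle layout, for illustration only; not the real cfof.PARTICLE type.
particle_dtype = np.dtype(
    [("pid", np.int64), ("x", np.float32), ("y", np.float32), ("z", np.float32)]
)


def make_partition(n, seed=0):
    """Build one fake partition of n particles as a single structured array."""
    rng = np.random.default_rng(seed)
    arr = np.zeros(n, dtype=particle_dtype)
    arr["pid"] = np.arange(n)
    for name in ("x", "y", "z"):
        arr[name] = rng.random(n, dtype=np.float32)
    return arr


def pickled_size(obj):
    """Number of bytes produced by pickling obj (a rough proxy for serialization cost)."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


partition = make_partition(10_000)
as_records = partition.tolist()  # same data as a list of Python tuples

print("one structured array:", pickled_size(partition), "bytes")
print("list of tuples:      ", pickled_size(as_records), "bytes")
```

This only measures payload size, not time, and it says nothing about the steps where serialization remains a problem for other reasons, so it should be read as a starting point for profiling rather than an answer.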

Keep a list of places in need of optimization here:

  • reading particles and setting particle IDs (there is still room for a small optimization in setting the actual IDs)
  • the particle arrays in the cython code should be memoryviews of the cfof.PARTICLE type
  • the first part of the group merge stage computes the group mappings across domains. This currently does a full data shuffle because no partition information is provided. We could do better by taking the RDD of PRIMARY_GHOST_PARTICLE particles and doing a union with the RDD of GHOST_PARTICLE_COPY particles, which will need to be shuffled. This should result in a much smaller data shuffle overall because there are many more primary ghost particles. (Currently this is pretty fast, so it will not change for the moment.)
  • in count_groups_partition_cython the for loop is probably not needed: just concatenate the partition and run np.unique once to get the (groupID, count) tuples
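The last item can be sketched in plain numpy as follows. This is not the existing `count_groups_partition_cython` code; `count_groups_partition`, the `iGroup` field name and `particle_dtype` are assumptions made up for the example, and the partition is assumed to arrive as an iterable of structured particle arrays.

```python
import numpy as np

# Hypothetical stand-in for the particle dtype; the real cfof.PARTICLE layout may differ.
particle_dtype = np.dtype([("iGroup", np.int64), ("pos", np.float32, (3,))])


def count_groups_partition(chunks):
    """Return sorted (groupID, count) tuples for all particle arrays in one partition."""
    chunks = list(chunks)
    if not chunks:
        return []
    # Concatenate the group IDs once instead of looping over individual particles.
    group_ids = np.concatenate([chunk["iGroup"] for chunk in chunks])
    ids, counts = np.unique(group_ids, return_counts=True)
    return list(zip(ids.tolist(), counts.tolist()))


# Small usage example with two fake chunks of three particles each.
chunk_a = np.zeros(3, dtype=particle_dtype)
chunk_a["iGroup"] = [1, 2, 1]
chunk_b = np.zeros(3, dtype=particle_dtype)
chunk_b["iGroup"] = [5, 1, 5]

print(count_groups_partition([chunk_a, chunk_b]))  # [(1, 3), (2, 1), (5, 2)]
```

Because `np.unique` sorts its input, the result comes back ordered by group ID, and the concatenation makes a temporary copy of the group ID column, which may matter for very large partitions.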
rokroskar added a commit that referenced this issue Jul 25, 2016

rokroskar commented Sep 19, 2016

Memoryviews implemented in #7.

rokroskar reopened this Sep 19, 2016