Distributed cluster gather/sort/filter/match operations #44

Closed
basaks opened this issue Nov 21, 2017 · 3 comments

@basaks
Contributor

basaks commented Nov 21, 2017

As of commit 41c83f7, we can process a small number of events, say up to 5k. This runs in a single process, with in-memory sort/filter/join operations on pandas DataFrames.

However, we need to process upwards of 500k events. ISC and Engdahl alone account for 300k+ events.
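
For readers unfamiliar with the single-process approach described above, here is a minimal sketch of an in-memory filter, join and sort with pandas. The column names, the station table and the travel-time threshold are all hypothetical, chosen only for illustration; they are not the project's actual schema or parameters.

```python
import pandas as pd

# Hypothetical arrivals table; column names are illustrative only.
arrivals = pd.DataFrame({
    "event_id": [1, 1, 2, 2, 3],
    "station": ["AAA", "BBB", "AAA", "CCC", "BBB"],
    "travel_time": [12.4, 30.1, 11.9, 250.0, 29.5],
})

# Hypothetical table of known stations.
stations = pd.DataFrame({
    "station": ["AAA", "BBB"],
    "latitude": [-35.3, -23.7],
})

# Filter: drop arrivals with implausibly long travel times (arbitrary threshold).
filtered = arrivals[arrivals["travel_time"] < 100.0]

# Join: keep only arrivals recorded at a known station.
joined = filtered.merge(stations, on="station", how="inner")

# Sort: order by station, then by travel time.
result = joined.sort_values(["station", "travel_time"]).reset_index(drop=True)
```

Everything here lives in one process's memory, which is why it is comfortable at a few thousand events and becomes a concern at several hundred thousand.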

@basaks
Contributor Author

basaks commented Nov 21, 2017

@alexgorb We can test the 3D inversion functionality using a small number of events (a few thousand). Should we finish this parallelisation now, before moving on to other tasks?

@basaks
Contributor Author

basaks commented Nov 21, 2017

Corresponding Jira ticket: https://gajira.atlassian.net/browse/PST-227

@basaks
Contributor Author

basaks commented Nov 28, 2017

  1. So far, the gather of arrivals is optimally distributed.
  2. Because a distributed median is difficult to compute, the sort and median computation still run in a single process. Performance has been improved by using pandas throughout and avoiding for loops. For details, see "Improve median filter performance during 3d travel time inversion input generation" #49.
  3. Matching is still single-process, and is very efficient for the amount of data that remains by the matching stage.
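
To illustrate point 2, here is a rough sketch, not the code from #49. It assumes a hypothetical table of arrivals keyed by source and station block, with made-up column names and values. It shows a per-group median computed with a Python loop, the same result from a single vectorised `groupby`, and why simply combining per-chunk medians does not give the exact global median.

```python
import pandas as pd

# Hypothetical input: one row per arrival, tagged with the grid blocks
# of its source and station. Names and values are illustrative only.
arrivals = pd.DataFrame({
    "source_block": [10, 10, 10, 10, 20, 20, 20],
    "station_block": [5, 5, 5, 5, 7, 7, 7],
    "residual": [0.4, -0.1, 0.2, 3.0, 1.0, 1.5, 0.5],
})
keys = ["source_block", "station_block"]

# Loop version: one Python-level iteration per group.
loop_medians = {}
for key, group in arrivals.groupby(keys):
    loop_medians[key] = group["residual"].median()

# Vectorised version: a single grouped aggregation, no explicit loop.
medians = arrivals.groupby(keys)["residual"].median()

# Why a distributed median is awkward: the median of per-chunk medians
# is generally not the median of the full data set.
chunk_medians = pd.Series([
    arrivals["residual"].iloc[:4].median(),  # chunk held by "process 0"
    arrivals["residual"].iloc[4:].median(),  # chunk held by "process 1"
])
naive_median = chunk_medians.median()
exact_median = arrivals["residual"].median()
```

In this toy data the naive combination differs from the exact median, so an exact result needs either all values for a group in one place or a more involved selection algorithm.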

Closing.
