feat: Mmann1123 ray extract #300

Closed
wants to merge 7 commits into from

Conversation

mmann1123
Collaborator

What is this PR changing?

Adds a Ray-based extract client, which helps avoid threading conflicts when extracting with large polygon files.

Checklist

  • [x] Remember to add a semantic tag to the commit name

Tag options:

  • feat: A new feature
  • fix: A bug fix
  • docs: Documentation only changes
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
  • refactor: A code change that neither fixes a bug nor adds a feature
  • perf: A code change that improves performance
  • test: Adding missing or correcting existing tests
  • chore: Changes to the build process or auxiliary tools and libraries such as documentation generation

Example:

fix: <branch name> PR number

@mmann1123 mmann1123 added the enhancement New feature or request label Mar 7, 2024
@mmann1123
Collaborator Author

@jgrss this seems to radically help with large sets of polygons etc

import ray

if not ray.is_initialized():
    ray.init()
Owner

@jgrss jgrss Mar 12, 2024

Can you pass processes to ray with ray.init(num_cpus=processes)?
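A minimal sketch of what this suggestion could look like, assuming a hypothetical helper named `init_ray` and a `processes` argument supplied by the caller (neither name comes from the PR). Ray is imported lazily inside the function so that it remains an optional dependency:

```python
def init_ray(processes=None):
    """Start Ray once, optionally capping the CPUs it schedules tasks on.

    `init_ray` and `processes` are hypothetical names used for illustration.
    """
    import ray  # lazy import: only needed when the Ray client is used

    if not ray.is_initialized():
        # num_cpus=None lets Ray detect the available cores itself
        ray.init(num_cpus=processes)
    return ray
```

Calling `init_ray(processes=4)` would then limit Ray to four logical CPUs on a fresh start, while a second call leaves an already running Ray instance untouched.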

This method is intended to be used with Ray for distributed computing.
Assumes `data` is accessible in the scope where this function is called.
"""
return data.isel(band=bands_idx, y=yidx, x=xidx).data.compute()
Owner

Do you need to pass .compute(num_workers=1) with ray? What are the processes being used? Do you have ray + dask threading?
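To make the question concrete, here is a hedged sketch of the extraction function with the dask compute limited to one worker thread, so that each Ray task does not also start its own dask thread pool. It assumes `data` is a dask-backed `xarray.DataArray` with `band`, `y`, and `x` dimensions; the function name is illustrative, not the PR's implementation:

```python
def extract_data_slice_single_worker(data, bands_idx, yidx, xidx):
    """Select a band/row/column subset and compute it with one dask worker.

    Assumes `data` is a dask-backed xarray.DataArray (band, y, x).
    """
    subset = data.isel(band=bands_idx, y=yidx, x=xidx)
    # num_workers=1 is forwarded to dask's threaded scheduler;
    # scheduler="synchronous" would be an alternative way to avoid threads.
    return subset.data.compute(num_workers=1)
```

Whether this is needed depends on how many Ray workers run concurrently; with one Ray task per core, a multi-threaded dask compute inside each task could oversubscribe the machine.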

return data.isel(band=bands_idx, y=yidx, x=xidx).data.compute()

# Dynamically assign the Ray-enabled method to the class.
SpatialOperations.extract_data_slice = _ray_extract_data_slice
Owner

Do you need to set this class method? Could you instead do

res = ray.get(
    _ray_extract_data_slice.remote(data, bands_idx, yidx, xidx)
)


if not ray.is_initialized():
    ray.init()
res = ray.get(
Owner

@jgrss jgrss Mar 12, 2024

Could you comment on what ray is doing? I'm curious how a wrapped ray.remote method calling Xarray isel is faster than calling isel directly.

Collaborator Author

@jgrss yeah, that is a good question. I may have gone off half-cocked here. I need to do some more testing to see if we really have an improvement. I am, however, struggling with one issue: when we have large stacked inputs and large polygons, I need to figure out how to chunk the process. I am working on another idea under mmann1123_ray_extract2
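One generic way to chunk such a process is to split the polygons into fixed-size batches and extract each batch separately. The sketch below only illustrates that idea; `iter_batches` and the polygon ids are hypothetical, and this is not necessarily the approach taken in mmann1123_ray_extract2:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical polygon row ids; each batch would be extracted on its own,
# which bounds how much data is held in memory at once.
polygon_ids = list(range(7))
batches = list(iter_batches(polygon_ids, batch_size=3))
# batches == [[0, 1, 2], [3, 4, 5], [6]]
```

The batch size would need tuning against the size of the stacked inputs, since each batch still loads every band for the pixels it covers.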

Owner

@jgrss jgrss left a comment

Thanks @mmann1123 -- a few comments first so that I can understand the ray.remote wrapper around Xarray and a dask compute.

@mmann1123 mmann1123 marked this pull request as draft March 16, 2024 13:07
@mmann1123 mmann1123 closed this Apr 1, 2024
Labels
enhancement New feature or request
2 participants