Odd cubing behavior with extract #80

Closed
mmann1123 opened this issue Oct 1, 2021 · 4 comments

Collaborator

mmann1123 commented Oct 1, 2021

I am again seeing some oddities with extract and am not sure where to start troubleshooting. Do you have any ideas? I have a large vector dataset of 70k administrative units in Ethiopia and am trying to get monthly precipitation means for each unit. Precipitation is from CHIRPS at 0.05 decimal degree (dd) resolution.

Example image, resampled to 0.01dd.
[image: resample]

Open and extract code:

with gw.config.update(ref_bounds=bounds, ref_res=(0.01, 0.01)):
    with gw.open(
        f_list,
        band_names=["ppt"],
        time_names=dates,
        nodata=-9999,
        resampling="bilinear",
    ) as ds:
        print(ds)
        df = ds.gw.extract(
            aoi=eas,
            all_touched=True,
            band_names=ds.band.values.tolist(),
            time_names=ds.time.values.tolist(),
            n_threads=4,  # n_jobs creates memory error that is uncaught
            verbose=2,
        )
        print(df.head())

        # I've also tried aggregating by the administrative code; no change
        ag = df.groupby(by=["id"]).agg("mean")

First issue: without resampling to a higher resolution (say 0.01 dd), 30k+ administrative units return no data, even with all_touched=True.
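As a rough sanity check on the first issue, the sketch below counts how many grid cells a unit's bounding box overlaps at a given resolution. It is plain Python, not geowombat code: the helper `cells_touched`, the grid origin at (0, 0), and the sample bounds are all hypothetical. It assumes an all-touched rule behaves as a pure overlap test, in which case any unit inside the raster extent overlaps at least one cell and would be expected to receive a value.

```python
import math


def cells_touched(bounds, res, origin=(0.0, 0.0)):
    """Count the grid cells of size `res` that a bounding box overlaps.

    bounds is (left, bottom, right, top) in decimal degrees.
    origin is the (x, y) corner the grid is aligned to (an assumption here).
    """
    left, bottom, right, top = bounds
    x0, y0 = origin
    # Index of the first cell overlapped and one past the last, per axis
    col_min = math.floor((left - x0) / res)
    col_max = math.ceil((right - x0) / res)
    row_min = math.floor((bottom - y0) / res)
    row_max = math.ceil((top - y0) / res)
    return (col_max - col_min) * (row_max - row_min)


# Hypothetical unit roughly 0.02 dd wide, smaller than one 0.05 dd cell
unit_bounds = (38.012, 9.013, 38.031, 9.027)

print(cells_touched(unit_bounds, 0.05))  # 1
print(cells_touched(unit_bounds, 0.01))  # 6
```

Even at the native 0.05 dd resolution the small unit still overlaps one cell, so units that come back empty are worth inspecting individually rather than being attributed to their size alone.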

Second issue: if I do resample, the output admin units take on a really weird block pattern. I have also tried grouping by the unique code for the admin unit, although id seems to correspond to the index of the features. The result is the same for resampling="bilinear" and resampling="nearest".
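One way to test the guess that id follows the feature index is to list which feature ids never appear in the extracted rows and which ids map to more than one admin code. The sketch below uses plain Python on (id, admin code) pairs. The helper `summarize_ids` and the sample codes are hypothetical, and in practice the pairs would have to be pulled from the extracted frame and the vector attributes.

```python
from collections import defaultdict


def summarize_ids(feature_ids, rows):
    """Report ids with no samples and ids tied to several admin codes.

    feature_ids: every id expected from the vector layer.
    rows: iterable of (id, admin_code) pairs, one per extracted sample.
    """
    codes_by_id = defaultdict(set)
    for fid, code in rows:
        codes_by_id[fid].add(code)

    # Features that produced no extracted samples at all
    missing = sorted(set(feature_ids) - set(codes_by_id))
    # Ids whose samples carry more than one admin code
    ambiguous = sorted(fid for fid, codes in codes_by_id.items() if len(codes) > 1)
    return missing, ambiguous


# Hypothetical data: four features, samples returned for only two of them
feature_ids = [0, 1, 2, 3]
rows = [(0, "ET01"), (0, "ET01"), (2, "ET03"), (2, "ET04")]

missing, ambiguous = summarize_ids(feature_ids, rows)
print(missing)    # [1, 3]
print(ambiguous)  # [2]
```

A non-empty `ambiguous` list would suggest that id and the admin code are not interchangeable grouping keys.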

[image: Screenshot from 2021-10-01 15-43-07]

@jgrss

mmann1123 added the bug label Oct 1, 2021
Owner

jgrss commented Oct 4, 2021

@mmann1123 can you send me small versions of this data to test?

Collaborator Author

mmann1123 commented Oct 5, 2021

@jgrss I just shared the Dropbox folders with you. Keep in mind that I am also implementing my fix for issue #75, which I am 95% sure is unrelated to this issue. Let me know if you want a more compact example to work with.

Owner

jgrss commented Oct 8, 2021

Hey, I don't have an active Dropbox account. Is there another way you can send the data? Are they small enough to email?

Owner

jgrss commented Oct 19, 2021

Hi @mmann1123, if your vector data are polygons, geowombat.extract will do:

df = gpd.overlay(df,
                 gpd.GeoDataFrame(data=[0],
                                  geometry=[data.gw.geometry],
                                  crs=df_crs),
                 how='intersection').drop(columns=[0])

I tried to clip the shapefile that you sent by approximating the image bounds in your graphic above. When I did that, I received a geopandas error. Are you able to clip the shapefile to a smaller bounding box with gpd.overlay or with:

gpd.clip(eas.to_crs('epsg:4326'), bounds)
