Update data conversion functions to use ee.data methods. #1824

Closed
jdbcode opened this issue Nov 6, 2023 · 8 comments · Fixed by #1849
Labels
enhancement improvements to geemap

Comments

@jdbcode
Collaborator

jdbcode commented Nov 6, 2023

Use ee.data.computeFeatures and ee.data.computePixels in data conversion functions. By specifying a compatible fileFormat, these methods can return data in Python-native formats: structured NumPy arrays for rasters, and Pandas DataFrames or GeoPandas GeoDataFrames for vectors. In the case of vectors, computeFeatures makes several network requests to fetch all the pages of the table before returning the Python object.

Here is a demo

ee_to_df could benefit - what others?
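As a rough illustration of the idea, here is a minimal sketch of what a computeFeatures-based conversion might look like. The helper names features_request and ee_to_df_sketch are hypothetical (they are not geemap functions), and the sketch assumes Earth Engine is installed and already initialized when ee_to_df_sketch is called. The fileFormat values are the ones listed in the ee.data.computeFeatures documentation.

```python
def features_request(collection, geo=False):
    """Build a computeFeatures request dict (hypothetical helper).

    geo=False asks for a Pandas DataFrame, geo=True for a GeoPandas
    GeoDataFrame.
    """
    file_format = 'GEOPANDAS_GEODATAFRAME' if geo else 'PANDAS_DATAFRAME'
    return {'expression': collection, 'fileFormat': file_format}


def ee_to_df_sketch(collection, geo=False):
    """Fetch an ee.FeatureCollection as a (Geo)DataFrame (hypothetical helper).

    Assumes ee.Initialize() has already been called by the caller.
    """
    import ee  # Imported lazily so the request builder works without Earth Engine.

    return ee.data.computeFeatures(features_request(collection, geo))
```

Keeping the request construction separate from the network call makes it easy to check the request without touching Earth Engine.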

@jdbcode jdbcode added Feature Request New feature or request enhancement improvements to geemap and removed Feature Request New feature or request labels Nov 6, 2023
@giswqs
Member

giswqs commented Nov 6, 2023

These functions can benefit from the ee.data methods. I will work on this sometime next week.

image

@jdbcode
Collaborator Author

jdbcode commented Nov 6, 2023

Thanks, Qiusheng. I'm happy to edit too, or to look, answer questions, test, and review when ready. Let me know!

A bug was found with ee.data.computeFeatures - it currently won't return more than 1001 features. The PR should probably not be merged until the bug is fixed.

@giswqs
Member

giswqs commented Nov 7, 2023

Sure. I will wait for the fix.

@jackoceancolor

> These functions can benefit from the ee.data methods. I will work on this sometime next week.

Oh my gosh! YES PLEASE!!

@giswqs
Member

giswqs commented Dec 17, 2023

@jdbcode It seems ee.data.computePixels ignores the bandIds parameter. It always outputs a 2D array without bands.

https://developers.google.com/earth-engine/apidocs/ee-data-computepixels

import ee

ee.Initialize()  # Assumes Earth Engine credentials are already configured.

# Region of interest.
coords = [
    -121.58626826832939,
    38.059141484827485,
]
region = ee.Geometry.Point(coords)

# Sentinel-2 median composite.
image = (ee.ImageCollection('COPERNICUS/S2')
              .filterBounds(region)
              .filterDate('2020-04-01', '2020-09-01')
              .median())

# Make a projection to discover the scale in degrees.
proj = ee.Projection('EPSG:4326').atScale(10).getInfo()

# Get scales out of the transform.
scale_x = proj['transform'][0]
scale_y = -proj['transform'][4]

# Make a request object.
request = {
    'expression': image,
    'fileFormat': 'NUMPY_NDARRAY',
    'bandIds': ['B4', 'B3', 'B2'],
    'grid': {
        'dimensions': {
            'width': 640,
            'height': 640
        },
        'affineTransform': {
            'scaleX': scale_x,
            'shearX': 0,
            'translateX': coords[0],
            'shearY': 0,
            'scaleY': scale_y,
            'translateY': coords[1]
        },
        'crsCode': proj['crs'],
    },
    'visualizationOptions': {'ranges': [{'min': 0, 'max': 3000}]},
}

image_arr = ee.data.computePixels(request)
print(image_arr.shape)
# (640, 640) instead of (640, 640, 3)

@jdbcode
Collaborator Author

jdbcode commented Dec 19, 2023

It returns a NumPy structured array, which is a little different from the 3D array you might expect.

The structured array includes the third dimension as tuples at the intersection of each row and column. The bands are accessible by name, e.g. image_arr['vis-red']. 'vis-red' is the band name because the request used the visualizationOptions parameter to scale to an 8-bit RGB image.

in: image_arr

out: array([[(50, 69, 87), (50, 69, 87), (51, 68, 88), ..., (71, 78, 86),
        (65, 75, 86), (68, 81, 84)],
       [(50, 70, 87), (50, 69, 88), (49, 69, 89), ..., (73, 77, 88),
        (65, 76, 85), (64, 81, 83)],
       [(49, 69, 88), (50, 70, 88), (50, 70, 89), ..., (64, 75, 87),
        (63, 79, 84), (62, 80, 82)],
       ...,
       [(73, 80, 89), (66, 72, 84), (61, 69, 82), ..., (56, 67, 80),
        (55, 65, 79), (55, 65, 79)],
       [(76, 81, 90), (62, 72, 84), (57, 70, 81), ..., (54, 66, 80),
        (54, 66, 80), (54, 64, 80)],
       [(77, 82, 92), (62, 73, 83), (56, 70, 81), ..., (54, 66, 79),
        (54, 66, 79), (54, 64, 79)]],
      dtype=[('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')])

in: image_arr.dtype

out: dtype([('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')])

in: image_arr['vis-red']

out: array([[50, 50, 51, ..., 71, 65, 68],
       [50, 50, 49, ..., 73, 65, 64],
       [49, 50, 50, ..., 64, 63, 62],
       ...,
       [73, 66, 61, ..., 56, 55, 55],
       [76, 62, 57, ..., 54, 54, 54],
       [77, 62, 56, ..., 54, 54, 54]], dtype=uint8)
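If a plain (rows, cols, bands) array is what you want, the structured array can be converted with NumPy alone. The sketch below uses a small synthetic array with the same dtype as the output above (the pixel values are made up for illustration, not fetched from Earth Engine) and shows two equivalent conversions.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Synthetic stand-in for the computePixels result: same field names and dtype.
dtype = [('vis-red', 'u1'), ('vis-green', 'u1'), ('vis-blue', 'u1')]
structured = np.zeros((4, 5), dtype=dtype)
structured['vis-red'] = 50
structured['vis-green'] = 69
structured['vis-blue'] = 87

# Option 1: stack each named band along a new third axis.
stacked = np.dstack([structured[name] for name in structured.dtype.names])

# Option 2: let NumPy turn the fields into a trailing dimension.
unstructured = rfn.structured_to_unstructured(structured)

print(structured.shape, stacked.shape, unstructured.shape)
```

Both options keep the band order given by dtype.names, so the result can be passed directly to something like matplotlib's imshow.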

@jdbcode
Collaborator Author

jdbcode commented Dec 19, 2023

Also, the example for ee.data.computePixels is a bit strange: it uses the same parameters as the ee.data.getPixels example, which is an awkward way to specify the region to export. The advantage of ee.data.computePixels is that you can specify the area to download using an ee.Geometry object with ee.Image.clipToBoundsAndScale:

knoxville = ee.Geometry.BBox(-84.07, 35.87, -83.79, 36.06) # some ROI
image1 = (ee.ImageCollection('COPERNICUS/S2')
              .filterBounds(knoxville)
              .filterDate('2020-07-01', '2020-09-01')
              .median()
              .setDefaultProjection('EPSG:4326', None, 20) # Or some other CRS/scale
              .clipToBoundsAndScale(geometry=knoxville, scale=20)) # Clip to ROI and scale

image_arr1 = ee.data.computePixels({
    'expression': image1,
    'fileFormat': 'NUMPY_NDARRAY',
    'bandIds': ['B4', 'B3', 'B2'],
    'visualizationOptions': {'ranges': [{'min': 0, 'max': 3000}]} # Optional: only needed to scale to 8-bit RGB
})

display(image_arr1)


import matplotlib.pyplot as plt
import numpy as np

# Stack the named bands of the structured array into a (rows, cols, bands) array.
plt.imshow(np.dstack([image_arr1[band] for band in image_arr1.dtype.names]))

Here is a notebook that describes the differences: https://developers.google.com/earth-engine/tutorials/community/data-converters

@giswqs
Member

giswqs commented Dec 19, 2023

Thank you for clarifying. It makes sense.
