Error on python deserialization of tile with bool cell_type #188

@vpipkt

Description

Various logical local map algebra operations, such as rf_local_equal, return a tile with a bool cell type.

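from pyrasterframes.utils import create_rf_spark_session
from pyrasterframes.rasterfunctions import rf_local_equal, rf_convert_cell_type
from pyspark.sql.functions import lit

spark = create_rf_spark_session()

# rf_local_equal yields a tile column with a bool cell type
df = spark.read.raster('/data/raster/example.tif')
df = df.withColumn('trouble', rf_local_equal('proj_raster', lit(42)))
df.limit(10).toPandas()  # raises ValueError while deserializing the tiles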

Expected result

It should be possible to return tiles with a bool cell type to the Python driver through collect, head, toPandas, and similar DataFrame operations.

Actual result

Collecting the tiles fails with a deserialization error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pyrasterframes/rf_types.py in deserialize(self, datum)
    437             as_numpy = np.frombuffer(cell_data_bytes, dtype=cell_type.to_numpy_dtype())
--> 438             reshaped = as_numpy.reshape((rows, cols))
    439             t = Tile(reshaped, cell_type)

ValueError: cannot reshape array of size 8192 into shape (256,256)
.
.
.
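The sizes in the traceback hint at the cause: a 256 × 256 tile has 65,536 cells, but the buffer held only 8,192 bytes, which is exactly 65,536 / 8. That is consistent with bool cells being serialized bit-packed (one bit per cell, as in GeoTrellis's BitArrayTile) while the Python side reads one byte per cell with np.frombuffer. The following is a minimal sketch of how the Python side could expand such a buffer; unpack_bit_tile is a hypothetical helper, not part of pyrasterframes, and it assumes least-significant-bit-first packing, which would need to be confirmed against the actual serializer.

```python
import numpy as np


def unpack_bit_tile(cell_data_bytes: bytes, rows: int, cols: int) -> np.ndarray:
    """Hypothetical helper: expand bit-packed bool cells into a (rows, cols) bool array.

    Assumes 8 cells per byte, least-significant bit first (the GeoTrellis
    BitArrayTile layout); this is an assumption, not confirmed by the issue.
    """
    packed = np.frombuffer(cell_data_bytes, dtype=np.uint8)
    # One 0/1 value per bit; count drops any padding bits in the final byte
    bits = np.unpackbits(packed, count=rows * cols, bitorder='little')
    return bits.astype(bool).reshape((rows, cols))


# Reproduce the size mismatch from the traceback with a synthetic 256 x 256 tile
rows, cols = 256, 256
raw = bytes((rows * cols + 7) // 8)  # 8192 zero bytes, one bit per cell

# Reading one byte per cell gives too few elements to reshape into 256 x 256
print(np.frombuffer(raw, dtype=np.bool_).size)  # 8192

# Unpacking the bits first yields the expected shape
print(unpack_bit_tile(raw, rows, cols).shape)  # (256, 256)
```

Note that np.unpackbits only accepts the bitorder and count arguments in NumPy 1.17 and later.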

Workaround

Explicitly convert the cell type to int8.

from pyrasterframes.rf_types import CellType

# Cast the bool result of rf_local_equal to int8 so the tiles deserialize in Python
df = (df.drop('trouble')
        .withColumn('work_around',
                    rf_convert_cell_type(rf_local_equal('proj_raster', lit(42)),
                                         CellType.int8())))
df.limit(10).toPandas()

Metadata

Assignees

No one assigned

    Labels

    bug: When it really isn't a "feature".
    doozie: A hard issue/bug dealing with deep Spark internals.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests
