Error on python deserialization of tile with bool cell_type #188

Open
vpipkt opened this issue Jul 17, 2019 · 1 comment
Labels
bug When it really isn't a "feature". doozie A hard issue/bug dealing with deep Spark internals.

vpipkt commented Jul 17, 2019

Description

Various logical local algebra operations, such as rf_local_equal, return a tile with a bool cell type.

from pyrasterframes.rasterfunctions import rf_local_equal, rf_convert_cell_type
from pyspark.sql.functions import lit

df = spark.read.raster('/data/raster/example.tif')
df = df.withColumn('trouble', rf_local_equal('proj_raster', 42))
df.limit(10).toPandas()

Expected result

We should be able to return bool cell type Tiles to the Python driver via collect, head, toPandas, and similar operations on the dataframe.

Actual result

Deserialization fails with the error shown below.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pyrasterframes/rf_types.py in deserialize(self, datum)
    437             as_numpy = np.frombuffer(cell_data_bytes, dtype=cell_type.to_numpy_dtype())
--> 438             reshaped = as_numpy.reshape((rows, cols))
    439             t = Tile(reshaped, cell_type)

ValueError: cannot reshape array of size 8192 into shape (256,256)
.
.
.
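
A possible reading of the sizes in this error: 256 * 256 = 65536 cells, and 65536 / 8 = 8192, so the buffer looks like it holds one bit per cell while the reshape expects one element per cell. The sketch below reproduces that arithmetic with plain NumPy and shows one way such a buffer could be expanded. It is only an illustration: `unpack_bool_cells` is a hypothetical helper, not part of pyrasterframes, and both the bit packing and the `'little'` bit order are assumptions that would need checking against the JVM-side serialization.

```python
import numpy as np

rows, cols = 256, 256

# Simulate a bit-packed bool tile: one bit per cell, eight cells per byte.
# (Assumption: this mirrors how the bool tile arrives from the JVM.)
cells = np.zeros((rows, cols), dtype=np.uint8)
cells[0, :3] = 1
packed_bytes = np.packbits(cells.ravel(), bitorder='little').tobytes()

# 65536 cells / 8 cells per byte gives the array size from the ValueError.
packed_size = len(packed_bytes)


def unpack_bool_cells(cell_data_bytes, rows, cols, bitorder='little'):
    """Hypothetical helper: expand bit-packed cells into a (rows, cols) bool array.

    The bit order is an assumption; pass bitorder='big' if the producer
    stores the first cell in the most significant bit.
    """
    packed = np.frombuffer(cell_data_bytes, dtype=np.uint8)
    # count trims any padding bits when rows * cols is not a multiple of 8.
    bits = np.unpackbits(packed, count=rows * cols, bitorder=bitorder)
    return bits.reshape((rows, cols)).astype(bool)


restored = unpack_bool_cells(packed_bytes, rows, cols)
```

If this reading holds, it would also explain why the int8 workaround succeeds: an int8 tile carries one byte per cell, so the buffer has 65536 elements and the reshape goes through.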

Work around

Explicitly convert the cell type to int8.

from pyrasterframes.rf_types import CellType
ct = CellType.int8()
df = df.drop('trouble') \
       .withColumn('work_around', 
                   rf_convert_cell_type(
                       rf_local_equal('proj_raster', lit(42)),
                   ct))
df.limit(10).toPandas()
@vpipkt vpipkt added the bug When it really isn't a "feature". label Jul 17, 2019
@metasim metasim added this to the 0.8.0 milestone Jul 25, 2019
@vpipkt vpipkt self-assigned this Aug 8, 2019

vpipkt commented Aug 8, 2019

Strategy for the time being

  1. Document bug and workaround in release notes
  2. Set milestone for 0.8.1

@vpipkt vpipkt closed this as completed Aug 8, 2019
@vpipkt vpipkt reopened this Aug 8, 2019
@vpipkt vpipkt modified the milestones: 0.8.0, 0.8.1 Aug 8, 2019
@vpipkt vpipkt removed their assignment Aug 21, 2019
@metasim metasim modified the milestones: 0.8.1, 0.8.2 Aug 23, 2019
@metasim metasim modified the milestones: 0.8.2, 0.8.3 Sep 23, 2019
@metasim metasim modified the milestones: 0.8.3, 0.8.4 Oct 9, 2019
@metasim metasim added the doozie A hard issue/bug dealing with deep Spark internals. label Sep 10, 2020