Feature Request: add to df.write_ndjson(json_columns: list[str]), for columns to be decoded and written out as JSON
#17054
Comments
Is the problem that you have something like:
df = pl.select(a=1, b=2, c=pl.lit('{"foo":[1, 2, 3],"bar":[4,5,6]}'))
df.select(pl.col.c.str.json_decode()).write_ndjson()
# '{"c":{"foo":[1,2,3],"bar":[4,5,6]}}\n'
But you need the data written without the outer column name label?
# '{"foo":[1,2,3],"bar":[4,5,6]}\n'
Here are some examples that hopefully better explain my issue with that. The issue is that null values are output as `"geometry":{"type":null,"coordinates":null,"features":null}` instead of just `null`.
# Issue #17054
import polars as pl
import json
import tempfile
big_geojson_obj = {"type":"FeatureCollection","features":[{"id":"baddba6f1276e861263d05d9cbecff74","type":"Feature","properties":{"lineColor":"#ffa000","lineWidth":2,"fillColor":"#ffe082","fillOpacity":0.1},"geometry":{"coordinates":[[[-114.42286807,55.199035275],[-118.90384586,53.413681626],[-115.7853142,51.95024781],[-111.63559015,53.23660491],[-114.42286807,55.199035275]]],"type":"Polygon"}}]}
df = pl.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Location1", "Location2", "LocationWithLongGeom", "LocationWithNullGeom"],
    "geometry": [
        '{"type":"Point","coordinates":[102.0,0.5]}',
        '{"type":"Point","coordinates":[103.0,1.0]}',
        json.dumps(big_geojson_obj),
        None,
    ],
})
print("================ START BASIC WAY ================")
# Just output the column as a String (not what I want)
with tempfile.NamedTemporaryFile(suffix=".ndjson") as f:
    df.write_ndjson(f.name)
    f.seek(0)
    print(f.read().decode())
print("================ END BASIC WAY ================")
print("================ START DEMO GOAL ================")
# Obviously this way is very slow
for row in df.iter_rows(named=True):
    row_out = row.copy()
    row_out["geometry"] = json.loads(row_out["geometry"]) if row_out["geometry"] is not None else None
    print(json.dumps(row_out))  # file write would happen here
print("================ END DEMO GOAL ================")
print("================ START SUGGESTION 1 ================")
# This is the previous suggestion.
# The issue is that null values are output as `"geometry":{"type":null,"coordinates":null,"features":null}` instead of just `null`.
df1 = df.with_columns(pl.col('geometry').str.json_decode())
with tempfile.NamedTemporaryFile(suffix=".ndjson") as f:
    df1.write_ndjson(f.name)
    f.seek(0)
    print(f.read().decode())
print("================ END SUGGESTION 1 ================")
Well, this is one of the oldest feature requests in the polars project, #3462. Your For a workaround one must venture outside My personal thinking is that #3462 should be marked P-high and added to the 1.0 todo list, current implementation of
Description
We store GeoJSON as a string column in a dataframe. It would be really nice to be able to write this column out as nested JSON, rather than as a quoted string, in the NDJSON output. Very niche use case, but all the alternative hacks are pretty awful.
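To make the requested behaviour concrete, here is a minimal pure-Python sketch of what a `json_columns` option could mean. The helper `rows_to_ndjson` and its parameters are hypothetical names for illustration, not part of the polars API. It assumes every value in a listed column is either `None` or already valid single-line JSON text, and splices that text into the output line verbatim instead of quoting it, so a null stays `null`.

```python
import json
from typing import Any, Iterable, Iterator


def rows_to_ndjson(
    rows: Iterable[dict[str, Any]],
    json_columns: Iterable[str] = (),
) -> Iterator[str]:
    """Yield one NDJSON line per row (hypothetical helper, not a polars API).

    Values in `json_columns` are assumed to be valid single-line JSON text
    (or None); they are embedded as-is rather than re-encoded as strings.
    """
    raw = set(json_columns)
    for row in rows:
        parts = []
        for key, value in row.items():
            if key in raw:
                # Pre-encoded JSON: emit verbatim; a missing value becomes null.
                encoded = "null" if value is None else value
            else:
                encoded = json.dumps(value, separators=(",", ":"))
            parts.append(f"{json.dumps(key)}:{encoded}")
        yield "{" + ",".join(parts) + "}"


rows = [
    {"id": 1, "name": "Location1", "geometry": '{"type":"Point","coordinates":[102.0,0.5]}'},
    {"id": 4, "name": "LocationWithNullGeom", "geometry": None},
]
lines = list(rows_to_ndjson(rows, json_columns=["geometry"]))
print("\n".join(lines))
```

With a dataframe, the rows could come from `df.iter_rows(named=True)` as in the demo above. This still loops over rows in Python, so it is not a fast path; it only skips the parse-and-re-encode step, and it does not validate the embedded text.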