java.io.UTFDataFormatException: encoded string too long #57

Open
kvndrsslr opened this Issue Mar 19, 2016 · 0 comments

kvndrsslr commented Mar 19, 2016

Hello,

when I try to generate links for two files of Linked Geospatial Data with the single-machine version of SILK 2.7.0, I get the exception java.io.UTFDataFormatException: encoded string too long: 83821 bytes.
The string literals in these files are indeed very long: they are WKT serializations of polygons and multipolygons. As far as I understand, SILK should support them, since it contains plugins dedicated to geospatial relations and distances.
The problem seems to lie in the core SILK code rather than in the plugins.
For reproducibility I have created three pastebins containing the log with the exception's stack trace, the configuration file, and the very small source dataset used in the linking task. The target dataset can be downloaded from datahub.io.
This article shows why this exception occurs and how it can be fixed.

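Edit: As a temporary workaround I filtered the datasets with awk 'length($0) < 65536' file > new_file, which drops every line of 65536 characters or more. The error message matches the one raised by Java's DataOutputStream.writeUTF, which cannot write a string whose encoded form is longer than 65535 bytes because the length is stored in a two-byte prefix.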
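
To illustrate the limit and one possible way around it, here is a minimal Java sketch. It is not taken from the SILK code base: the class LongStringIO and its methods are hypothetical names made up for this example, and I am only assuming that the failing code path goes through DataOutputStream.writeUTF. The sketch writes the string as UTF-8 bytes behind a four-byte length prefix instead of the two-byte prefix that writeUTF uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of SILK.
final class LongStringIO {

    // Builds a string of n copies of c (stand-in for a long WKT literal).
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns true if DataOutputStream.writeUTF rejects the string.
    static boolean writeUtfFails(String s) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s); // length prefix is an unsigned 16-bit value
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        }
    }

    // Writes the string as standard UTF-8 behind a 32-bit length prefix.
    static void writeLongString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads a string written by writeLongString.
    static String readLongString(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Writes and reads the string back through an in-memory buffer.
    static String roundTrip(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeLongString(out, s);
        }
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return readLongString(in);
        }
    }

    public static void main(String[] args) throws IOException {
        String literal = repeat('x', 83821); // same size as in the exception above
        System.out.println("writeUTF fails: " + writeUtfFails(literal));
        System.out.println("round trip ok:  " + roundTrip(literal).equals(literal));
    }
}
```

Data written this way is not readable with readUTF, so the reading side would have to change as well. Note also that the awk filter counts characters (or bytes, depending on the awk implementation and locale), while writeUTF counts encoded bytes, so a line with non-ASCII characters could pass the filter and still be too long.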
