C Stack Overflow in from_json when parsing large file #61
I cut a large-ish file up into smaller pieces to see if I could narrow it down to a certain recursion level, but no luck yet.

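If the goal is to find the recursion level where things break, it can help to first measure how deeply the file actually nests. A minimal sketch in C++ (the language of the Rcpp/rapidjson snippets in this thread), assuming the whole file is already in a `std::string`; `json_max_depth` is a hypothetical helper, not part of jsonify or rapidjson. It scans once with a counter instead of recursing, so it cannot itself overflow the stack:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the deepest level of [ ] / { } nesting in a
// JSON text in a single pass, without recursion. Brackets inside string
// literals are ignored and backslash escapes are skipped. It assumes
// well-formed input and does not validate the JSON.
std::size_t json_max_depth(const std::string& json) {
    std::size_t depth = 0, deepest = 0;
    bool in_string = false;
    for (std::size_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (in_string) {
            if (c == '\\') {
                ++i;  // skip the escaped character (handles \" and \\)
            } else if (c == '"') {
                in_string = false;
            }
            continue;
        }
        switch (c) {
            case '"':
                in_string = true;
                break;
            case '[':
            case '{':
                if (++depth > deepest) deepest = depth;
                break;
            case ']':
            case '}':
                if (depth > 0) --depth;
                break;
            default:
                break;
        }
    }
    return deepest;
}
```

Comparing this number for the full file and for each cut-down piece would show whether the crash tracks nesting depth or something else, such as the sheer number of elements.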
Note for me: I've left branch … Am I making too many nested lists? Am I going round more than …

My guess is that it's not exiting a level of recursion somewhere in …

json <- paste0(
readLines("https://github.com/zemirco/sf-city-lots-json/raw/master/citylots.json"),
collapse = ""
)
test <- jsonify::from_json(json, simplify = FALSE)
#> Error: segfault from C stack overflow
test <- jsonify::from_json(json, simplify = TRUE)
#> Error: segfault from C stack overflow

Testing recursion depth:

library(Rcpp)
cppFunction(
code = '
void recurse(unsigned int i) {
// no base case: recurses until the C stack runs out
Rcpp::Rcout << i << std::endl;
recurse(i+1);
}
')
recurse(0)
# ...
# 165736
# Error: C stack usage 7969204 is too close to the limit

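The test above puts a number on the limit: 165736 nested calls of a very small function before R stopped it, and a recursive function with more local state per call will typically fit fewer. A common way around the limit is to keep pending work on a heap-allocated stack instead of the call stack. A minimal C++17 sketch of that technique on a hypothetical index-based tree (`Node`, `max_depth` and `make_chain` are illustrative names, not jsonify or rapidjson code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tree representation: children are indices into one flat
// vector, so building or destroying a very deep tree never recurses.
struct Node {
    std::vector<std::size_t> children;
};

// Depth-first walk that keeps pending work on a heap-allocated std::vector
// instead of the C call stack. Returns the maximum depth (root = 1).
// Assumes `root` is a valid index into `nodes`.
std::size_t max_depth(const std::vector<Node>& nodes, std::size_t root) {
    std::size_t deepest = 0;
    std::vector<std::pair<std::size_t, std::size_t>> pending;  // (node, depth)
    pending.emplace_back(root, 1);
    while (!pending.empty()) {
        auto [id, depth] = pending.back();  // copy, so pop_back() is safe
        pending.pop_back();
        if (depth > deepest) deepest = depth;
        for (std::size_t child : nodes[id].children) {
            pending.emplace_back(child, depth + 1);
        }
    }
    return deepest;
}

// Builds a single chain 0 -> 1 -> ... -> n-1, i.e. a tree n levels deep.
std::vector<Node> make_chain(std::size_t n) {
    std::vector<Node> nodes(n);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        nodes[i].children.push_back(i + 1);
    }
    return nodes;
}
```

For example, `max_depth(make_chain(1000000), 0)` returns 1000000: only the `pending` vector grows with depth, so the walk goes far deeper than the recursive test did. The trade-off is that any per-level state a recursive version would keep in local variables has to be stored explicitly in the stack entries.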
@knapply, following your …

rl <- readLines("~/Documents/github/sf-city-lots-json/citylots.json")
js <- paste(rl, collapse = "")
system.time({
res <- jsonify:::rcpp_from_json2( js, TRUE, TRUE )
})
# user system elapsed
# 13.171 0.469 13.675
head( res$features[[1]][[2]] )
#   MAPBLKLOT  BLKLOT BLOCK_NUM LOT_NUM FROM_ST TO_ST    STREET ST_TYPE ODD_EVEN
# 1   0001001 0001001      0001     001       0     0   UNKNOWN    <NA>        E
# 2   0002001 0002001      0002     001       0     0   UNKNOWN    <NA>        E
# 3   0004002 0004002      0004     002       0     0   UNKNOWN    <NA>        E
# 4   0005001 0005001      0005     001     206   286 JEFFERSON      ST        E
# 5   0006001 0006001      0006     001     350   366 JEFFERSON      ST        E
# 6   0007001 0007001      0007     001    2936  2936      HYDE      ST        E

On branch …:

rl <- readLines("~/Documents/github/sf-city-lots-json/citylots.json")
js <- paste(rl, collapse = "")
system.time({
res <- from_json( js )
})
# user system elapsed
# 11.628 0.340 11.981
system.time({
res <- from_json( js, simplify = FALSE )
})
# user system elapsed
# 5.563 0.277 5.846

@knapply I'm going to send this to CRAN in a day or so to get the segfault issue resolved. But I still think some sort of standardisation between the two libraries is worth discussing. Also, how should I add you to the …

I didn't realize you were using this here, but...
... is what I use elsewhere. Thanks.

Merged into master.

I hate to bring bad news:

fail <- jsonify::from_json( # simplifying fails
jsonify::to_json(matrix("yo", nrow = 1e3, ncol = 1e3))
)
#> Error: segfault from C stack overflow
fail <- jsonify::from_json(
jsonify::to_json(replicate(1e5, letters, simplify = FALSE))
)
#> Error: segfault from C stack overflow
success <- jsonify::from_json( # non-simplify
jsonify::to_json(matrix("yo", nrow = 1e3, ncol = 1e3)),
simplify = FALSE
)
success <- jsonify::from_json( # non-simplify
jsonify::to_json(replicate(1e5, letters)),
simplify = FALSE
)
success <- jsonify::from_json( # less rows
jsonify::to_json(matrix("yo", nrow = 1e2, ncol = 1e3))
)
success <- jsonify::from_json( # less rows
jsonify::to_json(replicate(1e4, letters, simplify = FALSE))
)

I don't think I would've come across this if I weren't benchmarking things, and I haven't been able to recreate it with "real" data.

Ha, no worries. But it now seems to be only in the "simplify" logic rather than the "parse" logic, which is a step in the right direction. Looking at it, it's probably in my recursive …

It is working on files it wasn't before, so there's definitely progress! These "less rows" versions from the example above...

success <- jsonify::from_json( # less rows
jsonify::to_json(matrix("yo", nrow = 1e2, ncol = 1e3))
)
success <- jsonify::from_json( # less rows
jsonify::to_json(replicate(1e4, letters, simplify = FALSE))
)

... both return matrices...

class(
jsonify::from_json(
jsonify::to_json(matrix("yo", nrow = 1e2, ncol = 1e3))
)
)
#> [1] "matrix" "array"
class(
jsonify::from_json(
jsonify::to_json(replicate(1e4, letters, simplify = FALSE))
)
)
#> [1] "matrix" "array"

I'm not sure if things run through …

I'll dive in and take a closer look at some point today. Thanks for finding this!

So, some investigation has led me to …
@knapply, does anything spring to mind to indicate why …

Pre-allocating does it:

case rapidjson::kStringType: {
//out[i] = "test";
// GetStringLength() gives the length directly instead of scanning for '\0'
size_t sl = child.GetStringLength();
std::string s = std::string( child.GetString(), sl);
out[i] = s;
update_rtype< STRSXP >( r_type );
break;
}

m <- matrix("yo", nrow = 1e3, ncol = 1e3)
js <- jsonify::to_json(m)
fail <- jsonify::from_json( js ) # this input segfaulted before
str( fail )
#> chr [1:1000, 1:1000] "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" "yo" ..

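The fix above writes each value into an already-allocated `out` by index. A sketch of the same pre-allocation idea for the matrix case in plain C++, assuming the array of rows (for example `[["yo","yo"],["yo","yo"]]`) has already been read into a `std::vector<std::vector<std::string>>`: size the buffer once to `nrow * ncol`, then write each element straight into its column-major slot, which is the order R stores matrices in. `to_column_major` is a hypothetical helper, not jsonify's simplify code:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: flattens a row-major array of rows into one
// column-major buffer, the layout R uses for a matrix. Returns std::nullopt
// when the rows are ragged, since they have no rectangular shape to fill.
std::optional<std::vector<std::string>>
to_column_major(const std::vector<std::vector<std::string>>& rows) {
    const std::size_t nrow = rows.size();
    const std::size_t ncol = nrow == 0 ? 0 : rows[0].size();
    for (const auto& row : rows) {
        if (row.size() != ncol) return std::nullopt;
    }
    // Allocate once, up front; nothing is appended inside the loops.
    std::vector<std::string> out(nrow * ncol);
    for (std::size_t i = 0; i < nrow; ++i) {
        for (std::size_t j = 0; j < ncol; ++j) {
            out[i + j * nrow] = rows[i][j];  // element (i, j), column-major
        }
    }
    return out;
}
```

Because the buffer never grows, the work stays linear in `nrow * ncol`, which matters at the 1e3 by 1e3 size from the failing example.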
Nice! Sorry for the delay, but fingers crossed!

No worries. It's on CRAN now too, so I'll close this until you find another issue :)

For example, my Google timeline history.
This is not a FileReadStream or d.Parse() error, as this works … where json is my Google timeline history. So the error is in the parsing-to-R phase...
... into the rabbit hole we go...