Large data.frame is corrupted when row.names is included in 1.7-3, but not 0.6-8 #16

Closed
SurajGupta opened this issue Feb 10, 2014 · 2 comments

Comments

@SurajGupta

I have a large data.frame (the encoded message is around 33MB) that is encoded by my client and sent to the server (both on Windows 7, using R 2.15.3). Under v. 0.6-8 the server receives and processes it correctly, but under v. 1.7-3 the data is corrupted. When I send the SEXP to the server, the server responds saying that everything is OK. The corruption appears when validating the SEXP at the server (and thus when retrieving the same SEXP from the server).

Using 0.6-8, when I call str() and then str(rownames()) on the data.frame, I see the following on the server:

Rserve: Ok, ready to answer queries.
'data.frame':   1500003 obs. of  3 variables:
 $ Date    : Date, format: "2014-02-10" "2014-02-09" ...
 $ Integers: int  0 1 2 3 4 5 6 7 8 9 ...
 $ Doubles : num  -31.123 0.954 0.592 0.389 0.467 ...
 chr [1:1500003] "2" "3" "4" "5" "6" "7" "8" "9" "10" ...

However, using 1.7-3, when I call str() I see the following:

Rserve: Ok, ready to answer queries.
List of 3
 $ : Date[1:1500003], format: "2014-02-10" "2014-02-09" ...
 $ : int [1:1500003] 0 1 2 3 4 5 6 7 8 9 ...
 $ : num [1:1500003] -31.123 0.51 0.237 0.71 0.882 ...
 - attr(*, "assigns2")=Error: cannot allocate vector of size 5.7 Mb
In addition: Warning messages:
1: In do.call(str, c(list(object = obj), aList, list(...)), quote = TRUE) :
  Reached total allocation of 2047Mb: see help(memory.size)
2: In do.call(str, c(list(object = obj), aList, list(...)), quote = TRUE) :
  Reached total allocation of 2047Mb: see help(memory.size)
3: In do.call(str, c(list(object = obj), aList, list(...)), quote = TRUE) :
  Reached total allocation of 2047Mb: see help(memory.size)
4: In do.call(str, c(list(object = obj), aList, list(...)), quote = TRUE) :
  Reached total allocation of 2047Mb: see help(memory.size)

And calling str(rownames()) using 1.7-3 just shows NULL at the server.

It's interesting that the exact same data.frame, without any explicit row names (so just using the default row names for a data.frame), causes no corruption. So perhaps there's a bug related to attributes in 1.7-3? After sending the data.frame without row names and then sending the data.frame with row.names, the Rserve process is using just under 200MB. But when I call str(), Rserve's memory use shoots up to over 2GB. There is nowhere near 2GB of data in R.
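To illustrate the difference between the two cases, here is a minimal sketch in R. It is not the actual data: the size `n` and the names `df_default` and `df_named` are made up for illustration, and the columns only mimic the ones in the str() output above. It shows that default row names are stored compactly, while explicit row names become a full-length character vector in the "row.names" attribute, so only the second case sends a very large attribute.

```r
# Hypothetical miniature of the data.frame from the report (n is far smaller here).
n <- 10L
df_default <- data.frame(
  Date     = Sys.Date() - seq_len(n) + 1L,
  Integers = seq_len(n) - 1L,
  Doubles  = runif(n)
)

# Same data, but with explicit character row names "2", "3", ...
df_named <- df_default
rownames(df_named) <- as.character(seq_len(n) + 1L)

# .row_names_info() reports automatic row names as a negative count
# and explicit row names as a positive count.
info_default <- .row_names_info(df_default)
info_named   <- .row_names_info(df_named)

# With type 0L it returns the internal "row.names" attribute itself:
# a short integer vector for automatic row names, a length-n character
# vector for explicit ones.
attr_default <- .row_names_info(df_default, 0L)
attr_named   <- .row_names_info(df_named, 0L)

print(c(default = info_default, named = info_named))
print(c(default = length(attr_default), named = length(attr_named)))
```

With 1500003 rows, the explicit case carries a character vector of that length as an attribute, which would be consistent with the problem only showing up when row.names is included.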

Note that in both cases (0.6-8 and 1.7-3) I am using the exact same installation of R. The only difference between the two tests is literally the Rserve folder under site-library. I've triple-checked that I'm following the protocol as specified, but of course I could have missed something.

I serialized the bytes to disk for the server call that assigns the data.frame. It can be found here: https://dl.dropboxusercontent.com/u/10558029/encode.7z
Can someone else reproduce the behavior I'm seeing?

@s-u
Owner

s-u commented Feb 18, 2014

Sorry that it took so long. It took a while to get the binary output (exotic compression of text that has to be parsed into binary).

Anyway, issue found: two lines were swapped in the protection of pairlists. Attributes are pairlists, so it took a very large attribute list (here containing row names) to trigger it.

@SurajGupta
Author

much appreciated! I'll test my client when the next version of Rserve gets rolled out.
