iput/iget should not write temp files to disk before streaming to iRODS #28

Closed
MartinSchobben opened this issue Apr 27, 2023 · 2 comments
Labels: enhancement (New feature or request)

Comments

@MartinSchobben
Collaborator

MartinSchobben commented Apr 27, 2023

I think the solution is to open a connection to iRODS and then stream from memory straight to the final destination. I do not know yet what this would look like with a connection to iRODS, but locally it could look like this:

# test object
x <- matrix(1:100, 10, 10)

# serialize the R object in memory to a raw vector (connection = NULL)
y <- serialize(x, connection = NULL)
# length of the raw vector, used for chunking
size_y <- length(y)
# split point; integer division so the two chunks also line up for odd lengths
half <- size_y %/% 2
# make a file -> this should then be an object on iRODS
fil <- tempfile()
# this is an R connection (IO stream object) -> this should become a connection to iRODS REST
tmp <- file(fil)
# open the connection
open(tmp, "wb")
# chunk 1
writeBin(y[1:half], tmp)
# chunk 2
writeBin(y[(half + 1):size_y], tmp)
# close the connection
close(tmp)

# open connection -> this should become a connection to iRODS REST
con <- file(fil, "rb")
# read object -> back to memory
# chunk 1 (`fil` would work as well but I use a connection here as it is
# closer to the iRODS REST situation)
x1 <- readBin(con, raw(), n = half)
# chunk 2
x2 <- readBin(con, raw(), n = size_y - half)
# close the connection
close(con)
# fuse chunks
z <- c(x1, x2)
# check if complete
identical(z, y)
# unserialize
unserialize(z)
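
A variation that skips the temporary file entirely might look like the following. This is only a sketch: `rawConnection()` is used as an in-memory stand-in for the iRODS REST connection, and `chunk_size` is an arbitrary value chosen for illustration, not something taken from the package.

```r
# test object, serialized in memory to a raw vector
x <- matrix(1:100, 10, 10)
y <- serialize(x, connection = NULL)

# hypothetical chunk size in bytes (assumption, pick whatever the server accepts)
chunk_size <- 64L
# 1-based start offset of every chunk
starts <- seq(1L, length(y), by = chunk_size)

# in-memory sink standing in for the remote iRODS object (assumption)
sink_con <- rawConnection(raw(0), "wb")
for (s in starts) {
  # last chunk may be shorter than chunk_size
  e <- min(s + chunk_size - 1L, length(y))
  writeBin(y[s:e], sink_con)
}
# collect everything that was written, then close the connection
z <- rawConnectionValue(sink_con)
close(sink_con)

# check that the bytes and the object survive the round trip
identical(z, y)
identical(unserialize(z), x)
```

Nothing touches the disk here; in a real client each `writeBin()` call would be replaced by whatever sends one chunk to the server.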

@korydraughn

Yes, that is how istream works. It reads bytes from stdin and sends chunks (via an in-memory buffer) to the iRODS server. The downside is that the length of the input stream is unknown. That means istream does not support parallel transfer.

Because you're using the REST API, you won't have parallel transfer available either. Just something to keep in mind.
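
To illustrate the pattern described above (reading an input of unknown length through a fixed-size buffer), here is a rough R sketch. It is not how istream itself is implemented: `rawConnection()` stands in for stdin, `buffer_size` is a made-up value, and the chunks are simply collected in a list where a real client would send them to the server.

```r
# stand-in for stdin or any stream whose total length is not known up front
payload <- serialize(matrix(1:100, 10, 10), connection = NULL)
src <- rawConnection(payload, "rb")

# hypothetical buffer size in bytes (assumption)
buffer_size <- 32L
chunks <- list()
repeat {
  buf <- readBin(src, raw(), n = buffer_size)
  # zero bytes returned means the stream is exhausted
  if (length(buf) == 0L) break
  # a real client would send `buf` to the server here
  chunks[[length(chunks) + 1L]] <- buf
}
close(src)

# reassemble and compare with what went in
received <- do.call(c, chunks)
identical(received, payload)
```

Because the loop only learns the total size once the stream ends, the chunks have to go out one after another, which is why this style of transfer is serial.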

@MartinSchobben
Collaborator Author

MartinSchobben commented Apr 27, 2023

Yes, I guess we have to live with that. But the current implementation of the serial approach on the R side is poor, as I designed it to write to disk first and then send to the REST API. The approach shown above might avoid this. I am noting it here as a reminder to look into it.

MartinSchobben added a commit to MartinSchobben/irods_client_library_rirods that referenced this issue May 7, 2023
MartinSchobben added a commit to MartinSchobben/irods_client_library_rirods that referenced this issue May 9, 2023
MartinSchobben added a commit to MartinSchobben/irods_client_library_rirods that referenced this issue May 20, 2023
trel added the enhancement (New feature or request) label Mar 15, 2024