bucket and files fxns should return data frames #11

Open
sckott opened this issue Oct 17, 2023 · 3 comments
Comments

@sckott
Member

sckott commented Oct 17, 2023

...and we should get really opinionated about what the columns of those data frames should be.

Originally via @seankross in #3 (review)

@sckott
Member Author

sckott commented Oct 19, 2023

@seankross With what's on the main branch right now, all the inputs to our file fxns are vectorized. However, this issue is about returning data frames from the file and bucket fxns. The vectorized nature of the file fxns makes it easy, as pointed out in the s3fs docs, to pipe these fxns together. If we output tibbles, we won't be able to do that as easily (though it could still be done, I guess). One of the file fxns returns a tibble right now, whereas the others return vectors. Thoughts? If we returned dfs, I guess we could always run a fxn to get back the contents of the bucket and return that?

I think it's easier to think about always returning dfs from the bucket fxns.

@seankross
Collaborator

For the file fxns, I think it's mostly okay to deal in vectors, because ultimately you're acting on and pushing around paths. In the case of aws_file_attr, by contrast, you get multiple variables returned for every single file. I'll try to think of a better heuristic, but it's something like: no one-column data frames.

This is a poorly formed thought:
I am struggling to think of a function where I would want a data frame as input, but I like data frames as outputs when multiple variables are returned per function input. Then you could do tidyverse stuff to that data frame and grab the columns you need for the next function in the pipeline.

@sckott
Member Author

sckott commented Oct 20, 2023

Thanks for your feedback.

On your last thought, that makes sense. Fxns in this package may not be piped into one another; more likely, the output of a fxn in this pkg will go into a tidyverse pipeline.
