
Connecting to files in S3 should recognize parquet as a valid file format #32

Closed
bsweger opened this issue May 15, 2024 · 3 comments

@bsweger
Contributor

bsweger commented May 15, 2024

When working with a cloud-enabled hub that doesn't have parquet specified as a valid file_format in admin.json, hubData is unable to connect to the model-output files in S3.

I assume that's because hubData checks files against the listed file_formats.

But for hubs on the cloud, I imagine we'd want hubData to recognize parquet files, regardless of the formats accepted by the hub during the submission process.


```r
> hub_path <- s3_bucket("bsweger-flusight-forecast/")
> hub_con <- connect_hub(hub_path)
Warning message:
In connect_hub(hub_path) :
  No files of file format "csv" found in model output directory.
```
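For context, the `file_format` setting referred to above lives in the hub's `admin.json`. A minimal sketch of the relevant key for a CSV-only hub, like the one in this issue, might look like the fragment below. Only this one key is shown; a real `admin.json` has other required fields, so this is not a complete or valid config on its own.

```json
{
  "file_format": ["csv"]
}
```

Adding `"parquet"` to this array would make hubData look for parquet files by default, but it would also make parquet an accepted submission format, which is the coupling this issue questions.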
@annakrystalli
Member

annakrystalli commented May 17, 2024

So there is an option to override the file format set in the config. I'm trying to test it out, but connect_hub() keeps hanging on me when using the "bsweger-flusight-forecast/" bucket, although that could well be my internet, as we've been having problems with it.

Can you try running

```r
connect_hub(hub_path, file_format = "parquet")
```

and see whether it works?

Agreed, though, that overriding it manually isn't ideal. I'm not sure whether always looking for parquet files is something we would want to allow. It moves away from the config but off the top of my head, I can't think why that would be a problem.

@bsweger
Contributor Author

bsweger commented May 17, 2024

@annakrystalli it does work, thank you for that! (Though if you continue to have trouble using connect_hub() against these files, please let me know.)

> It moves away from the config but off the top of my head, I can't think why that would be a problem.

I've been thinking about this too. The config files, validations, etc. are critical to the functioning of a hub, but I'm not as clear on how strict our tools should be on the other end.

I'm gonna close this because there's already a way to use hubData with parquet files on a CSV-only hub (and it was right there in the docs; sorry for missing that!).

bsweger closed this as completed May 17, 2024
@annakrystalli
Member

Great!

> though if you continue to have trouble using connect_hub() against these files, please let me know.

Will do! Excited to test it out 😃
