Create proof-of-concept for using S3 triggers for automated conversion of model-output files #52

Closed
4 tasks done
bsweger opened this issue Mar 22, 2024 · 3 comments

Comments

@bsweger
Collaborator

bsweger commented Mar 22, 2024

There was some conversation here that resulted in the conclusion that we should not rely on GitHub CI actions to trigger the conversion of incoming model-output files to parquet format.

As a next step, I'd like to explore using S3 event notifications as a way to invoke actions when a model-output file is written to a hub's S3 bucket.

Specifically, these notifications:

  • New object created events
  • Object removal events
  • Object ACL PUT events?
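
For reference, here is a rough sketch of what a notification configuration covering these event types might look like if it were scripted rather than set up by hand. The event type strings are the standard S3 ones; the function name `build_notification_config`, the Lambda ARN, and the `raw/model-output/` prefix filter are assumptions made for illustration, not settings taken from this issue.

```python
def build_notification_config(lambda_arn: str, prefix: str = "raw/model-output/") -> dict:
    """Build an S3 notification configuration that targets one Lambda function.

    The prefix filter is an assumption: it limits notifications to objects
    under the raw model-output location so other writes do not invoke the function.
    """
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": [
                    "s3:ObjectCreated:*",  # new object created events
                    "s3:ObjectRemoved:*",  # object removal events
                    # "s3:ObjectAcl:Put",  # object ACL PUT events (still an open question)
                ],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


# Placeholder ARN, for illustration only.
example_config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:example-convert"
)

# With boto3 installed and credentials configured, a dict like this could be applied with:
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="hubverse-cloud", NotificationConfiguration=example_config
#   )
```

The Lambda function would also need a resource policy that allows S3 to invoke it, which is not shown here.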

At a high level, the idea is to invoke our prototype "transform model-output file to parquet" function automatically whenever a model-output file is uploaded to S3 (the upload happens via a GitHub action).

model submission PR merged -> model-output data syncs to S3 -> S3 "new object created" event triggers an AWS Lambda version of the "convert data to parquet" function
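
As a minimal sketch of the last step in that flow, a Lambda handler could read the bucket and key out of each S3 event record and pass them to the conversion code. The `Records` / `s3` / `object` layout and the URL-encoded key are the standard S3 notification format; `convert_to_parquet` is a hypothetical stand-in for the prototype transform function, and its return value here is only a placeholder.

```python
from urllib.parse import unquote_plus


def convert_to_parquet(bucket: str, key: str) -> str:
    """Hypothetical placeholder for the real conversion function."""
    return f"would convert s3://{bucket}/{key}"


def lambda_handler(event, context):
    """Handle an S3 event notification and convert each newly created object."""
    results = []
    for record in event.get("Records", []):
        # Only act on "new object created" events (Put, Post, Copy, ...).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(convert_to_parquet(bucket, key))
    return results
```

Records for other event types are skipped, which matches scoping the first pass to object creation only.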

Definition of done:

  • data conversion function automatically runs when model-output files are sent to s3://hubverse-cloud/raw/model-output
  • converted parquet files are written to s3://hubverse-cloud/model-output
  • converted parquet files are accessible via hubData
  • uploading a new version of a model-output file (i.e., with same name) to s3://hubverse-cloud/raw/model-output triggers the same data conversion process as uploading a new file
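
One way to picture the source-to-destination mapping in the list above is a small helper that turns a raw key into the key of its converted file. This is a sketch under assumptions: it supposes the converted file keeps the same relative path and only swaps the extension for `.parquet`, and the helper name `destination_key` and the example file name are made up for illustration.

```python
from pathlib import PurePosixPath

RAW_PREFIX = "raw/model-output"
CONVERTED_PREFIX = "model-output"


def destination_key(raw_key: str) -> str:
    """Map a key under raw/model-output to the key of its converted parquet file.

    Raises ValueError if the key is not under RAW_PREFIX.
    """
    relative = PurePosixPath(raw_key).relative_to(RAW_PREFIX)
    return str(PurePosixPath(CONVERTED_PREFIX) / relative.with_suffix(".parquet"))


# Hypothetical file name, for illustration only.
example = destination_key("raw/model-output/team-model/2024-03-22-team-model.csv")
```

Because the destination key in this sketch depends only on the raw key, re-uploading a file with the same name would produce the same destination key, so the converted file would be overwritten rather than duplicated.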

The AWS resources for this will be created manually (i.e., no need to incorporate into our infrastructure as code process unless we decide this solution will work for us).

@bsweger bsweger self-assigned this Mar 22, 2024
@bsweger bsweger added the cloud work related to cloud-enabled hubs label Mar 22, 2024
@bsweger
Collaborator Author

bsweger commented Mar 22, 2024

Let's scope this work to the "new object created" event. If it seems like a good way to proceed, the next step would be to code the corresponding action for when a model-output file is deleted.

@bsweger
Collaborator Author

bsweger commented Mar 22, 2024

@bsweger bsweger added this to the hubverse cloud sync milestone Apr 5, 2024
@bsweger
Collaborator Author

bsweger commented Apr 9, 2024

This is done--I gave @annakrystalli a demo on how it works and we agreed that we should proceed with the use of AWS event notifications + a lambda function to handle conversion of the model-output files.

I tried (and failed) to record the demo, but can do it at the next dev meeting for anyone interested.
