GCS: Bulk uploading #2935

Closed
AkhileshNegi opened this issue Jul 21, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@AkhileshNegi
Member

AkhileshNegi commented Jul 21, 2023

Is your feature request related to a problem? Please describe.
Our current method of uploading files to GCS does not scale well: we download each file from Gupshup and then upload it to GCS.
Once a file is uploaded, we update the messages_media table with its GCS URL, which organizations can then use.

Describe the solution you'd like
Find an alternative way to bulk upload files to GCS.

Some sample files

https://filemanager.gupshup.io/fm/wamedia/ColoredCow/1099e8e4-83e9-4169-b260-ceb394d6574f
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/48d5b347-c6c3-4cf8-9583-0a7b76cd3606
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/8176d04e-970d-4e2a-9987-2b3da974421a
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/492e3e32-755a-4b32-b08a-1cb35870e46e
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/a27697f8-b4c1-4305-8572-17e346aadfa3
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/69edddc3-cd52-453a-af66-f2d3e0959d8a
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/2f08a86e-3e4d-4c7f-9c35-e1f0c06103b5
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/39d5b91b-a152-4fb1-b182-2014921856a6
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/abc6a2ea-2d2d-4828-9197-3566bf0f4bf8
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/61c55695-2e8e-4518-b327-96bf1e064836
https://filemanager.gupshup.io/fm/wamedia/ColoredCow/136ed31e-f2d0-4f89-8137-6a1dfd577dd0
AkhileshNegi added the enhancement (New feature or request) label Jul 21, 2023
@dlobo
Collaborator

dlobo commented Jul 28, 2023

I would say, let's get a standalone script working first that does the following:

  1. To begin with, assume all credentials and the bucket name come from a config file or environment variables
  2. Given a set of ids, urls, and relative locations in the bucket, do a bulk upload to GCS. Also move the files to the right location in the bucket
  3. Return a set of ids with their GCS urls
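
A minimal sketch of such a standalone script, under stated assumptions: `MediaItem`, `load_bucket_name`, `bulk_upload`, the `GCS_BUCKET` variable name, and the returned URL form are all hypothetical and not taken from the issue. The upload step is passed in as a callable so the sketch stays independent of any particular GCS client, and each file is written directly to its target location, which stands in for the separate "move" step.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Mapping, Tuple
from urllib.request import urlopen


@dataclass(frozen=True)
class MediaItem:
    id: int        # row id to report back to the caller
    url: str       # source URL (for example a Gupshup file link)
    location: str  # object path relative to the bucket root


def load_bucket_name(env: Mapping[str, str] = os.environ) -> str:
    # Step 1: GCS_BUCKET is a hypothetical variable name, not from the issue.
    return env["GCS_BUCKET"]


def fetch_bytes(url: str) -> bytes:
    # Default downloader; any callable mapping a URL to bytes can replace it.
    with urlopen(url, timeout=30) as response:
        return response.read()


def bulk_upload(
    items: Iterable[MediaItem],
    bucket: str,
    upload: Callable[[str, str, bytes], None],
    fetch: Callable[[str], bytes] = fetch_bytes,
    workers: int = 8,
) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Steps 2 and 3: copy items concurrently, return (uploaded, failed)."""

    def transfer(item: MediaItem):
        try:
            # Writing straight to item.location avoids a separate move step.
            upload(bucket, item.location, fetch(item.url))
        except Exception as error:
            return item.id, None, str(error)
        # Assumed URL form; adjust to however the bucket is actually served.
        url = f"https://storage.googleapis.com/{bucket}/{item.location}"
        return item.id, url, None

    uploaded: Dict[int, str] = {}
    failed: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for item_id, gcs_url, error in pool.map(transfer, items):
            if error is None:
                uploaded[item_id] = gcs_url
            else:
                failed[item_id] = error
    return uploaded, failed
```

In a real run, `upload` could wrap a GCS client call (for instance `bucket.blob(path).upload_from_string(data)` from the google-cloud-storage package), with credentials resolved from the environment. Returning failures separately lets the caller retry only the ids that did not make it.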

Once the above is done, we can integrate it with the Elixir codebase and then do:

  1. For each org, generate a set of ids, urls and locations
  2. Call the above with the right config for the org
  3. Use the return value to set the gcs_url
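
The integration steps above could look roughly like the following. This is only an illustration written in Python rather than Elixir; the row keys (`id`, `organization_id`, `url`, `gcs_url`), the object layout in `location_for`, and the `upload_batch` callable are assumptions made for the sketch, not names confirmed by the issue.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Mapping, MutableMapping, Tuple

Batch = List[Tuple[int, str, str]]  # (id, source url, location in bucket)


def location_for(row: Mapping) -> str:
    # Hypothetical object layout; the issue does not specify one.
    return f"org_{row['organization_id']}/media/{row['id']}"


def batches_by_org(rows: Iterable[Mapping]) -> Dict[int, Batch]:
    # Step 1: build one batch of (id, url, location) per organization.
    batches: Dict[int, Batch] = defaultdict(list)
    for row in rows:
        entry = (row["id"], row["url"], location_for(row))
        batches[row["organization_id"]].append(entry)
    return dict(batches)


def sync_all(
    rows: Iterable[MutableMapping],
    configs: Mapping[int, Mapping],
    upload_batch: Callable[[Mapping, Batch], Dict[int, str]],
) -> Dict[int, str]:
    rows = list(rows)
    gcs_urls: Dict[int, str] = {}
    for org_id, batch in batches_by_org(rows).items():
        # Step 2: call the uploader with that organization's own config.
        gcs_urls.update(upload_batch(configs[org_id], batch))
    # Step 3: write the returned URLs back; rows without a result stay as-is.
    for row in rows:
        if row["id"] in gcs_urls:
            row["gcs_url"] = gcs_urls[row["id"]]
    return gcs_urls
```

Keeping the uploader behind a single `upload_batch(config, batch)` call means the per-org loop does not need to know how the bulk upload is performed.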

@AkhileshNegi
Member Author

Closing this for now, as there were two big things we changed:

  1. We were also syncing outbound media to GCS, so starting a flow with media for 1000 contacts would create 1000 new entries. We changed the code to sync only inbound media, and it is under control now
  2. We'll tweak the code to reuse the same media id instead of creating a new entry every time, as mentioned in this issue
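
To illustrate the two changes, here is a small sketch under stated assumptions: `MediaRegistry` is an in-memory stand-in for the media table, and the `flow` and `gcs_url` field names are hypothetical. It shows why reusing one media id per source URL keeps a 1000-contact flow from producing 1000 entries, and how an inbound-only filter limits what gets synced.

```python
from typing import Dict, Mapping


class MediaRegistry:
    """In-memory stand-in for the media table, keyed by source URL."""

    def __init__(self) -> None:
        self._id_by_url: Dict[str, int] = {}

    def get_or_create(self, url: str) -> int:
        # Reuse the existing id when the same media URL is seen again.
        if url not in self._id_by_url:
            self._id_by_url[url] = len(self._id_by_url) + 1
        return self._id_by_url[url]

    def __len__(self) -> int:
        return len(self._id_by_url)


def should_sync(message: Mapping) -> bool:
    # Only inbound media without a GCS URL is a candidate for syncing.
    # 'flow' and 'gcs_url' are hypothetical field names.
    return message["flow"] == "inbound" and message.get("gcs_url") is None
```

For example, calling `get_or_create` 1000 times with the same URL leaves a single entry in the registry, and `should_sync` rejects every outbound message regardless of its media.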
