Add Firestore source #12
Hi Nivaldo, thank you for your contribution! I've added some small comments; can you please address them before we move forward?
Also, I'm interested in discussing a frontend for this data source. Can you please add your ideas for it here so we can get the discussion going? Thanks.
Hi Caio, thanks for your reply. I've addressed both issues (the copyright date and translating words into English) and also cleaned up a few leftover placeholders in metadata_list. As for the frontend, I'll soon add a comment detailing some of our ideas and showing a template of what we currently have.
Hi Nivaldo, thanks for your changes!
In the meantime, another PR was merged that also added a config source and caused some conflicts. Can you please resolve the conflicts and follow the style introduced by #14, including updating the README with instructions on how to use this new source?
Sorry for the additional work and thanks again.
Hi @nivaldoh, I see that you made some changes recently. Can you please let me know once this is ready for review? Thanks.
Hi @caiotomazelli, at the moment I still need to finish testing. Also, there are three uploaders not yet compatible with the Firestore integration. They were originally intended for a later PR, but I'm taking the opportunity to include them in this one. Changes should be finished soon, and I'll let you know once it's ready. Thanks.
A quick update I'd like to leave here: the code was running perfectly on DirectRunner, but I ran into a few issues while testing on Dataflow. The solution was to update/add a few dependencies in setup.py. For this PR I only bumped dependencies to newer minor releases, but we should consider fully updating all of them in the near future (e.g. 1.x.x to 2.x.x).
@caiotomazelli changes ready for review! A quick summary:
Had a few conflicts to solve after merging the recent PRs, but the code is ready for review once again.
Hey Nivaldo, thanks for working on it!
Yeah, thanks a lot for this contribution Nivaldo! It looks awesome! Let me know if you can find out why GitHub is messing with the diff, and I think we're okay to move on with the merge. Cheers,
Translate constants and remove unnecessary metadata
Add Firestore setup information to README
Refactor Firestore integration following the style introduced in PR #14
Allow user/application to set a custom source_name and destination_name on Firestore documents
Change metadata to avoid forcing the user to fill all Dataflow configuration parameters (Sheets, JSON and Firestore)
Provide support for the following destination types: Google Analytics data import, Google Analytics user list and Google Ads enhanced conversions
Merge with upstream and resolve conflicts
Fix dependency issues when running on Dataflow
Fix conflicts from latest PR
Reinsert gitignore
Hey @caiotomazelli and @astivi, changes ready for review again. Sorry for the confusion!
Expand implementation and documentation to support improvements added in #24
I noticed that #24 adds improvements to Google Ads metadata, so I also updated the Firestore source implementation and docs to maintain parity with the new metadata list in the Sheets template.
Thanks Nivaldo!
Please review the bit about static methods. Otherwise, looking good!!
Thanks a lot for contributing and sorry for the back and forth!
Nivaldo, all set, thanks a lot for your patience and collaboration!
Hello. I've implemented a Firestore source, which is meant to work as an alternative to Sheets for parameterization purposes.
Some of our clients are unable to access the Spreadsheets domain for security reasons, and Firestore proved to be a great option. It provides reliable, dynamic storage and is quite simple to use. Also, Megalist's expected usage should fall within the free tier.
Additionally, Firestore integrates well with App Engine. In a future PR, I'd like to add a highly customizable App Engine form integrated with Firestore, which would provide an easy-to-use alternative to Sheets, especially for non-technical users who cannot access it.
Requirements:
For now, using Firestore requires a GCP project with Firestore in Native mode.
Usage:
The fields common to every upload type are: active (yes/no), bq_dataset, bq_table, source and type. Valid upload types and their required fields can be found in the firestore_execution_source file.
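As a minimal sketch, assuming each schedule document is read from Firestore as a plain Python dict, a check for the fields listed above could look like this. The helper names `missing_common_fields` and `is_active` are hypothetical, the case-insensitive handling of `active` is an assumption, and the per-type required fields from firestore_execution_source are not reproduced here.

```python
from typing import Any, Dict, List

# Fields that, per the PR description, every upload type carries.
COMMON_FIELDS = ("active", "bq_dataset", "bq_table", "source", "type")


def missing_common_fields(document: Dict[str, Any]) -> List[str]:
    """Return the common fields that are absent or empty in a schedule document.

    Hypothetical helper: Megalist's own validation may differ.
    """
    return [field for field in COMMON_FIELDS if not document.get(field)]


def is_active(document: Dict[str, Any]) -> bool:
    """Interpret the yes/no `active` flag (case-insensitive matching is an assumption)."""
    return str(document.get("active", "")).strip().lower() == "yes"


schedule = {
    "active": "yes",
    "bq_dataset": "my_dataset",
    "bq_table": "my_table",
    "type": "example_upload_type",  # placeholder, not a real Megalist type name
}
print(missing_common_fields(schedule))  # ['source']
print(is_active(schedule))  # True
```

A document flagged as incomplete here would still need its type-specific fields checked against the metadata in firestore_execution_source.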
As with Sheets, account IDs are stored separately: in this case, in a Firestore document called account_config within the same collection. In other words, the hierarchy is:
Firestore collection -> document entries for each schedule + account_config document.
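The hierarchy above can be illustrated with a short sketch that separates the account_config document from the per-schedule documents. It assumes the `google-cloud-firestore` client library (`Client().collection(...).stream()`, `DocumentSnapshot.id` and `.to_dict()`); `load_collection` and `split_documents` are hypothetical names, and the Firestore import is deferred so the splitting logic can be tried on plain dicts without GCP credentials.

```python
from typing import Any, Dict, Tuple

ACCOUNT_CONFIG_ID = "account_config"  # document name given in the PR description

Documents = Dict[str, Dict[str, Any]]


def load_collection(collection_name: str) -> Documents:
    """Read every document in a Firestore collection as {document_id: fields}.

    Needs the google-cloud-firestore package and application default credentials.
    """
    # Imported lazily so split_documents() can be used without the client library.
    from google.cloud import firestore

    client = firestore.Client()
    return {
        snapshot.id: snapshot.to_dict()
        for snapshot in client.collection(collection_name).stream()
    }


def split_documents(documents: Documents) -> Tuple[Dict[str, Any], Documents]:
    """Separate the account_config document from the per-schedule documents."""
    account_config = documents.get(ACCOUNT_CONFIG_ID, {})
    schedules = {
        doc_id: fields
        for doc_id, fields in documents.items()
        if doc_id != ACCOUNT_CONFIG_ID
    }
    return account_config, schedules


# In-memory data standing in for a real collection (field names are placeholders).
sample = {
    "account_config": {"example_account_id": "123-456-7890"},
    "daily_upload": {"active": "yes", "type": "example_upload_type"},
}
account_config, schedules = split_documents(sample)
print(account_config)  # {'example_account_id': '123-456-7890'}
print(sorted(schedules))  # ['daily_upload']
```

Against a real project, `split_documents(load_collection("your_collection"))` would perform the same split; the collection name is a placeholder.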
For Megalist to read its configuration from Firestore, the setup_firestore_collection command-line parameter must be provided. If setup_sheet_id is provided, Sheets will be used instead.
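The precedence described above (Sheets is used whenever setup_sheet_id is provided) could be expressed roughly as follows. This is a simplified stand-in using `argparse` rather than the pipeline's real option parsing, which may behave differently; `parse_setup_args` and `choose_config_source` are hypothetical names, and sources other than Sheets and Firestore are deliberately left out.

```python
import argparse
from typing import List, Optional


def parse_setup_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse only the setup parameters, ignoring any other pipeline flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--setup_sheet_id", default=None)
    parser.add_argument("--setup_firestore_collection", default=None)
    known_args, _unknown = parser.parse_known_args(argv)
    return known_args


def choose_config_source(args: argparse.Namespace) -> Optional[str]:
    """Pick the configuration source; setup_sheet_id takes precedence."""
    if args.setup_sheet_id:
        return "sheets"
    if args.setup_firestore_collection:
        return "firestore"
    return None  # other sources (e.g. JSON) are outside this sketch


print(choose_config_source(parse_setup_args(["--setup_firestore_collection", "megalist"])))  # firestore
print(choose_config_source(parse_setup_args(
    ["--setup_sheet_id", "abc", "--setup_firestore_collection", "megalist"])))  # sheets
```

Using `parse_known_args` lets unrelated runner flags pass through untouched, which mirrors how setup parameters sit alongside other Dataflow options on the same command line.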
For now, the Firestore source expects BigQuery parameters, since BigQuery is currently the only available ingestion option. This should be made more flexible in the future to allow other options such as GCS.
The list of parameter metadata was included in firestore_execution_source, and could be modularized in the future.
I have only been able to test uploads to Google Ads and Google Analytics so far, as we generally lack access/test data to other platforms. Help with further testing would be greatly appreciated.