Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Firestore source #12

Merged
merged 32 commits into from Jun 22, 2021
Merged

Add Firestore source #12

merged 32 commits into from Jun 22, 2021

Conversation

nivaldoh
Copy link
Contributor

Hello. I've implemented a Firestore source, which is meant to work as an alternative for Sheets for parametrization purposes.

  • Why Firestore?
    Some of our clients are unable to access the Spreadsheets domain for security purposes, and Firestore proved to be a great option. It provides reliable and dynamic storage, and is quite simple to use. Also, the expected usage level for Megalist should fall into the free tier.

Additionally, Firestore has great integration with App Engine. In a future PR, I’d like to add a highly customizable App Engine form integrated with Firestore, which provides an easy to use alternative to Sheets, especially for non-technical users unable to access it.

  • Requirements:
    For now, Firestore usage requires a GCP project with native Firestore mode.

  • Usage:
    The default fields for any upload type are: active (yes/no), bq_dataset, bq_table, source and type. Valid upload types and their required fields can be seen in the firestore_execution_source file.

As with Sheets, account IDs are included separately. In this case, in a Firestore document called account_config, within the same collection. In other words, the hierarchy is:
Firestore collection -> document entries for each schedule + account_config document.

In order to check Firestore, Megalist requires the setup_firestore_collection command line parameter. If setup_sheet_id is provided, Sheets will be used instead.

  • Improvement opportunities:
  1. For now, the Firestore source expects BigQuery parameters, as it is the only ingest option currently available. This should be made flexible in the future, to allow options such as GCS.

  2. The list of parameter metadata was included in firestore_execution_source, and could be modularized in the future.

  • Testing
    I have only been able to test uploads to Google Ads and Google Analytics so far, as we generally lack access/test data to other platforms. Help with further testing would be greatly appreciated.

@caiotomazelli caiotomazelli self-requested a review April 30, 2021 16:29
Copy link
Collaborator

@caiotomazelli caiotomazelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Nivaldo, thank you for your contribution! I've added some smalls comments, can you please address them before moving forward?

Also, I'm interested in discussing a frontend for this data source. Can you please add your ideas for it here so we can get the discussion going? Thanks.

megalist_dataflow/sources/firestore_execution_source.py Outdated Show resolved Hide resolved
megalist_dataflow/sources/firestore_execution_source.py Outdated Show resolved Hide resolved
@nivaldoh
Copy link
Contributor Author

Hi Caio, thanks for your reply. I've addressed both issues (copyright date and english words), and also cleaned up a few leftover placeholders in metadata_list.

As for the frontend, I'll add a comment soon detailing some of the ideas we had, as well as showing a template of what we currently have.

Copy link
Collaborator

@caiotomazelli caiotomazelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Nivaldo, thanks for your changes!

In the meantime, we had another PR merged that also added another config source and caused some conflicts. Can you please resolve the conflicts and follow the style introduced by #14 , including updating the Readme with instructions on how to use this new source?

Sorry for the additional work and thanks again.

@nivaldoh nivaldoh force-pushed the firestore-source branch 2 times, most recently from 7e81967 to a362f5b Compare May 7, 2021 15:04
@caiotomazelli
Copy link
Collaborator

Hi Nivaldo, thanks for your changes!

In the meantime, we had another PR merged that also added another config source and caused some conflicts. Can you please resolve the conflicts and follow the style introduced by #14 , including updating the Readme with instructions on how to use this new source?

Sorry for the additional work and thanks again.

Hi @nivaldoh, I see that you made some changes recently. Can you please let me know once this is ready for review? Thanks.

@nivaldoh
Copy link
Contributor Author

Hi @caiotomazelli, at the moment I still need to finish testing. Also, there are three uploaders not yet compatible with the Firestore integration. They were originally intended for a later PR, but I'm taking the opportunity to include them for this one. Changes should be finished soon, and I'll let you know once it's ready. Thanks.

@nivaldoh
Copy link
Contributor Author

A quick update I'd like to leave here: the code was running perfectly on DirectRunner, but I ran into a few issues while testing on Dataflow. The solution was to update/add a few dependencies on setup.py. I bumped dependencies on minor releases only for this PR, but we should consider fully updating all of them in the near future (i.e. 1.x.x to 2.x.x).

@nivaldoh
Copy link
Contributor Author

@caiotomazelli changes ready for review! A quick summary:

  • The Firestore source now supports all currently implemented destinations. I also added support for Google Ads enhanced conversions based on the information on the template Sheet, although it is currently not implemented. GA4 is also not implemented and I wasn't able to find information about the expected metadata, so we'll have to revisit this in the future.

  • I updated the README with configuration instructions as well as information about all currently supported Firestore schemas. However, the wiki might be better suited for schema information. I wasn't able to suggest changes there, so for now, all new info is in the README file.

  • As I mentioned in the previous comment, I ran into dependency issues and had to update the setup file. There seems to be at least one breaking change if we fully update all packages there, so for this PR I tried to modify them as little as possible.

@nivaldoh
Copy link
Contributor Author

Had a few conflicts to solve after merging the recent PRs, but the code is ready for review once again.

@caiotomazelli
Copy link
Collaborator

Had a few conflicts to solve after merging the recent PRs, but the code is ready for review once again.

Hey Nivaldo, thanks for working on it!
I think there is something funny going on with the merge, there are a bunch of files appearing on the diff that do not seem related to your changes. Can you please take a look and make sure that the diff only contains the changes you're proposing and not the ones that happened on previous commit that were already merged? Thanks a lot!

@astivi
Copy link
Collaborator

astivi commented May 26, 2021

Yeah, thanks a lot for this contribution Nivaldo! It looks awesome!
I took a look at it as it is, and everything looks great.

Let me know if you can find out why GitHub is messing with the diff and I think we're okay to move on with the merge.

Cheers,

Translate constants and remove unnecessary metadata
Add Firestore setup information to README
Refactor Firestore integration following the style introduced in PR #14
Allow user/application to set a custom source_name and destination_name on Firestore documents
Change metadata to avoid forcing the user to fill all Dataflow configuration parameters (Sheets, JSON and Firestore)
Provide support for the following destination types: Google Analytics data import, Google Analytics user list and Google Ads enhanced conversions
Allow user/application to set a custom source_name and destination_name on Firestore documents
Merge with upstream and resolve conflicts
Translate constants and remove unnecessary metadata
Add Firestore setup information to README
Refactor Firestore integration following the style introduced in PR #14
Allow user/application to set a custom source_name and destination_name on Firestore documents
Change metadata to avoid forcing the user to fill all Dataflow configuration parameters (Sheets, JSON and Firestore)
Fix dependency issues when running on Dataflow
Fix conflicts from latest PR
Reinsert gitignore
@nivaldoh
Copy link
Contributor Author

nivaldoh commented Jun 6, 2021

Hey @caiotomazelli and @astivi, changes ready for review again. Sorry for the confusion!

Expand implementation and documentation to support improvements added in #24
@nivaldoh
Copy link
Contributor Author

nivaldoh commented Jun 6, 2021

I noticed that #24 adds improvements to Google Ads metadata, so I also updated the Firestore Source implementation and docs to maintain parity with the new metadata list in the Sheets template.

Copy link
Collaborator

@astivi astivi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nivaldo!

Copy link
Collaborator

@caiotomazelli caiotomazelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the bit about static methods. Otherwise, looking good!!

Thanks a lot for contributing and sorry for the back and forth!

megalist_dataflow/sources/firestore_execution_source.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@caiotomazelli caiotomazelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nivaldo, all set, thanks a lot for your patience and collaboration!

@caiotomazelli caiotomazelli merged commit 6e554cb into google:main Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants