Add “partial data” states infrastructure #8141

Closed
43 tasks done
techanvil opened this issue Jan 24, 2024 · 12 comments
Labels
Module: Analytics Google Analytics module related issues P1 Medium priority PHP QA: Eng Requires specialized QA by an engineer Team M Issues for Squad 2 Type: Enhancement Improvement of an existing feature

Comments

@techanvil
Collaborator

techanvil commented Jan 24, 2024

Feature Description

Add the full infrastructure for determining and exposing the "partial data" states for audiences, custom dimensions and properties.

See partial data states in the design doc.


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • The Analytics module should have new selectors for detecting whether an audience, custom dimension or Analytics property (referred to as a resource in the following points) is in the "partial data" state.
    • A resource is considered to be in "partial data" state until it has been active for the full duration of the currently selected date range.
    • A resource is also considered to be in the "partial data" state if the GA4 property itself is in the gathering data state.
  • The partial data state should be determined by retrieving a report for the given resource, finding the date of its earliest event, and comparing that date with the start date of the current date range.
  • Similarly to how it's done for the gathering data states, the date of the earliest event, once determined, should be persisted on the server and made available in the client on page load.
    • The persisted date for a given resource, whenever available, should be used instead of making a report request to determine the partial data states in the resolvers of the partial data selectors.
    • Persisted dates for all resources should be reset whenever Analytics property or measurement ID changes, Analytics module is deactivated or Site Kit is reset.
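
The rules above can be summarised as a small pure function. This is a sketch under stated assumptions, not the Site Kit selector: it assumes dates are zero-padded YYYY-MM-DD strings (so plain string comparison matches chronological order), treats a null availability date as "could not be determined" (the IB below maps that case to partial data), and uses a hypothetical name, isResourcePartialData.

```javascript
/**
 * Hypothetical helper modelling the "partial data" rules from the AC.
 *
 * @param {Object}      options
 * @param {boolean}     options.isGatheringData      Whether the GA4 property is still gathering data.
 * @param {string|null} options.dataAvailabilityDate Earliest event date for the resource (YYYY-MM-DD), or null if unknown.
 * @param {string}      options.startDate            Start date of the selected date range (YYYY-MM-DD).
 * @return {boolean} True if the resource should be treated as being in the partial data state.
 */
function isResourcePartialData( { isGatheringData, dataAvailabilityDate, startDate } ) {
	// A property that is still gathering data puts every resource in partial data.
	if ( isGatheringData ) {
		return true;
	}

	// If the availability date could not be determined (e.g. a report error or
	// the shared dashboard), err on the side of partial data.
	if ( ! dataAvailabilityDate ) {
		return true;
	}

	// Zero-padded ISO dates compare correctly as strings. The resource has full
	// data only if it was already active on the first day of the range.
	return dataAvailabilityDate > startDate;
}
```

For example, a resource first seen on 2024-03-01 is not in partial data for a range starting 2024-03-01 or later, but is for any range starting earlier.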

Implementation Brief

Note: the following IB is heavily based on and inspired by the Data_Available state infrastructure for modules and custom dimensions. Any gap in the IB may be filled in by reassessing the implementation and comparing it with the aforementioned infrastructure.

PHP

  • Create class Google\Site_Kit\Modules\Analytics_4\Resource_Data_Availability_Date.
    • Accept Transients $transients in the constructor and initialize it as a field.
    • Use the constants VALID_CUSTOM_DIMENSION_SLUGS and VALID_AUDIENCE_SLUGS to store the valid, allowed custom dimension and audience slugs.
    • Define RESOURCE_TYPE_** constants for the audience, custom dimension and property resources.
    • The method get_resource_transient_name takes resource name and resource type parameters and returns the computed transient name, i.e. return "googlesitekit_{$resource_type}_{$resource_name}_data_availability_date";
    • The method get_resource_dates should return an associative array of the data availability dates of resources. This can be a multidimensional array, or the resources can be prefixed with the resource type.
    • Other methods (get_resource_date, set_resource_date, reset_resource_date, etc.) should be implemented similarly to how it is done in the Google\Site_Kit\Modules\Analytics_4\Custom_Dimensions_Data_Available class.
  • In Google\Site_Kit\Modules\Analytics_4 class:
    • Add $resource_data_available_date field and instantiate it with Resource_Data_Availability_Date in the constructor.
    • Create a new REST endpoint, POST:save-resource-data-availability-date, in the Analytics_4 module.
      • It should check whether the passed resource(s) in $data (audience, customDimension or property) are valid, and then persist the date values as timestamps in the DB using the $this->resource_data_available_date->set_resource_date method.
    • Expose the persisted resource data availability dates to the client using the googlesitekit_inline_modules_data filter in the register method.
    • Call $resource_data_available_date->reset_resource_date() in the on_deactivation method to reset all persisted dates on module deactivation.
    • Call $resource_data_available_date->reset_resource_date() in the $this->get_settings()->on_change() callback when the property ID or measurement ID changes, similarly to how it's done with $this->custom_dimensions_data_available->reset_data_available(), so that the persisted dates are reset when the Analytics property/measurement ID changes.
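
As a language-neutral illustration of the PHP class above, its naming, validation and reset behaviour can be modelled in JavaScript with a Map standing in for WordPress transients. This is a hypothetical sketch, not the real class: ResourceDataAvailabilityDateModel and its camelCase methods are illustrative counterparts of the PHP methods, the resource type values are inferred from the transient names reported in QA below, and properties are not checked against an allow-list here.

```javascript
// Illustrative model only; the real implementation is PHP using the Transients wrapper.
const RESOURCE_TYPE_AUDIENCE = 'audience';
const RESOURCE_TYPE_CUSTOM_DIMENSION = 'customDimension';
const RESOURCE_TYPE_PROPERTY = 'property';

class ResourceDataAvailabilityDateModel {
	constructor( { validCustomDimensionSlugs = [], validAudienceSlugs = [] } = {} ) {
		// Stand-in for WordPress transients: transient name -> { resourceName, resourceType, date }.
		this.transients = new Map();
		this.validSlugs = {
			[ RESOURCE_TYPE_CUSTOM_DIMENSION ]: validCustomDimensionSlugs,
			[ RESOURCE_TYPE_AUDIENCE ]: validAudienceSlugs,
		};
	}

	// Mirrors the naming scheme given in the IB.
	static getResourceTransientName( resourceName, resourceType ) {
		return `googlesitekit_${ resourceType }_${ resourceName }_data_availability_date`;
	}

	// Custom dimensions and audiences must be in their allow-lists; any
	// non-empty property name is accepted in this sketch.
	isValidResource( resourceName, resourceType ) {
		if ( resourceType === RESOURCE_TYPE_PROPERTY ) {
			return typeof resourceName === 'string' && resourceName.length > 0;
		}
		const allowed = this.validSlugs[ resourceType ];
		return Array.isArray( allowed ) && allowed.includes( resourceName );
	}

	setResourceDate( resourceName, resourceType, date ) {
		if ( ! this.isValidResource( resourceName, resourceType ) ) {
			return false;
		}
		const name = ResourceDataAvailabilityDateModel.getResourceTransientName( resourceName, resourceType );
		this.transients.set( name, { resourceName, resourceType, date } );
		return true;
	}

	getResourceDate( resourceName, resourceType ) {
		const name = ResourceDataAvailabilityDateModel.getResourceTransientName( resourceName, resourceType );
		const entry = this.transients.get( name );
		return entry ? entry.date : null;
	}

	// Multidimensional shape: { [resourceType]: { [resourceName]: date } }.
	getResourceDates() {
		const dates = {
			[ RESOURCE_TYPE_AUDIENCE ]: {},
			[ RESOURCE_TYPE_CUSTOM_DIMENSION ]: {},
			[ RESOURCE_TYPE_PROPERTY ]: {},
		};
		for ( const { resourceName, resourceType, date } of this.transients.values() ) {
			dates[ resourceType ][ resourceName ] = date;
		}
		return dates;
	}

	// With arguments, clears one resource; without, clears every persisted date
	// (deactivation, property/measurement ID change, Site Kit reset).
	resetResourceDate( resourceName, resourceType ) {
		if ( resourceName === undefined ) {
			this.transients.clear();
			return;
		}
		this.transients.delete( ResourceDataAvailabilityDateModel.getResourceTransientName( resourceName, resourceType ) );
	}
}
```

Keeping resource names and types alongside each stored value avoids parsing them back out of transient names, which would be ambiguous because names such as googlesitekit_post_type contain underscores.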

JS

  • Create assets/js/modules/analytics-4/datastore/partial-data.js file.
    • Create a fetch store for the aforementioned POST API.
    • Actions:
      • saveResourceDataAvailabilityDate takes an array of objects of the form {resource name, resource type, date} and saves them to the server using the fetch store.
    • Selectors:
      • getResourceDataAvailabilityDate(resourceName, resourceType): returns the date associated with the given resource if available; otherwise, it resolves to the first date in the last 90 days on which report data became available, using the associated resolver (described below). 90 days is chosen because it's the longest date range available in Site Kit.
      • is{audience|customDimension|Property}PartialData(resourceName):
        • Return true when GA4 is in the gathering data state.
        • Return false when the dataAvailabilityDate for the resource is the same as or earlier than the startDate of the currently selected date range.
        • Otherwise, return true. This also handles the case where the dataAvailabilityDate for a given resource cannot be determined due to errors or being in the shared dashboard.
    • Resolvers
      • getResourceDataAvailabilityDate:
        • Get reportArgs for the given resource.
        • For a property, these report args are similar to those returned by getSampleReportArgs from assets/js/modules/analytics-4/utils/report-args.js, with the following changes:
          • Start date: creation date of the current GA property.
          • End date: the reference date.
        • For audience, the reportArgs will include audienceResourceName as an additional dimension.
          • This will allow for a single report covering all audience resources; the resulting report can then be filtered in JS to get the earliest date for a given audience resource.
        • For Custom Dimension, report args should be the following:
          • Start date: creation date of the current GA property.
          • End date: the reference date.
          • Dimensions: date (as for the property resource) and customEvent:${ resourceName }.
          • Metric: eventCount.
          • See the getDataAvailabilityReportOptions selector in assets/js/modules/analytics-4/datastore/custom-dimensions-gathering-data.js and getSampleReportArgs in assets/js/modules/analytics-4/datastore/report.js for a more complete example. The implementation can largely be followed.
        • Make a simple report request for the given resource using the above report args.
        • Find the earliest date for which the report has data.
        • If there is any error or the user doesn't have permission (i.e. the property creation date cannot be accessed in the shared dashboard), return null and do not persist anything.
        • Otherwise, persist the date for the given resource using saveResourceDataAvailabilityDate and return the date.
  • Add the new store partial to assets/js/modules/analytics-4/datastore/index.js.
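
The resolver steps above (request the report, find the earliest date, persist it unless something failed) can be sketched as follows. Assumptions: the report follows the GA4 Data API runReport shape (rows[].dimensionValues[].value), date values use GA4's YYYYMMDD format, the audience report's second dimension is audienceResourceName, and fetchReport/saveDates are hypothetical injected stand-ins for the report request and saveResourceDataAvailabilityDate. The exact date format sent to the endpoint is also an assumption.

```javascript
/**
 * Hypothetical helper: returns the earliest YYYYMMDD date in the report rows,
 * or null when there is no data. When resourceName is given, only rows whose
 * second dimension (e.g. audienceResourceName) matches it are considered.
 */
function getEarliestDate( report, resourceName = null ) {
	const rows = report?.rows ?? [];
	let earliest = null;

	for ( const row of rows ) {
		const [ dateValue, resourceValue ] = row.dimensionValues.map( ( { value } ) => value );
		if ( resourceName !== null && resourceValue !== resourceName ) {
			continue;
		}
		// Zero-padded YYYYMMDD strings sort chronologically.
		if ( earliest === null || dateValue < earliest ) {
			earliest = dateValue;
		}
	}

	return earliest;
}

// Converts GA4's YYYYMMDD format to YYYY-MM-DD.
function toISODate( yyyymmdd ) {
	return `${ yyyymmdd.slice( 0, 4 ) }-${ yyyymmdd.slice( 4, 6 ) }-${ yyyymmdd.slice( 6, 8 ) }`;
}

// Hypothetical orchestration of the resolver with injected async dependencies.
async function resolveDataAvailabilityDate( { resourceName, resourceType, reportArgs, fetchReport, saveDates } ) {
	let report;
	try {
		report = await fetchReport( reportArgs );
	} catch ( error ) {
		// Errors, including missing permissions on the shared dashboard:
		// return null and persist nothing.
		return null;
	}

	// Only the shared audience report needs filtering by resource name.
	const filterName = resourceType === 'audience' ? resourceName : null;
	const earliest = getEarliestDate( report, filterName );
	if ( earliest === null ) {
		return null;
	}

	const date = toISODate( earliest );
	await saveDates( [ { resourceName, resourceType, date } ] );
	return date;
}
```

Injecting the two async dependencies keeps the IB's error rule easy to test: a failed request returns null without calling saveDates. Returning null when the report has no matching rows is a choice made in this sketch; the IB does not specify that case.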

Test Coverage

  • Add PHPUnit tests for the newly added infrastructure.
  • Add Jest tests for the newly added selectors and actions.

QA Brief

As discussed in Slack, due to the nature of this issue, QA has to be performed by an engineer.

  • This is not used in the plugin yet, so it needs to be tested in the developer console.
  • Connect to an Analytics property that has some audiences and the googlesitekit_post_type custom dimension.
    • Having one property with data for the above and one without data would be helpful here. To test things fully, the property with data should preferably have been created early enough that data is available for at least 7 days, e.g. the oi.ie property we use has audience data.
  • Check isAudiencePartialData, isCustomDimensionPartialData and isPropertyPartialData, and ensure their behavior aligns with the AC.
  • Check the getResourceDataAvailabilityDate selector and ensure that, if the date is resolved, it is persisted to WP transients via POST:save-resource-data-availability-date.
    • The date, if available, should be persisted regardless of whether the resource is in partial data so that it can be used subsequently. This is different from how the state for gathering data is persisted.
  • Change the property/measurement ID and ensure the saved googlesitekit_**_**_data_availability_date transients are deleted.
    • The same should also happen when Analytics is disconnected or Site Kit is reset.

Changelog entry

  • Add partial data states infrastructure for Analytics resources.
@techanvil techanvil added Module: Analytics Google Analytics module related issues P1 Medium priority Type: Enhancement Improvement of an existing feature labels Jan 24, 2024
@ivonac4 ivonac4 added the Next Up Issues to prioritize for definition label Feb 5, 2024
@techanvil techanvil changed the title Add “collecting data” states infrastructure Add “partial data” states infrastructure Feb 8, 2024
@ivonac4 ivonac4 added Next Up Issues to prioritize for definition and removed Next Up Issues to prioritize for definition labels Feb 16, 2024
@bethanylang bethanylang removed the Next Up Issues to prioritize for definition label Feb 20, 2024
@kuasha420 kuasha420 self-assigned this Mar 5, 2024
@ivonac4 ivonac4 added the Next Up Issues to prioritize for definition label Mar 6, 2024
@kuasha420 kuasha420 removed their assignment Mar 7, 2024
@ivonac4 ivonac4 added the Sp Wk 2 Issues to be completed in the second week of the assigned sprint label Mar 11, 2024
@eugene-manuilov eugene-manuilov self-assigned this Mar 11, 2024
@eugene-manuilov
Collaborator

AC ✔️

@eugene-manuilov
Collaborator

  • Create assets/js/modules/analytics-4/datastore/custom-dimensions-partial-data.js file.

I think the file should be renamed to be more generic, something like partial-data.js because custom-dimensions- prefix refers to the custom dimensions matter which is just one out of three matters of the task.

Add the full infrastructure for determining and exposing the "partial data" states for audiences, custom dimensions and properties.

The "determining" part is missing in IB. We need to add instructions how to detect and save partial data information for all three matters.

@kuasha420
Collaborator

Thank you @eugene-manuilov for the review!

I think the file should be renamed to be more generic, something like partial-data.js because custom-dimensions- prefix refers to the custom dimensions matter which is just one out of three matters of the task.

Correct! I've updated the file name accordingly.

The "determining" part is missing in IB. We need to add instructions how to detect and save partial data information for all three matters.

The getResourceDataAvailabilityDate selector will either determine the first available date with data, using a getReport request for the given resource with a 90-day report window (in the resolver), or return the persisted date. We then use this date with the current date range in the is{audience|customDimension|Property}PartialData(resourceName) selectors to determine the partial data state. We can't persist the boolean value without needlessly complicating things, as it can differ depending on the currently selected date range.

My thinking here is that something can be in the partial data state for a 28-day range but still have all the data it needs for a 7-day range, and thus not be in the partial data state. So by saving the first available date from a 90-day report instead, we can recompute the partial data state for all our supported date ranges.

Let me know what you think!
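
A worked example of this argument, evaluating one availability date against several ranges. It hypothetically assumes an N-day range ends the day before the reference date and therefore starts N days before it; Site Kit's actual offset handling may differ, and getRangeStartDate/isPartialForRange are illustrative names.

```javascript
// Hypothetical helper: start date (YYYY-MM-DD) of an N-day range ending the day
// before the reference date. UTC arithmetic avoids daylight-saving shifts.
function getRangeStartDate( referenceDate, days ) {
	const [ year, month, day ] = referenceDate.split( '-' ).map( Number );
	const date = new Date( Date.UTC( year, month - 1, day ) );
	// End = reference - 1 day; start = end - (days - 1) days, i.e. reference - days.
	date.setUTCDate( date.getUTCDate() - days );
	return date.toISOString().slice( 0, 10 );
}

// Same comparison as the isPartialData selectors, ignoring the gathering data state.
function isPartialForRange( dataAvailabilityDate, referenceDate, days ) {
	return dataAvailabilityDate > getRangeStartDate( referenceDate, days );
}

const availability = '2024-03-05';
const reference = '2024-03-19';

for ( const days of [ 7, 14, 28, 90 ] ) {
	console.log( `${ days } days: starts ${ getRangeStartDate( reference, days ) }, partial = ${ isPartialForRange( availability, reference, days ) }` );
}
// 7 days: starts 2024-03-12, partial = false
// 14 days: starts 2024-03-05, partial = false
// 28 days: starts 2024-02-20, partial = true
// 90 days: starts 2023-12-20, partial = true
```

A single persisted date is enough to answer the question for every range, which is why storing the date rather than a boolean avoids extra state.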

@techanvil
Collaborator Author

techanvil commented Mar 19, 2024

My thinking here is that something can be in the partial data state for a 28-day range but still have all the data it needs for a 7-day range, and thus not be in the partial data state. So by saving the first available date from a 90-day report instead, we can recompute the partial data state for all our supported date ranges.

Hey @kuasha420 @eugene-manuilov, just chipping in here as I had imagined we'd probably want to take the approach of requesting a report with a start date of the property creation time. That way we could get a definitive first-event date and not keep requesting reports if, say, a property's events all predate the current 90-day window. WDYT?
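
The suggested approach mainly changes the start date of the report args. A hypothetical builder following the IB's per-resource args might look like this; the { name } object shapes, the eventCount metric for property and audience reports (the IB only specifies it for custom dimensions), and the RFC 3339 propertyCreateTime input are all assumptions.

```javascript
// Hypothetical builder: report args that start at the property creation date
// instead of a fixed 90-day window.
function buildDataAvailabilityReportArgs( { resourceType, resourceName, propertyCreateTime, referenceDate } ) {
	const base = {
		// Assumed RFC 3339 timestamp; keep only the YYYY-MM-DD part.
		startDate: propertyCreateTime.slice( 0, 10 ),
		endDate: referenceDate,
		metrics: [ { name: 'eventCount' } ],
	};

	switch ( resourceType ) {
		case 'property':
			return { ...base, dimensions: [ { name: 'date' } ] };
		case 'audience':
			// One report covers every audience; rows are filtered per audience in JS.
			return { ...base, dimensions: [ { name: 'date' }, { name: 'audienceResourceName' } ] };
		case 'customDimension':
			return { ...base, dimensions: [ { name: 'date' }, { name: `customEvent:${ resourceName }` } ] };
		default:
			throw new Error( `Unknown resource type: ${ resourceType }` );
	}
}
```

Because the window always starts at property creation, the earliest row found is definitive, so a persisted date never needs to be re-checked against an older window.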

@tofumatt tofumatt removed the Next Up Issues to prioritize for definition label Mar 28, 2024
@kuasha420 kuasha420 self-assigned this Mar 31, 2024
@ivonac4 ivonac4 added the Team M Issues for Squad 2 label Apr 3, 2024
@kuasha420 kuasha420 mentioned this issue Apr 23, 2024
@kuasha420 kuasha420 added the QA: Eng Requires specialized QA by an engineer label Apr 30, 2024
@kuasha420 kuasha420 removed their assignment Apr 30, 2024
@nfmohit nfmohit self-assigned this May 1, 2024
@ivonac4 ivonac4 removed the Sp Wk 2 Issues to be completed in the second week of the assigned sprint label May 1, 2024
@nfmohit nfmohit assigned kuasha420 and unassigned nfmohit May 4, 2024
@kuasha420 kuasha420 assigned nfmohit and unassigned kuasha420 May 6, 2024
@nfmohit nfmohit removed their assignment May 7, 2024
@hussain-t hussain-t self-assigned this May 7, 2024
@hussain-t
Collaborator

QA Update ❌

Great work, @kuasha420. The functionalities work as expected except for an issue regarding removing the transient when disconnecting Analytics.

  • Verified: The test environment was set up successfully, and connections were established to two different Analytics properties:
    • A property with existing data (oi.ie), active for over 7 days.
    • A property recently created without data.
  • isAudiencePartialData Selector
    • Verified: Works as expected on both properties. It correctly identified partial data states based on audience data availability relative to the selected date range.
  • isCustomDimensionPartialData Selector
    • Verified: Works correctly, showing partial data states when the data for the custom dimensions is insufficient.
  • isPropertyPartialData Selector
    • Verified: Accurately reflects the partial data state when GA4 is still in the gathering data state.
  • getResourceDataAvailabilityDate Selector
    • Verified: The selector successfully retrieves the earliest event dates, and the data is persisted through the POST:save-resource-data-availability-date endpoint to WordPress Transients.
    • Verified: The persistence of data availability dates is independent of the partial data state.
  • Resetting Behavior
    • Verified: All related transients are reset upon Site Kit reset.
    • Verified: The following transients related to data availability dates are correctly reset when changing the account or property or measurement ID.
      • _transient_googlesitekit_audience_**_data_availability_date
      • _transient_googlesitekit_customDimension_**_data_availability_date
      • _transient_googlesitekit_property_**_data_availability
    • Issue Found: _transient_googlesitekit_audience_**_data_availability_date is not being deleted upon disconnecting the Analytics module. However, the other two transients are correctly removed. ❌

@nfmohit
Collaborator

nfmohit commented May 9, 2024

Excellent catch, thank you @hussain-t! The follow-up PR has been merged and this is now back with you for another QA:Eng round.

@hussain-t
Copy link
Collaborator

QA Verified ✅

Issue Found: _transient_googlesitekit_audience_**_data_availability_date is not being deleted upon disconnecting the Analytics module. However, the other two transients are correctly removed. ❌

  • Verified: _transient_googlesitekit_audience_**_data_availability_date transient and other transients are removed upon disconnecting the Analytics module. ✅

@hussain-t hussain-t removed their assignment May 10, 2024
Projects
None yet
Development

No branches or pull requests

9 participants