Spike: metadata completeness #39

Open
murdo-moj opened this issue Feb 20, 2024 · 7 comments

@murdo-moj
Contributor

murdo-moj commented Feb 20, 2024

Following on from #62

Create a report/dashboard which pulls from postgres (DataHub entity store) to display metadata completeness

Could be:

  • Superset dashboard
  • A page on find-moj-data which live queries postgres
  • A scheduled workflow on find-moj-data which feeds a static file for a page, to avoid loading the database so much
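As a rough illustration of the third option, the sketch below computes per-aspect completeness for datasets and serialises it to JSON that a scheduled job could write to a static file. It assumes the entity store has a table shaped like `metadata_aspect_v2` with `urn`, `aspect` and `version` columns, where version 0 is the latest. The table, column and aspect names and the sample rows are all assumptions to check against the real schema. An in-memory SQLite database stands in for postgres so that the example runs on its own.

```python
import json
import sqlite3


def completeness_summary(conn, aspects):
    """Return {aspect: % of dataset urns that have that aspect at version 0}."""
    cur = conn.cursor()
    # Assumed schema: one row per (urn, aspect, version); version 0 = latest.
    cur.execute(
        "SELECT COUNT(DISTINCT urn) FROM metadata_aspect_v2 "
        "WHERE urn LIKE 'urn:li:dataset:%' AND version = 0"
    )
    total = cur.fetchone()[0]
    summary = {}
    for aspect in aspects:
        # SQLite uses '?' placeholders; a postgres driver would use its own style.
        cur.execute(
            "SELECT COUNT(DISTINCT urn) FROM metadata_aspect_v2 "
            "WHERE urn LIKE 'urn:li:dataset:%' AND version = 0 AND aspect = ?",
            (aspect,),
        )
        have = cur.fetchone()[0]
        summary[aspect] = round(100.0 * have / total, 2) if total else 0.0
    return summary


# Stand-in database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata_aspect_v2 "
    "(urn TEXT, aspect TEXT, version INTEGER, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO metadata_aspect_v2 VALUES (?, ?, ?, ?)",
    [
        ("urn:li:dataset:a", "datasetKey", 0, "{}"),
        ("urn:li:dataset:a", "ownership", 0, "{}"),
        ("urn:li:dataset:b", "datasetKey", 0, "{}"),
        ("urn:li:corpuser:x", "corpUserKey", 0, "{}"),
    ],
)

summary = completeness_summary(conn, ["ownership", "globalTags"])
# A scheduled job could write this string to a static file for the page to read.
static_json = json.dumps(summary, indent=2)
print(static_json)
```

Writing the result to a file on a schedule means the page never queries the database directly, which is the point of that option.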
@seanprivett

seanprivett commented Feb 28, 2024

Investigate how the analytics measures are generated in DataHub - any reliability issues
Present analytics from tile in DataHub UI
Use HTML rather than bespoke tools
Not plugged in to the front end at this stage

@seanprivett seanprivett changed the title Create metadata completeness report Spike: metadata completeness Feb 28, 2024
@jemnery jemnery assigned jemnery and unassigned jemnery Mar 1, 2024
@murdo-moj
Contributor Author

e.g.

    query {
        getAnalyticsCharts {
            groupId
            title
            charts {
                __typename
            }
        }
    }

@tom-webber tom-webber self-assigned this Mar 12, 2024
@tom-webber
Contributor

tom-webber commented Mar 13, 2024

We want to know how the analytics from the top tiles on the DataHub Analytics page are generated.

These tiles are generated by the getHighlights operation. The resulting GraphQL query:

query getHighlights {
  getHighlights {
    value
    title
    body
    __typename
  }
}
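
To run this query outside the UI, something like the following sketch could be used. It assumes the DataHub frontend serves GraphQL at `/api/graphql` and accepts a bearer token; the base URL, the token and the sample response are placeholders, not values from our deployment. No network call is made here: the code only builds the request and parses a response body.

```python
import json
import urllib.request

GET_HIGHLIGHTS = """
query getHighlights {
  getHighlights {
    value
    title
    body
  }
}
"""


def build_request(base_url, token):
    """Build (but do not send) a POST request carrying the getHighlights query."""
    payload = json.dumps({"query": GET_HIGHLIGHTS}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/graphql",  # assumed endpoint path
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def parse_highlights(response_body):
    """Map each highlight title to its value and body text."""
    data = json.loads(response_body)
    return {
        h["title"]: {"value": h["value"], "body": h["body"]}
        for h in data["data"]["getHighlights"]
    }


# Placeholder URL and token; send with urllib.request.urlopen(req) when ready.
req = build_request("http://localhost:9002", "hypothetical-token")

# Made-up response body in the shape the query implies.
sample = json.dumps(
    {"data": {"getHighlights": [
        {"title": "Datasets", "value": 10, "body": "example body"}
    ]}}
)
highlights = parse_highlights(sample)
print(highlights)
```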

value

Resolver:

Retrieves the Highlights to be rendered of the Analytics screen of the DataHub application.

Relevant note (Feb 2023):

  /** TODO: Config Driven Charts Instead of Hardcoded. */

@tom-webber
Contributor

There was interest in tracking frontend activity with Google Analytics. DataHub supports this too, but currently only if we maintain a fork of DataHub.

@tom-webber
Contributor

tom-webber commented Mar 14, 2024

The 'Datasets' count widget includes the following text:

"0.00% have owners, 83.42% have tags, 0.00% have glossary terms, 100.00% have description, 99.41% have domain assigned!"

The description figure doesn't match what we're seeing on the platform, where far fewer than 100% of the entities have populated descriptions. The numerator for this percentage is calculated in the analytics derivation code.

According to a response in the DataHub Slack:

there are 2 descriptions in DataHub: native descriptions and source descriptions (those authored in external platform). This hasDescription is only matching against the source description, e.g. that which comes from the original platform

If we want the count for the in-catalogue description, we need to inspect a different property: editableProperties.description rather than properties.description (as per Slack):

DataHub always maintains 2 distinct descriptions - one from the underlying system and another that is "editable" inside of DataHub. (properties.description and editableProperties.description), respectively.
Please use the editableProperties.description field to read the DataHub-native description that was edited in the UI

It may be that more complex logic is required for calculation of the documentation numerator, as there doesn't appear to be an equivalent hasDescription boolean for the editable description, only the editedDescription field.
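
One possible shape for that logic is sketched below. It assumes each entity has been fetched as a dict with optional `properties` and `editableProperties` sections, each holding a `description`; the function names and the sample entities are hypothetical. An entity counts as documented when any of the chosen sections has a non-blank description.

```python
def has_description(entity, sections=("editableProperties", "properties")):
    """True if any of the given sections holds a non-blank description."""
    for key in sections:
        section = entity.get(key) or {}  # tolerate missing or null sections
        description = section.get("description")
        if description and description.strip():
            return True
    return False


def description_completeness(entities, sections=("editableProperties", "properties")):
    """Percentage of entities with a description in any of the given sections."""
    if not entities:
        return 0.0
    documented = sum(1 for e in entities if has_description(e, sections))
    return round(100.0 * documented / len(entities), 2)


# Made-up entities covering the cases discussed above.
sample_entities = [
    {"properties": {"description": "from source"}, "editableProperties": None},
    {"properties": {"description": ""}, "editableProperties": {"description": "edited in UI"}},
    {"properties": {"description": "   "}, "editableProperties": {}},
    {"properties": None},
]

either = description_completeness(sample_entities)
in_catalogue_only = description_completeness(sample_entities, ("editableProperties",))
print(either, in_catalogue_only)
```

Passing only `("editableProperties",)` gives the in-catalogue figure, while the default counts either kind of description.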

@seanprivett

Investigation complete: team to have a discussion about outcome and which tickets need to be created off the back of this.

Labels
None yet
Projects
Status: Review
Development

No branches or pull requests

4 participants