
Sample content #40753

Merged: calherries merged 49 commits into master from cal-sample-dashboard on Apr 12, 2024
Conversation

@calherries (Contributor) commented Mar 28, 2024

Epic: #40066

Follows #40907, which was a prerequisite for this change.

To verify:

  • drop the app DB (and if using Postgres or MySQL, re-create it)
  • start up Metabase

During setup, we insert sample content from resources/sample-content.edn into the app DB as a migration.
The sample content includes the "Examples" collection, which contains the "E-commerce insights" dashboard and its dependencies.

This approach avoids having to maintain any code to generate the collection and its contents, because it's just data. It also ensures the content is forward compatible to the same extent that any dashboard, card, or collection is forward compatible, because we have to keep those entities forward compatible anyway.

The content comes from this collection: https://stats.metabase.com/collection/1449-example-dashboard-prototype
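For context, here is a minimal sketch of what loading an EDN resource during a migration could look like. This is not the actual migration code from this PR; the `insert-entities!` helper and the shape of the EDN file are assumptions for illustration only.

```clojure
(ns hypothetical.sample-content
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]))

(defn- read-sample-content
  "Read the bundled EDN file from the classpath."
  []
  (-> (io/resource "sample-content.edn")
      slurp
      edn/read-string))

(defn load-sample-content!
  "Hypothetical migration step: insert each group of entities
  (collections, dashboards, cards, ...) from the EDN file into
  the app DB. `insert-entities!` stands in for whatever insert
  helper the real migration uses, and the EDN file is assumed to
  be a map of model name -> rows."
  [insert-entities!]
  (doseq [[model rows] (read-sample-content)]
    (insert-entities! model rows)))
```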

Testing strategy

The testing strategy for this is controversial and worth questioning. There are two things you should know:

  1. e2e tests run without the sample content.
    There are two reasons for this:
    (a) we don't want to couple the e2e tests to the sample content, so that the sample content stays easy to update
    (b) updating the e2e tests would be a pain, because a lot of tests depend on the sample content not being there

However, I would argue we can get away with minimal test coverage using the following logic:

1. we promise users that "previously created content" will work after any upgrade in the future
2. we (should) have tests that enforce this property
3. whenever an upgrade happens, the sample content is always "previously created content", because it is created as a DB migration when the Metabase app DB is initialized
4. therefore, we have tests to enforce that the sample content will work after any upgrade in the future

Also

1. we have tests that check that content created with the Metabase UI works as expected
2. the sample content was created with the Metabase UI
3. therefore, we have tests that show the sample content works as expected
  2. Most of the backend tests run without the sample content. This is just to keep the test suite fast: with the sample content, the Postgres and MySQL test suites were taking over 90 minutes to complete. This carries substantial risk, but I don't have an alternative solution. There is potentially room to increase the number of tests that run with the sample content present, and that could be important, e.g. for the load-from-H2 tests. My worry is that any code which depends on an initialized Metabase instance having no content may pass tests but fail in production. As far as I'm aware no such code exists today, but it could in the future. At least we would catch the issue in development.
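As an illustration of the kind of backend check being argued for here (hypothetical, not a test that exists in this PR), a test could assert that a freshly initialized app DB contains the "Examples" collection; `query-collections` below is an assumed stand-in for the suite's DB access:

```clojure
(ns hypothetical.sample-content-test
  (:require [clojure.test :refer [deftest is]]))

(defn examples-collection-present?
  "True when a collection named \"Examples\" exists among the rows
  returned by `query-collections` (a hypothetical DB helper)."
  [query-collections]
  (boolean (some #(= "Examples" (:name %)) (query-collections))))

(deftest sample-content-present-after-setup-test
  ;; In a real suite this would query an app DB that was initialized
  ;; with the sample-content migration enabled; here the query is faked.
  (let [fake-query (constantly [{:name "Examples"} {:name "Other stuff"}])]
    (is (examples-collection-present? fake-query))))
```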

@metabase-bot metabase-bot bot added the .Team/BackendComponents also known as BEC label Mar 28, 2024
@calherries calherries changed the title initial dashboard loading Sample dashboard Mar 28, 2024

replay-io bot commented Mar 28, 2024

Status: Complete
Commit: a195d56
Results: ⚠️ 12 flaky, 2395 passed

@calherries calherries marked this pull request as draft March 28, 2024 18:05
@calherries calherries added the no-backport Do not backport this PR to any branch label Apr 1, 2024
@calherries calherries marked this pull request as ready for review April 1, 2024 14:28
@calherries calherries changed the title Sample dashboard Sample content Apr 1, 2024
@calherries calherries requested a review from npretto April 1, 2024 14:35
@calherries (Contributor, Author) commented Apr 4, 2024

> I feel that I'm missing something here re: substantial risk. Besides the "load-from-h2" scenario, what's the worst kind of issue you could anticipate slipping through?

This is a very good question, and I don't have a great answer. My worry is any code that depends on an initialized metabase instance having no content may pass tests but fail in production. As far as I'm aware there's no existing code of this nature but it's possible this could exist in the future. At least in development, we would catch the issue.

@albertoperdomo (Member) left a comment

  • It looks like we could tweak a few of the question names - @vbenedetti?
    • Some questions have "- Maz" in the name; we should probably drop that.
    • "(not binned yet)" - is this question still used, or was it replaced with a new one?
    • Maybe we should not use the auto-generated names?

Also, is it just me? I can't seem to find the dashboard itself.

@calherries (Contributor, Author) commented Apr 10, 2024

@albertoperdomo @vbenedetti I've tweaked the content to incorporate these changes:

@calherries (Contributor, Author) commented Apr 11, 2024

@albertoperdomo @vbenedetti I combed through the content and realised there were a few errors in the queries and places where the dashboard could be improved. I've made the following changes:

  • "Buyer's ages by segments" (which I renamed to "Buyers by age group") was showing the sum of counts of people by individual age, but the x-axis was the count of people at each age rather than the age itself. It now shows the number of people in each 5-year age range.
  • A few questions in "Demographics" had auto-binned ages, which produced odd bins of 7.5 years, e.g. "22.5-30". They now use 5-year bins, e.g. "20-25" (see the sketch after this list).
  • Trend charts like "Revenue per quarter" showed N/A for the previous quarter/month; they now report the last two quarters/months.
  • Time-series graphs had either hard-coded date filters or no date filters at all, which meant the data ran out to 2026. I added filters on orders.created_at for the last 24 months to these questions.
  • "Unique customers per month" was using people.created_at, which grouped customers by when they signed up. I changed it to group by orders.created_at so the data matches the title.
  • The Location filter (which applies to people.state) was an input box, so I changed it to a dropdown list.
  • I deleted questions we weren't using.
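As a rough illustration of the fixed 5-year binning mentioned above (plain Clojure, not Metabase's actual binning implementation):

```clojure
(defn age->bin
  "Map an age to a fixed 5-year bin label, e.g. 22 -> \"20-25\".
  Fixed-width bins like these replace the auto-binned 7.5-year
  buckets (e.g. \"22.5-30\") described above."
  [age]
  (let [lower (* 5 (quot age 5))]
    (str lower "-" (+ lower 5))))

;; (age->bin 22) ;=> "20-25"
;; (age->bin 37) ;=> "35-40"
```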

I'd appreciate it if you (or one of you) double-checked these changes against stats and approved this PR when you're done.

@calherries calherries merged commit bdb34a3 into master Apr 12, 2024
105 checks passed
@calherries calherries deleted the cal-sample-dashboard branch April 12, 2024 16:21

@calherries Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?
