-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude sample content and Metabase Analytics from usage stats #41519
Exclude sample content and Metabase Analytics from usage stats #41519
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I was appalled at doing table scans to filter such potentially-large tables, but then I realized we're already pulling them all into memory in any case - ouch! 😆
:join [[(t2/table-name Dashboard) :d] [:= :d.id :dc.dashboard_id]] | ||
:where [:not [:= :d.creator_id config/internal-mb-user-id]]})] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably more efficient to flip this around and skip the join by doing something like [:not [:in :dc.dashboard_id the-really-small-number-of-sample-dashboard-ids]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter? This query gets every dashcard in the app DB anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition for how databases work, and the related costs, is very rotted, so I may be way off.
I suspect that a table scan with a join is a lot more expensive than a plain table scan, I'd expect it to build a temporary table or do sub-selects.
I don't think the query analyzer has access to statistics that would suggest doing this inversion internally, and it's not any more complicated. Not gonna block on it, this namespace is ripe for broader optimizations.
@crisptrutski Sorry, I updated the PR because I decided to also include the Metabase Analytics changes. Can you review again? |
src/metabase/analytics/stats.clj
Outdated
@@ -298,8 +302,8 @@ | |||
(defn- collection-metrics | |||
"Get metrics on Collection usage." | |||
[] | |||
(let [collections (t2/select Collection) | |||
cards (t2/select [Card :collection_id])] | |||
(let [collections (t2/select Collection :creator_id [:not= config/internal-mb-user-id]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to all the blood in the last CI run, Collection
doesn't have such a column 😢
I'm guessing you can't abuse :personal_owner_id
without hiding it from other users, so I'm not sure what you can do here.
One dubious option would be to remove collections which only contain sample cards 🥴
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've updated this. I had some local state tricking me into thinking there was a creator_id column.
See the latest version https://github.com/metabase/metabase/pull/41519/files#diff-96de37bff643ff1187fa863f5a14541a6c03a6c2c20deaa7e043a0fcf8f54a08R305
tableName: collection | ||
columns: | ||
- column: | ||
name: is_sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool 👍
62778f9
into
all-users-curate-example-collection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-ticking ✅ ✅
A slightly philosophical question - at what point does sample content become user content?
e.g.
- a customer renames and adds new cards to a collection
- a customer edits a card substantially and adds it to another dashboard
etc.
Made a clone because I accidentally merged this into the parent branch #41522 |
Good question, it won't ever change. Once something is created as sample content it stays sample content. A sample content card that is renamed will stay sample content. A new card/dashboard in a sample content collection will not be counted as sample content though. I think this is ok since this is just for our own internal statistics and probably only need to be like 99% correct to be useful. |
Yeah it's probably better to keep things simple than to navigate such a grey area. If we were going to dip our toes into refining things, I was thinking we could compare created and updated times. |
Now that we're adding sample content to new instances in #40753, this PR excludes the content from usage stats. The relevant section of the product doc is here:
It also excludes Metabase Analytics content (behaviour approved here by Arakaki)
Both changes the same approach:
is_sample=true
from the stats. This required creating a newis_sample
column on thecollection
table.