Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User-definable start of week #13090

Merged
merged 48 commits into from Oct 7, 2020
Merged

User-definable start of week #13090

merged 48 commits into from Oct 7, 2020

Conversation

sbelak
Copy link
Contributor

@sbelak sbelak commented Aug 10, 2020

Only a few DBs have explicit support for setting start of week (see bellow), but we can active the desired effect using date arithmetics (which we already support) by doing:

  1. add offset days
  2. truncate to week
  3. subtract offset days

where offset is relative to the DB start of week (eg. if start of week is Sunday, then Monday would be +1 and Saturday -1).

To fully support start of week on the BE we therefore need to:

  • add a method to the drivers that returns the DB start of week. For many DBs this will be simply hardcoded.
  • update sql.qp/date for :week, and day-of-week to take start of week into account either by performing the calculation above or using a native solution (sql.qp/date is overridden for every driver anyway so it doesn't make much difference).
  • add a setting for start of week. I think it's best to be explicit and use symbols for values (rather than being clever with indices)
  • add the setting for start of week to the UI.
  • tests
  • fix FE assumption about week always starting on Sunday https://github.com/metabase/metabase/blob/master/frontend/src/metabase/lib/formatting.js#L403-L415
  • correct week of year semantics
  • notify 3rd party driver authors about the changes

Open questions

  • Week of year is tricky. Some DBs use ISO definition of first week of year (week with the year's first Thursday in it), while others use the week that contains January 1st. Do we unify this and if so, what do we use? Any sort of unification will potentially alter results for some users. [I've decided to unify this]

  • What should be the default start of week? [I'd say keep Sunday for consistency sake while upgrading]

  • Should we display all 7 days in the dropdown or only Sat, Sun, & Mon?

* What do we do with :day-of-week date extractor? Currently we normalize this across DBs to ODBC standard (start on Sunday = 1). Should it change in accordance with selected start of week (so that whatever day is start of week has index 1). [My vote: yes]

* Where available should we set start of week as a session variable? If so, do we set it upon connection (and when set by user), or do we play it dumb but robust and do it on every query invocation? [My vote: no -- let's just compute it, feels like there's less chance of stumbling into corner cases. Also setting anything would be another instance of us not being read-only.]

* Is start of week a per instance, per db, or per user? [I think per-instance is fine, unless we need finer control for EE]

* for DBs that have globally settable start of week or follow a system setting, do we sync this setting on each sync? [My vote: yes. Seems harmless enough to do and gives the option of manual override]

* do we use DB specific (where available) means of getting the DB start of week, or simply a probing query (see #13090 (comment)) where start of week is not fixed? The latter has the benefit of not being DB specific.

DBs with explicit support for start of week

The rest of DBs we support either have a fixed day or you can pick between Sunday and Monday by using different functions.

Things left for a followup PR (to limit scope)

  • Getting the session start of week setting in the 3 DBs that support this and store it during sync. In the case of Oracle this is tricky, as it's entirely conceivable that people with DBs in multiple territories (eg availability zones) will have different start of week. To do this absolutely right we'd first have to do driver features dispatch per-db (see Move from per-driver features to per-db (have supports? take a database param as well) #13136).

  • update util.date-2/adjuster

@sbelak sbelak added this to Assigned/In progress in 0.36.1+ and 0.37 (old board -- see milestones instead) via automation Aug 10, 2020
@sbelak sbelak added this to the 0.37.0 milestone Aug 10, 2020
@codecov
Copy link

codecov bot commented Aug 10, 2020

Codecov Report

Merging #13090 into master will increase coverage by 0.03%.
The diff coverage is 75.34%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #13090      +/-   ##
==========================================
+ Coverage   83.01%   83.05%   +0.03%     
==========================================
  Files         326      326              
  Lines       25893    25934      +41     
  Branches     1889     1888       -1     
==========================================
+ Hits        21495    21539      +44     
+ Misses       2509     2507       -2     
+ Partials     1889     1888       -1     
Impacted Files Coverage Δ
src/metabase/models/database.clj 84.41% <ø> (ø)
src/metabase/driver/mysql.clj 35.00% <12.50%> (-0.45%) ⬇️
src/metabase/driver/postgres.clj 44.37% <18.18%> (+1.23%) ⬆️
src/metabase/models/setting.clj 88.85% <80.00%> (-0.55%) ⬇️
src/metabase/driver/sql/query_processor.clj 92.62% <94.73%> (-0.16%) ⬇️
src/metabase/driver.clj 89.51% <100.00%> (+0.17%) ⬆️
src/metabase/driver/common.clj 89.26% <100.00%> (+0.13%) ⬆️
src/metabase/driver/h2.clj 93.65% <100.00%> (+0.15%) ⬆️
src/metabase/public_settings.clj 97.18% <100.00%> (+0.04%) ⬆️
src/metabase/util/date_2.clj 74.53% <100.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22a09e3...f1aa790. Read the comment docs.

@sbelak sbelak changed the title User-definable start of week proposal User-definable start of week implementation proposal Aug 10, 2020
@paulrosenzweig
Copy link
Contributor

This is a hacky idea, but I wonder if we could avoid driver changes by deducing the offset in the query itself. i.e. do a date diff between date X and date_trunc('week', date X) to figure out how the db is set.

@sbelak
Copy link
Contributor Author

sbelak commented Aug 10, 2020

If nothing else, it's a good backup strategy. You could even just say day of week of truncated now.

@robdaemon
Copy link
Contributor

robdaemon commented Aug 10, 2020

What do we do with :day-of-week date extractor? Currently we normalize this across DBs to ODBC standard (start on Sunday = 1). Should it change in accordance with selected start of week (so that whatever day is start of week has index 1).

I say yes for consistency / no surprises

for DBs that have globally settable start of week or follow a system setting, do we sync this setting on each sync? [My vote: yes. Seems harmless enough to do and gives the option of manual override]

I say yes to syncing this regularly - this value comes from either a database setting or the locale the database operates in. (Russian locales use Monday as start of week, for example)

Is start of week a per instance, per db, or per user? [I think per-instance is fine, unless we need finer control for EE]

I was thinking per-instance - it seems like this is a business decision, as opposed to an individual user decision. (Imagine the confusion if two different people within the same org send conflicting dashboards because one is using Sunday as SOW and the other is using Monday)

do we use DB specific (where available) means of getting the DB start of week, or simply a probing query

Date truncation in MySQL is funky. This may need to be driver-specific, but the :default case can be the probing query?

Where available should we set start of week as a session variable?

Hrm. If this is asked during the initial setup, then a session variable might be fine. That said, if it's changed on a running instance with active users, they wouldn't see the change until a logout / login. This is a problem we already have with cached metadata. I think at some point we need a event channel where the backend can send events to the frontend when changes are made.

@camsaul
Copy link
Member

camsaul commented Aug 10, 2020

Haven't looked thru the code yet, but the approach as you've described it seems pretty solid.

What do we do with :day-of-week date extractor? Currently we normalize this across DBs to ODBC standard (start on Sunday = 1). Should it change in accordance with selected start of week (so that whatever day is start of week has index 1). [My vote: yes]

I agree, I think we should leave the MBQL the same for now and just have it behave differently based on the system start-of-week Setting.

Where available should we set start of week as a session variable? If so, do we set it upon connection (and when set by user), or do we play it dumb but robust and do it on every query invocation? [My vote: no -- let's just compute it, feels like there's less chance of stumbling into corner cases. Also setting anything would be another instance of us not being read-only.]

I agree that computing it in the query would be more robust and a better approach. I'm hoping in the future we can move away from using session variables entirely and compute timezone offsets directly in the query as well instead of setting session timezones. Session variables are finicky

Is start of week a per instance, per db, or per user? [I think per-instance is fine, unless we need finer control for EE]

I think for now doing it per-instance should be fine, we can always make it finer-grained in the future. My preference would be to make it a Setting sort of like report-timezone, and then various implementations can call (public-settings/first-day-of-week) (or whatever the Setting ends up being called) and do the appropriate calculations. If we wanted to make it finer-grained in the future we could just change the function implementation

for DBs that have globally settable start of week or follow a system setting, do we sync this setting on each sync? [My vote: yes. Seems harmless enough to do and gives the option of manual override]

It seems like this query would be pretty quick so we might as well do it on every sync. I guess we'd only need to do this for DBs that support changing the default start of week via a config setting, right?

do we use DB specific (where available) means of getting the DB start of week, or simply a probing query (see #13090 (comment)) where start of week is not fixed? The latter has the benefit of not being DB specific.

It seems like we'd only need to implement this method for a few databases that have a specific config setting for default start of week. It seems like in those cases it's easy enough to just have a custom DB-specific implementations to fetch that specific parameter. Maybe the implementations would look something like

;; DB that supports custom start of weeks
(defmethod driver/db-start-of-week :sqlserver
  [_ database]
  (let [[{:keys [start]}] (jdbc/query (connection-details database) "SELECT @@datefirst AS start")]
    (case start
      1 :monday
      2 :tuesday
      ...)))

;; hardcode for other DBs
(defmethod driver/db-start-of-week :postgres
  [_ _]
  :sunday)

@salsakran
Copy link
Contributor

Regarding how granular to make the control over day of the week, I agree with @camsaul that per instance is a good starting point.

If people beat down our doors asking for per-user, we can revisit.

@sbelak
Copy link
Contributor Author

sbelak commented Aug 18, 2020

I've punted on looking at per-db start of week as it would upturn the entire MBQL processor API (for all drivers) which I don't think it's worth it. This is only an issue for SQL Server, Oracle, and Snowflake; and only if non-default start of week is set. In that case the users might have to set a "wrong" day in settings for the math to work out (eg. Saturday instead of Sunday). The only unresolvable situation is if you have multiple of these DBs with different start of weeks set, but that seems like enough of a corner case and bad practice that it's fine if we punt on this for now.

The "correct" solution for this would be to have features per-db rather than per-driver. I've compiled other reasons why we'd want to do this (and a proposal how to do it) here: #13136

@sbelak
Copy link
Contributor Author

sbelak commented Aug 19, 2020

Potential migration issue: currently drivers are inconsistent in what we return when aggregating by week, some are normalized to Sunday, but not all. Once this gets merged all will work the same which will potentially change what existing cards show. Is this an issue?

@sbelak
Copy link
Contributor Author

sbelak commented Sep 29, 2020

Holding off on fixing Spark SQL until we decide if we bump minimal supported version to 2.3. See also #11023

@mazameli
Copy link
Contributor

Pushed a commit to fix up the copy and move the setting up on the screen:

image

Copy link
Contributor

@paulrosenzweig paulrosenzweig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall logic/approach look good! I had to scribble down some tables to understand the mods in the adjust functions.

I think it'd be good to add some Cypress tests since there are a few places where the FE and BE need to agree (week ranges and day of week integers).

(- offset) :day)
(truncate-fn expr))))

(defn adjust-day-of-week
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test, but I think this will break in the client. We need to remap the offset numbers to the correct labels based on the setting.

Comment on lines 7657 to 7670
- changeSet:
id: 267
author: sb
comment: Added 0.37.0
changes:
- addColumn:
tableName: metabase_database
columns:
- column:
name: start_of_week
type: varchar(32)
defaultValue: sunday
constraints:
nullable: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused why start_of_week is stored for each db. Do we set it during sync? It looks like each driver hardcodes a value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in preparation for when we do this per-db, but I've punted on this for v1 to limit scope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the migration now and add it back when it's used? There always some risk that future features get punted indefinitely and then we have an extra column sitting around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're not using this yet, might as well remove the migration and pull the next changeset ids up. if you do this make sure to update #migrations on slack

target-start-of-week (.indexOf ^clojure.lang.PersistentVector days-of-week (setting/get-keyword :start-of-week))
delta (int (- target-start-of-week db-start-of-week))]
(* (Integer/signum delta)
(- 7 (Math/abs delta)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed them, but I didn't see tests for this. It's probably worth adding some tests around start-of-week-offset and the adjust functions. I feel like it'd be easy for an update down the line to introduce an off-by-one error.

0.36.1+ and 0.37 (old board -- see milestones instead) automation moved this from Waiting for review to Reviewer approved Oct 1, 2020
{$subtract [column
{$multiply [{$subtract [(day-of-week column)
1]}
(* 24 60 60 1000)]}]})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will this behave during a daylight savings time transition?

Copy link
Contributor Author

@sbelak sbelak Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would imagine poorly, but we had this code for a few years now and nobody complained yet. I don't think it's in scope of this PR, I'll open a new issue for it though -- #13344

@sbelak sbelak merged commit d477106 into master Oct 7, 2020
0.36.1+ and 0.37 (old board -- see milestones instead) automation moved this from Reviewer approved to Done Oct 7, 2020
@sbelak sbelak deleted the configurable-start-of-week branch October 7, 2020 17:34
@sbelak sbelak changed the title User-definable start of week implementation proposal User-definable start of week Oct 7, 2020
@flamber flamber removed this from the 0.37.0 milestone Oct 7, 2020
robdaemon pushed a commit that referenced this pull request Oct 9, 2020
* Add `db-start-of-week` multimethod

* Update drivers & settings [ci drivers]

* Use tru in setting def [ci drivers]

* Remove reflections [ci drivers]

* Fix migration [ci drivers]

* Update adjust

* Punt on adjust for now

* Fix tests [ci drivers]

* Fix tests [ci drivers]

* Add missing args [ci drivers]

* Fix tests [ci all]

* Short-circuit if offset = 0 [ci drivers]

* Fix week of year test [ci drivers]

* h2: use iso-week [ci drivers]

* Typo [ci drivers]

* Add UI for setting

* fix FE assumptions

* Unified week-of-year [ci drivers]

* Make h2 work regardless of locale [ci drivers]

* h2: make `add-interval-honeysql-form` work with expression amounts [ci drivers]

* Fix h2 tests [ci all]

* Don't rely on int division [ci drivers]

* Better week-of-year [ci drivers]

* Make linter happy [ci drivers]

* More celanup [ci drivers]

* Typo [ci drivers]

* Fix hive & snowflake [ci drivers]

* Add missing require [ci drivers]

* Bump hive [ci drivers]

* Mongo: add week-of-year [ci all]

* Druid: week-of-year [ci drivers]

* Fix Mongo [ci drivers]

* More fixes [ci drivers]

* Fix oracle [ci drivers]

* tweak copy and move the setting up

* Oracle: use to_char [ci drivers]

* More tests [ci drivers]

* Remove unneeded migration [ci drivers]

* Druid: don't be fancy with woy [ci drivers]

* Remove references to `start_of_week` from tests [ci drivers]

* Druid: use correct values in test [ci drivers]

* More tests [ci drivers]

* Typo [ci drivers]

* Fix Google driver [ci drivers]

Co-authored-by: Paul Rosenzweig <paul.a.rosenzweig@gmail.com>
Co-authored-by: Maz Ameli <maz@metabase.com>
Co-authored-by: Cam Saul <github@camsaul.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants