Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Domain entities #10437

Merged
merged 25 commits into from Aug 13, 2019

Conversation

@sbelak
Copy link
Contributor

commented Jul 31, 2019

This is the core domain entities PR. It adds all the parsing and binding machinery and plugs DE into transforms. The followup PR will infer DEs during sync and persist them (this is currently done manually in transforms).

One thing I'm not sure about is if it's correct to enforce that fields that are required are specified by their type -- or -- that transforms check their outputs against the domain entity spec. Perhaps the correct approach is for transforms to have custom validation of results which treats dimension names as (also) types. However that will also necessitate relaxing the schema for DE spec in that it will no longer check if a given :field is in fact derived from :type/*. Enforcing the outputs to be types and transforms to check their output would mean that for most transforms you'd also have to fiddle with types.clj.

Note: for clarity this PR is based on #10105

PR roadmap:

  • sync #10457
  • joins
  • x-rays MVP
  • YAML writing tools & niceties
  • x-rays full port (split for readability/to keep PRs compact)

@sbelak sbelak requested review from camsaul, kdoh and tlrobinson as code owners Jul 31, 2019

@sbelak sbelak requested a review from salsakran Jul 31, 2019

@sbelak sbelak changed the base branch from master to mbtl Jul 31, 2019

@sbelak sbelak removed the request for review from salsakran Jul 31, 2019

(cond
field (some (fn [col]
(or (isa? (field-type col) field)
(= (:name col) (name field))))

This comment has been minimized.

Copy link
@sbelak

sbelak Jul 31, 2019

Author Contributor

This is the part I'm unsure about. Allowing matching by name opens a whole new can of worms but it's useful in 2 scenarios:

  1. transforms
  2. vendor-specific DEs where you then wouldn't have to add new types to account for all the particulars of their schema

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

should the name comparison be case-insensitive?

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 1, 2019

Author Contributor

I think it's fine as you'd only use this to target known names

sbelak added 3 commits Jul 31, 2019
(defn- has-attribute?
[entity {:keys [field domain_entity has_many]}]
(cond
field (some (fn [col]

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

I would just use when here since you only have one condition. Also you probably don't use domain_entity or has_many so you can take those out of the destructuring binding

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 1, 2019

Author Contributor

This one was just me being lazy. The followup which adds joins/nested domain entities has a fully written out cond.

This comment has been minimized.

Copy link
@camsaul
[:field-id id]
[:field-literal name base_type]))

(defn- has-attribute?

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

What exactly does "attribute" mean in this case? If entity has a similar/matching field?

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

Instead of making this function return a boolean value if a matching field (attribute?) exists, how about tweaking it to return the matching Field itself? It can still be used to check whether a matching attribute exists, but now it's more general and you can also use it to get that specific matching attribute if you need to use it somewhere. It's more Lispy

(defn- matching-attribute
  [entity {:keys [field]}]
  (when field
    (some
     (fn [col]
       (when (or (isa? (field-type col) field)
                 (= (:name col) (name field)))
         col))
     ((some-fn :fields :result_metadata) entity))))

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 1, 2019

Author Contributor

Attribute is meant as a catch-all for required properties or constraints. So it can be having a field of some type, or being joined to another domain entity 1-1 or 1-many.

(= (:name col) (name field))))
((some-fn :fields :result_metadata) entity))))

(defn satisfies-requierments?

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

requierments

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 7, 2019

Member

Still mispelled 😉

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 10, 2019

Author Contributor

stale

"MBQL clause (ie. a vector starting with a keyword)"
(s/pred mbql.u/mbql-clause?))

(def FieldType

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

I think we already have this schema in metabase.util.schema


(def ^:private DomainEntityReference s/Str)

(def ^:private DomainEntityType (s/constrained s/Keyword #(isa? % :DomainEntity/*)))

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

There's actually an s/isa schema you can use here instead which will probably give you better error messages if the check fails

This comment has been minimized.

Copy link
@camsaul

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 10, 2019

Author Contributor

stale

(s/optional-key :segments) Segments
(s/optional-key :breakout_dimensions) BreakoutDimensions})

(defn- add-to-hiearchy!

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

hiearchy

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 7, 2019

Member

Still misspelled

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 10, 2019

Author Contributor

stale


(defn topological-sort
"Topologically sorts vertexs in graph g.
https://en.wikipedia.org/wiki/Topological_sorting"

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

Can we get a little example usage? I'm having a hard time picturing how to use this without seeing it

Also, is this general enough to live in metabase.util? My thinking is stuff that is in metabase.util is stuff that could go in a Metabase version of Medley if we were to spin it off. There's definitely some stuff in there that doesn't fit that criteria (e.g. u/host-up?). I've been trying to work on moving those elsewhere to keep this namespace full of super-general things but it's an ongoing project

But it might make more sense to make a separate entities utils namespace for this sort of stuff 🤷‍♂

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 1, 2019

Author Contributor

The followup PRs use it in some more places. I'm not super set on it being in util but it's one of those things that crops up occasionally as it's also useful for cycle detection.

(with-open [ds (Files/newDirectoryStream dir)]
(->> ds
(filter (comp #(str/ends-with? % ".yaml") str/lower-case (memfn ^Path getFileName)))
(mapv (partial load constructor)))))))

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

I don't know if this is already the case (and, if not, not really 100% relevant to this PR), but it would be really cool if the logic for generating the YAML was 100% separate from writing to a file. The YAML generation logic would output a string and some other function would handle writing the string to a file. That way we could easily test the output without having to worry about reading it from files. And debugging in the REPL would be easier too.

If you really wanted to get Lispy you could just have the YAML generation code write everything to standard out with printlns and the like and then wrap the call in with-out-str if you wanted a string, or otherwise rebind *out* as needed if you wanted to do something wacky like stream the YAML as it's generated. That would make debugging even easier than always generating a string

This comment has been minimized.

Copy link
@salsakran

salsakran Aug 6, 2019

Contributor

:whatcamsaid:

(-> test-domain-entity-specs vals (#'de/best-match) :name))


(expect

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

what are these tests testing?

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 7, 2019

Member

Can you please add a comment explaining what we're testing here? That way if the test fails we will have an easier time figuring out what's going on. Or if we need to add new tests it will be easier to tell what's already adequately covered.

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 10, 2019

Author Contributor

stale

(defn- hydrated-table
[table-name]
(let [table (-> table-name data/id Table)]
(assoc table :fields (table/fields table))))

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

you can do (toucan/hydrate table :fields)

This comment has been minimized.

Copy link
@camsaul

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 10, 2019

Author Contributor

stale

required-dimensions
(->> result-step bindings :dimensions keys))))))
(-> result-step bindings :entity u/get-id))))))))
(for [domain-entity-name provides]

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

since you're binding *driver* and the QP store you should use either doall or vec here to force the lazy seq generated by for, otherwise those bindings won't be in play when the body of the for gets evaluated:

e.g.

(def ^:dynamic *dynamic-var* :x)

(defn x []
  (binding [*dynamic-var* :y]
    (for [i (range 100)]
      *dynamic-var*)))

(drop 99 (x)) ;; -> [:x]

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

vs

(defn y []
  (binding [*dynamic-var* :y]
    (doall
     (for [i (range 100)]
       *dynamic-var*))))

(drop 99 (y)) ;-> [:y]

This comment has been minimized.

Copy link
@camsaul

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 10, 2019

Author Contributor

There's no for in there. That code is completely refactored

bindings))))

(defn- store-requirements!
[db-id requirements]
(s/defn ^:private store-requirements!

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

what side effect does this function have?

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 1, 2019

Author Contributor

adds stuff to qp.store

(qp.store/fetch-and-store-database! db-id)
(qp.store/fetch-and-store-fields!
(mapcat (comp (partial map u/get-id) :fields :entity val) requirements)))
(mapcat (comp (partial map u/get-id) :fields :entity val) bindings)))

(s/defn apply-transform!

This comment has been minimized.

Copy link
@camsaul

camsaul Aug 1, 2019

Member

I am really having trouble understanding what this function is supposed to be doing. It looks like it's just returning a sequence of IDs? But it's called apply-transform! 😕

This comment has been minimized.

Copy link
@sbelak

sbelak Aug 1, 2019

Author Contributor

It's the IDs of newly created cards materializing given transform. I'm not 100% on the return value. It could be that it's more correct to return the full cards or something like that.

@camsaul

This comment has been minimized.

Copy link
Member

commented Aug 1, 2019

Another thing I was thinking about:

The QP store is really just an internal cache used by the QP for the duration of a query execution and using it in non-QP related code really creates messy dependencies between different modules that I think are a clear sign we're doing something wrong. Calling individual pieces of QP middleware separately isn't generally something that should ever be needed outside of the QP itself. If it is needed somewhere else, we should consider whether that sort of stuff can be moved into the MBQL library itself or elsewhere.

Generally I don't think this code should be using any QP code besides the public interface in metabase.query-processor (e.g. the new expected columns function), and if there is some reason it needs to use internal QP functions, we should address the reasons why that is the case (or move those QP functions to the MBQL lib)

@sbelak

This comment has been minimized.

Copy link
Contributor Author

commented Aug 1, 2019

Now that we have qp/query->expected-cols, we don't need the qp.store stuff anymore, nor do we need to touch any private part of qp.

@sbelak sbelak referenced this pull request Aug 2, 2019
9 of 11 tasks complete
@sbelak

This comment has been minimized.

Copy link
Contributor Author

commented Aug 2, 2019

I have implemented all the suggestions plus the refactor of transforms.

@salsakran
Copy link
Contributor

left a comment

This seems sane.

Can we refactor the old X-Rays into this rough shape?

@salsakran
Copy link
Contributor

left a comment

I take it back, in looking at #10457, I think we should have a different name for the resource/domain_entities and the instantiated domain entities that are persisted to the application DB in that PR.

I think we settled on something like "domain entity specs" or something similarly horrible (but clear).

@sbelak

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2019

@salsakran yes. My thinking was to keep parity with the namespace, but /domain_entity_specs makes more sense, I'll change it.

@sbelak

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2019

I've pulled some changes from the sync PR to this one so it's more self-contained and with a clear API so that we can defer adding to the sync step as per @salsakran's request

@camsaul
Copy link
Member

left a comment

There's still a bunch of stuff in here I pointed out like missing docstrings or places where lazy sequences have a really good chance of ruining our day that hasn't been fixed

@sbelak

This comment has been minimized.

Copy link
Contributor Author

commented Aug 10, 2019

@camsaul I'm pretty sure you looked at a stale version. All the things you've bumped have been fixed

sbelak added 3 commits Aug 10, 2019

@sbelak sbelak merged commit e80ddea into mbtl Aug 13, 2019

37 checks passed

Cam's Next-Level CLA Bot sbelak does not need to sign the CLA
Details
ci/circleci: be-deps Your tests passed on CircleCI!
Details
ci/circleci: be-linter-bikeshed Your tests passed on CircleCI!
Details
ci/circleci: be-linter-docstring-checker Your tests passed on CircleCI!
Details
ci/circleci: be-linter-eastwood Your tests passed on CircleCI!
Details
ci/circleci: be-linter-namespace-decls Your tests passed on CircleCI!
Details
ci/circleci: be-linter-reflection-warnings Your tests passed on CircleCI!
Details
ci/circleci: be-tests Your tests passed on CircleCI!
Details
ci/circleci: be-tests-bigquery Your tests passed on CircleCI!
Details
ci/circleci: be-tests-druid Your tests passed on CircleCI!
Details
ci/circleci: be-tests-googleanalytics Your tests passed on CircleCI!
Details
ci/circleci: be-tests-java-11 Your tests passed on CircleCI!
Details
ci/circleci: be-tests-migrate-to-mysql Your tests passed on CircleCI!
Details
ci/circleci: be-tests-migrate-to-postgres Your tests passed on CircleCI!
Details
ci/circleci: be-tests-mongo Your tests passed on CircleCI!
Details
ci/circleci: be-tests-mysql Your tests passed on CircleCI!
Details
ci/circleci: be-tests-oracle Your tests passed on CircleCI!
Details
ci/circleci: be-tests-postgres Your tests passed on CircleCI!
Details
ci/circleci: be-tests-presto Your tests passed on CircleCI!
Details
ci/circleci: be-tests-redshift Your tests passed on CircleCI!
Details
ci/circleci: be-tests-snowflake Your tests passed on CircleCI!
Details
ci/circleci: be-tests-sparksql Your tests passed on CircleCI!
Details
ci/circleci: be-tests-sqlite Your tests passed on CircleCI!
Details
ci/circleci: be-tests-sqlserver Your tests passed on CircleCI!
Details
ci/circleci: be-tests-vertica Your tests passed on CircleCI!
Details
ci/circleci: build-uberjar Your tests passed on CircleCI!
Details
ci/circleci: checkout Your tests passed on CircleCI!
Details
ci/circleci: fe-deps Your tests passed on CircleCI!
Details
ci/circleci: fe-linter-eslint Your tests passed on CircleCI!
Details
ci/circleci: fe-linter-flow Your tests passed on CircleCI!
Details
ci/circleci: fe-linter-prettier Your tests passed on CircleCI!
Details
ci/circleci: fe-tests-e2e Your tests passed on CircleCI!
Details
ci/circleci: fe-tests-integration Your tests passed on CircleCI!
Details
ci/circleci: fe-tests-karma Your tests passed on CircleCI!
Details
ci/circleci: fe-tests-unit Your tests passed on CircleCI!
Details
ci/circleci: yaml-linter Your tests passed on CircleCI!
Details
ci/dockercloud Your image was successfully built in Docker Hub
Details
@sbelak sbelak referenced this pull request Aug 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.