Prevent normalization of records #18

Closed · wants to merge 2 commits

Conversation

@eneroth (Contributor) commented Dec 20, 2021

Records behave both like maps and collections, but should probably be treated as opaque by Pyramid.

This prevents,

(pyramid/db [{:my/id  "hello"
              :a-uri  (lambdaisland.uri/uri "http://www.example.com")}])

from turning into,

{:my/id {"hello" {:my/id "hello",
                  :a-uri ([:fragment nil]
                          [:query nil]
                          [:path nil]
                          [:port nil]
                          [:host "www.example.com"]
                          [:password nil]
                          [:user nil]
                          [:scheme "http"])}}}

And instead returns,

{:my/id {"hello" {:my/id "hello"
                  :a-uri #lambdaisland/uri "http://www.example.com"}}}

This is in reference to #17.

- Records behave both like maps and collections, but should probably be treated as opaque by Pyramid.
@eneroth eneroth changed the title Prevent normalization of records Draft: Prevent normalization of records Dec 20, 2021
- Tests if records are (intentionally not) normalized
- Tests if the record is treated differently because it incidentally contains a field named "id"
@eneroth eneroth changed the title Draft: Prevent normalization of records Prevent normalization of records Dec 20, 2021
@eneroth eneroth mentioned this pull request Dec 20, 2021
Comment on lines +56 to +58
;; Don't normalize records
(record? v) v

@lilactown (Owner)

I'm not so sure this is right yet. I'm not sure it's wrong yet either.

The root cause of #17 is that in main this takes the (map? v) path which calls (empty v). In Clojure this is an error, it seems that in ClojureScript this returns nil.
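To make that failure mode concrete, here is a small sketch (the `Point` record is hypothetical, used only for illustration) of what `empty` does to a record on JVM Clojure:

```clojure
;; On JVM Clojure, `empty` is unsupported for record types and throws,
;; whereas ClojureScript reportedly returns nil for the same call.
(defrecord Point [x y])

(try
  (empty (->Point 1 2))
  :no-error
  (catch UnsupportedOperationException e
    :cant-create-empty))
;; => :cant-create-empty
```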

Another potential fix would be to appropriately consume and reconstruct the record here. I've been reading the source for clojure.walk a lot recently and the way they do it is by using the record v as the initial arg, e.g.

Suggested change
;; Don't normalize records
(record? v) v
;; Normalize records while preserving type
;; TODO could we move this into the `map?` branch and just change what we pass in to `into`
(record? v) (into v (map (juxt first (comp #(replace-all-nested-entities identify %) second))) v)

I haven't tested this yet but something akin to that should preserve the type of v while also allowing us to descend into it and continue normalizing. In the case of the lambdaisland/uri data, I expect it wouldn't change anything about it unless you configured your identify function to do something special with the shape of URI records. What do you think?
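A quick way to sanity-check the property the suggestion relies on, namely that `into` with a record as the initial value preserves the record's type (again using an illustrative `Point` record, not code from the PR):

```clojure
(defrecord Point [x y])

;; `into` uses the record itself as the seed collection, so the result
;; keeps the record type while individual entries can be rewritten:
(def p (into (->Point 1 2) {:x 10}))

(instance? Point p) ;; => true
(:x p)              ;; => 10
(:y p)              ;; => 2
```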

@eneroth (Contributor, author)

Interesting! But aren't the semantics of normalizing something from inside a record rather dubious?

I.e., if Pyramid finds something inside the record that looks like an entity, is it correct to modify the record and leave a reference in its place? I'd guess that would very, very rarely be correct. And if it's not correct, what's the point of descending into records at all? Unlike clojure.walk, Pyramid has pretty clear semantics and doesn't really need to do general-purpose processing over arbitrary structures.

@lilactown (Owner)

I agree it's a bit nebulous at this point. I am not a heavy user of records, and so it's hard for me to anticipate how people might want to use them with pyramid.

Since I have to choose, I think I am leaning towards treating records like maps and normalizing them and their contents, since the point of records is that you have something that acts like clojure data but can be extended with additional protocols.

I think it's unlikely most records would contain information that should be normalized, but if they do, it would be unfortunate as a user of the library to be unable to pierce that veil.

@eneroth (Contributor, author) commented Dec 21, 2021

I think the opposite is true as well: if the expectation is that the record subscribes to a certain shape, it would be unfortunate to have that shape changed in an unexpected manner.

Given that records behave like a type (without requiring a special mini-DSL), enumerate their fields, and can have those fields type-hinted, then if one of those fields does take a map, I think it's unlikely that any protocol extending the record would also expect to find an ident in that same field.

If you do have a record that incidentally has something normalizable in one of its fields, and you want to extract it, intact, after it's been through Pyramid, how would you do it? Will pull reconstruct the record in the correct shape?

I think an expectation of records is that you can create something that is named, doesn't require a special DSL, but also has integrity. I think pushing the normalization straight through them will break the last part of that promise.

@lilactown (Owner) commented Dec 21, 2021

pull could certainly recursively reconstruct descendants of records, although the type of the result of pulling a record would depend on the user asking for all of the record's requisite fields in the query; otherwise it would have to return a map, just as dissocing a field from a record returns a map.

Perhaps there's a third way - could we create an API that would allow a user to opt in or out of the normalization process per type? I could see this done with a protocol that the record can implement if they choose to opt in, e.g.

(ns pyramid.core)

(defprotocol EntityRecord)

,,,,


(ns foo.bar
  (:require
   [pyramid.core :as p]))

;; opt in to normalization
(defrecord User [id name friends]
  p/EntityRecord
  ,,,)

;; this would not be normalized, nor would any of its fields be normalized,
;; because it does not implement p/EntityRecord
(defrecord SystemOperation [id status])
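A minimal, self-contained sketch of how the normalizer's dispatch might consult such a marker protocol (the protocol is restated locally here, and `normalizable-record?` is an assumed helper name, not part of Pyramid):

```clojure
;; Marker protocol: records implement this to opt in to normalization.
(defprotocol EntityRecord)

(defrecord User [id name friends]
  EntityRecord)

;; Does not implement EntityRecord, so it would stay opaque.
(defrecord SystemOperation [id status])

(defn normalizable-record?
  "True when v is a record that has opted in to normalization."
  [v]
  (and (record? v) (satisfies? EntityRecord v)))

(normalizable-record? (->User 1 "a" []))         ;; => true
(normalizable-record? (->SystemOperation 1 :ok)) ;; => false
```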

@eneroth (Contributor, author)

Opting in to normalization is an intriguing idea. To me, that would be the unsurprising option that still allows for the functionality.

@phronmophobic (Contributor) commented Dec 21, 2021

From a purely conceptual perspective, treating records as anything other than maps seems unidiomatic. If there’s an interface that describes the data, that might be reasonable, but an interface that is very specific to pyramid that is just making data less open and extensible seems like the wrong direction.

Maybe listing some specific use cases that might want this would be helpful.

@eneroth (Contributor, author) commented Jan 8, 2022

From a purely conceptual perspective, treating records as anything other than maps seems unidiomatic.

That would depend on what you mean by "unidiomatic". From Pyramid's perspective, @lilactown would ultimately have to decide what's idiomatic or not.

From Clojure's perspective, I'd argue that it's idiomatic. Records support the interface of maps because we don't want to write a new API every time we create a record. This doesn't mean that they are semantically interchangeable with maps in every circumstance.

Can we safely assume that every record we encounter will not be modelling some type or unit, where there are assumptions as to how the internals of the record looks?

Here's an example. Let's say I'm making a CRDT, and I'm modelling a range:

(defrecord Range [type begin end])

(Range. :bold {:id 0 :target/id 100} {:id 1 :target/id 102})

When normalizing this, Pyramid would effectively convert the record into a new record that looks like this:

(Range. :bold [:id 0] [:id 1])

Is this a good idea? I'm not sure, it depends on the expectations. One thing is for sure: the Range is no longer complying to the conventions I had in mind for them.

Regardless, the two scenarios are:

  1. Pyramid always normalizes records. I have no way of controlling this. If there's a problem, I'll have to fork Pyramid or avoid using it.
  2. Pyramid doesn't normalize the record. Now I have full control, and there's no need to fork/avoid Pyramid. Specifically, if I want Pyramid to normalize the record, I can just convert it to a map. If I don't want it normalized, I just leave it as it is.

In scenario 2, I have both options available to me. In scenario 1, I have no power over what happens.

There's potentially option 3 as well, as described by @lilactown above.

@lilactown (Owner) commented Jan 8, 2022

@eneroth just to be clear, there is already a way to avoid normalizing any data: you can change the "identify" function that the db is constructed with.

Example:

(require '[pyramid.core :as p])
(require '[pyramid.ident :as ident])

(def db (p/db [] (ident/by-keys :person/id :product/id)))

(defrecord Range [type begin end])

(def db1 (p/add db {:person/id 1 :person/friend {:person/id 2}}))
db1
;; => #:person{:id {2 #:person{:id 2}, 1 #:person{:id 1, :friend [:person/id 2]}}}

(def db2 (p/add db1 {:current-selection (->Range :select {:id 0 :target/id 100} {:id 1 :target/id 102})}))
db2
;; => {:person/id {2 #:person{:id 2}, 1 #:person{:id 1, :friend [:person/id 2]}},
;;     :current-selection
;;     {:type :select, :begin {:id 0, :target/id 100}, :end {:id 1, :target/id 102}}}

You can see here how this will only normalize information about :person/id and :product/id, and ignore :id and :target/id. That may help in your use case, I'm not sure.

With the latest change in master, I would expect that records should have their type preserved (sort of by accident) due to the refactor I did to use cascade.hike/walk for normalization.

I'll think more about opting out on a per-type or per-object basis - checking satisfies? on every node in a tree may be expensive...

@phronmophobic (Contributor) commented Jan 8, 2022

When normalizing this, Pyramid would effectively convert the record into a new record that looks like this:

Nothing is being "converted". Whether or not you receive (Range. :bold {:id 0 :target/id 100} {:id 1 :target/id 102}) or (Range. :bold [:id 0] [:id 1]) doesn't depend on how the data is stored, but how the data is queried. Which result you get depends on your query and not on whether the entity was stored in a normalized form.

Is this a good idea? I'm not sure, it depends on the expectations. One thing is for sure: the Range is no longer complying to the conventions I had in mind for them.

This exact same argument applies to maps. {:type :bold :begin {:id 0 :target/id 100} :end {:id 1 :target/id 102}} and {:type :bold :begin [:id 0] :end [:id 1]} are also not interchangeable.

The main difference between normalizing and not normalizing is that if you don't normalize then 1) updating the entity with :id 0 would not update all instances of entities with :id 0 and 2) you can't run queries on the entities referenced by :start and :end without knowing which Range they belong to.

I agree that figuring out how to have queries that allow shallow denormalization or deep denormalization is a tough question. I don't see how normalization depends on records when all the same arguments also apply to maps.

Regardless, the two scenarios are:

Pyramid always normalizes records. I have no way of controlling this. If there's a problem, I'll have to fork Pyramid or avoid using it.
Pyramid doesn't normalize the record. Now I have full control, and there's no need to fork/avoid Pyramid. Specifically, if I want Pyramid to normalize the record, I can just convert it to a map. If I don't want it normalized, I just leave it as it is.

There are other options where whether or not a fully denormalized result is returned has nothing to do with whether or not a record was added to the db.

Just to show how other dbs address this problem, asami has a nested? flag when querying entities and replacement annotations for transactions. Datomic uses a schema to make this decision.

@eneroth (Contributor, author) commented Jan 8, 2022

Nothing is being "converted". Whether or not you receive (Range. :bold {:id 0 :target/id 100} {:id 1 :target/id 102}) or (Range. :bold [:id 0] [:id 1]) doesn't depend on how the data is stored, but how the data is queried. Which result you get depends on your query and not on whether the entity was stored in a normalized form.

I don't think that is correct; normalization happens when you're constructing or adding to the DB, and denormalization happens when you query it.

This exact same argument applies to maps.

True, but only relevant if the basic assumption is that maps and records are equal in all aspects and use cases. I understand that your perspective is that they are, but please understand that my perspective is that they are not.

(def entity
  (Range. :bold {:id 0 :target/id 100} {:id 1 :target/id 102}))

(= entity (into {} entity))
;; => false

99.99997% of the time we use maps (for good reason), and the rest of the time there's some use case where defrecord or deftype happens to make sense. My observation is that in practice, this happens when we want to make something "object-like."

I think the main point of view here boils down to a question of record duck-typing: if it has the interface of a map, should it in all cases be treated like a map?

Consider the original example where this started: I switched from using goog.URI and java.net.URI to https://github.com/lambdaisland/uri. To me it feels like an implementation detail that one uses records while the others happen to use more opaque data types. The intention is the same: to model what's effectively a value. That there should be a (potential) difference in the query I make to get it out of the DB seems odd from this perspective.

And yes, not being normalized, I couldn't expect the power that comes from normalization to apply here. Do I want all URIs to change database-wide because I updated one of them? To me that's equivalent to asking whether I want all instances of the number 9 to update when I update one instance. It comes down to what you consider to have boundaries and integrity or not. My contention is that,

  1. You can't know which interpretation is correct with records (is it intended to be structure-like or object-like?), but
  2. Records are rare and used in special circumstances, such as modelling the URI above, most of the time we're going to deal with maps, and therefore
  3. Defaulting to treating them as maps in all circumstances is not a good idea.

@phronmophobic (Contributor) commented Jan 10, 2022

I think the main point of view here boils down to a question of record duck-typing: if it has the interface of a map, should it in all cases be treated like a map?

Records aren't just map-like. Records are maps.

(defrecord Foo [])
(map? (->Foo)) ;; true

This isn't even duck-typing. Duck typing means that you can implicitly or incidentally conform to an interface if you implement methods with the same name in a shared, global namespace. Records explicitly and intentionally implement the map interface.

True, but only relevant if the basic assumption is that maps and records are equal in all aspects and use cases.

I do not assume this, because it is not true. The different map implementations (e.g. hash, array, record) do differ. Some differences include performance and, in some cases, behavior. However, they all adhere to the map interface.

My observation is that in practice, this happens when we want to make something "object-like."

Records are "object-like" in the sense that they directly support polymorphism, but they are different from objects in most OO languages since they are immutable. It's true that a Record can't always be replaced with a map of the same keys and values, but that's also true for maps! Since maps can implement protocols via metadata, a map also can't be substituted for another map with different metadata in all cases.
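The metadata-based protocol extension mentioned here (available since Clojure 1.10) can be sketched as follows; the protocol and function names are illustrative, not from the discussion:

```clojure
;; :extend-via-metadata lets a plain value gain the protocol through
;; a fully-qualified symbol key in its metadata map.
(defprotocol Describable
  :extend-via-metadata true
  (describe [this]))

;; A plain map that implements Describable via metadata:
(def m (with-meta {:a 1}
         {`describe (fn [this] (count this))}))

(describe m) ;; => 1

;; The same map without that metadata does NOT satisfy the protocol,
;; which is why two maps with different metadata aren't interchangeable.
```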

Generally, Clojure is written in terms of abstractions.

  1. Do you think Pyramid should program against abstractions?
  2. If not, which concretions should it use and why?
  3. If you do think Pyramid should program against abstractions, do you think it should program against the map interface? I think it should.
  4. If not against the map interface, then which interface?

Do I want all URIs to change database-wide, because I updated one of them?

Why would that happen? If they refer to the same URI entity, then yes, they should change. If they refer to different URI entities, then only the entities that are the same should update.


You can't know which interpretation is correct with records (is it intended to be structure-like or object-like?), but

Why not? There are multiple options for achieving this.

Pyramid currently makes a default conventional assumption: your entities are identified by a keyword whose name is "id", e.g. :id, :person/id, :my.corp.product/id, etc.

For example, Pyramid already has a convention for determining identity: for any record, you could simply assoc a keyword whose name is "id".
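A short illustration of that idea (the `:range/id` key is an arbitrary example, not a Pyramid requirement): assoc'ing a non-field key onto a record keeps the record's type, storing the key in the record's extension map.

```clojure
(defrecord Range [type begin end])

;; assoc of a key that isn't a declared field still returns a Range:
(def r (assoc (->Range :bold nil nil) :range/id 42))

(record? r)   ;; => true
(:range/id r) ;; => 42
(:type r)     ;; => :bold
```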


Having said that, I think there might be good reasons for not treating records the same as maps. Since records are relatively rare and one of the goals for Pyramid is read performance, it's totally reasonable to treat maps and records differently to meet those goals. It also might make sense to treat records opaquely just to simplify the codebase. However, treating maps differently from records because "records aren't maps" doesn't make sense, because records are, in fact, maps. It's also totally plausible that leaning on the fact that records are maps might actually simplify the codebase and provide more leverage.

@eneroth (Contributor, author) commented Feb 11, 2022

Hello!

Sorry for the long bout of silence; I finally got that covid I've heard so much about and have been out of commission for a bit.

In the meantime, there seem to have been some changes on main that have unblocked the main problem we had: lambdaisland/uri being butchered on normalization.

I think we've reached a point where the pros and cons have been clearly spelled out, and I don't think we need to delve any further into this subject.

Thanks for the discussion, and feel free to reopen if I'm wrong.

@eneroth eneroth closed this Feb 11, 2022
@eneroth eneroth deleted the fix-records branch February 11, 2022 09:51