Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sql schema and entity class for ClaimsList #227

Merged
merged 1 commit into from Aug 26, 2019

Conversation

@hstonec
Copy link
Collaborator

commented Aug 14, 2019

This change is Reviewable

@googlebot googlebot added the cla: yes label Aug 14, 2019

@hstonec hstonec force-pushed the hstonec:add-claimslist-schema branch from 3b6c796 to 8274953 Aug 14, 2019

@hstonec
Copy link
Collaborator Author

left a comment

Tested this schema as well as the entity against a local Postgresql instance, and verified that both claims_list and claims_entry tables can be read/written as expected.

Reviewable status: 0 of 2 files reviewed, all discussions resolved

@hstonec hstonec requested review from CydeWeys and weiminyu Aug 14, 2019

@weiminyu
Copy link
Collaborator

left a comment

Reviewable status: 0 of 2 files reviewed, 4 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)

a discussion (no related file):
Just a heads-up, most likely we will move entity classes to a new sub project.

Also, please add a TODO for unit tests.



core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 33 at r1 (raw file):

/** A list of TMCH claims labels and their associated claims keys. */
@Entity
@Table(name = "claims_list")

Let's apply a naming strategy for all tables and fields. I think it is okay to be hibernate-dependent in this case.

Also, let's use quoted CamelCase for tables. Having different patterns for tables and fields improves readability.
Reasoning can be found here: https://github.com/weiminyu/cloud-sql-client/blob/master/subprojects/demo_schema/src/main/java/demoschema/hibernate/NomulusNamingStrategy.java


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 34 at r1 (raw file):

@Entity
@Table(name = "claims_list")
public class ClaimsList {

The javadoc of every entity should explain how to de-duplicate when a write result is uncertain, e.g.., after DB timeout.
Possible strategies include:

  • No retry. In this case we need to explain why
  • PK or unique constraint will detect duplicate during retry.
  • One or more properties we know before the write uniquely identifies the entity, allowing us to verify if previous write has succeeded.

In this case, the PK, revisionId is autogenerated and not known to us if failure happens. We need to document this.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 66 at r1 (raw file):

Optional<Long>

Why optional? Would it be better to call checkState(revisionId != null) then return long?
User should not call this method unless write succeeds.

@CydeWeys
Copy link
Member

left a comment

Reviewable status: 0 of 2 files reviewed, 8 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 33 at r1 (raw file):

Previously, weiminyu (Weimin Yu) wrote…

Let's apply a naming strategy for all tables and fields. I think it is okay to be hibernate-dependent in this case.

Also, let's use quoted CamelCase for tables. Having different patterns for tables and fields improves readability.
Reasoning can be found here: https://github.com/weiminyu/cloud-sql-client/blob/master/subprojects/demo_schema/src/main/java/demoschema/hibernate/NomulusNamingStrategy.java

Agreed with PascalCase for table name.

Does it even have to be quoted? Does it matter so much if, behind the scenes, PostgreSQL is treating it as lower-case?


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 34 at r1 (raw file):

Previously, weiminyu (Weimin Yu) wrote…

The javadoc of every entity should explain how to de-duplicate when a write result is uncertain, e.g.., after DB timeout.
Possible strategies include:

  • No retry. In this case we need to explain why
  • PK or unique constraint will detect duplicate during retry.
  • One or more properties we know before the write uniquely identifies the entity, allowing us to verify if previous write has succeeded.

In this case, the PK, revisionId is autogenerated and not known to us if failure happens. We need to document this.

Oooof. This strikes me as an extra level of annoyance that we haven't had to deal with for Datastore? How much of a problem is this in actuality?

Fundamentally there's a problem here we can't easily solve, right? If the DB connection goes down, and we were writing something with an autogenerated ID, and then that GAE instance that had the ID in memory goes down (i.e. the only thing that knew exactly what we were trying to write) ... what is really the solution here?

That's the global issues anyway. For ClaimsList in particular, it doesn't matter. The most recent ID (i.e. largest) is always the one that is used, so the worst that could happen is the cron job re-runs it immediately and writes out another version immediately following the previous one that unexpectedly succeeded. We could make this more efficient by checking the time stamp of the current highest revision and abort writing a new one if it's recent enough, i.e. within the past couple hours. So long as the claims list and all its entities succeed or fail writing transactionally together, then we can't introduce any real problems.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 36 at r1 (raw file):

public class ClaimsList {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)

We should come to some agreement on GenerationType.IDENTITY vs GenerationType.SEQUENCE and stick with one globally across our codebase (unless there are good reasons not to). Why is IDENTITY preferable to SEQUENCE for us?


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 41 at r1 (raw file):

  @Column(nullable = false)
  private ZonedDateTime creationTimestamp;

Is ZonedDateTime the right way to go? It looks like it may have some gotchas? https://thoughts-on-java.org/hibernate-tips-whats-the-best-way-to-persist-a-zoneddatetime

I don't exactly understand the trade-offs with all of the alternatives, but it sounds like, at a minimum, we need to make sure that the connection's time zone is set to UTC. Anything else?

Also, how do we convert back and forth between ZonedDateTimes and the Joda DateTimes we use throughout the rest of the codebase? Is it possible to stick with Joda DateTimes for consistency and use some sort of DB translator? I'd rather continue to use the same type in the models, and convert it once everywhere with a translator, than have to convert between them constantly in various service wrappers between DB objects and the rest of the codebase.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 66 at r1 (raw file):

Previously, weiminyu (Weimin Yu) wrote…
Optional<Long>

Why optional? Would it be better to call checkState(revisionId != null) then return long?
User should not call this method unless write succeeds.

What does an absent value mean, that the claims list exists in memory only and hasn't been loaded from the DB?


core/src/main/resources/google/registry/database/init.sql, line 14 at r1 (raw file):

-- See the License for the specific language governing permissions and
-- limitations under the License.

We'll also need the deltas for this, but we haven't figured out exactly how we'll do those yet and it's not necessary to include those in this PR.


core/src/main/resources/google/registry/database/init.sql, line 17 at r1 (raw file):

create table claims_list (
  revision_id bigserial not null,
  creation_timestamp timestamp not null,

'timestamp' is implicitly a type without timezone information. Do we want to use a type with timestamp information, and always set it explicitly to UTC? That seems clearer.

@CydeWeys
Copy link
Member

left a comment

:lgtm: modulo remaining pending comments.

Reviewable status: 0 of 2 files reviewed, 8 unresolved discussions (waiting on @hstonec and @weiminyu)

@hstonec hstonec force-pushed the hstonec:add-claimslist-schema branch 2 times, most recently from e945d12 to 541baf4 Aug 20, 2019

@hstonec
Copy link
Collaborator Author

left a comment

Reviewable status: 0 of 3 files reviewed, 7 unresolved discussions (waiting on @CydeWeys and @weiminyu)

a discussion (no related file):

Previously, weiminyu (Weimin Yu) wrote…

Just a heads-up, most likely we will move entity classes to a new sub project.

Also, please add a TODO for unit tests.

We had an agreement that 1) sql file will be stored in a dedicated gradle subproject. 2) Java entity file still stays in core but in another package.



core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 33 at r1 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

Agreed with PascalCase for table name.

Does it even have to be quoted? Does it matter so much if, behind the scenes, PostgreSQL is treating it as lower-case?

Weimin: After PR #230 is pushed, we can add the naming strategy.

Ben: I guess we can tune that part in the naming strategy.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 34 at r1 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

Oooof. This strikes me as an extra level of annoyance that we haven't had to deal with for Datastore? How much of a problem is this in actuality?

Fundamentally there's a problem here we can't easily solve, right? If the DB connection goes down, and we were writing something with an autogenerated ID, and then that GAE instance that had the ID in memory goes down (i.e. the only thing that knew exactly what we were trying to write) ... what is really the solution here?

That's the global issues anyway. For ClaimsList in particular, it doesn't matter. The most recent ID (i.e. largest) is always the one that is used, so the worst that could happen is the cron job re-runs it immediately and writes out another version immediately following the previous one that unexpectedly succeeded. We could make this more efficient by checking the time stamp of the current highest revision and abort writing a new one if it's recent enough, i.e. within the past couple hours. So long as the claims list and all its entities succeed or fail writing transactionally together, then we can't introduce any real problems.

I added a javadoc to explain the scenario we discussed here.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 36 at r1 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

We should come to some agreement on GenerationType.IDENTITY vs GenerationType.SEQUENCE and stick with one globally across our codebase (unless there are good reasons not to). Why is IDENTITY preferable to SEQUENCE for us?

It depends on where we want to handle the auto-increment. If we use GenerationType.IDENTITY, the generation will be delegated to PostgreSQL, so the schema generation tool will choose BIGSERIAL as the type of this field. If we use GenerationType.SEQUENCE instead, we actually leverage Hibernate to manage the auto-increment.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 41 at r1 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

Is ZonedDateTime the right way to go? It looks like it may have some gotchas? https://thoughts-on-java.org/hibernate-tips-whats-the-best-way-to-persist-a-zoneddatetime

I don't exactly understand the trade-offs with all of the alternatives, but it sounds like, at a minimum, we need to make sure that the connection's time zone is set to UTC. Anything else?

Also, how do we convert back and forth between ZonedDateTimes and the Joda DateTimes we use throughout the rest of the codebase? Is it possible to stick with Joda DateTimes for consistency and use some sort of DB translator? I'd rather continue to use the same type in the models, and convert it once everywhere with a translator, than have to convert between them constantly in various service wrappers between DB objects and the rest of the codebase.

I think in general we may want to replace Joda time with Java time eventually, but that can be separated from this database migration.

So, I think we can continue using Jodatime with Hibernate by introducing the official Joda-Time-Hibernate package. I will bring in the package in the next PR and add some unit test there.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 66 at r1 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

What does an absent value mean, that the claims list exists in memory only and hasn't been loaded from the DB?

Ben: your understanding is correct.

I think I will do what Weimin suggested.


core/src/main/resources/google/registry/database/init.sql, line 17 at r1 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

'timestamp' is implicitly a type without timezone information. Do we want to use a type with timestamp information, and always set it explicitly to UTC? That seems clearer.

Yeah, changed to use timestamptz instead.

@CydeWeys
Copy link
Member

left a comment

Reviewable status: 0 of 3 files reviewed, 7 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 41 at r1 (raw file):

Previously, hstonec (Shicong Huang) wrote…

I think in general we may want to replace Joda time with Java time eventually, but that can be separated from this database migration.

So, I think we can continue using Jodatime with Hibernate by introducing the official Joda-Time-Hibernate package. I will bring in the package in the next PR and add some unit test there.

It makes to do it now as part of the migration. Use java.util.DateTime for the new Cloud SQL stuff, convert it in the model code, and then eventually, once the migration is complete, we can change the rest of the code to use java.util.DateTime, without needing to perform any data migrations.

@hstonec
Copy link
Collaborator Author

left a comment

Reviewable status: 0 of 3 files reviewed, 7 unresolved discussions (waiting on @CydeWeys and @weiminyu)


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 41 at r1 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

It makes to do it now as part of the migration. Use java.util.DateTime for the new Cloud SQL stuff, convert it in the model code, and then eventually, once the migration is complete, we can change the rest of the code to use java.util.DateTime, without needing to perform any data migrations.

You mean java.util.Date? There is no java.util.DateTime. If so, it is a very old date in java and many methods of it have been deprecated.

@weiminyu
Copy link
Collaborator

left a comment

Reviewable status: 0 of 3 files reviewed, 8 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 34 at r1 (raw file):

Previously, hstonec (Shicong Huang) wrote…

I added a javadoc to explain the scenario we discussed here.

Ben: yes, the severity depends on the use case. If duplicates cannot be tolerated, we should probably not use autogen in PK. Instead, we can get an id from a sequence and
assign it to the entity manually.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 36 at r1 (raw file):

Previously, hstonec (Shicong Huang) wrote…

It depends on where we want to handle the auto-increment. If we use GenerationType.IDENTITY, the generation will be delegated to PostgreSQL, so the schema generation tool will choose BIGSERIAL as the type of this field. If we use GenerationType.SEQUENCE instead, we actually leverage Hibernate to manage the auto-increment.

If we use GenerationType.SEQUENCE, Hibernate will create a sequence generator (can be configured to use shared or dedicated sequence) in the database.
Aside from the extra schema elements, there is also a performance question: do we need cache and batch read for sequence ids.


core/src/main/java/google/registry/schema/tmch/ClaimsList.java, line 78 at r2 (raw file):

checkNotNull(revisionId);

checkNotNull is a special form of checkArgument. checkState() is better.
Also, it is a practice to given a reason to all checkXXX calls.

@hstonec hstonec force-pushed the hstonec:add-claimslist-schema branch from 541baf4 to b39d97c Aug 26, 2019

@hstonec
Copy link
Collaborator Author

left a comment

Reviewable status: 0 of 3 files reviewed, 8 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)


core/src/main/java/google/registry/schema/tmch/ClaimsList.java, line 78 at r2 (raw file):

Previously, weiminyu (Weimin Yu) wrote…
checkNotNull(revisionId);

checkNotNull is a special form of checkArgument. checkState() is better.
Also, it is a practice to given a reason to all checkXXX calls.

Done.

@CydeWeys
Copy link
Member

left a comment

Reviewable status: 0 of 3 files reviewed, 8 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 41 at r1 (raw file):

Previously, hstonec (Shicong Huang) wrote…

You mean java.util.Date? There is no java.util.DateTime. If so, it is a very old date in java and many methods of it have been deprecated.

Yes

@CydeWeys
Copy link
Member

left a comment

Reviewable status: 0 of 3 files reviewed, 8 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)


core/src/main/java/google/registry/schema/tmch/ClaimsList.java, line 78 at r2 (raw file):

Previously, hstonec (Shicong Huang) wrote…

Done.

There's also different exceptions thrown for each type.

IllegalStateException is the most fitting here.

@CydeWeys
Copy link
Member

left a comment

Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)


core/src/main/java/google/registry/schema/tmch/ClaimsList.java, line 77 at r3 (raw file):

  public Long getRevisionId() {
    checkState(
        revisionId != null, "revisionId is null because it is not persisted in the database.");

No period at end of short exception messages

@hstonec hstonec force-pushed the hstonec:add-claimslist-schema branch from b39d97c to e0a89d3 Aug 26, 2019

@hstonec
Copy link
Collaborator Author

left a comment

Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @CydeWeys, @hstonec, and @weiminyu)


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 41 at r1 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

Yes

Discussed this offline, we decided to use java.time.ZonedDateTime instead.


core/src/main/java/google/registry/schema/tmch/ClaimsList.java, line 77 at r3 (raw file):

Previously, CydeWeys (Ben McIlwain) wrote…

No period at end of short exception messages

Done.

@hstonec
Copy link
Collaborator Author

left a comment

Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @CydeWeys and @weiminyu)


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 34 at r1 (raw file):

Previously, weiminyu (Weimin Yu) wrote…

Ben: yes, the severity depends on the use case. If duplicates cannot be tolerated, we should probably not use autogen in PK. Instead, we can get an id from a sequence and
assign it to the entity manually.

Done. We can continue the discussion in the implementation doc.


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 36 at r1 (raw file):

Previously, weiminyu (Weimin Yu) wrote…

If we use GenerationType.SEQUENCE, Hibernate will create a sequence generator (can be configured to use shared or dedicated sequence) in the database.
Aside from the extra schema elements, there is also a performance question: do we need cache and batch read for sequence ids.

Done. We can continue the discussion in the implementation doc.

@hstonec
Copy link
Collaborator Author

left a comment

Dismissed @weiminyu from a discussion.
Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @CydeWeys and @weiminyu)

@hstonec
Copy link
Collaborator Author

left a comment

Dismissed @weiminyu from a discussion.
Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @CydeWeys and @weiminyu)

@CydeWeys
Copy link
Member

left a comment

:lgtm:

Reviewable status: 0 of 3 files reviewed, 9 unresolved discussions (waiting on @CydeWeys and @weiminyu)

@hstonec
Copy link
Collaborator Author

left a comment

Reviewable status: 0 of 3 files reviewed, 4 unresolved discussions (waiting on @weiminyu)


core/src/main/java/google/registry/model/tmch/ClaimsList.java, line 66 at r1 (raw file):

Previously, hstonec (Shicong Huang) wrote…

Ben: your understanding is correct.

I think I will do what Weimin suggested.

Done.

We can continue the discussion in the implementation details doc.

@hstonec hstonec merged commit dcceb0d into google:master Aug 26, 2019

4 of 7 checks passed

code-review/reviewable 3 files, 4 discussions left (weiminyu)
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No code changes detected
Details
LGTM analysis: Java No new or fixed alerts
Details
cla/google All necessary CLAs are signed
kokoro-foss Kokoro build finished
Details
kokoro-internal Kokoro build finished
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.