Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cache timezone in Prepared Statement and Result Set #588

Closed
wants to merge 13 commits into from

Conversation

@Chrriis
Copy link
Contributor

@Chrriis Chrriis commented Jun 20, 2016

This is to fix issue #508
I don't know all the details, but I think the same timezeone could be used for the lifetime of a prepared statement between the time parameters are added until they are executed. So I lazy create a calendar at first setDate/Time/Timestamp(xx) which I clear at executeBatch().

@codecov-io
Copy link

@codecov-io codecov-io commented Jun 20, 2016

Current coverage is 58.18%

Merging #588 into master will increase coverage by 0.05%

@@             master       #588   diff @@
==========================================
  Files           148        148          
  Lines         15549      15580    +31   
  Methods           0          0          
  Messages          0          0          
  Branches       3078       3084     +6   
==========================================
+ Hits           9040       9066    +26   
- Misses         5288       5291     +3   
- Partials       1221       1223     +2   

Powered by Codecov. Last updated by 893c1a4...fd88292

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 20, 2016

I would rather propose to cache time zone at the first addBatch call. An assumption of "timestamp does not change in the middle of the batch" seems sensible.

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 20, 2016

I am not sure I follow your suggestion. If you have 100 setDate(xx) calls before addBatch(), which timezone are you going to use? Because current code resolves the timezone to prepare the object when setDate(xx) is called. I think my approach is a good compromise, but I am all for a better approach of course.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 20, 2016

because current code resolves the timezone to prepare the object when setDate(xx) is called

"there are two hard things in computer science"

Well, an assumption of "the default time zone will be the same for all bind variables of a given statement" makes sense.

However, we need to reset the cache in all relevant places, not just executeBatch: execute/executeUpdate/clearParameters/etc/etc

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

@vlsi Fair enough, though I was more trying to show the idea than the complete implementation :)
About clearParameters, it acts like a setNull(xx) on all parameters, and manually performing the setNull(xx) is not going to clear the cache. Thus, to be coherent, I am not sure we should clear the cache there.
Do you like it better? Is there any other place that would make sense?

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 21, 2016

About clearParameters, it acts like a setNull(xx) on all parameters,

There's a difference: clearParameters unsets the parameter values. For instance, if you call clearParameters, then execute you'll end up with "No value specified for parameter 1" error.

I believe we'd clear TZ cache on clearParameters that is outside of batch statement.

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

I believe we'd clear TZ cache on clearParameters that is outside of batch statement.

In my code, the cache is initialized when you are within a batch statement (lifecycle is between first addX(xx) until executeX()), so it is already cleared when outside of a batch statement. Or maybe I misunderstand what you are trying to say.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 21, 2016

In my code, the cache is initialized when you are within a batch statement (lifecycle is between first addX(xx) until executeX()), so it is already cleared when outside of a batch statement.

Initially I thought "addBatch - executeBatch" is the least unsurprising we can offer.
Later I realized that TZ caching right after first timestamp bind till any execute makes sense as well.

That's a subtle difference for single-row statements that bind lots of timestamp columns.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 21, 2016

@Chrriis , I think in general the idea is good.
@davecramer , @ringerc any ideas / objections on caching timezone values in-between setTimestamp ... execute calls?

However, in order to merge the feature in, we would require tests: positive (that the TZ indeed gets cached) and negative (that after failures or executions the cache gets invalidated).

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

Would a similar approach work for PgResultSet, so that getDate/Time/Timestamp(int) would initialize a cached calendar? I am not sure if it should be cleared under some conditions: the timezone could remain for the lifetime of the result set.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 21, 2016

would initialize a cached calendar?

I think something like that makes sense. Note: instead of new GregorianCalendar() I would rather favor Calendar reuse and use calendar.setTimeZone(TimeZone)

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

I would rather favor Calendar reuse

What about adding this method to TimestampUtils:
public Calendar getSharedCalendar() {return setupCalendar(null);}

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 21, 2016

public Calendar getSharedCalendar() {return setupCalendar(null);}

Not sure if that would play well since the calendar that is stored in the org.postgresql.jdbc.TimestampUtils#calendarWithUserTz can be reused and altered by setTimestamp(int, Timestamp, Calendar) kind of calls.

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

Not sure if that would play well since the calendar that is stored (...) can be reused and altered

You were right, so instead I store the timezone which I apply on the shared calendar. The TimezoneTest test was useful. I added TimezoneCachingTest to check how the timeZone field behaves (I just realized I might have to rename "Calendar" to "TimeZone" in the new test class).
Let me know what you think!

Loading

@Chrriis Chrriis changed the title fix: use naive approach to solve timezone overhead fix: cache timezone in Prepared Statement and Result Set Jun 21, 2016
@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

After solving the "cache invalidation" issue, I had to solve the "naming things" one :)

Loading


private TimeZone getTimeZoneCache(Object stmt) {
try {
Field defaultTimeZoneField = stmt.getClass().getDeclaredField("defaultTimeZone");
Copy link
Member

@vlsi vlsi Jun 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the test can be written in terms of TimeZone.setDefault.
In other words, if the test could examine how pgjdbc handles changes TimeZone.setDefault rather than exploring how it stores the cache internally.

Loading

Copy link
Contributor Author

@Chrriis Chrriis Jun 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For PreparedStatement and ResultSet, I have 2 different tests: testXXCachedTimezoneInstance and testXXCachedTimezoneUsage. I think what you wish for is the second kind. There is also the existing "TimezoneTest" that cross checks other aspects.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 21, 2016

@Chrriis , can you please add a test that explores setTimestamp(int, Timestamp, Calendar) as well?

For instance:

ps.setTimestamp(int, Timestamp);
ps.setTimestamp(int, Timestamp, UTC);
ps.setTimestamp(int, Timestamp);
ps.setTimestamp(int, Timestamp, GMT_PLUS_1);
ps.setTimestamp(int, Timestamp);

Afaik, your shared calendar would get overwritten with UTC then with GMT_PLUS_1

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

Afaik, your shared calendar would get overwritten with UTC then with GMT_PLUS_1

@vlsi no. When you specify a timezone (like GMT_PLUS_1), you do not want to change the default timezone, so the parameterless call that follows should use the previously cached timezone and not an overwritten one.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 21, 2016

@Chrriis please point at the test that validates setTimestamp(..., Calendar) vs setTimestamp(Timestamp) intermixed usage.

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

@vlsi I am working on that test :) But I was correcting the statement that the last call with a calendar should change the default timezone for future calls.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 21, 2016

I was correcting the statement that the last call with a calendar should change the default timezone for future calls.

That's debatable. I've no idea how intermixed cases should work, though they should be tested explicitly.

PS. as you write tests, please prefer smaller ones (less asserts per test method), so it is easier to pick up what the test does & rerun a specific test.

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 21, 2016

That's debatable.

Without cache, I don't think the current code retains the last value that was passed to use as the default for future call. This allows to request default environment time zone as well as specific time zones.

as you write tests, please prefer smaller ones

I would if the nature of the test was less about potential problems with sequences. It is very similar to the TimezoneTest in that sense. But feel free to split if you don't like it and think of a better structure.

Loading

rs.next();
rs.getInt(1);
TimeZone.setDefault(tz2);
assertTrue("Timestamps sould NOT be in same time zone", !rs.getTimestamp(2).equals(ts));
Copy link
Member

@vlsi vlsi Jun 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "happy debugging" test anti-pattern.

Please convert assertTrue to assertEquals as described here: https://github.com/pgjdbc/pgjdbc#test

"Timestamps should NOT be in same time zone" vs "timestamp was passed as with TZ1 (default at time of execute) and default zone was changed to TZ2, thus the resulting value should be ...."
Well, it takes time to produce those messages, however they are extremely valuable.
E.g. https://github.com/pgjdbc/pgjdbc/pull/580/files#diff-efc8c3eee49c38f1557353bff58b7e94R1238

I mean the following: either use "why" kind of messages, or remove the messages and just use assertEquals. Of course, whys are better.

Loading

Copy link
Contributor Author

@Chrriis Chrriis Jun 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you of course. I only had limited free time hence the quick line duplication of validation statements between use cases. I will review that when I get more free time.
Note that I would love to know whether the timezone caching feature is using an approach we all agree on or if this code is going to be thrown to the bin... preferably before I get to work on it again :)

Loading

Copy link
Member

@vlsi vlsi Jun 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I would love to know whether the timezone caching feature is using an approach we all agree

In general, the approach is fine.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 22, 2016

@Chrriis , you are going to replace assertTrue with assertEquals, aren't you?

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 23, 2016

you are going to replace assertTrue with assertEquals, aren't you?

Er, hmm, yes! Of course! :)

Loading

"Cache never initialized: must be null",
getTimeZoneCache(pstmt) == null);
getTimeZoneCache(pstmt), null);
Copy link
Member

@vlsi vlsi Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please flip arguments around.
The proper order is assertEquals(String message, Object expected, Object actual)

Loading

!rs.getTimestamp(2, c2).equals(ts));
assertTrue(
"Explicit tz2 calendar, so timestamps must be equal",
rs.getTimestamp(2, c1).equals(ts));
Copy link
Member

@vlsi vlsi Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please dodge those assertTrues as well?

Loading

Copy link
Contributor Author

@Chrriis Chrriis Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the !rs.getTimestamp(2, c2).equals(ts)? How would you convert that?

Loading

Copy link
Member

@vlsi vlsi Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use assertEquals, and know which outcome is expected.
At the end of the day, it is a test case in a controlled environment, so it should always return some known-ahead result.

Loading

@Override
protected void updateProperties(Properties props) {
props.setProperty(PGProperty.REWRITE_BATCHED_INSERTS.getName(),
Boolean.TRUE.toString());
Copy link
Member

@vlsi vlsi Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required for the test? Does it explore something different in rewrite=true mode?

Loading


/* Generally recommended with batch updates. By default we run all
tests in this test case with autoCommit disabled. */
con.setAutoCommit(false);
Copy link
Member

@vlsi vlsi Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove that

Loading

@@ -943,7 +956,7 @@ private ParsedBinaryTimestamp toParsedTimestampBin(TimeZone tz, byte[] bytes, bo
* @return The parsed local date time object.
* @throws PSQLException If binary format could not be parsed.
*/
public LocalDateTime toLocalDateTimeBin(byte[] bytes) throws PSQLException {
public LocalDateTime toLocalDateTimeBin(TimeZone tz, byte[] bytes) throws PSQLException {
Copy link
Member

@vlsi vlsi Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change required?

Loading

Copy link
Contributor Author

@Chrriis Chrriis Jun 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but you are right that something is incorrect: the new tz parameter should be passed to toParsedTimestampBin.

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 23, 2016

@vlsi what do you think of PgResultSet:3260? Should that Calendar use the default time zone?

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 23, 2016

@vlsi We need to clarify if my approach in the result set is correct. PgResultSet:3260 is one example where it is not clear as well as the new timezone parameter I pass in PgResultSet:613.

I am sleep deprived so I think it is a sane decision for me to stop coding for today. will take a look tomorrow evening if time allows. In any case, feel free to play with it, fix anything you like and/or let me know if you have ideas for these cases.

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 23, 2016

@Chrriis re 3260: I think it would better look like (it still has to produce new calendar instance):

Calendar calendar = Calendar.getInstance(get_tz_from_resultset_somehow);
Timestamp timestampValue = getTimestamp(columnIndex, calendar);
calendar.setTimeInMillis(timestampValue.getTime());

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 23, 2016

Will fix that one before shutting down then.

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 25, 2016

@vlsi If we are in tz1 and we call getDate(x), then the result set caches tz1. But then if we switch to tz2 and call setDate(xx): should we use tz1 or tz2? What will be the result of a subsequent call to getDate(x)? I am wondering whether we should limit caching time zones to result sets of concurrency CONCUR_READ_ONLY to avoid the issue altogether while still improving the more common case.
I checked other result set properties and I think others are fine. I pondered over result set types to see if there was a need to limit to TYPE_FORWARD_ONLY, but I don't think other types cause problems.
What is your opinion on these?

Loading

@vlsi
Copy link
Member

@vlsi vlsi commented Jun 25, 2016

                                                                                  I think we can assume that default timezone would not change in between getDate.PS The more I think the more I incline to implement "fast path" for TimeZone.getDefault. We can try access the defaultTimeZone field via reflection, and if it works, that can be used as a quick zone change detector.‎ In any case reflection should be a supplemental implementation. 

Loading

@Chrriis
Copy link
Contributor Author

@Chrriis Chrriis commented Jun 25, 2016

The more I think the more I incline to implement "fast path" for TimeZone.getDefault.

Variation on that approach: access the field by reflection and use it directly. I think in our use of the default timezone we never give that object to the user and we never mutate that timezone object. So I think we can simply use that field directly, if reflection allows it. Otherwise, default to performance problematic approach. I think this can be adapted depending on the version of Java: a different version of Java could have solved that slowness anyway, and if not there would probably be a similar way we can implement. Don't you agree?

Check that simple approach here:
https://github.com/Chrriis/pgjdbc/tree/DefaultTimeZoneFastPath

Loading

@vlsi vlsi closed this in d2b86a0 Jul 11, 2016
vlsi added a commit that referenced this issue Jul 11, 2016
…s accessible through reflection

In case TimeZone#defaultTimeZone field is accessible, get that field
to see if the default zone was changed.

see #588
@vlsi
Copy link
Member

@vlsi vlsi commented Jul 11, 2016

@Chrriis , thanks for the fix

Loading

@vlsi vlsi added this to the 9.4.1209 milestone Jul 11, 2016
vlsi added a commit that referenced this issue Jul 11, 2016
…s accessible through reflection

In case TimeZone#defaultTimeZone field is accessible, get that field
to see if the default zone was changed.

see #588
zemian pushed a commit to zemian/pgjdbc that referenced this issue Oct 6, 2016
zemian pushed a commit to zemian/pgjdbc that referenced this issue Oct 6, 2016
…s accessible through reflection

In case TimeZone#defaultTimeZone field is accessible, get that field
to see if the default zone was changed.

see pgjdbc#588
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants