Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1774] Fix unintended behavior related to --since d1 #1776

Merged

Conversation

yhtMinceraft1010X
Copy link
Contributor

@yhtMinceraft1010X yhtMinceraft1010X commented May 7, 2022

Fixes #1774.
Fixes #1778.

Proposed commit message

The use of Instant.ofEpochMilli(Long.MIN_VALUE) as the arbitrary first
commit date now fails shallow cloning for
testSinceBeginningDateRangeWithShallowCloning system test.

In addition, using this arbitrary date in convertToGitDateRangeArgs
when converting timezone results in an overflow causing sinceDate to
be greater than untilDate. Thus, no commit results are retrieved,
leading to failing system tests.

Also, the arbitrary first commit date is in UTC. However, in the
convertToGitDateRangeArgs method, the zoneId used for the conversion
is from the config, which may be different. This leads to an erroneous
timezone conversion.

Finally, using --since d1 with --period results in an unintended date
range because the --period and the arbitrary commit date from 
--since d1 are used to calculate the until date as the actual 
earliest commit date is not immediately available.

Let's change Instant.ofEpochMilli() to use 0 so that the arbitrary date
is 1970-01-01 and the overflow of the sinceDate can be avoided.

At the same time, let's convert the arbitrary date to the correct
timezone within the ArgsParser class and add a component test for
parsing --since d1 to check that the correct timezone is used for each
RepoConfiguration before cloning any repos.

Finally, let's raise a warning when --since d1 is being used with 
--period.

Other information

Screenshot showing the overflow of the sinceDate when Instant.ofEpochMilli(Long.MIN_VALUE) is used:
Screenshot (277)

Notice how the sinceDate has + sign instead of - sign.

Timezone Trivia

There also seems to be something strange with the timezone here:
Screenshot (278)

Although the code for this method has been changed, the timezone conversion logic in lines 60-61 is still preserved. The time appears to be converted to +07:30 instead of +08:00. However, tests still work correctly. This is likely following the old SG timezone being used before 1 January 1982.

The use of Instant.ofEpochMilli(Long.MIN_VALUE) as the arbitrary first
commit date now fails shallow cloning for
testSinceBeginningDateRangeWithShallowCloning system test.

In addition, using this arbitrary date in convertToGitDateRangeArgs
when converting timezone results in an overflow causing sinceDate to
be greater than untilDate. Thus, no commit results are retrieved,
leading to failing system tests.

Finally, the arbitrary first commit date is in UTC. However, in the
convertToGitDateRangeArgs method, the zoneId used for the conversion
is from the config, which may be different. This leads to an erroneous
timezone conversion.

Let's change Instant.ofEpochMilli() to use 0 so that the arbitrary date
is 1970-01-01. At the same time, let's convert the arbitrary date to
the correct timezone within the ArgsParser class and add a component
test for parsing --since d1.

This ensures that an arbitrary since date is adjusted to the
correct timezone for each RepoConfiguration before cloning any repos.
Copy link
Contributor

@gok99 gok99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for identifying the issue! Looks good to me, except for some minor questions.

@@ -290,6 +290,10 @@ public static CliArguments parse(String[] args) throws HelpScreenException, Pars

if (isSinceDateProvided) {
sinceDate = TimeUtil.getSinceDate(cliSinceDate.get());
// For --since d1, need to adjust the arbitrary date based on timezone
if (sinceDate.equals(SinceDateArgumentType.getArbitraryFirstCommitDateUtcLocal())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this logic go into TimeUtil::getSinceDate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would just like to get this clarified too - there seems to be an identical clause in getSinceDate. If it can't/shouldn't be moved into getSinceDate, I'll approve the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you've mentioned it, I think the clause in TimeUtil.getSinceDate can be safely removed. Since the new arbitrary date is 1970-01-01 00:00:00 in UTC time, adding withHour(0).withMinute(0).withSecond(0) does nothing to it so it is effectively returning the same date. The equality check in ArgsParser should still pass.

That being said, we still have to keep withHour(0).withMinute(0).withSecond(0) in TimeUtil.getSinceDate since the getDateMinusAMonth and getDateMinusNDays methods still need to convert the since date to midnight.

As the until date is calculated using the --period with reference to
the arbitrary date of --since d1, the result is an incorrect date range.
This may result in no data being provided.

Let's throw a warning when the user may be using --since d1 with the
--period flag.
@yhtMinceraft1010X
Copy link
Contributor Author

For #1778, I decided to use a warning instead of throwing an exception and terminating the program before cloning. This is due to the possibility that someone might put --since 1/1/1970 --period 30d. In this case, an exception would be thrown since the given date would be equal to the arbitrary date that is used to check for whether --since d1 might be used and no cloning would be done.

Although this is an unlikely scenario, I believe no exception should be thrown since setting 31/1/1970 as the until date is expected behaviour.

docs/ug/cli.md Outdated
@@ -172,6 +172,7 @@ This flag overrides the `Ignore file size limit` field in the CSV config file.
<box type="info" seamless>

* If both start date and end date are not specified, the date of generating the report will be taken as the end date.
* May analyse the incorrect date range if used with `--since d1`. The program will throw a warning.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* May analyse the incorrect date range if used with `--since d1`. The program will throw a warning.
* May analyze the incorrect date range if used with `--since d1`. The program will throw a warning.

docs/ug/cli.md Outdated
@@ -216,7 +217,8 @@ Cannot be used with `--last-modified-date`. This may result in an incorrect last
<box type="info" seamless>

* If the start date is not specified, only commits made one month before the end date (if specified) or the date of generating the report, will be captured and analyzed.
* If `d1` is specified as the start date (`--since d1` or `-s d1`), then the earliest commit date of all repositories will be taken as the start date.
* If `d1` is specified as the start date (`--since d1` or `-s d1`), then the program will search for the earliest commit date of all repositories and use that as the start date.
* If `d1` is specified together with `--period`, then the program will warn that the date range being analysed may be incorrect.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* If `d1` is specified together with `--period`, then the program will warn that the date range being analysed may be incorrect.
* If `d1` is specified together with `--period`, then the program will warn that the date range being analyzed may be incorrect.

Comment on lines 296 to 300
if (TimeUtil.isEqualToArbitraryFirstDateUtc(sinceDate)) {
if (isPeriodProvided) {
logger.warning(MESSAGE_SINCE_D1_WITH_PERIOD);
}
sinceDate = TimeUtil.getArbitraryFirstCommitDateConverted(zoneId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation for this part seems to be quite deep. Is there any way to flatten this?

public static final String FIRST_COMMIT_DATE_SHORTHAND = "d1";
private static final ZonedDateTime ARBITRARY_FIRST_COMMIT_DATE = ZonedDateTime.ofInstant(
Instant.ofEpochMilli(0), ZoneId.of("Z"));
private static final LocalDateTime ARBITRARY_FIRST_COMMIT_DATE_UTC_LOCAL = ARBITRARY_FIRST_COMMIT_DATE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable naming seems a little confusing. May I suggest using ARBITRARY_FIRST_COMMIT_DATE_UTC and ARBITRARY_FIRST_COMMIT_DATE_LOCAL instead?

@yhtMinceraft1010X yhtMinceraft1010X changed the title [#1774] Fix CI failing due to --since d1 [#1774] Fix unintended behavior related to --since d1 May 18, 2022
Copy link
Member

@dcshzj dcshzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dcshzj dcshzj merged commit 538365a into reposense:master Jun 5, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2022

The following links are for previewing this pull request:

@yhtMinceraft1010X yhtMinceraft1010X deleted the 1774-investigate-failing-ci branch June 15, 2022 05:00
Punpun1643 pushed a commit to Punpun1643/RepoSense that referenced this pull request Jun 17, 2022
…ense#1776)

The use of Instant.ofEpochMilli(Long.MIN_VALUE) as the arbitrary first
commit date now fails shallow cloning for
testSinceBeginningDateRangeWithShallowCloning system test.

In addition, using this arbitrary date in convertToGitDateRangeArgs
when converting timezone results in an overflow causing sinceDate to
be greater than untilDate. Thus, no commit results are retrieved,
leading to failing system tests.

Also, the arbitrary first commit date is in UTC. However, in the
convertToGitDateRangeArgs method, the zoneId used for the conversion
is from the config, which may be different. This leads to an erroneous
timezone conversion.

Finally, using --since d1 with --period results in an unintended date
range because the --period and the arbitrary commit date from 
--since d1 are used to calculate the until date as the actual 
earliest commit date is not immediately available.

Let's change Instant.ofEpochMilli() to use 0 so that the arbitrary date
is 1970-01-01 and the overflow of the sinceDate can be avoided.

At the same time, let's convert the arbitrary date to the correct
timezone within the ArgsParser class and add a component test for
parsing --since d1 to check that the correct timezone is used for each
RepoConfiguration before cloning any repos.

Finally, let's raise a warning when --since d1 is being used with 
--period.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants