Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method to calculate the bounds of a range from its textual representation #66

Merged
merged 11 commits into from
Nov 29, 2017
Merged

Add a method to calculate the bounds of a range from its textual representation #66

merged 11 commits into from
Nov 29, 2017

Conversation

flx5
Copy link
Contributor

@flx5 flx5 commented Nov 26, 2017

This adds the functionality to obtain the beginning and ending of a network range.

Usage example:

## parse date range
 start.end.posix = get.range.bounds("2012-07-10 15:58:00-2012-07-15 16:02:00")
start.end.commit = get.range.bounds("f86391e7d7eaf4234fc742c1b61f32cb8a65782e-63654c1b089b8abe9a52d21fd1b53b1631539e10")
start.end.versions = get.range.bounds("v1.0.0-v2.0.0_alpha")

This addresses #58.

Copy link
Collaborator

@clhunsen clhunsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request looks really good basically. I just put some comments for minor stuff, which should be addressed before merging.

Additionally, please add your signed-off tags to the first two commits (i.e., do a rebase with a rewording) and the issue ID you are addressing in, at least, the first commit's message body.

## prassefe@fim.uni-passau.de

test_that("Parse range", {
range.input = c("2012-07-10 15:58:00-2012-07-15 16:02:00",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for a midnight-timestamp. The reason is that a date 2012-07-10 00:00:00 is represented as 2012-07-10 in POSIXct format and, thus, appears in this format inside a range string.

E.g.: "2012-07-10-2012-07-15 16:02:00" and "2012-07-10 15:58:00-2012-07-15".

util-data.R Outdated
#' @return Returns a vector with two entries (start, end) of type POSIXct if input was a date;
#' or of type character if input was a commit hash or version;
#' or NULL if the string could not be parsed
get.bounds = function() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this method to get.range.bounds to attribute the reference to the get.range method.

Additionally, please move it "one function up", i.e., right after the get.range method.

@@ -734,6 +734,15 @@ RangeData = R6::R6Class("RangeData", inherit = ProjectData,
#' @return the revision callgraph
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add yourself to the copyright header of this file.

util-misc.R Outdated
## the patterns to test with appropriate conversions (if any)
tests = list(
## date format (assuming dates are GMT)
c("\\d{4}-\\d{2}-\\d{2}(\\s\\d{2}:\\d{2}:\\d{2})?", as.POSIXct),
Copy link
Collaborator

@clhunsen clhunsen Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything below is just a remark and discussion point!

Basically, we need to set the timezone to UTC in this file, too, analogously to the files util-init.R and util-read.R (see here, for example). But we need a more general solution for that, so I will open an issue.

Consequently: No need right now to add the configuration lines in this file, but later we might add them -- depending on the outcome of the issue I will open.

Update: See #68 for this discussion.

This was referenced Nov 28, 2017
flx5 and others added 8 commits November 28, 2017 22:42
This implements issue #58

Signed-off-by: Felix Prasse <prassefe@fim.uni-passau.de>
Signed-off-by: Felix Prasse <prassefe@fim.uni-passau.de>
Signed-off-by: Felix Prasse <prassefe@fim.uni-passau.de>
Signed-off-by: Felix Prasse <prassefe@fim.uni-passau.de>
Signed-off-by: Felix Prasse <prassefe@fim.uni-passau.de>
Signed-off-by: Felix Prasse <prassefe@fim.uni-passau.de>
Signed-off-by: Felix Prasse <prassefe@fim.uni-passau.de>
Copy link
Collaborator

@clhunsen clhunsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just remove the last commit, then we are fine for merging. For the reasons, have a look at the only comment. 😉

util-misc.R Outdated
tests = list(
## date format (assuming dates are GMT)
## Add zeros for dates without time. Will be ignored if there is a time already.
c("\\d{4}-\\d{2}-\\d{2}(\\s\\d{2}:\\d{2}:\\d{2})?", function(x) as.POSIXct(paste(x, " 00:00:00"))),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just need to put as.POSIXct as second entry here. The function as.POSIXct parses a date like 2017-01-02 as it was 2017-01-02 00:00:00, so you do not need to add a time here.

Sorry for my -- obviously -- misunderstandable comment earlier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the hint. I've added these zeros because the result kept dropping the times. But I've just realized that the method ´as.POSIXct´ gets applied to the entire vector instead of each component in line 117.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thomas and I worked on a global solution for the problem, because we noticed that there are also other implications. Stay tuned for some patches for your branch and this pull request.

@clhunsen clhunsen dismissed their stale review November 29, 2017 16:44

Error in thinking.

When parsing a vector of timestamps using `as.POSIXct` where the hours
are missing from, at least, one timestamp (indicating midnight), the
hours are stripped from *all* timestamps! To circumvent the problem, we
now parse timestamps given as character strings using the function
`lubridate::ymd_hms(..., truncated = 3)`, which adapts appropriately
according to the given timestamps.

This change is, most importantly, relevant for the parsing of range
bounds from a given range identifier (e.g., "2017-03-04
12:45:38-2018-05-27 12:45:38").

Note that the range identifiers now *always* include a substring
indicating the hours of the timestamps, even when the time is midnight!

Example: `str(as.POSIXct(c("2017-01-02", "2017-03-04 12:45:38")))`
yields `POSIXct[1:2], format: "2017-01-02" "2017-03-04"`, which is
clearly wrong.

Adapt tests appropriately to also test the changed functionality.

Update README, showcase file and install.R on the way.

This fixes the problem in PR #66.

Signed-off-by: Claus Hunsen <hunsen@fim.uni-passau.de>
Signed-off-by: Thomas Bock <bockthom@fim.uni-passau.de>
Signed-off-by: Thomas Bock <bockthom@fim.uni-passau.de>
Signed-off-by: Claus Hunsen <hunsen@fim.uni-passau.de>
Update code for data splitting that was not using the NetworkBuilder.

Signed-off-by: Claus Hunsen <hunsen@fim.uni-passau.de>
Signed-off-by: Thomas Bock <bockthom@fim.uni-passau.de>
Copy link
Collaborator

@clhunsen clhunsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine now.

@clhunsen clhunsen merged commit b75403c into se-sic:dev Nov 29, 2017
@clhunsen
Copy link
Collaborator

Hooray! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants