
Conversation

@davitbzh
Contributor

@davitbzh davitbzh commented Nov 8, 2022

This PR adds/fixes/changes...
Updated the conversion to epoch milliseconds in the Spark engine and used the util function to be consistent.
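For context, a minimal sketch of the intended flow, assuming a util-style helper that normalizes event times to epoch milliseconds and is reused by the Spark engine as a UDF (names, formats, and the helper body are illustrative, not the actual hsfs implementation):

```python
# Illustrative sketch only; `convert_event_time_to_timestamp` approximates the
# shared util helper and is not the actual hsfs implementation.
from datetime import datetime

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import LongType


def convert_event_time_to_timestamp(event_time):
    """Normalize an event time value to epoch milliseconds."""
    if event_time is None:
        return None
    if isinstance(event_time, str):
        return int(datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S").timestamp() * 1000)
    if isinstance(event_time, int):
        # treat 13-digit values as milliseconds already, shorter values as seconds
        return event_time if len(str(event_time)) >= 13 else event_time * 1000
    raise ValueError(f"Unsupported event time type: {type(event_time)}")


# The Spark engine can reuse the same helper as a UDF, so engine-side and
# client-side conversions stay consistent.
_convert_event_time_to_timestamp = udf(convert_event_time_to_timestamp, LongType())

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2022-11-08 12:00:00",), ("2022-11-09 08:30:00",)], ["event_time"])
df.withColumn("event_time_ms", _convert_event_time_to_timestamp(col("event_time"))).show()
```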

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if stylechecks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstring
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

@tdoehmen
Contributor

tdoehmen commented Nov 9, 2022

I wonder what the initial problem was here. Was it that the conversion didn't work for string-based date columns?
I'm afraid using the convert_event_time_to_timestamp function would lead to inconsistencies between Python and Java (see https://github.com/logicalclocks/feature-store-api/blob/master/java/src/main/java/com/logicalclocks/hsfs/engine/SparkEngine.java#L309)

result_dfs = {}
ts_type = dataset.select(event_time).dtypes[0][1]
ts_col = (
    unix_timestamp(col(event_time)) * 1000

Contributor

The UDF should be able to handle "date" and "timestamp" as well, so we do not need the if-clause.
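As a sketch of that point (column names, the date format, and the helper body are assumptions, not taken from the hsfs code), one UDF can cover string, date, and timestamp columns with no branch on dataset.dtypes:

```python
# Sketch only: illustrative names, not the hsfs implementation.
from datetime import date, datetime

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import LongType


def _to_epoch_ms(value):
    # datetime is checked before date because datetime is a date subclass
    if isinstance(value, datetime):
        return int(value.timestamp() * 1000)
    if isinstance(value, date):
        return int(datetime(value.year, value.month, value.day).timestamp() * 1000)
    if isinstance(value, str):
        return int(datetime.strptime(value, "%Y-%m-%d %H:%M:%S").timestamp() * 1000)
    return None


to_epoch_ms = udf(_to_epoch_ms, LongType())

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("2022-11-08 12:00:00", date(2022, 11, 8), datetime(2022, 11, 8, 12, 0))],
    ["et_string", "et_date", "et_timestamp"],
)
# the same UDF applies to all three column types, no dtype if-clause needed
df.select(*[to_epoch_ms(col(c)).alias(c) for c in df.columns]).show()
```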

@kennethmhc
Contributor

> I wonder what the initial problem was here. Was it that the conversion didn't work for string-based date columns? I'm afraid using the convert_event_time_to_timestamp function would lead to inconsistencies between Python and Java (see https://github.com/logicalclocks/feature-store-api/blob/master/java/src/main/java/com/logicalclocks/hsfs/engine/SparkEngine.java#L309)

The problem was that we expect the event time column to have second precision, and in the backend we multiply the value by 1000 for comparison. However, Davit used a column which is in milliseconds, so the comparison did not work properly.

@davitbzh Like Till said, we should handle millisecond values in the Java client as well.
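To make the mismatch concrete, here is a small illustration with made-up values (the actual column values and backend query are not shown in this thread):

```python
# Made-up values to illustrate the second vs. millisecond mismatch described above.
expected_seconds = 1667900000        # second precision, what the backend assumes
actual_millis = 1667900000000        # millisecond precision, what the column held

# The backend multiplies the stored value by 1000 before comparing:
print(expected_seconds * 1000)       # 1667900000000 -> a sane epoch-milliseconds value
print(actual_millis * 1000)          # 1667900000000000 -> far in the future, so
                                     # time-based comparisons no longer match anything
```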

@davitbzh davitbzh requested a review from kennethmhc November 9, 2022 21:02

def _get_start_time(self):
    # minimum start time is 1 second
    return 1000

Contributor

Is this change necessary?

Contributor Author

If we will eventually support only 10-digit timestamps, yes.

Contributor

But for now we should accept any timestamp in seconds, right?

return util.convert_event_time_to_timestamp(event_time)

# registering the UDF
_convert_event_time_to_timestamp = udf(_check_event_time_type, LongType())

Contributor

You can use util.convert_event_time_to_timestamp directly here.
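For reference, registering the util function directly might look roughly like this (a sketch; whether util.convert_event_time_to_timestamp handles every raw value Spark hands to a UDF is an assumption):

```python
# Sketch of the suggestion above; assumes the util helper accepts the raw
# column values that Spark passes into a UDF.
from pyspark.sql.functions import udf
from pyspark.sql.types import LongType

from hsfs import util

_convert_event_time_to_timestamp = udf(util.convert_event_time_to_timestamp, LongType())
```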

Contributor Author

Well, I don't want to put this part in util.convert_event_time_to_timestamp:

            # for backward compatibility
            if isinstance(event_time, int) and len(str(event_time)) == 13:
                event_time = int(event_time / 1000)

Contributor

In util.py, we have:

    elif isinstance(event_time, int):
        if event_time == 0:
            raise ValueError("Event time should be greater than 0.")
        # jdbc supports timestamp precision up to second only.
        if len(str(event_time)) < 13:
            event_time = event_time * 1000
        return event_time

So the code below is redundant:

if isinstance(event_time, int) and len(str(event_time)) == 13:
    event_time = int(event_time / 1000)
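For what it's worth, a quick sketch of that round trip, using a stand-in that mimics the util.py excerpt above (not the real hsfs code):

```python
# Stand-in mimicking the util.py excerpt quoted above (not the real hsfs code).
def convert_event_time_to_timestamp(event_time):
    if isinstance(event_time, int):
        if event_time == 0:
            raise ValueError("Event time should be greater than 0.")
        # jdbc supports timestamp precision up to second only.
        if len(str(event_time)) < 13:
            event_time = event_time * 1000
        return event_time


millis = 1667900000000  # 13-digit value, already in milliseconds

# With the extra backward-compatibility branch: divide to seconds first,
# then the util multiplies back to milliseconds ...
assert convert_event_time_to_timestamp(int(millis / 1000)) == millis

# ... without it, the util leaves a 13-digit value untouched: same result.
assert convert_event_time_to_timestamp(millis) == millis
```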

Contributor Author

honestly this is wrong:

        if len(str(event_time)) < 13:
            event_time = event_time * 1000

but I don't know how to replace it without breaking the API.

If we don't use this, then how do we guarantee that event_time will have second granularity?
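For illustration, one way to read that concern (my reading, not spelled out in the thread): the digit-count heuristic cannot reliably tell seconds from milliseconds.

```python
# Illustration of the ambiguity: a millisecond timestamp from before
# 2001-09-09 has fewer than 13 digits ...
early_millis = 999999999999           # ~2001-09-09 01:46:39.999 UTC, in milliseconds

# ... so a `len(str(event_time)) < 13` check treats it as seconds and scales again:
print(early_millis * 1000)            # 999999999999000 ms -> roughly year 33658

# Conversely, no digit count proves a value really is in seconds, which is the
# granularity the comparison logic assumes.
```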

if isinstance(event_time, int) and len(str(event_time)) == 13:
    event_time = int(event_time / 1000)

Contributor Author

I can check for either 10 or 13 digits?

@davitbzh davitbzh requested a review from kennethmhc November 14, 2022 23:54
davitbzh and others added 2 commits November 15, 2022 10:11
Co-authored-by: kennethmhc <kennethmhc@users.noreply.github.com>
@davitbzh davitbzh requested a review from kennethmhc November 15, 2022 09:20

Contributor

@kennethmhc kennethmhc left a comment

LGTM!

Co-authored-by: kennethmhc <kennethmhc@users.noreply.github.com>
@davitbzh davitbzh merged commit 6c70614 into logicalclocks:master Nov 15, 2022
