Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert to Python 3 #674

Merged
merged 28 commits into from Jan 3, 2020
Merged

Convert to Python 3 #674

merged 28 commits into from Jan 3, 2020

Conversation

rw-access
Copy link
Contributor

@rw-access rw-access commented Dec 4, 2019

Getting ahead of python2's EOL. There were a few minor changes, and I believe that whitespace changes in json.dump, and dictionary default ordering may have changed some of the autogenerated code.

Unit tests are still passing, although there's one deprecation warning that's easy to address

build/ve/bin/python -m unittest discover --start-directory scripts/tests
..
ecs/scripts/tests/test_ecs_helpers.py:72: DeprecationWarning: Please use assertEqual instead.
  'base': {'group': 1, 'name': 'base'}
........................
----------------------------------------------------------------------

If you disable whitespace changes, it'll be easier to view the diff.

@rw-access rw-access requested a review from webmat December 4, 2019 22:52
@@ -6,11 +6,11 @@
# Dictionary helpers


def dict_copy_keys_ordered(dict, copied_keys):
def dict_copy_keys_ordered(dct, copied_keys):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this to avoid variable shadowing, since dict already means something.
It wasn't strictly necessary, but it makes the code easier to read, and makes PyCharm more pleased with me.

current_nesting = nestings[0]
rest_nestings = nestings[1:]
if len(rest_nestings) > 0:
if current_nesting not in dict:
dict[current_nesting] = {'properties': {}}
elif 'type' in dict[current_nesting] and 'object' == dict[current_nesting]['type']:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We got lucky before, because with Python 2, the ordering was deterministic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous version was passing unit tests but generated code that was inconsistent, because "type": "object" was missing. I can make it so this is always present, but I don't exactly know the intendend behavior.

"dns": {
"properties": {
"answers": {
"properties": {
"class": {
"ignore_above": 1024,
"type": "keyword"
},
"data": {
"ignore_above": 1024,
"type": "keyword"
},
"name": {
"ignore_above": 1024,
"type": "keyword"
},
"ttl": {
"type": "long"
},
"type": {
"ignore_above": 1024,
"type": "keyword"
}
}
},
"header_flags": {
"ignore_above": 1024,
"type": "keyword"
},
"id": {
"ignore_above": 1024,
"type": "keyword"
},
"op_code": {
"ignore_above": 1024,
"type": "keyword"
},
"question": {
"properties": {
"class": {
"ignore_above": 1024,
"type": "keyword"
},
"name": {
"ignore_above": 1024,
"type": "keyword"
},
"registered_domain": {
"ignore_above": 1024,
"type": "keyword"
},
"subdomain": {
"ignore_above": 1024,
"type": "keyword"
},
"top_level_domain": {
"ignore_above": 1024,
"type": "keyword"
},
"type": {
"ignore_above": 1024,
"type": "keyword"
}
}
},
"resolved_ip": {
"type": "ip"
},
"response_code": {
"ignore_above": 1024,
"type": "keyword"
},
"type": {
"ignore_above": 1024,
"type": "keyword"
}
}
},

Replacing these lines

if current_nesting not in dict:
dict[current_nesting] = {'properties': {}}
elif 'type' in dict[current_nesting] and 'object' == dict[current_nesting]['type']:
dict[current_nesting] = {'type': dict[current_nesting]['type'], 'properties': {}}

with this should do what I think is our intended behavior, but it's very clear to me what that is.

        dct[current_nesting].setdefault("properties", {})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah iirc it's a nasty bit that I added when adding dns.answers. Adding type: object everywhere we have further nesting would be fine and a noop, as far as I understand.

I'd prefer to keep it as it was, and only handle the explicit type: object in place when it's already there, for now.

@webmat
Copy link
Contributor

webmat commented Dec 5, 2019

Huge thanks for this! I'll review shortly :-)

I'm specifically using OrderedDict in a few strategic places. But the default dict becoming ordered does show up in a few more places, like the labels example :-)

@rw-access rw-access self-assigned this Dec 6, 2019
@rud
Copy link

rud commented Dec 19, 2019

Tiny suggestion: update the python version in CONTRIBUTING.md.

@webmat
Copy link
Contributor

webmat commented Dec 23, 2019

The build is now failing because the Golang base image for 1.13.x comes with 5 year old Python 3.5.2. Not sure why the Travis config for python 3.7 isn't kicking in 🤦‍♂

@webmat
Copy link
Contributor

webmat commented Dec 23, 2019

Related: #630

@webmat
Copy link
Contributor

webmat commented Dec 24, 2019

The only oddity left is around the two timestamp field's examples. I had never noticed that it was getting interpreted before. It was unquoted in the YAML source files, and was being changed slightly by Python 2.7. Upgrading to 3.6+ was changing it slightly, but in a different manner.

event.created:

2.7

  • source value was: 2016-05-23T08:05:34.857Z
  • output value (e.g. in asciidoc) was: 2016-05-23 08:05:35.101000

3.6/3.7

  • source value was: 2016-05-23T08:05:34.857Z
  • output value (e.g. in asciidoc) was: 2016-05-23 08:05:34.857000+00:00

I've quoted the example in the source YAML, to avoid the timestamp being interpreted. In the end, ECS is not actually mandating a precise timestamp format. So this is different than before, but this makes the timestamp examples consistent with what's in the source YAML everywhere.

Now:

  • source value is: '2016-05-23T08:05:34.857Z'
  • output value (e.g. in asciidoc) is: 2016-05-23T08:05:34.857Z

@webmat
Copy link
Contributor

webmat commented Dec 24, 2019

@rw-access @ruflin This PR is ready for final review.

@webmat webmat requested a review from ruflin December 24, 2019 13:13
@rw-access
Copy link
Contributor Author

thanks for the updates @webmat. LGTM

@webmat webmat merged commit 985e984 into elastic:master Jan 3, 2020
@webmat webmat mentioned this pull request Jan 3, 2020
@rw-access rw-access deleted the py3-conversion branch January 24, 2020 21:45
dcode pushed a commit to dcode/ecs that referenced this pull request Apr 15, 2020
Notable changes:

* `date` field examples are now quoted in the schemas/*.yml files. This avoids them being interpreted and output in a different format.
* Changed the TravisCI distro to use, to get Python 3.6
* Virtualenv forces the use of Python3
* Update the Python version used in contributor guidelines
* avoid aliasing problems by renaming some `dict` function param names to `dct`
* Python 3 adjustments:
  * added parens for function calls such as `print` and `exit`
  * Updated all dependencies in scripts/requirements.txt to most recent versions
  * simplify a `sorted` lambda
  * use `yaml.safe_load`
  * Force sorting in a few places where Python 2's deterministic dict ordering was accidentally giving us sorted values
  * `assertEquals` => `assertEqual`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants