Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoupdate roundtrips. Resolves #210 #211

Closed
wants to merge 1 commit into from
Closed

Conversation

asottile
Copy link
Member

No description provided.

@asottile
Copy link
Member Author

@asottile
Copy link
Member Author

This also doesn't seem to roundtrip lists / indentation. Just comments. So it's an improvement, but not a fix :/

@asottile
Copy link
Member Author

asottile commented Feb 8, 2016

I've rebased this, it's still not ready to ship -- I need to work out some bugs with upstream first

@asottile
Copy link
Member Author

asottile commented Feb 8, 2016

TODO: this doesn't roundtrip for some reason: (it drops the comment)

-   repo: https://github.com/pre-commit/pre-commit-hooks.git
    sha: cf550fcab3f12015f8676b8278b30e1a5bc10e70
    # foo
    hooks:
    -   id: trailing-whitespace

@asottile
Copy link
Member Author

asottile commented Feb 8, 2016

@AvdN
Copy link

AvdN commented Feb 20, 2016

I cannot reproduce the dropping of the comment in you indicate in your post starting with TODO:

I wrote a first version of a brute force determination of indent and ran that on that snippet and the YAML files in your repo. Only the travis file doesn't roundtrip and that needs investigating, but its semantics do not change.

from __future__ import print_function

import os
import sys

import ruamel.yaml as yaml
from ruamel.yaml.compat import text_type, binary_type
import difflib

d = difflib.Differ()

def load_yaml_guess_indent(stream):
    # load a yaml file guess the indentation, if you use TABs ...
    if isinstance(stream, text_type):
        yaml_str = stream
    elif isinstance(stream, binary_type):
        yaml_str = stream.decode('utf-8')  # most likely, but the Reader checks BOM for this
    else:
        yaml_str = stream.read()
    indent = 2  # default if not found for some reason
    prev_line_key_only = None
    for line in yaml_str.splitlines():
        rline = line.rstrip()
        if rline.startswith('- '):
            idx = 1
            while line[idx] == ' ':  # this will end as we rstripped
                idx += 1
            if line[idx] == '#':     # comment after -
                continue
            indent = idx
            break
        if rline.endswith(':'):
            idx = 0
            while line[idx] == ' ':  # this will end on ':'
                idx += 1
            prev_line_key_only = idx
            continue
        if prev_line_key_only is not None and rline:
            idx = 0
            while line[idx] in ' -':  # this will end on ':'
                idx += 1
            if idx > prev_line_key_only:
                indent = idx - prev_line_key_only
                break
        prev_line_key_only = None
    return yaml.round_trip_load(yaml_str), indent

yaml_str = """\
-   repo: https://github.com/pre-commit/pre-commit-hooks.git
    sha: cf550fcab3f12015f8676b8278b30e1a5bc10e70
    # foo
    hooks:
    -   id: trailing-whitespace
"""


data, indent = load_yaml_guess_indent(yaml_str)
print('indent', indent)
res = yaml.dump(data, Dumper= yaml.RoundTripDumper, indent=indent)
if res != yaml_str:
    result = list(d.compare(yaml_str.splitlines(True), res.splitlines(True)))
    sys.stdout.writelines(result)

for root, directory_names, file_names in os.walk('.'):
    if '.git' in directory_names:
        directory_names.remove('.git')
    for file_name in file_names:
        if file_name.endswith('.yml') or file_name.endswith('.yaml'):
            print('-' * 60)
            full_name = os.path.join(root, file_name)
            with open(full_name) as fp:
                yaml_str = fp.read()
                data, indent = load_yaml_guess_indent(yaml_str)
                print(full_name, 'indent', indent)
                res = yaml.dump(data, Dumper= yaml.RoundTripDumper, indent=indent)
                if res != yaml_str:
                    result = list(d.compare(yaml_str.splitlines(True), res.splitlines(True)))
                    sys.stdout.writelines(result)
                    print('>>>>>>')
                    print(res)

@AvdN
Copy link

AvdN commented Feb 20, 2016

The only way I get a comment to drop is when it is before a !!tag entry. That is a known xfail in the tests.

If you want to use ruamel.yaml to rewrite a file and be sure that it doesn't:

  • change the semantics
  • drop any comments

the best thing to do IMO is

  • compare the loaded original against a reloaded roundtrip dump (i.e. load, dump, reload. Had I done so I would have noticed the recently found/solved bug with larger reindents pushing against the '#' without whitespace)
  • check if the same characters, disregarding whitspace and order, are in the dumped file.

and don't rewrite/throw a warning that ruamel.yaml couldn't handle the file.
Do you want some code for that?

@asottile
Copy link
Member Author

@AvdN I can probably handle that -- while we're on the subject of !!tags is there a "Safe" RoundTripLoader / RoundTripDumper ?

@AvdN
Copy link

AvdN commented Feb 21, 2016

There RoundTripLoader is the Safe version. The relevant part, its constructor, derives from the SafeConstructor used in safe_load, not from the Constructor used by load.
That means it errors on non-recognised tags, just like safe_load does, unless you add a specific constructor for these tags.

I have a todo item to add functionality to just make a CommentedMap out of mappings with unrecognised tags and preserve the tag value as an attribute ready to write it out. So far nobody asked for this (and I didn't need it myself), because the non-safe loading (and using object tags) is shunted by everyone.

@asottile
Copy link
Member Author

@AvdN ah, the main reason I ask is this:

>>> from ruamel.yaml.compat import ordereddict
>>> import ruamel.yaml
>>> ruamel.yaml.dump(ordereddict(), Dumper=ruamel.yaml.RoundTripDumper)
'!!omap []\n'
>>> ruamel.yaml.load(ruamel.yaml.dump(ordereddict(), Dumper=ruamel.yaml.RoundTripDumper))
ordereddict([])

Is there a different type I should use in my tests other than ordereddict for making yaml structures?

@AvdN
Copy link

AvdN commented Feb 21, 2016

Yes ordereddict is the superclass of the one actually used to hang the attributes (and above all methods) on, needed for roundtripping:

import ruamel.yaml as yaml
x = yaml.round_trip_load('a: 1\nb: 2')
print(type(x))

from ruamel.yaml.comments import CommentedMap
cm = CommentedMap()
cm['a'] = 1
cm['b'] = 2
cm.yaml_add_eol_comment('# can also use 0o001', key='a', column=10)

print(yaml.dump(cm, Dumper=yaml.RoundTripDumper))

@asottile
Copy link
Member Author

asottile commented Mar 6, 2017

We'll continue to explore options in #414.

@asottile asottile closed this Mar 6, 2017
@asottile asottile deleted the autoupdate_round_trip branch August 14, 2017 20:53
droctothorpe pushed a commit to droctothorpe/pre-commit that referenced this pull request Mar 23, 2022
…ave-shebangs

Add a checker for executables without shebangs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants