Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All the list is convert into string if '-' is present #229

Open
outscale-frl opened this issue Mar 2, 2020 · 12 comments
Open

All the list is convert into string if '-' is present #229

outscale-frl opened this issue Mar 2, 2020 · 12 comments

Comments

@outscale-frl
Copy link

Hello,

Capture d’écran 2020-03-02 à 09 56 52

I reuse the introduction example code to illustrate.
When you give a list as arg, and into this list there is a string including this specific charactere '-', Fire turns all the list into one and unique string 😢. As you can see with the capture.

I'm looking into the fire code to see where the modification could be.

Thanks !

@dbieber
Copy link
Member

dbieber commented Mar 2, 2020

Thanks for reporting the issue.

Here's the reason for it:

Your shell is stripping the quotes from the input to fire.
So all Fire sees is [hel-lo,hello] in the first example and [hello,hello] in the second example (no quotes).
Fire is smart enough to put the quotes back in around bare words, so it gives a list in the second example, but in the first example this yields ["hel"-"lo","hello"] which isn't a valid input, so Fire falls back to treating the whole thing as a string.

As a workaround you can put quotes around the whole input:
python main.py --name="['hel-lo','hello']"
but I know that's not a great experience.

The place in the code where an improvement would be made is here

def DefaultParseValue(value):

Also #226 (strict mode) will help in this situation once it's available (but its not implemented yet) because you'll be able to specify whether an input expects a string or a list and Fire wont surprise you with a different type.

@outscale-frl
Copy link
Author

Thanks for your answer @dbieber !

@outscale-frl
Copy link
Author

I tried a modification but there are so many usecases...

if --name=['hello-eei38223','342efw3']
root = ast.parse(value, mode='eval')
return a SyntaxError because the second element starts with a number.

if --name=['hello-eei38223','hello']
root = ast.parse(value, mode='eval')
return a root.body.elts with BinOp and Name object

if --name=['hello-38223']
root = ast.parse(value, mode='eval')
return root.body.elts[0].left as Name object and root.body.elts[0].right as Num object

if --name=['hello-eei38223']
root = ast.parse(value, mode='eval')
return a root.body.elts[0].left and right as Name object

So maybe wait for the strict mode is a better solution than modification into _LiteralEval ?

@dbieber
Copy link
Member

dbieber commented Mar 2, 2020

I think we'll want both. We definitely want to improve this.
And while strict mode will remove surprises, we also should separately come up with an intuitive and unambiguous way to pass in lists (balanced with providing backwards compatibility where reasonable to do so.)

@dhawalmalot08
Copy link

Hello!
We are a group of undergrads from LNMIIT, Jaipur from India and looking for a contribution to open source as a course project under Prof. Philip Miller . We are highly interested to work on this issue. Can we expect your guidance and support and proceed further?

@stephenvincent27
Copy link

@dbieber I believe that we have a working solution for this. As it stands, the current implementation relies heavily on the AST library to identify individual elements in containers and put quotes around them when necessary:

python-fire/fire/parser.py

Lines 100 to 109 in b77f5ae

for node in ast.walk(root):
for field, child in ast.iter_fields(node):
if isinstance(child, list):
for index, subchild in enumerate(child):
if isinstance(subchild, ast.Name):
child[index] = _Replacement(subchild)
elif isinstance(child, ast.Name):
replacement = _Replacement(child)
node.__setattr__(field, replacement)

What we try to do instead, is perform this identification of individual elements manually, by considering some general delimiters (take for example, when in a list, a comma (,) when not in a string, generally denotes a demarcation between two elements). When the identified element is further identified as a String (we still rely on AST for this purpose), we put back the stripped quotes around it. This takes care of all special characters, including hyphens, that occur in arguments.

Our current implementation passes parser_test.py easily, and with the exception of 1 type of case (which needs some discussion), parser_fuzz_test.py as well. I can send over a flowchart and pseudocode so that you can get a clearer picture of what I’m talking about (we made these as part of our uni project).

Do let us know what you think of this solution!

@dbieber
Copy link
Member

dbieber commented Dec 22, 2020

Thanks @stephenvincent27!

I'd love to hear more. This is definitely an area we want to improve in Fire.

@stephenvincent27
Copy link

@dbieber Thanks for your prompt reply!

Here’s a rough flowchart for our implementation: Drive Link

To summarize: We parse the argument by dividing it into one of the three major cases:

  1. The argument is a container. In that case, ignore the braces, and perform the check again. In the case of dictionaries, we start considering the colon (:) as a delimiter as well. To know whether we are currently inside a dictionary, we use a stack.
  2. The argument is a string with quotes intact (different shells strip quotes differently, we found, hence this case.) In this case, we directly continue parsing from the closing quotes. Our implementation covers both single and triple quoted strings.
  3. The argument is a literal (or a string with quotes stripped). In that case, we move to the first delimiter that we find after it (this denotes the end of the element). Now, we pass this ‘literal’ element to AST, and only when AST identifies it as a string, we add quotes around it (numerals, booleans are thus safe).

Our general delimiter set consists of a comma (,), a space (‘ ’) and all possible closing braces (‘)’, ‘}’, ‘]’). Adding space to the delimiter set ensures that variable spaces added by the user are taken care of.

Now, as I had mentioned earlier, we fail test cases of a specific type in fuzz tests, one example of which is:

INPUT: K,\x1d
OUTPUT: ('K', '\x1d')
EXPECTED(?): 'K,\x1d'

Here, because of the comma in between, our implementation treats it as two different elements, and when passed to AST, it converts it into a tuple. So many additions leads to a high edit distance between the input and output strings, hence the failure. Again, this can be easily avoided by checking whether we are currently in a container or not. However, we were unsure about which behavior is more appropriate. Do let us know.

There are two other test cases where failure is declared (again, due to high edit distances), but I think they can be ignored:

  1. ‘...’ gets converted to Ellipsis when passed to AST.
  2. ‘1e5’ gets converted to 100000.0.

@dbieber
Copy link
Member

dbieber commented Dec 24, 2020

This is looking good! I agree we can ignore most of those failures of the fuzz test (not sure about what we should do for '...'; see end). We should update the fuzz test so those don't count as failures any more.

I think our next step should be to list out new test cases so that we can some concrete examples on which to evaluate your approach. This will make discussing potential changes easier, give us test cases to prevent regressions, and will also help with describing parsing in the documentation once the change is released.

We should be sure our examples cover nested lists, tuples, sets, and dicts, strings, bools, ints, floats, complex numbers, quoted and unquoted strings with and without non-alphanumeric characters. Here are two proposed tests; feel free to suggest more:

[1234,test-item,[nested-list!, with spaces, 1e5]] -> [1234, "test-item", ["nested-list!", "with spaces", 1e5]]
{12.34,test-item,(2+3j, with spaces , True)} -> {12.34, "test-item", (2+3j, "with spaces", True)}


What do you think we should do with ...? Do you think we should treat it as Ellipsis or a string? I think a string might be better as it would be surprising to many users for it to be treated as Ellipsis.

@stephenvincent27
Copy link

I do have some cases myself, I think we can add them to parser_test.py. I’ll suggest more when it comes to me:

[{abc:3+4j,'colon:here':('spaces, and commas', 123.45),1e5:{'''abc''':True}}] -> [{'abc': (3+4j), 'colon:here': ('spaces, and commas', 123.45), 100000.0: {'abc': True}}]
{'''[all (delimiters)], are present {in} this:string!'''} -> {'[all (delimiters)], are present {in} this:string!'}

Other than that, yes! We can update fuzz tests to accommodate our implementation. I had no idea how that formula for maximum permissible edit distance came to be in the first place, so kindly guide me on how to change it accordingly.

Our implementation does cover every data type that you mentioned, along with other remaining data types in the Python documentation as well (as long as AST can parse them correctly).

Regarding ‘...’: I honestly didn’t know that such an object existed until I came across one in tests; however I do agree that converting ‘...’ to ‘Ellipsis’ would be unexpected by users. We’ll treat it as a string, all it would take is an extra condition in the code.

@stephenvincent27
Copy link

If all seems good, should we go for the PR? We can further discuss things there.

@dbieber
Copy link
Member

dbieber commented Jan 8, 2021

Yes, that sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants