
Non-ascii name is not recognized by the old parser #916

@Thirumalai-Shaktivel


Code:

def test_frompyfunc_name(self):
    # name conversion was failing for python 3 strings
    # resulting in the default '?' name. Also test utf-8
    # encoding using non-ascii name.
    def cassé(x):
        return x

https://github.com/numpy/numpy/blob/5028e407bf7c98662bfe247895292087e1372737/numpy/core/tests/test_regression.py#L2516-L2521
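
For reference, CPython itself accepts this identifier (non-ASCII identifiers are allowed since PEP 3131), so the snippet is valid Python. The check below is a minimal sketch using only the standard `ast` module; the filename `<issue-916>` is just a placeholder label, not something from the LPython code base.

```python
import ast

# The part of the NumPy test that matters for this issue.
SOURCE = "def cassé(x):\n    return x\n"

# CPython parses the non-ASCII function name without complaint.
tree = ast.parse(SOURCE)
func = tree.body[0]
print(type(func).__name__, func.name)

# The compiled function is callable under its non-ASCII name.
namespace = {}
exec(compile(tree, "<issue-916>", "exec"), namespace)
print(namespace[func.name](3))
```

So any rejection of this name comes from the LPython side rather than from the input file.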
Error in Old Parser:

$ lpython --show-ast ./numpy/core/tests/test_regression.py
Internal Compiler Error: Unhandled exception
Traceback (most recent call last):
  Binary file "/home/thirumalai/Open_Source/lpython/src/bin/lpython", in _start()
  File "./csu/../csu/libc-start.c", line 392, in __libc_start_main_impl()
  File "./csu/../sysdeps/nptl/libc_start_call_main.h", line 58, in __libc_start_call_main()
  File "/home/thirumalai/Open_Source/lpython/src/bin/lpython.cpp", line 1008, in ??
    return emit_ast(arg_file, runtime_library_dir, compiler_options);
  File "/home/thirumalai/Open_Source/lpython/src/bin/lpython.cpp", line 114, in ??
    al, runtime_library_dir, infile, diagnostics, compiler_options.new_parser);
  File "/home/thirumalai/Open_Source/lpython/src/lpython/parser/parser.cpp", line 145, in LFortran::parse_python_file(Allocator&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, LFortran::diag::Diagnostics&, bool)
    ast = LPython::deserialize_ast(al, input);
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_serialization.cpp", line 44, in LFortran::LPython::deserialize_ast(Allocator&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    return v.deserialize_node();
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 10507, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_node()
    case (LPython::AST::astType::mod) : return self().deserialize_mod();
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 10577, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_mod()
    case (LPython::AST::modType::Module) : return self().deserialize_Module();
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 10527, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_Module()
    v_body.push_back(al, LPython::AST::down_cast<LPython::AST::stmt_t>(self().deserialize_stmt()));
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 11101, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_stmt()
    case (LPython::AST::stmtType::ClassDef) : return self().deserialize_ClassDef();
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 10683, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_ClassDef()
    v_body.push_back(al, LPython::AST::down_cast<LPython::AST::stmt_t>(self().deserialize_stmt()));
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 11099, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_stmt()
    case (LPython::AST::stmtType::FunctionDef) : return self().deserialize_FunctionDef();
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 10598, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_FunctionDef()
    v_body.push_back(al, LPython::AST::down_cast<LPython::AST::stmt_t>(self().deserialize_stmt()));
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 11104, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_stmt()
    case (LPython::AST::stmtType::Assign) : return self().deserialize_Assign();
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 10730, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_Assign()
    m_value = LPython::AST::down_cast<LPython::AST::expr_t>(self().deserialize_expr());
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 11611, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_expr()
    case (LPython::AST::exprType::ConstantStr) : return self().deserialize_ConstantStr();
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_ast.h", line 11388, in LFortran::LPython::AST::DeserializationBaseVisitor<LFortran::LPython::ASTDeserializationVisitor>::deserialize_ConstantStr()
    m_value = self().read_cstring();
  File "/home/thirumalai/Open_Source/lpython/src/lpython/python_serialization.cpp", line 34, in LFortran::LPython::ASTDeserializationVisitor::read_cstring()
    std::string s = read_string();
LCompilersException: read_string: Space expected.
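
The trace ends in `read_cstring()` while deserializing a `ConstantStr`, so one possible cause is a mismatch between character count and byte count for non-ASCII strings. The sketch below only illustrates that class of bug; the `<length> <payload> ` record format and both function bodies are hypothetical and are not taken from LPython's serializer. Only the error text is borrowed from the log above.

```python
def write_string(s: str, count_bytes: bool) -> bytes:
    """Serialize as '<length> <payload> ' (hypothetical format)."""
    payload = s.encode("utf-8")
    # Bug variant: length counts characters instead of UTF-8 bytes.
    length = len(payload) if count_bytes else len(s)
    return str(length).encode("ascii") + b" " + payload + b" "


def read_string(buf: bytes) -> str:
    """Read one record, interpreting the length as a byte count."""
    head, _, rest = buf.partition(b" ")
    n = int(head)
    payload, sep = rest[:n], rest[n:n + 1]
    if sep != b" ":
        raise ValueError("read_string: Space expected.")
    return payload.decode("utf-8")


# Byte-counted length round-trips correctly.
print(read_string(write_string("cassé", count_bytes=True)))

# Character-counted length leaves the reader one byte short of the separator.
try:
    read_string(write_string("cassé", count_bytes=False))
except ValueError as exc:
    print(exc)
```

If the real serializer behaves like the second case, writing the UTF-8 byte length instead of the character length would be the fix; that remains an assumption until checked against `python_serialization.cpp`.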
Error in New Parser:

$ lpython --show-ast --new-parser ./numpy/core/tests/test_regression.py
tokenizer error: Token '�' is not recognized
    --> ./numpy/core/tests/test_regression.py:2520:17
     |
2520 |         def cassé(x):
     |                 ^ token not recognized


Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that needs to be fixed).
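
The reported column and the unprintable token are consistent with a tokenizer that scans the input byte by byte and accepts only ASCII in identifiers. That is an assumption about the new parser, not something confirmed in this issue. A minimal sketch of the arithmetic, using the line exactly as shown in the diagnostic:

```python
LINE = "        def cassé(x):"  # line 2520 as printed in the diagnostic

encoded = LINE.encode("utf-8")

# 1-based offset of the first byte outside the ASCII range.
first_non_ascii = next(i for i, b in enumerate(encoded, start=1) if b > 0x7F)
print(first_non_ascii)  # matches the reported column, 17

# 'é' is two bytes in UTF-8.
print("é".encode("utf-8").hex(" "))

# The lead byte alone is not valid UTF-8, so it renders as U+FFFD.
lead_byte = encoded[first_non_ascii - 1:first_non_ascii]
print(lead_byte.decode("utf-8", errors="replace"))
```

Under that assumption, the tokenizer would need to treat bytes at or above 0x80 as part of an identifier, or decode full UTF-8 sequences before classifying characters.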

cc @certik


Labels: Parser, could close
