Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python file read twice ? #2475

Open
rebcabin opened this issue Feb 1, 2024 · 8 comments
Open

Python file read twice ? #2475

rebcabin opened this issue Feb 1, 2024 · 8 comments
Labels
enhancement New feature or request good first issue Good for newcomers Parser Issues or improvements related to parser

Comments

@rebcabin
Copy link
Contributor

rebcabin commented Feb 1, 2024

repro:

lldb lpython -- --show-asr tests/expr8.py

first time it's read is line 128 of lpython.cpp

    frame #0: 0x000000010000e794 lpython`(anonymous namespace)::emit_asr(infile="tests/expr8.py", pass_manager=0x000000016fdfe7d8, runtime_library_dir="/Users/brian/Dropbox/Mac/Documents/GitHub/lpython/src/bin/../runtime", with_intrinsic_modules=false, compiler_options=0x000000016fdfe870) at lpython.cpp:200:5
   197 	        std::string input = LCompilers::read_file(infile);
   198 	        lm.init_simple(input);
   199 	        lm.file_ends.push_back(input.size());
-> 200 	    }
   201 	    LCompilers::Result<LCompilers::LPython::AST::ast_t*> r1 = parse_python_file(
   202 	        al, runtime_library_dir, infile, diagnostics, 0, compiler_options.new_parser);
   203 	    std::cerr << diagnostics.render(lm, compiler_options);

second time it's read is line 125 of parser.cpp

    frame #0: 0x00000001000c9f78 lpython`LCompilers::LPython::parse_python_file(al=0x000000016fdfd2b0, (null)="/Users/brian/Dropbox/Mac/Documents/GitHub/lpython/src/bin/../runtime", infile="tests/expr8.py", diagnostics=0x000000016fdfd298, prev_loc=0, new_parser=true) at parser.cpp:126:49
   123 	    new_parser = true;
   124 	    LCOMPILERS_ASSERT(new_parser)
   125 	    std::string input = read_file(infile);
-> 126 	    Result<LPython::AST::Module_t*> res = parse(al, input, prev_loc, diagnostics);
   127 	    if (res.ok) {
   128 	        ast = (LPython::AST::ast_t*)res.result;
   129 	    } else {

intentional?

@rebcabin
Copy link
Contributor Author

rebcabin commented Feb 1, 2024

I also see a read_file on line 197 of lpython.cpp. I'm still learning what's going on.

@certik
Copy link
Contributor

certik commented Feb 1, 2024

The file should only be read once. If it is read twice, that's a bug that we need to fix. There are many file reads, but only one should be executed.

@rebcabin
Copy link
Contributor Author

rebcabin commented Feb 4, 2024

Looks like, during parsing, read_file is called at least twice, once to parse and once to [gs]et_module_path, and maybe more than once for that purpose. Some of the following breaks are redundant, but not all of them are.

I set breakpoints on every one of these lines:

└─(11:13:31 on main ✹ ✭)──> find . "(" -iname "*.cpp" -o -name "*.h" ")" -print0 | xargs -0 -P 0 grep -n read_file                       ──(Sun,Feb04)─┘
./src/lpython/semantics/python_ast_to_asr.cpp:74:    bool status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:81:            status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:90:            status = read_file(file_path, input);
./src/lpython/semantics/python_ast_to_asr.cpp:129:        std::string input = read_file(infile);
./src/lpython/parser/parser.cpp:125:    std::string input = read_file(infile);
./src/bin/lpython.cpp:90:    std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:103:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:140:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:161:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:172:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:197:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:250:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:295:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:346:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:399:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:443:    std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:491:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:592:            std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:707:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:762:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:872:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:945:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:1019:        std::string input = LCompilers::read_file(infile);
./src/bin/lpython.cpp:1296://     std::string input = read_file(infile);
./src/libasr/utils.h:105:bool read_file(const std::string &filename, std::string &text);
./src/libasr/string_utils.h:29:std::string read_file(const std::string &filename);
./src/libasr/utils2.cpp:31:bool read_file(const std::string &filename, std::string &text)
./src/libasr/diagnostics.cpp:126:    read_file(s.filename, input);
./src/libasr/codegen/asr_to_llvm.cpp:368:        std::string input = read_file(infile);
./src/libasr/string_utils.cpp:106:std::string read_file(const std::string &filename)
./src/libasr/asr_utils.cpp:455:        if (read_file(full_path.string(), modfile)) {

then I ran lpython --show-asr ./tests/expr8.py. The debugger stopped several times on read_file.

image

image

image

image

image

image

image

image

image

image

image

@certik
Copy link
Contributor

certik commented Feb 5, 2024

If so, we need to fix it.

@rebcabin rebcabin added bug Something isn't working severe not blocker, but causes undesirable workarounds enhancement New feature or request good first issue Good for newcomers Parser Issues or improvements related to parser and removed bug Something isn't working severe not blocker, but causes undesirable workarounds labels Feb 7, 2024
@rebcabin
Copy link
Contributor Author

rebcabin commented Feb 7, 2024

marking enhancement and good first issue. It likely means some architectural "re-plumbing" of the front-end. Some file metadata or even the whole file itself will have to be cached somewhere.

@syheliel
Copy link
Contributor

I write a gdb script to have a dig into the problem. It will record the call stack when read_file is called and print them into log file (gdb.txt by default). You can use gdb -x ./read_file_count.gdb to use this script:

# read_file_count.gdb
set logging enabled on
set pagination off

file ./src/bin/lpython
set args --show-asr ./tests/expr8.py
set $count=0
break LCompilers::read_file
commands
  silent
  set $count = $count + 1
  printf "Called function: %s\n", "your_function_name"
  printf "Here is the function input args: "
  info args
  printf "function call location: "
  where 1
  list
  cont
  end
r
set logging enabled off

In logging file, The read_file is indeed called two times:

Called function: your_function_name
Here is the function input args: filename = "./tests/expr8.py"
function call location: #0  LCompilers::read_file (filename="./tests/expr8.py") at /home/zzh/lpython/src/libasr/string_utils.cpp:109
104     }
105
106     std::string read_file(const std::string &filename)
107     {
108         std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary
109                 | std::ios::ate);
110
111         std::ifstream::pos_type filesize = ifs.tellg();
112         if (filesize < 0) return std::string();
113
Called function: your_function_name
Here is the function input args: filename = "./tests/expr8.py"
function call location: #0  LCompilers::read_file (filename="./tests/expr8.py") at /home/zzh/lpython/src/libasr/string_utils.cpp:109
104     }
105
106     std::string read_file(const std::string &filename)
107     {
108         std::ifstream ifs(filename.c_str(), std::ios::in | std::ios::binary
109                 | std::ios::ate);
110
111         std::ifstream::pos_type filesize = ifs.tellg();
112         if (filesize < 0) return std::string();
113

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

@certik
Copy link
Contributor

certik commented Feb 17, 2024

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

Let's refactor the frontend to only call the read function once. I wonder what is going on.

@syheliel
Copy link
Contributor

Maybe we can cache the read_file's result in a unordered_map<string,string>, and return the file content if the filename has been cached?

Let's refactor the frontend to only call the read function once. I wonder what is going on.

Ok, let me explore the frontend's execution flow and try to figure out the root cause

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers Parser Issues or improvements related to parser
Projects
None yet
Development

No branches or pull requests

3 participants