KeyError on line 212 in engine.py on request. #226

Closed
benbot opened this issue Sep 20, 2023 · 16 comments · Fixed by #248 or #442
Labels
bug Something isn't working

Comments

@benbot

benbot commented Sep 20, 2023

Just installed it on my work laptop (running macOS).

The server usually crashes as soon as I make a request, on line 212 in engine.py, complaining about a KeyError on one of the files.

I had it working once, on my third try starting the server. I'm not sure I did anything different, though.

Unfortunately, I can't post the log here :(

@kantord
Owner

kantord commented Sep 20, 2023

Can you please help me reproduce this error by sharing a little more information?

Also, I'm curious whether it only crashes on one specific repository or on every repository.

@BreakTheBeta

I'm getting the same issue.

M2 Max
pipx installed
seagoat, version 0.28.0
Tinygrad/tinygrad repo.

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.11/3.11.5/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 79, in _worker_function
    self._handle_task(context, task)
  File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 66, in _handle_task
    result = handler(context, *task.args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/task_queue.py", line 83, in handle_query
    results = context["seagoat_engine"].get_results(kwargs["limit_clue"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 209, in get_results
    sorted(
  File "/Users/user/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 213, in <lambda>
    + 0.3 * normalize_file_position(top_files[x.path])
                                    ~~~~~~~~~^^^^^^^^
KeyError: 'state.py'

Printing out top_files:
{'tensor.py': -0.6955769938501167}
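For anyone trying to reproduce this in isolation: `top_files[x.path]` is a plain dict lookup, so any result whose path is not a key raises `KeyError`. A minimal sketch, assuming `Result` and `normalize_file_position` as hypothetical stand-ins for SeaGOAT's internals (only the dict contents come from the output above):

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for a search result; only `path` matters here."""
    path: str


def normalize_file_position(position: float) -> float:
    """Hypothetical placeholder; SeaGOAT's real normalization is not shown here."""
    return position


# Contents of top_files as printed in the report above
top_files = {"tensor.py": -0.6955769938501167}
results = [Result("tensor.py"), Result("state.py")]


def file_term(result: Result) -> float:
    # Same lookup shape as the lambda in engine.py: a plain dict index
    return 0.3 * normalize_file_position(top_files[result.path])


def find_missing(results: list[Result]) -> list[str]:
    """Return the result paths that would make file_term raise KeyError."""
    return [r.path for r in results if r.path not in top_files]


print(find_missing(results))  # ['state.py']
```

Any non-empty output from `find_missing` means the sort in `get_results` will crash on that query.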

@kantord
Owner

kantord commented Sep 20, 2023

Regarding this, just out of curiosity: is the file state.py gitignored? Or is it perhaps a new file that has not been committed yet?

I'm just trying to figure out why it would not be included in top_files, since that is generated from the git history.
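To make "generated from the git history" concrete, here is a hypothetical sketch of the idea, not SeaGOAT's actual scoring (the real values above are negative floats, not counts): count how many commits touched each path, parsed from the output of `git log --name-only --format=`, which lists changed paths one per line with blank separator lines. A file that was never part of any commit simply has no key, which is the situation behind the KeyError.

```python
from collections import Counter


def top_files_from_log(log_output: str) -> dict[str, int]:
    """Hypothetical sketch: map each path to the number of commits that
    touched it, parsed from `git log --name-only --format=` output."""
    counts: Counter = Counter()
    for line in log_output.splitlines():
        path = line.strip()
        if path:  # skip the blank lines between commits
            counts[path] += 1
    return dict(counts)


sample_log = """\
tensor.py

tensor.py
helpers.py
"""
top_files = top_files_from_log(sample_log)
print(top_files)  # {'tensor.py': 2, 'helpers.py': 1}
# A file that exists on disk but was never committed has no entry:
print("state.py" in top_files)  # False
```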

@BreakTheBeta

BreakTheBeta commented Sep 20, 2023

Repo I'm using: https://github.com/tinygrad/tinygrad

I'm running the server in the ..../tinygrad folder.

state.py is not gitignored.

@benbot
Author

benbot commented Sep 20, 2023

The crash just happened again in https://github.com/Oneirocom/Magick/

This time the server hadn't finished processing all the chunks (60K), but I got the same error on the other project, where everything had finished processing.

Magick is a large JS project; the other one was a medium-sized Java project.

Also, this time I'm on Arch Linux, so this happens on at least Arch and macOS.

  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 79, in _worker_function
    self._handle_task(context, task)
  File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 66, in _handle_task
    result = handler(context, *task.args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/task_queue.py", line 83, in handle_query
    results = context["seagoat_engine"].get_results(kwargs["limit_clue"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 208, in get_results
    sorted(
  File "/home/benbot/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 212, in <lambda>
    + 0.3 * normalize_file_position(top_files[x.path])
                                    ~~~~~~~~~^^^^^^^^
KeyError: 'packages/@types/rete-connection-reroute-plugin.d.ts'

@benbot
Author

benbot commented Sep 20, 2023

That file isn't in the .gitignore either.

@janeshchhabra

I'm hitting this on Mac as well, on a file that is not in .gitignore.

I am running it one level down inside the folder, not from the repository root, in case that matters.

@kantord kantord added the bug (Something isn't working) label Sep 22, 2023
@yubrshen

yubrshen commented Sep 22, 2023

I might have hit the same KeyError; here is the trace:

Analyzing source code: 0it [00:00, ?it/s]
2023-09-22 08:57:07,014 Analyzed the minimum number of chunks needed to operate.
2023-09-22 08:57:07,014 Analyzed all chunks!
2023-09-22 08:57:07,014 Handling task: query
/home/yshen/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████████████████████████████| 79.3M/79.3M [00:07<00:00, 10.6MiB/s]
Exception in thread Thread-1 (_worker_function):
Traceback (most recent call last):
  File "/home/yshen/miniconda3/envs/seagoat-python311/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/home/yshen/miniconda3/envs/seagoat-python311/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 79, in _worker_function
    self._handle_task(context, task)
  File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/base_queue.py", line 66, in _handle_task
    result = handler(context, *task.args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/queue/task_queue.py", line 83, in handle_query
    results = context["seagoat_engine"].get_results(kwargs["limit_clue"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 208, in get_results
    sorted(
  File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 212, in <lambda>
    + 0.3 * normalize_file_position(top_files[x.path])
                                    ~~~~~~~~~^^^^^^^^
KeyError: 'src/script/junk-yard.py'

The file named in the KeyError is indeed a file in the code base.

I had started SeaGOAT a few minutes before, then I typed:
> gt "sourcetypes"
and got the above error and trace.

Do I need to wait longer, even after the server has finished scanning the code base?

I'm running Ubuntu 24.4 in WSL2 on Windows 11. The file named in the KeyError is not tracked by git.
But in the same repo, the same error also happened with a file that is tracked by git and not ignored:

  File "/home/yshen/.local/pipx/venvs/seagoat/lib/python3.11/site-packages/seagoat/engine.py", line 212, in <lambda>
    + 0.3 * normalize_file_position(top_files[x.path])
                                    ~~~~~~~~~^^^^^^^^

I'll try a different repo.

@kantord
Owner

kantord commented Sep 22, 2023

Do I need to wait longer, even after the server has finished scanning the code base?

No, that should not be necessary at all!

@yubrshen

What does a repo need in order to work with gt?

  • It must be a git repository?
  • All files must be checked in, not ignored, and committed?

@kantord
Owner

kantord commented Sep 22, 2023

It just needs to be a git repository. Even if no files are actually committed, it should still work. In fact, by design it even works with files that you have only just created.

@kantord
Owner

kantord commented Sep 22, 2023

I have a suspicion that the "KeyError on line 212 in engine.py on request" error has to do with two competing versions of the file existing somehow, or maybe the file no longer has the line that it was last analyzed with. I think this would be solved by grouping the results by SHA1 hash and using git to retrieve the correct version of the file.

I suspect the following is a different error. I have only one theory for it: maybe a result appears through ripgrep but is not anywhere in the git history. Maybe there is a bug where files that have not been committed yet are not included in top_files, but that would only be possible if the file is not in any previous commit 🤔

    + 0.3 * normalize_file_position(top_files[x.path])
                                    ~~~~~~~~~^^^^^^^^
KeyError: 'packages/@types/rete-connection-reroute-plugin.d.ts'
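The idea of grouping results by SHA1 hash can be sketched using git's blob object format: git identifies a file's content as `sha1(b"blob <size>\0" + content)`, so two versions of the same path get different hashes, and the matching version can then be fetched with `git cat-file -p <sha>`. The `group_by_blob` helper below is hypothetical, not SeaGOAT code:

```python
import hashlib
from collections import defaultdict


def git_blob_sha1(content: bytes) -> str:
    """Object ID git assigns to a blob with this content (what
    `git hash-object --no-filters` prints in a SHA-1 repository)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def group_by_blob(versions: list[tuple[str, bytes]]) -> dict[str, list[str]]:
    """Hypothetical helper: group (path, content) pairs by blob hash, so two
    competing versions of one file land in separate groups."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, content in versions:
        groups[git_blob_sha1(content)].append(path)
    return dict(groups)


print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (git's empty blob)
```

Results that point at a line missing from the current file could then be resolved against the exact blob they were analyzed from, rather than whatever is on disk.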

@elephanter

I found out that the error happens because x.path is lowercase while the matching key inside top_files contains an uppercase character. I think this comes from the repository class, where commits are processed per file, at this line:

if not (self.path / filename).exists():
    continue

Perhaps I renamed that file from an uppercase name.
I haven't checked, but people say that .exists() is case-insensitive on Mac. So I took the method from here and used it to replace .exists():
https://stackoverflow.com/questions/6710511/case-sensitive-path-comparison-in-python

Now I get the same error: the uppercase file is no longer in the top_files dict, but the current lowercase file is not in there either, even though it is in the results, so it fails here again.
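A case-sensitive existence check along these lines can be sketched as follows. This is a hypothetical helper in the spirit of the linked Stack Overflow question, not the exact code from it: on a case-insensitive file system (the macOS default), `Path.exists()` also returns `True` for a differently cased name, so each path component is compared against the names actually stored in its directory.

```python
import os
from pathlib import Path


def exists_case_sensitive(root: Path, relative: str) -> bool:
    """Return True only if `relative` exists under `root` with exactly this
    capitalization, even on a case-insensitive file system."""
    current = root
    for part in Path(relative).parts:
        try:
            entries = os.listdir(current)
        except (FileNotFoundError, NotADirectoryError):
            return False
        # Compare against the names as stored on disk, not via exists()
        if part not in entries:
            return False
        current = current / part
    return True
```

Listing each directory costs one `os.listdir` call per path component, which is fine for a one-off check but could add up when run for every file in every commit.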

@elephanter

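I temporarily fixed the error by changing it to:

return list(
    sorted(
        results_to_sort,
        key=lambda x: (
            0.7 * normalize_score(x.get_best_score(self.query_string))
            # Normalize the path to forward slashes and fall back to 0 when
            # the file is missing from top_files (Path is pathlib.Path)
            + 0.3 * normalize_file_position(top_files.get(Path(x.path).as_posix(), 0))
        ),
    )
)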
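A slightly more defensive variant of this workaround can be sketched as a standalone lookup: try the exact POSIX-style key first, then a case-insensitive match (covering the uppercase/lowercase mismatch described above), and only then fall back to a default. The function name and its `default` parameter are hypothetical; whether 0 is a sensible neutral value depends on what `normalize_file_position` does with it.

```python
from pathlib import PurePath


def lookup_file_position(top_files: dict[str, float], path: str,
                         default: float = 0.0) -> float:
    """Hypothetical helper: look up a result path in top_files without
    raising KeyError."""
    # Same normalization as the workaround: forward slashes
    key = PurePath(path).as_posix()
    if key in top_files:
        return top_files[key]
    # Case-insensitive fallback for files that were renamed only by case
    folded = key.casefold()
    for candidate, position in top_files.items():
        if candidate.casefold() == folded:
            return position
    return default
```

The fallback scans the whole dict on every miss; if misses are common, building a casefolded copy of `top_files` once per query would be cheaper.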

@kantord
Owner

kantord commented Sep 24, 2023

I noticed that one way this error can happen is if ripgrep finds a file before the repo has been analyzed. This can happen if you create a file while the server is analyzing files and then make a query before all files are analyzed. That is because the server does not look for more files to analyze while there are still files in the queue.

But I'm curious if the same error can happen in other circumstances as well 🤔
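The race described above can be illustrated with a toy model. This is a hypothetical simplification, not SeaGOAT's queue code: the list of files on disk is only re-read when the analysis queue is empty, so a file created mid-analysis is already visible to a search (as with ripgrep) but absent from top_files until the queue drains and the next scan picks it up.

```python
from collections import deque
from typing import Callable


class ToyAnalyzer:
    """Hypothetical model: files are only rescanned when the queue is empty."""

    def __init__(self, list_files: Callable[[], list[str]]):
        self.list_files = list_files  # returns the files currently on disk
        self.queue = deque()
        self.top_files: dict[str, int] = {}

    def step(self) -> None:
        if not self.queue:
            # Only look for new files once everything queued is analyzed
            self.queue.extend(f for f in self.list_files() if f not in self.top_files)
            return
        path = self.queue.popleft()
        # Placeholder "position": order of analysis, not a real score
        self.top_files[path] = len(self.top_files)


files = ["a.py", "b.py"]
analyzer = ToyAnalyzer(lambda: list(files))
analyzer.step()              # scan: queue = [a.py, b.py]
analyzer.step()              # analyze a.py
files.append("new.py")       # file created while analysis is running
analyzer.step()              # analyze b.py; the queue is now empty
print("new.py" in analyzer.top_files)  # False: a query now could hit KeyError
analyzer.step()              # rescan picks up new.py
analyzer.step()              # analyze new.py
print("new.py" in analyzer.top_files)  # True
```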

kantord added two commits that referenced this issue Sep 24, 2023
@kantord
Owner

kantord commented Sep 24, 2023

Reopening, because only the error regarding files not being found was fixed; the error regarding lines not being found probably still persists.

kantord added 21 commits that referenced this issue between Dec 8 and Dec 23, 2023 (2 on Dec 8, 13 on Dec 9, 3 on Dec 10, and one each on Dec 11, Dec 22, and Dec 23)
6 participants