
Can't run examples on Windows 10 #55

Open
mhamra opened this issue Aug 28, 2023 · 9 comments
Labels
compatibility issues arising from specific hardware or system configs

Comments

mhamra commented Aug 28, 2023

Hi,
I've tried to run the examples, but I received this error:

(CodeLlama) PS C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama> python -m torch.distributed.run --nproc_per_node 1 example_infilling.py --ckpt_dir CodeLlama-7b-Python --tokenizer_path ./CodeLlama-7b-Python/tokenizer.model
NOTE: Redirects are currently not supported in Windows or MacOs.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Traceback (most recent call last):
  File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\example_infilling.py", line 79, in <module>
    fire.Fire(main)
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\example_infilling.py", line 18, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "C:\Users\marce\OneDrive\mah-docs\CodeLlama\codellama\llama\generation.py", line 90, in build
    checkpoint = torch.load(ckpt_path, map_location="cpu")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, '<'.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 18284) of binary: C:\ProgramData\anaconda3\envs\CodeLlama\python.exe
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 798, in <module>
    main()
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 794, in main
    run(args)
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\CodeLlama\Lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_infilling.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-28_12:39:51
  host      : DESKTOP-THP4I5R
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 18284)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs
mhamra (Author) commented Aug 30, 2023

UPDATE

I made a mistake running the download.sh script: I passed my email address instead of the URL received from FB.
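For anyone else hitting the same "_pickle.UnpicklingError: invalid load key, '<'": the leading '<' means the first byte of the checkpoint file is '<', i.e. the file is an HTML page (such as an error page saved by download.sh) rather than the model weights. A minimal sanity check, assuming the usual consolidated.00.pth checkpoint name (adjust the path to your setup):

# Hypothetical check, not part of the repo: peek at the first bytes of the
# checkpoint to see whether it is HTML instead of a torch checkpoint.
from pathlib import Path

ckpt = Path("CodeLlama-7b-Python/consolidated.00.pth")  # adjust to your path
head = ckpt.open("rb").read(16)
if head.startswith(b"<"):
    print("File looks like HTML; re-run download.sh with the correct URL.")
else:
    print("First bytes:", head)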

@manoj21192

Did your issue get resolved? I am unable to run on Windows 10 as well; I am getting a "Distributed package doesn't have NCCL built in" error.

@hijkw added the compatibility issues arising from specific hardware or system configs label Sep 6, 2023
realhaik commented Sep 12, 2023

@manoj21192 This will work on Windows:

import torch
from llama import Llama

temperature = 0
top_p = 0
max_seq_len = 4096
max_batch_size = 1
max_gen_len = None
num_of_worlds = 1

# Use the gloo backend; NCCL does not ship in Windows builds of PyTorch.
torch.distributed.init_process_group(backend='gloo', init_method='tcp://localhost:23455', world_size=num_of_worlds, rank=0)

generator = Llama.build(
    ckpt_dir="C:/AI/LLaMA2_Docker_FileSystem/codellama/CodeLlama-7b-Instruct",
    tokenizer_path="C:/AI/LLaMA2_Docker_FileSystem/codellama/CodeLlama-7b-Instruct/tokenizer.model",
    max_seq_len=max_seq_len,
    max_batch_size=max_batch_size,
    model_parallel_size=num_of_worlds,
)
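For context: gloo is the CPU-capable backend included in Windows builds of PyTorch. Initializing the process group yourself before calling Llama.build() should also skip the library's default NCCL initialization, so the script can be launched with plain python rather than torch.distributed.run.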

99991 commented Sep 18, 2023

> UPDATE
>
> I made a mistake running the download.sh script: I passed my email address instead of the URL received from FB.

Thank you! I can reproduce this. At first I entered my email, then noticed my error and entered the correct URL when running download.sh, but loading was still not possible.

I cloned the repository again, entered the correct URL on the first try, and then it worked.

@bronzwikgk

What mistake am I making here?
from typing import Optional

import fire

from llama import Llama

def main(
    ckpt_dir: "D:\pathto\codellama\CodeLlama-7b",
    tokenizer_path: "D:\pathto\codellama\CodeLlama-7b\tokenizer.model",
    temperature: float = 0.2,
    top_p: float = 0.9,
    max_seq_len: int = 256,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

I am getting this error:

D:\path2\codellama>python example_completion.py
ERROR: The function received no value for the required argument: ckpt_dir
Usage: example_completion.py CKPT_DIR TOKENIZER_PATH
  optional flags: --temperature | --top_p | --max_seq_len |
                  --max_batch_size | --max_gen_len

For detailed information on this command, run:
  example_completion.py --help

realhaik commented Oct 1, 2023

> What mistake am I making here? [...] I am getting this error: ERROR: The function received no value for the required argument: ckpt_dir [...]

@bronzwikgk

Based on the code and error message you've provided, here are some issues I've identified:

  1. The type hints in the function arguments are string literals rather than types, and annotations alone don't supply default values, so ckpt_dir is still a required argument.
  2. The Windows paths should be properly escaped or written as raw strings; for example, the \t in "\tokenizer.model" would otherwise be interpreted as a tab character.

Here's a revised version of the code:

from typing import Optional
import fire
from llama import Llama

def main(
    ckpt_dir: str = r"D:\pathto\codellama\CodeLlama-7b",
    tokenizer_path: str = r"D:\pathto\codellama\CodeLlama-7b\tokenizer.model",
    temperature: float = 0.2,
    top_p: float = 0.9,
    max_seq_len: int = 256,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )
    
if __name__ == "__main__":
    fire.Fire(main)
The changes:

  1. Fixed the type hints for ckpt_dir and tokenizer_path to be str, and made the paths default values.
  2. Used raw string literals for the Windows paths (by prefixing the string with an r), so backslashes are interpreted literally.
  3. Added if __name__ == "__main__": fire.Fire(main) to run the function when the script is executed.

Try running the updated code and see if the error persists.
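With the defaults in place, fire still lets you override any argument from the command line as a flag, for example (paths illustrative):

python example_completion.py --ckpt_dir "D:\pathto\codellama\CodeLlama-7b" --max_seq_len 512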

@bronzwikgk

Thanks, I've moved one step ahead.
Getting this error now:

Traceback (most recent call last):
  File "D:\shunyadotek\codellama\example_completion.py", line 55, in <module>
    fire.Fire(main)
  File "C:\Users\shunya-desk-01\AppData\Roaming\Python\Python311\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shunya-desk-01\AppData\Roaming\Python\Python311\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shunya-desk-01\AppData\Roaming\Python\Python311\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\shunyadotek\codellama\example_completion.py", line 20, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "D:\shunyadotek\codellama\llama\generation.py", line 68, in build
    torch.distributed.init_process_group("nccl")
  File "C:\Users\shunya-desk-01\AppData\Roaming\Python\Python311\site-packages\torch\distributed\distributed_c10d.py", line 900, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shunya-desk-01\AppData\Roaming\Python\Python311\site-packages\torch\distributed\rendezvous.py", line 235, in _env_rendezvous_handler
    rank = int(_get_env_or_raise("RANK"))
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\shunya-desk-01\AppData\Roaming\Python\Python311\site-packages\torch\distributed\rendezvous.py", line 220, in _get_env_or_raise
    raise _env_error(env_var)
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
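The ValueError at the end points at the launch method rather than the code: the env:// rendezvous used by init_process_group() reads RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT from the environment, and those are set by torch.distributed.run/torchrun, not by a plain python invocation. A hedged sketch of a single-process workaround, setting them by hand before Llama.build() runs (NCCL itself will still be a problem on Windows; see the next comments):

# Sketch, not from the thread: emulate torchrun's environment for a
# single-process run so the env:// rendezvous can succeed.
import os

os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")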

realhaik commented Oct 1, 2023

@bronzwikgk I don't see this line in your code:

torch.distributed.init_process_group(backend='gloo', init_method='tcp://localhost:23455', world_size=num_of_worlds, rank=0)

Are you sure you have it? See my answer with the full code including this line, a few answers above.

realhaik commented Oct 1, 2023

@bronzwikgk Right, I see that you are using torch.distributed.init_process_group("nccl").
NCCL is Linux-only; use my example above.
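A portable variant of that fix, assuming the goal is just single-process inference: choose the backend from the platform so the same snippet runs on both Linux and Windows.

# Sketch: NCCL is available only in Linux builds of PyTorch, so fall back
# to gloo elsewhere; port 23455 matches the example above.
import platform
import torch.distributed as dist

backend = "nccl" if platform.system() == "Linux" else "gloo"
dist.init_process_group(backend=backend, init_method="tcp://localhost:23455",
                        world_size=1, rank=0)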
