
Couldn't inference with gpu #24

Closed
linggong2023 opened this issue Jul 11, 2023 · 2 comments

Comments

@linggong2023
When I load the model and run inference with the Hugging Face framework, the model is loaded into GPU memory, but GPU usage stays at 0% while CPU usage sits at 100%. Here is the loading code:
```python
import torch
from transformers import LlamaForCausalLM

def load_openchat_model(model_path: str, device_map):
    model = LlamaForCausalLM.from_pretrained(
        model_path,
        load_in_8bit=False,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )
    model.to("cuda:0")  # move the weights to the first GPU
    model.eval()
    return model
```

Inference code:
```python
def infer_hf(input_text: str, model, tokenizer, device):
    generation_config = dict(
        temperature=0.8,
        top_k=40,
        top_p=0.9,
        do_sample=True,
        num_beams=1,
        repetition_penalty=1.3,
        max_new_tokens=400,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    with torch.inference_mode():
        input_ids = tokenizer(input_text, return_tensors="pt")
        generation_output = model.generate(
            input_ids=input_ids["input_ids"].to(device),
            attention_mask=input_ids["attention_mask"].to(device),
            **generation_config,
        )
    s = generation_output[0]
    output = tokenizer.decode(s)
    print(output)
```
I set `device` to `"cuda:0"`.
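One way to narrow this down (a minimal sketch, not from the original thread; `device_report` is a hypothetical helper) is to confirm where the model's weights actually live before calling `generate()`:

```python
import torch

def device_report(model):
    # Collect the distinct devices the model's parameters live on.
    # A model fully moved to the GPU should report only {"cuda:0"};
    # any "cpu" entry means some weights (and their matmuls) still
    # execute on the CPU despite the .to("cuda:0") call.
    return {str(p.device) for p in model.parameters()}
```

For example, calling `device_report(model)` right after `model.to("cuda:0")` should print `{'cuda:0'}`; it is also worth checking `torch.cuda.is_available()` and watching `nvidia-smi -l 1` while generation runs, since autoregressive sampling with small batches can legitimately show low average GPU utilization even when the model is on the GPU.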

@imoneoi (Owner) commented Jul 21, 2023

Have you solved the issue? By the way, you can use the provided inference server.

@imoneoi (Owner) commented Aug 1, 2023

Closing this issue now. If you have further questions, please re-open.

@imoneoi imoneoi closed this as completed Aug 1, 2023