RuntimeError: Native API failed. Native API returns: 39 (UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY)
While watching the system resource monitor I didn't see any significant GPU memory usage, so could something else be causing this? Here's my code:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import numpy as np

# ---------------------------
# Transformer model
# ---------------------------
class TimeSeriesTransformer(nn.Module):
    def __init__(self, input_dim, seq_len, d_model=64, nhead=4, num_layers=2, dim_feedforward=128, dropout=0.1):
        super().__init__()
        self.seq_len = seq_len
        self.input_dim = input_dim
        # Project the input features up to d_model
        self.input_projection = nn.Linear(input_dim, d_model)
        # Learnable positional encoding
        self.pos_embedding = nn.Parameter(torch.randn(seq_len, d_model))
        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,
            dim_feedforward=dim_feedforward,
            dropout=dropout,
            batch_first=True  # batch_first=True so inputs are (batch, seq_len, d_model)
        )
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Project back down to the prediction dimension
        self.output_layer = nn.Linear(d_model, input_dim)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        x = self.input_projection(x)             # (batch, seq_len, d_model)
        x = x + self.pos_embedding.unsqueeze(0)  # add positional encoding
        x = self.transformer_encoder(x)          # (batch, seq_len, d_model)
        x = x[:, -1, :]                          # take the last time step of the sequence
        out = self.output_layer(x)               # (batch, input_dim)
        return out

device = torch.device(config["device"])  # config is loaded elsewhere in the full script
print(device)

# ---------------------------
# Model, loss, optimizer
# ---------------------------
model = TimeSeriesTransformer(input_dim=18, seq_len=50).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])
Just initializing the model is enough to trigger the error above.
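For reference, here is a minimal sketch I would use to isolate the allocation, assuming a recent PyTorch build where torch.xpu mirrors the torch.cuda memory-introspection API (is_available, get_device_name, memory_allocated, memory_reserved, synchronize); I have not verified these helpers on every version, and the OS resource monitor does not necessarily reflect the runtime allocator's view of device memory:

import torch

if torch.xpu.is_available():
    print(torch.xpu.get_device_name(0))
    before = torch.xpu.memory_allocated()  # bytes currently held by live tensors
    model = TimeSeriesTransformer(input_dim=18, seq_len=50).to("xpu")
    torch.xpu.synchronize()                # wait for the host-to-device copy to finish
    after = torch.xpu.memory_allocated()
    print(f"model parameters on device: {(after - before) / 1e6:.2f} MB")
    print(f"reserved by the allocator:  {torch.xpu.memory_reserved() / 1e6:.2f} MB")
else:
    print("No XPU device visible to PyTorch")

A model this small (d_model=64, two encoder layers) should only need a fraction of a megabyte for its weights, so if this probe also fails, the out-of-memory error is presumably coming from runtime or driver setup rather than from the model itself.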