[1/2] Intel GPU Runtime Upstreaming for Generator #118528
Status: Closed · 65 commits
Commit history (65 commits, all by guangyey): the initial commit 8115ce3 "[1/2] Intel GPU Runtime Upstreaming for Generator", followed by 64 ghstack update commits titled "Update on ...", the last being 03217cd.
New file in this PR (+82 lines): googletest-based tests for the XPU generator.
```cpp
#include <gtest/gtest.h>

#include <ATen/ATen.h>
#include <ATen/xpu/XPUContext.h>
#include <ATen/xpu/XPUGeneratorImpl.h>
#include <ATen/core/PhiloxRNGEngine.h>

#include <assert.h>
#include <thread>

TEST(XpuGeneratorTest, testGeneratorDynamicCast) {
  if (!at::xpu::is_available()) {
    return;
  }
  auto foo = at::xpu::detail::createXPUGenerator();
  auto result = foo.get<at::XPUGeneratorImpl>();
  EXPECT_EQ(typeid(at::XPUGeneratorImpl*).hash_code(), typeid(result).hash_code());
}

TEST(XpuGeneratorTest, testDefaultGenerator) {
  if (!at::xpu::is_available()) {
    return;
  }
  auto foo = at::xpu::detail::getDefaultXPUGenerator();
  auto bar = at::xpu::detail::getDefaultXPUGenerator();
  EXPECT_EQ(foo, bar);

  auto offset = foo.get_offset() << 1;
  foo.set_offset(offset);
  EXPECT_EQ(foo.get_offset(), offset);

  if (c10::xpu::device_count() >= 2) {
    foo = at::xpu::detail::getDefaultXPUGenerator(0);
    bar = at::xpu::detail::getDefaultXPUGenerator(0);
    EXPECT_EQ(foo, bar);

    foo = at::xpu::detail::getDefaultXPUGenerator(0);
    bar = at::xpu::detail::getDefaultXPUGenerator(1);
    EXPECT_NE(foo, bar);
  }
}

TEST(XpuGeneratorTest, testCloning) {
  if (!at::xpu::is_available()) {
    return;
  }
  auto gen1 = at::xpu::detail::createXPUGenerator();
  gen1.set_current_seed(123); // modify gen1 state
  auto xpu_gen1 = at::check_generator<at::XPUGeneratorImpl>(gen1);
  xpu_gen1->set_philox_offset_per_thread(4);
  auto gen2 = at::xpu::detail::createXPUGenerator();
  gen2 = gen1.clone();
  auto xpu_gen2 = at::check_generator<at::XPUGeneratorImpl>(gen2);
  EXPECT_EQ(gen1.current_seed(), gen2.current_seed());
  EXPECT_EQ(
      xpu_gen1->philox_offset_per_thread(),
      xpu_gen2->philox_offset_per_thread());
}

void thread_func_get_set_current_seed(at::Generator generator) {
  std::lock_guard<std::mutex> lock(generator.mutex());
  auto current_seed = generator.current_seed();
  current_seed++;
  generator.set_current_seed(current_seed);
}

TEST(XpuGeneratorTest, testMultithreadingGetSetCurrentSeed) {
  // See Note [Acquire lock when using random generators]
  if (!at::xpu::is_available()) {
    return;
  }
  auto gen1 = at::xpu::detail::getDefaultXPUGenerator();
  auto initial_seed = gen1.current_seed();
  std::thread t0{thread_func_get_set_current_seed, gen1};
  std::thread t1{thread_func_get_set_current_seed, gen1};
  std::thread t2{thread_func_get_set_current_seed, gen1};
  t0.join();
  t1.join();
  t2.join();
  EXPECT_EQ(gen1.current_seed(), initial_seed + 3);
}
```
New file in this PR (+171 lines): XPUGeneratorImpl.cpp, the generator implementation.
```cpp
#include <ATen/Utils.h>
#include <ATen/xpu/XPUGeneratorImpl.h>
#include <c10/core/StreamGuard.h>
#include <c10/util/CallOnce.h>
#include <c10/xpu/XPUFunctions.h>

#include <deque>

namespace at {
namespace xpu::detail {
namespace {

/*
 * Currently, there is one generator pool that contains one XPU generator per
 * device. Each generator is lazily initialized the first time a generator is
 * requested for its device.
 */
c10::once_flag init_flag;
DeviceIndex num_gpus = -1;
std::deque<c10::once_flag> xpu_gens_init_flag;
std::vector<Generator> default_gens_xpu;

void initXPUGenVector() {
  num_gpus = device_count();
  xpu_gens_init_flag.resize(num_gpus);
  default_gens_xpu.resize(num_gpus);
}

inline void check_device(DeviceIndex device) {
  TORCH_CHECK(
      device >= 0 && device < num_gpus,
      "device is out of range, device is ",
      static_cast<int16_t>(device),
      ", total number of devices is ",
      static_cast<int16_t>(num_gpus),
      ".");
}

} // anonymous namespace

// Get the default generator with a random seed for a specific xpu device.
const Generator& getDefaultXPUGenerator(DeviceIndex device) {
  c10::call_once(init_flag, initXPUGenVector);
  if (device == -1) {
    device = c10::xpu::current_device();
  }
  check_device(device);
  c10::call_once(xpu_gens_init_flag[device], [&]() {
    default_gens_xpu[device] = make_generator<XPUGeneratorImpl>(device);
    default_gens_xpu[device].seed();
  });
  return default_gens_xpu[device];
}

// Create a generator with a fixed seed for a specific xpu device.
Generator createXPUGenerator(DeviceIndex device) {
  c10::call_once(init_flag, initXPUGenVector);
  if (device == -1) {
    device = c10::xpu::current_device();
  }
  check_device(device);
  auto gen = make_generator<XPUGeneratorImpl>(device);
  auto xpu_gen = check_generator<XPUGeneratorImpl>(gen);
  xpu_gen->set_current_seed(default_rng_seed_val);
  xpu_gen->set_philox_offset_per_thread(0);
  return gen;
}

} // namespace xpu::detail

XPUGeneratorImpl::XPUGeneratorImpl(DeviceIndex device_index)
    : GeneratorImpl{
          Device(DeviceType::XPU, device_index),
          DispatchKeySet(c10::DispatchKey::XPU)} {}

void XPUGeneratorImpl::set_current_seed(uint64_t seed) {
  seed_ = seed;
  set_philox_offset_per_thread(0);
}

void XPUGeneratorImpl::set_offset(uint64_t offset) {
  set_philox_offset_per_thread(offset);
}

uint64_t XPUGeneratorImpl::get_offset() const {
  return philox_offset_per_thread_;
}

uint64_t XPUGeneratorImpl::current_seed() const {
  return seed_;
}

uint64_t XPUGeneratorImpl::seed() {
  auto random = c10::detail::getNonDeterministicRandom(true);
  this->set_current_seed(random);
  return random;
}

c10::intrusive_ptr<c10::TensorImpl> XPUGeneratorImpl::get_state() const {
  // The RNG state comprises the seed and an offset used for Philox.
  static const size_t seed_size = sizeof(uint64_t);
  static const size_t offset_size = sizeof(uint64_t);
  static const size_t total_size = seed_size + offset_size;

  // The internal state is returned as a CPU byte tensor.
  auto state_tensor = at::detail::empty_cpu(
      {static_cast<int64_t>(total_size)},
      ScalarType::Byte,
      c10::nullopt,
      c10::nullopt,
      c10::nullopt,
      c10::nullopt);
  auto rng_state = state_tensor.data_ptr<uint8_t>();
  auto current_seed = this->current_seed();
  auto offset = this->philox_offset_per_thread();
  memcpy(rng_state, &current_seed, seed_size);
  memcpy(rng_state + seed_size, &offset, offset_size);

  return state_tensor.getIntrusivePtr();
}

void XPUGeneratorImpl::set_state(const c10::TensorImpl& new_state) {
  static const size_t seed_size = sizeof(uint64_t);
  static const size_t offset_size = sizeof(uint64_t);
  static const size_t total_size = seed_size + offset_size;

  at::detail::check_rng_state(new_state);
  auto new_state_size = new_state.numel();
  TORCH_CHECK(new_state_size == total_size, "RNG state is wrong size");

  uint64_t input_seed;
  auto new_rng_state = new_state.data_dtype_initialized<uint8_t>();
  memcpy(&input_seed, new_rng_state, seed_size);
  this->set_current_seed(input_seed);
  uint64_t philox_offset;
  memcpy(&philox_offset, new_rng_state + seed_size, offset_size);
  this->set_philox_offset_per_thread(philox_offset);
}

void XPUGeneratorImpl::set_philox_offset_per_thread(uint64_t offset) {
  TORCH_CHECK(offset % 4 == 0, "offset must be a multiple of 4");
  philox_offset_per_thread_ = offset;
}

uint64_t XPUGeneratorImpl::philox_offset_per_thread() const {
  return philox_offset_per_thread_;
}

std::pair<uint64_t, uint64_t> XPUGeneratorImpl::philox_engine_inputs(
    uint64_t increment) {
  // Round the increment up to the next multiple of 4.
  increment = ((increment + 3) / 4) * 4;
  TORCH_INTERNAL_ASSERT(this->philox_offset_per_thread_ % 4 == 0);
  uint64_t offset = this->philox_offset_per_thread_;
  this->philox_offset_per_thread_ += increment;
  return std::make_pair(this->seed_, offset);
}

DeviceType XPUGeneratorImpl::device_type() {
  return DeviceType::XPU;
}

std::shared_ptr<XPUGeneratorImpl> XPUGeneratorImpl::clone() const {
  return std::shared_ptr<XPUGeneratorImpl>(this->clone_impl());
}

XPUGeneratorImpl* XPUGeneratorImpl::clone_impl() const {
  auto gen = new XPUGeneratorImpl(this->device().index());
  gen->set_current_seed(this->seed_);
  gen->set_philox_offset_per_thread(this->philox_offset_per_thread_);
  return gen;
}

} // namespace at
```
New file in this PR (+39 lines): XPUGeneratorImpl.h, the generator's public interface.
```cpp
#pragma once

#include <ATen/core/Generator.h>

namespace at {

struct TORCH_API XPUGeneratorImpl : public GeneratorImpl {
  // Constructors
  XPUGeneratorImpl(DeviceIndex device_index = -1);
  ~XPUGeneratorImpl() override = default;

  // XPUGeneratorImpl methods
  std::shared_ptr<XPUGeneratorImpl> clone() const;
  void set_current_seed(uint64_t seed) override;
  void set_offset(uint64_t offset) override;
  uint64_t get_offset() const override;
  uint64_t current_seed() const override;
  uint64_t seed() override;
  void set_state(const c10::TensorImpl& new_state) override;
  c10::intrusive_ptr<c10::TensorImpl> get_state() const override;
  void set_philox_offset_per_thread(uint64_t offset);
  uint64_t philox_offset_per_thread() const;
  std::pair<uint64_t, uint64_t> philox_engine_inputs(uint64_t increment);
  static c10::DeviceType device_type();

 private:
  XPUGeneratorImpl* clone_impl() const override;
  uint64_t seed_ = default_rng_seed_val;
  uint64_t philox_offset_per_thread_ = 0;
};

namespace xpu::detail {

TORCH_XPU_API const Generator& getDefaultXPUGenerator(DeviceIndex device = -1);

TORCH_XPU_API Generator createXPUGenerator(DeviceIndex device = -1);

} // namespace xpu::detail
} // namespace at
```
Review discussion

albanD: Why not a vector, like below?

change to: std::vector<c10::once_flag> xpu_gens_init_flag;

Here std::vector is more efficient than std::deque.

guangyey: Sorry @albanD, it looks like std::vector<c10::once_flag> doesn't support the resize method, because c10::once_flag lacks a copy constructor (see pytorch/c10/util/CallOnce.h, line 37 at 685d862). I have also verified this on godbolt, so I changed back to std::deque here. I think I missed this error before because I didn't save my code change when I rebuilt on my local machine. I'm very sorry about this.

guangyey: @albanD, could you help review again?

albanD: Oh, interesting. Sounds good!