Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate and remove torch.set_default_tensor_type #53124

Open
ezyang opened this issue Mar 2, 2021 · 1 comment
Open

Deprecate and remove torch.set_default_tensor_type #53124

ezyang opened this issue Mar 2, 2021 · 1 comment
Labels
module: deprecation triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@ezyang
Copy link
Contributor

ezyang commented Mar 2, 2021

One of the things @gchanan keeps telling me we should do (and we still haven't done, all these years later), is get rid of torch.set_default_tensor_type. Why is it bad? Let me count the ways:

Unfortunately, it is also widely used (FB only: https://fburl.com/codesearch/b3bmi3lo ) (GitHub: https://github.com/search?q=torch.set_default_tensor_type&type=code)

Let's get rid of it. Because it is widely used, we will have to hard deprecate some aspects of the functionality, and soft deprecate others. Here is how I propose we go about doing it.

Deprecation cycle one. We narrow the API so it only supports exactly six call patterns:

torch.set_default_tensor_type(torch.HalfTensor)
torch.set_default_tensor_type(torch.FloatTensor)
torch.set_default_tensor_type(torch.DoubleTensor)
torch.set_default_tensor_type(torch.cuda.HalfTensor)
torch.set_default_tensor_type(torch.cuda.FloatTensor)
torch.set_default_tensor_type(torch.cuda.DoubleTensor)

All other call patterns are deprecated and to be removed. This list of call patterns is effectively all calls that actually work in set_default_tensor_type today; however, the purpose of hardcoding only six known patterns is to make sure we don't "accidentally" add support for more surface (as is what happened for HalfTensor) and to give us more leeway in refactoring our code (as we no longer have to assume, e.g., that set_default_tensor_type would work with XLA tensors).

Once other call types are removed, we can then refactor set_default_tensor_type into two pieces of functionality: setting default dtype (which exists today as set_default_dtype) and _bc_only_set_default_cuda, which is a BC-only knob for "turning on CUDA" as was done before by the torch.cuda invocations. set_default_tensor_type is updated to call these two pieces of functionality and no longer is a direct part of the C bindings; furthermore, we can refactor tensor_new.cpp to

Deprecation cycle two. Ideally, we come up with some resolution to the #27878 cluster of issues. Now we can deprecate and remove set_default_tensor_type entirely, telling users to use set_default_dtype or some hypothetical #27878. Defaulting can be removed entirely from tensor_new.cpp

@mruberry mruberry added module: deprecation triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Mar 2, 2021
ezyang added a commit that referenced this issue Mar 15, 2021
Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this issue Mar 15, 2021
…spatchKey"

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this issue Mar 15, 2021
Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 798965d0566a8cadeaab582a4aa6e4d16a36b697
Pull Request resolved: #54034
ezyang added a commit that referenced this issue Mar 16, 2021
…spatchKey"

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this issue Mar 16, 2021
Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: b1ecfa798c268acbf87315d6468826a15afb1793
Pull Request resolved: #54034
ezyang added a commit that referenced this issue Mar 17, 2021
…spatchKey"

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this issue Mar 17, 2021
…spatchKey"

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D27109509](https://our.internmc.facebook.com/intern/diff/D27109509)

[ghstack-poisoned]
ezyang added a commit that referenced this issue Mar 17, 2021
…ns instead of DispatchKey"

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D27109509](https://our.internmc.facebook.com/intern/diff/D27109509)

[ghstack-poisoned]
ezyang added a commit that referenced this issue Mar 17, 2021
…spatchKey"

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D27109509](https://our.internmc.facebook.com/intern/diff/D27109509)

[ghstack-poisoned]
@gchanan
Copy link
Contributor

gchanan commented Mar 18, 2021

one more reason: it's generally (IMO) not a good idea to treat devices with very different properties opaquely -- you often want to write very different code -- this applies in many scenarios with CUDA, but especially with remote devices (see e.g. https://dl.acm.org/doi/pdf/10.5555/974938).

ezyang added a commit that referenced this issue Mar 18, 2021
…ns instead of DispatchKey"

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D27109509](https://our.internmc.facebook.com/intern/diff/D27109509)

[ghstack-poisoned]
ezyang added a commit that referenced this issue Mar 18, 2021
…spatchKey"

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D27109509](https://our.internmc.facebook.com/intern/diff/D27109509)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this issue Mar 19, 2021
…54034)

Summary:
Pull Request resolved: #54034

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27109509

Pulled By: ezyang

fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9
danieldk added a commit to danieldk/thinc that referenced this issue May 20, 2022
This PR removes use of `torch.set_default_tensor_type`. There are
various reasons why we should probably move away from using this
function:

- Upstream will deprecate and remove it:
  pytorch/pytorch#53124
- We cannot use this mechanism for other devices than CPU/CUDA, such as
  Metal Performance Shaders.
- It offers little flexibility in allocating Torch models on different
  devices.

This PR makes `PyTorchWrapper`/`PyTorchShim` flexible in terms of the
devices it can use. Both classes add a `device` argument to their
constructors that takes a `torch.device` instance. The shim ensures that
the model is on the given device. The wrapper ensures that input tensors
are on the correct device, by calling `xp2torch` with the new `device`
keyword argument.

Even though this approach offers more flexibility, as a default we want
to use the `cpu` device when `NumpyOps` is used and `cuda:N` when
CupyOps is used. In order to do so, this PR also adds a new function
`get_torch_default_device` that returns the correct device for the
currently active Ops. `PyTorchWrapper`/`PyTorchShim`/`xp2torch` use this
function when `None` is given as the device to fall back on this
default, mimicking the behavior from before this PR.
danieldk added a commit to danieldk/thinc that referenced this issue May 20, 2022
This PR removes use of `torch.set_default_tensor_type`. There are
various reasons why we should probably move away from using this
function:

- Upstream will deprecate and remove it:
  pytorch/pytorch#53124
- We cannot use this mechanism for other devices than CPU/CUDA, such as
  Metal Performance Shaders.
- It offers little flexibility in allocating Torch models on different
  devices.

This PR makes `PyTorchWrapper`/`PyTorchShim` flexible in terms of the
devices it can use. Both classes add a `device` argument to their
constructors that takes a `torch.device` instance. The shim ensures that
the model is on the given device. The wrapper ensures that input tensors
are on the correct device, by calling `xp2torch` with the new `device`
keyword argument.

Even though this approach offers more flexibility, as a default we want
to use the `cpu` device when `NumpyOps` is used and `cuda:N` when
CupyOps is used. In order to do so, this PR also adds a new function
`get_torch_default_device` that returns the correct device for the
currently active Ops. `PyTorchWrapper`/`PyTorchShim`/`xp2torch` use this
function when `None` is given as the device to fall back on this
default, mimicking the behavior from before this PR.
danieldk added a commit to danieldk/thinc that referenced this issue May 20, 2022
This PR removes use of `torch.set_default_tensor_type`. There are
various reasons why we should probably move away from using this
function:

- Upstream will deprecate and remove it:
  pytorch/pytorch#53124
- We cannot use this mechanism for other devices than CPU/CUDA, such as
  Metal Performance Shaders.
- It offers little flexibility in allocating Torch models on different
  devices.

This PR makes `PyTorchWrapper`/`PyTorchShim` flexible in terms of the
devices it can use. Both classes add a `device` argument to their
constructors that takes a `torch.device` instance. The shim ensures that
the model is on the given device. The wrapper ensures that input tensors
are on the correct device, by calling `xp2torch` with the new `device`
keyword argument.

Even though this approach offers more flexibility, as a default we want
to use the `cpu` device when `NumpyOps` is used and `cuda:N` when
CupyOps is used. In order to do so, this PR also adds a new function
`get_torch_default_device` that returns the correct device for the
currently active Ops. `PyTorchWrapper`/`PyTorchShim`/`xp2torch` use this
function when `None` is given as the device to fall back on this
default, mimicking the behavior from before this PR.
danieldk added a commit to explosion/thinc that referenced this issue May 24, 2022
* Remove use of `torch.set_default_tensor_type`

This PR removes use of `torch.set_default_tensor_type`. There are
various reasons why we should probably move away from using this
function:

- Upstream will deprecate and remove it:
  pytorch/pytorch#53124
- We cannot use this mechanism for other devices than CPU/CUDA, such as
  Metal Performance Shaders.
- It offers little flexibility in allocating Torch models on different
  devices.

This PR makes `PyTorchWrapper`/`PyTorchShim` flexible in terms of the
devices it can use. Both classes add a `device` argument to their
constructors that takes a `torch.device` instance. The shim ensures that
the model is on the given device. The wrapper ensures that input tensors
are on the correct device, by calling `xp2torch` with the new `device`
keyword argument.

Even though this approach offers more flexibility, as a default we want
to use the `cpu` device when `NumpyOps` is used and `cuda:N` when
CupyOps is used. In order to do so, this PR also adds a new function
`get_torch_default_device` that returns the correct device for the
currently active Ops. `PyTorchWrapper`/`PyTorchShim`/`xp2torch` use this
function when `None` is given as the device to fall back on this
default, mimicking the behavior from before this PR.

* Add some typing fixes

* Remove spurious cupy import

* Small fixes

- Use `torch.cuda.current_device()` to get the current PyTorch CUDA
  device.
- Do not use `torch_set_default_tensor_type` in `set_active_gpu`.
svlandeg added a commit to explosion/thinc that referenced this issue Jun 14, 2022
* Remove use of `torch.set_default_tensor_type` (#674)

* Remove use of `torch.set_default_tensor_type`

This PR removes use of `torch.set_default_tensor_type`. There are
various reasons why we should probably move away from using this
function:

- Upstream will deprecate and remove it:
  pytorch/pytorch#53124
- We cannot use this mechanism for other devices than CPU/CUDA, such as
  Metal Performance Shaders.
- It offers little flexibility in allocating Torch models on different
  devices.

This PR makes `PyTorchWrapper`/`PyTorchShim` flexible in terms of the
devices it can use. Both classes add a `device` argument to their
constructors that takes a `torch.device` instance. The shim ensures that
the model is on the given device. The wrapper ensures that input tensors
are on the correct device, by calling `xp2torch` with the new `device`
keyword argument.

Even though this approach offers more flexibility, as a default we want
to use the `cpu` device when `NumpyOps` is used and `cuda:N` when
CupyOps is used. In order to do so, this PR also adds a new function
`get_torch_default_device` that returns the correct device for the
currently active Ops. `PyTorchWrapper`/`PyTorchShim`/`xp2torch` use this
function when `None` is given as the device to fall back on this
default, mimicking the behavior from before this PR.

* Add some typing fixes

* Remove spurious cupy import

* Small fixes

- Use `torch.cuda.current_device()` to get the current PyTorch CUDA
  device.
- Do not use `torch_set_default_tensor_type` in `set_active_gpu`.

* Add `test_slow_gpu` explosion-bot command

* Auto-format code with black (#682)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Azure: pin protobuf to fix Tensorflow

* Extend typing_extensions to <4.2.0 (#689)

* Add support for PyTorch Metal Performance Shaders (#685)

* Add `test_slow_gpu` explosion-bot command

* Auto-format code with black (#682)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Add support for PyTorch Metal Performance Shaders

Nightly PyTorch versions add support for Metal Performance Shaders
(MPS). Metal is a low-level graphics API for Apple platforms that also
supports compute kernels (shaders). MPS is a framework of
highly-optimized compute and graphics kernels, including kernels for
neural networks. MPS is supported on both Apple Silicon, such as the M1
family of SoC, as well as a range of AMD GPUs used in Macs.

Since devices are handled in Thinc through a specific `Ops`
implementation (e.g. `CupyOps` == CUDA GPUs), this change introduces the
`MPSOps` class. This class is a subclass of `NumpyOps` or
`AppleOps` (when available). `MPSOps` does not override any methods, but
is used to signal to relevant code paths (e.g. `xp2torch`) that Torch
tensors should be placed on the MPS device.

The mapping in the previously introduced `get_torch_default_device`
function is updated to:

- `NumpyOps` -> `cpu`
- `CupyOps` -> `cuda:N`, where N is the selected CUDA device.
- `MPSOps` -> `mps`

to ensure placement of Torch tensors on the `mps` device when `MPSOps`
is active.

Finally, the following booleans have been added to or changed in
`compat`:

- `has_torch_mps` (new): PyTorch has MPS support
- `has_torch_mps_gpu` (new): PyTorch has MPS support and an
  MPS-capable GPU is available.
- `has_torch_cuda_gpu` (new): PyTorch has CUDA support and a
  CUDA-capable GPU is available.
- `has_torch_gpu` (changed): PyTorch has a GPU available (CUDA
  or MPS).

* Test PyTorch wrapper with all xp ops

* Azure: pin protobuf to fix Tensorflow

* Extend typing_extensions to <4.2.0 (#689)

* Fix type checking error

* Only back-off to NumpyOps on import error

We do not want to hide other issues while importing thinc_apple_ops.

* Remove unneeded `has_torch_mps` bool

* Add `has_gpu` bool and use it in `util`

* Replace another expression by has_gpu

* Set `has_torch_gpu` to `has_torch_cuda_gpu`

We need to decide whether we want to make the potentially breaking
change from `has_torch_cuda_gpu` to `has_torch_cuda_gpu or
has_torch_mps_gpu`. But since the latter is not needed for this PR,
remove the change.

* Update thinc/util.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: shademe <shadeMe@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: shademe <shadeMe@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
adrianeboyd added a commit to explosion/thinc that referenced this issue Oct 12, 2022
* Move compatiblity-related code into a separate `compat` module (#652)

* Add `compat` module to encapsulate imports of optional 3rd party frameworks/libraries

* Replace references to compat code in `.util` with references to `.compat`
Remove `cupy_ops. has_cupy` , `backends.has_cupy`, and `api.has_cupy`

* Update example notebook

* `util.set_active_gpu`: Return `None` if GPU is unavailable

* `util`: Import tensorflow and mxnet with shorthand names
Fix markdown formatting

* `api`: Re-export `has_cupy` from `compat`

* `backends`: Preserve `has_cupy` export for bwd-compat, remove superfluous imports

* Revert "Update example notebook"

This reverts commit 9f068a4.

* `util`: Revert changes to `set_active_gpu`, raise an error if no GPU is detected
Clarify docs

* NumpyOps: Add a method to get a table of C BLAS functions (#643)

* NumpyOps: Add a method to get a table of C BLAS functions

This table can be used for downstream `cdef nogil` functions that need
to use a BLAS function from the BLAS implementation used by an Ops
subclass.

* Bump blis requiment to >=0.9.0,<0.10.0

* NumpyOps: do not construct CBlas on every NumpyOps.cblas() call

* api-backends: Fix superfluous wording

* Fix a unit test in the PyTorch wrapper (#663)

* Fix a unit test in the PyTorch wrapper

This test checked whether the allocator was set to the PyTorch allocator
when the PyTorch shim is used. However, this is not the case when
PyTorch is installed, but CuPy isn't, so the test would fail. Since this
test relies on CuPy, disable it when CuPy is not available.

* Fix merge fallout

* `CupyOps`: Simplify `asarray` (#661)

* `CupyOps`: Simplify `asarray`

* Remove `cast_array` flag and use `astype` unconditionally

* Revert unconditional call to `astype`

* Remove no-op

* NumpyOps: Better type-casting in `asarray` (#656)

* `NumpyOps`: Better type-casting in `asarray`

* Simplify `dtype` check

* Update thinc/backends/numpy_ops.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Simplify casting further, avoid copies if possible

* Remove no-op

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix out-of-bounds writes in NumpyOps/CupyOps (#664)

* Fix out-of-bounds writes in NumpyOps/CupyOps

- Using `{CupyOps,NumpyOps}.adam` with incompatible shapes for weights,
  gradients, or moments resulted in out-of-bound writes.
- Using `NumpyOps.adam` with non-float32 arrays resulted filling arrays
  with incorrect data.

* Remove print debugging remnants

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* More print debugging remnants

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Set version to v8.1.0.dev0 (#666)

* Fix model.copy() bug where layer used more than once (#659)

* Fix model.copy() bug where layer used more than once

* Expand functionality to include shims

* Corrections after review

* Added default for Model._copy()

* `conftest.py`: Handle exception caused by `pytest` options being added twice in CI builds (#670)

* Auto-format code with `black` + Pin `black` requirement (#673)

* Add `autoblack` GitHub action

* Fix command

* Add `black` to `requirements.txt`

* Add support for bot-invoked slow tests (#672)

* `Shim`: Fix potential data race when allocated on different threads

* Fix two warnings (#676)

- torch.nn.functional.sigmoid is deprecated in favor of torch.sigmoid.
- Clip cosh input in sechsq to avoid overflow.

* Replace use of gpu_is_available with has_cupy_gpu (#675)

* Replace use of gpu_is_available with has_cupy_gpu

This PR is in preparation of better non-CUDA device support. Once we
support non-CUDA GPUs, there may be GPUs available that are not 'CuPy
GPUs'. In all places where we use `gpu_is_available` we actually mean:
is 'CuPy available with a CUDA GPU'? So, this PR replaces uses of
`gpu_is_available` to `has_cupy_gpu`. This allows us to use
`gpu_is_available` in the future to check if any GPU is available.

In addition to that, some code had expressions like

```
has_cupy and gpu_is_available()
```

This PR simplify such conditions to `has_cupy_gpu`, since `has_cupy_gpu`
implies that `has_cupy`.

* Remove unused import

* Improve error message when no CUDA GPU is found

* Fix another error message when no CUDA GPU is found

* Fixes for slow tests (#671)

* `test_uniqued`: Disable test timing for `test_uniqued_doesnt_change_result` (#678)

* `test_to_categorical`: Ensure that `label_smoothing < 0.5` (#680)

* `test_to_categorical`: Ensure that `label_smoothing < 0.5`

* Use `exclude_max` instead of clamping to `0.49`

* test_ops: do not lower precision in conversion to Torch tensor (#681)

* test_ops: do not lower precision in conversion to Torch tensor

float64 test values close to zero were rounded by conversion to a
float32 Torch tensor, resuling in mismatches between Thinc and Torch
gradients. This change prevents the loss in precision.

* test_ops: compare arrays on same device in Torch comparison

* test_maxout: compare arrays with same precision

* Add `test_slow_gpu` explosion-bot command

* Auto-format code with black (#682)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Azure: pin protobuf to fix Tensorflow

* Extend typing_extensions to <4.2.0 (#689)

* xp2{tensorflow,torch}: convert NumPy arrays using dlpack (#686)

* xp2{tensorflow,torch}: convert NumPy arrays using dlpack

Newer versions of NumPy can expose arrays as dlpack capsules. Use this
functionality (when supported) to speed up NumPy -> Torch/Tensorflow
array conversion.

* Fix up copy paste error

* `test_model_gpu`: Use TF memory pool if available, feature-gate test (#688)

* `test_model_gpu`: Use TF memory pool if available, feature-gate test

* Fix typo

* `test_predict_extensive`: Disable test time monitoring

* Fix imports, use `has_cupy_gpu` for forward-compat

* `conftest`: Use `pytest_sessionstart` to enable TF GPU memory growth

* Bump version to v8.1.0.dev1 (#694)

* `NumpyOps`: Do not use global for `CBlas` (#697)

* Merge pytorch-device branch into master (#695)

* Remove use of `torch.set_default_tensor_type` (#674)

* Remove use of `torch.set_default_tensor_type`

This PR removes use of `torch.set_default_tensor_type`. There are
various reasons why we should probably move away from using this
function:

- Upstream will deprecate and remove it:
  pytorch/pytorch#53124
- We cannot use this mechanism for other devices than CPU/CUDA, such as
  Metal Performance Shaders.
- It offers little flexibility in allocating Torch models on different
  devices.

This PR makes `PyTorchWrapper`/`PyTorchShim` flexible in terms of the
devices it can use. Both classes add a `device` argument to their
constructors that takes a `torch.device` instance. The shim ensures that
the model is on the given device. The wrapper ensures that input tensors
are on the correct device, by calling `xp2torch` with the new `device`
keyword argument.

Even though this approach offers more flexibility, as a default we want
to use the `cpu` device when `NumpyOps` is used and `cuda:N` when
CupyOps is used. In order to do so, this PR also adds a new function
`get_torch_default_device` that returns the correct device for the
currently active Ops. `PyTorchWrapper`/`PyTorchShim`/`xp2torch` use this
function when `None` is given as the device to fall back on this
default, mimicking the behavior from before this PR.

* Add some typing fixes

* Remove spurious cupy import

* Small fixes

- Use `torch.cuda.current_device()` to get the current PyTorch CUDA
  device.
- Do not use `torch_set_default_tensor_type` in `set_active_gpu`.

* Add `test_slow_gpu` explosion-bot command

* Auto-format code with black (#682)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Azure: pin protobuf to fix Tensorflow

* Extend typing_extensions to <4.2.0 (#689)

* Add support for PyTorch Metal Performance Shaders (#685)

* Add `test_slow_gpu` explosion-bot command

* Auto-format code with black (#682)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Add support for PyTorch Metal Performance Shaders

Nightly PyTorch versions add support for Metal Performance Shaders
(MPS). Metal is a low-level graphics API for Apple platforms that also
supports compute kernels (shaders). MPS is a framework of
highly-optimized compute and graphics kernels, including kernels for
neural networks. MPS is supported on both Apple Silicon, such as the M1
family of SoC, as well as a range of AMD GPUs used in Macs.

Since devices are handled in Thinc through a specific `Ops`
implementation (e.g. `CupyOps` == CUDA GPUs), this change introduces the
`MPSOps` class. This class is a subclass of `NumpyOps` or
`AppleOps` (when available). `MPSOps` does not override any methods, but
is used to signal to relevant code paths (e.g. `xp2torch`) that Torch
tensors should be placed on the MPS device.

The mapping in the previously introduced `get_torch_default_device`
function is updated to:

- `NumpyOps` -> `cpu`
- `CupyOps` -> `cuda:N`, where N is the selected CUDA device.
- `MPSOps` -> `mps`

to ensure placement of Torch tensors on the `mps` device when `MPSOps`
is active.

Finally, the following booleans have been added to or changed in
`compat`:

- `has_torch_mps` (new): PyTorch has MPS support
- `has_torch_mps_gpu` (new): PyTorch has MPS support and an
  MPS-capable GPU is available.
- `has_torch_cuda_gpu` (new): PyTorch has CUDA support and a
  CUDA-capable GPU is available.
- `has_torch_gpu` (changed): PyTorch has a GPU available (CUDA
  or MPS).

* Test PyTorch wrapper with all xp ops

* Azure: pin protobuf to fix Tensorflow

* Extend typing_extensions to <4.2.0 (#689)

* Fix type checking error

* Only back-off to NumpyOps on import error

We do not want to hide other issues while importing thinc_apple_ops.

* Remove unneeded `has_torch_mps` bool

* Add `has_gpu` bool and use it in `util`

* Replace another expression by has_gpu

* Set `has_torch_gpu` to `has_torch_cuda_gpu`

We need to decide whether we want to make the potentially breaking
change from `has_torch_cuda_gpu` to `has_torch_cuda_gpu or
has_torch_mps_gpu`. But since the latter is not needed for this PR,
remove the change.

* Update thinc/util.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: shademe <shadeMe@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: shademe <shadeMe@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Expose `get_torch_default_device` through `thinc.api` (#698)

* Make `CBlas` methods standalone functions to avoid using vtables (#700)

* Make CBlas methods standalone functions to avoid using vtables

When testing #696, we found that adding new CBlas methods results in an
ABI compatibility. This would mean that every time we add a CBlas
method, we also have to rebuild spaCy.

The ABI incompatibility occurs because Cython generates a vtable for
cdef methods, even when the class or its methods are final. This vtable
is used by the caller to look up the (address of) the methods. When
methods are added, the vtable of the caller is out-of-sync when the
calling code is not recompiled.

This change works around this issue by making the methods of CBlas
standalone functions.

* Add link to PR in comments

For future reference.

* Add Dockerfile for building the website (#699)

* Add Dockerfile for building the website

This Dockerfile was taken from spaCy.

* README: Remove command substitution in example

* Bump version to v8.1.0.dev2 (#701)

* Use blis~=0.7.8 (#704)

Until the haswell bug is fixed in BLIS v0.9, switch back to blis~=0.7.8.

* Set version to v8.1.0.dev3 (#705)

* Speed up HashEmbed layer by avoiding large temporary arrays (#696)

* Speed up HashEmbed layer by avoiding large temporary arrays

The HashEmbed layer sums up keyed embeddings. For instance, a key matrix
of the shape (50000, 4) will result in 50,000 embeddings, each computed
by summing 4 embeddings. The HashEmbed layer computed the embeddings as
follows:

vectors[keys].sum(axis=1)

where `vectors` is an embedding matrix. However, this way of computing
embeddings results in very large allocations. Suppose that `vectors`
is (4000, 64). Even though the final embedding matrix is (50000, 64),
the first expression will construct a temporary array of shape
(50000, 4, 64).

This change avoids this by introducing a `gather_add` op as a
counterpart to `scatter_add`. In this particular example, the `NumpyOps`
implementation only allocates the final (50000, 64) array, computing
the embeddings in-place using the BLAS saxpy function.

In benchmarks with an M1 Max on de_core_news_lg, this improved
processing speed from 40511 WPS to 45591 (12.5% faster).

* Simplify saxpy call

* Fixup types

* NumpyOps.gather_add: add support for double

* NumpyOps.gather_add: support int and unsigned int indices

* Add gather_add CUDA kernel

* Add tests for gather_add

* Comment fixup

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* api-backends: document Ops.gather_add

* Ops.gather_add: arguments should be 2D arrays

* Comment fix

* Ops.gather_add returns Float2d

* docs: Ops.gather_add is new in 8.1

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Auto-format code with black (#706)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Fix MyPy error when Torch without MPS support is installed (#708)

* Check that Torch-verified activations obey `inplace` (#709)

And fix some activations that do not obey the `inplace` kwarg.

* Increase test deadline to 30 minutes to prevent spurious test failures (#714)

* `test_mxnet_wrapper`: Feature-gate GPU test (#717)

* Add Ops.reduce_{first,last} plus tests (#710)

* Add Ops.reduce_{first,last} plus tests

* Add docs for reduce_{first,last}

* Typing fix

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Typing fixes (use InT)

* Fix some some reduction issues when using CuPy

* One maxout test fails with the latest CuPy.

Values of 5.9e-39 and 0 have an infinite relative difference. Accept
with a very strict tolerance (1e-10).

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Label smooth threshold fix (#707)

* correcting label smoothing param contraint

* test new label smooth validation error

* less than 0 input validation

* string concat

* small update to error msg

* fix max smoothing coefficient

* double check error message

* Update thinc/util.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* test error message fix

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Set version to v8.1.0 (#718)

* `get_array_module` with non-array input returns `None` (#703)

* if not xp array module is None

* raise error

* update test

* more detailed error

* Update thinc/tests/test_util.py

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

* Update thinc/util.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update thinc/tests/test_util.py

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update build constraints and requirements for aarch64 wheels (#722)

* Extend build constraints for aarch64

* Skip mypy for aarch64

* Auto-format code with black (#723)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Fix version string (#724)

* Extend to mypy<0.970 (#725)

* Fix typo

* Update build constraints for arm64 and aarch64 wheels (#716)

* Ops: replace FloatsType by constrained typevar (#720)

* Ops: replace FloatsType by constrained typevar

Ops used the `FloatsType`, which had `FloatsXd` as its bound. MyPy could
not infer that code such as the following is correct,

```
def dish(self, X: FloatsType, inplace: bool = False) -> FloatsType:
    tmp = X * X
    # ...
```

because the inferred type is the union (or a subtype). If we instead
constrain the type variable as follows:

```
FloatsType = TypeVar("FloatsType",
    Floats1d, Floats2d, Floats3d, Floats4d)
```

the type paramater will be instantiated with a single concrete type,
solving such issues.

* Remove a bunch of casts and ignores that are not necessary anymore

* Unroll `argmax` in `maxout` for small sizes of `P` (#702)

* Unroll `argmax` in `maxout` for small sizes of `P`

`maxout` uses the `argmax` function to determine the index of the
maximum value of each `P` inputs. `argmax` uses a generic array loop,
which impedes speculative execution and `could` also prevent unrolling
of the outer `maxout` loop.

This change unrolls `argmax` with small values of `P` using a variadic
template. This leads to a small performance improvement.

* Unmodernize struct initialization

* Change Docker image tag to thinc-ai (#732)

This is purely a cosmetic change, but less confusing than thinc-io :).

* Add `with_signpost_interval` layer (#711)

* Add with_signpost_interval layer

This layer wraps a layer, adding macOS interval signposts for the
forward and backward pass. These intervals can then be visualized
in the macOS Instruments.app timeline.

* Fix reference in api-layers.md

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* End message is optional since signpost 0.0.3

* with_signpost_interval: also wrap init callback

* docs: we wrap init as well

* Add documentation fixes

Suggested by @svlandeg.

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Docs: Fix/update `label_smoothing` description, run prettier (#733)

* Add Dish activation (#719)

* Add Ops.(backprop_)dish and CUDA kernel

Dish is a Swish/GELU-like activation function. Since it does not rely on
elementary operations like `exp` or `erf`, it can generally be computed
faster than Swish and GELU:

https://twitter.com/danieldekok/status/1484898130441166853

* Make mypy happy

Apparently, X * X does not typecheck (?!?).

* test_compare_activations_to_torch: test with different dY

Also fix the backprop_dish CUDA kernel, which would fail now (thanks
@shadeMe).

* test_compare_activations_to_torch: be slightly more (absolute) tolerant

Or the Dish test would fail (possibly different accuracies for sqrt?).

* doc fix

* Update dish types to use `FloatsXdT`

* docs: add version tag to `(backprop_)dish`

* Add Dish Thinc layer

* Add Dish layer docs

Also update description as suggested by @kadarakos.

* Fix dish description

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Auto-format code with black (#737)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Increment `blis` version upper-bound to `0.10.0` (#736)

* asarrayDf: take `Sequence[float]`, not `Sequence[int]` (#739)

* Use confection for configurations (#745)

* Remove redundant tests. Add confection to requirement.txt and setup.cfg. Adjust cnfig.py.

* Add reference to confection in website/docs/usage-config.md.

* Update confection reference in docs.

* Extend imports fro confection for backwards compatibility.

* `PyTorchGradScaler`: Cache `_found_inf` on the CPU (#746)

* `PyTorchGradScaler`: Cache `_found_inf` on the CPU

This prevents unnecessary overhead from launching kernels on the GPU in hot backward passes.

* Only pin `_found_inf` to the CPU

* Always store `_found_inf` as a `bool`

* More general remap_ids (#726)

* work with cupy arrays and 2d arrays

* force mypy pass

* addressing comments

* return correct shape empty array

* test remap_ids with Ints2d

* Update thinc/layers/remap_ids.py

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

* use numpy array

* remove cupy import

* mini fix

* more strict typing

* adjust test

* Update thinc/layers/remap_ids.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* remove check

* Update thinc/layers/remap_ids.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* address reviews

* Update thinc/layers/remap_ids.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* simplify casting

* Update thinc/layers/remap_ids.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update thinc/layers/remap_ids.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* remap_ids legacy

* legacy

* test version 1 and 2

* rename legacy to v1

* adding old test back

* remap_ids docs update

* Update website/docs/api-layers.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api-layers.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* make init/forward attribute setting more clear

* Update website/docs/api-layers.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api-layers.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api-layers.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* prettier

* update model type

* prettier

* Use new _v2 instead of renamed _v1

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Auto-format code with black (#753)

Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>

* Switch to macos-latest (#755)

* `util`: Explicitly call `__dlpack__` built-in method in `xp2tensorflow` (#757)

`tf.experimental.dlpack.from_dlpack` expects a `PyCapsule` object.

* Set version to 8.1.1 (#758)

* Remove references to FastAPI being an Explosion product (#761)

* Remove references to FastAPI being an Explosion product.

* Remove period at end of subheader.

* Update code example for Ragged (#756)

* Update code example for Ragged.

* Import from thinc.api.

* Update setup.cfg (#748)

Register fix_random_seed as a pytest-randomly entry point.

* Update cupy extras, quickstart (#740)

* Update cupy extras, quickstart

* Rename extra cuda-wheel to cuda-autodetect

* disable mypy run for Python 3.10 (#768)

* disable mypy run for Python 3.10

* dot

* Reorder requirements in requirements.txt (#770)

Move `confection` to the section with required explosion packages.

* Revert blis range to <0.8.0 (#772)

Due to more reports of access violations in windows, reduce supported
blis versions back to `<0.8.0`.

* Set version to v8.1.2 (#773)

* Fix `fix_random_seed` entrypoint in setup.cfg (#775)

* Support both Python 3.6 and Pydantic 1.10 (#779)

* support both Python 3.6 and Pydantic 1.10

* Simplify according to Adriane's suggestion

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update to latest mypy and exclude Python 3.6 (#776)

* update to latest mypy and exclude Python 3.6

* fix typing of ops.alloc

* fix ArrayT usage in types.py

* Set version to v8.1.3 (#781)

* Update CI around conflicting extras requirements (#783)

* Update torch install, update package requirements after installing extra deps

* Only reinstall requirements

* Run test suite twice

* Check package requirements after extras

* Update thinc-apple-ops test for current macos jobs

* Move notebook extras

* Skip mypy in tests with extras

* Use torch<1.12.0

* Try to figure out numpy version (non)requirements

* More numpy version tests

* Adjust for all

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Richard Hudson <richard@explosion.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: kadarakos <kadar.akos@gmail.com>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
Co-authored-by: svlandeg <svlandeg@github.com>
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
Co-authored-by: Will Frey <jfrey89@gmail.com>
Co-authored-by: Timothée Mazzucotelli <pawamoy@pm.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: deprecation triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

3 participants