## Setup

In [127]:
%load_ext autoreload
%autoreload 2
import warnings
warnings.filterwarnings('ignore')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [1]:
# check your NumPy version
import numpy as np

In [20]:
np.__version__

'1.26.4'
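Since much of what follows depends on which major version is installed, it can help to branch on the version explicitly rather than reading the string by eye. A minimal sketch, assuming `NumpyVersion` is importable from `numpy.lib`; the variable name `is_numpy2` is only illustrative:

```python
import numpy as np
from numpy.lib import NumpyVersion

# True on NumPy 2.0 (including its pre-releases) and later, False on 1.x.
is_numpy2 = NumpyVersion(np.__version__) >= "2.0.0b1"

print(f"NumPy {np.__version__} (2.x series: {is_numpy2})")
```

With the version shown above (`1.26.4`), `is_numpy2` would be `False`.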

## NumPy 2.0 - why do we care?

* First major release since 2006
* Not backwards compatible with NumPy 1.x versions
* Can break your existing code if you upgrade without checking it first

### How do package releases/updates work?
* NumPy enhancement proposals (NEPs)
* Let's explore them here - https://numpy.org/neps/
* Release notes - https://numpy.org/devdocs/release/2.0.0-notes.html

### What is backwards compatibility?
* With the new update, stated in NEP 52, NumPy is breaking away from its usual stance of having backward compatibility to ensure a more streamlined API that's cleaner and simpler.
* See NEP 52 at - https://numpy.org/neps/nep-0052-python-api-cleanup.html

## Major changes
* API and ABI changes
* New custom DTypes
* Scalar promotion rules
* Performance improvements

### API and ABI changes

In this update, there have been several cleanup changes made to the API and ABI (Application Binary Interface).
Here are some notable changes:
* __Public and private API split__: Using a module structure, the NumPy API for Python now has a clear split between public and private API.
* __Namespace cleanup__: Functions have been simplified to make learning NumPy easier. See the full removal list for more info.
* __Deprecating niche functionality__: Many of the non-recommended functions and aliases have been removed.
* __C API improvements__: For the C API, a new public API for creating custom DTypes was released.

__Public and private API split__ <br>
`np.core` now becomes `np._core` to explicitly denote it as private. <br>
Learn more about private methods at https://www.datacamp.com/tutorial/python-private-methods-explained

__Namespace cleanup__<br>
Functions have been simplified to make learning NumPy easier. See the full removal list for more info - https://numpy.org/devdocs/release/2.0.0-notes.html#numpy-2-0-python-api-removals

* About 100 members of the main np namespace have been deprecated, removed, or moved to a new place. It was done to reduce clutter and establish only one way to access a given attribute. Some examples are - 
    * `Inf`, `Infinity`, `infty` all replaced by `np.inf`
    * `row_stack`, which was an alias for `vstack`, is deprecated in favor of `vstack`

NEP 52 outlines the reasons for these changes - https://numpy.org/neps/nep-0052-python-api-cleanup.html
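As a small illustration of the aliases listed above, the sketch below uses only the spellings that work on both NumPy 1.x and 2.x. The array contents and variable names are made up for the example:

```python
import numpy as np

# np.inf is the single remaining spelling of positive infinity.
upper_bound = np.inf

# np.vstack does the job that the row_stack alias used to do.
first_row = np.array([1, 2, 3])
second_row = np.array([4, 5, 6])
stacked = np.vstack([first_row, second_row])

print(upper_bound > 1e308)  # True
print(stacked.shape)        # (2, 3)
```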

__Deprecating niche functionality__<br>
Some rarely used methods, such as the array method `newbyteorder`, were removed because they served only niche use cases.
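For code that relied on the array method `newbyteorder`, one replacement that works on both 1.x and 2.x is to view the array through a byte-swapped dtype. A minimal sketch with made-up sample values:

```python
import numpy as np

values = np.array([1, 256], dtype=np.int16)

# "S" asks for the opposite of the dtype's current byte order.
swapped_dtype = values.dtype.newbyteorder("S")

# Reinterpret the same bytes with the swapped dtype; no data is copied.
swapped = values.view(swapped_dtype)

print(swapped.tolist())  # [256, 1]
```

Note that `newbyteorder` is still called here, but on the dtype object rather than on the array.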

__C API improvements__<br>
* A new public C API for creating custom dtypes - https://numpy.org/devdocs/reference/c-api/array.html#dtype-api
* Many outdated functions and macros removed, and private internals hidden to ease future extensibility
* New, easier to use, initialization functions: `PyArray_ImportNumPyAPI` and `PyUFunc_ImportUFuncAPI`.
* The ABI changes with this release, which breaks binaries of packages that use the NumPy C API.
* Any package built against a NumPy 1.x release will not work with NumPy 2.0. Importing it raises an `ImportError` that indicates the binary incompatibility. I'll share more on how to migrate, following the NumPy team's recommendations.

### Scalar promotion rules
- Value-based promotion vs. dtype-based promotion rules
- The main backward compatibility issue is the precision of your scalars.

In [7]:
# promotion rules
np.float32(6)

6.0

In [8]:
np.result_type(np.float32(6))

dtype('float32')

What is `float32`?
https://en.wikipedia.org/wiki/Single-precision_floating-point_format

The data type is important because it limits the precision with which a number can be stored and determines how large a number can be stored. <br>
For example, an n-bit data type can represent at most 2^n distinct values.
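The limits a data type imposes can be inspected directly with `np.iinfo` (integers) and `np.finfo` (floats). A short sketch; the choice of `int8` and the literal `0.1` is only for illustration:

```python
import numpy as np

# An 8-bit signed integer has 2**8 = 256 possible values.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)  # -128 127

# Approximate number of decimal digits each float type can hold reliably.
print(np.finfo(np.float32).precision)  # 6
print(np.finfo(np.float64).precision)  # 15

# The same literal is stored less accurately in float32 than in float64.
print(float(np.float32(0.1)) == 0.1)  # False
print(float(np.float64(0.1)) == 0.1)  # True
```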

In [9]:
np.result_type(np.float32(6) + 6)

dtype('float64')

The resulting output data-type got "promoted" from `float32` to `float64`. This is confusing and unexpected. Why did this happen?

In [11]:
np.result_type(6)

dtype('int64')

1. Combining dissimilar data types triggers promotion. Under NumPy 1.x, the Python integer `6` is treated as the default integer type (`int64` here), and combining `float32` with `int64` gives `float64`.
2. The largest backwards compatibility change is that the precision of scalars is now preserved consistently. Two examples are:
    1. `np.float32(3) + 3.` now returns a `float32` when it previously returned a `float64`.
    2. `np.array([3], dtype=np.float32) + np.float64(3)` will now return a `float64` array. (The higher precision of the scalar is no longer ignored.)
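The two examples above can be run as-is to see which rule set the installed version applies. This is a minimal sketch; the expected dtypes in the comments restate the behavior described above rather than output captured from this notebook:

```python
import numpy as np

# A float32 scalar combined with a Python float.
scalar_result = np.float32(3) + 3.0

# A float32 array combined with a float64 scalar.
array_result = np.array([3], dtype=np.float32) + np.float64(3)

print(np.__version__)
print(scalar_result.dtype)  # float64 on 1.x, float32 on 2.x
print(array_result.dtype)   # float32 on 1.x, float64 on 2.x

# Casting explicitly gives the same dtype on both versions.
explicit_result = np.float64(np.float32(3)) + 3.0
print(explicit_result.dtype)  # float64
```

When a particular precision matters, an explicit cast like the last one avoids depending on either rule set.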

__Old rule - value-based casting__ <br>
Since NumPy 1.7, promotion rules have used so-called “safe casting”, which relies on inspecting the values involved. This helped with a number of edge cases for users, but it was complex to implement and made behavior hard to predict.

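__New rule - dtype-based promotion__ <br>
The result now depends only on the data types of the operands, not on their values. Python scalars are treated as "weakly" typed and adopt the precision of the NumPy operand. <br>
https://numpy.org/neps/nep-0050-scalar-promotion.html#schema-of-the-new-proposed-promotion-rules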
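One visible consequence of the NEP 50 rules concerns Python integers combined with small NumPy integer types. The sketch below is written to run on both 1.x and 2.x; the variable names are illustrative, and the comments describe the behavior NEP 50 specifies rather than captured output:

```python
import numpy as np

# A Python int that fits adopts the dtype of the NumPy scalar under the new rules.
small_sum = np.uint8(1) + 2
print(small_sum.dtype)  # uint8 on 2.x; a wider integer type on 1.x

# A Python int that cannot fit in uint8 is rejected on 2.x instead of upcasting.
try:
    big_sum = np.uint8(1) + 300
    overflowed = False
except OverflowError:
    big_sum = None
    overflowed = True

print(overflowed)  # True on 2.x, False on 1.x
```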

### New DType API and String DType

As proposed in NEP 41, the 2.0 release introduces a new API for implementing user-defined custom data types. Built on that API, the new `StringDType` adds native support for variable-length strings, which NumPy users have long requested.

* What are DTypes - 
https://numpy.org/devdocs/reference/arrays.dtypes.html#arrays-dtypes
* DTypes in NumPy - 
https://numpy.org/devdocs/reference/routines.dtypes.html#numpy.dtypes.StringDType
* NEP 41 user impact - https://numpy.org/neps/nep-0041-improved-dtype-support.html#user-impact
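A brief sketch of the variable-length string type follows. It assumes NumPy 2.0 or later for `np.dtypes.StringDType` and `np.strings`, so the lookup is guarded and the example degrades to a message on 1.x. The sample names are made up:

```python
import numpy as np

# StringDType only exists on NumPy 2.0+, so look it up defensively.
string_dtype_cls = getattr(getattr(np, "dtypes", None), "StringDType", None)

if string_dtype_cls is not None:
    names = np.array(["Ada", "a considerably longer name"], dtype=string_dtype_cls())
    # Each element keeps its own length; nothing is padded or truncated.
    print(names.dtype)
    print(np.strings.str_len(names))
else:
    names = None
    print("StringDType is not available in NumPy", np.__version__)
```

By contrast, the fixed-width `U` dtype sizes every element to the longest string in the array.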

Some examples of what custom DTypes make possible:
* bfloat16, used in deep learning (https://en.wikipedia.org/wiki/Bfloat16_floating-point_format)
* categorical types
* physical units (such as meters)

### Performance improvements
* Faster sorting functions
* Faster linear algebra operations on macOS through Apple's Accelerate framework
* To better suit Windows users, the NumPy team has also fixed several compatibility issues with the new release. For example, they have changed the default integer type on Windows to int64 rather than int32, to match the default behavior on other OS platforms.
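To see which default integer type your platform and NumPy version use, you can inspect an array built from plain Python integers. A minimal sketch; passing `dtype` explicitly is shown as one way to avoid relying on the default at all:

```python
import numpy as np

# The dtype NumPy picks when building an array from plain Python ints.
default_int = np.array([1, 2, 3]).dtype
print(default_int)

# Spelling out the dtype removes the platform dependence entirely.
explicit = np.array([1, 2, 3], dtype=np.int64)
print(explicit.dtype)  # int64
```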

## Migrating to NumPy 2.0

To get started with the new version of NumPy, first upgrade from your current version to the latest release.

To do this, use the upgrade option (`-U`) in this command:

`pip install -U numpy`

* Addressing NumPy data type promotion changes
    - For a full guide to the behavior changes and the resulting data types, refer to the table in NEP 50.
* Addressing namespace changes
    - Check for deprecated aliases and migration guidelines.
    - Replace deprecated aliases with a backward-compatible alternative.
    - Check for private members that were removed in the 2.0 update.
    - Switch to the existing public API wherever your code relied on private members.
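For names that moved rather than disappeared, a guarded import keeps one code base working across versions. The sketch below uses `AxisError`, which lives in `numpy.exceptions` in 2.0, as one example; the failing `np.sum` call exists only to trigger the exception:

```python
import numpy as np

try:
    # Public home of the exception classes (NumPy 1.25 and later, including 2.x).
    from numpy.exceptions import AxisError
except ImportError:
    # Older 1.x releases only exposed it in the main namespace.
    from numpy import AxisError

try:
    np.sum(np.zeros(3), axis=5)  # axis 5 does not exist for a 1-D array
    caught = False
except AxisError:
    caught = True

print(caught)  # True
```

The same try/except pattern can be applied to other relocated names, but check the migration guide for each one rather than assuming the new location.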

__Ruff plugin to update Python code__ <br>
To help you through your migration process, you can use the Ruff plugin with a dedicated Ruff rule, NPY201.
- Ruff plugin - https://numpy.org/devdocs/numpy_2_0_migration_guide.html#ruff-plugin
- Rule NPY201 - https://docs.astral.sh/ruff/rules/numpy2-deprecation/

Ruff is available as ruff on PyPI. 

To install Ruff, enter this in the command line:
`pip install ruff`

Make sure to specify the NPY201 rule in your pyproject.toml:
`[tool.ruff.lint]
select = ["NPY201"]`


Alternatively, you can also specify the NPY201 rule using the command line:
`$ ruff check path/to/code/ --select NPY201`