In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Reason for NumPy's existence: efficiency on large arrays of data.

Why is it efficient?
- Stores data in a contiguous block of memory, independent of other built-in Python objects.
- Its core routines are written in C and operate directly on that memory.
- Avoids per-element type checking and other Python interpreter overhead.
- Performs complex computations on entire arrays without Python `for` loops.

In [2]:
my_arr = np.arange(1000000)
my_list = list(range(1000000))
print("Numpy Timings:")
%time for _ in range(10): my_arr2 = my_arr * 2

print("Python for loop timings")
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Numpy Timings:
CPU times: user 10.4 ms, sys: 12.1 ms, total: 22.5 ms
Wall time: 23.5 ms
Python for loop timings
CPU times: user 548 ms, sys: 193 ms, total: 742 ms
Wall time: 741 ms


> An **ndarray** is a generic multidimensional container for homogeneous data; that is, <mark>all of the elements must be the same type.</mark>

## Attributes

1. `ndim`
2. `shape`
3. `dtype`
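
To make the three attributes concrete, here is a minimal sketch. The array `data` and its values are made up for illustration only.

```python
import numpy as np

# A 2 x 3 array of floats, used only to illustrate the three attributes
data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])

print(data.ndim)   # 2: number of dimensions (axes)
print(data.shape)  # (2, 3): size of the array along each dimension
print(data.dtype)  # float64: the single type shared by every element
```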

## Array Creation Functions

| Function | Description |
| --- | --- |
| `array` | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype; copies the input data by default |
| `asarray` | Convert input to ndarray, but do not copy if the input is already an ndarray |
| `arange` | Like the built-in `range` but returns an ndarray instead of a list |
| `ones, ones_like` | Produce an array of all 1s with the given shape and dtype; `ones_like` takes another array and produces a ones array of the same shape and dtype |
| `zeros, zeros_like` | Like `ones` and `ones_like` but producing arrays of 0s instead |
| `empty, empty_like` | Create new arrays by allocating new memory, but, unlike `ones` and `zeros`, do not populate them with any values (the contents are arbitrary) |
| `full, full_like` | Produce an array of the given shape and dtype with all values set to the indicated "fill value"; `full_like` takes another array and produces a filled array of the same shape and dtype |
| `eye, identity` | Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere) |
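
A short sketch of several of the functions from the table above. The variable names and shapes are arbitrary choices for illustration.

```python
import numpy as np

from_list = np.array([1, 2, 3])    # copies the input data by default
same_obj = np.asarray(from_list)   # no copy: the input is already an ndarray
zeros = np.zeros((2, 3))           # 2 x 3 array of 0.0
ones_like = np.ones_like(zeros)    # 1s with the same shape and dtype as `zeros`
filled = np.full((2, 2), 7)        # every element set to the fill value 7
ident = np.eye(3)                  # 3 x 3 identity matrix
steps = np.arange(0, 10, 2)        # like range(0, 10, 2), but an ndarray

print(same_obj is from_list)       # True
print(ones_like.shape, ones_like.dtype)
print(filled)
print(steps)
```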

## NumPy `dtype`s

| Type | Type code | Description |
| --- | --- | --- |
| `int8, uint8` | `i1, u1` | Signed and unsigned 8-bit (1 byte) integer types |
| `int16, uint16` | `i2, u2` | Signed and unsigned 16-bit integer types |
| `int32, uint32` | `i4, u4` | Signed and unsigned 32-bit integer types |
| `int64, uint64` | `i8, u8` | Signed and unsigned 64-bit integer types |
| `float16` | `f2` | Half-precision floating point |
| `float32` | `f4 or f` | Standard single-precision floating point; compatible with C float |
| `float64` | `f8 or d` | Standard double-precision floating point; compatible with C double and Python `float` object |
| `float128` | `f16 or g` | Extended-precision floating point |
| `complex64`, `complex128`, `complex256` | `c8, c16, c32` | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively |
| `bool` | ? | Boolean type storing `True` and `False` values |
| `object` | O | Python object type; a value can be any Python object |
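| `string_` | S | Fixed-length ASCII string type (1 byte per character); for example, to create a string dtype with length 10, use `'S10'`. The `string_` alias was removed in NumPy 2.0, where the type is named `bytes_` |
| `unicode_` | U | Fixed-length Unicode type (number of bytes platform specific); same specification semantics as `string_` (e.g., `'U10'`). The `unicode_` alias was removed in NumPy 2.0, where the type is named `str_` |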
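
As a quick illustration of the table above, a dtype can be given either as a NumPy type or as its type-code string. The arrays below are made up for the sketch.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)  # dtype given as a NumPy type
b = np.array([1, 2, 3], dtype='i4')      # the same dtype via its type code
c = np.array([1, 2, 3], dtype='f8')      # float64 via its type code

print(a.dtype == b.dtype)    # True
print(c.dtype, c.itemsize)   # float64 8 (8 bytes per element)
print(np.array([True, False]).dtype)  # bool
```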

## Data type conversion

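Use `astype` to convert an array to another dtype. By default it returns a new array (a copy of the data), even when the new dtype is the same as the old one.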

In [3]:
arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype)
float_arr = arr.astype(np.float64)
print(float_arr.dtype)

int64
float64


Be cautious when using the `numpy.string_` type, as string data in NumPy is <mark>fixed size and may truncate input without warning</mark>.
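
The truncation is easy to reproduce. The sketch below uses made-up values; `names` gets a fixed width inferred from its longest element, and `fixed` requests a 5-byte string dtype explicitly.

```python
import numpy as np

names = np.array(['abc', 'de'])  # inferred dtype holds at most 3 characters
print(names.dtype)

names[1] = 'truncated'           # longer than 3 characters: silently cut
print(names)                     # ['abc' 'tru']

# An explicit fixed-length byte-string dtype truncates in the same way
fixed = np.array([b'hello world'], dtype='S5')
print(fixed)                     # [b'hello']
```

Neither assignment raises an error or a warning, which is why the fixed width deserves a check before storing strings of unknown length.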

## Universal Functions: Fast Element-wise Array functions

> Also called *ufunc*.

Fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.


### Unary functions

| Function | Description |
| --- | --- |
| `abs, fabs` | Compute the absolute value element-wise for integer, floating-point, or complex values |
| `sqrt` | Compute the square root of each element (equivalent to `arr ** 0.5`) |
| `square` | Compute the square of each element (equivalent to `arr ** 2`) |
| `exp` | Compute the exponential *e*<sup>x</sup> of each element |
| `log, log10, log2, log1p` | Natural logarithm (base *e*), log base 10, log base 2, and log(1 + x), respectively |
| `sign` | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |
| `ceil` | Compute the ceiling of each element (i.e., the smallest integer greater than or equal to that number) |
| `floor` | Compute the floor of each element (i.e., the largest integer less than or equal to each element) |
| `rint` | Round elements to the nearest integer, preserving the `dtype` |
| `modf` | Return the fractional and integral parts of the array as two separate arrays |
| `isnan` | Return boolean array indicating whether each value is `NaN` (Not a Number) |
| `isfinite, isinf` | Return boolean array indicating whether each element is finite (non-`inf`, non-`NaN`) or infinite, respectively |
| `cos, cosh, sin, sinh, tan, tanh` | Regular and hyperbolic trigonometric functions |
| `arccos, arccosh, arcsin, arcsinh, arctan, arctanh` | Inverse trigonometric functions |
| `logical_not` | Compute truth value of `not x` element-wise (equivalent to `~arr` for boolean arrays) |
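
A few of the unary ufuncs from the table, applied to a small made-up array:

```python
import numpy as np

arr = np.array([-2.5, -1.0, 0.0, 1.44, 4.0])

print(np.abs(arr))                  # element-wise absolute value
print(np.sign(arr))                 # -1., 0., or 1. per element
print(np.floor(arr), np.ceil(arr))  # round down / round up

# modf returns two arrays: the fractional parts and the integral parts
frac, whole = np.modf(arr)
print(frac, whole)

# sqrt of a negative number gives NaN (with a RuntimeWarning), so mask first
roots = np.sqrt(arr[arr >= 0])
print(roots)
```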

### Binary Functions

| Function | Description |
| --- | --- |
| `add` | Add corresponding elements in arrays |
| `subtract` | Subtract elements in second array from first array |
| `multiply` | Multiply array elements |
| `divide, floor_divide` | Divide or floor divide (truncating the remainder) |
| `power` | Raise elements in first array to powers indicated in second array |
| `maximum, fmax` | Element-wise maximum; `fmax` ignores `NaN` |
| `minimum, fmin` | Element-wise minimum; `fmin` ignores `NaN` |
| `mod` | Element-wise modulus (remainder of division) |
| `copysign` | Copy sign of values in second argument to values in first argument |
| `greater, greater_equal, less, less_equal, equal, not_equal` | Perform element-wise comparison, yielding boolean array (equivalent to infix operators `>, >=, <, <=, ==, !=`) |
| `logical_and, logical_or, logical_xor` | Compute element-wise truth value of logical operation (equivalent to infix operators `&`, `\|`, `^` for boolean arrays) |
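
The difference between `maximum` and `fmax` only shows up when `NaN` is present. A small sketch with made-up arrays:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, 4.0])
y = np.array([2.0, 2.0, np.nan, 1.0])

total = np.add(x, y)         # same as x + y
with_nan = np.maximum(x, y)  # NaN propagates: 2, nan, nan, 4
no_nan = np.fmax(x, y)       # NaN ignored where possible: 2, 2, 3, 4
print(total, with_nan, no_nan)

a = np.array([True, True, False])
b = np.array([True, False, False])
both = np.logical_and(a, b)  # same as a & b for boolean arrays
bigger = np.greater(x, y)    # same as x > y; comparisons with NaN are False
print(both, bigger)
```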

## Vectorization



## Broadcasting

## Writing conditional statements in NumPy

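Use `np.where`. `np.where(cond, x, y)` is a vectorized version of the ternary expression `x if cond else y`: it takes elements from `x` where `cond` is `True` and from `y` elsewhere.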

In [4]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result1 = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
print(result1, type(result1))

result2 = np.where(cond, xarr, yarr)
print(result2, type(result2))

[1.1, 2.2, 1.3, 1.4, 2.5] <class 'list'>
[1.1 2.2 1.3 1.4 2.5] <class 'numpy.ndarray'>


In [5]:
rng = np.random.default_rng(123)
x = np.arange(1000000)
y = rng.integers(low=0,high=1000000, size=1000000)
condition = rng.integers(low=0, high=2, size=1000000).astype(bool)  # np.bool8 was removed in NumPy 2.0

In [6]:
%time result1 = [(i if c else j) for i, j, c in zip(x, y, condition)]

%time result2 = np.where(condition, x, y)

CPU times: user 331 ms, sys: 22 ms, total: 353 ms
Wall time: 353 ms
CPU times: user 6.21 ms, sys: 0 ns, total: 6.21 ms
Wall time: 5.81 ms
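
The second and third arguments to `np.where` do not have to be arrays; either can be a scalar. A minimal sketch, with an arbitrarily chosen seed and shape:

```python
import numpy as np

rng = np.random.default_rng(0)     # seed chosen arbitrarily for this sketch
arr = rng.standard_normal((3, 3))

# Replace positive values with 2 and everything else with -2
signs = np.where(arr > 0, 2, -2)

# Replace only the positive values, keeping the others unchanged
clipped = np.where(arr > 0, 2.0, arr)

print(signs)
print(clipped)
```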


## Mathematical and Statistical Methods

Examples
- `sum`
- `mean`
- `median` (available as the function `np.median`, not as an array method)
- `std`
- `cumsum` : cumulative sum
- `cumprod` : cumulative product

In [7]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])
arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28])

In [8]:
arr[1:].cumprod()

array([   1,    2,    6,   24,  120,  720, 5040])

### Basic Array Statistical Methods

| Method | Description |
| --- | --- |
| `sum` | Sum of all the elements in the array or along an axis; zero-length arrays have sum 0 |
| `mean` | Arithmetic mean; zero-length arrays have `NaN` mean |
| `std, var` | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator `n`) |
| `min, max` | Minimum and maximum |
| `argmin, argmax` | Indices of minimum and maximum elements, respectively |
| `cumsum` | Cumulative sum of elements starting from 0 |
| `cumprod` | Cumulative product of elements starting from 1 |
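
Most of these methods accept an `axis` argument, which the table mentions only briefly. The sketch below uses a made-up 3 x 3 array to show whole-array and per-axis results.

```python
import numpy as np

arr = np.array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])

print(arr.sum())            # 36, over the whole array
print(arr.mean(axis=0))     # mean of each column: 3, 4, 5
print(arr.sum(axis=1))      # sum of each row: 3, 12, 21
print(arr.argmax())         # 8, an index into the flattened array
print(arr.cumsum(axis=0))   # running sum down each column
print(arr.std(), arr.std(ddof=1))  # denominators n and n - 1
print(np.median(arr))       # 4.0
```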