<a href="https://colab.research.google.com/github/osipov/edu/blob/master/pyt0/Demo_PyTorch_Tensors.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg"/></a>

# Python Doesn't Have Good Numeric Support
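* Python integers are actually objects, carrying a header with reference-count and type information alongside the value
* accessing a Python integer's value therefore requires a level of indirection (following a pointer to that object)
* in C, integers are stored directly in memory and accessed without indirection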
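A minimal sketch of the overhead described above, assuming CPython on a 64-bit platform (exact byte counts vary by interpreter and version). The standard-library `array` module stands in for a C-style integer here:

```python
import sys
from array import array

n = 42

# A Python int is a heap object: besides the value it carries a reference
# count, a pointer to its type, and a size field.
int_object_bytes = sys.getsizeof(n)

# array("q") stores C-style signed 64-bit integers as raw bytes with no
# per-element header, which is how a C program would hold the value.
raw = array("q", [n])
raw_bytes = raw.itemsize

print(int_object_bytes, raw_bytes)  # on 64-bit CPython, typically 28 and 8
```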
<img src="https://github.com/osipov/edu/raw/master/pyt0/images/python-01.png" width=700 height=400>

## The Problem is Even Worse for Python Lists 
* Python lists are immensely flexible
  * no fixed size
  * OK to have heterogeneous data
* ...but as a result, a list stores pointers to separate objects, and those objects are not guaranteed to be contiguous in memory
* even when they happen to be, every element access still goes through a level of indirection
* so lists aren't well suited to fast number crunching
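To put rough numbers on the list layout, here is a sketch that compares a list of 1,000 ints with the same values packed into an `array("q")` buffer. It assumes 64-bit CPython and counts only the list's pointer table plus each int object, so the figures are approximate:

```python
import sys
from array import array

# Values above 256 avoid CPython's cache of small, shared int objects.
values = list(range(1000, 2000))

# The list holds one 8-byte pointer per element; each int is its own object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array("q") packs the same numbers back to back in a single buffer.
packed = array("q", values)
packed_bytes = len(packed) * packed.itemsize

print(list_bytes, packed_bytes)
```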
<img src="https://github.com/osipov/edu/raw/master/pyt0/images/python-02.png" width=700 height=700>

In [0]:
pylist = list(range(1_000_000))
%timeit [i + 1 for i in pylist]

## One solution is to use PyTorch tensors
* backed by a core library written in C++
* store elements of a single data type and support vectorized operations on them

In [0]:
!pip install --upgrade -q torch==1.7.0
import torch as pt
pt.__version__
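As a quick sanity check on the `%timeit` comparison earlier, the sketch below confirms that the vectorized tensor expression computes the same values as the list comprehension; only the execution strategy differs. It assumes PyTorch is installed:

```python
import torch as pt

pylist = list(range(1_000_000))
tensor = pt.arange(1_000_000)

# The list comprehension runs the Python interpreter once per element.
looped = [i + 1 for i in pylist]

# The tensor expression is a single call that loops in compiled code.
vectorized = tensor + 1

print(vectorized[:5].tolist(), looped[:5])  # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
```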

## PyTorch Scalars

In [0]:
pt.tensor(42)

In [0]:
pt.tensor(42).dtype

In [0]:
pt.tensor(42).shape

In [0]:
len(pt.tensor(42).shape) == 0

In [0]:
pt.tensor(3.14).dtype

In [0]:
pt.tensor(3.14).item()

In [0]:
pt.tensor(3.14).item() == 3.14
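The last comparison is `False` because of the tensor's data type. A sketch, assuming the default floating dtype has not been changed from `float32`:

```python
import torch as pt

# Python float literals become float32 tensors by default, and 3.14 has no
# exact float32 representation, so the stored value is slightly off.
f32 = pt.tensor(3.14)
print(f32.dtype, f32.item())

# A float64 tensor keeps the same double-precision value as the Python float.
f64 = pt.tensor(3.14, dtype=pt.float64)
print(f64.item() == 3.14)  # True
```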

## IEEE Standard for Floating-Point Arithmetic (IEEE 754) 
* a refresher on floating-point precision issues

In [0]:
x = 0.3
x

In [0]:
3 * 0.1 == x

In [0]:
3 * 0.1
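Because neither 0.1 nor 0.3 is exactly representable in binary floating point, comparisons should usually allow a tolerance. A minimal sketch using the standard library's `math.isclose` with its default relative tolerance:

```python
import math

x = 0.3
product = 3 * 0.1

print(product == x)              # False
print(product)                   # 0.30000000000000004

# Compare with a tolerance instead of exact equality.
print(math.isclose(product, x))  # True
```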

In [0]:
x = pt.tensor(3.14)

In [0]:
x.to(pt.uint8).item()

In [0]:
# x.to() already returns a new tensor; wrapping it in pt.tensor() would copy it again and emit a warning
x.to(pt.uint8)

In [0]:
pt.tensor(x.to(pt.uint8).item(), dtype = pt.float32)

## __pt.trunc()__

* the nearest integer __`i`__ in the direction of zero, i.e. __`x`__ with its fractional part discarded (so __`|i| <= |x|`__)

In [0]:
pt.trunc(x)

In [0]:
pt.trunc(pt.tensor(2.01)).dtype
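One way to see "toward zero" concretely: truncation behaves like `floor` for non-negative values and like `ceil` for negative ones. A sketch checking that identity, which also shows that the result keeps the floating dtype:

```python
import torch as pt

x = pt.tensor([3.14, -3.14, 2.0, -0.5])

# Truncation rounds toward zero: floor for x >= 0, ceil for x < 0.
expected = pt.where(x >= 0, pt.floor(x), pt.ceil(x))

print(pt.trunc(x))
print(pt.equal(pt.trunc(x), expected))  # True
```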

## __pt.floor()__

* the largest integer __`i`__, such that __`i <= x`__

In [0]:
pt.floor(x)

In [0]:
pt.floor(pt.tensor(2.01))

In [0]:
pt.floor(pt.tensor(2.))

In [0]:
pt.floor(pt.tensor(-3.14))
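The definition above implies that every `x` lies in the half-open interval `[floor(x), floor(x) + 1)`, including negative values such as -3.14. A sketch that checks this element-wise:

```python
import torch as pt

x = pt.tensor([3.14, 2.01, 2.0, -3.14])
f = pt.floor(x)

# floor(x) is the largest integer i with i <= x, so x lies in [i, i + 1).
in_interval = (f <= x) & (x < f + 1)

print(f.tolist())                # [3.0, 2.0, 2.0, -4.0]
print(bool(in_interval.all()))   # True
```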

## __pt.ceil()__

* the smallest integer __`i`__, such that __`i >= x`__

In [0]:
pt.ceil(x)

In [0]:
pt.ceil(pt.tensor(2.01))

In [0]:
pt.ceil(pt.tensor(2.))

* can __pt.ceil()__ minus 1 be used in place of __pt.floor()__?

In [0]:
pt.ceil(x) - 1

In [0]:
pt.ceil(pt.tensor(2.01)) - 1

In [0]:
pt.ceil(pt.tensor(2.)) - 1
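The three cells above answer the question: `ceil(x) - 1` matches `floor(x)` only when `x` has a fractional part. For integer-valued `x`, `ceil(x) == floor(x) == x`, so subtracting 1 is off by one. A sketch over a few positive and negative values:

```python
import torch as pt

x = pt.tensor([3.14, 2.01, 2.0, -3.14, -2.0])

# Element-wise check of whether ceil(x) - 1 agrees with floor(x).
same = (pt.ceil(x) - 1) == pt.floor(x)

print(same.tolist())  # [True, True, False, True, False]
```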

## PyTorch arrays
* all elements share one data type and, for a newly created tensor, are stored contiguously in memory

In [0]:
# PyTorch infers the data type from the values
a = pt.tensor([1, 4, 2, 5, 3])
a, a.dtype

In [0]:
a = pt.tensor([3.14, 4, 2, 3])
a, a.dtype

In [0]:
# ...or you can be explicit
a = pt.tensor([1, 2, 3, 4], dtype=pt.float32)
a
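A compact summary of the inference rules seen above, assuming the default floating dtype is still `float32`: integers infer `int64`, a single float promotes the whole tensor to the default floating dtype, and booleans infer `bool`:

```python
import torch as pt

ints = pt.tensor([1, 4, 2])
mixed = pt.tensor([3.14, 4, 2])
flags = pt.tensor([True, False])

print(ints.dtype, mixed.dtype, flags.dtype)  # torch.int64 torch.float32 torch.bool
```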

In [0]:
pt.tensor([range(i, i + 3) for i in [2, 4, 6]])

In [0]:
pt.zeros(10, dtype=int)

In [0]:
pt.ones((3, 5), dtype=float)

In [0]:
pt.eye(5)

In [0]:
pt.full((3, 5), 42, dtype=int)

In [0]:
pt.arange(0, 20, 2)

In [0]:
pt.linspace(0, 1, 5)
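The difference between the last two constructors is easy to miss: `pt.arange` takes a step size and excludes the end value, while `pt.linspace` takes a number of points and includes both endpoints. A short sketch:

```python
import torch as pt

evens = pt.arange(0, 20, 2)   # step of 2; the end value 20 is excluded
grid = pt.linspace(0, 1, 5)   # 5 evenly spaced points; both ends included

print(evens.numel(), evens[-1].item())  # 10 18
print(grid.tolist())                    # [0.0, 0.25, 0.5, 0.75, 1.0]
```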

## Pseudo-Random Numbers

In [0]:
pt.manual_seed(0);

In [0]:
pt.randn(3, 3)

In [0]:
pt.normal(0, 1, size = (3, 3))


In [0]:
pt.randint(0, 10, (3, 3))
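Seeding makes these draws reproducible. The sketch below reseeds the global generator to repeat a draw, then uses a separate `pt.Generator`, an optional alternative not used elsewhere in this notebook, so that one stream of random numbers is unaffected by other code's draws:

```python
import torch as pt

# Reseeding the global generator repeats the same sequence of draws.
pt.manual_seed(0)
first = pt.randn(3, 3)
pt.manual_seed(0)
second = pt.randn(3, 3)
print(pt.equal(first, second))  # True

# A dedicated generator keeps its own state, independent of the global one.
gen = pt.Generator().manual_seed(42)
sample = pt.randint(0, 10, (3, 3), generator=gen)
print(sample)
```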

## Converting array types

In [0]:
x = pt.linspace(0, 10, 50)
x

In [0]:
x.to(int)
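Converting to an integer type with `.to(int)` drops the fractional part, rounding toward zero rather than to the nearest integer. If rounding is what you want, round first. A sketch with a negative value included:

```python
import torch as pt

x = pt.tensor([2.7, -2.7, 0.4])

# .to(int) drops the fractional part (rounds toward zero).
print(x.to(int).tolist())          # [2, -2, 0]

# Round to the nearest integer first, then convert.
print(x.round().to(int).tolist())  # [3, -3, 0]
```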

## Multi-dimensional Arrays

In [0]:
x2 = pt.randint(10, size=[3, 4])
x2

## True "matrix-style" indexing

In [0]:
x2[0, 0]

In [0]:
x2[2, 0]

In [0]:
x2[2, -1]

In [0]:
x2[0, 0] = 12
x2

In [0]:
pt.arange(0, 9).reshape(3, 3)
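Two details of matrix-style indexing are worth confirming: indexing every dimension returns a 0-d tensor (not a Python number), and `x2[i, j]` selects the same element as the chained `x2[i][j]`. A sketch, seeded here only so that it is self-contained:

```python
import torch as pt

pt.manual_seed(0)
x2 = pt.randint(10, size=[3, 4])

# Indexing every dimension returns a 0-d tensor.
corner = x2[2, -1]
print(corner.dim())  # 0

# .item() extracts the Python number from a 0-d tensor.
print(type(corner.item()))  # <class 'int'>

# One indexing step versus two chained steps select the same element.
print(bool(x2[2, -1] == x2[2][-1]))  # True
```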

## Array Slicing

In [0]:
x = pt.arange(10)
x[:5]

In [0]:
x[5:]

In [0]:
x[4:7]

In [0]:
x[::2]

In [0]:
x[1::2]

In [0]:
x[::-1]  # raises ValueError: PyTorch does not support negative slice steps

In [0]:
reversed(x)

In [0]:
reversed(x)[5::2]
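Because negative slice steps are rejected, reversal in PyTorch goes through `pt.flip` (or `reversed()`, which flips dimension 0). Note that, unlike a NumPy negative-step slice, `pt.flip` returns a copy rather than a view. A sketch:

```python
import torch as pt

x = pt.arange(10)

raised = False
try:
    x[::-1]
except ValueError as err:
    # PyTorch rejects negative slice steps instead of reversing.
    raised = True
    print("ValueError:", err)

# pt.flip returns a reversed copy along the given dimensions.
flipped = pt.flip(x, dims=[0])
print(flipped.tolist())                # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(pt.equal(flipped, reversed(x)))  # True
```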

## Filtering 1-dimensional data

In [0]:
x = pt.tensor([ 1, 0, 5, 2, 1, 0, 8, 0, 0 ])

In [0]:
x.nonzero()

In [0]:
x[x.nonzero()]

In [0]:
x[x < 3]
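A subtlety in the cells above: `x.nonzero()` returns an `(n, 1)` index tensor, so `x[x.nonzero()]` comes back as a column. Passing `as_tuple=True`, or filtering with a boolean mask, yields a flat 1-D result. A sketch comparing the three:

```python
import torch as pt

x = pt.tensor([1, 0, 5, 2, 1, 0, 8, 0, 0])

as_column = x[x.nonzero()]                 # shape (5, 1)
as_flat = x[x.nonzero(as_tuple=True)]      # shape (5,)
masked = x[x != 0]                         # boolean mask, shape (5,)

print(as_column.shape, as_flat.shape)      # torch.Size([5, 1]) torch.Size([5])
print(masked.tolist())                     # [1, 5, 2, 1, 8]
```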

## Filtering 2-dimensional data

In [0]:
x = pt.tensor([[1, 0, 0], [0, 5, 0], [7, 8, 0]])
x

In [0]:
# returns a single (n, 2) tensor: one row per nonzero element, holding its (row, column) index
x.nonzero()

In [0]:
x.nonzero(as_tuple = True)

In [0]:
x[x.nonzero(as_tuple = True)]
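The two forms of `nonzero()` carry the same information laid out differently: stacking the row and column tensors from `as_tuple=True` side by side reproduces the `(n, 2)` result. A sketch on the same matrix:

```python
import torch as pt

x = pt.tensor([[1, 0, 0], [0, 5, 0], [7, 8, 0]])

coords = x.nonzero()                   # shape (n, 2): one (row, col) pair per row
rows, cols = x.nonzero(as_tuple=True)  # two 1-D tensors of length n

print(coords.tolist())                                  # [[0, 0], [1, 1], [2, 0], [2, 1]]
print(pt.equal(coords, pt.stack([rows, cols], dim=1)))  # True
print(x[rows, cols].tolist())                           # [1, 5, 7, 8]
```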

In [0]:
y = pt.arange(1, 10).reshape(3, 3)
y

In [0]:
y.index_select(dim = 0, index = pt.tensor([0, 2]))

In [0]:
y.index_select(dim = 1, index = pt.tensor([0, 2]))

In [0]:
y.triu()

In [0]:
y.tril()

In [0]:
y.tril().T  # transpose
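For reference, `index_select` along dimension 0 or 1 matches advanced indexing on rows or columns, and the upper and lower triangles overlap only on the diagonal. A sketch checking both relationships:

```python
import torch as pt

y = pt.arange(1, 10).reshape(3, 3)
idx = pt.tensor([0, 2])

# index_select on dim 0 / dim 1 matches indexing rows / columns.
print(pt.equal(y.index_select(0, idx), y[idx]))     # True
print(pt.equal(y.index_select(1, idx), y[:, idx]))  # True

# triu + tril counts the diagonal twice, so subtract it once.
diagonal = pt.diag(pt.diag(y))
print(pt.equal(y.triu() + y.tril() - diagonal, y))  # True
```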

## Multi-dimensional subarrays

In [0]:
x2

In [0]:
x2[:2, :3]

In [0]:
x2[:3, ::2]

In [0]:
x2[::-1, ::-1]  # also raises ValueError: negative slice steps are not supported

In [0]:
reversed(x2)  # flips dimension 0 only, i.e. reverses the order of the rows

In [0]:
indices = pt.arange(x2.numel() - 1, -1, -1)
indices

In [0]:
pt.take(x2, indices).reshape(x2.shape)  # same result NumPy's x2[::-1, ::-1] would give
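The `pt.take` trick works because reversing the flattened, row-major data reverses both dimensions at once. `pt.flip` with both dimensions listed expresses the same thing directly. A sketch comparing them, seeded only to be self-contained:

```python
import torch as pt

pt.manual_seed(0)
x2 = pt.randint(10, size=[3, 4])

# Reverse both the rows and the columns in one call.
flipped = pt.flip(x2, dims=[0, 1])

# The flattened-index approach from the cells above.
indices = pt.arange(x2.numel() - 1, -1, -1)
via_take = pt.take(x2, indices).reshape(x2.shape)

print(pt.equal(flipped, via_take))                    # True
print(pt.equal(reversed(x2), pt.flip(x2, dims=[0])))  # True: reversed() flips rows only
```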

## Subarray Views

In [0]:
x2, id(x2)

In [0]:
x2_sub = x2[:2, :2]
x2_sub, id(x2_sub)

In [0]:
x2_sub[0, 0] = 99
x2_sub

In [0]:
x2  # changes x2 as well, since the subarray is a view that shares x2's underlying storage
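Different `id()` values only mean the view and the original are different Python objects. `data_ptr()` shows that they point at the same memory, and `.clone()` is the way to get an independent copy. A sketch:

```python
import torch as pt

x2 = pt.arange(12).reshape(3, 4)
view = x2[:2, :2]
copy = x2[:2, :2].clone()

# The view starts at the same memory address as x2; the clone does not.
print(view.data_ptr() == x2.data_ptr())  # True
print(copy.data_ptr() == x2.data_ptr())  # False

copy[0, 0] = -1
print(x2[0, 0].item())  # 0: writing to the clone leaves x2 alone

view[0, 0] = 99
print(x2[0, 0].item())  # 99: writing to the view writes through to x2
```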

## PyTorch ATen Functions
* `pt.*` functions dispatch to ATen, PyTorch's C++ tensor library, which works directly on each tensor's underlying block of memory
* each one is a _vectorized_ wrapper around an element-wise operation: it takes a fixed number of inputs, produces a fixed number of outputs, and is applied to every element without a Python-level loop

| Operator | Function         | Description                         |
|----------|------------------|-------------------------------------|
|   +      | pt.add           | Addition (e.g., 1 + 1 = 2)          |
|   -      | pt.subtract      | Subtraction (e.g., 3 - 2 = 1)       |
|   -      | pt.negative      | Unary negation (e.g., -2)           |
|   *      | pt.multiply      | Multiplication (e.g., 2 * 3 = 6)    |
|   /      | pt.divide        | Division (e.g., 3 / 2 = 1.5)        |
|   //     | pt.floor_divide  | Floor division (e.g., 3 // 2 = 1)   |
|   **     | pt.pow           | Exponentiation (e.g., 2 ** 3 = 8)   |
|   %      | pt.remainder     | Modulus/remainder (e.g., 9 % 4 = 1) |
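A sketch that checks each operator in the table against its function form. Only positive operands are used, because `pt.floor_divide` in older releases (including 1.7) truncated toward zero instead of flooring, so results for negative operands depend on the installed version:

```python
import torch as pt

a = pt.tensor([9, 3, 2])
b = pt.tensor([4, 2, 3])

# Each operator should produce exactly what its function form produces.
pairs = {
    "+": (a + b, pt.add(a, b)),
    "-": (a - b, pt.subtract(a, b)),
    "unary -": (-a, pt.negative(a)),
    "*": (a * b, pt.multiply(a, b)),
    "/": (a / b, pt.divide(a, b)),
    "//": (a // b, pt.floor_divide(a, b)),
    "**": (a ** b, pt.pow(a, b)),
    "%": (a % b, pt.remainder(a, b)),
}
for op, (via_operator, via_function) in pairs.items():
    print(op, pt.equal(via_operator, via_function))
```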

## Vectorized Operations

In [0]:
pytorch = pt.arange(1_000_000)
%timeit pytorch + 1

In [0]:
x = pt.arange(9).reshape((3, 3))
2 ** x

In [0]:
x = pt.arange(4)
-(0.5 * x + 1) ** 2
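The last expression is applied element-wise, with the same operator precedence as plain Python (`**` binds tighter than unary minus). A sketch comparing it with an explicit per-element loop:

```python
import torch as pt

x = pt.arange(4)

# One vectorized expression over the whole tensor.
vectorized = -(0.5 * x + 1) ** 2

# The same formula evaluated one Python float at a time.
looped = pt.tensor([-(0.5 * v + 1) ** 2 for v in x.tolist()])

print(vectorized.tolist())              # [-1.0, -2.25, -4.0, -6.25]
print(pt.allclose(vectorized, looped))  # True
```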

## Exponents and Logarithms 

In [0]:
x = pt.tensor([1., 2., 3.])
pt.exp(x)

In [0]:
pt.pow(3, x)

In [0]:
pt.log(pt.tensor([1., 2., 3.]))

In [0]:
pt.log2(pt.tensor([1., 256., 65536.]))

In [0]:
pt.log10(pt.tensor([1_000., 1_000_000., 10. ** 10]))
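These functions obey the usual identities up to float32 rounding, which makes them easy to sanity-check: change of base, `log2(x) = ln(x) / ln(2)`, and `log` undoing `exp`. A sketch using `pt.allclose` with its default tolerances:

```python
import math
import torch as pt

x = pt.tensor([1., 256., 65536.])

# Change of base: log2(x) = ln(x) / ln(2)
print(pt.allclose(pt.log2(x), pt.log(x) / math.log(2)))  # True

# exp and log are inverses, up to float32 rounding.
y = pt.tensor([1., 2., 3.])
print(pt.allclose(pt.log(pt.exp(y)), y))  # True
```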

## Aggregations

In [0]:
x = pt.arange(15).reshape(3, 5)
x

In [0]:
x.sum()

In [0]:
x.sum(dim = 0)

In [0]:
x.sum(dim = 1, keepdim = True)

In [0]:
x.sum(dim = 1)

In [0]:
x.to(float).mean(), x.to(float).std()
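Two details of the aggregations above are worth checking explicitly. First, `dim` names the dimension that is collapsed, and `keepdim=True` keeps it with size 1. Second, `std()` uses the unbiased `n - 1` estimator by default. A sketch:

```python
import torch as pt

x = pt.arange(15).reshape(3, 5)

print(x.sum(dim=0).shape)                # torch.Size([5]): rows collapsed
print(x.sum(dim=1).shape)                # torch.Size([3]): columns collapsed
print(x.sum(dim=1, keepdim=True).shape)  # torch.Size([3, 1])

xf = x.to(float)
n = xf.numel()

# Unbiased sample standard deviation, computed by hand.
manual = (((xf - xf.mean()) ** 2).sum() / (n - 1)).sqrt()
print(pt.allclose(xf.std(), manual))     # True
```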

Copyright 2020 CounterFactual.AI LLC. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.