# 06.1 - Intro to Dask: Tensors

Dask was created to handle amounts of data that NumPy alone cannot hold in memory. When you try to create a very large array with NumPy, you get:

In [1]:
import numpy as np

In [2]:
vec = np.ones((10000, 10000, 1000))

MemoryError: Unable to allocate 745. GiB for an array with shape (10000, 10000, 1000) and data type float64

NumPy keeps the whole array in memory at once, so it cannot handle data larger than the available RAM.
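The size in the error message can be checked by hand: each float64 element takes 8 bytes, so the array needs 8 bytes per element. A minimal sketch of that arithmetic, using only the standard library (the variable names are illustrative):

```python
import math

shape = (10000, 10000, 1000)   # shape requested in the failing call
itemsize = 8                   # bytes per float64 element

n_elements = math.prod(shape)
n_bytes = n_elements * itemsize
n_gib = n_bytes / 2**30        # 1 GiB = 2**30 bytes

print(f"{n_elements:,} elements, {n_gib:.2f} GiB")
```

The result agrees with the 745 GiB reported in the `MemoryError`.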

Dask, on the other hand:

In [4]:
import dask.array as da
import dask
dask.config.set({"visualization.engine": "cytoscape"})

<dask.config.set at 0x22268f47110>

In [5]:
vec = da.ones((10000, 10000, 1000))

In [6]:
vec

| | Array | Chunk |
|---|---|---|
| Bytes | 745.06 GiB | 126.51 MiB |
| Shape | (10000, 10000, 1000) | (255, 255, 255) |
| Dask graph | 6400 chunks in 1 graph layer | 6400 chunks in 1 graph layer |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


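We have now created a tensor of shape 10000 × 10000 × 1000. Dask has not allocated it: the array is only described as a collection of chunks, and nothing is computed until a result is requested.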
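The summary above reports 6400 chunks of shape (255, 255, 255). Those numbers follow from splitting each axis into pieces of at most 255 elements. A rough check, assuming that simple ceiling-division rule (the chunk shape itself is chosen by Dask and may differ between versions or configurations):

```python
import math

shape = (10000, 10000, 1000)
chunk = (255, 255, 255)        # chunk shape reported in the summary above

# Number of pieces along each axis; the last piece may be smaller.
blocks_per_axis = tuple(math.ceil(s / c) for s, c in zip(shape, chunk))
n_chunks = math.prod(blocks_per_axis)

# Memory for one full-size chunk of float64 values, in MiB.
chunk_mib = math.prod(chunk) * 8 / 2**20

print(blocks_per_axis, n_chunks, round(chunk_mib, 2))
```

Each chunk is an ordinary NumPy array small enough to fit in memory, which is what lets Dask work through the full array piece by piece.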

A tensor is a mathematical object. In its most basic definition, it can have any non-negative integer rank (number of dimensions).

Think of a tensor as a generalization of all the mathematical objects you have seen so far.

A scalar is a zero-rank tensor, a vector is a first-rank tensor, a matrix is a second-rank tensor, and so on.
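In array libraries, the rank corresponds to the number of axes, exposed as `ndim`. A small NumPy sketch of the ranks listed above (NumPy is used only because these arrays are tiny; the variable names are illustrative):

```python
import numpy as np

scalar = np.array(5.0)          # shape (), rank 0
vector = np.ones(10)            # shape (10,), rank 1
matrix = np.ones((10, 10))      # shape (10, 10), rank 2
one_element = np.ones(1)        # shape (1,): still rank 1, not a scalar

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("one_element", one_element)]:
    print(name, arr.shape, arr.ndim)
```

Note that an array of shape `(1,)` holds a single value but is still a first-rank tensor; a zero-rank array has the empty shape `()`.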

In [7]:
da.ones(1)

| | Array | Chunk |
|---|---|---|
| Bytes | 8 B | 8 B |
| Shape | (1,) | (1,) |
| Dask graph | 1 chunks in 1 graph layer | 1 chunks in 1 graph layer |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In [8]:
da.ones(10)

| | Array | Chunk |
|---|---|---|
| Bytes | 80 B | 80 B |
| Shape | (10,) | (10,) |
| Dask graph | 1 chunks in 1 graph layer | 1 chunks in 1 graph layer |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In [9]:
da.ones((10, 10))

| | Array | Chunk |
|---|---|---|
| Bytes | 800 B | 800 B |
| Shape | (10, 10) | (10, 10) |
| Dask graph | 1 chunks in 1 graph layer | 1 chunks in 1 graph layer |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In [10]:
da.ones((10, 10, 10, 10, 100))

| | Array | Chunk |
|---|---|---|
| Bytes | 7.63 MiB | 7.63 MiB |
| Shape | (10, 10, 10, 10, 100) | (10, 10, 10, 10, 100) |
| Dask graph | 1 chunks in 1 graph layer | 1 chunks in 1 graph layer |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


---
## Why we use tables

No matter the dimensionality of our tensor, there is always a way to represent that information in tabular form.

Let us say that we have measurements at geographical coordinates, __Longitude__ (Lon) and __Latitude__ (Lat), and for each coordinate we have values for __Temperature__ (T), __Atmospheric Pressure__ (P), and __Wind Speed__ (W).

 <img src="https://cdn.britannica.com/63/2063-050-89E52B49/Perspective-globe-grid-parallels-meridians-longitude-latitude.jpg" alt="Coordinate system from Encyclopedia Britannica" width="500" height="600"> 

<div class="alert alert-info">
    <br>
    <b>What shape would this tensor have?</b>      
    <br>
    <br>
</div>

Since we have a single value of T, P, and W for each geographical location, the shape depends on our geographic granularity.

For this exercise, let's assume we have one point per degree of latitude and longitude, with latitude in [-90, 90] and longitude in [-180, 180]. Counting both endpoints gives 181 latitude values and 361 longitude values.

In [11]:
vec = da.zeros((181, 361, 1, 1, 1))

In [12]:
vec

| | Array | Chunk |
|---|---|---|
| Bytes | 510.48 kiB | 510.48 kiB |
| Shape | (181, 361, 1, 1, 1) | (181, 361, 1, 1, 1) |
| Dask graph | 1 chunks in 1 graph layer | 1 chunks in 1 graph layer |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In a relational database, this could be represented as:

| Latitude | Longitude | Temperature | Pressure | Wind speed |
|---|---|---|---|---|
| Lat<sub>0</sub>  | Lon<sub>0</sub> | T<sub>0,0</sub> | P<sub>0,0</sub> | W<sub>0,0</sub> |
| Lat<sub>0</sub>  | Lon<sub>1</sub> | T<sub>0,1</sub> | P<sub>0,1</sub> | W<sub>0,1</sub> |
| Lat<sub>0</sub>  | Lon<sub>2</sub> | T<sub>0,2</sub> | P<sub>0,2</sub> | W<sub>0,2</sub> |
|...|...|...|...|...|
| Lat<sub>180</sub>  | Lon<sub>360</sub> | T<sub>180,360</sub> | P<sub>180,360</sub> | W<sub>180,360</sub> |

<div class="alert alert-danger">
    <br>
    <b>NEVER pivot a large table!</b>      
    <br>
    <br>
</div>

Relational databases are optimized to handle rows.

__There is a direct relation between a tensor of any rank (dimensionality) and a table.__

If you can idealize it as a multidimensional object, it is possible to create a 2D table (rows, columns) with the same information.
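The correspondence between a tensor and a table can be sketched with NumPy on the one-degree grid. This is only an illustration under stated assumptions: the measurements are random placeholder values, and the three variables are stored along a third axis of length 3, which is a different layout from the `(181, 361, 1, 1, 1)` array created earlier.

```python
import numpy as np

lats = np.arange(-90, 91)      # 181 values, one per degree
lons = np.arange(-180, 181)    # 361 values, one per degree

# Hypothetical measurements: axis 2 holds (T, P, W) for each grid point.
rng = np.random.default_rng(0)
grid = rng.random((lats.size, lons.size, 3))

# Repeat the coordinates so that every grid point gets its own row.
lat_col, lon_col = np.meshgrid(lats, lons, indexing="ij")

# One row per (lat, lon) pair: Latitude, Longitude, T, P, W.
table = np.column_stack([lat_col.ravel(), lon_col.ravel(), grid.reshape(-1, 3)])

print(table.shape)
```

Every row of `table` carries its own coordinates, so no information is lost: `table[:, 2:].reshape(181, 361, 3)` gives back the original grid.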