# "Move array"

* [python - What is the difference between Numpy's array() and asarray() functions? - Stack Overflow](https://stackoverflow.com/questions/14415741/what-is-the-difference-between-numpys-array-and-asarray-functions)

In [4]:
import numpy as np

In [5]:
# 2**(30-3) elements: 1 GiB once stored as float64 (8 bytes each)
l = [.0]*(2**(30-3))
type(l), len(l)

(list, 134217728)

In [6]:
%%time
x0 = np.array(l, copy=True)

CPU times: user 5.73 s, sys: 94.6 ms, total: 5.82 s
Wall time: 5.79 s


In [7]:
%%time
x1 = np.array(l, copy=False)

CPU times: user 5.72 s, sys: 79.2 ms, total: 5.8 s
Wall time: 5.77 s


In [8]:
%%time
x2 = np.asarray(l)

CPU times: user 5.69 s, sys: 89.2 ms, total: 5.78 s
Wall time: 5.75 s


### Loading from a Python object

* Specifying `dtype` is faster (about 3.5 s vs. 5.8 s here)
* `copy` makes no difference (and there is no difference from `asarray`)

In [12]:
%%time
x0 = np.array(l, copy=True, dtype=np.float64)

CPU times: user 3.47 s, sys: 78.6 ms, total: 3.55 s
Wall time: 3.54 s


In [13]:
%%time
x1 = np.array(l, copy=False, dtype=np.float64)

CPU times: user 3.45 s, sys: 87.6 ms, total: 3.54 s
Wall time: 3.53 s


In [14]:
%%time
x2 = np.asarray(l, dtype=np.float64)

CPU times: user 3.46 s, sys: 79.2 ms, total: 3.54 s
Wall time: 3.52 s
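The measurements above can be reproduced at a smaller scale with a sketch like the following. The list size `n` and the helper `timed` are illustrative choices, not part of the original notebook, and absolute timings depend on the machine. `np.asarray` is used instead of `copy=False` because NumPy 2.x raises `ValueError` for `np.array(l, copy=False)` when a copy is unavoidable, as it is for a list.

```python
import time
import numpy as np

def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

n = 2**20          # 8 MiB as float64, far smaller than the 1 GiB list above
l = [0.0] * n

# dtype inferred by inspecting the list elements
a, t_infer = timed(lambda: np.asarray(l))
# dtype given up front
b, t_given = timed(lambda: np.asarray(l, dtype=np.float64))
# explicit copy with dtype given; a list always needs a new buffer anyway
c, t_copy = timed(lambda: np.array(l, dtype=np.float64, copy=True))

print(f"asarray, dtype inferred : {t_infer:.4f} s")
print(f"asarray, dtype given    : {t_given:.4f} s")
print(f"array, copy=True, dtype : {t_copy:.4f} s")
```

All three calls produce an equal `float64` array; only the conversion time differs.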


### Loading from a `numpy.ndarray`

* [numpy.asarray — NumPy v1.23 Manual](https://numpy.org/doc/stable/reference/generated/numpy.asarray.html)

In [108]:
x0.dtype, x0.dtype.char, x0.dtype.alignment

(dtype('float64'), 'd', 8)

In [109]:
x0.data, x0.shape, x0.strides

(<memory at 0x7fb7fa8c3940>, (125000000,), (8,))

In [110]:
x0.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

---

In [16]:
%%time
y0 = np.array(x0, copy=True)

CPU times: user 183 ms, sys: 61.9 ms, total: 245 ms
Wall time: 244 ms


In [17]:
1.0/0.244

4.098360655737705

The memory-to-CPU transfer reaches as much as 4 GB/s (1 GiB in 0.244 s)...
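The throughput figure is simply the buffer size divided by the wall time. A small sketch makes the units explicit, assuming the 2**27-element `float64` array built from `l`; the helper names `gib_per_second` and `gb_per_second` are hypothetical and only illustrate the arithmetic.

```python
GIB = 2**30   # bytes in one gibibyte (binary)
GB = 10**9    # bytes in one gigabyte (decimal)

def gib_per_second(nbytes, seconds):
    """Throughput in GiB/s for nbytes moved in `seconds` of wall time."""
    return nbytes / GIB / seconds

def gb_per_second(nbytes, seconds):
    """Throughput in decimal GB/s for the same transfer."""
    return nbytes / GB / seconds

nbytes = 2**27 * 8    # 2**27 float64 elements of 8 bytes each = 1 GiB
wall = 0.244          # seconds, from the %%time output of the copy

print(f"{gib_per_second(nbytes, wall):.3f} GiB/s")
print(f"{gb_per_second(nbytes, wall):.3f} GB/s")
```

The notebook's `1.0/0.244` corresponds to the GiB/s value; the decimal GB/s value is about 7% larger.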

In [22]:
%%time
y1 = np.array(x0, copy=False)

CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.48 µs


In [23]:
%%time
y2 = np.asarray(x0)

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 4.77 µs


In [117]:
hex(x0.__array_interface__['data'][0])

'0x7fb56c6ef010'

In [118]:
hex(y0.__array_interface__['data'][0])

'0x7fb4b99e8010'

In [119]:
hex(y1.__array_interface__['data'][0])

'0x7fb56c6ef010'

In [120]:
hex(y2.__array_interface__['data'][0])

'0x7fb56c6ef010'
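The address comparison above can be condensed into a short check. This is a minimal sketch on a small array; the helper `data_address` is a hypothetical name, and `np.shares_memory` is used as a second way to confirm the result. `np.asarray` stands in for `copy=False`, which behaves the same for this case in NumPy 1.x.

```python
import numpy as np

def data_address(a):
    """Address of the array's first element, from __array_interface__."""
    return a.__array_interface__['data'][0]

x = np.zeros(1024, dtype=np.float64)

y_copy = np.array(x, copy=True)   # always allocates a new buffer
y_same = np.asarray(x)            # no conversion needed, so x itself is returned

print(hex(data_address(x)), hex(data_address(y_copy)), hex(data_address(y_same)))
print(np.shares_memory(x, y_copy))   # False: separate buffers
print(np.shares_memory(x, y_same))   # True: same buffer
print(y_same is x)                   # True: not even a new array object
```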

## "Move array from the host to a device"

In [24]:
import numpy as np
import nlcpy as vp

In [25]:
%%time
x0 = np.array(l, dtype=np.float64)

CPU times: user 3.54 s, sys: 66.4 ms, total: 3.61 s
Wall time: 3.59 s


In [26]:
%%time
x1 = vp.array(l, dtype=np.float64)

CPU times: user 3.64 s, sys: 80.9 ms, total: 3.72 s
Wall time: 3.71 s


In [27]:
%%time
y0 = np.array(x0)

CPU times: user 179 ms, sys: 67.9 ms, total: 246 ms
Wall time: 246 ms


In [31]:
%%time
y1 = vp.array(x0)

CPU times: user 120 ms, sys: 1.11 ms, total: 121 ms
Wall time: 121 ms


In [32]:
1.0/0.121, 126.031/8

(8.264462809917356, 15.753875)

In [34]:
(1.0/0.121) / (126.031/8)

0.5245987295136819

Transfer from host memory to the VE goes through DMA at 8.3 GB/s (about 50% of the effective peak of PCIe Gen3 x16).
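The `126.031/8` term matches the PCIe Gen3 x16 line rate: 8 GT/s per lane, 16 lanes, 128b/130b encoding. The sketch below reproduces that arithmetic and the ratio. It assumes a 1 GiB transfer and ignores protocol overhead beyond line encoding, so the peak is an upper bound. Note that `1.0/0.121` is in GiB/s while the link rate is in decimal GB/s; with both in GB/s the ratio comes out slightly higher than 0.52.

```python
GT_PER_S = 8e9          # PCIe Gen3 raw rate per lane, transfers per second
LANES = 16
ENCODING = 128 / 130    # 128b/130b line encoding

link_gbit = GT_PER_S * LANES * ENCODING / 1e9   # usable Gbit/s
link_gb = link_gbit / 8                         # usable GB/s (decimal)

nbytes = 2**30          # assumed transfer size: 1 GiB
wall = 0.121            # seconds, from the %%time output of vp.array(x0)
measured_gb = nbytes / wall / 1e9

print(f"link peak : {link_gbit:.3f} Gbit/s = {link_gb:.3f} GB/s")
print(f"measured  : {measured_gb:.3f} GB/s")
print(f"ratio     : {measured_gb / link_gb:.3f}")
```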

## "Move array from a device to the device"

In [35]:
type(x1), hex(x1.ve_adr)

(nlcpy.core.core.ndarray, '0x610054000010')

In [55]:
x1.size*x1.dtype.itemsize

1073741824

In [66]:
%%time
z0 = x1.copy()

CPU times: user 956 µs, sys: 0 ns, total: 956 µs
Wall time: 970 µs


In [67]:
%%time
z1 = vp.array(x1)

CPU times: user 1.11 ms, sys: 6 µs, total: 1.12 ms
Wall time: 1.12 ms


In [68]:
1.0/0.00112

892.8571428571429

The memory bandwidth of the VE Type 10C should be 0.75 TB/s, so the measured value of about 893 GB/s is too high.
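A rough lower bound on the copy time shows how far off the measurement is. This sketch assumes a device-to-device copy reads and writes every byte once (2 GiB of traffic for a 1 GiB array) and that 0.75 TB/s is the peak; both are assumptions, not measurements. One possible explanation, not checked in this notebook, is that the call returns before the copy has finished.

```python
PEAK_BYTES_PER_S = 0.75e12   # assumed peak memory bandwidth of the VE, 0.75 TB/s
nbytes = 2**30               # size of x1: 1 GiB

traffic = 2 * nbytes         # assumption: one read plus one write per byte
min_seconds = traffic / PEAK_BYTES_PER_S

measured_seconds = 1.12e-3   # wall time of vp.array(x1) above

print(f"lower bound at peak bandwidth: {min_seconds * 1e3:.2f} ms")
print(f"measured wall time           : {measured_seconds * 1e3:.2f} ms")
print(f"measured is shorter than the bound: {measured_seconds < min_seconds}")
```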

In [69]:
%%time
z2 = vp.asarray(x1)

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 5.48 µs


In [70]:
type(x1), type(z0), type(z1)

(nlcpy.core.core.ndarray, nlcpy.core.core.ndarray, nlcpy.core.core.ndarray)

In [71]:
hex(x1.ve_adr), hex(z0.ve_adr), hex(z1.ve_adr)

('0x610054000010', '0x6101a8000010', '0x6101ec000010')

## "Move array from a device to the host"

In [79]:
%%time
z2 = np.array(x1)

CPU times: user 375 ms, sys: 128 ms, total: 503 ms
Wall time: 502 ms


In [80]:
%%time
z3 = x1.get()

CPU times: user 187 ms, sys: 64.7 ms, total: 252 ms
Wall time: 251 ms


In [81]:
1.0/0.251, 126.031/8

(3.9840637450199203, 15.753875)

Writing back to host memory is somewhat slower (about 8 GB/s host-to-VE vs. about 4 GB/s VE-to-host).

In [94]:
type(z2), type(z3)

(numpy.ndarray, numpy.ndarray)
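Since `x1.get()` was about twice as fast as `np.array(x1)` here, code that may receive either host or device arrays could prefer `.get()` when it exists. The following is a minimal sketch of that idea; `to_host` is a hypothetical helper, and `FakeDeviceArray` is a stand-in used only so the example runs without a device library installed.

```python
import numpy as np

def to_host(a):
    """Return a numpy.ndarray for a host or device array (hypothetical helper)."""
    get = getattr(a, "get", None)
    if callable(get):
        return get()        # device array: explicit device-to-host transfer
    return np.asarray(a)    # host data: no copy if it is already an ndarray

class FakeDeviceArray:
    """Stand-in for a device array; it only mimics the .get() method."""
    def __init__(self, data):
        self._data = np.array(data, dtype=np.float64)

    def get(self):
        return self._data.copy()

h = np.arange(4.0)
d = FakeDeviceArray([1.0, 2.0])

print(to_host(h) is h)      # host array is passed through unchanged
print(type(to_host(d)))     # device stand-in is converted via .get()
```

The duck-typed check on `.get()` is deliberately loose; other objects with a `get` method, such as a `dict`, would need to be excluded in real code.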

## `numpy.array()`, `numpy.asarray()`, `cupy.asnumpy()`, `cupy.ndarray.get()`, `nlcpy.ndarray.get()`

* [Basics of CuPy — CuPy 10.6.0 documentation](https://docs.cupy.dev/en/stable/user_guide/basic.html#move-array-from-a-device-to-the-host)
* `nlcpy.ndarray.get()`
  - [nlcpy.ndarray — nlcpy 2.1.1 documentation](https://sxauroratsubasa.sakura.ne.jp/documents/nlcpy/en/reference/generated/nlcpy.ndarray.html?highlight=nlcp%20ndarray%20get#nlcpy.ndarray.get)
  - [005. NLCPy, a NumPy-compatible numerical computing library (Part 1): Column index by theme | NEC](https://jpn.nec.com/hpc/sxauroratsubasa/column/005.html)

* [python - memory address of numpy elements - Stack Overflow](https://stackoverflow.com/questions/60228604/memory-address-of-numpy-elements)