Score and Predict Large Datasets
================================

Sometimes you'll train on a smaller dataset that fits in memory, but need to predict or score for a much larger (possibly larger than memory) dataset. Perhaps your [learning curve](http://scikit-learn.org/stable/modules/learning_curve.html) has leveled off, or you only have labels for a subset of the data.

In this situation, you can use [ParallelPostFit](http://ml.dask.org/modules/generated/dask_ml.wrappers.ParallelPostFit.html) to parallelize and distribute the scoring or prediction steps.

In [28]:
from dask.distributed import Client, progress


# Assuming you've already set up your cluster and client
client = Client('localhost:8786')

client


+-------------+--------+-----------+---------+
| Package     | Client | Scheduler | Workers |
+-------------+--------+-----------+---------+
| cloudpickle | 3.1.0  | 3.0.0     | 3.0.0   |
| msgpack     | 1.1.0  | 1.0.7     | 1.0.7   |
| numpy       | 1.26.4 | 1.26.3    | 1.26.3  |
| pandas      | 2.2.3  | 2.1.4     | 2.1.4   |
| toolz       | 1.0.0  | 0.12.0    | 0.12.0  |
| tornado     | 6.4.1  | 6.3.3     | 6.3.3   |
+-------------+--------+-----------+---------+


Client
Connection method: Direct
Dashboard: http://localhost:8787/status

Scheduler
Comm: tcp://10.249.15.39:8786
Dashboard: http://10.249.15.39:8787/status
Workers: 64
Total threads: 16384
Total memory: 29.79 TiB
Started: 5 minutes ago

Workers
64 workers (32 on 10.249.15.39 and 32 on 10.249.16.91), each with 256 threads and 476.56 GiB of memory.


In [29]:
import numpy as np
import dask.array as da
from sklearn.datasets import make_classification

In [30]:
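from dask.distributed import PipInstall
# Install dask-ml on every worker; the plugin has no effect until it is registered.
plugin = PipInstall(packages=["dask-ml"], pip_options=["--upgrade"])
client.register_plugin(plugin)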

We'll generate a small random dataset with scikit-learn.

In [31]:
X_train, y_train = make_classification(
    n_features=2, n_redundant=0, n_informative=2,
    random_state=1, n_clusters_per_class=1, n_samples=1000)
X_train[:5]

array([[ 1.53682958, -1.39869399],
       [ 1.36917601, -0.63734411],
       [ 0.50231787, -0.45910529],
       [ 1.83319262, -1.29808229],
       [ 1.04235568,  1.12152929]])

Next we'll clone that dataset many times with `dask.array`. `X_large` and `y_large` stand in for our larger-than-memory dataset.

In [32]:
# Scale up: increase N, the number of times we replicate the data.
N = 100
X_large = da.concatenate([da.from_array(X_train, chunks=X_train.shape)
                          for _ in range(N)])
y_large = da.concatenate([da.from_array(y_train, chunks=y_train.shape)
                          for _ in range(N)])
X_large

+------------+----------------+-------------+
|            | Array          | Chunk       |
+------------+----------------+-------------+
| Bytes      | 1.53 MiB       | 15.62 kiB   |
| Shape      | (100000, 2)    | (1000, 2)   |
+------------+----------------+-------------+
| Dask graph | 100 chunks in 2 graph layers |
| Data type  | float64 numpy.ndarray        |
+------------+------------------------------+
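The sizes in this summary follow directly from the shape and data type. As a quick sanity check in plain Python (the variable names are illustrative and not part of the notebook), assuming 8 bytes per `float64` value:

```python
# Reproduce the "Bytes" row of the array summary from shape and dtype alone.
n_rows, n_cols = 100_000, 2   # shape of X_large
chunk_rows = 1_000            # rows per chunk (one copy of X_train)
itemsize = 8                  # bytes per float64 value

array_mib = n_rows * n_cols * itemsize / 1024**2
chunk_kib = chunk_rows * n_cols * itemsize / 1024
n_chunks = n_rows // chunk_rows

print(f"array: {array_mib:.2f} MiB, chunk: {chunk_kib:.2f} kiB, chunks: {n_chunks}")
```

With `N = 100` the whole array still fits comfortably in memory. Raising `N` grows the array while each chunk stays the same size.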


Since our training dataset fits in memory, we can use a scikit-learn estimator as the actual estimator fit during training.
But we know that we'll want to predict for a large dataset, so we'll wrap the scikit-learn estimator with `ParallelPostFit`.

In [33]:
from sklearn.linear_model import LogisticRegressionCV
from dask_ml.wrappers import ParallelPostFit

In [34]:
clf = ParallelPostFit(LogisticRegressionCV(cv=3), scoring="r2")

See the note in the `dask-ml` documentation about when and why a `scoring` parameter is needed: https://ml.dask.org/modules/generated/dask_ml.wrappers.ParallelPostFit.html#dask_ml.wrappers.ParallelPostFit.

Now we'll call `clf.fit`. `ParallelPostFit` does not parallelize or distribute training; it simply fits the underlying scikit-learn estimator, so this step can only use datasets that fit in memory.

In [35]:
clf.fit(X_train, y_train)

Now that training is done, we'll turn to predicting for the full (larger than memory) dataset.

In [36]:
y_pred = clf.predict(X_large)
y_pred

+------------+----------------+-------------+
|            | Array          | Chunk       |
+------------+----------------+-------------+
| Bytes      | 781.25 kiB     | 7.81 kiB    |
| Shape      | (100000,)      | (1000,)     |
+------------+----------------+-------------+
| Dask graph | 100 chunks in 3 graph layers |
| Data type  | int64 numpy.ndarray          |
+------------+------------------------------+
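Chunked prediction works because a fitted classifier labels each row independently of every other row. The sketch below illustrates that idea with plain NumPy. It is not how `ParallelPostFit` is implemented, and the hand-written `predict_rows` function with its fixed `weights` and `bias` is a hypothetical stand-in for a fitted estimator's `predict` method.

```python
import numpy as np


def predict_rows(X, weights, bias):
    """Toy linear classifier: label 1 where the linear score is positive."""
    scores = (X * weights).sum(axis=1) + bias
    return (scores > 0).astype(np.int64)


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 2))
weights = np.array([1.0, -0.5])  # assumed coefficients, for illustration only
bias = 0.1

# Predict on the whole array at once.
y_all = predict_rows(X_demo, weights, bias)

# Predict chunk by chunk, then stitch the pieces back together in order.
chunks = np.array_split(X_demo, 4)
y_chunked = np.concatenate([predict_rows(c, weights, bias) for c in chunks])

print(np.array_equal(y_all, y_chunked))
```

Because each chunk can be handled separately, the chunks can in principle be sent to different workers, with only the small per-chunk results coming back.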


`y_pred` is a Dask array. Workers can write the predicted values to a shared file system, without ever having to collect the data on a single machine.
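As one possible way to persist a chunked result, `dask.array.to_npy_stack` writes each block to its own `.npy` file. The sketch below uses a small stand-in array (`y_demo`, not the notebook's `y_pred`) and a temporary local directory. On a real cluster the target would need to be a path that every worker can reach, which is an assumption about your setup.

```python
import tempfile

import numpy as np
import dask.array as da

# Small stand-in for y_pred: 10 values split into 2 chunks of 5.
y_demo = da.from_array(np.arange(10), chunks=5)

with tempfile.TemporaryDirectory() as out_dir:
    # Each chunk is written as a separate .npy file under out_dir.
    da.to_npy_stack(out_dir, y_demo, axis=0)

    # Read the stack back lazily, then materialize it to check the round trip.
    restored = da.from_npy_stack(out_dir, mmap_mode=None).compute()

print(restored)
```

Other writers such as `to_zarr` or `to_hdf5` follow the same pattern but need extra packages installed.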

Or we can check the model's score on the entire large dataset. The computation will be done in parallel, and no single machine will have to hold all the data.

In [37]:
#clf.score(X_large, y_large)
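For a classifier, one metric that parallelizes naturally is accuracy: compare predictions with labels element-wise, then take the mean. The sketch below uses small stand-in arrays (`y_true_demo` and `y_pred_demo` are illustrative names, not objects from this notebook) and plain `dask.array` operations rather than `clf.score`.

```python
import numpy as np
import dask.array as da

labels = np.array([0, 1, 1, 0, 1, 0, 0, 1])
preds = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# Two chunks of four values each, standing in for a much larger array.
y_true_demo = da.from_array(labels, chunks=4)
y_pred_demo = da.from_array(preds, chunks=4)

# Building the expression is lazy; nothing runs until .compute().
accuracy_lazy = (y_true_demo == y_pred_demo).mean()
accuracy = accuracy_lazy.compute()

print(accuracy)  # 6 of the 8 predictions match, so this prints 0.75
```

Only per-chunk partial results are combined at the end, so the full arrays never need to sit on one machine.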