Using Dask

description: learn how to read data from cloud storage and scale PyData tools (NumPy, pandas, scikit-learn, etc.) with Dask

"Dask natively scales Python" and "provides advanced parallelism for analytics, enabling performance at scale for the tools you love." It is open source and freely available. It is part of the PyData ecosystem and is developed in coordination with other projects such as NumPy, pandas, and scikit-learn. It provides familiar APIs for Python users, allows low-level customization and streaming through a futures API, and scales up on clusters.
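As a rough illustration of what "familiar APIs" means in practice, the sketch below builds a Dask array from a NumPy array and runs the same expression on both. The array size and chunk size are arbitrary values chosen for this example, and it runs on Dask's default local scheduler rather than a cluster.

```python
import numpy as np
import dask.array as da

# A NumPy array, and a Dask array wrapping it in 4 chunks of 250,000 elements.
# (Sizes are illustrative assumptions, not values from this tutorial.)
x_np = np.arange(1_000_000, dtype="float64")
x_da = da.from_array(x_np, chunks=250_000)

# The expression is written just as it would be in NumPy.
# With Dask, this only builds a lazy task graph; nothing is computed yet.
lazy_mean = (x_da ** 2).mean()

# compute() executes the graph and returns a concrete value.
result = lazy_mean.compute()

# The equivalent eager NumPy computation, for comparison.
expected = (x_np ** 2).mean()
```

The same code can later be pointed at a cluster by creating a `dask.distributed.Client` before calling `compute()`; the array expression itself does not need to change.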

Dask is often compared to Spark; see this page to help evaluate which tool is the better fit for you. Common ML tools like Optuna, scikit-learn, XGBoost, LightGBM, and more can be distributed via Dask. Numerous packages are available for scaling Dask on cloud clusters.

In this tutorial, the following notebooks demonstrate using Dask with Azure: