| page_type | languages | products | description |
| --- | --- | --- | --- |
| sample | python, azurecli | azure-machine-learning | Learn how to read from cloud data and scale PyData tools (Numpy, Pandas, Scikit-Learn, etc.) with [Dask](https://dask.org) and Azure ML. |

Using Dask

"Dask natively scales Python" and "provides advanced parallelism for analytics, enabling performance at scale for the tools you love." It is open source, freely available, and part of the PyData ecosystem of tools, developed in coordination with other projects like Numpy, Pandas, and Scikit-Learn. It provides familiar APIs for Python users, allows for low-level customization and streaming with a futures API, and scales up on clusters.
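As a rough illustration of the low-level customization mentioned above, here is a minimal sketch using `dask.delayed` on a single machine. The `square` function and the input range are made up for this example, and the local threaded scheduler stands in for a real cluster.

```python
from dask import delayed


@delayed
def square(x):
    # Hypothetical unit of work; any ordinary Python function can be wrapped.
    return x * x


# Calling a delayed function only records a task; nothing runs yet.
squares = [square(i) for i in range(5)]

# Wrap the built-in sum so the reduction becomes part of the same task graph.
total = delayed(sum)(squares)

# compute() executes the graph. "threads" selects Dask's local threaded
# scheduler, so no cluster is needed for this sketch.
result = total.compute(scheduler="threads")
print(result)
```

The same task graph could be handed to a distributed scheduler instead of the local one, which is how this style of code is meant to scale from a laptop to a cluster.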

Dask is often compared to Spark - see this page to help evaluate which is the better tool for you. Common ML tools like Optuna, Scikit-Learn, XGBoost, LightGBM, and more can be distributed via Dask. There are numerous packages available for scaling on cloud clusters.

In this tutorial, the following notebooks demonstrate using Dask with Azure:

The core `dask` and `distributed` packages themselves are small and focused. Thousands of tools, some built by the Dask organization and most not, use Dask for parallel or distributed processing. Some of the most useful for data science include: