page_type | languages | products | description
---|---|---|---
sample | | | Learn how to read from cloud data and scale PyData tools (NumPy, pandas, scikit-learn, etc.) with [Dask](https://dask.org) and Azure ML.
"Dask natively scales Python" and "provides advanced parallelism for analytics, enabling performance at scale for the tools you love." It is open source, freely available, and sits in the PyData ecosystem of tools, develop in coordination with other projects like Numpy, Pandas, and Scikit-Learn. It provides familiar APIs for Python users, allows for low-level customization and streaming with a futures API, and scales up on clusters.
Dask is often compared to Spark; the "Comparison to Spark" page in the Dask documentation can help you evaluate which tool is the better fit. Common ML tools like Optuna, scikit-learn, XGBoost, LightGBM, and more can be distributed via Dask. Numerous packages are available for scaling Dask on cloud clusters.
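To make the "familiar APIs" point concrete, here is a minimal sketch, assuming `dask` and `numpy` are installed and using Dask's default local scheduler rather than an Azure ML cluster. The array shape, chunk sizes, and the `square` helper are illustrative choices and do not come from the notebooks in this tutorial.

```python
import dask
import dask.array as da

# NumPy-like API: a 1000x1000 array of ones, split into 250x250 chunks.
# Nothing is computed yet; Dask only records a task graph.
x = da.ones((1000, 1000), chunks=(250, 250))
lazy_total = x.sum()

# .compute() executes the graph, processing the chunks in parallel.
total_value = lazy_total.compute()


# Lower-level customization: wrap ordinary Python functions with
# dask.delayed to build a custom task graph. `square` is a hypothetical
# helper used only for illustration.
@dask.delayed
def square(n):
    return n * n


lazy_squares = [square(i) for i in range(5)]
lazy_sum = dask.delayed(sum)(lazy_squares)

# Each square() call is an independent task; sum() runs once they finish.
squares_sum = lazy_sum.compute()

print(total_value, squares_sum)
```

The same code should run unchanged against a cluster once a `dask.distributed` client is connected, which is the scaling path the notebooks explore.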
In this tutorial, the following notebooks demonstrate using Dask with Azure:
The core `dask` and `distributed` packages themselves are small and focused. Thousands of tools, some built by the Dask organization and most not, use Dask for parallel or distributed processing. Some of the most useful for data science include: