
v0.10.0a1

Pre-release

These are the release notes for v0.10.0a1. See here for the complete list of resolved issues and merged PRs.

New Features

  • Oscar
    • Stop importing main module when starting Mars local cluster (#3110)
  • Tensor
    • Integrate special error functions (#3060)
    • Integrate part of scipy elliptic functions and integrals (#3111)
  • DataFrame

Enhancements

  • Disable bloom filter in merge for now (#2967)
  • [Ray] Implement ray task executor progress (#3008)
  • Dump remote tracebacks to make local ones more friendly (#3028)
  • Use tell when removing mapper data after execution (#3027)
  • Optimize import speed for Mars package (#3022)
  • Do not aggressively choose tree method in tile of groupby for distributed setting (#3032)
  • [Ray] Implements get_chunks_result for Ray execution context (#3023)
  • Refine ThreadedServiceContext.get_chunks_meta usage (#3037)
  • Shuffle both sides at the same time for md.merge (#3041)
  • Assign reducer ops in task assigner to make them more balanced across cluster (#3048)
  • [Ray] Destroy Ray executor when the task finishes (#3049)
  • [Ray] Implements get_chunks_meta for Ray execution context (#3052)
  • [Ray] Support basic subtask retry and lineage reconstruction (#2969)
  • Combine tree and shuffle methods in DataFrameGroupBy.agg tile (#3051)
  • [Ray] Implements get_total_n_cpu for Ray execution context (#3059)
  • [Ray] Implement cancel method on Ray task executor (#3044)
  • Use OS-designated ports instead of random ports to create sub pools (#3053)
  • Unify DataFrameGroupByAgg's tile logic for auto method (#3084)
  • Simplify router cleanup when pools or clusters end (#3086)
  • Call immutable web API only once when previous call blocks (#3085)
  • [Ray] Create RayTaskState actor as needed by default (#3081)
  • [Ray] Implement gc for ray task executor context (#3061)
  • Simplify argument passing in actor batch calls (#3098)
  • Optimize performance of transfer (#3091)
  • Add n_reducers and reducer_ordinal to shuffle operands (#3055)
  • Optimize serializable memory (#3120)
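
The change to OS-designated ports (#3053) relies on a general socket technique: binding to port 0 lets the operating system pick a free port, which avoids guessing a random port that may already be taken. The sketch below illustrates that technique with the standard library only; the helper name `bind_os_assigned_port` is hypothetical and is not part of Mars.

```python
import socket


def bind_os_assigned_port(host: str = "127.0.0.1") -> socket.socket:
    """Bind a listening socket to a port chosen by the operating system."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the OS for a free ephemeral port, so there is no window in
    # which a randomly guessed port can turn out to be occupied.
    sock.bind((host, 0))
    sock.listen()
    return sock


server = bind_os_assigned_port()
host, port = server.getsockname()  # the port that was actually assigned
print(f"listening on {host}:{port}")
server.close()
```

The assigned port is only known after binding, so whoever creates the socket has to report it to the processes that need to connect.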

Bug fixes

  • Fix errors when deleting mapper data (#3018)
  • Fix recursive_tile, which could tile the same tileable more than once (#3021)
  • Fix error message when sparse data format not supported (#3046)
  • Patch pandas to make pickle compatible between 1.2 and 1.3 (#3047)
  • Fix chunk index error in auto_merge_chunks (#3057)
  • [Ray] Fix ray worker failover (#3080)
  • [Metric] Fix prometheus metric backend (#3124)
  • Fix mt.{cumsum, cumprod} when the first chunk is empty (#3134)
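
The cumsum fix (#3134) concerns a classic edge case in chunked cumulative operations: each chunk needs the running total of everything before it, and an empty chunk has no last element to take that total from. The following is a minimal pure-Python sketch of the idea, assuming chunks are plain lists; `chunked_cumsum` is a hypothetical helper, not the Mars implementation.

```python
from itertools import accumulate


def chunked_cumsum(chunks):
    """Cumulative sum over a sequence of chunks, returned chunk by chunk."""
    results = []
    carry = 0  # running total of all preceding chunks
    for chunk in chunks:
        # Offset the local cumulative sum by the total carried so far.
        local = [carry + v for v in accumulate(chunk)]
        results.append(local)
        # Only update the carry when the chunk produced values; indexing
        # local[-1] on an empty chunk would raise IndexError.
        if local:
            carry = local[-1]
    return results


print(chunked_cumsum([[], [1, 2], [], [3, 4]]))
# [[], [1, 3], [], [6, 10]]
```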

Tests

  • Check initialization of serializables on CI (#3007)
  • Use @pytest_asyncio.fixture instead of @pytest.fixture for async fixtures (#3025)
  • Change code owners to Mars PMC maintainers (#3031)
  • [Ray] Fix ray executor progress test (#3033)
  • [Ray] Optimize Ray CI execution time and stability (#3102)
  • Make test_session_set_progress more stable under Ray tests (#3103)
  • Update pytest imports for test_special.py (#3129)
  • [Ray] Fix flaky test test_optional_supervisor_node (#3133)

Others

  • Build web code before CIBW when deploying to PyPI (#3014)
  • Make PyPI user name configurable (#3130)

These are the release notes for v0.9.0. See here for the complete list of resolved issues and merged PRs.

These notes cover only the differences from v0.9.0rc3; for all highlights and changes, please refer to the release notes of the pre-releases:

alpha1
alpha2
beta1
beta2
rc1
rc2
rc3

Changes that break compatibility

From v0.9 on, Python 3.6 is no longer supported.

Highlights

  • Performance has been extensively optimized in this version; feedback is welcome.

New Features

  • Oscar
    • Stop importing main module when starting Mars local cluster (#3113)
  • Tensor
    • Integrate special error functions (#3062)
    • Integrate part of scipy elliptic functions and integrals (#3112)
  • DataFrame

Enhancements

  • Dump remote tracebacks to make local ones more friendly (#3030)
  • Optimize import speed for Mars package (#3035)
  • [Ray] Implement ray task executor progress (#3065)
  • Shuffle both sides at the same time for md.merge (#3066)
  • Refine ThreadedServiceContext.get_chunks_meta usage (#3067)
  • Do not aggressively choose tree method in tile of groupby for distributed setting (#3070)
  • Disable bloom filter in merge for now (#3071)
  • [Ray] Implements get_chunks_result for Ray execution context (#3072)
  • Use tell when removing mapper data after execution (#3073)
  • Assign reducer ops in task assigner to make them more balanced across cluster (#3075)
  • [Ray] Destroy Ray executor when the task finishes (#3074)
  • Combine tree and shuffle methods in DataFrameGroupBy.agg tile (#3077)
  • [Ray] Implements get_chunks_meta for Ray execution context (#3076)
  • Use OS-designated ports instead of random ports to create sub pools (#3087)
  • Call immutable web API only once when previous call blocks (#3088)
  • Unify DataFrameGroupByAgg's tile logic for auto method (#3094)
  • [Ray] Support basic subtask retry and lineage reconstruction (#3097)
  • Simplify argument passing in actor batch calls (#3100)
  • [Ray] Implements get_total_n_cpu for Ray execution context (#3104)
  • Optimize performance of transfer (#3105)
  • Add n_reducers and reducer_ordinal to shuffle operands (#3107)
  • [Ray] Implement cancel method on Ray task executor (#3093)
  • [Ray] Create RayTaskState actor as needed by default (#3114)
  • [Ray] Implement gc for ray task executor context (#3116)
  • Optimize serializable memory (#3126)

Bug fixes

  • Patch pandas to make pickle compatible between 1.2 and 1.3 (#3050)
  • Fix errors when deleting mapper data (#3064)
  • Fix chunk index error in auto_merge_chunks (#3068)
  • Fix recursive_tile, which could tile the same tileable more than once (#3069)
  • [Ray] Fix ray worker failover (#3115)
  • [Ray] Fix pandas schema parsing when reading Ray dataset (#3117)
  • [Ray] Fix auto scale-in hang (#3125)
  • [Metric] Fix prometheus metric backend (#3127)
  • Fix mt.{cumsum, cumprod} when the first chunk is empty (#3136)

Tests

  • Check initialization of serializables on CI (#3013)
  • [Ray] Optimize Ray CI execution time and stability (#3121)
  • Update pytest imports for test_special.py (#3131)
  • [Ray] Fix flaky test test_optional_supervisor_node (#3135)

Others

  • Build web code before CIBW when deploying to PyPI (#3016)

These are the release notes for v0.8.7.

Bug fixes

  • Fixes missing web packages in Linux wheels (#3014)

v0.9.0rc3

Pre-release

These are the release notes for v0.9.0rc3. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor
  • Services
    • Support worker meta service (#2909)
    • Basic Ray execution backend (#2921)

Enhancements

  • Add execution API to enable customization of Mars Task Service (#2894)
  • Optimize serialization performance (#2914)
  • Skip adding band in meta when fetching shuffle data (#2922)
  • Store complete meta on worker and update supervisor meta via fetching from workers (#2912)
  • Use cython to accelerate core serialization (#2924)
  • Refine lifecycle api to support incref or decref with ref counts (#2926)
  • Ignore fetch operands when assigning initial nodes (#2929)
  • Use cython to accelerate message serialization (#2932)
  • Ignore broadcaster's locality when assigning subtasks (#2943)
  • Allow spawning serialization to threads for large objects (#2944)
  • Add metrics and event report for Ray channels (#2936)
  • Add more logs about execution info (#2940)
  • Add support for dask.persist (#2953, thanks @loopyme!)
  • Remove should_be_monotonic property (#2949)
  • Add metrics on operand and subtask executions (#2947, thanks @zhongchun!)
  • [Ray] Optimize Ray fetcher by querying on the remote node (#2957)
  • Improve deploy backend (#2958)
  • Support reporting tile progress (#2954)
  • Add logic key for tileable graph (#2961, thanks @zhongchun!)
  • [Ray] Loads the subtask inputs from meta (#2976)
  • New ExecutionConfig API (#2968)
  • Fix speculative execution compatibility with coloring (#2995)
  • Make functions that may take long run in thread for lifecycle tracker (#2992)
  • Optimize metric configs (#2996, thanks @zhongchun!)
  • Expand the ability of resource evaluator (#2997, thanks @zhongchun!)
  • Optimize gen subtask graph (#3004)
  • [Ray] Ray execution state (#3002)
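
The lifecycle change (#2926) mentions incref and decref with explicit ref counts. As a rough illustration of what batched reference counting with per-key counts can look like, here is a minimal sketch; the `RefTracker` class, its method signatures, and the return value of `decref` are all assumptions made for this example and do not describe the actual Mars lifecycle API.

```python
from collections import Counter


class RefTracker:
    """Hypothetical reference tracker keyed by chunk key."""

    def __init__(self):
        self._counts = Counter()

    def incref(self, keys, counts=None):
        """Increase each key's count by the matching entry (default 1)."""
        counts = counts or [1] * len(keys)
        for key, n in zip(keys, counts):
            self._counts[key] += n

    def decref(self, keys, counts=None):
        """Decrease counts and return the keys whose count reached zero."""
        counts = counts or [1] * len(keys)
        released = []
        for key, n in zip(keys, counts):
            self._counts[key] -= n
            if self._counts[key] <= 0:
                # Nothing references this key any more; it can be freed.
                del self._counts[key]
                released.append(key)
        return released


tracker = RefTracker()
tracker.incref(["a", "b"], counts=[2, 1])
print(tracker.decref(["a", "b"]))  # ['b']
print(tracker.decref(["a"]))       # ['a']
```

Passing counts in one call replaces several single-step calls, which matters when each call is a remote message.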

Bug fixes

  • Fix parameter issue of worker actor pool (#2911, thanks @zhongchun!)
  • Fix default config to ensure storage backends configured (#2935)
  • Wrap errors in operand execution to protect scheduling service (#2964)
  • Fix dtype of series result for DataFrame.apply (#2978)
  • Fix potential data leak for shuffle tasks (#2975)
  • Fix potential empty chunks when creating DataFrame from pandas (#2987)
  • [Ray] Support new ray cluster through ray client (#2981)
  • Fix missing extra_params when constructing operands (#2999)
  • Fix msg_to_simple_str in Ray backend and add tests (#3003)
  • Fix incorrect result for df.sort_values when specifying multiple ascending (#2984)
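
The df.sort_values fix (#2984) is about sorting by several columns where each column has its own direction. The expected semantics can be sketched in plain Python with repeated stable sorts; `sort_rows` is a hypothetical helper operating on lists of dicts, not Mars or pandas code.

```python
def sort_rows(rows, by, ascending):
    """Sort dict rows by several keys, each with its own direction."""
    if len(by) != len(ascending):
        raise ValueError("by and ascending must have the same length")
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key order.
    for key, asc in reversed(list(zip(by, ascending))):
        result.sort(key=lambda row: row[key], reverse=not asc)
    return result


rows = [{"a": 1, "b": 2}, {"a": 2, "b": 1}, {"a": 1, "b": 3}]
print(sort_rows(rows, by=["a", "b"], ascending=[True, False]))
# [{'a': 1, 'b': 3}, {'a': 1, 'b': 2}, {'a': 2, 'b': 1}]
```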

Documentation

Tests

  • Add TPC-H benchmarks (#2937)
  • Fix Ray cases (#2983)
  • Fix version mismatch between kubernetes and minikube (#2986)
  • Allow selecting TPC queries (#3005)

These are the release notes for v0.8.6. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor

Enhancements

  • Add support for dask.persist (#2990, thanks @loopyme!)
  • Optimize gen subtask graph (#3006)
  • Ignore broadcaster's locality when assigning subtasks (#2994)

Bug fixes

  • Fix task hang when error object cannot be pickled (#2913)
  • Fix potential KeyError in actor_ref calls when running with multiple processes (#2962)
  • Wrap errors in operand execution to protect scheduling service (#2971)
  • Fix dtype of series result for DataFrame.apply (#2979)
  • Fix default config to ensure storage backends configured (#2989)
  • Fix potential empty chunks when creating DataFrame from pandas (#2991)
  • Fix incorrect result for df.sort_values when specifying multiple ascending (#3006)
  • Fix missing extra_params when constructing operands (#3006)
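
The task hang fixed in #2913 arises when an error raised on one process cannot be serialized and sent back to the caller. One common way to guard against this, shown here as a sketch under assumptions rather than as the fix Mars applied, is to test whether the exception pickles and to substitute a plain stand-in when it does not. `make_picklable` and `LockedError` are hypothetical names.

```python
import pickle
import threading
import traceback


def make_picklable(exc: BaseException) -> BaseException:
    """Return exc if it pickles, otherwise a picklable stand-in."""
    try:
        pickle.dumps(exc)
        return exc
    except Exception:
        # Keep the type name, message, and traceback text so the receiver
        # can still see what went wrong.
        details = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        return RuntimeError(f"{type(exc).__name__}: {exc}\n{details}")


class LockedError(Exception):
    """Hypothetical error holding an unpicklable attribute."""

    def __init__(self, message):
        super().__init__(message)
        self.lock = threading.Lock()  # locks cannot be pickled


safe = make_picklable(LockedError("worker failed"))
restored = pickle.loads(pickle.dumps(safe))
print(type(restored).__name__)
```

The stand-in loses the original exception type, so callers that match on specific exception classes would need the type name carried in the message.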

Tests

  • Fix version mismatch between kubernetes and minikube (#2988)

v0.9.0rc2

Pre-release

These are the release notes for v0.9.0rc2. See here for the complete list of resolved issues and merged PRs.

New Features

  • Web
    • Add stack display page on Mars Web (#2876)

Enhancements

  • Avoid printing too many messages in Oscar (#2871)
  • Expand slot scheduler to resource scheduler (#2846, thanks @zhongchun!)
  • Optimized iterative tiling by pruning unrelated chunks (#2874)
  • Optimize DataFrameIsin's tile (#2864)
  • Add benchmark for serialization (#2901)
  • [Ray] Ray client channel get recv when first complied (#2740, thanks @Catch-Bull!)
  • Use bloom filter to optimize df.merge execution (#2895)
  • Stop recording all mapper meta (#2900)
  • [Ray] Use main pool as owner when autoscale disabled (#2878)
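
Using a Bloom filter to speed up df.merge (#2895) follows a general pattern: build a compact, approximate set of the join keys on one side, then discard rows on the other side whose keys certainly have no match before any data is shuffled. The sketch below shows that pattern on lists of dicts. The `BloomFilter` class, its sizes, and `prefilter` are illustrative assumptions and not the Mars implementation.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive n_hashes bit positions by salting the key with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all(self.bits >> pos & 1 for pos in self._positions(key))


def prefilter(left_rows, right_rows, on):
    """Drop left rows whose key certainly has no match on the right."""
    bloom = BloomFilter()
    for row in right_rows:
        bloom.add(row[on])
    return [row for row in left_rows if row[on] in bloom]


left = [{"k": i} for i in range(100)]
right = [{"k": 3}, {"k": 7}]
kept = prefilter(left, right, on="k")
print(len(kept), "of", len(left), "rows survive the prefilter")
```

Because false positives are possible, the prefilter only reduces the data to merge; the real join still has to run on the surviving rows.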

Bug fixes

  • Fix XGBoost when some workers do not have evals data (#2861)
  • Fix duplicate node iteration in GraphAssigner (#2857)
  • Raise ActorNotExist when no supervisors available (#2859)
  • Fix dtype infer in DataFrame arithmetic on datetime consts (#2879)
  • Fix timeout for wait_task (#2883)
  • Make sure error can be raised in Actor.__pre_destroy__ (#2887)

Tests

  • Upgrade azure-pipelines to Python 3.9 (#2862)
  • Adapt to official cancel of GitHub Actions (#2902)

These are the release notes for v0.8.5. See here for the complete list of resolved issues and merged PRs.

New Features

  • Web
    • Add stack display page on Mars Web (#2881)

Enhancements

  • Avoid printing too many messages in Oscar (#2880)
  • [Ray] Use main pool as owner when autoscale disabled (#2903)

Bug fixes

  • Fix XGBoost when some workers do not have evals data (#2863)
  • Raise ActorNotExist when no supervisors available (#2869)
  • Fix dtype infer in DataFrame arithmetic on datetime consts (#2880)
  • Fix duplicate node iteration in GraphAssigner (#2880)
  • Fix timeout for wait_task (#2890)
  • Make sure errors can be raised in Actor.__pre_destroy__ (#2892)

Tests

  • Upgrade azure-pipelines to Python 3.9 (#2886)
  • Adapt to official cancel of GitHub Actions (#2903)

v0.9.0rc1

Pre-release

These are the release notes for v0.9.0rc1. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor
    • Implements mars.tensor.setdiff1d (#2823)
  • Learn
    • Added support for mars.learn.metrics.roc_auc_score (#2832)
  • Services
    • A speculative execution based task scheduler (#2576)
  • Metric
  • Others
    • Use versioneer to manage release versions (#2806)
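
For readers unfamiliar with the metric behind mars.learn.metrics.roc_auc_score (#2832): for binary labels, ROC AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as half. The pure-Python sketch below computes it by comparing all pairs, which is quadratic and meant only to show the definition. `roc_auc` is a hypothetical helper, not the Mars function.

```python
def roc_auc(y_true, y_score):
    """Binary ROC AUC as the chance that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count pairs where the positive scores higher; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```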

Enhancements

  • Support generating a DOT file for subtask graph (#2803)
  • Support generating dtypes, index_value etc lazily for DataFrame chunks (#2756)
  • [Ray] Enable fault tolerance for Ray by default (#2801)
  • Improve subtask details in logs (#2836)
  • Accurate resource management for global slot manager (#2732)
  • Configure nthread of XGBoost jobs (#2844)
  • Improved performance of mars.learn.metrics.{roc_curve, roc_auc_score} (#2838)
  • Bump minimist and nanoid in Mars UI due to security alerts (#2849)
  • Fix store duplicate chunk and meta per subtask (#2845)
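
DOT output for the subtask graph (#2803) refers to the Graphviz text format, in which a directed graph is a list of `"source" -> "target";` edges inside a `digraph` block. As a small illustration of that format only, the sketch below renders an edge list as DOT text. The `to_dot` helper and the node names are made up for this example and say nothing about how Mars names or exports its subtasks.

```python
def to_dot(edges, name="subtask_graph"):
    """Render a list of (source, target) edges as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Quote node names so they may contain spaces or punctuation.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([("read", "map"), ("map", "reduce")])
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command-line tool.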

Bug fixes

  • Fix default value of gpu property for some operands (#2811)
  • Fix the Vineyard CI failure by ensuring the input tensor chunk is a NumPy ndarray (#2817)
  • Fix race condition of set_subtask_result (#2784)
  • Fix duplicate subtask submit (#2815)
  • Change StorageHandlerActor to stateful (#2824)
  • Fix running xgboost on Ray cluster (#2826)
  • Fix FileSystem.ls for OSS (#2837)
  • Stop fetching data when pure dependencies specified (#2840)
  • Fix dirty version number caused by versioneer when building with cibuildwheel (#2855)

Tests

  • [Ray] Refine ray tests (#2793)
  • Build docker images cronically (#2804)
  • Introduce asv benchmark (#2798)

These are the release notes for v0.8.4. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor
    • Implements mars.tensor.setdiff1d (#2829)
  • Learn
    • Added support for mars.learn.metrics.roc_auc_score (#2841)
  • Others
    • Use versioneer to manage release versions (#2807)
    • Use cibuildwheel to release wheels (#2854)

Enhancements

  • Support generating a DOT file for subtask graph (#2818)
  • Enhance subtask details in logs (#2842)
  • Configure cores of XGBoost jobs (#2847)
  • Improved performance of mars.learn.metrics.{roc_curve, roc_auc_score} (#2850)
  • Fix store duplicate chunk and meta per subtask (#2851)
  • Bump minimist and nanoid in Mars UI due to security alerts (#2851)

Bug fixes

  • Fix race condition of set_subtask_result (#2819)
  • Fix duplicate subtask submit (#2819)
  • Fix the Vineyard CI failure by ensuring the input tensor chunk is a NumPy ndarray (#2819)
  • Fix default value of gpu property for some operands (#2820)
  • Fix running xgboost on Ray cluster (#2830)
  • Change StorageHandlerActor to stateful (#2830)
  • Fix FileSystem.ls for OSS (#2842)
  • Stop fetching data when pure dependencies specified (#2843)

Tests

  • [Ray] Refine ray tests (#2810)
  • Build docker images cronically (#2807)

v0.9.0b2

Pre-release

These are the release notes for v0.9.0b2. See here for the complete list of resolved issues and merged PRs.

New Features

Enhancements

  • Simplify rechunk implementation (#2745)
  • Stop inferring outputs when args provided (#2759)
  • Add broadcast merge support for DataFrame (#2772)
  • Remove deprecation warnings when importing mars.tensor (#2788)
  • Optimize in-process actor calls (#2763)
  • [Ray] New Ray actor creation model (#2783)
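
Broadcast merge (#2772) generally means that when one side of a join is small, it is copied to every worker so that each chunk of the large side can join locally without a shuffle. The sketch below shows that idea for an inner join on lists of dicts. `broadcast_merge` is a hypothetical helper and does not reflect how Mars decides when to broadcast or how it moves data.

```python
def broadcast_merge(large_chunks, small_rows, on):
    """Inner-join each chunk of a large table against a small table."""
    # Build the lookup once; in a distributed setting this small index is
    # what would be copied (broadcast) to every worker.
    index = {}
    for row in small_rows:
        index.setdefault(row[on], []).append(row)

    merged_chunks = []
    for chunk in large_chunks:
        # Each chunk joins locally, so the large table is never shuffled.
        merged = [
            {**left, **right}
            for left in chunk
            for right in index.get(left[on], [])
        ]
        merged_chunks.append(merged)
    return merged_chunks


large = [[{"k": 1, "x": "a"}, {"k": 2, "x": "b"}], [{"k": 1, "x": "c"}]]
small = [{"k": 1, "y": 10}]
print(broadcast_merge(large, small, on="k"))
# [[{'k': 1, 'x': 'a', 'y': 10}], [{'k': 1, 'x': 'c', 'y': 10}]]
```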

Bug fixes

  • Fix duplicate dec object ref (#2741, thanks @Catch-Bull!)
  • Fix long exception of asyncio.gather (#2748)
  • Fix NameError: name 'pq' is not defined if pyarrow is not installed (#2751)
  • Fix empty band_subtasks and most_calls in profiling when the slow duration is large (#2755)
  • Fix the wrong result of df.merge (#2774)
  • Fix DataFrame initializer when Mars object exists in list (#2770)
  • [Ray] Support Ray client mode (#2773)

Tests

  • Increase test stability for command-line tests (#2779)