From 87e64feb6242454756ebefc6b54544473c010a10 Mon Sep 17 00:00:00 2001
From: Joshua Lochner
Date: Thu, 6 Jun 2024 17:28:02 +0200
Subject: [PATCH 1/2] Add support for decision transformer (Closes #794)

---
 README.md                                |  3 ++-
 docs/snippets/5_supported-tasks.snippet  |  2 +-
 docs/snippets/6_supported-models.snippet |  1 +
 scripts/supported_models.py              | 17 +++++++++++++++++
 src/models.js                            | 12 ++++++++++++
 5 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 6f0eeadff..2b2712288 100644
--- a/README.md
+++ b/README.md
@@ -260,7 +260,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 | Task | ID | Description | Supported? |
 |--------------------------|----|-------------|------------|
-| [Reinforcement Learning](https://huggingface.co/tasks/reinforcement-learning) | n/a | Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback. | ❌ |
+| [Reinforcement Learning](https://huggingface.co/tasks/reinforcement-learning) | n/a | Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback. | ✅ |
@@ -286,6 +286,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
 1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
+1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
 1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
 1. **[Depth Anything](https://huggingface.co/docs/transformers/main/model_doc/depth_anything)** (from University of Hong Kong and TikTok) released with the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
 1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
diff --git a/docs/snippets/5_supported-tasks.snippet b/docs/snippets/5_supported-tasks.snippet
index ee682ffca..0d1929de1 100644
--- a/docs/snippets/5_supported-tasks.snippet
+++ b/docs/snippets/5_supported-tasks.snippet
@@ -67,4 +67,4 @@
 | Task | ID | Description | Supported? |
 |--------------------------|----|-------------|------------|
-| [Reinforcement Learning](https://huggingface.co/tasks/reinforcement-learning) | n/a | Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback. | ❌ |
+| [Reinforcement Learning](https://huggingface.co/tasks/reinforcement-learning) | n/a | Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback. | ✅ |
diff --git a/docs/snippets/6_supported-models.snippet b/docs/snippets/6_supported-models.snippet
index 8fbdbffb6..f8ad89ae0 100644
--- a/docs/snippets/6_supported-models.snippet
+++ b/docs/snippets/6_supported-models.snippet
@@ -21,6 +21,7 @@
 1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
 1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
+1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
 1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
 1. **[Depth Anything](https://huggingface.co/docs/transformers/main/model_doc/depth_anything)** (from University of Hong Kong and TikTok) released with the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
 1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
diff --git a/scripts/supported_models.py b/scripts/supported_models.py
index ecfb5b4e7..c738b0ae1 100644
--- a/scripts/supported_models.py
+++ b/scripts/supported_models.py
@@ -299,6 +299,23 @@
             'sileod/deberta-v3-large-tasksource-nli',
         ],
     },
+    'decision-transformer': {
+        # Reinforcement learning
+        'reinforcement-learning': [
+            'edbeeching/decision-transformer-gym-hopper-expert',
+            'edbeeching/decision-transformer-gym-hopper-medium',
+            'edbeeching/decision-transformer-gym-hopper-medium-replay',
+            'edbeeching/decision-transformer-gym-hopper-expert-new',
+
+            'edbeeching/decision-transformer-gym-halfcheetah-expert',
+            'edbeeching/decision-transformer-gym-halfcheetah-medium',
+            'edbeeching/decision-transformer-gym-halfcheetah-medium-replay',
+
+            'edbeeching/decision-transformer-gym-walker2d-expert',
+            'edbeeching/decision-transformer-gym-walker2d-medium',
+            'edbeeching/decision-transformer-gym-walker2d-medium-replay',
+        ],
+    },
     'deit': {
         # Image classification
         'image-classification': [
diff --git a/src/models.js b/src/models.js
index a8112912e..03aa4ea35 100644
--- a/src/models.js
+++ b/src/models.js
@@ -5458,6 +5458,17 @@ export class EfficientNetForImageClassification extends EfficientNetPreTrainedMo
 }
 //////////////////////////////////////////////////
 
+//////////////////////////////////////////////////
+// Decision Transformer models
+export class DecisionTransformerPreTrainedModel extends PreTrainedModel { }
+
+/**
+ * The model builds upon the GPT2 architecture to perform autoregressive prediction of actions in an offline RL setting.
+ * Refer to the paper for more details: https://arxiv.org/abs/2106.01345
+ */
+export class DecisionTransformerModel extends DecisionTransformerPreTrainedModel { }
+
+//////////////////////////////////////////////////
 
 //////////////////////////////////////////////////
 // AutoModels, used to simplify construction of PreTrainedModels
@@ -5584,6 +5595,7 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
     ['hifigan', ['SpeechT5HifiGan', SpeechT5HifiGan]],
     ['efficientnet', ['EfficientNetModel', EfficientNetModel]],
+    ['decision_transformer', ['DecisionTransformerModel', DecisionTransformerModel]],
 ]);
 const MODEL_MAPPING_NAMES_ENCODER_DECODER = new Map([

From d63485839addb519cc2a113d8e8ed159eb890472 Mon Sep 17 00:00:00 2001
From: Joshua Lochner
Date: Thu, 6 Jun 2024 17:47:28 +0200
Subject: [PATCH 2/2] Comment out supported decision transformer models

Models are in the `onnx-community` org on HF
---
 scripts/supported_models.py | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/scripts/supported_models.py b/scripts/supported_models.py
index c738b0ae1..dc044167e 100644
--- a/scripts/supported_models.py
+++ b/scripts/supported_models.py
@@ -299,23 +299,22 @@
             'sileod/deberta-v3-large-tasksource-nli',
         ],
     },
-    'decision-transformer': {
-        # Reinforcement learning
-        'reinforcement-learning': [
-            'edbeeching/decision-transformer-gym-hopper-expert',
-            'edbeeching/decision-transformer-gym-hopper-medium',
-            'edbeeching/decision-transformer-gym-hopper-medium-replay',
-            'edbeeching/decision-transformer-gym-hopper-expert-new',
-
-            'edbeeching/decision-transformer-gym-halfcheetah-expert',
-            'edbeeching/decision-transformer-gym-halfcheetah-medium',
-            'edbeeching/decision-transformer-gym-halfcheetah-medium-replay',
-
-            'edbeeching/decision-transformer-gym-walker2d-expert',
-            'edbeeching/decision-transformer-gym-walker2d-medium',
-            'edbeeching/decision-transformer-gym-walker2d-medium-replay',
-        ],
-    },
+    # TODO: Add back in v3
+    # 'decision-transformer': {
+    #     # Reinforcement learning
+    #     'reinforcement-learning': [
+    #         'edbeeching/decision-transformer-gym-hopper-expert',
+    #         'edbeeching/decision-transformer-gym-hopper-medium',
+    #         'edbeeching/decision-transformer-gym-hopper-medium-replay',
+    #         'edbeeching/decision-transformer-gym-hopper-expert-new',
+    #         'edbeeching/decision-transformer-gym-halfcheetah-expert',
+    #         'edbeeching/decision-transformer-gym-halfcheetah-medium',
+    #         'edbeeching/decision-transformer-gym-halfcheetah-medium-replay',
+    #         'edbeeching/decision-transformer-gym-walker2d-expert',
+    #         'edbeeching/decision-transformer-gym-walker2d-medium',
+    #         'edbeeching/decision-transformer-gym-walker2d-medium-replay',
+    #     ],
+    # },
     'deit': {
         # Image classification
         'image-classification': [