17 Mar 01:56

060f7ba

v1.4.1 release

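Chinese Version

Recommended New Models

No.	Model Name & Quick Link	Contributing Organization
1	ChatGLM Chinese-English dialogue large model, 6B	ZhiPu.AI
2	GLM130B Chinese-English large model	ZhiPu.AI
3	unidiffuser-v1	Tsinghua TSAIL
4	YuanYu functional dialogue large model v2	YuanYu
5	PanGu-α 2.6B	PengCheng Lab
6	openjourney	Individual developer: dienstag
7	Rwkv-4-pile-14b	Individual developer: Blink_DL
8	SiameseUIE universal information extraction, Chinese, base
9	SiameseUniNLU zero-shot universal natural language understanding, Chinese, base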

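Highlights

External repos can work with the modelscope library as plugins
Add onnx export for the SCRFD model
Add onnx export for the damoyolo model
Support onnx/torchscript export for sequence labeling models
Support pb file export for the cartoon model
Refactor the taskdataset module; users can now customize their own dataset logic
Add examples for the text-generation task, also applicable to GPT3
Siamese uie model supports finetune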

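Feature List

Support torch2.0 compile in inference and training; testing is still limited, so some models may hit errors
Add trainer support for the adadet library
ddcolor image colorization supports training
Add video_instance_segmentation inference
Add plugin capability to the CLI tool
Add human reconstruction task
Add vidt model
Add speech_timestamp task
Add disco guided diffusion model
ocr_reco_crnn supports training
Add training for action detection
Add training for ocr_detection_db
Add lore lineness table recognition task
Add PEER model
Add damoyolo smoke detection model
Add RLEG model
Add ProContEXT model for video instance tracking
Add video perception model longshortnet
Add dingding denoise model
Support vision efficient tuning
Support text-to-video-synthesis task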

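Improvements

text-generation inference supports args input
Support soonet in video temporal grounding
trainer supports DDPHook
Kws supports continued training
Support correct DDIM sampling on GPU
Add more CLI tools
Modify the input and output of audio inference
Optimize the kws configuration
ImagePaintbyexamplePipeline supports demoservice
Support load_from for the easycv trainer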

BugFix

Fix error when installing detectron2
Fix error in the generate_scp_from_url method
Fix speaker_verification_pipeline and speaker_diarization_pipeline
Fix failing data-related cases
Fix ast scan failure
Fix bug in the Word alignment preprocessor

English Version

New Model List and Quick Access

No	Model Name & Link	Org
1	ChatGLM-English&Chinese-6B	ZhiPu.AI
2	GLM130B-LLM English&Chinese	ZhiPu.AI
3	unidiffuser-v1	Tsinghua TSAIL
4	ChatYuan-large-v2	YuanYu
5	OpenICommunity/pangu_2_6B	PengCheng Lab
6	openjourney	personal-dienstag
7	Rwkv-4-pile-14b	personal-Blink_DL
8	SiameseUIE information extraction-Chinese-base
9	SiameseUniNLU zero-shot NLU Chinese base model

Highlight

Support external repos working with the modelscope library via plugins
Support onnx export for SCRFD model
Add onnx exporter for damoyolo
Add onnx/torchscript exporter for token classification models
Add frozen graph def exporter for cartoon model
Refactor taskdataset module; users can now write datasets with custom logic
Add example for text-generation finetuning, also available for GPT3
Siamese uie finetune support

Breaking changes

Feature

Support torch2.0 compile in inference and training; this feature is not yet stable on all models
Add ADADET && thirdparty arg for damoyolo trainer
Add finetune for ddcolor image colorization
Add video_instance_segmentation pipeline
Add plugin with cli tool
Add human reconstruction task
Add vidt model
Add task: speech_timestamp
Add disco guided diffusion
Add training support for ocr_reco_crnn
Add action detection finetune
Add ocr_detection_db training module
Add lore lineness table recognition
Add PEER model
Add smoke and fire detection model using damoyolo
Add generative multimodal embedding model RLEG
Add vop_se for text video retrieval
Add ProContEXT model for video single object tracking
Add video streaming perception models longshortnet
Add dingding denoise model
Support vision efficient tuning finetune
Add text-to-video-synthesis
Add MAN for image-quality-assessment

Improvements

Support running the text generation pipeline with args
Add soonet for video temporal grounding
Trainer supports parallel_groups setting and DDP hook
Kws supports continuing training from a checkpoint
Correct DDIM sampling on GPU
Add more cli tools
Modify audio input types && punc postprocess
Optimize kws pipeline and training conf
Support ImagePaintbyexamplePipeline demo service
Support load_from for easycv trainer

BugFix

Fix bug when installing detectron2
Fix bug in function generate_scp_from_url
Fix bug for speaker_verification_pipeline and speaker_diarization_pipeline: re-write the default config with configure.json
Fix failing data-related cases
Fix bug for ast scan functiondef
Word alignment preprocessor fix


02 Mar 15:06

wenmengzhou

v1.3.2

32cd90f

v1.3.2 release

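Chinese Version

New Model List and Quick Access

This minor release adds 6 new models, 2 of which support finetuning.

No.	Model Name & Link	Finetune Supported
1	ControlNet controllable image generation
2	Landing cervical cell AI-assisted diagnosis model
3	DuGuang text detection, DB line detection model, Chinese and English, general domain
4	SOND speaker diarization, Chinese, alimeeting-16k, offline, pytorch
5	NeRF fast 3D reconstruction model	√
6	DCT-Net portrait cartoonization	√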

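Feature

GPT3 finetune improved: supports DDP + tensor parallel, and the hand-off from finetuning to inference is streamlined
Checkpoint saving logic optimized so that periodically saved and best-saved files can be used directly for inference
Hooks refactored to decouple the individual functional hooks and support interaction between hooks
Support ImagePaintbyExamplePipeline demo service
Support multiple audio types
Support Petr3D CPU inference, compatible with the new version of mmcv
Update deberta v2 preprocessor
Support initializing NLP downstream task models with only the backbone pretrained weights loaded
Update librosa.resample() parameters to support the latest version
Add usage-tracking statistics for downstream toolbox calls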

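Incompatible Changes

Checkpoint saving now splits model parameters from training state parameters; model parameters saved by older versions must be converted before loading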

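Bug Fixes:

Fix asr vad/lm/punc input processing
Fix gpt moe finetune checkpoint path error
Fix args lm_train_conf is invalid
Fix ci test errors when deleting existing files
Fix OCR recognition bug
Remove the image resolution limit in the preprocessing stage
Fix output wav file being 32-bit float instead of the expected 16-bit int
Set num_workers=0 to prevent creating sub-processes in demo-service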

English Version

New Model List and Quick Access

This minor version adds a total of six new models, including two models with finetuning capability.

No.	Model Name & Link	Finetuning Supported
1	ControlNet Controllable Image Generation
2	Landing AI Cervical Cell AI-assisted Diagnosis Model
3	DuGuang - Text Detection - DB Line Detection Model - Chinese and English - General Domain
4	SOND Speaker Diarization - Chinese - Alimeeting-16k - Offline - PyTorch
5	NeRF Fast 3D Reconstruction Model	√
6	DCT-Net Person Image Cartoonization	√

Features

GPT-3 finetune has been improved to support DDP+tensor parallel
Checkpoint saving logic has been optimized so that both periodically saved files and best-checkpoint files can be used directly by a pipeline
The Hooks scheme has been refactored to decouple various functional hooks and support interaction between hooks.
Supports ImagePaintbyExamplePipeline demo service
Supports multi-machine data and tensor parallel finetuning for cartoon task
Supports various audio types
Supports Petr3D CPU inference with compatibility for the latest version of mmcv
Updates deberta v2 preprocessor
Supports initialization of downstream NLP task models with only backbone pre-training weights loaded
Updates librosa.resample() parameter support to the latest version
Adds downstream toolbox call tracking function

Breaking changes

Model parameters and training state are now saved separately, so previously trained checkpoints must be converted before resuming training
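
The conversion itself is not described in these notes. As a rough sketch of the idea only, the snippet below splits one combined checkpoint dictionary into a model-parameter part and a training-state part. The key names (`state_dict`, `optimizer`, `lr_scheduler`, `meta`) and the function `split_legacy_checkpoint` are hypothetical placeholders, not the layout or API that modelscope actually uses.

```python
from typing import Any, Dict, Tuple

# Hypothetical key names: the release note does not say how legacy
# checkpoints are laid out, so these are placeholders for illustration.
MODEL_KEY = "state_dict"
TRAINER_KEYS = ("optimizer", "lr_scheduler", "meta")


def split_legacy_checkpoint(
    legacy: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one combined checkpoint dict into (model_params, trainer_state)."""
    if MODEL_KEY not in legacy:
        raise KeyError(f"legacy checkpoint has no '{MODEL_KEY}' entry")
    # Copy the weights so the caller can modify them without touching the input.
    model_params = dict(legacy[MODEL_KEY])
    # Keep only the training-state entries that are actually present.
    trainer_state = {k: legacy[k] for k in TRAINER_KEYS if k in legacy}
    return model_params, trainer_state


legacy_ckpt = {
    "state_dict": {"encoder.weight": [0.1, 0.2], "head.bias": [0.0]},
    "optimizer": {"lr": 1e-4, "step": 1200},
    "meta": {"epoch": 3},
}
model_params, trainer_state = split_legacy_checkpoint(legacy_ckpt)
```

With the two parts separated, the model part can be loaded for inference on its own, while the training-state part is only needed when resuming training.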

Bug Fixes:

Fixes asr vad/lm/punc input processing
Fixes gpt moe finetune checkpoint path error
Fixes args lm_train_conf is invalid
Fixes ci test errors when deleting existing files
Fixes OCR recognition bugs
Removes image resolution restrictions in preprocessing stage
Fixes output wav file being 32-bit float instead of expected 16-bit int
Sets num_workers=0 to prevent creating sub-processes in demo-service.
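
For readers unfamiliar with the wav sample-format fix above, here is a minimal standard-library sketch of writing float samples as 16-bit integer PCM. It is not the code from the fix; the helper names, the scaling by 32767, and the 16000 Hz default are assumptions made for illustration.

```python
import io
import struct
import wave
from typing import Sequence


def float_to_int16_bytes(samples: Sequence[float]) -> bytes:
    """Clip float samples to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        ints.append(int(round(clipped * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


def write_wav_int16(target, samples: Sequence[float], sample_rate: int = 16000) -> None:
    """Write mono 16-bit PCM audio to a path or a seekable file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(float_to_int16_bytes(samples))


buffer = io.BytesIO()
write_wav_int16(buffer, [0.0, 1.0, -1.0, 2.0])
```

Setting the sample width to 2 bytes is what makes the file header declare 16-bit data; out-of-range input such as `2.0` is clipped rather than wrapped.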


20 Feb 11:57

zzhangpurdue

v1.3.0

ea9bd3c

v1.3.0 release

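Chinese Version

This release adds 51 new models, 11 of which support finetuning.

Model Features

Provide example finetune scripts so that users can train models by running a script with command-line arguments; see the scripts on GitHub for details
NLP adds development support for backbone + head, letting users freely combine existing backbones (Encoder) with task heads so that different models can be swapped in for a specific task; see the documentation for details
Contributor documentation: the model contribution section has been improved; see the integration process overview for details
The dataset interface supports loading local files directly: MsDataset.load('/to/path/abc.csv')
Model export supports the nlp_structbert_zero-shot and nlp_csanmt_translation model series

More SDK features and changes: https://github.com/modelscope/modelscope/releases/tag/v1.3.0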

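New Model List and Quick Access

No.	Model Name & Link	Finetune Supported
1	NAFNet image deblurring	√
2	BEiTv2 image classification, general, base	√
3	BEiTv2 image classification, general, large	√
4	Real-time head detection, general	√
5	Real-time phone detection, general	√
6	NAFNet image deblurring, compressed	√
7	DINO high-accuracy object detection model	√
8	StructBERT text similarity, Chinese, e-commerce, base	√
9	StructBERT factual accuracy detection, Chinese, e-commerce, base	√
10	StructBERT FAQ question answering, Chinese, finance domain, base	√
11	StructBERT FAQ question answering, Chinese, government affairs domain, base	√
12	IR face recognition model FRIR
13	Masked face recognition model FRFM-large
14	Face quality model FQA
15	Silent face liveness detection model, XC (炫彩)
16	Motion generation, human motion, English
17	M2FP single-person human parsing
18	DeOldify video colorization
19	Image quality MOS assessment
20	Abnormal image detection
21	YOLOPV2 vehicle detection and lane segmentation, autonomous driving domain
22	DCT-Net portrait cartoonization, diffusion model, illustration
23	DCT-Net portrait cartoonization, diffusion model, comic
24	Cartoon-series text-to-image model
25	Cartoon-series text-to-image model, comic style
26	Cartoon-series text-to-image model, watercolor style
27	Cartoon-series text-to-image model, clipart
28	Cartoon-series text-to-image model, flat style
29	Lightweight SRResNet video super-resolution
30	ECBSR on-device image super-resolution model
31	Real-time traffic sign detection, autonomous driving domain
32	Monocular depth estimation guided by multi-scale local planes
33	uhdm image demoireing
34	M2FP multi-person human parsing
35	VFI-RAFT video frame interpolation, application-oriented
36	StableDiffusionV2 image inpainting
37	MT5 open-domain multi-turn dialogue rewriting, Chinese, general, base
38	Efficient tuning of vision foundation models, adapter
39	Efficient tuning of vision foundation models, prompt
40	Efficient tuning of vision foundation models, prefix
41	Efficient tuning of vision foundation models, lora
42	Video panoptic segmentation, VideoKNet-SwinB
43	Face reconstruction model
44	DDPM-Seg diffusion-based semantic segmentation
45	DeepLPF image color enhancement
46	Video deinterlacing
47	Adaptive-Interval-3DLUT image color enhancement
48	RealESRGAN image debanding
49	Image quality degradation analysis
50	Open-vocabulary object detection via vision and language knowledge distillation
51	High-performance general object detection for long-tail and small-object problems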

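Best Practice Tutorials

Finally, we have also published a number of task-level and model-level best practice tutorials to help developers better understand and apply the models.

You are welcome to follow our open-source community: https://github.com/modelscope/modelscope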

English Version

Highlight

Add vqa-degradation
Add content check pipeline
Add pipelines for en2zh-imt and zh2en-imt
Add single and multiple human parsing models
Add AdaInt model
Add open vocabulary detection
Support finetune for sentence-embedding
Add bad image detection model and pipeline
Support translation model exporting
Add asr dataset for finetune
Add ocr detection model and pipeline
Add face quality assessment model
Add video deinterlace model
Add language model for audio task
Add deeplpf for image color enhance and image debanding model
Add ecbsr model for mobile image super-resolution
Add msrresnetlite model for video super-resolution
Support finetune and evaluation for image-fewshot-detection-defrcn
Add yolopv2 model cv_yolopv2_image_driving_perception
Add face liveness xc model
Add paint-by-example model
Add universal_matting pipeline
Add multi-modal_gridvlp_classification_chinese-base-ecom-cate
Add DINO detection with easycv
Add speech speaker verification pipeline
Add nerf-recon model
Support finetune for real-time object detection with easycv
Add single-camera depth estimation bts model
Add MGIMN model
Add fuse-in-decoder dialogue task
Add vision_efficient_tuning models
Add traffic-sign detection
Add object_detection3d_depe model
Add stable diffusion model for image inpainting
Add head&phone detection models
Add face_reconstruction model
Add structured model probing pipeline for image classification
Add video panoptic segmentation with VideoKNet-SwinB
Add image quality assessment mos (mean opinion score) model
Add ddpm-segmentation pipeline
Add plug mental model
Add video-colorization pipeline
Add image demoireing
Add face recognition ir model
Support batch inference for nlp_csanmt_translation_en2zh
Add ima...


09 Feb 07:48

wenmengzhou

v1.2.1

f1cdc04

v1.2.1 release

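Chinese Notes

Split the audio-domain dependencies into sub-domains to reduce dependency installation
Keyword spotting adds configuration support for returning Chinese
Upgrade the funasr version; add extra parameter configuration for speech recognition, speaker verification, and punctuation prediction
Remove the base framework's dependency on torchaudio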

English

separate audio requirements
kws pipeline returns Chinese characters by configuration
add args for asr_infer_pipeline, punc_pipeline, sv_pipeline & modify funasr version
relocate the torchaudio import to avoid unnecessary requirements in the framework


18 Jan 12:53

wenmengzhou

v1.2.0

8406c31

v1.2.0 release

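Chinese Version

This release adds 38 new models, 14 of which support finetuning.

Model Features

High-performance detection for popular applications: built on DAMOYOLO, an industry-oriented high-performance detection framework that surpasses the current classic YOLO series in both accuracy and speed, this release adds a real-time mask detection model, a real-time helmet detection model, a real-time human body detection model, and a real-time cigarette detection model, all usable out of the box
Speech recognition, speech synthesis, and keyword spotting models can be finetuned with the Modelscope Python SDK
Speech synthesis: adds dialect models for Sichuanese, Cantonese, and Shanghainese, and foreign-language models for Russian and Korean
- SambertHifigan speech synthesis, Sichuanese, general domain, 16k, speaker chuangirl: Sichuanese dialect female voice model
- SambertHifigan speech synthesis, Cantonese, general domain, 16k, speaker jiajia: Cantonese dialect female voice model
- SambertHifigan speech synthesis, Shanghainese, general domain, 16k, speaker xiaoda: Shanghainese dialect female voice model
- SambertHifigan speech synthesis, Russian, general domain, 16k, speaker masha: Russian female voice model
- SambertHifigan speech synthesis, Korean, general domain, 16k, speaker kyong: Korean female voice model
Speech file post-processing
- Adds text normalization models for 11 languages: English, German, Filipino, Korean, Vietnamese, Japanese, Russian, Indonesian, Portuguese, French, and Spanish
Image face fusion
- Automatically extracts and aligns the face region and extracts facial features, with no extra preprocessing required.
- Introduces a 3D reconstruction network to fit and transfer the face shape, so the fused face shape is more similar.
Face and human body
- GPEN portrait enhancement and restoration, high-resolution faces: 1024 and 2048 models based on the GPEN framework and trained on collected ultra-high-resolution face data.
Visual editing
- DDColor image colorization: a large improvement in color richness and semantic fit over earlier methods such as Deoldify.
- VFI-RAFT video frame interpolation: compared with other SOTA models, it interpolates well in scenes with large motion and repeated textures.
- DUT-RAFT video stabilization: removes many kinds of video shake reliably and preserves video sharpness better than the original DUT.
Low-level vision
- RealBasicVSR video super-resolution: works well on most real-world videos, but may perform poorly in the small number of cases with very severe degradation.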

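Incompatible Changes

The output type of the text-to-image task is changed to multi-image output
The output data of the speech synthesis task is changed from output_pcm to output_wav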

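New Model List and Quick Access

Contributing Organization	Model Name
Bilibili	RealCUGAN image super-resolution
YuanYu	YuanYu functional dialogue large model
Fengshenbang	Wenzhong-GPT2-110M-Chinese-v2
Fengshenbang	Erlangshen-RoBERTa-330M text similarity
Fengshenbang	Erlangshen-RoBERTa-110M natural language inference
Fengshenbang	Erlangshen-RoBERTa-330M text similarity
Alibaba AAIG	Discrete adversarial training ViT-H/14, robust image classification, imagenet1k
Alibaba Cloud Machine Learning Platform PAI	GPT-MoE Chinese 6.7B poetry generation model
Alibaba Cloud Machine Learning Platform PAI	GPT-MoE Chinese 27B essay generation model
DAMO Academy	DuGuang text detection, word detection model, English, VLPT pretraining
DAMO Academy	DuGuang document understanding, multimodal pretrained document understanding model
DAMO Academy	Chinese StableDiffusion, general domain
DAMO Academy	DDColor image colorization
DAMO Academy	Video multi-object tracking, pedestrian
DAMO Academy	MaskDINO-SwinL image instance segmentation
DAMO Academy	VFI-RAFT video frame interpolation
DAMO Academy	DUT-RAFT video stabilization
DAMO Academy	RealBasicVSR video super-resolution
DAMO Academy	GPEN portrait enhancement and restoration, high-resolution faces
DAMO Academy	YOLOX-PAI hand detection model
DAMO Academy	ConvNeXt image classification, Chinese, garbage classification
DAMO Academy	BNext binarized image classification, English, general, small
DAMO Academy	Real-time mask detection, general
DAMO Academy	Real-time helmet detection, general
DAMO Academy	Real-time cigarette detection, general
DAMO Academy	Face liveness detection model
DAMO Academy	Face liveness detection model, IR
DAMO Academy	MGeo multi-task multimodal address pretrained base, Chinese, base
DAMO Academy	MaSTS pretrained model, Chinese, search, CLUE semantic matching, large
DAMO Academy	MaSTS text similarity, Chinese, search, CLUE semantic matching, large
DAMO Academy	NestedNER named entity recognition, Chinese, medical domain, base
DAMO Academy	CoROM text embedding, Chinese, e-commerce domain, base
DAMO Academy	CoROM semantic relevance, Chinese, e-commerce domain, base
DAMO Academy	All-task zero-shot learning, mT5 classification-enhanced, Chinese, base
DAMO Academy	StructBERT emotion classification, Chinese, seven classes, base
DAMO Academy	HiTransUSE user satisfaction estimation, Chinese, e-commerce, base
DAMO Academy	UniASR speech recognition, Chinese, general, 8k, real-time, pytorch
DAMO Academy	Paraformer speech recognition, Chinese, general, 16k, offline, large, pytorch
DAMO Academy	Paraformer speech recognition, Chinese, general, 16k, offline, large, long-audio version
DAMO Academy	RaNER-chunking, English, large
DAMO Academy	mPLUG-HiTeA video question answering model, English, Base
DAMO Academy	mPLUG-HiTeA video captioning, English, Base
DAMO Academy	Mask2Former-R50 panoptic segmentation
DAMO Academy	Image face fusion
DAMO Academy	Spring couplet generation model, Chinese, base
DAMO Academy	GPT-3 compliment bot, Chinese, large
DAMO Academy	BART text correction, Chinese, legal domain, large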

English Version

Highlight

Add finetune support for DAMO-YOLO
Add new real-time mask detection model, real-time helmet detection model, real-time human body detection model, real-time cigarette detection model
Add finetune support for asr, tts and kws models
Batch inference support for nlp and ofa based multi-modal tasks
Add high-resolution gpen model for face restoration
Add DDColor model for image colorization
Add VFI-RAFT model for video frame interpolation
Add DUT-RAFT model for video stabilization
Add RealBasicVSR model for video-super-resolution

Breaking changes

Change the output of the text-to-image task to a list of images
Change the output of the tts task from output_pcm to output_wav
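
Code that has to run against both older and newer releases may want to tolerate these two changes. The sketch below is one possible approach and is not part of modelscope: `get_tts_audio` and `as_image_list` are hypothetical helper names, and the assumption that pipeline results are plain dictionaries keyed by `output_pcm` or `output_wav` is taken only from the key names in the note above. It handles the key rename and the single-versus-list shape only; the audio payload format itself may differ between the two keys.

```python
from typing import Any, Dict, List


def get_tts_audio(result: Dict[str, Any]) -> Any:
    """Return the audio payload under the new key, falling back to the old one."""
    if "output_wav" in result:
        return result["output_wav"]
    if "output_pcm" in result:
        return result["output_pcm"]
    raise KeyError("result has neither 'output_wav' nor 'output_pcm'")


def as_image_list(output: Any) -> List[Any]:
    """Normalize a text-to-image output to a list, whether it is one image or many."""
    if isinstance(output, (list, tuple)):
        return list(output)
    return [output]


# Stand-in values; real results would hold audio bytes and image objects.
new_style_audio = get_tts_audio({"output_wav": b"RIFF..."})
old_style_audio = get_tts_audio({"output_pcm": b"\x00\x01"})
images = as_image_list("single-image-placeholder")
```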

Feature

Add easyrobust-models for image classification
Video depth estimation support cpu mode
asr pipeline adds output_dir parameter
Add RTS face recognition ood model
Add image-defrcn-fewshot-detection
Add hires gpen model
Add mgeo finetune and pipeline
Add asr finetune & change inference
Add quadtree image matching pipeline
Add finetune for DAMO-YOLO
Add FLRGB Face Liveness RGB Model
Add speech separation finetune
Asr inference: support new models, punctuation, vad, sv
Add vop retrieval
Add NAFNet Image Deblurring pipeline and finetune support
Add megatron bert
Add panovit-layout-estimation-pipeline
Add vision middleware
Add panorama_depth_estimation
Unify token classification model output
Faq support finetune and multilingual
Support stable diffusion and add DAMO chinese stable diffusion model
Add cv-bnext-image-classification-pipeline
Add VFI-RAFT model for video frame interpolation
Add face changing pipeline
Add DUT-RAFT model for video stabilization
Update token_cls default sequence_length: 128 -> 512
Add structure tasks for ofa: sudoku & text2sql
Add new ASR model speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline and speech_UniASR_asr_2pass-pt-16k-com
Add model for multiple object tracking in video
Add ConvNeXt model
Add ppl metric
Add image colorization
Add User...


07 Dec 12:39

wenmengzhou

v1.1.1

c396ad5

v1.1 release

Highlight

Add Wenet model #35
Add code generation and code translation from ZHIPU #33
Hub support retry and continue-download after error
Add five finetune tasks for ofa
Add damoyolo-t and damoyolo-m
Add dpm-solver for diffusion models
Add image depth estimation pipeline
Add en-zh en-es es-en base translation models
Add GPT-3 tensor parallel finetuning

Feature

Add five finetune tasks for ofa
Add synonym for table question answering
Add jupyter lab plugin in docker
Add language_guided_video_summarization pipeline
Add nlp/addr/structure and update token classification related method
Add damoyolo-t and damoyolo-m
Add Wenet model #35
Add code generation and code translation from ZHIPU #33
Add camouflaged-detection
Support batch inference in pipeline for some models
Add table recognition task
Add dpm-solver for diffusion models
Ofa add asr task
Add features for alimeeting competition dataset
Add funasr based asr inference
Add extractive-summarization and topic-segmentation
Add image depth estimation pipeline
Add en-zh en-es es-en base translation models
Add gpt-moe model and pipeline
Action-detection model predownload video before inference
Add finetune for cv/language_guided_video_summarization
Add plug finetune and pretrained model
Support license plate detection
Add nextvit-small_image-classification_Dailylife-labels model
Add support for UniTE
Add video human matting task
Add LSTMCRFForWordSegmentation
Add face mask model
support new asr paraformer and conformer model
Add GPT-3 tensor parallel finetuning
Update image-portrait-enhancement trainer
Add FairFace face attribute model
Add facial landmark confidence model

Improvements

Hub support retry and continue-download after error
Refactor NLP and fix some user feedbacks
Speed up the ast indexing during editing
Add tensorboard hook for visualization
reduce the GPU usage of dialog trainer
substitute face detection model in skin_retouching_pipeline.py
update git-lfs install instruction
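
The Hub's retry and continue-download behaviour listed above is not specified further in these notes. The following sketch only illustrates the general pattern of resuming from the last received byte after a failure; `download_with_resume`, `open_stream`, and the fake `flaky_stream` source are hypothetical names invented for this example, and in a real HTTP client the offset would typically be sent as a `Range` request header.

```python
import time
from typing import Callable, Iterator, List


def download_with_resume(
    open_stream: Callable[[int], Iterator[bytes]],
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
) -> bytes:
    """Collect all bytes from open_stream, resuming at the last good offset on error."""
    received = bytearray()
    attempts = 0
    while True:
        try:
            # Ask the source to start from the number of bytes already received.
            for chunk in open_stream(len(received)):
                received.extend(chunk)
            return bytes(received)
        except ConnectionError:
            attempts += 1
            if attempts > max_retries:
                raise
            time.sleep(backoff_seconds * attempts)


PAYLOAD = b"modelscope-weights-0123456789"
calls: List[int] = []


def flaky_stream(offset: int) -> Iterator[bytes]:
    """Fake source: the first call fails after 10 bytes, later calls succeed."""
    calls.append(offset)
    data = PAYLOAD[offset:]
    for i in range(0, len(data), 5):
        if len(calls) == 1 and i >= 10:
            raise ConnectionError("simulated drop")
        yield data[i:i + 5]


result = download_with_resume(flaky_stream)
```

Because the second attempt starts at offset 10 rather than 0, the bytes received before the failure are not fetched again.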

BugFix

fix output video path when person detect failed For 3d_body_keypoints
Fix lazy importing problem in text classification pipeline
Fix bug for distributed inference of gpt3
Fix bug for mplug evaluation
token preprocess bug fix
fix file encoding problem in windows
fix deadlock when setting the thread number up to 90 for kws model
fix bug in token classification postprocessor
fix: torch.concat compatibility with torch1.8
fix log print and extensions issue for datasets==2.5.2
fix interpolate value error for vitadapter semantic segmentation
nlp csanmt translation fix finetuning bug
Fix a bug where the log file recorded the lr as zero instead of the correct value
fix bug of tableQA on gpu
Fix bug for text generation task model
fix download file timeout too short


06 Dec 15:02

wenmengzhou

v1.0.2

40ba97d

v1.0 release

Our first official version was released on November 1.

Run inference with one line of code through the pipeline interface
Finetune with fewer than 10 lines of code through the trainer
Provide models covering NLP, CV, MultiModal, Audio and Science
Provide up to 300 off-the-shelf models for convenient use


Releases: modelscope/modelscope
