Releases: modelscope/modelscope
v1.4.1 release
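Chinese Version
Recommended new models
No. | Model name & quick link | Contributing organization | Finetune supported |
---|---|---|---|
1 | ChatGLM Chinese-English dialogue large model, 6B | Zhipu.AI | |
2 | GLM130B Chinese-English large model | Zhipu.AI | |
3 | unidiffuser-v1 | Tsinghua TSAIL | |
4 | Yuanyu functional dialogue large model v2 | YuanYu | |
5 | Pangu-α 2.6B | PengCheng Lab | |
6 | openjourney | Individual developer dienstag | |
7 | Rwkv-4-pile-14b | Individual developer Blink_DL | |
8 | SiameseUIE universal information extraction, Chinese, base | | |
9 | SiameseUniNLU zero-shot universal natural language understanding, Chinese, base | | |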
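Highlights
- External repos can now work with the modelscope library as plugins
- Add onnx export for the SCRFD model
- Add onnx export for the damoyolo model
- Support onnx/torchscript export for sequence labeling models
- Support pb file export for the cartoon model
- Refactor the taskdataset module; users can now customize their own dataset logic
- Add examples for the text-generation task, also applicable to GPT3
- Siamese uie model supports finetune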
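Features
- Support torch2.0 compile in inference and training; note that testing is not yet thorough, so some models may hit errors
- Add trainer support for the adadet library
- ddcolor image colorization supports training
- Add video_instance_segmentation inference
- Add plugin capability to the CLI tool
- Add human reconstruction task
- Add vidt model
- Add speech_timestamp task
- Add disco guided diffusion model
- ocr_reco_crnn supports training
- Add training for action detection
- Add training for ocr_detection_db
- Add lore lineless table recognition task
- Add PEER model
- Add damoyolo smoke detection model
- Add RLEG model
- Add ProContEXT model for video instance tracking
- Add video perception model longshortnet
- Add dingding denoise model
- Support vision efficient tuning
- Support text-to-video-synthesis task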
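Improvements
- text-generation inference supports args input
- Support soonet in video temporal grounding
- Trainer supports DDPHook
- Kws supports continued training
- Support correct DDIM sampling on GPU
- Add more CLI tools
- Modify the inputs and outputs of speech inference
- Optimize kws configuration
- ImagePaintbyexamplePipeline supports demoservice
- Support load_from for the easycv trainer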
BugFix
- Fix error when installing detectron2
- Fix error in the generate_scp_from_url method
- Fix speaker_verification_pipeline and speaker_diarization_pipeline
- Fix failing data-related cases
- Fix ast scan failure
- Fix bug in the Word alignment preprocessor
English Version
New Model List and Quick Access
No. | Model Name & Link | Org | Finetune supported |
---|---|---|---|
1 | ChatGLM-English&Chinese-6B | Zhipu.AI | |
2 | GLM130B-LLM English&Chinese | Zhipu.AI | |
3 | unidiffuser-v1 | Tsinghua TSAIL | |
4 | ChatYuan-large-v2 | YuanYu | |
5 | OpenICommunity/pangu_2_6B | PengCheng Lab | |
6 | openjourney | personal-dienstag | |
7 | Rwkv-4-pile-14b | personal-Blink_DL | |
8 | SiameseUIE information extraction-Chinese-base | | |
9 | SiameseUniNLU zero-shot NLU Chinese base model | | |
Highlight
- External repos can work with the modelscope library as plugins
- Support onnx export for the SCRFD model
- Add onnx exporter for damoyolo
- Add onnx/torchscript exporter for token classification models
- Add frozen graph def exporter for the cartoon model
- Refactor the taskdataset module; users can now write datasets with custom logic
- Add example for text-generation finetuning, also available for GPT3
- Add finetune support for Siamese uie
Breaking changes
Feature
- Support torch2.0 compile in inference and training; this feature is not yet stable on all models
- Add ADADET && thirdparty arg for damoyolo trainer
- Add finetune for ddcolor image colorization
- Add video_instance_segmentation pipeline
- Add plugin with cli tool
- Add human reconstruction task
- Add vidt model
- Add task: speech_timestamp
- Add disco guided diffusion
- Add training support for ocr_reco_crnn
- Add action detection finetune
- Add ocr_detection_db training module
- Add lore lineless table recognition
- Add PEER model
- Add smoke and fire detection model using damoyolo
- Add generative multimodal embedding model RLEG
- Add vop_se for text video retrieval
- Add ProContEXT model for video single object tracking
- Add video streaming perception models longshortnet
- Add dingding denoise model
- Support vision efficient tuning finetune
- Add text-to-video-synthesis
- Add MAN for image-quality-assessment
Improvements
- Support running the text generation pipeline with args
- Add soonet for video temporal grounding
- Trainer support parallel_groups setting and DDP hook
- Kws support continue training from a checkpoint
- Correct DDIM sampling on GPU
- Add more cli tools
- Modify audio input types && punc postprocess
- Optimize kws pipeline and training conf
- Support ImagePaintbyexamplePipeline demo service
- Support load_from for easycv trainer
BugFix
- Fix bug when installing detectron2
- Fix bug in function generate_scp_from_url
- Fix bug for speaker_verification_pipeline and speaker_diarization_pipeline: re-write the default config with configure.json
- Fix failing data-related cases
- Fix bug in ast scan of functiondef
- Word alignment preprocessor fix
v1.3.2 release
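Chinese Version
New Model List and Quick Access
This minor version adds six new models, two of which support finetune.
No. | Model name & link | Finetune supported |
---|---|---|
1 | ControlNet controllable image generation | |
2 | Landing (兰丁) cervical cell AI-assisted diagnosis model | |
3 | Duguang (读光) text detection, DB line detection model, Chinese-English, general domain | |
4 | SOND speaker diarization, Chinese, alimeeting, 16k, offline, pytorch | |
5 | NeRF fast 3D reconstruction model | √ |
6 | DCT-Net portrait cartoonization | √ |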
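Feature
- GPT3 finetune improved: supports DDP + tensor parallel, and the hand-off from finetune to inference is streamlined
- Checkpoint saving logic optimized so that periodically saved and best-saved files can be used directly for inference
- Hooks scheme refactored: functional hooks are decoupled and hooks can interact with each other
- Support ImagePaintbyExamplePipeline demo service
- Support multiple audio types
- Support Petr3D CPU inference, compatible with the new version of mmcv
- Update the deberta v2 preprocessor
- Support initializing NLP downstream task models by loading only the backbone pretrained weights
- Update librosa.resample() parameters to support the latest version
- Add call tracking statistics for downstream toolboxes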
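Breaking changes
- Checkpoint saving now splits model parameters from training state parameters; model parameters from older versions must be converted before loading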
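Bug fixes:
- Fix asr vad/lm/punc input processing
- Fix gpt moe finetune checkpoint path error
- Fix args lm_train_conf is invalid
- Fix ci test error when deleting existing files
- Fix OCR recognition bug
- Remove the image resolution limit in the preprocessing stage
- Fix output wav file being 32-bit float instead of the expected 16-bit int
- Set num_workers=0 to prevent sub-processes from being created in demo-service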
English Version
New Model List and Quick Access
This minor version adds a total of six new models, including two models with finetuning capability.
Features
- GPT-3 finetune has been improved to support DDP+tensor parallel
- Checkpoint saving logic has been optimized to ensure that files saved periodically and those saved as the best can be used directly by pipeline
- The Hooks scheme has been refactored to decouple various functional hooks and support interaction between hooks.
- Supports ImagePaintbyExamplePipeline demo service
- Supports multi-machine data and tensor parallel finetuning for cartoon task
- Supports various audio types
- Supports Petr3D CPU inference with compatibility for the latest version of mmcv
- Updates deberta v2 preprocessor
- Supports initialization of downstream NLP task models with only backbone pre-training weights loaded
- Updates librosa.resample() parameter support to the latest version
- Adds downstream toolbox call tracking function
Breaking changes
- Model parameters and training state are now saved separately, so checkpoints trained with earlier versions must be converted before training can be resumed
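The conversion mentioned above could look roughly like the following sketch. It is not the modelscope converter: the release notes do not document the checkpoint layout, so the key names (state_dict, optimizer, lr_scheduler, epoch, iter) and the helper split_checkpoint are assumptions made purely for illustration, and plain dicts stand in for real tensors.

```python
from typing import Any, Dict, Tuple

# Hypothetical key names: the actual checkpoint layout is not documented in these notes.
TRAINING_STATE_KEYS = ("optimizer", "lr_scheduler", "epoch", "iter")


def split_checkpoint(legacy: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a combined checkpoint dict into (model_params, training_state)."""
    training_state = {k: v for k, v in legacy.items() if k in TRAINING_STATE_KEYS}
    if "state_dict" in legacy:
        # Assumption: weights are nested under "state_dict" when that key exists.
        model_params = dict(legacy["state_dict"])
    else:
        # Otherwise treat everything that is not training state as model weights.
        model_params = {k: v for k, v in legacy.items() if k not in TRAINING_STATE_KEYS}
    return model_params, training_state


# Toy legacy checkpoint with lists standing in for tensors.
legacy = {
    "state_dict": {"encoder.weight": [0.1, 0.2]},
    "optimizer": {"lr": 1e-4},
    "epoch": 3,
}
model_params, training_state = split_checkpoint(legacy)
```

Keeping the two parts in separate files means the weights file can be handed to inference without carrying optimizer state along.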
Bug Fixes:
- Fixes asr vad/lm/punc input processing
- Fixes gpt moe finetune checkpoint path error
- Fixes args lm_train_conf is invalid
- Fixes ci test errors when deleting existing files
- Fixes OCR recognition bugs
- Removes image resolution restrictions in preprocessing stage
- Fixes output wav file being 32-bit float instead of expected 16-bit int
- Sets num_workers=0 to prevent creating sub-processes in demo-service.
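To illustrate the wav fix above (32-bit float output versus the expected 16-bit int), here is a minimal sketch using only the Python standard library. It is not the modelscope code; the helper names float_to_pcm16 and write_wav_pcm16, the mono layout and the 16000 Hz sample rate are assumptions for the example.

```python
import io
import struct
import wave
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    out = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # clip out-of-range values
        out += struct.pack("<h", int(round(clipped * 32767)))
    return bytes(out)


def write_wav_pcm16(target, samples: Iterable[float], sample_rate: int = 16000) -> None:
    """Write mono 16-bit PCM audio to a path or a file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit int
        wav.setframerate(sample_rate)
        wav.writeframes(float_to_pcm16(samples))


# Write four samples to an in-memory buffer, then read the header back.
buffer = io.BytesIO()
write_wav_pcm16(buffer, [0.0, 0.5, -1.0, 2.0])
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    sample_width = reader.getsampwidth()
    frame_count = reader.getnframes()
    frame_rate = reader.getframerate()
```

The sample width read back from the header is the quickest way to confirm that a file is 16-bit rather than 32-bit.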
v1.3.0 release
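Chinese Version
This release adds 51 new models, 11 of which support finetune.
Model features
- Provide finetune example scripts that let users train models by running a script with command-line arguments; see the scripts on github for details
- The NLP domain adds development support for backbone + head, letting users freely combine existing backbones (Encoder) and task heads, which makes it easy to switch between models for a given task; see the documentation for details
- The contributor documentation improves the model contribution section; see the integration process overview for details
- The dataset interface supports loading local files directly: MsDataset.load('/to/path/abc.csv')
- Model export supports the nlp_structbert_zero-shot and nlp_csanmt_translation model series
More SDK features and changes: https://github.com/modelscope/modelscope/releases/tag/v1.3.0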
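As a rough illustration of the local-file loading listed above, the sketch below writes a small CSV with the standard library and wraps the MsDataset.load call from the notes in a helper. The column names, the sample rows and the helper names write_sample_csv and load_local_dataset are made up for the example; the modelscope import is deferred so the CSV part runs even without the SDK installed.

```python
import csv
import os
import tempfile


def write_sample_csv(path: str) -> str:
    """Create a tiny CSV file to load; the columns are illustrative only."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        writer.writerow(["great product", "1"])
        writer.writerow(["poor quality", "0"])
    return path


def load_local_dataset(path: str):
    """Load a local file through the dataset interface (needs modelscope installed)."""
    # Deferred import: only required when the helper is actually called.
    from modelscope.msdatasets import MsDataset
    return MsDataset.load(path)


csv_path = write_sample_csv(os.path.join(tempfile.mkdtemp(), "abc.csv"))
# dataset = load_local_dataset(csv_path)  # uncomment when modelscope is available
```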
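New Model List and Quick Access
Best practice tutorials
Finally, we have also published many task-level and model-level best practice tutorials to help developers better understand and apply the models.
Follow our open source community: https://github.com/modelscope/modelscope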
English Version
Highlight
- Add vqa-degradation
- Add content check pipeline
- Add pipelines for en2zh-imt and zh2en-imt
- Add single and multiple human parsing models
- Add AdaInt model
- Add open vocabulary detection
- Support finetune for sentence-embedding
- Add bad image detection model and pipeline
- Support translation model exporting
- Add asr dataset for finetune
- Add ocr detection model and pipeline
- Add face quality assessment model
- Add video deinterlace model
- Add language model for audio task
- Add deeplpf for image color enhance and image debanding model
- Add ecbsr model for mobile image super-resolution
- Add msrresnetlite model for video super-resolution
- Support finetune and evaluation for image-fewshot-detection-defrcn
- Add yolopv2 model cv_yolopv2_image_driving_perception
- Add face liveness xc model
- Add paint-by-example model
- Add universal_matting pipeline
- Add multi-modal_gridvlp_classification_chinese-base-ecom-cate
- Add DINO detection with easycv
- Add speech speaker verification pipeline
- Add nerf-recon model
- Support finetune for real-time object detection with easycv
- Add single-camera depth estimation bts model
- Add MGIMN model
- Add fuse-in-decoder dialogue task
- Add vision_efficient_tuning models
- Add traffic-sign detection
- Add object_detection3d_depe model
- Add stable diffusion model for image inpainting
- Add head&phone detection models
- Add face_reconstruction model
- Add structured model probing pipeline for image classification
- Add video panorama segmentation with VideoKNet-SwinB
- Add image quality assessment MOS (mean opinion score) model
- Add ddpm-segmentation pipeline
- Add plug mental model
- Add video-colorization pipeline
- Add image demoireing
- Add face recognition ir model
- Support batch inference for nlp_csanmt_translation_en2zh
- Add ima...
v1.2.1 release
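Chinese notes
- Split the audio-domain dependencies into sub-domains to reduce dependency installation
- Keyword spotting can return Chinese via configuration
- Upgrade funasr; add extra parameter configuration for speech recognition, speaker verification and punctuation prediction
- Remove the base framework's dependency on torchaudio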
English
- Separate audio requirements
- kws pipeline returns Chinese characters by configuration
- Add args for asr_infer_pipeline, punc_pipeline, sv_pipeline and modify the funasr version
- Relocate import torchaudio to avoid unnecessary requirements in the framework
v1.2.0 release
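Chinese Version
This release adds 38 new models, 14 of which support finetune.
Model features
- High-performance detection for popular applications: built on DAMOYOLO, an industry-oriented high-performance detection framework that surpasses the classic YOLO series in both accuracy and speed, new real-time mask detection, real-time helmet detection, real-time human body detection and real-time cigarette detection models are now available for efficient out-of-the-box use
- Speech recognition, speech synthesis and keyword spotting models can be finetuned with the Modelscope Python SDK
- Speech synthesis: new dialect models for Sichuanese, Cantonese and Shanghainese, and new foreign-language models for Russian and Korean
  - SambertHifigan speech synthesis, Sichuanese, general domain, 16k, speaker chuangirl: female-voice Sichuanese dialect model
  - SambertHifigan speech synthesis, Cantonese, general domain, 16k, speaker jiajia: female-voice Cantonese dialect model
  - SambertHifigan speech synthesis, Shanghainese, general domain, 16k, speaker xiaoda: female-voice Shanghainese dialect model
  - SambertHifigan speech synthesis, Russian, general domain, 16k, speaker masha: female-voice Russian model
  - SambertHifigan speech synthesis, Korean, general domain, 16k, speaker kyong: female-voice Korean model
- Speech file post-processing
  - New text normalization models for 11 languages: English, German, Filipino, Korean, Vietnamese, Japanese, Russian, Indonesian, Portuguese, French and Spanish
- Image face fusion
  - Automatically extracts and aligns the face region and extracts facial features, with no extra preprocessing required.
  - Introduces a 3D reconstruction network to fit and transfer the face shape, so the fused face shape is more similar.
- Face and human body
  - GPEN portrait enhancement and restoration, high-resolution faces: 1024 and 2048 models trained on very-high-resolution face data collected for the GPEN framework.
- Visual editing
  - DDColor image colorization: a large improvement over earlier methods such as Deoldify in color richness and semantic fit.
  - VFI-RAFT video frame interpolation: compared with other SOTA models, gives better results in scenes with large motion and repeated textures.
  - DUT-RAFT video stabilization: stable de-shaking on many kinds of video jitter and, compared with the original DUT, better preservation of video sharpness.
- Low-level vision
  - RealBasicVSR video super-resolution: works well on most real-world videos, but may perform poorly in the few cases of very severe degradation.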
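Breaking changes
- The text-to-image task output type is changed to multi-image output
- The speech synthesis task output data is changed from output_pcm to output_wav
New Model List and Quick Access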
English Version
Highlight
- Add finetune support for DAMO-YOLO
- Add new real-time mask detection, helmet detection, human body detection and cigarette detection models
- Add finetune support for asr, tts and kws models
- Batch inference support for nlp and ofa based multi-modal tasks
- Add high-resolution gpen model for face restoration
- Add DDColor model for image colorization
- Add VFI-RAFT model for video frame interpolation
- Add DUT-RAFT model for video stabilization
- Add RealBasicVSR model for video-super-resolution
Breaking changes
- Change the output of the text-to-image task to a list of images
- Change the output of the tts task from output_pcm to output_wav
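Callers that must work across both sides of these breaking changes could use a small compatibility shim like the sketch below. Only the key names output_pcm and output_wav come from the notes; the helpers read_tts_output and as_image_list, and the assumption that results are plain dicts, are hypothetical and for illustration only.

```python
from typing import Any, Dict, List, Tuple


def read_tts_output(result: Dict[str, Any]) -> Tuple[str, Any]:
    """Return (format, payload), preferring the new output_wav key over output_pcm."""
    for key, audio_format in (("output_wav", "wav"), ("output_pcm", "pcm")):
        if key in result:
            return audio_format, result[key]
    raise KeyError("result has neither output_wav nor output_pcm")


def as_image_list(output: Any) -> List[Any]:
    """Normalize a text-to-image output to a list, whether it is one image or many."""
    if isinstance(output, (list, tuple)):
        return list(output)
    return [output]


new_style = read_tts_output({"output_wav": b"RIFF..."})
old_style = read_tts_output({"output_pcm": b"\x00\x01"})
images = as_image_list("single-image-placeholder")
```

The format tag is returned alongside the payload because raw PCM and WAV data are not interchangeable: a WAV payload carries a header, so downstream code needs to know which one it received.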
Feature
- Add easyrobust-models for image classification
- Video depth estimation support cpu mode
- Add output_dir parameter to the asr pipeline
- Add RTS face recognition ood model
- Add image-defrcn-fewshot-detection
- Add hires gpen model
- Add mgeo finetune and pipeline
- Add asr finetune & change inference
- Add quadtree image matching pipeline
- Add finetune for DAMO-YOLO
- Add FLRGB Face Liveness RGB Model
- Add speech separation finetune
- Asr inference: support new models, punctuation, vad, sv
- Add vop retrieval
- Add NAFNet Image Deblurring pipeline and finetune support
- Add megatron bert
- Add panovit-layout-estimation-pipeline
- Add vision middleware
- Add panorama_depth_estimation
- Unify token classification model output
- Faq support finetune and multilingual
- Support stable diffusion and add DAMO chinese stable diffusion model
- Add cv-bnext-image-classification-pipeline
- Add VFI-RAFT model for video frame interpolation
- Add face changing pipeline
- Add DUT-RAFT model for video stabilization
- Update token_cls default sequence_length: 128 -> 512
- Add structure tasks for ofa: sudoku & text2sql
- Add new ASR model speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline and speech_UniASR_asr_2pass-pt-16k-com
- Add model for multiple object tracking in video
- Add ConvNeXt model
- Add ppl metric
- Add image colorization
- Add User...
v1.1 release
Highlight
- Add Wenet model #35
- Add code generation and code translation from ZHIPU #33
- Hub support retry and continue-download after error
- Add five finetune tasks for ofa
- Add damoyolo-t and damoyolo-m
- Add dpm-solver for diffusion models
- Add image depth estimation pipeline
- Add en-zh en-es es-en base translation models
- Add GPT-3 tensor parallel finetuning
Feature
- Add five finetune tasks for ofa
- Add synonym for table question answering
- Add jupyter lab plugin in docker
- Add language_guided_video_summarization pipeline
- Add nlp/addr/structure and update token classification related methods
- Add damoyolo-t and damoyolo-m
- Add Wenet model #35
- Add code generation and code translation from ZHIPU #33
- Add camouflaged-detection
- Support batch inference in pipeline for some models
- Add table recognition task
- Add dpm-solver for diffusion models
- Ofa add asr task
- Add features for alimeeting competition dataset
- Add funasr based asr inference
- Add extractive-summarization and topic-segmentation
- Add image depth estimation pipeline
- Add en-zh en-es es-en base translation models
- Add gpt-moe model and pipeline
- Action-detection model predownload video before inference
- Add finetune for cv/language_guided_video_summarization
- Add plug finetune and pretrained model
- Support license plate detection
- Add nextvit-small_image-classification_Dailylife-labels model
- Add support for UniTE
- Add video human matting task
- Add LSTMCRFForWordSegmentation
- Add face mask model
- support new asr paraformer and conformer model
- Add GPT-3 tensor parallel finetuning
- Update image-portrait-enhancement trainer
- Add FairFace face attribute model
- Add facial landmark confidence model
Improvements
- Hub support retry and continue-download after error
- Refactor NLP and fix some user feedbacks
- Speed up the ast indexing during editing
- Add tensorboard hook for visualization
- Reduce the GPU usage of the dialog trainer
- Substitute the face detection model in skin_retouching_pipeline.py
- update git-lfs install instruction
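The retry and continue-download behaviour listed under Improvements can be pictured with the sketch below. It is not the hub client's implementation: download_with_resume, the flaky_fetch stand-in server, the chunk size and the retry policy are all assumptions. Over real HTTP, fetch_range would typically send a Range header starting at the given offset.

```python
import time
from typing import Callable


def download_with_resume(
    fetch_range: Callable[[int], bytes],
    total_size: int,
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
) -> bytes:
    """Fetch total_size bytes, resuming from the last received offset after errors."""
    received = bytearray()
    failures = 0
    while len(received) < total_size:
        try:
            chunk = fetch_range(len(received))  # ask only for the bytes still missing
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise  # give up after too many consecutive failures
            time.sleep(backoff_seconds * failures)
            continue
        if not chunk:
            raise IOError("no data returned before the file was complete")
        received += chunk
        failures = 0  # a successful chunk resets the consecutive-failure counter
    return bytes(received)


# Simulated server: every second call drops the connection.
PAYLOAD = b"modelscope-weights-" * 4
calls = {"count": 0}


def flaky_fetch(offset: int) -> bytes:
    calls["count"] += 1
    if calls["count"] % 2 == 0:
        raise ConnectionError("simulated network drop")
    return PAYLOAD[offset:offset + 16]


data = download_with_resume(flaky_fetch, len(PAYLOAD))
```

Because each retry starts at len(received), bytes that already arrived are never requested again.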
BugFix
- Fix output video path when person detection fails for 3d_body_keypoints
- Fix lazy importing problem in text classification pipeline
- Fix bug for distributed inference of gpt3
- Fix bug for mplug evaluation
- token preprocess bug fix
- fix file encoding problem in windows
- fix deadlock when setting the thread number up to 90 for kws model
- Fix bug in token classification postprocessor
- fix: torch.concat compatibility with torch1.8
- fix log print and extensions issue for datasets==2.5.2
- fix interpolate value error for vitadapter semantic segmentation
- nlp csanmt translation fix finetuning bug
- Fix a bug that the logging file cannot save the correct lr, which is zero instead
- fix bug of tableQA on gpu
- Fix bug for text generation task model
- fix download file timeout too short
v1.0 release
Our first official version was released on November 1.
- Run inference with one line of code using the pipeline interface
- Finetune with fewer than 10 lines of code using the trainer
- Provide models covering NLP, CV, MultiModal, Audio and Science
- Provide up to 300 off-the-shelf models for convenient use
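As a hedged illustration of the pipeline interface mentioned above, the sketch below follows the word-segmentation example from the project README. The wrapper functions and the deferred import are additions for this example; running them requires modelscope to be installed and network access to download the model, and the task name and model id may differ between releases.

```python
# Task name and model id as used in the project README; treat both as assumptions
# that may not match every release.
TASK = "word-segmentation"
MODEL_ID = "damo/nlp_structbert_word-segmentation_chinese-base"


def build_word_segmenter():
    """Create the pipeline (downloads the model on first use)."""
    # Deferred import so this file can be read and imported without modelscope.
    from modelscope.pipelines import pipeline
    return pipeline(TASK, model=MODEL_ID)


def segment_words(text: str):
    """Run inference on one input string and return the pipeline's result."""
    return build_word_segmenter()(text)


# Example call (commented out because it needs modelscope and a model download):
# print(segment_words("今天天气不错，适合出去游玩"))
```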