
NNFusion Backlog #41

Open
11 of 34 tasks
wenxcs opened this issue Sep 24, 2020 · 0 comments

This is the NNFusion backlog, which tracks issues that are not yet planned but are considered future candidate items.

Our current or upcoming release is tracked in #194.

Our release procedures are listed in #195.

NNFusion users are highly encouraged to comment on the priority, preference, and needs for these work items. Please feel free to share your ideas with us or contribute to NNFusion.

Backlog (no rank)

Format: Module | Module Owner

Mechanism

  • Custom op support
  • Support training | @mzmssg
    • Support learning rate scheduling
    • Support external optimizer
    • Freeze some layers for fine tuning
    • Support gradient stop
  • Support low-precision & mixed-precision | @Niupple
    • FP16-specific kernels
    • Wait until Antares is integrated into NNFusion, after which nothing specific should be needed
    • Manually generate FP16-specific kernels with Antares and inject them into the Kernel DB
    • Modify the data type in the generated IR
  • Auto kernel tuner integration | @jlxue
    • Kernel DB
    • Add kernelEmitters (CPU/CUDA/ROCm) to parse and emit Antares kernels
    • Modify kernel selection pass accordingly
  • Offline inference (PAI) | @wenxcs
    • Reduce padding (bytedance/effective_transformer)
    • Docker image for BERT offline inference
    • Batch bucket inference
    • Offline inference wrapper
  • Parallel training support (via SuperScaler) | @lynex
    • v0.2 new datatype support
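The low-precision item above follows the common mixed-precision recipe: keep fp32 "master" weights, run compute in fp16, and apply loss scaling so small gradients survive the cast. A minimal NumPy sketch of that idea (illustrative only — names like `sgd_step_mixed_precision` and the fixed `LOSS_SCALE` are assumptions, not NNFusion code):

```python
import numpy as np

# Hypothetical sketch: fp32 master weights, fp16 compute, static loss scaling.
LOSS_SCALE = 1024.0

def sgd_step_mixed_precision(master_w, x, grad_fn, lr=0.01):
    w16 = master_w.astype(np.float16)      # cast weights for the fp16 forward/backward
    x16 = x.astype(np.float16)
    scaled_grad = grad_fn(w16, x16) * LOSS_SCALE           # backward on the scaled loss
    grad32 = scaled_grad.astype(np.float32) / LOSS_SCALE   # unscale in fp32
    return master_w - lr * grad32          # update the fp32 master copy

# Toy gradient: d/dw of 0.5*(w*x)^2 is (w*x)*x
grad = lambda w, x: (w * x) * x
w = np.array([0.5], dtype=np.float32)
w = sgd_step_mixed_precision(w, np.array([2.0]), grad)
```

Frameworks typically use dynamic loss scaling (shrinking the scale on overflow); a constant scale keeps the sketch short.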
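The batch-bucket inference item above usually means grouping variable-length sequences into length buckets so each batch is padded only to its bucket boundary rather than the global maximum, which is the padding-reduction trick behind bytedance/effective_transformer. A small illustrative sketch (the function name and default boundaries are assumptions):

```python
# Hypothetical sketch of bucketed batching for offline inference:
# each batch is padded to its bucket boundary, not the longest sequence overall.

def bucket_batches(seqs, boundaries=(16, 32, 64, 128), batch_size=8):
    buckets = {b: [] for b in boundaries}
    for s in seqs:
        for b in boundaries:
            if len(s) <= b:
                buckets[b].append(s)
                break
    batches = []
    for b, items in buckets.items():
        for i in range(0, len(items), batch_size):
            chunk = items[i:i + batch_size]
            # pad every sequence in this chunk only up to the bucket boundary b
            batches.append([s + [0] * (b - len(s)) for s in chunk])
    return batches
```

For example, with boundaries (16, 32), sequences of lengths 3 and 10 share a batch padded to 16, while a length-20 sequence lands in a separate batch padded to 32.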

Refactor/Improvement

  • Detect unsupported models & ops | @mzmssg
    • Others (unsupported op attrs, etc.)
  • Support block-fusion as default | @xysmlx
    • Define and implement the interfaces between BlockFusion and kernel tuner
    • End-to-end test with kernel tuner enabled
    • Automatic active block check ([ENHANCEMENT] Active block check in -fblockfusion_level=2 #50)
    • Tune-efficient policy in kernel tuner
    • Refactor BlockCudaEmitter
    • Advanced scheduling policy with kernel tuner
  • Support reduce-fusion | @xiayuqing0622
    • Add reduce-fusion pass
    • Optimize scheduler
    • Test performance
    • Port the code to GitHub
  • Sub-graph substitution | @wenxcs
    • Graph match feature via FSM, replacing the current Pattern Match
    • Graph re-writer tool
    • Antares Fusion
  • Code refactor | @wenxcs
    • Move operator definitions to opdefine_v2
    • Robust validation pipeline
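The sub-graph substitution item above amounts to matching a pattern of operators and rewriting it to a fused replacement. NNFusion's pass works on a graph IR; as a loose illustration only, the same idea over a flat operator sequence (all names here are made up for the example):

```python
# Hypothetical sketch of pattern match + rewrite over a linear op sequence.

def rewrite(ops, pattern, replacement):
    out, i = [], 0
    while i < len(ops):
        if ops[i:i + len(pattern)] == list(pattern):
            out.append(replacement)   # substitute the matched sub-sequence
            i += len(pattern)
        else:
            out.append(ops[i])
            i += 1
    return out

fused = rewrite(["MatMul", "Add", "Relu"], ("MatMul", "Add"), "FusedGemm")
# fused == ["FusedGemm", "Relu"]
```

A real graph re-writer must additionally check dataflow edges (the matched nodes must form a connected sub-graph with no external consumers of internal values), which is where an FSM-driven matcher earns its keep.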

Frontend & Backend support

  • Support CPU | @guoshzhao
    • Update Azure mirror download URLs for third-party packages
  • Support HLSL | Pending
  • Model support (training models & more inference models) | Pending

Common Tools

  • Python interface | @mzmssg
    • Share constants across multiple nnf_rt instances
    • Install via pip

Documentation

@wenxcs wenxcs added the enhancement New feature or request label Sep 24, 2020
@microsoft microsoft deleted a comment from xiayuqing0622 Sep 24, 2020
@wenxcs wenxcs pinned this issue Sep 24, 2020
@wenxcs wenxcs changed the title from "First iteration Planning" to "NNFusion Backlog" Sep 24, 2020
@AlisaChen98 AlisaChen98 added backlog and removed enhancement New feature or request labels Nov 5, 2020
@AlisaChen98 AlisaChen98 self-assigned this Nov 5, 2020