This is the official implementation of the paper "InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation".
Authors: Rongyao Fang, Shilin Yan, Zhaoyang Huang, Jingqiu Zhou, Hao Tian, Jifeng Dai, Hongsheng Li
The code and model checkpoints will be released very soon!