LongCoder encoder #276
Comments
It's hard to use the LongCoder encoder part to obtain **embeddings only** for long source code snippets, because I modified the code so that it only supports decoder-only mode. If you need it, I can provide the script that converts the UniXcoder model to a Longformer model, so that you can use a Longformer model initialized from UniXcoder to handle longer code snippets.
Thank you! This is indeed helpful :)
After conversion, you can use the Longformer directly without additional pre-training. However, it needs to be fine-tuned on downstream tasks.
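The conversion script itself is not shown in this thread. As a rough illustration only, one common way to initialize a long-input model from a short-input RoBERTa-style checkpoint is to tile the learned position embeddings until they cover the new maximum length. The sketch below uses plain Python lists so it runs without any checkpoint. The function name `extend_position_embeddings`, the `offset=2` default (RoBERTa-style models reserve the first two position slots), and the toy values are assumptions for illustration, not the maintainer's script.

```python
def extend_position_embeddings(old_rows, new_max_positions, offset=2):
    """Tile learned position embeddings to cover a longer sequence.

    old_rows: list of embedding rows (each a list of floats). The first
        `offset` rows are treated as reserved slots and copied unchanged
        (assumption: RoBERTa-style layout, real positions start at index 2).
    new_max_positions: number of real positions wanted, excluding reserved rows.
    """
    reserved = old_rows[:offset]
    learned = old_rows[offset:]
    if not learned:
        raise ValueError("no learned position rows to copy")

    # Keep the reserved rows, then repeat the learned rows cyclically.
    new_rows = [list(row) for row in reserved]
    for i in range(new_max_positions):
        new_rows.append(list(learned[i % len(learned)]))
    return new_rows


# Toy table: 2 reserved rows + 3 learned positions, embedding size 2.
old = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new = extend_position_embeddings(old, new_max_positions=7)
print(len(new))
```

With real weights the same copy would be done on the position-embedding tensor, and the attention layers would also need to be mapped to Longformer's local and global projections. As noted above, the resulting model still has to be fine-tuned on the downstream task.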
@guoday This is what I tried:

    from transformers import LongformerConfig, RobertaTokenizer, pipeline
    from models.longcoder import LongcoderModel
    config = LongformerConfig.from_pretrained('/path-to-models/longformer-unixcoder')
    tokenizer = RobertaTokenizer.from_pretrained('/path-to-models/longformer-unixcoder')
    longcoder = LongcoderModel.from_pretrained('/path-to-models/longformer-unixcoder', config=config)
    embedding = pipeline('feature-extraction', model=longcoder, tokenizer=tokenizer)
    func = "def f(a,b): if a>b: return a else return b"
    embedding(func)

Then I get the following error:

If I don't use the pipeline:

    tokens = tokenizer.tokenize("return maximum value")
    longcoder(tokens)

I get this error:
Hey!
The LongCoder work is super impressive and important, thank you for that.
I was curious: is it possible to use the encoder part of LongCoder to obtain **embeddings only** for long (>2048 tokens) source code snippets?
I currently use UniXcoder for my research, but I need to handle longer code snippets. Is it possible to use LongCoder for embeddings somehow?
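For context on what "embeddings only" usually involves: an encoder returns one vector per token, and a single snippet-level embedding is commonly obtained by averaging those vectors while skipping padding positions. The following is a minimal sketch of that masked mean pooling in plain Python, assuming the per-token vectors and the attention mask have already been produced by some encoder. The helper name `masked_mean_pool` and the toy numbers are hypothetical.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average per-token vectors, ignoring positions whose mask is 0."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy input: three token vectors, the last one is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
snippet_embedding = masked_mean_pool(hidden, mask)
print(snippet_embedding)
```

Whether mean pooling or another strategy (for example, taking the first token's vector) works best depends on how the model was trained and fine-tuned.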