'tuple' object has no attribute 'loss' #5

skye95git · 2021-09-15T06:06:22Z

Hi, I want to run CodeT5-base on the code generation task. I ran the command:
python run_exp.py --model_tag codet5_base --task concode --sub_task none

It fails with the error: 'tuple' object has no attribute 'loss'.

I tried changing
outputs = model(input_ids=source_ids, attention_mask=source_mask, labels=target_ids, decoder_attention_mask=target_mask)
to
outputs, _ = model(input_ids=source_ids, attention_mask=source_mask, labels=target_ids, decoder_attention_mask=target_mask)

That fails with a different error: too many values to unpack (expected 2)

What should I do?


yuewang-sf · 2021-09-16T02:15:37Z

Hi @skye95git, I could not reproduce your issue. You can check whether the model is a T5ForConditionalGeneration object or directly print the outputs. Also, make sure you download the correct version of transformers (>= 4.6.1).
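
Given the version hint above, one likely cause of this error is that older transformers releases return a plain tuple from the forward pass instead of an output object with a .loss attribute. The sketch below is illustrative and not part of the CodeT5 repository: get_loss and version_at_least are hypothetical helper names, and it assumes the loss is the first tuple element, which holds when labels are passed to the model.

```python
from types import SimpleNamespace


def get_loss(outputs):
    """Return the loss from an output object or from a plain tuple."""
    if hasattr(outputs, "loss"):
        return outputs.loss
    # Assumption: tuple-style outputs put the loss first when labels are given.
    return outputs[0]


def version_at_least(version, minimum):
    """Compare dotted version strings numerically, e.g. '4.10.0' vs '4.6.1'."""
    def parse(text):
        return tuple(int(part) for part in text.split(".")[:3] if part.isdigit())
    return parse(version) >= parse(minimum)


# Stand-ins for the two output styles (no model is loaded here).
new_style = SimpleNamespace(loss=0.5)
old_style = (0.5, "logits", "past_key_values")
print(get_loss(new_style), get_loss(old_style))
print(version_at_least("3.5.1", "4.6.1"))
```

In practice the installed version string is available as transformers.__version__, so it can be passed as the first argument to the version check.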

skye95git · 2021-09-16T02:42:24Z

> Hi @skye95git, I could not reproduce your issue. You can check whether the model is a T5ForConditionalGeneration object or directly print the outputs. Also, make sure you download the correct version of transformers (>= 4.6.1).

Thanks for your reply! After I updated transformers, it worked.
I want to pre-train a model using my own data. Do you plan to share the pre-training code?

skye95git · 2021-09-16T07:49:08Z

Hi, I want to try out CodeT5's code generation. How do I use the model to generate code after fine-tuning?

yuewang-sf · 2021-09-16T08:48:09Z

> Thanks for your reply! After I updated transformers, it worked.
> I want to pre-train a model using my own data. Do you plan to share the pre-training code?

We currently do not have a plan to release the pre-training code, which should not be difficult to implement based on the paper. We are also happy to take questions regarding its implementation.

yuewang-sf · 2021-09-16T08:52:53Z

> Hi, I want to try out CodeT5's code generation. How do I use the model to generate code after fine-tuning?

You can refer to the following function for this:

CodeT5/run_gen.py, line 84 in 100c7e5:

    def eval_bleu_epoch(args, eval_data, eval_examples, model, tokenizer, split_tag, criteria):
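
For a single input, the generation step can be sketched roughly as follows. This is a minimal sketch and not the code in run_gen.py: generate_code is a hypothetical helper, the model is assumed to be a fine-tuned T5ForConditionalGeneration with its matching tokenizer, and the default lengths and beam size are assumptions.

```python
def generate_code(model, tokenizer, nl, max_source_length=320,
                  max_target_length=150, beam_size=10):
    """Generate one code string from a natural-language description."""
    # Tokenize the input, truncating to the source length limit.
    inputs = tokenizer(nl, max_length=max_source_length,
                       truncation=True, return_tensors="pt")
    # Beam search decoding; returns a batch of token-id sequences.
    output_ids = model.generate(inputs["input_ids"],
                                attention_mask=inputs["attention_mask"],
                                num_beams=beam_size,
                                max_length=max_target_length,
                                early_stopping=True)
    # Decode the best beam, dropping special tokens such as padding.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

The model should be in eval mode, and when it lives on a GPU the input tensors need to be moved to the same device before calling generate.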

skye95git · 2021-09-17T02:48:02Z

Thanks for your reply! When fine-tuning the model, I ran into an error caused by a hard-coded absolute path.

There are many similar absolute paths in the repository, for example in models.py, calc_code_bleu.py, dataflow_match.py, and syntax_match.py.

It would be nice if the README mentioned that these paths need to be replaced.

skye95git · 2021-09-23T02:07:10Z

Hi, I have finished fine-tuning, and the result has been written to sh/results.

Are the results in sh/results evaluated on Concode's test set or dev set? If they are evaluated on the dev set, how do I evaluate on the test set?

skye95git · 2021-09-23T03:43:10Z

I read the source code in run_gen.py and found that the result in sh/results is evaluated on Concode's test set.

I want to see the predictions. Are the generated results for the test set stored in sh/saved_models/concode/codet5_base_all_lr10_bs32_src320_trg150_pat3_e30/prediction?

What do test_*.gold, test_*.output and test_*.src in that folder stand for, respectively?

Is the input data stored in test_*.src? Is the output data stored in test_*.output?

skye95git · 2021-09-23T06:19:28Z

Hi, the paper says "we additionally collect two datasets of C/CSharp from BigQuery". How do you parse the C and C# code downloaded from BigQuery to extract functions? I also want to parse source code I have collected and retrain the model. Could you share the parsing code?

yuewang-sf · 2021-09-23T08:45:29Z

> I read the source code in run_gen.py and found that the result in sh/results is evaluated on Concode's test set.
>
> I want to see the predictions. Are the generated results for the test set stored in sh/saved_models/concode/codet5_base_all_lr10_bs32_src320_trg150_pat3_e30/prediction?
>
> What do test_*.gold, test_*.output and test_*.src in that folder stand for, respectively?
>
> Is the input data stored in test_*.src? Is the output data stored in test_*.output?

Hi @skye95git, yes. Your understanding is correct. The test_*.src is the source input, test_*.output is the model output, and test_*.gold is the ground-truth target output.
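
To inspect the predictions side by side, the three files can be read together, as in the sketch below. It assumes each file holds one example per line and that the three files are line-aligned, which is not stated in this thread; load_predictions and exact_match are hypothetical helper names, and the file paths are passed in rather than guessed.

```python
from pathlib import Path


def read_lines(path):
    """Read a text file into a list of stripped lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines()]


def load_predictions(src_path, output_path, gold_path):
    """Return aligned (source, prediction, gold) triples."""
    src, out, gold = (read_lines(p) for p in (src_path, output_path, gold_path))
    if not (len(src) == len(out) == len(gold)):
        raise ValueError("prediction files are not line-aligned")
    return list(zip(src, out, gold))


def exact_match(triples):
    """Percentage of predictions identical to the gold target."""
    if not triples:
        return 0.0
    hits = sum(1 for _, pred, gold in triples if pred == gold)
    return 100.0 * hits / len(triples)
```

Calling load_predictions with the paths of the matching .src, .output and .gold files gives a list that can be printed or filtered for mismatches.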

yuewang-sf · 2021-09-23T09:22:10Z

> Hi, the paper says "we additionally collect two datasets of C/CSharp from BigQuery". How do you parse the C and C# code downloaded from BigQuery to extract functions? I also want to parse source code I have collected and retrain the model. Could you share the parsing code?

We parse it with tree-sitter, similar to how the CodeSearchNet dataset was processed. We will release this additional data (C/C#) soon.

skye95git · 2021-09-23T09:45:59Z

> We parse it with tree-sitter, similar to how the CodeSearchNet dataset was processed. We will release this additional data (C/C#) soon.

That's cool. In addition to the C/C# data you will release, I want to parse C and C# source code that we collected ourselves. Could you share the parsing code for C and C#?

There is a fork of the function_parser library from GitHub's CodeSearchNet Challenge repo. It currently supports six languages: Python, Java, Go, PHP, Ruby, and JavaScript, but not C or C#. I tried using tree-sitter in the same way as for the CodeSearchNet dataset to parse C and C#. Unfortunately, the results were not satisfactory.

I would be grateful if you could share the C and C# parsing code. I plan to fork function_parser and update it, which could help more people.

skye95git · 2021-09-24T08:33:03Z

Hi, what is the difference between concode_field_sep and concode_elem_sep in the nl field of the Concode dataset? Which one marks a variable, and which one marks a function?

The CodeXGLUE description of CONCODE says: "nl combines natural language description and class environment. Elements in class environment are separated by special tokens like con_elem_sep and con_func_sep."

If concode_elem_sep corresponds to con_elem_sep, it should mark a variable, but the content of dev.json seems to differ.
Line 269 of dev.json:

{
    "code": "int function ( double [ ] arg0 , double [ ] arg1 ) { int loc0 = arg0 . length - arg1 . length ; outer : for ( int loc1 = 0 ; loc1 <= loc0 ; loc1 ++ ) { for ( int loc2 = 0 ; loc2 < arg1 . length ; loc2 ++ ) { if ( ne ( arg0 [ loc1 + loc2 ] , arg1 [ loc2 ] ) ) { continue outer ; } } return ( loc1 ) ; } return ( - 1 ) ; }",
    "nl": "searches for the first subsequence of a that matches sub elementwise . elements of sub are considered to match elements of a if they pass the #eq test . concode_field_sep double max_ratio concode_elem_sep double min_ratio concode_elem_sep boolean off concode_field_sep boolean isElemMatch concode_elem_sep int compare concode_elem_sep boolean isSubset concode_elem_sep boolean ne concode_elem_sep boolean lt concode_elem_sep boolean gte concode_elem_sep void set_rel_diff concode_elem_sep boolean eq concode_elem_sep boolean lte concode_elem_sep boolean gt"
}

The env in nl is

concode_field_sep double max_ratio 
concode_elem_sep double min_ratio 
concode_elem_sep boolean off 
concode_field_sep boolean isElemMatch 
concode_elem_sep int compare 
concode_elem_sep boolean isSubset 
concode_elem_sep boolean ne 
concode_elem_sep boolean lt 
concode_elem_sep boolean gte 
concode_elem_sep void set_rel_diff 
concode_elem_sep boolean eq 
concode_elem_sep boolean lte 
concode_elem_sep boolean gt

The code is

int function(double[] arg0, double[] arg1) {
    int loc0 = arg0.length - arg1.length;
    outer:
    for (int loc1 = 0; loc1 <= loc0; loc1++) {
        for (int loc2 = 0; loc2 < arg1.length; loc2++) {
            if (ne(arg0[loc1 + loc2], arg1[loc2])) {
                continue outer;
            }
        }
        return (loc1);
    }
    return (-1);
}

The ne() in the code field is a function, not a variable, but in the env it follows concode_elem_sep. So I'm a little confused.
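
Judging only from the dev.json example above, concode_field_sep appears to separate groups (the description, then a group of member variables, then a group of methods), while concode_elem_sep separates the entries within a group. Under that reading, ne follows concode_elem_sep simply because it is one of several methods in the same group. The sketch below splits an nl string accordingly; split_concode_nl is a hypothetical helper, the sample string is shortened from the example above, and the interpretation is an assumption that is not confirmed in this thread.

```python
FIELD_SEP = "concode_field_sep"
ELEM_SEP = "concode_elem_sep"


def split_concode_nl(nl):
    """Split an nl string into (description, list of element groups)."""
    parts = [part.strip() for part in nl.split(FIELD_SEP)]
    description = parts[0]
    # Each remaining part is one group; split it into its elements.
    groups = [
        [elem.strip() for elem in part.split(ELEM_SEP) if elem.strip()]
        for part in parts[1:]
    ]
    return description, groups


nl = ("searches for the first subsequence . "
      "concode_field_sep double max_ratio concode_elem_sep boolean off "
      "concode_field_sep boolean ne concode_elem_sep boolean gt")
description, groups = split_concode_nl(nl)
print(description)
for group in groups:
    print(group)
```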

skye95git · 2021-10-09T09:53:08Z

> We currently do not have a plan to release the pre-training code, which should not be difficult to implement based on the paper. We are also happy to take questions regarding its implementation.

Hi, I am trying to implement the pre-training code and have a couple of questions about the pre-training data:

The paper says you use CodeSearchNet as pre-training data. Do I need to preprocess CodeSearchNet before pre-training? If so, how should it be preprocessed?
The statistics in Table 1 of the CodeT5 paper differ somewhat from those in Table 1 of the CodeSearchNet paper: CodeT5 uses less pre-training data than the original CodeSearchNet data. Did you clean the data before pre-training?

skye95git · 2021-10-12T02:14:22Z

The dataset used to fine-tune the code generation task is Concode, which contains only Java. So can CodeT5 only generate Java code, or can it generate code in all eight languages used for pre-training? If it can, does that mean CodeT5 can generate code directly, without fine-tuning?

yuewang-cuhk closed this as completed Oct 5, 2021
