How to use TOGA on my own project/tests? #1

Open
jose opened this issue Apr 24, 2022 · 6 comments

@jose

jose commented Apr 24, 2022

Dear @elizabethdinella,

Let me say that you've done very nice and interesting work and, of course, thank you for sharing this artifact with the community.

Quick question: how would one use TOGA on their own project / tests? If possible, a step-by-step guide on how to execute the many scripts would be great.

--
Best,
Jose

@elizabethdinella
Collaborator

Hi Jose. Thank you for your kind words :)

To run TOGA on your project, you can use toga.py in the root directory.
TOGA takes 2 arguments: an input data file and a metadata file.

The input file should be a CSV with (focal_method, test_prefix) pairs.
The metadata file should specify the columns (project, bug_num, test_name, exception_bug, assertion_bug, exception_lbl, assertion_lbl, assert_err).

Rather than actually executing the test and oracles, we check the "expected" oracle in the metadata against the predicted oracle and report an evaluation of our technique. If you would like to execute your tests with TOGA oracles, the outputs are saved in predicted_oracles.csv file upon running toga.py.

If you don't want to construct such an elaborate metadata file (we understand, it's formatted for Defects4J bugs), you can pass in a dummy metadata file. For example, copy over the one in data/evosuite_reaching_tests. The reported results from toga.py won't be correct, but the predicted_oracles.csv will be.

Hope this helps!

@jose
Author

jose commented Apr 26, 2022

Yes, it does help, thanks.

Could you please describe each column and/or the values each column might take? Thanks in advance.

  • the input data file, i.e., focal_method and test_prefix.
  • the metadata file, i.e., project, bug_num, test_name, exception_bug, assertion_bug, exception_lbl, assertion_lbl, and assert_err.

Some columns are self-explanatory: project might be, e.g., Chart or Math, bug_num might be 7, and test_name is the name of the test case. Others, such as exception_bug, assertion_bug, exception_lbl, assertion_lbl, and assert_err, are not as easy to understand. Also, what is the expected format of the focal_method column? Same question for the test_prefix column: is it a single string (perhaps without newlines) of the test case's source code?

@elizabethdinella
Collaborator

Apologies for the confusion.

inputs file:

  • focal_method is the source of the method under test as a single string (newlines are OK).

  • test_prefix is the source of your test (newlines are also OK).

meta file:

  • exception_bug: 1/0 binary label. If you are testing exceptional behavior, the entry should be 1.
  • assertion_bug: 1/0 binary label. Should be the opposite of exception_bug. If you are testing non-exceptional behavior (an assertion), the entry should be 1.
  • exception_lbl: True/False. True if an exception is expected and False otherwise.
  • assertion_lbl: the assertion written in your test source, as a string. If your test has multiple assertions, split them into multiple rows.
  • assert_err: the error printed when the assertion is violated. This is something we used for Defects4J testing. You can leave this as an empty string when testing on your own project.
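
As a rough illustration of the column descriptions above, here is a minimal Python sketch that builds matching rows for the inputs file and the meta file. Several things here are assumptions rather than anything confirmed in this thread: that both files carry a header row, that the inputs row is repeated once per assertion so the two files stay aligned, and the helper names `build_rows` and `write_csv`, which are hypothetical and not part of TOGA.

```python
import csv

# Column names taken from the descriptions in this thread.
INPUT_COLUMNS = ["focal_method", "test_prefix"]
META_COLUMNS = ["project", "bug_num", "test_name", "exception_bug",
                "assertion_bug", "exception_lbl", "assertion_lbl", "assert_err"]


def build_rows(project, bug_num, test_name, focal_method, test_prefix,
               assertions, expects_exception):
    """Return (input_rows, meta_rows) for one test.

    Assumption: one row per assertion, with the inputs row repeated so that
    row i of the inputs file corresponds to row i of the meta file.
    """
    if assertions and not expects_exception:
        labels = list(assertions)
    else:
        labels = [""]  # exceptional test, or no assertion to record
    input_rows, meta_rows = [], []
    for assertion in labels:
        input_rows.append({"focal_method": focal_method,
                           "test_prefix": test_prefix})
        meta_rows.append({
            "project": project,
            "bug_num": bug_num,
            "test_name": test_name,
            "exception_bug": int(expects_exception),      # 1 for exceptional behavior
            "assertion_bug": int(not expects_exception),  # opposite of exception_bug
            "exception_lbl": expects_exception,           # written as True/False
            "assertion_lbl": assertion,
            "assert_err": "",                             # left empty outside Defects4J
        })
    return input_rows, meta_rows


def write_csv(path, columns, rows):
    """Write rows to a CSV file; the csv module quotes fields with newlines."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `build_rows` once per test, concatenating the results, and passing them to `write_csv` with `INPUT_COLUMNS` and `META_COLUMNS` would give two files with the same number of rows. Multi-line source strings need no special handling because the `csv` module quotes them.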

@bentodaniel

Hello @elizabethdinella,
I have managed to run TOGA but I have a few questions.

  1. Regarding the input file, could we give TOGA no focal_method and let it try its best or is the focal method 100% necessary?
  2. As for the meta file, you mention a dummy could be used. Could a dummy be a file with just commas (i.e., no data in the file)?

After running TOGA, a series of files is generated: assertion_preds.csv, assert_model_inputs.csv, exception_preds.csv, except_model_inputs.csv, oracle_preds.csv and results.csv. I imagine the first 4 are used by the model for inference and, therefore, are not something we really need to understand.

Could you describe what the other two files are for and what each column represents?

  • oracle_preds.csv file -> project, bug_num, test_name, test_prefix, except_pred, assert_pred
  • results.csv file -> , project, bug_num, test_name, exception_bug, assertion_bug, exception_lbl, assertion_lbl, assert_err, id, except_pred, except_correct, assert_pred, assert_correct, assert_bug_found, except_bug_found, expected_except_bug, unexpected_except_bug, bug_found, tp, fp, tn, fn

Thank you in advance,
Daniel

@elizabethdinella
Collaborator

Hi Daniel,

  1. Although we evaluated using the focal method signature and docstring, a focal method input is not necessarily required. Feel free to input the empty string or anything else.
  2. If you populate the meta file with empty strings, the models should still be invoked correctly and the oracle_preds.csv will be populated. The toga.py script will crash before populating the results.csv file, but if you're only interested in seeing model outputs this should work fine :)
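
For anyone who wants to generate such an empty meta file rather than write it by hand, a minimal Python sketch follows. It assumes the inputs CSV has a header row and that the meta file should have one row per input row; the function name `write_dummy_meta` is hypothetical and not part of TOGA.

```python
import csv

# Meta columns as listed earlier in this thread.
META_COLUMNS = ["project", "bug_num", "test_name", "exception_bug",
                "assertion_bug", "exception_lbl", "assertion_lbl", "assert_err"]


def write_dummy_meta(inputs_path, meta_path):
    """Write one all-empty meta row per data row of the inputs CSV.

    Returns the number of dummy rows written. Rows are counted with the csv
    module so that multi-line focal_method/test_prefix fields count once.
    """
    with open(inputs_path, newline="", encoding="utf-8") as f:
        n_rows = sum(1 for _ in csv.DictReader(f))
    with open(meta_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(META_COLUMNS)
        for _ in range(n_rows):
            # Eight empty strings, which the csv module writes as a line of commas.
            writer.writerow([""] * len(META_COLUMNS))
    return n_rows
```

As noted above, with a meta file like this only the model outputs are meaningful; the evaluation step is expected to fail.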

The oracle_preds.csv file shows the outputs of our model(s) for each input sample. The important columns are except_pred and assert_pred. If except_pred == 1, our system predicted that an exception is expected on the test_prefix. If an exception is expected, assert_pred will be empty. In the other case, where except_pred == 0, the assert_pred column will indicate the system's predicted assertion.
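
The rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the exact encoding of except_pred in the file (1, 1.0, or True) is not stated in this thread, so the sketch accepts all three, and the helper names `describe_oracle` and `load_oracles` are hypothetical.

```python
import csv


def describe_oracle(row):
    """Turn one oracle_preds.csv row (a dict) into a readable oracle."""
    # Assumption: a positive exception prediction is stored as 1, 1.0 or True.
    if str(row["except_pred"]).strip().lower() in {"1", "1.0", "true"}:
        return "expect exception"
    # except_pred == 0: the predicted assertion lives in assert_pred.
    assertion = (row.get("assert_pred") or "").strip()
    return assertion if assertion else "no assertion predicted"


def load_oracles(path):
    """Return a list of (test_name, oracle description) pairs from the file."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["test_name"], describe_oracle(row))
                for row in csv.DictReader(f)]
```

Rows that come back as "no assertion predicted" are the ones where except_pred is 0 but assert_pred is empty, which may be worth inspecting separately.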

The results.csv file is a bit more complicated. It stores the information necessary to compute whether our predictions were correct. The important columns are except_correct and assert_correct. If these are both true, the model predicted the test oracle correctly. Other columns that might be useful are tp, fp, tn, and fn. These columns contain binary entries (1/0) based on the outputs of the model and the entries exception_bug and assertion_bug in the metadata input file.
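
To get a quick summary out of results.csv, something like the following Python sketch might help. It assumes the boolean columns are stored as 1/0 or True/False strings, which this thread does not confirm, and the helper names `is_true`, `summarize`, and `summarize_file` are hypothetical.

```python
import csv


def is_true(value):
    """Interpret a CSV cell as a boolean (assumed encodings: 1, 1.0, True)."""
    return str(value).strip().lower() in {"1", "1.0", "true"}


def summarize(rows):
    """Count fully correct oracles and total tp/fp/tn/fn over result rows."""
    summary = {"total": 0, "oracle_correct": 0,
               "tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for row in rows:
        summary["total"] += 1
        # An oracle counts as correct only if both parts were predicted correctly.
        if is_true(row["except_correct"]) and is_true(row["assert_correct"]):
            summary["oracle_correct"] += 1
        for key in ("tp", "fp", "tn", "fn"):
            summary[key] += int(is_true(row[key]))
    return summary


def summarize_file(path):
    """Summarize a results.csv file on disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return summarize(csv.DictReader(f))
```

These numbers are only meaningful when the meta file contains real labels; with a dummy meta file they should be ignored.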

@qingshanyuluo

Hello @elizabethdinella
When I change just this:
[image]
and then run toga.py,

the contents of oracle_preds.csv change to this:
[image]
and assert_pred is missing.
