Replication package for the paper submission. This repository contains an archive with the EM-Assist IntelliJ IDEA plugin, together with installation and usage instructions.
- Overview
- Requirements
- OpenAI key configuration
- Plugin installation
- Usage example
- Prompt example
- Datasets
- Developer Survey
## Overview

An IntelliJ IDEA plugin that recommends Extract Function refactoring based on the LLM's suggestions.
The plugin is available for free on JetBrains Marketplace.
## Requirements

- IntelliJ IDEA 2023.1 or higher.
- A configured OpenAI key.
## OpenAI key configuration

- Sign up for OpenAI at https://beta.openai.com/signup.
- Get your OpenAI API key.
- Open IntelliJ IDEA, go to Settings | Tools | Large Language Models, and enter your API key in the OpenAI Key field.
## Plugin installation

Install the EM-Assist plugin from JetBrains Marketplace.
## Usage example

To use the plugin, right-click on the method, select Show Context Actions, and click on Extract Function experiment.
The plugin will then send a request to OpenAI and await a response. Once received, it will present the suggestions in a dialog box, and the corresponding code for extraction will be highlighted.
To choose one of the suggestions, just right-click on it, and the plugin will automatically apply the Extract Function refactoring.
## Prompt example

Below you can find a sample prompt structure:
OpenAiChatMessage(
"system",
"""
You are a skilled software developer. You have immense knowledge on software refactoring.
You communicate with a remote server that sends you code of functions (one function in a message) that it wants to simplify by applying extract method refactoring.
In return, you send a JSON object with suggestions of helpful extract method refactorings. It is important for suggestions to not contain the entire function body.
Each suggestion consists of the start line, end line, and name for the extracted function.
The JSON should have the following format: [{"function_name": <new function name>, "line_start": <line start>, "line_end": <line end>}, ..., ].
""".trimIndent()
),
OpenAiChatMessage(
"user",
"""
280. public void connect(Figure figure) {
281. if (fObservedFigure != null)
282. fObservedFigure.removeFigureChangeListener(this);
283.
284. fObservedFigure = figure;
285. fLocator = new OffsetLocator(figure.connectedTextLocator(this));
286. fObservedFigure.addFigureChangeListener(this);
287. if (fLocator != null) {
288. Point p = fLocator.locate(fObservedFigure);
289. p.x -= size().width/2 + fOriginX;
290. p.y -= size().height/2 + fOriginY;
291.
292. if (p.x != 0 || p.y != 0) {
293. willChange();
294. basicMoveBy(p.x, p.y);
295. changed();
296. }
297. }
298. }
""".trimIndent()
),
OpenAiChatMessage(
"assistant",
"""
[
{"function_name": "updateLocator", "line_start": 288, "line_end": 296}
]
""".trimIndent()
),
OpenAiChatMessage(
"user",
"""
92. public void mouseUp(MouseEvent e, int x, int y) {
93. if (e.isPopupTrigger()) {
94. Figure figure = drawing().findFigure(e.getX(), e.getY());
95. if (figure != null) {
96. Object attribute = figure.getAttribute(Figure.POPUP_MENU);
97. if (attribute == null) {
98. figure = drawing().findFigureInside(e.getX(), e.getY());
99. }
100. if (figure != null) {
101. showPopupMenu(figure, e.getX(), e.getY(), e.getComponent());
102. }
103. }
104. }
105. else if (e.getClickCount() == 2) {
106. handleMouseDoubleClick(e, x, y);
107. }
108. else {
109. super.mouseUp(e, x, y);
110. handleMouseUp(e, x, y);
111. handleMouseClick(e, x, y);
112. }
113. }
""".trimIndent()
),
OpenAiChatMessage(
"assistant",
"""
[
{"function_name": "computeFigure", "line_start": 94, "line_end": 103},
{"function_name": "computeAttribute", "line_start": 96, "line_end": 102}
]
""".trimIndent()
),
OpenAiChatMessage(
"user",
codeSnippet
)
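The reply format requested by the system prompt is plain JSON, so it can be parsed and validated with a few lines of code. The sketch below is a hypothetical post-processing helper, not part of the plugin itself (which is written in Kotlin); it is shown only to illustrate the expected reply shape:

```python
import json

def parse_suggestions(raw_reply: str) -> list[dict]:
    """Parse an LLM reply in the format requested by the system prompt:
    [{"function_name": ..., "line_start": ..., "line_end": ...}, ...]
    Malformed JSON raises json.JSONDecodeError."""
    suggestions = json.loads(raw_reply)
    # Keep only well-formed entries with a valid line range.
    return [
        s for s in suggestions
        if {"function_name", "line_start", "line_end"} <= s.keys()
        and s["line_start"] <= s["line_end"]
    ]

reply = '[{"function_name": "updateLocator", "line_start": 288, "line_end": 296}]'
print(parse_suggestions(reply))
```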
Here's an actual example of a prompt used:
2023-10-12 11:30:35,883 [2976602] INFO - #c.i.m.l.t.models - Sending request to OpenAI API with model=gpt-3.5-turbo and
messages=[OpenAiChatMessage(role=system, content=You are a skilled software developer. You have immense knowledge on software refactoring.
You communicate with a remote server that sends you code of functions (one function in a message) that it wants to simplify by applying extract method refactoring.
In return, you send a JSON object with suggestions of helpful extract method refactorings. It is important for suggestions to not contain the entire function body.
Each suggestion consists of the start line, end line, and name for the extracted function.
The JSON should have the following format: [{"function_name": <new function name>, "line_start": <line start>, "line_end": <line end>}, ..., ].),
OpenAiChatMessage(role=user, content=
280. public void connect(Figure figure) {
281. if (fObservedFigure != null)
282. fObservedFigure.removeFigureChangeListener(this);
283.
284. fObservedFigure = figure;
285. fLocator = new OffsetLocator(figure.connectedTextLocator(this));
286. fObservedFigure.addFigureChangeListener(this);
287. if (fLocator != null) {
288. Point p = fLocator.locate(fObservedFigure);
289. p.x -= size().width/2 + fOriginX;
290. p.y -= size().height/2 + fOriginY;
291.
292. if (p.x != 0 || p.y != 0) {
293. willChange();
294. basicMoveBy(p.x, p.y);
295. changed();
296. }
297. }
298. }),
OpenAiChatMessage(role=assistant, content=[
{"function_name": "updateLocator", "line_start": 288, "line_end": 296}
]), OpenAiChatMessage(role=user, content=
92. public void mouseUp(MouseEvent e, int x, int y) {
93. if (e.isPopupTrigger()) {
94. Figure figure = drawing().findFigure(e.getX(), e.getY());
95. if (figure != null) {
96. Object attribute = figure.getAttribute(Figure.POPUP_MENU);
97. if (attribute == null) {
98. figure = drawing().findFigureInside(e.getX(), e.getY());
99. }
100. if (figure != null) {
101. showPopupMenu(figure, e.getX(), e.getY(), e.getComponent());
102. }
103. }
104. }
105. else if (e.getClickCount() == 2) {
106. handleMouseDoubleClick(e, x, y);
107. }
108. else {
109. super.mouseUp(e, x, y);
110. handleMouseUp(e, x, y);
111. handleMouseClick(e, x, y);
112. }
113. }), OpenAiChatMessage(role=assistant, content=[
{"function_name": "computeFigure", "line_start": 94, "line_end": 103},
{"function_name": "computeAttribute", "line_start": 96, "line_end": 102}
]), OpenAiChatMessage(role=user, content=
63. static void writeJvmClass(JvmClass jvmClass, DataOutput out) throws IOException {
64. writeJVMClassNode(jvmClass, out);
65. out.writeUTF(jvmClass.getSuperFqName());
66. out.writeUTF(jvmClass.getOuterFqName());
67. // Write myInterfaces;
68. int interfacesCount = 0;
69. for (String myInterface : jvmClass.getInterfaces()) {
70. interfacesCount++;
71. }
72. DataInputOutputUtil.writeINT(out, interfacesCount);
73. for (String myInterface : jvmClass.getInterfaces()) {
74. out.writeUTF(myInterface);
75. }
76. // Write myFields
77. int fieldsCount = 0;
78. for (JvmField field : jvmClass.getFields()) fieldsCount++;
79. DataInputOutputUtil.writeINT(out, fieldsCount);
80. for (JvmField field : jvmClass.getFields()) {
81. writeJvmField(field, out);
82. }
83.
84. // Write myMethods
85. int methodCount = 0;
86. for (JvmMethod jvmMethod : jvmClass.getMethods()) methodCount++;
87. DataInputOutputUtil.writeINT(out, methodCount);
88. for (JvmMethod jvmMethod : jvmClass.getMethods()) {
89. writeJvmMethod(jvmMethod, out);
90. }
91.
92. // Write AnnotationTargets
93. int elemTypeCount = 0;
94. for (ElemType elemType : jvmClass.getAnnotationTargets()) elemTypeCount++;
95. DataInputOutputUtil.writeINT(out, elemTypeCount);
96. for (ElemType elemType : jvmClass.getAnnotationTargets()) {
97. writeElemType(elemType, out);
98. }
99.
100. if (jvmClass.getRetentionPolicy() != null) {
101. out.writeUTF(jvmClass.getRetentionPolicy().name());
102. }
103. else {
104. out.writeUTF("");
105. }
106. })]
2023-10-12 11:30:39,250 [2979969] INFO - #c.i.m.llm - Raw response:
{
"id": "chatcmpl-88ss0b30S6ZiMDQ24ImloFrYjVrEy",
"object": "chat.completion",
"created": 1697128236,
"model": "gpt-3.5-turbo-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "[\n
{\"function_name\": \"writeInterfaces\", \"line_start\": 69, \"line_end\": 75},\n
{\"function_name\": \"writeFields\", \"line_start\": 78, \"line_end\": 82},\n
{\"function_name\": \"writeMethods\", \"line_start\": 86, \"line_end\": 90},\n
{\"function_name\": \"writeAnnotationTargets\", \"line_start\": 94, \"line_end\": 98}\n]"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1116,
"completion_tokens": 99,
"total_tokens": 1215
}
}
## Datasets

The datasets are accessible via the provided link.
To validate our technique, we used the following datasets:
- Community Corpus consists of 122 Java methods and their corresponding Extract Method refactorings collected from five open-source repositories: MyWebMart, SelfPlanner, WikiDev, JHotDraw, and JUnit. This dataset previously served as the foundation for evaluating various state-of-the-art Extract Method refactoring automation tools, including JExtract, JDeodorant, SEMI, GEMS, and REMS.
- Extended Corpus: To enhance the robustness of our evaluation with a sizable oracle of actual refactorings performed by developers, we constructed the Extended Corpus. To create it, we employed RefactoringMiner to detect Extract Method refactorings, running it on 12 highly regarded open-source repositories: CoreNLP, infinispan, HtmlUnit, robovm, google/guava, mbassador, spring-boot, google/gson, smart-doc, bytes-java, apache/datasketches-java, and javaparser. After filtering out refactoring commits that mixed in feature additions, as well as one-line extractions and extracted methods whose bodies overlapped a large proportion of the host method, we retained 1752 Extract Method refactorings from these repositories.
The datasets included in this repository are represented in JSON format. We have used MongoDB to perform queries and navigate through the data.
Each record has an attribute named "oracle". For example:
{
  ...
  "oracle": {
    "line_start": 612,
    "line_end": 630,
    "url": "https://github.com/JetBrains/intellij-community/tree/405abc6878abe05f755a5a0a349a880139b9163e/plugins/maven/src/test/java/org/jetbrains/idea/maven/dom/MavenFilteredPropertiesCompletionAndResolutionTest.java#L612-L630"
  },
  ...
}
The "oracle" represents the Extract Method refactoring that was performed by the developer. This oracle is used throughout the evaluation process.
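For instance, whether a candidate reproduces the oracle can be checked with a simple line-range comparison. The sketch below is a hypothetical helper, not part of our tooling; an exact-match criterion is assumed purely for illustration:

```python
def matches_oracle(candidate: dict, oracle: dict) -> bool:
    """Exact-match check: the candidate covers the same line range
    that the developer actually extracted (illustrative criterion)."""
    return (candidate["line_start"] == oracle["line_start"]
            and candidate["line_end"] == oracle["line_end"])

oracle = {"line_start": 612, "line_end": 630}
print(matches_oracle({"line_start": 612, "line_end": 630}, oracle))  # True
print(matches_oracle({"line_start": 612, "line_end": 629}, oracle))  # False
```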
Our study involved querying multiple LLMs at various temperatures. The LLM response data was saved and later used to answer RQ1, RQ2, and RQ3. It can be found in the "response_extracted" attribute. For example:
{
  ...
  "llm_multishot_data": {
    "temperature_<temperature_value>": [
      {
        "response_extracted": "[{\"function_name\": \"extractedMethodName\", \"line_start\": 442, \"line_end\": 443}\n]",
        "shot_no": 0
      },
      ...
    ],
    ...
  }
}
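To traverse this structure, one can iterate over the temperatures and shots. The hypothetical sketch below collects every "response_extracted" string stored in a record:

```python
def collect_responses(record: dict) -> list[str]:
    """Gather all raw LLM replies stored under llm_multishot_data,
    across all temperatures and shots."""
    out = []
    for shots in record.get("llm_multishot_data", {}).values():
        for shot in shots:
            out.append(shot["response_extracted"])
    return out

record = {
    "llm_multishot_data": {
        "temperature_1.0": [
            {"response_extracted": '[{"function_name": "f", "line_start": 1, "line_end": 2}]',
             "shot_no": 0}
        ]
    }
}
print(collect_responses(record))
```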
Suggestions coming from LLMs are transformed into candidates by our tool. To answer RQ3 (How effective is EM-Assist in providing refactoring recommendations compared to existing approaches?), we used the data stored in the "jetgpt_ranking_samples" property, as shown in the example below. There are 30 objects corresponding to 30 random samples of the LLM data; we report averages over these samples.
{
  ...
  "jetgpt_ranking_samples": [
    {
      "<HEURISTICS_KEY>": {
        "temperature_<temperature_value>": {
          "rank_by_popularity_times_heat": [
            {
              "candidate_type": "AS_IS",
              "application_result": "OK",
              "line_start": 612,
              "line_end": 630,
              ...
            }
          ],
          ...
        }
      }
    },
    ...
  ],
  ...
}
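Since the reported numbers are averages over the 30 samples, post-processing amounts to iterating over "jetgpt_ranking_samples". The sketch below is hypothetical: it assumes one heuristics key and one temperature per sample, and an exact match against the oracle, and computes the fraction of samples whose top-ranked candidate hits the oracle:

```python
def top1_hit_rate(samples: list[dict], oracle: dict) -> float:
    """Fraction of samples whose first-ranked candidate matches the
    oracle line range exactly (illustrative criterion)."""
    hits = 0
    for sample in samples:
        for by_temperature in sample.values():        # "<HEURISTICS_KEY>"
            for ranking in by_temperature.values():   # "temperature_<value>"
                top = ranking["rank_by_popularity_times_heat"][0]
                if (top["line_start"] == oracle["line_start"]
                        and top["line_end"] == oracle["line_end"]):
                    hits += 1
    return hits / len(samples)

oracle = {"line_start": 612, "line_end": 630}
samples = [
    {"H1": {"temperature_1.0": {"rank_by_popularity_times_heat": [
        {"candidate_type": "AS_IS", "application_result": "OK",
         "line_start": 612, "line_end": 630}]}}},
    {"H1": {"temperature_1.0": {"rank_by_popularity_times_heat": [
        {"candidate_type": "AS_IS", "application_result": "OK",
         "line_start": 600, "line_end": 610}]}}},
]
print(top1_hit_rate(samples, oracle))  # 0.5
```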
The data used to answer RQ1 (How effective are LLMs at generating refactoring suggestions?) can be found in the "suggestion_evaluation" JSON attribute:
{
  ...
  "suggestion_evaluation": {
    "llm_multishot_data": {
      "temperature_<temperature_value>": [
        {
          "candidate_type": "AS_IS",
          "application_result": "OK",
          "application_reason": "",
          ...
        }
      ]
    }
  },
  ...
}
To further strengthen the validity of our results, we applied EM-Assist to the Extended Corpus, which includes 1752 actual refactorings from open-source projects. We also applied JExtract, the previous best-in-class static-analysis tool, to the same dataset. The raw data for the JExtract results is stored in the "jextract_result" JSON attribute:
{
  ...
  "jextract_result": {
    "candidates": [
      {
        "line_start": 274,
        "line_end": 276,
        "length": 3
      }
    ]
  },
  ...
}
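The same oracle comparison applies to the JExtract output. The hypothetical sketch below checks whether any JExtract candidate reproduces the developer's extraction (exact line-range match, assumed for illustration):

```python
def jextract_hits_oracle(record: dict, oracle: dict) -> bool:
    """True if any candidate under jextract_result matches the oracle range."""
    candidates = record.get("jextract_result", {}).get("candidates", [])
    return any(c["line_start"] == oracle["line_start"]
               and c["line_end"] == oracle["line_end"]
               for c in candidates)

record = {"jextract_result": {"candidates": [
    {"line_start": 274, "line_end": 276, "length": 3}]}}
print(jextract_hits_oracle(record, {"line_start": 274, "line_end": 276}))  # True
```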
The raw data described above was further processed to obtain various statistical analysis data:

- LLM effectiveness: `datasets/evaluation/extended_corpus/tool_evaluation__extended_corpus_llm_effectiveness_RQ1.csv`
- Our tool: `datasets/evaluation/extended_corpus/tool_evaluation__extended_corpus_ranking_RQ3.csv`
- LiveRef: `datasets/evaluation/extended_corpus/tool_evaluation__extended_corpus_ranking_RQ3.csv`
## Developer Survey

Since our tool is a code renovation tool that developers use interactively, it is important to evaluate whether it makes suggestions that developers accept. We employed firehouse surveys with professional developers from our collaborating enterprises, focusing on newly created long methods they had committed into code repositories. We then report their responses regarding the refactoring recommendations proposed by our tool.

The table below summarizes the results meant to answer RQ4: How useful are the provided recommendations to developers?
| User | Method | Screenshot | Strongly Agree | Agree | Somewhat Agree | Somewhat Disagree | Disagree | Strongly Disagree |
|---|---|---|---|---|---|---|---|---|
| P1 | doGetSeverity() | screenshot_p1 | X | | | | | |
| P2 | buildExternalProjectHierarchy() | screenshot_p2 | X | | | | | |
| P3 | fillQualifierAsArgumentContributor() | screenshot_p3 | X | | | | | |
| P4 | createDependencyDataNode() | screenshot_p4 | X | | | | | |
| P5 | findMethodToRun() | screenshot_p5 | X | | | | | |
| P6 | loadPluginModules() | screenshot_p6 | X | | | | | |
| P7 | showInlayRunToCursor() | screenshot_p7 | X | | | | | |
| P8 | doGetIllegalDependencies() | screenshot_p8 | X | | | | | |
| P9 | stripTextBlockIndent() | screenshot_p9 | X | | | | | |
| P10 | checkLibraries() | screenshot_p10 | X | | | | | |
| P11 | step() | screenshot_p11 | X | | | | | |
| P12 | checkXInput() | screenshot_p12 | X | | | | | |
| P13 | collectAnnotations() | screenshot_p13 | X | | | | | |
| P14 | writeJvmClass() | screenshot_p14 | X | | | | | |
| P15 | detectLombokJarsSlow() | screenshot_p15 | X | | | | | |
| P16 | getProcessOutput() | screenshot_p16 | X | | | | | |