Replication package for the paper submission. This repository contains an archive with the EM-Assist IntelliJ IDEA plugin, together with installation and usage instructions.
- Overview
- Requirements
- OpenAI key configuration
- Plugin installation
- Usage example
- Prompt example
- Datasets
- Developer Survey
## Overview

An IntelliJ IDEA plugin that recommends Extract Function refactoring based on the LLM's suggestions.
The plugin is available for free on JetBrains Marketplace.
## Requirements

- IntelliJ IDEA 2023.1 or higher.
- A configured OpenAI key.
## OpenAI key configuration

- Sign up for OpenAI at https://beta.openai.com/signup.
- Get your OpenAI API key.
- Open IntelliJ IDEA, go to Settings | Tools | Large Language Models, and enter your API key in the OpenAI Key field.
## Plugin installation

Install the EM-Assist plugin from JetBrains Marketplace.
## Usage example

To use the plugin, right-click on the method, select Show Context Actions, and click on Extract Function experiment.
The plugin will then send a request to OpenAI and await a response. Once received, it will present the suggestions in a dialog box, and the corresponding code for extraction will be highlighted.
To choose one of the suggestions, just right-click on it, and the plugin will automatically apply the Extract Function refactoring.
## Prompt example

Below you can find a sample prompt structure:
OpenAiChatMessage(
"system",
"""
You are a skilled software developer. You have immense knowledge on software refactoring.
You communicate with a remote server that sends you code of functions (one function in a message) that it wants to simplify by applying extract method refactoring.
In return, you send a JSON object with suggestions of helpful extract method refactorings. It is important for suggestions to not contain the entire function body.
Each suggestion consists of the start line, end line, and name for the extracted function.
The JSON should have the following format: [{"function_name": <new function name>, "line_start": <line start>, "line_end": <line end>}, ..., ].
""".trimIndent()
),
OpenAiChatMessage(
"user",
"""
280. public void connect(Figure figure) {
281. if (fObservedFigure != null)
282. fObservedFigure.removeFigureChangeListener(this);
283.
284. fObservedFigure = figure;
285. fLocator = new OffsetLocator(figure.connectedTextLocator(this));
286. fObservedFigure.addFigureChangeListener(this);
287. if (fLocator != null) {
288. Point p = fLocator.locate(fObservedFigure);
289. p.x -= size().width/2 + fOriginX;
290. p.y -= size().height/2 + fOriginY;
291.
292. if (p.x != 0 || p.y != 0) {
293. willChange();
294. basicMoveBy(p.x, p.y);
295. changed();
296. }
297. }
298. }
""".trimIndent()
),
OpenAiChatMessage(
"assistant",
"""
[
{"function_name": "updateLocator", "line_start": 288, "line_end": 296}
]
""".trimIndent()
),
OpenAiChatMessage(
"user",
"""
92. public void mouseUp(MouseEvent e, int x, int y) {
93. if (e.isPopupTrigger()) {
94. Figure figure = drawing().findFigure(e.getX(), e.getY());
95. if (figure != null) {
96. Object attribute = figure.getAttribute(Figure.POPUP_MENU);
97. if (attribute == null) {
98. figure = drawing().findFigureInside(e.getX(), e.getY());
99. }
100. if (figure != null) {
101. showPopupMenu(figure, e.getX(), e.getY(), e.getComponent());
102. }
103. }
104. }
105. else if (e.getClickCount() == 2) {
106. handleMouseDoubleClick(e, x, y);
107. }
108. else {
109. super.mouseUp(e, x, y);
110. handleMouseUp(e, x, y);
111. handleMouseClick(e, x, y);
112. }
113. }
""".trimIndent()
),
OpenAiChatMessage(
"assistant",
"""
[
{"function_name": "computeFigure", "line_start": 94, "line_end": 103},
{"function_name": "computeAttribute", "line_start": 96, "line_end": 102}
]
""".trimIndent()
),
OpenAiChatMessage(
"user",
codeSnippet
)
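The reply format requested by the system prompt is plain JSON, so it can be parsed and validated with a few lines of code. The sketch below is a hypothetical post-processing helper, not part of the plugin itself (which is written in Kotlin); it is shown only to illustrate the expected reply shape:

```python
import json

def parse_suggestions(raw_reply: str) -> list[dict]:
    """Parse an LLM reply in the format requested by the system prompt:
    [{"function_name": ..., "line_start": ..., "line_end": ...}, ...]
    Malformed JSON raises json.JSONDecodeError."""
    suggestions = json.loads(raw_reply)
    # Keep only well-formed entries with a valid line range.
    return [
        s for s in suggestions
        if {"function_name", "line_start", "line_end"} <= s.keys()
        and s["line_start"] <= s["line_end"]
    ]

reply = '[{"function_name": "updateLocator", "line_start": 288, "line_end": 296}]'
print(parse_suggestions(reply))
```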
Here's an actual example of a prompt used:
2023-10-12 11:30:35,883 [2976602] INFO - #c.i.m.l.t.models - Sending request to OpenAI API with model=gpt-3.5-turbo and
messages=[OpenAiChatMessage(role=system, content=You are a skilled software developer. You have immense knowledge on software refactoring.
You communicate with a remote server that sends you code of functions (one function in a message) that it wants to simplify by applying extract method refactoring.
In return, you send a JSON object with suggestions of helpful extract method refactorings. It is important for suggestions to not contain the entire function body.
Each suggestion consists of the start line, end line, and name for the extracted function.
The JSON should have the following format: [{"function_name": <new function name>, "line_start": <line start>, "line_end": <line end>}, ..., ].),
OpenAiChatMessage(role=user, content=
280. public void connect(Figure figure) {
281. if (fObservedFigure != null)
282. fObservedFigure.removeFigureChangeListener(this);
283.
284. fObservedFigure = figure;
285. fLocator = new OffsetLocator(figure.connectedTextLocator(this));
286. fObservedFigure.addFigureChangeListener(this);
287. if (fLocator != null) {
288. Point p = fLocator.locate(fObservedFigure);
289. p.x -= size().width/2 + fOriginX;
290. p.y -= size().height/2 + fOriginY;
291.
292. if (p.x != 0 || p.y != 0) {
293. willChange();
294. basicMoveBy(p.x, p.y);
295. changed();
296. }
297. }
298. }),
OpenAiChatMessage(role=assistant, content=[
{"function_name": "updateLocator", "line_start": 288, "line_end": 296}
]), OpenAiChatMessage(role=user, content=
92. public void mouseUp(MouseEvent e, int x, int y) {
93. if (e.isPopupTrigger()) {
94. Figure figure = drawing().findFigure(e.getX(), e.getY());
95. if (figure != null) {
96. Object attribute = figure.getAttribute(Figure.POPUP_MENU);
97. if (attribute == null) {
98. figure = drawing().findFigureInside(e.getX(), e.getY());
99. }
100. if (figure != null) {
101. showPopupMenu(figure, e.getX(), e.getY(), e.getComponent());
102. }
103. }
104. }
105. else if (e.getClickCount() == 2) {
106. handleMouseDoubleClick(e, x, y);
107. }
108. else {
109. super.mouseUp(e, x, y);
110. handleMouseUp(e, x, y);
111. handleMouseClick(e, x, y);
112. }
113. }), OpenAiChatMessage(role=assistant, content=[
{"function_name": "computeFigure", "line_start": 94, "line_end": 103},
{"function_name": "computeAttribute", "line_start": 96, "line_end": 102}
]), OpenAiChatMessage(role=user, content=
63. static void writeJvmClass(JvmClass jvmClass, DataOutput out) throws IOException {
64. writeJVMClassNode(jvmClass, out);
65. out.writeUTF(jvmClass.getSuperFqName());
66. out.writeUTF(jvmClass.getOuterFqName());
67. // Write myInterfaces;
68. int interfacesCount = 0;
69. for (String myInterface : jvmClass.getInterfaces()) {
70. interfacesCount++;
71. }
72. DataInputOutputUtil.writeINT(out, interfacesCount);
73. for (String myInterface : jvmClass.getInterfaces()) {
74. out.writeUTF(myInterface);
75. }
76. // Write myFields
77. int fieldsCount = 0;
78. for (JvmField field : jvmClass.getFields()) fieldsCount++;
79. DataInputOutputUtil.writeINT(out, fieldsCount);
80. for (JvmField field : jvmClass.getFields()) {
81. writeJvmField(field, out);
82. }
83.
84. // Write myMethods
85. int methodCount = 0;
86. for (JvmMethod jvmMethod : jvmClass.getMethods()) methodCount++;
87. DataInputOutputUtil.writeINT(out, methodCount);
88. for (JvmMethod jvmMethod : jvmClass.getMethods()) {
89. writeJvmMethod(jvmMethod, out);
90. }
91.
92. // Write AnnotationTargets
93. int elemTypeCount = 0;
94. for (ElemType elemType : jvmClass.getAnnotationTargets()) elemTypeCount++;
95. DataInputOutputUtil.writeINT(out, elemTypeCount);
96. for (ElemType elemType : jvmClass.getAnnotationTargets()) {
97. writeElemType(elemType, out);
98. }
99.
100. if (jvmClass.getRetentionPolicy() != null) {
101. out.writeUTF(jvmClass.getRetentionPolicy().name());
102. }
103. else {
104. out.writeUTF("");
105. }
106. })]
2023-10-12 11:30:39,250 [2979969] INFO - #c.i.m.llm - Raw response:
{
"id": "chatcmpl-88ss0b30S6ZiMDQ24ImloFrYjVrEy",
"object": "chat.completion",
"created": 1697128236,
"model": "gpt-3.5-turbo-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "[\n
{\"function_name\": \"writeInterfaces\", \"line_start\": 69, \"line_end\": 75},\n
{\"function_name\": \"writeFields\", \"line_start\": 78, \"line_end\": 82},\n
{\"function_name\": \"writeMethods\", \"line_start\": 86, \"line_end\": 90},\n
{\"function_name\": \"writeAnnotationTargets\", \"line_start\": 94, \"line_end\": 98}\n]"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 1116,
"completion_tokens": 99,
"total_tokens": 1215
}
}
## Datasets

The datasets are accessible via the provided link.
To validate our technique, we used the following datasets:
- Community Corpus consists of 122 Java methods and their corresponding Extract Method refactorings collected from five open-source repositories: MyWebMart, SelfPlanner, WikiDev, JHotDraw, and JUnit. This dataset previously served as the foundation for evaluating various state-of-the-art Extract Method refactoring automation tools, including JExtract, JDeodorant, SEMI, GEMS, and REMS.
- Extended Corpus: To enhance the robustness of our evaluation with a sizable oracle of actual refactorings performed by developers, we constructed the Extended Corpus. To create it, we employed RefactoringMiner to detect Extract Method refactorings, running it on 12 highly regarded open-source repositories: CoreNLP, infinispan, HtmlUnit, robovm, google/guava, mbassador, spring-boot, google/gson, smart-doc, bytes-java, apache/datasketches-java, and javaparser. After filtering out refactoring commits that mixed in feature additions, as well as one-line extractions and extracted methods whose bodies overlapped a large proportion of the host method, we retained 1752 Extract Method refactorings from these repositories.
The datasets included in this repository are represented in JSON format. We have used MongoDB to perform queries and navigate through the data.
Each record has an attribute named "oracle". For example:
{
  ...
  "oracle": {
    "line_start": 612,
    "line_end": 630,
    "url": "https://github.com/JetBrains/intellij-community/tree/405abc6878abe05f755a5a0a349a880139b9163e/plugins/maven/src/test/java/org/jetbrains/idea/maven/dom/MavenFilteredPropertiesCompletionAndResolutionTest.java#L612-L630"
  },
  ...
}
The "oracle" represents the Extract Method refactoring that was performed by the developer. This oracle is used throughout the evaluation process.
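For instance, whether a candidate reproduces the oracle can be checked with a simple line-range comparison. The sketch below is a hypothetical helper, not part of our tooling; an exact-match criterion is assumed purely for illustration:

```python
def matches_oracle(candidate: dict, oracle: dict) -> bool:
    """Exact-match check: the candidate covers the same line range
    that the developer actually extracted (illustrative criterion)."""
    return (candidate["line_start"] == oracle["line_start"]
            and candidate["line_end"] == oracle["line_end"])

oracle = {"line_start": 612, "line_end": 630}
print(matches_oracle({"line_start": 612, "line_end": 630}, oracle))  # True
print(matches_oracle({"line_start": 612, "line_end": 629}, oracle))  # False
```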
Our study involved querying multiple LLMs at various temperatures. The LLM response data was saved and later used to answer RQ1, RQ2, and RQ3. It can be found in the "response_extracted" attribute. For example:
{
  ...
  "llm_multishot_data": {
    "temperature_<temperature_value>": [
      {
        "response_extracted": "[{\"function_name\": \"extractedMethodName\", \"line_start\": 442, \"line_end\": 443}\n]",
        "shot_no": 0
      },
      ...
    ],
    ...
  }
}
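To traverse this structure, one can iterate over the temperatures and shots. The hypothetical sketch below collects every "response_extracted" string stored in a record:

```python
def collect_responses(record: dict) -> list[str]:
    """Gather all raw LLM replies stored under llm_multishot_data,
    across all temperatures and shots."""
    out = []
    for shots in record.get("llm_multishot_data", {}).values():
        for shot in shots:
            out.append(shot["response_extracted"])
    return out

record = {
    "llm_multishot_data": {
        "temperature_1.0": [
            {"response_extracted": '[{"function_name": "f", "line_start": 1, "line_end": 2}]',
             "shot_no": 0}
        ]
    }
}
print(collect_responses(record))
```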
Suggestions coming from LLMs are transformed into candidates by our tool. To answer RQ3 (How effective is EM-Assist in providing refactoring recommendations compared to existing approaches?), we used the data stored in the "jetgpt_ranking_samples" property, as shown in the example below. There are 30 objects corresponding to 30 random samples of the LLM data; we report averages over these samples.
{
  ...
  "jetgpt_ranking_samples": [
    {
      "<HEURISTICS_KEY>": {
        "temperature_<temperature_value>": {
          "rank_by_popularity_times_heat": [
            {
              "candidate_type": "AS_IS",
              "application_result": "OK",
              "line_start": 612,
              "line_end": 630,
              ...
            }
          ],
          ...
        }
      }
    },
    ...
  ],
  ...
}
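Since the reported numbers are averages over the 30 samples, post-processing amounts to iterating over "jetgpt_ranking_samples". The sketch below is hypothetical: it assumes one heuristics key and one temperature per sample, and an exact match against the oracle, and computes the fraction of samples whose top-ranked candidate hits the oracle:

```python
def top1_hit_rate(samples: list[dict], oracle: dict) -> float:
    """Fraction of samples whose first-ranked candidate matches the
    oracle line range exactly (illustrative criterion)."""
    hits = 0
    for sample in samples:
        for by_temperature in sample.values():        # "<HEURISTICS_KEY>"
            for ranking in by_temperature.values():   # "temperature_<value>"
                top = ranking["rank_by_popularity_times_heat"][0]
                if (top["line_start"] == oracle["line_start"]
                        and top["line_end"] == oracle["line_end"]):
                    hits += 1
    return hits / len(samples)

oracle = {"line_start": 612, "line_end": 630}
samples = [
    {"H1": {"temperature_1.0": {"rank_by_popularity_times_heat": [
        {"candidate_type": "AS_IS", "application_result": "OK",
         "line_start": 612, "line_end": 630}]}}},
    {"H1": {"temperature_1.0": {"rank_by_popularity_times_heat": [
        {"candidate_type": "AS_IS", "application_result": "OK",
         "line_start": 600, "line_end": 610}]}}},
]
print(top1_hit_rate(samples, oracle))  # 0.5
```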
The data used to answer RQ1 (How effective are LLMs at generating refactoring suggestions?) can be found in the "suggestion_evaluation" JSON attribute:
{
  ...
  "suggestion_evaluation": {
    "llm_multishot_data": {
      "temperature_<temperature_value>": [
        {
          "candidate_type": "AS_IS",
          "application_result": "OK",
          "application_reason": "",
          ...
        }
      ]
    }
  },
  ...
}
To further strengthen the validity of our results, we applied EM-Assist to the Extended Corpus, which includes 1752 actual refactorings from open-source projects. We also applied JExtract, the previous best-in-class static-analysis tool, to the same dataset. The raw data for the JExtract results is stored in the "jextract_result" JSON attribute:
{
  ...
  "jextract_result": {
    "candidates": [
      {
        "line_start": 274,
        "line_end": 276,
        "length": 3
      }
    ]
  },
  ...
}
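The same oracle comparison applies to the JExtract output. The hypothetical sketch below checks whether any JExtract candidate reproduces the developer's extraction (exact line-range match, assumed for illustration):

```python
def jextract_hits_oracle(record: dict, oracle: dict) -> bool:
    """True if any candidate under jextract_result matches the oracle range."""
    candidates = record.get("jextract_result", {}).get("candidates", [])
    return any(c["line_start"] == oracle["line_start"]
               and c["line_end"] == oracle["line_end"]
               for c in candidates)

record = {"jextract_result": {"candidates": [
    {"line_start": 274, "line_end": 276, "length": 3}]}}
print(jextract_hits_oracle(record, {"line_start": 274, "line_end": 276}))  # True
```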
The raw data described above was further processed to obtain various statistical analysis data:

- LLM effectiveness: `datasets/evaluation/extended_corpus/tool_evaluation__extended_corpus_llm_effectiveness_RQ1.csv`
- Our tool: `datasets/evaluation/extended_corpus/tool_evaluation__extended_corpus_ranking_RQ3.csv`
- LiveRef: `datasets/evaluation/extended_corpus/tool_evaluation__extended_corpus_ranking_RQ3.csv`
## Developer Survey

Since our tool is a code renovation tool that developers use interactively, it is important to evaluate whether it makes suggestions that developers accept. We employed firehouse surveys with professional developers from our collaborating enterprises, focusing on newly created long methods they had committed into code repositories. We then report their responses regarding the refactoring recommendations proposed by our tool.

The table below summarizes the results meant to answer RQ4: How useful are the provided recommendations to developers?
| User | Method | Screenshot | Strongly Agree | Agree | Somewhat Agree | Somewhat Disagree | Disagree | Strongly Disagree |
|---|---|---|---|---|---|---|---|---|
| P1 | doGetSeverity() | screenshot_p1 | X | | | | | |
| P2 | buildExternalProjectHierarchy() | screenshot_p2 | X | | | | | |
| P3 | fillQualifierAsArgumentContributor() | screenshot_p3 | X | | | | | |
| P4 | createDependencyDataNode() | screenshot_p4 | X | | | | | |
| P5 | findMethodToRun() | screenshot_p5 | X | | | | | |
| P6 | loadPluginModules() | screenshot_p6 | X | | | | | |
| P7 | showInlayRunToCursor() | screenshot_p7 | X | | | | | |
| P8 | doGetIllegalDependencies() | screenshot_p8 | X | | | | | |
| P9 | stripTextBlockIndent() | screenshot_p9 | X | | | | | |
| P10 | checkLibraries() | screenshot_p10 | X | | | | | |
| P11 | step() | screenshot_p11 | X | | | | | |
| P12 | checkXInput() | screenshot_p12 | X | | | | | |
| P13 | collectAnnotations() | screenshot_p13 | X | | | | | |
| P14 | writeJvmClass() | screenshot_p14 | X | | | | | |
| P15 | detectLombokJarsSlow() | screenshot_p15 | X | | | | | |
| P16 | getProcessOutput() | screenshot_p16 | X | | | | | |