The dataset we constructed is available here.
Each CSV file contains all the crowdsourced test reports for one app, one report per line, in the following format: <report-id>, <description>, <screenshot-url>, <category-label>.
You can also construct a new dataset in the same format.
Note: screenshots are part of the crowdsourced test reports but are not used in LLMCluster.
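As a rough illustration of the line format above, the following sketch parses report rows into named fields. It is not taken from main.py. It assumes there is no header row and that descriptions containing commas are quoted; the `Report` type, the `read_reports` helper, and the sample rows are all hypothetical.

```python
import csv
import io
from typing import NamedTuple


class Report(NamedTuple):
    # Field names mirror the documented column order (hypothetical type).
    report_id: str
    description: str
    screenshot_url: str
    category_label: str


def read_reports(stream):
    """Parse rows of <report-id>, <description>, <screenshot-url>, <category-label>."""
    reports = []
    # skipinitialspace tolerates the space after each comma in the format.
    for row in csv.reader(stream, skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        reports.append(Report(*(field.strip() for field in row)))
    return reports


# Made-up sample rows, only to exercise the parser.
sample = io.StringIO(
    '1, "App crashes, then restarts on login", http://example.com/1.png, crash\n'
    '2, Button text is cut off, http://example.com/2.png, ui\n'
)
reports = read_reports(sample)
```

For a real file, pass an open file handle instead of the `StringIO` sample, for example `open(path, newline="", encoding="utf-8")`. The screenshot URL is parsed but can be ignored, since LLMCluster does not use screenshots.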
$ pip install openai scikit-learn evaluate
Set your OpenAI API key in the environment variable "OPENAI_API_KEY".
Make sure the "GPT-4o" model is available by running python llm.py.
You can also change the LLM service in llm.py.
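A pre-flight check along these lines can catch a missing key before any request is sent. This is only a sketch and does not reflect what llm.py actually does. The helpers `get_api_key` and `model_available` are hypothetical, the model identifier "gpt-4o" is assumed to match the model named above, and the second helper assumes version 1.0 or later of the `openai` package.

```python
import os


def get_api_key(env=None):
    """Return the key from OPENAI_API_KEY, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError('Set the environment variable "OPENAI_API_KEY" first.')
    return key


def model_available(model="gpt-4o"):
    """Ask the OpenAI API whether `model` can be retrieved with this key."""
    # Imported lazily so the key check above works without the package installed.
    from openai import OpenAI, OpenAIError

    client = OpenAI(api_key=get_api_key())
    try:
        client.models.retrieve(model)
    except OpenAIError:
        return False
    return True
```

Calling `model_available()` performs a real network request, so it needs a valid key and network access.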
$ python main.py <dataset-root> <app-id>
<app-id> is derived from the csv file name: app2.csv => app-id is 2.
For example, python main.py /workspace/llmcluster/reports 17
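The mapping between a CSV file name and its app-id can be sketched as below. Both helpers are hypothetical and are not part of main.py. The sketch assumes that file names follow the pattern app<N>.csv and that the files sit directly under <dataset-root>, which this README does not state explicitly.

```python
import re
from pathlib import Path


def app_id_from_filename(name):
    """Extract the numeric app-id from a name such as 'app2.csv'."""
    match = re.fullmatch(r"app(\d+)\.csv", Path(name).name)
    if match is None:
        raise ValueError(f"not an app<N>.csv file name: {name!r}")
    return int(match.group(1))


def csv_path(dataset_root, app_id):
    """Build the CSV path for an app-id (assumes files sit directly in the root)."""
    return Path(dataset_root) / f"app{app_id}.csv"
```

Under these assumptions, `csv_path("/workspace/llmcluster/reports", 17)` points at the file app17.csv inside the dataset root.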
main.py: implementation of the LLM-based crowdsourced test report clustering
eval.py: implementation of the evaluation of clustering result and generated summaries
llm.py: managing the LLM querying service
prompt.py: managing the task prompts for querying