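This repository contains the replication package for the paper "Why is Developing Machine Learning Applications Challenging? A Study on Stack Overflow Posts", published at the 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2019).
The BibTeX entry for the paper is: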
@inproceedings{DBLP:conf/esem/AlshangitiSMLY19,
author = {Moayad Alshangiti and
Hitesh Sapkota and
Pradeep K. Murukannaiah and
Xumin Liu and
Qi Yu},
title = {Why is Developing Machine Learning Applications Challenging? {A} Study
on Stack Overflow Posts},
booktitle = {2019 {ACM/IEEE} International Symposium on Empirical Software Engineering
and Measurement, {ESEM} 2019, Porto de Galinhas, Recife, Brazil, September
19-20, 2019},
pages = {1--11},
publisher = {{IEEE}},
year = {2019},
url = {https://doi.org/10.1109/ESEM.2019.8870187},
doi = {10.1109/ESEM.2019.8870187},
timestamp = {Wed, 23 Oct 2019 17:15:06 +0200},
biburl = {https://dblp.org/rec/bib/conf/esem/AlshangitiSMLY19},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
The package consists of the following:
- code.R: The R code needed to replicate every figure in the paper, with detailed comments explaining each step.
- quantitative_sample: This folder contains the Stack Overflow sample used in the quantitative study: 86,983 ML-related question posts. Where a question has an accepted answer, that answer is included as well. The folder also contains the web development sample used in RQ1 to compare response times between web development questions and machine learning questions.
- qualitative_sample: This folder contains the Stack Overflow sample used in the qualitative study: 684 ML-related questions asked by 50 unique users, together with their labels. The user expertise labels are also provided.
- custom: This folder contains all other data used in the paper: the LDA and topic-term matrices for the 30 discovered topics, the tags and their statistics, and the ExpertiseRank score generated to compare the number of experts in machine learning against web development.
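
The RQ1 comparison mentioned above (response time for machine learning questions versus web development questions) can be illustrated with a small sketch. This is not the authors' code, which is in code.R; it is a standalone Python example under stated assumptions: the timestamp format, the function names, and the toy values are all hypothetical and are not taken from the files in this package.

```python
from datetime import datetime
from statistics import median

# Assumed timestamp format; the real files may use a different one.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def response_hours(asked: str, answered: str) -> float:
    """Hours between a question's creation and its accepted answer."""
    delta = datetime.strptime(answered, TIME_FORMAT) - datetime.strptime(asked, TIME_FORMAT)
    return delta.total_seconds() / 3600.0


def median_response_hours(pairs) -> float:
    """Median response time, in hours, over (asked, answered) timestamp pairs."""
    return median(response_hours(asked, answered) for asked, answered in pairs)


# Toy, made-up timestamps for illustration only (not data from the paper).
ml_pairs = [
    ("2019-01-01 10:00:00", "2019-01-01 16:00:00"),  # 6 hours
    ("2019-01-02 09:00:00", "2019-01-03 09:00:00"),  # 24 hours
    ("2019-01-05 12:00:00", "2019-01-05 14:00:00"),  # 2 hours
]
web_pairs = [
    ("2019-01-01 10:00:00", "2019-01-01 10:30:00"),  # 0.5 hours
    ("2019-01-02 09:00:00", "2019-01-02 11:00:00"),  # 2 hours
    ("2019-01-05 12:00:00", "2019-01-05 13:00:00"),  # 1 hour
]

ml_median = median_response_hours(ml_pairs)
web_median = median_response_hours(web_pairs)
print(f"ML median: {ml_median} h, web median: {web_median} h")
```

The median is used here because response times tend to be heavily skewed; whether the paper uses the median or another statistic should be checked against code.R.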