-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a ML model for the patch recipients based on the recipients of sent patches #65
Comments
2: if we do go for a deep learning based technique, it would be better to pass the entire patch as the model would discover optimal higher level features from those. I am not sure if this is indeed a good usecase for a ML model as the problem statement isnt that fuzzy. There are a finite and tractable number of rules that can be written down that will give a definite answer to who are the maintainers of a patch. |
It would be helpful if the data available could be made public @bulwahn . If not the entire data set, I guess sufficient amount of data to train and experiment various models might be helpful. Also a metadata file would be highly useful |
All data is available in this repository here: https://github.com/lfd/PaStA-resources |
@rsarky Your hypothesis (2:) needs to be proved or disproved; I would not state that as a matter of fact. The task here would be to investigate that. |
I agree. I was feeling a bit off about point 2, because if we do indeed choose a subset of features such as those you mentioned, each of them has some rule that you could state that would help in determining the maintainers if I am not wrong. But I would be interested in seeing if this indeed works. Another question to consider is how do we form a good ground truth. If it is driven by get maintainers script then the model will be inherently limited. A manual ground truth is ideal but quite involved. |
Here the assumption is that we simply take the full email data as ground truth. How reliably can we predict who a patch will be sent to based on the previous observations? |
So if I understand correctly the assumption is that the existing email data gives a good approximation of who patches should be sent to? |
Yes, that would be the assumption of the first investigation. We may refine this by giving more weight on "confidence of its correctness" to patches that have been accepted vs. the ones being ignored, or based on the sender's known involvement, e.g., active for many years, known maintainer, etc. |
We have a lot of data from patches and we can use this for creating a ML model to predict the recipients a patch would be sent to, based on the existing data.
Various questions immediately arise:
The text was updated successfully, but these errors were encountered: