Python Matching Script and Associated Changes #100
@andersonfrailey, taxdata pull request #100 is excellent work. Thanks for your efforts on this complicated task.
@andersonfrailey, I have a couple of questions about taxdata pull request #100.
@martinholmer, I looked into this. Here's the email I sent @andersonfrailey with my conclusions: "After looking into SciPy's linsolve, it doesn't look very reliable. This pull request from March 2017 proposes a completely different solver from the one implemented now. It also points out a number of places where the current SciPy solver performs suboptimally. So, if we use SciPy, we would have to carefully inspect the results to make sure that they are accurate. On the other hand, according to this paper by a government contractor, CLP is the best open source linear program solver, and CyLp is a Python interface to this package. The paper was written in 2013, but I think the findings are still relevant. The downsides of CyLp are that it is not compatible with Python 3 and it is not being actively maintained. Although there is some activity on its GitHub page, there does not appear to be any major development. For now, I think that we should keep using CyLp. It's just more reliable right now. We should keep an eye on the development of SciPy's linsolve. I would much rather be in the SciPy/NumPy sphere." Here's an additional link to a thread on the lin_solve package development on the SciPy mailing list.
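The point about carefully inspecting solver results can be illustrated with a check that does not depend on which solver produced the answer. The following is a minimal sketch only, assuming a linear program of the form "minimize c·x subject to A_ub x <= b_ub and x >= 0". The function name `check_lp_solution` and the two-variable toy problem are hypothetical and are not part of taxdata, CyLp, or SciPy.

```python
def check_lp_solution(c, A_ub, b_ub, x, lower=0.0, tol=1e-8):
    """Report feasibility and objective value of a candidate LP solution.

    Assumed problem form (an illustration, not the taxdata LP):
        minimize   sum(c[i] * x[i])
        subject to A_ub @ x <= b_ub  and  x[i] >= lower
    """
    # Amount by which each inequality row is exceeded (negative means slack).
    row_excess = [
        sum(a * xi for a, xi in zip(row, x)) - b
        for row, b in zip(A_ub, b_ub)
    ]
    # Amount by which each variable falls below its lower bound.
    bound_excess = [lower - xi for xi in x]
    # Worst violation across all constraints, floored at zero.
    worst = max(row_excess + bound_excess + [0.0])
    objective = sum(ci * xi for ci, xi in zip(c, x))
    return {
        "feasible": worst <= tol,
        "max_violation": worst,
        "objective": objective,
    }


# Hypothetical toy problem: minimize -x - 2y
# subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0.
c = [-1.0, -2.0]
A_ub = [[1.0, 1.0], [1.0, 3.0]]
b_ub = [4.0, 6.0]

good = check_lp_solution(c, A_ub, b_ub, [3.0, 1.0])  # a vertex of the feasible region
bad = check_lp_solution(c, A_ub, b_ub, [4.0, 1.0])   # breaks both inequality rows
print(good)
print(bad)
```

Running the same check on the output of two different solvers, and comparing the reported objective values, is one inexpensive way to spot a solver that returns an infeasible or clearly worse answer.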
Correct.
I believe we should keep the age_consistency function. I added a more detailed explanation in issue #92.
@hdoupe and I have talked about this recently. I agree it would be preferable to use the SciPy package, but after @hdoupe did some digging into it, we concluded that it wouldn't be an improvement over CyLp yet. Hopefully sometime down the road it will be better suited to our needs.
@martinholmer, setting up cylp is a pain. I had to install Python 2.7, and then
@martinholmer said
I totally agree we should make this process easier for users. At the beginning there wasn't a very specific reason to use cylp. It was simply the only LP solver in Python that I could get to work with a large equation set like the one we have. I'm happy to offer help if anyone has the bandwidth to revamp the code for SciPy.
@andersonfrailey, Please do something like this:
Meanwhile I'm running
This PR is the culmination of the work discussed in #92. It removes all of the SAS matching scripts and replaces them with Python scripts.
Instructions for how to run the scripts can be found in the updated `README.md` file.
It also updates `stage2/3.py` and the stage 2 notebook so that they can be used with the Python-produced matched file.
@martinholmer @MattHJensen @Amy-Xu @hdoupe