-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Review Sherlock Extension on Practice Task #18
Comments
@gentzkow, @anacarolinapq, @snairdesai let me know what you think. Also @anacarolinapq could you please tag other lab members who might be trying out the Sherlock Extension task? I think the purpose of the task is to give lab members experience with using Sherlock within their workflow. Thus, I think the current task should be modified to fit into the GitHub workflow. That would entail creating a new issue, creating a new branch, cloning the branch on Sherlock, editing a script in a module, submitting a job on Sherlock, committing the change, pushing it to the branch, and creating the pull request. My suggestion for the extra task would be to cluster the standard errors at the county level on both regressions and adding a note on the tables mentioning the fixed effects the model uses and that the std. errors are clustered at the county level. Below you can find my comments for each bullet on Sherlock Extension —————————————————————
This is helpful and clear —————————————————————
Some comments on the “Installing and loading software” subsection on the referenced RA Manual section: ————————————————————— “Setting up GitHub” was to me the most confusing instruction as there is no guidance on the cited documentation. In essence, what “setting up GitHub” means is to have authorization to perform git commands on your repositories (repositories for ongoing projects are usually private). To do this, it would be useful to point lab members to this page in the GitHub Docs. —————————————————————
If the lab member successfully connected to GitHub (use —————————————————————
The first step is to create a Usually this step can be skipped if we plan to use the default file created by fetching the The second step is to install miniconda and oracle-idk. As mentioned above, Homebrew is not available on Sherlock. For practice sake, lab users could install both of these packages in their —————————————————————
Before this bullet, we could include checking if the setup process was done correctly as a step. If you load all the required modules in Sherlock, running —————————————————————
Here I would add a reference to the Research clusters section in the RA Manual. I would even go further to make everyone’s life easier and provide one (or several) bash script templates for use in the lab. The
————————————————————— Additional comments If we decide to include the extra task (clustering std. errors at county level), it would be useful to add guidance on SSHFS. I provided some info related to SSHFS below. The task could be done directly on the terminal (i.e. editing the Python script using nano), using SSHFS to edit the script on your preferred IDE, or using the longer process described below:
|
Following up with a few additional comments.
|
Thanks @jc-cisneros @snairdesai for all the great work here. Having your eyes on how to improve this task is very helpful. I'm happy to defer to you on the details, but the changes you outline above look broadly fine. One note: Let's keep a clear distinction between instructions that are specific to the practice task and instructions that apply more broadly to setting up and using Sherlock. The latter should live in an appropriate place in the RA manual separate from the practice task. The practice task instructions can then link to them. |
@jc-cisneros and I made a revised Markdown which can be integrated into the RA manual if helpful. We edited both the Practice Task section and the Research Clusters section. You can find our script here. Tagging here for edits/feedback: @gentzkow @anacarolinapq. |
Thanks @snairdesai. This looks good to me on a quick review. One thing to flag for further discussion: We've run into various points at which the use of Conda for the template causes friction. The whole point of using Conda is to reduce friction. Should we think about making the template more flexible so using Conda is optional rather than required? Also, can you remind me why we need oracle-jdk? |
|
(1) OK, I agree about local use. The question is whether we could set up the template so you have the option not to use Conda. I think that might only require giving instructions for how to install the Python and R dependencies listed in (2) I meant in general. I just forget what we're using that for in the template. |
@gentzkow @jc-cisneros @anacarolinapq In response to (1) relating to Conda, we can make Conda optional for users. There are a couple of paths here:
The first option comes with two key drawbacks. First, we have to change the instructions in the README every time a dependency is changed. This could lead to some errors on our side. Also, the burden on the user has increased. We would have to hope they install each dependency correctly, conditional on the instructions being up to date. The second option might be more robust. The script can first check the user's local application versions (i.e., for Python and R), and compare these to the dependency versions listed in @jc-cisneros and I have built a basic script which attempts to do this. You can find it in the last code chunk here. We have also drafted revised template instructions to provide this functionality for non-conda users in the Colab. We can create an issue and move these to GitHub if helpful. The script currently runs successfully with template, notifying the user that they do not have a conda environment active on their local computer. There are a few drawbacks/comments here:
In response to (2) relating to oracle-jdk, we have not had much luck determining why oracle is needed to run template. When we run the template with conda, but without installing or initializing oracle jdk, everything works as expected. @anacarolinapq and @szahedian were also unsure why oracle-jdk is needed. Here is the earliest commit I could find referencing oracle-jdk in the template (discussed in Issue #30 in Our best guess is that oracle-jdk might make the conda setup process slightly faster, but it can likely be dropped. In terms of next steps, @jc-cisneros and I are now trying to see if it is possible to integrate our build script into Sherlock, since Conda is more likely to cause issues there. |
Thanks @snairdesai. My initial reaction is to prefer the first option (manual install) to the second (script). Here's my reasoning:
Let me know if that sounds right to you or if you think we should discuss further. You all are closer to the details than I am so my intuition may well be off. On oracle-jdk, I'd vote we remove it from the template. We can revisit if we discover doing so causes problems. |
Understood, we were also thinking through the different user groups locally (ii) vs. in Sherlock (i), and how lab members would require less instruction in (i). If Docker will be implemented in the local case (ii), then your point on debugging also makes sense. @jc-cisneros and I will update the template draft linked above to reflect this (including removing oracle-jdk). Happy to also draft generic instructions for a project as a test case if that would be useful. We might consider slightly revising the message printed to the user in the case that Also quick follow up to above, is there a reason we don't list the specific version numbers of packages in |
Great. Thanks. I like the idea of revising the error message. I'd also be in favor of including version numbers for all packages. |
@gentzkow We (@jc-cisneros and I) have finished working through three important deliverables here: 1. Rewriting the instructions for the Sherlock Extension Practice Task. If 1. looks good to you (text is attached below), I will go ahead and make these edits directly in Practice Task Addition2) Sherlock Extension
|
Thanks @snairdesai. This sounds great! Just a reminder that this task should be lower priority than any ongoing work for FB2020. |
@szahedian Tagging this PR for review whenever you get a chance |
@szahedian Just wanted to tag this PR again for review whenever you get a chance. In the meantime, I'll go ahead and close out this issue with a summary comment. Thanks again! |
SummaryIn this issue (#18), we (@jc-cisneros and @snairdesai) completed the following tasks:
|
The goal of this issue is to capture feedback from lab members testing the Sherlock Extension on the Practice Task.
The text was updated successfully, but these errors were encountered: