-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update RA manual and standardize language across documentation #20
Comments
Thanks @z-y-huang. I've read over the wiki and made some minor edits. Here are some additional to dos I'd suggest we knock off before sending for review.
|
Got it. It sounds like from the emails that we want to support Jupyter Notebooks, which I just added with a run function called |
Hmm.
Can the installation of Jupyter be handled as a dependency in
/setup/requirements.txt? Or do you think it would need to be installed
manually?
… |
I believe we can add Jupyter to the |
Great. Seems like a desiterata generally that our repos play nicely with
Anaconda python installations.
|
Hi @gentzkow: I pushed a branch addressing the items here. Four comments:
|
Great. On (1), I understand that reasoning. I think the rule we want to stick to is just that code can never reference a directory outside its own module. I continue to go back and forth between these two structures we've discussed before /data/ with subdirectory /raw/ or /raw/ The former would correspond to a case where /raw/ is consumed only by code within /data/ and then (most) everything downstream consumes /data/output/ but not /data/raw/. The latter would correspond to a case where many different modules read from /raw/. An annoying detail is what to do about things like image files we want to use only for slides. Current plan would be those go in /paper_slides/raw/. In the second scheme, are we OK having /raw/ both as a standalone module and as a sub within other modules? Or do we want to force those image files to all be in the top-level /raw/ directory. Let me know your views. On (2), I like the idea of a utility script but let's sequence that later after we finish tidying up and do the next round of review. For now let's just make sure the manual instructions are clear. Good on (3)-(4). One note: on the hard-committed input files for paper_slides what we currently have doesn't work. I still can't compile paper.lyx without running make.py first. I think the input files were committed as symbolic links rather than actual files. |
Also, this looks like a typo in setup_repository.py...
|
Can we rename setup_repository.py -> check_setup.py? That seems more accurate if the script is not doing any actual setup just checking if it's correct. |
I get the following error even when I run make.py in /analysis/, where there are no files >0.5mb.
|
My mistake, I didn't push a bugfix. I fixed the typo and committed the hard files in The size check function is currently repository-wide, but I can make it only walk through the module if that's what we want. |
Ah, I see. Yes, I think it would make sense to have the size check function be module-specific. It would also be great if it could list the offending files. Do you know why I would have been getting that warning on a clean checkout of the template? Presumably it should be set up so all large files are tracked w/ LFS. |
The list of offending files are saved in The data-generating example script is saving TXTs instead of a CSVs, which we don't have added to LFS. It looks like |
I personally like the idea of a centralized I think either way, we do want a clear distinction between files that are used only for a specific module vs. files that are used across multiple modules. That way, the dependencies are clearly outlined and it's much less likely that someone forgets to run all of the modules that rely on a file. |
Great. Thanks. On the large files, saving in make.log is fine. Let's update the warning message to point the user to make.log for the list. Let's switch the data-generating example so it saves .csv rather than .txt and add .tar to LFS. (We might want to make a note to clean up the previously committed versions of these at some point so the repository isn't heavier-weight than necessary; will matter since this repo will be a base for all other repos.) On raw, let's go to a centralized There is a general question of how we look at dependencies downstream. |
…s to CSV, rewrite warning/error messages, and add tar files to LFS
I have addressed the comments here and here and have updated all of the documentation accordingly. I can put (i) cleaning the commit history (the |
Great! I think we're basically done and ready to go to review. The only small thing I'd suggest is that we store some actual raw data in /raw/ that is then consumed by /data/ (rather than having the code in /data/ generate the data itself). We might as well make the example code a little more live by having a raw data set in /raw/ then doing some transformations of it in /data/. When we send for review we should make clear to everyone that it's next on the to do list to replace all of this with a more interesting / rich set of example scripts. Oh, and also: I think I'm going to backtrack on my suggestion to have the .ipynb code as part of the example code in the main repo. It's great that you wrote the run_jupyter function and we definitely will want to put a jupyter notebook directory as one of the add-on examples we include but I think it will be simpler if all the examples are plain vanilla python. |
I added a dataset in |
…ation (#22) * Add Jupyter notebook support for #20 * Update run_jupyter to use Python API for #20 * Test out run_jupyter * Make run_jupyter Python version agnostic for #20 * Update README and setup_repository for #20 * Make edits to README for #20 * Simplify setup_repository * Track /input/ and /external/ for /paper_slides/, generalize executable for git-lfs for #20 * Bugfix and commit paper_slides input for #20 * Typechange for symlink conversion for #20 * Change check_repo_size to check_module size for #20 * Commit everything from 882ef90 * #20: Update documentation, standardize language, change TXT data files to CSV, rewrite warning/error messages, and add tar files to LFS * #20: Reword README to be repository agnostic and update setup scripts * Fix typo * #20 correct typo in readme.md * Add context to administrator mode in README * #20: Restructure repository to use already generated data file in /raw/ * #20: Delete notebook in analysis * #20: Set merge strategy to ours for log files * Fix .gitattributes reference * #20: Add merge strategy ours to config file * #20: Set merge strategy to binary for log files
gslab_make
The text was updated successfully, but these errors were encountered: