Pull request for #28: gslab_scons restructure#47
Conversation
Conflicts: gslab_scons/tests/log/make.log
Conflicts: gslab_scons/tests/log/make.log
|
Request noted. Barring something new on congress_text, this is up next for me. |
arosenbe
left a comment
There was a problem hiding this comment.
Nice job @stanfordquan! Don't let the number of comments suggest that your work wasn't very good. They're more my desire to make sure that everything is up to snuff.
To your specific comments in the changelog. Do you want my comments on the decisions you made, or are you just reporting them?
Some comments of my own that I couldn't fit into particular comments on the code.
-
It might be good to document in a README the way in which we specify executables in the
sourceandtargetlists inSCons(i.e. the first element of the list.) This would remove redundancy in the docstrings of the individual functions.- We could also use the README to give instructions on how to use the builders.
-
There's a typo in the docstring to the
current_timefunction inmisc.py. -
Update the version in the README
-
As a verb, one should use "set up."
-
It seems like there's some redundancy in the Prelims, Close log, and Append builder-log, etc. sections across the builders. Can I get your opinion on handling them in a single more unified function? For the record, I do like how clean and clear the current system is, and I'm not positive that abstraction would improve readability (the tradeoff between reading something once/making explicit that different applications use the same logic on the one hand and the benefit of keeping everything more or less in the same place on the other). This is a point for discussion, not a recommendation.
- I think we should be using
'ab'and'rU'when writing to and reading from logs. - We're vulnerable to race conditions because several scripts may be reading from and writing to the same logs at the same time. To the extent that the order of output in the logs is useful to us, we have to modify this.
- How should we handle logging when one script builds targets across several directories in build?
- How do you want to handle the concerns you note in the restructuring log file box.
- I think we should be using
-
I prefer to keep import statements from third_party and GSLab modules together. See the
from sys...import in build_stata. -
build_stata.pyis missing the#headings to break the script into logical chunks. -
I'm not sure how I feel about the stata helpers living in the same place as the more general helpers. It might create headache for future users if they have to scroll through several function with possibly relevant names in order to find the function their looking for.
-
The
make_list_if_stringfunction will also make lists out of lists, dictionaries, ints, pandas data frames, etc. -
There are different numbers of blank lines before the return statement in each builder.
| start_time = misc.current_time() | ||
|
|
||
| # Setup source file | ||
| source_file = str(source[0]) |
There was a problem hiding this comment.
What are the cases when the first element of the "source list" would not be already be a string? If there are such cases, do we want to coerce to a string and try to run, or do we want to raise an Exception, or do we want to leave "as is" and have SCons raise an exception when it tries to execute, say, an int? The last might be the most Pythonic, and I'd be in favor of the first. This applies to all builders.
| def check_code_extension(source_file, software): | ||
| def check_code_extension(source_file, extension): | ||
| ''' | ||
| This function raises an exception if `source_file`'s extension |
There was a problem hiding this comment.
I'm not a huge fan of the visual on
`source_file`'s
Maybe we could change it to the extension of source_file? What do you think?
| misc.check_code_extension(source_file, '.lyx') | ||
|
|
||
| # Setup target file and log file | ||
| newpdf = source_file.replace('.lyx','.pdf') |
There was a problem hiding this comment.
This won't handle edge.lyx.case.lyx well.
| # Move rendered pdf to the target | ||
| shutil.move(newpdf, target_file) | ||
| except subprocess.CalledProcessError: | ||
| message = system_call_error("lyx", command) |
There was a problem hiding this comment.
I can't seem to find the system_call_error function. Did you mean command_error_msg? Applies to all such cases.
| import os | ||
| import subprocess | ||
| import shutil | ||
| import gslab_scons.misc as misc |
There was a problem hiding this comment.
Do we want to specify the particular functions that we import from gslab_scons.misc? I'm not sure, but it seems silly to always drag around the Stata stuff into a Lyx builder. If so, then this applies to all such import statements.
There was a problem hiding this comment.
I'm agnostic about this, so I'll implement it.
There was a problem hiding this comment.
On second thought, I disagree. There are only three Stata only helper functions in misc; this method means we'll be importing four to five helper functions separately and I'm not convinced that will be cleaner than the current implementation.
| @@ -2,27 +2,42 @@ | |||
| import sys | |||
There was a problem hiding this comment.
The import statements in this file are structured differently than those in the builders. Could we treat them in a unified manner?
|
|
||
| if not (mode in ['develop', 'cache', 'release']): | ||
| print("Error: %s is not a defined mode" % mode) | ||
| sys.exit() |
There was a problem hiding this comment.
I assume there'a a reason, but can you explain why we prefer the print and sys.exit combination to the more parsimonious raise Exception('message') option?
| print("Error: Version must be defined in release mode") | ||
| sys.exit() | ||
|
|
||
| start_message = "\n{0}{0}{0} New build: " + misc.current_time() + " {0}{0}{0}" |
There was a problem hiding this comment.
This might be better handled with sprintf style syntax.
| f.seek(0, 0) | ||
| f.write('Log created: ' + start_time + '\n' + | ||
| 'Log completed: ' + end_time + '\n \n' + content) | ||
| f.write('\n*** Builder log created: ' + start_time + '\n' + |
There was a problem hiding this comment.
Maybe sprintf format here as well?
| def log_timestamp(start_time, end_time, filename): | ||
| def log_timestamp(start_time, end_time, filename = 'sconstruct.log'): | ||
| '''Adds beginning and ending times to a log file.''' | ||
| with open(filename, mode = 'r+U') as f: |
There was a problem hiding this comment.
Why do we use mode 'r+U' instead of our standard 'rU'?
|
See this comment on the R builder in the sister issue in gslab-econ/gslab_template#27. |
|
Back on this after short vacation in TRANSville. |
Conflicts: gslab_scons/builders/build_python.py gslab_scons/builders/build_r.py gslab_scons/builders/build_stata.py gslab_scons/misc.py gslab_scons/tests/log/make.log gslab_scons/tests/test_misc.py
|
Remaining action items
|
|
@arosenbe what I understand from our discussion is that regarding to logging, we will:
Let me know if you agree. |
different appending mechanism from builder logs to sconstruct.log
|
Regarding the concerns that were raised, here is how I chose to tackle them:
which is in response to the MG's desire to separate source code and assets in template.
|
arosenbe
left a comment
There was a problem hiding this comment.
Nice work @stanfordquan. To your three points
- Cool
- I like this implementation a lot.
- Great
I think there are some stray files lying around. Make sure to remove them before merging.
| misc.check_code_extension(source_file, '.lyx') | ||
|
|
||
| # Set up target file and log file | ||
| newpdf = source_file.replace('.lyx','.pdf') |
There was a problem hiding this comment.
Doesn't this handle edge.lyx.case.lyx poorly?
| shell = True) | ||
| except subprocess.CalledProcessError: | ||
| message = misc.command_error_msg("Python", command) | ||
| raise BadExecutableError(message) |
There was a problem hiding this comment.
It doesn't look like BadExecutableError is loaded.
| log_timestamp(start_time, end_time, log_file) | ||
|
|
||
| return None | ||
|
No newline at end of file |
| builder_log_collect = {} | ||
|
|
||
| for root, dirs, files in os.walk(parent_dir): # take a walk in parent and child dirs | ||
| for file in files: |
There was a problem hiding this comment.
Let's not overwrite base python objects.
for f in files:| for file in files: | ||
| if file.endswith("sconscript.log"): | ||
| full_file = os.path.join(root, file) | ||
| with open(full_file, 'rU') as f: |
There was a problem hiding this comment.
We can make this f f_full.
| if file.endswith("sconscript.log"): | ||
| full_file = os.path.join(root, file) | ||
| with open(full_file, 'rU') as f: | ||
| s = f.readlines()[1] # skip a line |
There was a problem hiding this comment.
Why do we use the variable s?
|
|
||
| # scan sconstruct.log for start time (if looks unpythonic, see xkcd 1171) | ||
| with open('sconstruct.log', "r") as f: | ||
| s = f.readline() |
There was a problem hiding this comment.
Why do we use the variable s here?
|
|
||
| def log_timestamp(start_time, end_time, filename): | ||
| with open('sconstruct.log', "a") as sconstruct: | ||
| for file in this_run_list: |
There was a problem hiding this comment.
Don't overwrite base python.
| 'linux2': '-b'} | ||
| option = options[platform] | ||
| command = flavor + ' ' + option + ' %s ' + cl_arg # %s will take filename later | ||
| command = flavor + ' ' + option + ' %s %s' # %s will take filename later |
There was a problem hiding this comment.
Resolved, @arosenbe happy to leave it the way it is.
| Windows platform. | ||
| ''' | ||
| command = flavor + ' /e do' + ' %s ' + cl_arg # %s will take filename later | ||
| command = flavor + ' /e do' + ' %s %s' # %s will take filename later |
There was a problem hiding this comment.
Resolved, @arosenbe happy to leave it the way it is.
|
I'm getting the following error when trying to run the template. I think you're brining too much information into (gslab-python/issue28_restructure) template$ scons
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/GSLab_Tools-3.0.4-py2.7.egg/gslab_scons/log.py", line 43, in end_log
builder_logs = collect_builder_logs(parent_dir)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/GSLab_Tools-3.0.4-py2.7.egg/gslab_scons/log.py", line 79, in collect_builder_logs
builder_log_end_time = datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data 'Log completed: 2017-02-06 14:31:02' does not match format '%Y-%m-%d %H:%M:%S'
Error in sys.exitfunc:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/GSLab_Tools-3.0.4-py2.7.egg/gslab_scons/log.py", line 43, in end_log
builder_logs = collect_builder_logs(parent_dir)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/GSLab_Tools-3.0.4-py2.7.egg/gslab_scons/log.py", line 79, in collect_builder_logs
builder_log_end_time = datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 325, in _strptime
(data_string, format))
ValueError: time data 'Log completed: 2017-02-06 14:31:02' does not match format '%Y-%m-%d %H:%M:%S'Figured out the issue. the new logger expects |
|
@arosenbe , your log files do not have the timestamp as specified, probably because you had a build folder. This release is backward incompatible due to the fact that these logs must have timestamps. |
|
Nice work @stanfordquan; good to merge. |
|
For We decided that it's good practice to direct these changes into a "dev_*" branch, where * is a new "major release" number in the semantic versioning. This seems to be what big projects do (e.g. here and here): directing issues into big milestones that are backward incompatible. This way we don't have to make backward incompatible changes in this repo and |
|
@stanfordquan I'm happy to defer to @gentzkow on that but it sounds reasonable to me. |
* Pull request for #28: gslab_scons restructure (#47) * #28 modify check_code_extension() * #28 make distinction between `target_dir` and `log_dir` for all but `build_stata()`, add code flow comments in builder * #28 minor change, `return None` switched to `return False` for logical check * #28 add informative BadExecutableError messages * #28 merging remote to local * #28 add single log file + appending mechanism * #28 refactor build_stata() * #28 add formatting for logs * #28 fix typo in docstring * #28 update error messages and tests * #28 add mode check in start_log() * #28 change formatting in master tests to match work in this task * #47 dev commit to work from home * #47 rework logging different appending mechanism from builder logs to sconstruct.log * #28 respond to PR * #28 make pretty * #28 fix lyx pdf issue * #28 respond to more comments * #28 fix typo * Pull request for #59: Fix size warnings (#60) * #59: Redesigned size testing * #59: debugging * #59: additional work on list_ignored_files() * #59: Additional testing progress * #59: Brought tests to 100% coverage * change * #60: responded to PR * 60: response to PR * #59: brought size_warning tests to 100% coverage * #59: removed sys.exit() and raw_input() from size warning function * #59: scons debrief function * #59: altered size warnings * #59: small revisions to size warnings * #60: misc changes * Add note about sudo * Update README.md * Pull request for #43 in Template: Configuration tests (#66) * Migrates configuration tests * Preliminary tests setup * Additional tests * More edits * Adds more tests * Adds more tests * Adds more tests * Adds more tests * Adds more unit tests * Adds more unit tests * Fixes load_yaml_value tests * Dev commit * Dev commit * Brings coverage to 100 percent * Changes name * Edits tests for dropping --sf command line call * Changes name back * Fixes name in function * #66 Implements prelim PR comments * #66 Other PR comments * #66: Standardised exception classes * #66: updated some tests * #66: misc changes to tests * #66: some more changes * #66: changed error message * #66: removed extra newline from yaml creation * #66: aesthetic changes * #66: updated version number to 4 * Pull request for gslab-econ/template#45: Standardise stata_executable (#67) * gslab-econ/template#45 Changed to 'stata_executable' * Fixed a comment... * #67: Responses to PR * #67: Response to PR * Pull request for #58: Prepare for 4.0 (#65) * #58 cosmetic changes * Issue 58 - implement comments to builders 1) Fix R builder command line argument issue. Will test if issue persists for other builders. 2) Time-log failure is now kinder (does not block completion of SConstruct, just raise a noisy warning) * #58 small changes during merge * Pull request for #69: Update tablefill builder with 4.0 logging machinery (#70) * #69 add logging functionalities * #69 decrease clutter in sconscript.log * #69 prettify output * Pull request for #75: Print traceback for python builder (#76) * #75 cat \nTraceback to error message in Python builder * #75 remove new line * #75 fix crashing log.py if sconscript.log is empty * Pull request for #68: Remove Google Drive from default release (#72) * issue #68 - Fix broken imports in test - Move `read_yaml_value` to `misc` from `config` - Fix appropriate tests broken because of above move * #68 accommodate atexit implementation of state_of_repo * #68 fix numargs * #68 - instead of _ for user-config.yaml * #68 fix regex error in _release_tools * #68 respond to PR * #68 move check_and_expand_cache_path to misc.py, rename check_and_expand_path * #68 response to PR about release paths * #68 remove prompt for github token * #68 respond to PR * #68 add logical gate to use getpass.getpass for github_token * Pull request for #78: Updates readme (#79) * #78 Updates readme * Update README.md * Pull request for #73: Allows tablefill to use tex documents (#82) * #73 Preliminary implementation * #73 Updates info * #73 add newline after log for easy viewing * Pull request for #77: Replace <*> with <source file name> in <sconscript_*.log> (#80) * #77 add optional log extension through way of log_ext in ENV * #77 revert matlab reordering * #77 fix GitKraken merge typos * 77 add newline back in tablefill logging * #77 fix unittests * #77 respond to PR * #77 remove stray backslash * #77 fix broken unittests that allow dumb input as env * #83 add coverage to dependencies (#84) * Pull request for #85: Find way to speed up log.py (#87) * #85 log search with CL tool * #85 new log search works for mac * #85 get logging working on windows * #87 #85 var named cmd -> command, filter + map removes null paths, new warning msg * Update log.py * Update README.md * Update README.md * Update release.py * #88 format release instructions * #88 remove repeated word in readme

Changelog:
check_code_extension
Done
target_file
Done
informative error message
Done
build_tables syntax
Done.
restructure build_stata
Implemented.
start_log mode = "develop, cache, release"
I implemented this, but it's not immediately obvious to me why we want to do this, since this is code related to the GSLab template, whereas this package is supposed to contain only general-purpose builders. I feel the same about the
check_lfs()that we do at beginning of log files.Change is implemented in both
templateandgslab_scons, and it will break compatibility if users updategslab_sconswithout updatingtemplatestarting this commit.tracks build time
Right now, tracking build time is the builder's job. It writes to stdout (and thus also the log) at the start and the end the machine's current time.
unit test for release.py
done in Create release.py #33
cache reconfig
I read the scons-configure-cache.py and relevant man pages. The use of this file seems to be reconfiguring the cache so that the subdirectory's prefix have different lengths (e.g. 2: 00 0A 0B, 3: 000 00A 00B...). That's the only information the config file in the cache repo holds, for what it's worth. I did not implement this suggestion but we should talk more about what this is for since I may not be up to page on this.
env input
As of now, my solution is to not edit the gslab_python but change the way we specify source code vs. input asset in the SConscripts themselves. I see this as the most clear solution. Runnerup solution is:
where the builder definition in SConstruct is modified so that it runs
but this requires frontloading the Builders definitions with more scripting that isn't necessarily cleaner than my proposed solution here. Either way, there's no gslab_python code change, just convention changes in template. I could have restrict "source" to be single-element string but that would be backward incompatible which I think is unnecessary at this point for our lab and collaborators.
Log files are now copied and appended into the overall SConstruct log in the following order:
SConstruct log also no longer overwrite with each build but rather append. I do not yet delete the SConscript.log files, which have the same behaviour as before: overwrite with each build. I figure it'll make it easier to debug specific logs, but I'm happy to simply delete the SConscript.log file after build.
I have some concerns about this implementation, in particular: accumulating records may introduce significant merge conflicts. Branches A and B from Master will have different log files after merging in. Even with timestamps, it'll make things confusing. I think that Git history tool or blame tool are appropriate for this use of searching the most recent run of a particular build, or printing out SConsign file, rather than this implementation.