Support extraction by filename as well as directory #324
Hi, can you fix the issues pointed out by the gitmate bot and refactor your stuff into atomic commits as described in http://coala.readthedocs.org/en/latest/Getting_Involved/Writing_Good_Commits/ ?
Thanks for submitting :)
Appeased the LineLengthBot with Yelp's preferred syntax (which is vertically long, but eh this line is super-awkward regardless). I'll refactor my commits later today or tomorrow. (I started this change on Yelp's old fork of babel, so this version was more of a copy/paste job. But I'll make the history make some sense.)
@ENuge cool, ping us when it's done.
strip_comment_tags=self.strip_comments
)
else:
    extracted = check_and_call_extract_file(path, method_map,
akx
Jan 12, 2016
Member
Could you use the same indent style as in the extract_from_dir
call above? :)
ENuge
Jan 13, 2016
Author
Contributor
Yup! Done.
if os.path.isdir(path):
    filepath = os.path.normpath(os.path.join(path, filename))
else:
    filepath = filename
akx
Jan 12, 2016
Member
Add a comment like # already normalized
options = odict
if callback:
    callback(filename, method, options)
for lineno, message, comments, context in \
akx
Jan 12, 2016
Member
Maybe don't extract the tuple here?
for message_tuple in extract...:
    yield (filename,) + message_tuple
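
For illustration only (a sketch with made-up names, not the code from this PR): the outer generator can prepend the filename to whatever tuple the inner extractor yields, without naming the individual fields. `extract_stub` and `extract_with_filename` are hypothetical.

```python
def extract_stub(lines):
    """Hypothetical stand-in for an extractor that yields
    (lineno, message, comments, context) tuples."""
    for lineno, line in enumerate(lines, 1):
        yield (lineno, line.strip(), [], None)


def extract_with_filename(filename, lines):
    # Prepend the filename without unpacking the inner tuple, so this
    # loop keeps working if the extractor's tuple ever gains a field.
    for message_tuple in extract_stub(lines):
        yield (filename,) + message_tuple
```

The trade-off is that the field names no longer appear at the call site, so the docstring has to carry that information.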
filename = relpath(filepath, dirpath)

for pattern, method in method_map:
    if pathmatch(pattern, filename):
akx
Jan 12, 2016
Member
Would be better to invert this if (avoids over-indentation):
if not pathmatch(pattern, filename):
    continue
options = ....
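
A small self-contained sketch of the same guard-clause shape, under stated assumptions: the standard library's `fnmatch` stands in for babel's `pathmatch`, and `matching_methods` and `options_map` are hypothetical names, not part of this PR.

```python
from fnmatch import fnmatch


def matching_methods(filename, method_map, options_map):
    """Yield (method, options) for every pattern that matches filename.

    fnmatch is only a stand-in for pathmatch in this sketch.
    """
    for pattern, method in method_map:
        if not fnmatch(filename, pattern):
            continue  # guard clause: skip non-matching patterns early
        # The real work now sits at loop level instead of inside an `if`.
        options = options_map.get(pattern, {})
        yield method, options
```

With the inverted test, everything after the `continue` saves one indentation level.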
:param dirpath: the path to the directory to extract messages from.
"""
if dirpath is None:
    dirpath = os.getcwd()
akx
Jan 12, 2016
Member
This should not be done in this function; it's fine to just do it in the caller.
EDIT: To expand on what I mean, this function is too low-level for it to be reading facts from the execution environment (the current working directory in this case); it's better to have the callers explicitly pass this in.
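
Roughly, the split being suggested could look like the sketch below. The function names are hypothetical and are not the ones in this PR; the point is only that the low-level helper takes `dirpath` as a required argument while the caller decides where it comes from.

```python
import os


def relative_name(filepath, dirpath):
    """Low-level helper: dirpath is required, and nothing here
    consults the execution environment."""
    return os.path.relpath(filepath, dirpath)


def relative_names_from_cwd(filepaths):
    # The caller owns the policy "relative to the current directory"
    # and passes the directory down explicitly.
    dirpath = os.getcwd()
    return [relative_name(path, dirpath) for path in filepaths]
```

This keeps the helper predictable in tests, since its result depends only on its arguments.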
ENuge
Jan 13, 2016
Author
Contributor
fwiw extract_from_dir(..) above it does the same check. Though I'm fine with having the caller explicitly pass it in.
and include in the results
:param strip_comment_tags: a flag that if set to `True` causes all comment
tags to be removed from the collected comments.
:param dirpath: the path to the directory to extract messages from.
akx
Jan 12, 2016
Member
Can you expand a little on this -- namely that this is used to calculate the relative names of files?
for path in self.input_paths:
    if not os.path.isdir(path) and not os.path.isfile(path):
        raise DistutilsOptionError("Input path: %s is not a file or directory" % path)
akx
Jan 12, 2016
Member
Is this even possible? :D
ENuge
Jan 13, 2016
Author
Contributor
Heh, this is meant to be checking if the path exists. I didn't really think about the much more direct way of doing that...namely, using os.path.exists(..). Will update.
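
A minimal sketch of the more direct check, assuming a standalone helper (`validate_input_paths` is a hypothetical name, and `ValueError` stands in for `DistutilsOptionError` so the example has no distutils dependency):

```python
import os


def validate_input_paths(input_paths):
    """Raise if any input path does not exist.

    ValueError is a stand-in for DistutilsOptionError in this sketch.
    """
    for path in input_paths:
        if not os.path.exists(path):
            raise ValueError("Input path: %s does not exist" % path)
```

`os.path.exists` covers both files and directories, so the separate `isdir`/`isfile` tests are only needed later, when choosing how to extract.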
break
filepath = os.path.join(root, filename).replace(os.sep, '/')

for (
akx
Jan 12, 2016
Member
Don't expand the tuple here either.
for message_tuple in check...:
    yield message_tuple
is fine and dandy!
ENuge
Jan 13, 2016
Author
Contributor
Cool, I'll add a comment/docstring line with the return type or something to make up for the loss in explicitness.
EDIT: The docstring already does that! \o/
And aside from the handful of comments I had, great work @ENuge ! :)
EDIT: Wagh, I'm a dummy who can't scroll down.
@ENuge: Looking good! Please squash all of the commits into one though :)
Interesting, travis errored out on mac only for pypy 2.6, maybe a nondeterministic bug?
@sils1297 Heh, yes... race condition against the clock, it seems. If the wallclock ticks forward a second while those tests are being run, they may fail. We should probably freeze time for the duration of those tests or maybe compare the outputs ignoring datetime differences.
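
One way the "freeze time" idea could look, as a sketch only: `make_header` is a hypothetical stand-in for code that stamps its output with the current time, not babel's actual implementation, and the real tests may need to patch a different clock source.

```python
import time
from unittest import mock


def make_header():
    # Hypothetical stand-in for output that embeds the current time.
    return "POT-Creation-Date: %d" % int(time.time())


def headers_match_with_frozen_clock():
    # While time.time is patched, both calls see the same instant, so
    # the comparison cannot be broken by the clock ticking in between.
    with mock.patch("time.time", return_value=1452600000.0):
        first = make_header()
        second = make_header()
    return first == second
```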
79ddbf4
to
4b0d4c5
One can now supply a filename or a directory to be extracted. For large codebases, this allows the consumer to optimize their string extraction process by, for instance, only supplying the files that have actually been changed on the given dev's branch compared to master. Relates to #253. I don't want to say "fixes", but it makes further optimization unnecessary for most use cases.
4b0d4c5
to
19957e2
Super nice work @ENuge! I'm gonna merge this, but come to think of it, we should add an alias for
Support extraction by filename as well as directory
459d30f
into
python-babel:master
Reason for the change: #253. This obviates the need for parallelizing to make things faster (on the consumer's side, we only scan files that are different in a given branch compared to origin's master, which brings scan time down from ~4min 30s to ~5-20s, depending on how old one's branch is).
Code changes: moved half of the logic from extract_from_dir into its own function. Then added a few "is this a path or directory?" type checks to frontend.py to use the right thing. Those checks are kind of ugly right now but I couldn't think of a more elegant way of doing it. Added/rejiggered a couple of tests.
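
For readers wondering how a consumer might build the list of changed files mentioned in the reason for the change, here is a hedged sketch. The function names are hypothetical, the `.py` filter and the `origin/master` default are assumptions, and this is not the consumer's actual tooling; it relies only on `git diff --name-only`.

```python
import subprocess


def select_python_files(name_only_output):
    """Parse `git diff --name-only` output, keeping only .py paths."""
    paths = [line.strip() for line in name_only_output.splitlines()]
    return [path for path in paths if path.endswith(".py")]


def changed_python_files(base="origin/master"):
    # Three-dot syntax diffs HEAD against its merge base with `base`.
    output = subprocess.check_output(
        ["git", "diff", "--name-only", base + "...HEAD"],
        universal_newlines=True,
    )
    return select_python_files(output)
```

The resulting paths could then be handed to the extractor as individual input files. Deleted files also show up in the diff output, so a real script would need to filter those out first.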