Share code for %pycat and %loadpy, make %pycat aware of URLs #1606
Conversation
@@ -3293,22 +3294,54 @@ def magic_bookmark(self, parameter_s=''):
        bkms[args[0]] = args[1]
        self.db['bookmarks'] = bkms

    def _get_file_or_url(self, parameter_s=''):
This should probably be in utils - perhaps in openpy.
I'll do it next week.
As I mentioned in #1110, I'm leaning towards loadpy becoming simply load, and not checking any language at all, just loading a url/file into the working environment verbatim.
Done,
        try :
            pyfile = get_py_filename(target)
            return open(pyfile, "r").read()
May as well use openpy here as well.
Does anyone object to renaming
None from me, as long as the alias remains.
Certainly, the alias would be there for a long time.
Added the safety limits for %load, at 1600 characters, prompting the user if they really want to load it, and a flag to force load for frontends where raw_input is not yet implemented...
        if os.path.isfile(target):  # Read file
            return open(target, "r").read()
        return openpy.read_py_url(target, skip_encoding_cookie=True)
Use read_py_file when it's for local files (Unix-style paths seem to work as URLs, but Windows paths don't). read_py_file is also slightly more efficient.
1600 characters seems awfully short to prompt a warning - I would think it should be at least twice that, maybe 5k?
    This magic command can either take a local filename or a url::
    where source can be a filename,URL, history patern or macro
space before URL, and 'pattern'. Although 'history pattern' isn't very clear, I haven't come up with a concise term for the history mini-language. Maybe 'input history range'?
Is a warning needed at all? If you decide it was a bad idea, it's easy. The only place where it could be ugly is the terminal, where hitting
    -y : Don't ask confirmation for loading source above 1600 characters.
         Useful for frontends not supporting `raw_input`
This magic command can either take a local filename, an url, an history |
Grammar question - to me 'a URL' reads more naturally than 'an URL', because I think of it as spelling yew-ar-el, and I would refer to 'a yew tree'.
you're the native speaker...
When we aren't restricting to Python files, the most common case I'd expect we actually want to catch is a large file that may not even be Python at all. Some text editors give warnings at a few MB because they will start to have memory issues. It seems sensible to have a similar warning, but the bar should be quite high (maybe 32k).
        # so on average, more than 40 lines
        # this might be made configurable
        if l > 1600 and 'y' not in opts:
            ans = raw_input("The text you're trying to load seems pretty big (%d characters). Continue (y/[N])? " % l)
We have a function IPython.utils.io.ask_yes_no() to do this in a consistent way. But remember that the notebook doesn't provide access to stdin, so it will throw a StdinNotImplementedError.
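For illustration, the confirmation flow being discussed could be sketched like this. This is a rough sketch, not IPython's actual implementation: ask_yes_no is approximated with plain input(), StdinNotImplementedError with the built-in NotImplementedError, and the function name and threshold are placeholders.

```python
def confirm_large_load(source, threshold=1600, force=False):
    """Prompt before loading a large chunk of source text.

    Returns True when loading should proceed: the text is under the
    threshold, the caller forced it (the -y flag discussed above),
    or no stdin is available to ask (as in the notebook).
    """
    if force or len(source) <= threshold:
        return True
    try:
        # stand-in for IPython.utils.io.ask_yes_no(); real code would
        # catch StdinNotImplementedError from the kernel instead
        ans = input("The text you're trying to load seems pretty big "
                    "(%d characters). Continue (y/[N])? " % len(source))
        return ans.strip().lower() in ('y', 'yes')
    except (EOFError, NotImplementedError):
        # frontend has no stdin: load without asking rather than fail
        return True
```

The key design point from the thread is the except branch: a frontend without stdin should skip the warning instead of crashing.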
Ah, cool, I'll update magic_save too then.
That makes sense, but yes, I'd set it quite high. Some of the
Yes, I think we want to have space for at least 30-40 80-char lines by default.
Brian E. Granger
Now that the notebook can split cells, I imagine people might well
Agreed. 32k is about 400 dense 80-character lines, 800 more likely. There's no need to warn until actual adverse effects are probable, and that certainly doesn't happen before 100 lines.
Well, I was thinking that the limit would not apply to the notebook, as stdin is not implemented, and that for longer than 100 lines of code, %edit / %run seem a little more appropriate than %load. So up to 200k (2,500 80-character lines); the biggest IPython file seems to be IPython/external/pyparsing/_pyparsing.py at 148,792 characters. I also use ask_yes_no, catch StdinNotImplementedError and skip the warning. Code update soon, but my internet connection comes and goes...
BTW, I agree with @minrk in that the safety check should really just apply to the accidental case of somebody loading a binary image by mistake (because they misclick a tab completion, for example). It's kind of like readline, which asks you if you really want to see all completions when it goes beyond a hundred or so. Thinking more about it, perhaps we should never trigger the warning for a .py file, and only do it for files with other extensions bigger than, say, a few megabytes. Someone could legitimately want to load a giant .py script to break it up into cells, which is now quite easy to do (as we have proper cell split/merge capabilities). Does that sound like a good idea?
Skipping .py sounds reasonable, but the warning threshold should probably be a bit lower than a few MB (think of a remote notebook - 1MB will take quite a while). I think ~100 kB or so is still high enough that it won't catch much of anything it shouldn't. I'd say 1MB is an upper limit on what we should consider.
I think behaving differently for What might make sense is a separate check, similar to what
I'm OK going for uniformity of behavior regarding file extensions, and 100kb looks like a good cutoff. @Carreau, if you want to implement the heuristic for binary data suggested by @takluyver, go for it, but it's not necessary. A 100kb cutoff on file size and a flag to let the user override it is enough.
strip_encoding_cookie seems to crash on some binary files (UnicodeDecodeError).
I'm not quite sure how best to handle binary files. Catching binary files with the heuristic I described earlier is also not 100% foolproof. If someone loads a non-Python text file in, say, cp1252, it likely won't have a PEP 263 encoding declaration (whereas a Python file would be invalid without one). So it will default to UTF-8 and fail to decode. Semantically,
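The binary-detection heuristic itself is elided from this thread, but a common approach of this kind checks for NUL bytes in the first chunk of data, since they almost never appear in text but are ubiquitous in binary formats. A guess at what such a check might look like (the function name and sample size are illustrative, not from the PR):

```python
def looks_binary(data, sample_size=1024):
    """Rough binary-content check on raw bytes.

    NUL bytes in the first chunk are a strong signal of binary data.
    This is a sketch of the *kind* of heuristic discussed above, not
    the exact one proposed in the thread, and as noted it is not
    100% foolproof.
    """
    return b'\x00' in data[:sample_size]
```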
    as: ranges of input history (see %history for syntax), a filename, or
    an expression evaluating to a string or Macro in the user namespace.
    as: ranges of input history (see %history for syntax), url,
    correspnding .py file,filename, or an expression evaluating to a
corresponding. Space before filename.
also remove unused import from openpy
retry opening file/url without stripping the encoding cookie if it fails with UnicodeDecodeError
I've tried to fall back on raw data when UnicodeError is raised, but I'm not quite sure how to handle files/urls that don't decode well.
That's not going to work as you expect on Python 3 - Also, I don't think we should fall back to bytes. Trying to mix byte strings and unicode is a recipe for problems, and that was a major reason Python 3 was made at all. If we're working with text, it should always be unicode. The fallback here should be to use the latin1 codec, which will decode any set of bytes. We can do that with
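The fallback being suggested can be sketched as follows. This is a simplified illustration of the approach, not the actual patch (which also deals with the PEP 263 encoding cookie and URL fetching); the function name is a placeholder.

```python
def decode_source(raw_bytes):
    """Decode source bytes to text, never falling back to bytes.

    Try UTF-8 first; on failure, decode as latin1, which maps every
    byte value 0-255 to a code point and therefore can never raise.
    The result is always a unicode string, as argued above.
    """
    try:
        return raw_bytes.decode('utf-8')
    except UnicodeDecodeError:
        return raw_bytes.decode('latin1')
```

Note that latin1 may mis-render text in other encodings, but it keeps everything in the unicode domain, which is the point being made.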
What is the status of this PR?
        buffer = BytesIO(response.read())
        text = TextIOWrapper(buffer, 'latin1', errors='replace', line_buffering=True)
        text.mode = 'r'
        return text.read()
Using BytesIO and TextIOWrapper here is overkill - it's easier just to return response.read().decode('latin1', errors='replace'). In openpy we use BytesIO so we can sniff the encoding before we decode it.
(Though I think errors='replace' is redundant - latin1 should be able to decode any bytes.)
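A quick check backing this up: latin1 assigns every one of the 256 possible byte values to a code point, so decoding can never fail and the errors='replace' handler never triggers.

```python
# latin1 maps each of the 256 byte values to exactly one code point,
# so decoding cannot raise and the round-trip is lossless.
all_bytes = bytes(range(256))
decoded = all_bytes.decode('latin1')
assert len(decoded) == 256
assert decoded.encode('latin1') == all_bytes
```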
fixed.
Apart from those couple of minor clean-ups, I think it's OK.
Test results for commit f9f2400 merged into master
Not available for testing: python2.6, python3.1
        except:
            from IPython.utils.nested_context import nested
        import urllib
        from io import BytesIO, TextIOWrapper
I think this import is no longer necessary.
Test results for commit 195949e merged into master
Not available for testing: python2.6, python3.1
I don't know quite why the test is failing for 3.2, but I can replicate it on master as well, so it doesn't seem to be related to this.
Thanks, Matthias. This all looks fine now. I'll leave it for a few hours in case the devs across the pond want to take a look at it, but if they don't object, we'll get it merged.
Quick question: is it true that loadpy will still check for a .py extension? I thought we'd said we wanted to abandon that (to allow loading of arbitrary scripts)...
I don't think so - the diff shows the code that checked that removed.
OK, thanks! Then, don't worry about me: I'm locked in the epic battle of #1732 and won't be able to look at this until I come out of it, so move along without me. Thanks a lot for the review work, @takluyver!!
OK, I've merged this now. Thanks again, @Carreau.
Share code for %pycat and %loadpy, make %pycat aware of URLs
Restore loadpy to load. Closes ipython#1783; just the part of ipython#1606 eaten by ipython#1732, where some code was accidentally removed.
This makes %pycat aware of URLs and %loadpy able to load a file without the user having to write the .py extension. %loadpy still verifies that the loaded file has a .py extension.