Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share code for %pycat and %loadpy, make %pycat aware of URLs #1606

Merged
merged 21 commits into from May 24, 2012

Conversation

Carreau
Copy link
Member

@Carreau Carreau commented Apr 15, 2012

this make %pycat aware of URLs and %loadpy able to load file with the user not
having to write the py extension.

%loadpy is still verifying that the loaded file as a
.py extension

@@ -3293,22 +3294,54 @@ def magic_bookmark(self, parameter_s=''):
bkms[args[0]] = args[1]
self.db['bookmarks'] = bkms

def _get_file_or_url(self, parameter_s=''):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in utils - perhaps in openpy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll do it next week.

@fperez
Copy link
Member

fperez commented Apr 16, 2012

As I mentioned in #1110, I'm leaning towards loadpy becoming simply load, and not checking any language at all, simply being used to load a url/file into the working environment verbatim.

@Carreau
Copy link
Member Author

Carreau commented Apr 17, 2012

Done,
renamed %loadpy to %load kept an alias, %loadpy and %pycat now share code throught find_user_code.


try :
pyfile = get_py_filename(target)
return open(pyfile, "r").read()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May as well use openpy here as well.

@fperez
Copy link
Member

fperez commented Apr 18, 2012

Does anyone object to renaming %loadpy to %load and letting it handle any language? See my justification here. @ellisonbg, @minrk, any objection?

@minrk
Copy link
Member

minrk commented Apr 18, 2012

None from me, as long as the alias remains.

@fperez
Copy link
Member

fperez commented Apr 18, 2012

Certainly, the alias would be there for a long time.

@Carreau
Copy link
Member Author

Carreau commented Apr 18, 2012

added the safety limits for %load, at 1600 caracters, prompting the user if it really want to load it, and a flag to force load for frontend with raw_input not yet implemented ....

if os.path.isfile(target): # Read file
return open(target, "r").read()
return openpy.read_py_url(target, skip_encoding_cookie=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read_py_file when it's for local files (Unix-style paths seem to work as URLs, but Windows paths don't). read_py_file is also slightly more efficient.

@minrk
Copy link
Member

minrk commented Apr 18, 2012

1600 characters seems awfully short t prompt a warning - I would think it should be at least twice that, maybe 5k?


This magic command can either take a local filename or a url::
where source can be a filename,URL, history patern or macro
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space before URL, and 'pattern'. Although 'history pattern' isn't very clear, I haven't come up with a concise term for the history mini-language. Maybe 'input history range'?

@takluyver
Copy link
Member

Is a warning needed at all? If you decide it was a bad idea, it's easy
to get rid of the cell in the Qt console or the notebook.

The only place where it could be ugly is the terminal, where hitting
Ctrl+C will give you a fresh prompt, but leave all the previous
content just above. But %loadpy isn't intended so much for terminal
use anyway.

-y : Don't ask confirmation for loading source above 1600 caracters.
Usefull for Frontend not supporting `raw_input`

This magic command can either take a local filename, an url, an history
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar question - to me 'a URL' reads more naturally than 'an URL', because I think of it as spelling yew-ar-el, and I would refer to 'a yew tree'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're the native speaker...

@minrk
Copy link
Member

minrk commented Apr 18, 2012

Is a warning needed at all? If you decide it was a bad idea, it's easy to get rid of the cell in the Qt console or the notebook.

When we aren't restricting to Python files, the most common case I would expect that we actually want catching would be a large file that may not even be Python at all. Some text editors give warnings at a few MB because they will start to have memory issues. It seems sensible to have a similar warning, but the bar should be quite high (maybe 32k).

# so in average, more than 40 lines
# this might be made configurable
if l > 1600 and 'y' not in opts:
ans = raw_input("The text you'r trying to load seem pretty big ( %d characters). Continue (y/[N])?" % l )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a function IPython.utils.io.ask_yes_no() to do this in a consistent way. But remember that the notebook doesn't provide access to stdin, so it will throw a StdinNotImplementedError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, cool i'll update magic_save too then.

@takluyver
Copy link
Member

Some text editors give warnings at a few MB because they will start to have memory issues.  It seems sensible to have a similar warning, but the bar should be quite high (maybe 32k).

That makes sense, but yes, I'd set it quite high. Some of the
matplotlib gallery examples are 1200-1300 characters, so it should
definitely be well clear of that.

@ellisonbg
Copy link
Member

Yes, I think we want to have space for at least 30-40 80 char lines by default.

On Wed, Apr 18, 2012 at 10:30 AM, Min RK
reply@reply.github.com
wrote:

1600 characters seems awfully short t prompt a warning - I would think it should be at least twice that, maybe 5k?


Reply to this email directly or view it on GitHub:
#1606 (comment)

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

@takluyver
Copy link
Member

Yes, I think we want to have space for at least 30-40 80 char lines by default.

Now that the notebook can split cells, I imagine people might well
%load a long script and then break it up, so I'd say a few hundred
lines is reasonable.

@minrk
Copy link
Member

minrk commented Apr 18, 2012

I'd say a few hundred lines is reasonable.

Agreed. 32k = 400 dense, 800 likely.

There's no need to warn until actual adverse effects are probable, and that certainly doesn't happen before 100 lines.

@Carreau
Copy link
Member Author

Carreau commented Apr 18, 2012

well, I was thinking that the limit would not apply to the notebook as Stdin is not implemented, and that for longer than 100 lines of codes, %edit / %run seem to be a litte more appropriate than %load.

So up to 200k (2500x80 caracters lines), bigest ipython file seem to be (148792 caracters : IPython/external/pyparsing/_pyparsing.py)

I also use ask_yes_no, catch StdinNotImplemented and skip warning.

Code update soon, but internet connexion comes and goes...

@fperez
Copy link
Member

fperez commented Apr 18, 2012

BTW, I agree with @minrk in that the idea of a safety should really just apply to the accidental case of somebody loading by error a binary image, (because they misclick a tab completion, for example). It's kind of like readline, which asks you if you really want to see all completions when it goes beyond a hundred or so.

Thinking more about it, perhaps we should never trigger the warning with a .py file, and only do it for files with other extensions bigger than say a few megabytes. Someone could legitimately want to load a giant .py script to break it up into cells, which is now quite easy to do (as we have proper cell split/merge capabilities).

Does that sound like a good idea?

@minrk
Copy link
Member

minrk commented Apr 18, 2012

skipping .py sounds reasonable, but the warning line should probably be a bit lower than a few MB (think of a remote notebook - 1MB will take quite a while). I think ~100 or so kB is still high enough that it will not catch much of anything it shouldn't. I'd say 1MB is an upper limit on what we should consider.

@takluyver
Copy link
Member

I think behaving differently for .py files would be a bit surprising, since you can have Python scripts without the extension.

What might make sense is a separate check, similar to what less does, if you try to %load a binary file (as in, non text). I think it's a pretty simple heuristic - something like grab the first 200 bytes and see what proportion are > 127.

@fperez
Copy link
Member

fperez commented Apr 18, 2012

I'm OK going for uniformity of behavior regarding file extensions, and 100kb looks like a good cutoff. @Carreau, if you want to implement the heuristic for binary data suggested by @takluyver, go for it, but it's not necessary. A 100kb cutoff on file size and a flag to let the user override it is enough.

@Carreau
Copy link
Member Author

Carreau commented Apr 19, 2012

strip_encoding_cookie seem to crash on some binary file. (Unicode decode error)
should I catch it and prevent stripping in this case at this level ?

@takluyver
Copy link
Member

I'm not quite sure how best to handle binary files. The openpy machinery is all written with the assumption that it will only handle Python files, and it defaults to UTF-8 if it can't find an encoding declaration (per PEP 3120). Just skipping strip_encoding_cookie won't help, because it will still be attempting to decode the file into a unicode string, which doesn't make any sense for binary files.

Catching binary files with the heuristic I described earlier is also not 100% foolproof. If someone loads a non-Python text file in say, cp1252, it likely won't have a PEP 263 encoding declaration (whereas a Python file would be invalid without one). So it will default to UTF-8 and fail to decode.

Semantically, find_user_code should only deal with Python code. So perhaps the decoding error should propagate all the way up to magic_load, which would fall back to attempting to read the file/url with cp1252 (which AFAIK can't error on decoding).

as: ranges of input history (see %history for syntax), a filename, or
an expression evaluating to a string or Macro in the user namespace.
as: ranges of input history (see %history for syntax), url,
correspnding .py file,filename, or an expression evaluating to a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corresponding. Space before filename.

@Carreau
Copy link
Member Author

Carreau commented May 11, 2012

I've tried to fallback on raw data when UnicodeError is raised, but I'm not quite sure how to handle file/url that don't decode well.
I've also added the py_only flag not to try other methods.

@takluyver
Copy link
Member

That's not going to work as you expect on Python 3 - open(target) will default to text mode files, with utf-8 on most Linux/Mac systems, so you'll get the same error again.

Also, I don't think we should fallback to bytes. Trying to mix byte strings and unicode is a recipe for problems, and that was a major reason Python 3 was made at all. If we're working with text, it should always be unicode.

The fallback here should be to use the latin1 codec, which will decode any set of bytes. We can do that with io.open() for the local files, and an explicit decode() for URLs.

@ellisonbg
Copy link
Member

What is the status of this PR?

buffer = BytesIO(response.read())
text = TextIOWrapper(buffer, 'latin1', errors='replace', line_buffering=True)
text.mode = 'r'
return text.read()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using BytesIO and TextIOWrapper here is overkill - it's easier just to return response.read().decode('latin1', errors='replace').

In openpy we use BytesIO so we can sniff the encoding before we decode it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Though I think errors='replace' is redundant - latin1 should be able to decode any bytes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@takluyver
Copy link
Member

Apart from those couple of minor clean-ups, I think it's OK.

@takluyver
Copy link
Member

Test results for commit f9f2400 merged into master
Platform: linux2

  • python2.7: OK (libraries not available: matplotlib pymongo wx wx.aui)
  • python3.2: Failed, log at https://gist.github.com/2774196 (libraries not available: pymongo wx wx.aui)

Not available for testing: python2.6, python3.1

except:
from IPython.utils.nested_context import nested
import urllib
from io import BytesIO,TextIOWrapper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this import is no longer necessary.

@takluyver
Copy link
Member

Test results for commit 195949e merged into master
Platform: linux2

  • python2.7: OK (libraries not available: matplotlib pymongo wx wx.aui)
  • python3.2: Failed, log at https://gist.github.com/2774385 (libraries not available: pymongo wx wx.aui)

Not available for testing: python2.6, python3.1

@takluyver
Copy link
Member

I don't know quite why the test is failing for 3.2, but I can replicate it on master as well, so it doesn't seem to be related to this.

@takluyver
Copy link
Member

Thanks, Matthias. This all looks fine now. I'll leave it for a few hours in case the devs across the pond want to take a look at it, but if they don't object, we'll get it merged.

@fperez
Copy link
Member

fperez commented May 23, 2012

Quick question: is it true that loadpy will still check for a .py extension? I thought we'd said we wanted to abandon that (to allow loading of arbitrary scripts)...

@takluyver
Copy link
Member

I don't think so - the diff shows the code that checked that removed.

@fperez
Copy link
Member

fperez commented May 23, 2012

OK, thanks! Then, don't worry about me: I'm locked in the epic battle of #1732 and won't be able to look at this until I come out of it, so move along without me. Thanks a lot for the review work, @takluyver !!

takluyver added a commit that referenced this pull request May 24, 2012
Share code for %pycat and %loadpy, make %pycat aware of URLs
@takluyver takluyver merged commit 24a3c00 into ipython:master May 24, 2012
@takluyver
Copy link
Member

OK, I've merged this now. Thanks again, @Carreau.

@Carreau Carreau mentioned this pull request May 29, 2012
fperez added a commit that referenced this pull request May 29, 2012
Restore loadpy to load. 

closes #1783, just the part of #1606 eaten by #1732, where some code was accidentally removed.
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Share code for %pycat and %loadpy, make %pycat aware of URLs
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Restore loadpy to load. 

closes ipython#1783, just the part of ipython#1606 eaten by ipython#1732, where some code was accidentally removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants