Share code for %pycat and %loadpy, make %pycat aware of URLs #1606

Carreau · 2012-04-15T16:22:56Z

this make %pycat aware of URLs and %loadpy able to load file with the user not
having to write the py extension.

%loadpy is still verifying that the loaded file as a
.py extension

takluyver · 2012-04-15T16:45:12Z

IPython/core/magic.py

@@ -3293,22 +3294,54 @@ def magic_bookmark(self, parameter_s=''):
                bkms[args[0]] = args[1]
        self.db['bookmarks'] = bkms

+    def _get_file_or_url(self, parameter_s=''):


This should probably be in utils - perhaps in openpy.

i'll do it next week.

fperez · 2012-04-16T01:19:59Z

As I mentioned in #1110, I'm leaning towards loadpy becoming simply load, and not checking any language at all, simply being used to load a url/file into the working environment verbatim.

Carreau · 2012-04-17T18:40:31Z

Done,
renamed %loadpy to %load kept an alias, %loadpy and %pycat now share code throught find_user_code.

takluyver · 2012-04-17T18:51:03Z

IPython/core/interactiveshell.py

+
+        try :
+            pyfile = get_py_filename(target)
+            return open(pyfile, "r").read()


May as well use openpy here as well.

fperez · 2012-04-18T00:21:14Z

Does anyone object to renaming %loadpy to %load and letting it handle any language? See my justification here. @ellisonbg, @minrk, any objection?

minrk · 2012-04-18T01:15:46Z

None from me, as long as the alias remains.

fperez · 2012-04-18T01:16:23Z

Certainly, the alias would be there for a long time.

Carreau · 2012-04-18T17:24:55Z

added the safety limits for %load, at 1600 caracters, prompting the user if it really want to load it, and a flag to force load for frontend with raw_input not yet implemented ....

takluyver · 2012-04-18T17:28:50Z

IPython/core/interactiveshell.py

        if os.path.isfile(target):                        # Read file
-            return open(target, "r").read()
+            return openpy.read_py_url(target, skip_encoding_cookie=True)


read_py_file when it's for local files (Unix-style paths seem to work as URLs, but Windows paths don't). read_py_file is also slightly more efficient.

minrk · 2012-04-18T17:30:37Z

1600 characters seems awfully short t prompt a warning - I would think it should be at least twice that, maybe 5k?

takluyver · 2012-04-18T17:32:28Z

IPython/core/magic.py


-        This magic command can either take a local filename or a url::
+          where source can be a filename,URL, history patern or macro 


space before URL, and 'pattern'. Although 'history pattern' isn't very clear, I haven't come up with a concise term for the history mini-language. Maybe 'input history range'?

takluyver · 2012-04-18T17:36:48Z

Is a warning needed at all? If you decide it was a bad idea, it's easy
to get rid of the cell in the Qt console or the notebook.

The only place where it could be ugly is the terminal, where hitting
Ctrl+C will give you a fresh prompt, but leave all the previous
content just above. But %loadpy isn't intended so much for terminal
use anyway.

takluyver · 2012-04-18T17:38:38Z

IPython/core/magic.py

+          -y : Don't ask confirmation for loading source above 1600 caracters.
+               Usefull for Frontend not supporting `raw_input`
+
+        This magic command can either take a local filename, an url, an history


Grammar question - to me 'a URL' reads more naturally than 'an URL', because I think of it as spelling yew-ar-el, and I would refer to 'a yew tree'.

you're the native speaker...

minrk · 2012-04-18T17:45:59Z

Is a warning needed at all? If you decide it was a bad idea, it's easy to get rid of the cell in the Qt console or the notebook.

When we aren't restricting to Python files, the most common case I would expect that we actually want catching would be a large file that may not even be Python at all. Some text editors give warnings at a few MB because they will start to have memory issues. It seems sensible to have a similar warning, but the bar should be quite high (maybe 32k).

takluyver · 2012-04-18T18:05:43Z

IPython/core/magic.py

+        # so in average, more than 40 lines
+        # this might be made configurable
+        if l > 1600 and 'y' not in opts:
+            ans = raw_input("The text you'r trying to load seem pretty big ( %d characters). Continue (y/[N])?" % l )


We have a function IPython.utils.io.ask_yes_no() to do this in a consistent way. But remember that the notebook doesn't provide access to stdin, so it will throw a StdinNotImplementedError.

Ah, cool i'll update magic_save too then.

takluyver · 2012-04-18T18:06:25Z

Some text editors give warnings at a few MB because they will start to have memory issues. It seems sensible to have a similar warning, but the bar should be quite high (maybe 32k).

That makes sense, but yes, I'd set it quite high. Some of the
matplotlib gallery examples are 1200-1300 characters, so it should
definitely be well clear of that.

ellisonbg · 2012-04-18T18:08:28Z

Yes, I think we want to have space for at least 30-40 80 char lines by default.

On Wed, Apr 18, 2012 at 10:30 AM, Min RK
reply@reply.github.com
wrote:

1600 characters seems awfully short t prompt a warning - I would think it should be at least twice that, maybe 5k?

Reply to this email directly or view it on GitHub:
#1606 (comment)

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

takluyver · 2012-04-18T18:17:59Z

Yes, I think we want to have space for at least 30-40 80 char lines by default.

Now that the notebook can split cells, I imagine people might well
%load a long script and then break it up, so I'd say a few hundred
lines is reasonable.

minrk · 2012-04-18T18:22:32Z

I'd say a few hundred lines is reasonable.

Agreed. 32k = 400 dense, 800 likely.

There's no need to warn until actual adverse effects are probable, and that certainly doesn't happen before 100 lines.

Carreau · 2012-04-18T18:58:52Z

well, I was thinking that the limit would not apply to the notebook as Stdin is not implemented, and that for longer than 100 lines of codes, %edit / %run seem to be a litte more appropriate than %load.

So up to 200k (2500x80 caracters lines), bigest ipython file seem to be (148792 caracters : IPython/external/pyparsing/_pyparsing.py)

I also use ask_yes_no, catch StdinNotImplemented and skip warning.

Code update soon, but internet connexion comes and goes...

fperez · 2012-04-18T20:13:58Z

BTW, I agree with @minrk in that the idea of a safety should really just apply to the accidental case of somebody loading by error a binary image, (because they misclick a tab completion, for example). It's kind of like readline, which asks you if you really want to see all completions when it goes beyond a hundred or so.

Thinking more about it, perhaps we should never trigger the warning with a .py file, and only do it for files with other extensions bigger than say a few megabytes. Someone could legitimately want to load a giant .py script to break it up into cells, which is now quite easy to do (as we have proper cell split/merge capabilities).

Does that sound like a good idea?

minrk · 2012-04-18T20:22:31Z

skipping .py sounds reasonable, but the warning line should probably be a bit lower than a few MB (think of a remote notebook - 1MB will take quite a while). I think ~100 or so kB is still high enough that it will not catch much of anything it shouldn't. I'd say 1MB is an upper limit on what we should consider.

takluyver · 2012-04-18T20:23:50Z

I think behaving differently for .py files would be a bit surprising, since you can have Python scripts without the extension.

What might make sense is a separate check, similar to what less does, if you try to %load a binary file (as in, non text). I think it's a pretty simple heuristic - something like grab the first 200 bytes and see what proportion are > 127.

fperez · 2012-04-18T23:05:55Z

I'm OK going for uniformity of behavior regarding file extensions, and 100kb looks like a good cutoff. @Carreau, if you want to implement the heuristic for binary data suggested by @takluyver, go for it, but it's not necessary. A 100kb cutoff on file size and a flag to let the user override it is enough.

Carreau · 2012-04-19T08:31:05Z

strip_encoding_cookie seem to crash on some binary file. (Unicode decode error)
should I catch it and prevent stripping in this case at this level ?

takluyver · 2012-04-19T10:19:22Z

I'm not quite sure how best to handle binary files. The openpy machinery is all written with the assumption that it will only handle Python files, and it defaults to UTF-8 if it can't find an encoding declaration (per PEP 3120). Just skipping strip_encoding_cookie won't help, because it will still be attempting to decode the file into a unicode string, which doesn't make any sense for binary files.

Catching binary files with the heuristic I described earlier is also not 100% foolproof. If someone loads a non-Python text file in say, cp1252, it likely won't have a PEP 263 encoding declaration (whereas a Python file would be invalid without one). So it will default to UTF-8 and fail to decode.

Semantically, find_user_code should only deal with Python code. So perhaps the decoding error should propagate all the way up to magic_load, which would fall back to attempting to read the file/url with cp1252 (which AFAIK can't error on decoding).

takluyver · 2012-04-19T10:20:14Z

IPython/core/interactiveshell.py

-          as: ranges of input history (see %history for syntax), a filename, or
-          an expression evaluating to a string or Macro in the user namespace.
+          as: ranges of input history (see %history for syntax), url,
+          correspnding .py file,filename, or an expression evaluating to a


corresponding. Space before filename.

also remove unused import from openpy

retry to open file/url without stripping encodding cookie if it fails with UnicodeDecodingError

Carreau · 2012-05-11T23:03:16Z

I've tried to fallback on raw data when UnicodeError is raised, but I'm not quite sure how to handle file/url that don't decode well.
I've also added the py_only flag not to try other methods.

takluyver · 2012-05-11T23:18:58Z

That's not going to work as you expect on Python 3 - open(target) will default to text mode files, with utf-8 on most Linux/Mac systems, so you'll get the same error again.

Also, I don't think we should fallback to bytes. Trying to mix byte strings and unicode is a recipe for problems, and that was a major reason Python 3 was made at all. If we're working with text, it should always be unicode.

The fallback here should be to use the latin1 codec, which will decode any set of bytes. We can do that with io.open() for the local files, and an explicit decode() for URLs.

ellisonbg · 2012-05-23T04:38:44Z

What is the status of this PR?

takluyver · 2012-05-23T09:10:58Z

IPython/core/interactiveshell.py

+                buffer = BytesIO(response.read())
+                text = TextIOWrapper(buffer, 'latin1', errors='replace', line_buffering=True)
+                text.mode = 'r'
+                return text.read()


Using BytesIO and TextIOWrapper here is overkill - it's easier just to return response.read().decode('latin1', errors='replace').

In openpy we use BytesIO so we can sniff the encoding before we decode it.

(Though I think errors='replace' is redundant - latin1 should be able to decode any bytes.

takluyver · 2012-05-23T09:17:58Z

Apart from those couple of minor clean-ups, I think it's OK.

takluyver · 2012-05-23T09:23:39Z

Test results for commit f9f2400 merged into master
Platform: linux2

python2.7: OK (libraries not available: matplotlib pymongo wx wx.aui)
python3.2: Failed, log at https://gist.github.com/2774196 (libraries not available: pymongo wx wx.aui)

Not available for testing: python2.6, python3.1

takluyver · 2012-05-23T10:04:15Z

IPython/core/interactiveshell.py

-except:
-    from IPython.utils.nested_context import nested
+import urllib
+from io import BytesIO,TextIOWrapper


I think this import is no longer necessary.

takluyver · 2012-05-23T10:07:04Z

Test results for commit 195949e merged into master
Platform: linux2

python2.7: OK (libraries not available: matplotlib pymongo wx wx.aui)
python3.2: Failed, log at https://gist.github.com/2774385 (libraries not available: pymongo wx wx.aui)

Not available for testing: python2.6, python3.1

takluyver · 2012-05-23T10:52:14Z

I don't know quite why the test is failing for 3.2, but I can replicate it on master as well, so it doesn't seem to be related to this.

takluyver · 2012-05-23T12:02:43Z

Thanks, Matthias. This all looks fine now. I'll leave it for a few hours in case the devs across the pond want to take a look at it, but if they don't object, we'll get it merged.

fperez · 2012-05-23T18:46:08Z

Quick question: is it true that loadpy will still check for a .py extension? I thought we'd said we wanted to abandon that (to allow loading of arbitrary scripts)...

takluyver · 2012-05-23T19:01:03Z

I don't think so - the diff shows the code that checked that removed.

fperez · 2012-05-23T19:07:12Z

OK, thanks! Then, don't worry about me: I'm locked in the epic battle of #1732 and won't be able to look at this until I come out of it, so move along without me. Thanks a lot for the review work, @takluyver !!

Share code for %pycat and %loadpy, make %pycat aware of URLs

takluyver · 2012-05-24T09:27:05Z

OK, I've merged this now. Thanks again, @Carreau.

Restore loadpy to load. closes #1783, just the part of #1606 eaten by #1732, where some code was accidentally removed.

Share code for %pycat and %loadpy, make %pycat aware of URLs

Restore loadpy to load. closes ipython#1783, just the part of ipython#1606 eaten by ipython#1732, where some code was accidentally removed.

takluyver reviewed Apr 15, 2012
View reviewed changes

takluyver reviewed Apr 17, 2012
View reviewed changes

takluyver reviewed Apr 18, 2012
View reviewed changes

takluyver reviewed Apr 19, 2012
View reviewed changes

Carreau added 5 commits May 11, 2012 23:36

read_py_url-> read_py_file

27b67f2

fix english, typo, remove code that was a test

c3cea33

also remove unused import from openpy

find_user_code improvement for not py-files

8c40bf8

retry to open file/url without stripping encodding cookie if it fails with UnicodeDecodingError

find_user_code binary fallback

ba0a58d

remove unused input according to pylint

4b59773

Carreau added 2 commits May 12, 2012 12:12

fallback on latin1

634dbea

share code for file and py_file

f9f2400

takluyver reviewed May 23, 2012
View reviewed changes

remove TextIOWrapper, use strig mode for load options

195949e

takluyver reviewed May 23, 2012
View reviewed changes

remove unused import

117fa30

takluyver added a commit that referenced this pull request May 24, 2012

Merge pull request #1606 from Carreau/loadpycat

24a3c00

Share code for %pycat and %loadpy, make %pycat aware of URLs

takluyver merged commit 24a3c00 into ipython:master May 24, 2012

Carreau mentioned this pull request May 29, 2012

restore loadpy to load #1784

Merged

fperez added a commit that referenced this pull request May 29, 2012

Merge pull request #1784 from Carreau/loadpycat2

a0ab74e

Restore loadpy to load. closes #1783, just the part of #1606 eaten by #1732, where some code was accidentally removed.

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

Merge pull request ipython#1606 from Carreau/loadpycat

e3bfae8

Share code for %pycat and %loadpy, make %pycat aware of URLs


		This magic command can either take a local filename or a url::
		where source can be a filename,URL, history patern or macro

Share code for %pycat and %loadpy, make %pycat aware of URLs #1606

Share code for %pycat and %loadpy, make %pycat aware of URLs #1606

Conversation

Carreau commented Apr 15, 2012

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fperez commented Apr 16, 2012

Carreau commented Apr 17, 2012

Choose a reason for hiding this comment

fperez commented Apr 18, 2012

minrk commented Apr 18, 2012

fperez commented Apr 18, 2012

Carreau commented Apr 18, 2012

Choose a reason for hiding this comment

minrk commented Apr 18, 2012

Choose a reason for hiding this comment

takluyver commented Apr 18, 2012

Choose a reason for hiding this comment

Choose a reason for hiding this comment

minrk commented Apr 18, 2012

Choose a reason for hiding this comment

Choose a reason for hiding this comment

takluyver commented Apr 18, 2012

ellisonbg commented Apr 18, 2012

takluyver commented Apr 18, 2012

minrk commented Apr 18, 2012

Carreau commented Apr 18, 2012

fperez commented Apr 18, 2012

minrk commented Apr 18, 2012

takluyver commented Apr 18, 2012

fperez commented Apr 18, 2012

Carreau commented Apr 19, 2012

takluyver commented Apr 19, 2012

Choose a reason for hiding this comment

Carreau commented May 11, 2012

takluyver commented May 11, 2012

ellisonbg commented May 23, 2012

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

takluyver commented May 23, 2012

takluyver commented May 23, 2012

Choose a reason for hiding this comment

takluyver commented May 23, 2012

takluyver commented May 23, 2012

takluyver commented May 23, 2012

fperez commented May 23, 2012

takluyver commented May 23, 2012

fperez commented May 23, 2012

takluyver commented May 24, 2012