Allow read_csv to take URLs #970
On Python 3, the HTTP response produces bytes, but the parser implementation expects strings (i.e. unicode). I think the correct way would be to wrap the response in an io.TextIOWrapper, using the encoding passed to
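A minimal sketch of the wrapping idea described above, not the code from this PR. The helper name wrap_response is hypothetical, and an io.BytesIO object stands in for the HTTP response so the example runs without network access.

```python
import io

def wrap_response(response, encoding="utf-8"):
    """Wrap a binary file-like object so that reads return str, not bytes.

    `wrap_response` is a hypothetical helper name used for illustration.
    """
    return io.TextIOWrapper(response, encoding=encoding)

# io.BytesIO stands in for the bytes-producing HTTP response (assumption).
fake_response = io.BytesIO("a,b\n1,é\n".encode("utf-8"))

text_stream = wrap_response(fake_response, encoding="utf-8")
lines = text_stream.read().splitlines()
print(lines)
```

Because the wrapper decodes lazily as the parser reads, the whole response does not have to be decoded up front.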
What about numpy.compat.asstr or similar? The io module doesn't exist in Python 2.5.
Is this what you have in mind?
Yes, that looks sensible, although I haven't tested it yet. I'd also specify
Hmm. Who guesses the encoding? The user? Or is there some BOM checking somewhere? If the user, it seems to me I'd want to know if I'm not working with the encoding I think I am. Either way, you're probably going to have to either fix your text or pass another encoding. Replace makes it such that you wouldn't discover this until later, right?
The user has the option of doing so, but I guess most of the time they're going to leave it at the default, which I believe follows the encoding specified by locale (UTF-8 for Mac & most Linux, a particular code page for Windows). In general, I'd agree with failing early and loudly, but with encoding, either it's a tiny detail, and it's a pain to have to keep guessing at encodings when you don't really care, or the text will be obviously gibberish if you get it wrong. I think for opening files, we use a compromise - if the user specifies an encoding, we use
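To make the trade-off concrete, here is a small sketch comparing strict decoding with lenient decoding. It assumes that "Replace" in the exchange above refers to Python's errors="replace" decoding mode. The decode helper is hypothetical and is not the code from this PR.

```python
# Latin-1 bytes for "café"; the trailing byte 0xE9 is not valid UTF-8 here.
data = "café".encode("latin-1")

def decode(raw, encoding="utf-8", errors="strict"):
    """Hypothetical helper: decode bytes with a chosen error policy."""
    return raw.decode(encoding, errors=errors)

# Strict mode fails early and loudly on the wrong encoding.
try:
    decode(data)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Replace mode keeps going and substitutes U+FFFD for undecodable bytes,
# so the mismatch only shows up later as odd characters in the data.
lenient = decode(data, errors="replace")

print(strict_failed, repr(lenient))
```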
Added the error handling.
Sorry, having tested it, it turns out that
Thanks. Updated the PR.
Great, then this is alright as far as I'm concerned, so I'll ping @wesm and @adamklein to look at it.
Yeah, I'll have a look.
Rebased on master and force pushed. Should be okay now as long as it doesn't screw up @takluyver.
Thanks dude. Everything looks OK (haven't run tests on py3 yet but will soon).
Might want to add some logic (at some point) to skip the URL test in some cases, but maybe no big deal. I pointed it at pydata/pandas now.
This allows read_csv to take URLs. The tests will need to be modified after it's merged to point to the new repo URL, or to a test file hosted somewhere else if you prefer. I also have no idea what the file:// path should be on non-POSIX systems, so this path might need some adjustment. There's no test case for ftp, but I don't see why it wouldn't work the same as http.
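One way a reader function could tell a URL from a local path is to check the scheme with the standard library, as sketched below. This is an illustration under assumptions, not the implementation in this PR: looks_like_url and SUPPORTED_SCHEMES are hypothetical names, and https is added to the schemes mentioned in the description.

```python
from urllib.parse import urlparse

# Schemes named in the description (http, ftp, file), plus https as an assumption.
SUPPORTED_SCHEMES = {"http", "https", "ftp", "file"}

def looks_like_url(path_or_url):
    """Hypothetical helper: True if the string has a supported URL scheme."""
    return urlparse(path_or_url).scheme in SUPPORTED_SCHEMES

print(looks_like_url("http://example.com/data.csv"))
print(looks_like_url("file:///tmp/data.csv"))
print(looks_like_url("data.csv"))
# A Windows drive letter parses as the scheme "c", which is not supported,
# so a plain Windows path is still treated as a local file.
print(looks_like_url("C:\\data\\data.csv"))
```

Checking against an explicit set of schemes, rather than testing for any non-empty scheme, is what keeps Windows drive letters from being mistaken for URLs.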