pandas.read_csv throws an abort error #351

Closed
hamilton opened this issue Mar 20, 2019 · 15 comments

Comments

@hamilton
Contributor

I noticed this error with the following iodide notebook, hitting "run all".

[Screenshot: Screen Shot 2019-03-19 at 10 23 00 PM]

@mdboom
Collaborator

mdboom commented Mar 21, 2019

The first problem is that it's trying to use the Python network stack to download the file, which won't work due to browser sandboxing. That can be fixed with:

import pandas as pd
import pyodide
# data_url is the CSV address already defined earlier in the notebook
raw_data = pd.read_csv(pyodide.open_url(data_url), skiprows=1)

However, behind that is another error, explained here: Blocked loading mixed active content “http://www.sentiweb.fr/datasets/incidence-PAY-3.csv”
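For context, browsers report "mixed active content" when a page served over https requests a resource over plain http. The rule can be sketched roughly as below. This is only an illustration: `is_mixed_content` is a hypothetical helper, not part of pyodide or any browser API, and the page address is a made-up placeholder.

```python
from urllib.parse import urlparse


def is_mixed_content(page_url, resource_url):
    """Return True when an https page requests a plain-http resource."""
    page_scheme = urlparse(page_url).scheme.lower()
    resource_scheme = urlparse(resource_url).scheme.lower()
    return page_scheme == "https" and resource_scheme == "http"


# Placeholder page address (assumption); the data URL is the one from the error message.
page = "https://notebook.example/"
data_url = "http://www.sentiweb.fr/datasets/incidence-PAY-3.csv"
blocked = is_mixed_content(page, data_url)
```

Under this simplified rule, the same file requested over https would not count as mixed content.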

@mdboom
Collaborator

mdboom commented Mar 21, 2019

Cc: @khinsen

@khinsen

khinsen commented Mar 22, 2019

Thanks @hamilton and @mdboom for taking care of a bug before I found the time to look into it!

I guess I need to use a %% fetch section to download my CSV then, right? I have seen that in other examples.

@hamilton
Contributor Author

@khinsen yes, a fetch section will do the job. Unfortunately, it looks like sentiweb.fr has a restrictive CORS policy, so your browser cannot directly fetch the CSV from it. This is an inconvenience we're brainstorming around at the moment, but in the meantime, we have a way of directly uploading data to our server and then serving from there (see this page for an explanation).

@usingdatascience

Hi, I have just started using iodide, and it looks fantastic. Apologies if this has been answered already, but I was trying to load a CSV file via pandas/Python. Can this be done at the moment?

@mdboom
Collaborator

mdboom commented Mar 22, 2019

Yes, it can be done, but you can't have pandas do the network fetching for you. You have to fetch it using the Web APIs first, then pass that data to pandas.

@mdboom
Collaborator

mdboom commented Mar 22, 2019

@usingdatascience : It looks like in your notebook the issue with loading that .csv is CORS. If you open up the browser's development console, you can see:

Blocked loading mixed active content “http://www.usingdatascience.co.uk/wp-content/skills.csv”

This means (in short) that the server is not configured to allow pulling that data from a website served at another domain. As @hamilton said, we've been brainstorming a general solution to that problem, but in the meantime, you can download it to your local machine and upload it back to the Iodide server and then use it from there.
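For readers new to CORS, the check a browser applies to a simple cross-origin fetch can be approximated as follows. This is a deliberately simplified sketch: `cors_allows` is a hypothetical helper, the origin is a placeholder, and real browsers also deal with credentials, preflight requests, and other details not modelled here.

```python
def cors_allows(response_headers, page_origin):
    """Approximate the Access-Control-Allow-Origin check for a simple request."""
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        # No header at all: the browser refuses to expose the response.
        return False
    return allowed == "*" or allowed == page_origin


origin = "https://notebook.example"  # placeholder origin (assumption)
no_header = cors_allows({"Content-Type": "text/csv"}, origin)
wildcard = cors_allows({"Access-Control-Allow-Origin": "*"}, origin)
```

A server that sends no such header, as seems to be the case here, fails the check no matter what the notebook does on its side.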

@khinsen

khinsen commented Mar 22, 2019

@hamilton Thanks, that workaround works. But I get the impression that we will all pay a steep price for the Web not having a consistent security model.

What I couldn't yet figure out is how to access the fetched data from Python. The variable isn't just there, and I can't import it from module js either. Nor can I find anything in the documentation, which only mentions the inverse direction.

@mdboom
Collaborator

mdboom commented Mar 22, 2019

@khinsen: Looking at your notebook here, it mostly works for me.

I needed to run the fetch cell first:

%% fetch
text: csv_data=files/incidence-PAY-3.csv

Then the from js import ... works:

%% py
from js import csv_data

I had to make one tweak to load it into pandas. The fetch text type is a Unicode string, but io.BytesIO needs a bytes string. Encoding the Unicode as utf-8 does the trick:

%% py
raw_data = pd.read_csv(io.BytesIO(csv_data.encode('utf8')), skiprows=1)
raw_data

I haven't tested anything beyond that point, but it does print out a nice Pandas data frame to the output console...
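A possible alternative, sketched here outside the notebook: since the fetched value is already text, `io.StringIO` should let pandas read it without the encode step. The sample CSV content below is made up for illustration and only mimics a file with one leading line to skip.

```python
import io

import pandas as pd

# Made-up stand-in for the fetched text (assumption); the first line is skipped.
csv_data = "# leading comment line\nweek,inc\n201901,10\n201902,12\n"

# Bytes route: encode the text, then wrap it in BytesIO.
from_bytes = pd.read_csv(io.BytesIO(csv_data.encode("utf8")), skiprows=1)

# Text route: wrap the string directly in StringIO.
from_text = pd.read_csv(io.StringIO(csv_data), skiprows=1)
```

Both routes should give the same data frame, so the choice is mostly a matter of taste.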

@hamilton
Contributor Author

hamilton commented Mar 22, 2019

There's definitely further documentation to be added here, if it doesn't exist already.

@mdboom
Collaborator

mdboom commented Mar 22, 2019

Yes -- absolutely we need better docs and examples around this stuff.

@khinsen

khinsen commented Mar 23, 2019

@mdboom Right, it works, once you know where to look. I saw "undefined" printed in the console and concluded that my variable was undefined, but it seems that it merely means that the result of a Python code block was None. And the output is in the workspace, not the console.
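To illustrate the behaviour (a sketch only, not pyodide's actual implementation): a cell runner that returns the value of the final expression yields None whenever the cell ends in a statement such as an assignment. `run_cell` is a hypothetical helper written for this example.

```python
import ast


def run_cell(source, namespace=None):
    """Execute source and return the value of its trailing expression, else None."""
    namespace = {} if namespace is None else namespace
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the final expression so its value can be returned.
        last = ast.Expression(tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        return eval(compile(last, "<cell>", "eval"), namespace)
    # The cell ends in a statement (e.g. an assignment): nothing to return.
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


ns = {}
assignment_result = run_cell("raw_data = [1, 2, 3]", ns)
expression_result = run_cell("raw_data", ns)
```

Here the assignment cell returns None even though `raw_data` is defined afterwards, which matches the confusion described above.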

There are exceptions further down which look related to my brute-force inclusion of the isoweeks module. I should be able to figure those out - by running the Python code in CPython. pyodide doesn't look like a decent debugging environment yet.

@mdboom
Collaborator

mdboom commented Mar 26, 2019

Yes -- Pyodide debugging (and WebAssembly debugging more largely) has a ways to go.

One thing that might help is described in #366

@khinsen

khinsen commented Mar 26, 2019

@mdboom I guess I have no other option since the code works just fine in CPython but raises a Python exception with pyodide. To make the idea of debugging even more attractive, the exception is raised not in my code, but in Pandas. So... this looks like a good reason to procrastinate.

@rth
Member

rth commented Aug 2, 2021

Closing as I think the original issue was answered, and if the remote server doesn't have permissive CORS settings, there isn't much we can do about it. Except maybe using a CORS proxy.
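A CORS proxy here means a server that fetches the remote file on the notebook's behalf and re-serves it with permissive headers. A minimal sketch of building such a request URL follows. The proxy address and its `url` query parameter are hypothetical; real proxies each define their own URL scheme.

```python
from urllib.parse import quote


def proxied_url(proxy_base, target_url):
    """Build a proxy request URL with the target percent-encoded as a query value."""
    return f"{proxy_base}?url={quote(target_url, safe='')}"


# Hypothetical proxy endpoint (assumption); the target is the CSV from this issue.
url = proxied_url(
    "https://cors-proxy.example/fetch",
    "http://www.sentiweb.fr/datasets/incidence-PAY-3.csv",
)
```

The resulting URL would then be used in place of the original one when fetching the data, provided the proxy is trusted.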
