
v0.15.0 Can't read csv.gz from url #8685

Closed
olgabot opened this issue Oct 30, 2014 · 6 comments
Labels: Enhancement, IO CSV (read_csv, to_csv)
Milestone: Someday

Comments


olgabot commented Oct 30, 2014

import pandas as pd
pd.read_csv('https://raw.githubusercontent.com/YeoLab/shalek2013/master/expression.csv.gz', compression='gzip', index_col=0)

---------------------------------------------------------------------------
CParserError                              Traceback (most recent call last)
<ipython-input-17-2e5c16b1e504> in <module>()
----> 1 pd.read_csv('https://raw.githubusercontent.com/YeoLab/shalek2013/master/expression.csv.gz', compression='gzip', index_col=0)

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    461                     skip_blank_lines=skip_blank_lines)
    462 
--> 463         return _read(filepath_or_buffer, kwds)
    464 
    465     parser_f.__name__ = name

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    237 
    238     # Create the parser.
--> 239     parser = TextFileReader(filepath_or_buffer, **kwds)
    240 
    241     if (nrows is not None) and (chunksize is not None):

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    551             self.options['has_index_names'] = kwds['has_index_names']
    552 
--> 553         self._make_engine(self.engine)
    554 
    555     def _get_options_with_defaults(self, engine):

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    688     def _make_engine(self, engine='c'):
    689         if engine == 'c':
--> 690             self._engine = CParserWrapper(self.f, **self.options)
    691         else:
    692             if engine == 'python':

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1050         kwds['allow_leading_cols'] = self.index_col is not False
   1051 
-> 1052         self._reader = _parser.TextReader(src, **kwds)
   1053 
   1054         # XXX

/usr/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4693)()

/usr/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.TextReader._get_header (pandas/parser.c:6091)()

/usr/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8119)()

/usr/local/lib/python2.7/site-packages/pandas/parser.so in pandas.parser.raise_parser_error (pandas/parser.c:20349)()

CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

But the local file works:

[screenshot: reading the local copy of expression.csv.gz with read_csv succeeds]

I'm on pandas v0.15.0.


rockg commented Oct 30, 2014

Is this documented behavior that's supposed to work? From a quick glance at the code, it doesn't look like reading a compressed file from a URL is currently handled.


rockg commented Oct 30, 2014

We would have to add something similar to this SO answer.


jreback commented Oct 31, 2014

yep, this doesn't look supported at the moment

a pull request to fix it would be welcome

jreback added the Enhancement and IO CSV (read_csv, to_csv) labels on Oct 31, 2014
jreback added this to the Someday milestone on Oct 31, 2014

dhimmel commented May 26, 2015

+1, this is an important feature for the modern workflow

Until someday, I've been using the following workaround in Python 3.4 and Pandas 0.16.0:

import gzip, io
import pandas, requests

url = 'https://raw.githubusercontent.com/YeoLab/shalek2013/master/expression.csv.gz'
response = requests.get(url)
bytes_io = io.BytesIO(response.content)
with gzip.open(bytes_io, 'rt') as read_file:  # 'rt' gives a text-mode stream
    df = pandas.read_csv(read_file)

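Here response.content holds the full download in memory, io.BytesIO wraps those bytes in a file-like object, and gzip.open(..., 'rt') layers text-mode decompression on top, producing the kind of text stream read_csv accepts, with no temporary file needed.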

jreback commented May 26, 2015

@dhimmel pull-requests are welcome to add this feature.


jreback commented Aug 20, 2015

closed by #10649

jreback closed this as completed on Aug 20, 2015
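The commit note below pins the new behavior to pandas 0.17.0. As a minimal sketch, assuming pandas >= 0.17.0, the original failing call should now work directly (recent pandas versions can also infer compression from the .gz extension):

import pandas as pd

url = 'https://raw.githubusercontent.com/YeoLab/shalek2013/master/expression.csv.gz'
# The call from this issue, now supported for compressed files over URLs
df = pd.read_csv(url, compression='gzip', index_col=0)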
dhimmel added a commit to dhimmel/bindingdb that referenced this issue Nov 19, 2015
For `process.ipynb`:

+ Improve documentation with markdown cells.
+ Switch to commit specific links for dhimmel/uniprot.
+ Adopt pandas 0.17.0 gzipped URL support. See pandas-dev/pandas#8685
+ Exclude rows 192304-192473 (one-indexed) where `BindingDB Reactant_set_id`
  was missing.
+ Handle affinities that cannot be converted to floats.

For `collapse.Rmd`:

+ Use readr for TSV I/O.
+ Retain pubmed_ids and sources when collapsing.