Closed
Labels: Dtype Conversions (unexpected or buggy dtype conversions), Enhancement, IO CSV (read_csv, to_csv)
Description
How can I force a dtype on a column and ensure that any unparseable entries are filled with NaN? This matters when CSVs or database streams contain unpredictable data-entry errors that cannot be mapped to missing values a priori.
E.g., below I want column 'a' to be parsed as np.float, but the erroneous 'Dog' entry causes an exception. Is there a way to tell read_csv() to force parsing of column 'a' as np.float and fill all unparseable entries with NaN?
import StringIO
import numpy as np
import pandas as pd
data = 'a,b,c\n1.1,2,3\nDog,5,6\n7.7,8,9.5'
df = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float})
df.dtypes
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-cd8b6f868aec> in <module>()
1 data = 'a,b,c\n1.1,2,3\nDog,5,6\n7.7,8,9.5'
----> 2 df = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float})
3 df.dtypes
C:\Python27\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
389 buffer_lines=buffer_lines)
390
--> 391 return _read(filepath_or_buffer, kwds)
392
393 parser_f.__name__ = name
C:\Python27\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
205 return parser
206
--> 207 return parser.read()
208
209 _parser_defaults = {
C:\Python27\lib\site-packages\pandas\io\parsers.pyc in read(self, nrows)
622 # self._engine.set_error_bad_lines(False)
623
--> 624 ret = self._engine.read(nrows)
625
626 if self.options.get('as_recarray'):
C:\Python27\lib\site-packages\pandas\io\parsers.pyc in read(self, nrows)
943
944 try:
--> 945 data = self._reader.read(nrows)
946 except StopIteration:
947 if nrows is None:
C:\Python27\lib\site-packages\pandas\_parser.pyd in pandas._parser.TextReader.read (pandas\src\parser.c:5785)()
C:\Python27\lib\site-packages\pandas\_parser.pyd in pandas._parser.TextReader._read_low_memory (pandas\src\parser.c:6002)()
C:\Python27\lib\site-packages\pandas\_parser.pyd in pandas._parser.TextReader._read_rows (pandas\src\parser.c:6870)()
C:\Python27\lib\site-packages\pandas\_parser.pyd in pandas._parser.TextReader._convert_column_data (pandas\src\parser.c:7919)()
AttributeError: 'NoneType' object has no attribute 'dtype'
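One possible workaround, sketched here for a recent pandas on Python 3 rather than the Python 2.7 setup in the traceback, is to skip the numeric dtype at read time and coerce afterwards. This assumes `pd.to_numeric` with `errors='coerce'` is available in the installed version. It is not necessarily how this issue was resolved, and it does not make `read_csv()` itself coerce values.

```python
import io

import pandas as pd

data = 'a,b,c\n1.1,2,3\nDog,5,6\n7.7,8,9.5'

# Read column 'a' as plain strings so the parser never has to convert 'Dog'.
df = pd.read_csv(io.StringIO(data), dtype={'a': str})

# Convert after loading; errors='coerce' replaces unparseable entries
# with a missing value instead of raising.
df['a'] = pd.to_numeric(df['a'], errors='coerce')

print(df.dtypes)
print(df)
```

With this approach column 'a' ends up with a float dtype and the 'Dog' row holds a missing value, at the cost of a second pass over that column.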