Reading of fixed width file is detecting incorrect number of columns

Hi,
I have been trying to use the `read_fwf()` function in pandas 0.16.0. I supplied a list of 9 tuples for the _colspec_ parameter and a list of 9 column names for the _names_ parameter.

Here's some code:

```
df = pd.read_fwf('c:/6starts.tab', 
    header=None,
    colspec=[(0, 11), (11, 14), (14, 43), (43, 49), (49, 69), (69, 98), (98, 110), (110, 133), (133, 145)], 
    names=['lotid', 'lottype', 'part', 'qty', 'startdate', 'proc', 'rnum', 'material', 'who'])
```

Here's the traceback:

```
ValueError                                Traceback (most recent call last)
<ipython-input-15-ff4a00f4ef2b> in <module>()
     10                                      (110, 133),
     11                                      (133, 145)], 
---> 12                             names=['lotid', 'lottype', 'part', 'qty', 'startdate', 'proc', 'rnum', 'material', 'who']
     13                             )

C:\Program Files\Anaconda\lib\site-packages\pandas\io\parsers.py in read_fwf(filepath_or_buffer, colspecs, widths, **kwds)
    499     kwds['colspecs'] = colspecs
    500     kwds['engine'] = 'python-fwf'
--> 501     return _read(filepath_or_buffer, kwds)
    502 
    503 

C:\Program Files\Anaconda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    254         return parser
    255 
--> 256     return parser.read()
    257 
    258 _parser_defaults = {

C:\Program Files\Anaconda\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
    713                 raise ValueError('skip_footer not supported for iteration')
    714 
--> 715         ret = self._engine.read(nrows)
    716 
    717         if self.options.get('as_recarray'):

C:\Program Files\Anaconda\lib\site-packages\pandas\io\parsers.py in read(self, rows)
   1561             content = content[1:]
   1562 
-> 1563         alldata = self._rows_to_cols(content)
   1564         data = self._exclude_implicit_index(alldata)
   1565 

C:\Program Files\Anaconda\lib\site-packages\pandas\io\parsers.py in _rows_to_cols(self, content)
   1936             msg = ('Expected %d fields in line %d, saw %d' %
   1937                    (col_len, row_num + 1, zip_len))
-> 1938             raise ValueError(msg)
   1939 
   1940         if self.usecols:

ValueError: Expected 9 fields in line 1, saw 7
```

The problem I'm seeing is that pandas does not appear to be using _colspec_ to parse the file. Instead it seems to be using whitespace and is detecting 7 distinct columns based on that.

I have tried specifying `delimiter=''` to see if that would make a difference, but it doesn't fix it. I have also tried specifying `usecols=[0, 1, 2, 3, 4, 5, 6, 7, 8]`, which prevents the exception, but it still reads in only 7 columns and pads the last 2 with _NaN_.
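One thing that may be worth ruling out: the `read_fwf` signature in the traceback names the parameter `colspecs` (plural), while the call above passes `colspec`. If the misspelled keyword is ignored and the boundaries are inferred from whitespace instead (an assumption, not verified against 0.16.0), that would be consistent with the 7 fields seen. Below is a minimal sketch with the plural spelling, using a made-up three-column sample in place of the real file:

```python
import io

import pandas as pd

# Hypothetical sample data; the real file has nine fields per line.
sample = (
    "LOT00000001D  8SL-PART-A\n"
    "LOT00000002E  8SL-PART-B\n"
)

# Half-open (start, end) character intervals, one per column.
colspecs = [(0, 11), (11, 14), (14, 24)]
names = ["lotid", "lottype", "part"]

# Note the keyword is `colspecs`, matching the signature in the traceback.
df = pd.read_fwf(io.StringIO(sample), header=None, colspecs=colspecs, names=names)
print(df)
```

With explicit `colspecs`, the first two fields come out separately even though no whitespace sits between them in the sample.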

![pandas_1](https://cloud.githubusercontent.com/assets/8472772/7771371/c0dc4b56-008e-11e5-934e-96dbeffdea63.jpg)

In the snapshot above, the first column is fine, however the 2nd and 3rd columns are merged together (the 'D ' and the '8SL*****' bit). The next snapshot shows some of the desired column widths (highlighted in pink):

![pandas_4](https://cloud.githubusercontent.com/assets/8472772/7771735/74ec4018-0091-11e5-84c6-54a29e2ebd88.jpg)
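To confirm that the intended boundaries line up with the file independently of pandas, each line can be sliced by hand with the same half-open `(start, end)` pairs. This is only a sketch: `split_fixed_width` is a hypothetical helper, and the sample line is made up and padded to the widths listed above.

```python
colspecs = [(0, 11), (11, 14), (14, 43), (43, 49), (49, 69),
            (69, 98), (98, 110), (110, 133), (133, 145)]
names = ['lotid', 'lottype', 'part', 'qty', 'startdate',
         'proc', 'rnum', 'material', 'who']


def split_fixed_width(line, colspecs):
    """Slice one line into stripped fields using half-open (start, end) pairs."""
    return [line[start:end].strip() for start, end in colspecs]


# Made-up values, left-justified to each column's width to build a test line.
values = ["LOT00000001", "D", "8SL-EXAMPLE", "25", "2015-05-22 08:00",
          "PROC-A", "R001", "SILICON", "user01"]
line = "".join(v.ljust(end - start) for v, (start, end) in zip(values, colspecs))

fields = split_fixed_width(line, colspecs)
for name, field in zip(names, fields):
    print(name, repr(field))
```

Running the same helper over the first few lines of the real file should show quickly whether any boundary is off by a character.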

I think the issue is in the `PythonParser` class in `parsers.py`, in the `_rows_to_cols()` method, but I'm not familiar enough with it yet to attempt any sort of fix.

Here is the version information from `pandas.show_versions()`:

## INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.0
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.1.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.1
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None

Please let me know if you need any more information.

Thanks,
Adrian.

