# Fixed-Width File Reader Sample Usage
**Fixed-width files**: text files in which every field of a column is padded to a fixed number of bytes.<br>
**Cython**: a compiler and language for writing C/C++ extension modules, used here to generate Python bindings for the C++ code.<br>
**Python vs C++**: Python is simple but slow; C++ is fast but complicated.
## Goals
* A C++ project that reads fixed-width files into in-memory Apache Arrow tables.
* Cython-generated Python bindings, for accessibility and integration with Python projects.

We use Apache Arrow's CSV reader as a base, since adapting it to this application required only minimal changes:
* Modify the reader to optionally decode on read-in.
* Modify the parser to use field widths instead of a delimiter to separate the input stream into fields.
* Modify the converter to optionally convert COBOL-formatted numeric types to standard numeric types.
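
To make the parser change concrete, here is a minimal pure-Python sketch of splitting a line by field widths instead of a delimiter. It is not the project's C++ parser; the helper name `split_fixed_width` is hypothetical, and it slices Python `str` characters, which matches byte widths only for single-byte encodings such as ASCII.

```python
def split_fixed_width(line, widths):
    """Cut one line into fields of the given widths and strip the padding."""
    fields = []
    offset = 0
    for width in widths:
        # Each field occupies exactly `width` characters, padding included.
        fields.append(line[offset:offset + width].strip())
        offset += width
    return fields


# Build a row padded to widths 13, 7 and 4, as in a fixed-width file.
row = "hello".ljust(13) + "123}".ljust(7) + "3.56".ljust(4)
print(split_fixed_width(row, [13, 7, 4]))  # ['hello', '123}', '3.56']
```

Because field boundaries come from the widths alone, a value may contain characters that a delimiter-based parser would treat as separators.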

## Use
For installation, see this repo's README.

First, we'll create an example fixed-width file, including COBOL-formatted numbers, null values and odd spacing.

In [1]:
%%file sample.fwf
aa           bb     cc  
hello        123}   3.56
hi           9129A  NaN 
     spaces   N/A   7.8 
NA           3{     0   

Writing sample.fwf


Now we can run the module on the fixed-width file; see the README for a full list of options. Note that the converter uses a preset list of null values and preset maps for COBOL-encoded characters (the pos/neg values below). These can be modified through:
* convert_options.null_values = \[some new list\]
* convert_options.\[pos/neg\]\_values = {some new mapping}

In [2]:
import pyfwfr as pf

convert_options = pf.ConvertOptions(is_cobol=True, strings_can_be_null=True)
parse_options = pf.ParseOptions([13, 7, 4])

table = pf.read_fwf('sample.fwf', parse_options, 
                    convert_options=convert_options)
for column in table.columns:
    print(column)

<Column name='aa' type=DataType(string)>
[
  [
    "hello",
    "hi",
    "spaces",
    null
  ]
]
<Column name='bb' type=DataType(int64)>
[
  [
    -1230,
    91291,
    null,
    30
  ]
]
<Column name='cc' type=DataType(double)>
[
  [
    3.56,
    null,
    7.8,
    0
  ]
]
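
The `bb` values above come from COBOL-style signed numbers, where the last character encodes both the final digit and the sign. The sketch below reproduces that conversion in pure Python under the assumption that the standard overpunch convention applies (`{` and `A`-`I` for positive 0-9, `}` and `J`-`R` for negative 0-9). The names `POS_VALUES`, `NEG_VALUES` and `decode_cobol` are hypothetical, and the library's actual preset maps may differ.

```python
# Assumed overpunch tables: trailing character -> final digit.
POS_VALUES = {"{": "0", **{chr(ord("A") + i): str(i + 1) for i in range(9)}}
NEG_VALUES = {"}": "0", **{chr(ord("J") + i): str(i + 1) for i in range(9)}}


def decode_cobol(field):
    """Convert a COBOL-formatted integer field to a Python int."""
    field = field.strip()
    if not field:
        return None
    last = field[-1]
    if last in POS_VALUES:
        return int(field[:-1] + POS_VALUES[last])
    if last in NEG_VALUES:
        return -int(field[:-1] + NEG_VALUES[last])
    # No overpunch character: treat the field as a plain integer.
    return int(field)


for raw in ["123}", "9129A", "3{"]:
    print(raw, "->", decode_cobol(raw))
# 123} -> -1230
# 9129A -> 91291
# 3{ -> 30
```

Null handling is left out of this sketch; in the example above, `N/A` becomes null through the converter's null-value list before any numeric conversion is attempted.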
