# Introduction

The interface between Oracle, numpy, and FITS can be a complicated one. One annoying place where things break down is the padding of strings with whitespace. This is an example offered on the premise that forewarned is forearmed.

The [FITS standard](http://fits.gsfc.nasa.gov/fits_standard.html) says "Leading spaces are significant; trailing spaces are not." The [CFITSIO documentation](https://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c_user/node24.html) states that "When reading a FITS string value, the CFITSIO routines will strip off these non-significant trailing spaces and will return a null-terminated string value containing only the significant characters." For writing, it states that "Similarly, when writing string values to a FITS file the CFITSIO routines expect to get a null-terminated string as input; CFITSIO will pad the string with blanks if necessary when writing it to the FITS file."

None of this gets easier when we toss Oracle into the mix. Oracle has two character data types, `CHAR` and `VARCHAR`. For `CHAR`, the [documentation](https://docs.oracle.com/cd/E17952_01/refman-5.5-en/char.html) says "When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed". On the other hand, for `VARCHAR` (mostly used by DES), it says "VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved". Note that the linked page belongs to the MySQL 5.5 reference manual hosted on Oracle's documentation site, so the behavior of Oracle Database itself should be checked separately.

Numpy itself "null-pads" strings. I haven't found the official documentation for this, but I'm pretty sure it's the case. Also, when `pyfits` reads a binary table, it [explicitly null-pads](https://github.com/spacetelescope/PyFITS/blob/8ebf9543d1373b4df257e6f6d8901e9a7b58b8b9/lib/pyfits/fitsrec.py#L1045-L1075).
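
A quick way to check the null-padding claim is to look at the raw bytes of a fixed-width string array. The following is a minimal sketch using only `numpy`; the array name `padded` and the 4-byte width are chosen purely for illustration.

```python
import numpy as np

# Fixed-width byte-string array, 4 bytes per element (illustrative width)
padded = np.array([b'a'], dtype='S4')

# Raw buffer: the value followed by null bytes up to the field width
raw = padded.tobytes()
print(repr(raw))

# Element access hides the padding again and returns only the value
value = padded[0]
print('%r %d' % (value, len(value)))
```

The raw buffer holds `a` followed by three null bytes, while the element comes back with length 1.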

So what actually happens?

In [61]:
import numpy as np
import pyfits
import fitsio

print 'numpy version: ',np.__version__
print 'pyfits version: ',pyfits.__version__
print 'fitsio version: ',fitsio.__version__

numpy version:  1.9.1
pyfits version:  3.3
fitsio version:  0.9.7


In [65]:
# Create a numpy recarray (should be null padded)
a = np.rec.array(10*[['a']],dtype=[('name','S60')])

# Write the array to a FITS file with pyfits
hdu = pyfits.BinTableHDU.from_columns(a)
hdu.writeto('tmp1.fits',clobber=True)

# Read the file with pyfits
a1 = pyfits.open('tmp1.fits')[1].data
print 'WRITE (pyfits); READ (pyfits)',a1[0]

# Now try reading the same file with fitsio
a2 = fitsio.read('tmp1.fits',ext=1)
print 'WRITE (pyfits); READ (fitsio)',a2[0]

# Now write the original array with fitsio
fitsio.write('tmp2.fits',a,clobber=True)

# Read the file with pyfits
a3 = pyfits.open('tmp2.fits')[1].data
print "WRITE (fitsio); READ (pyfits)",a3[0]

# Read the file with fitsio
a4 = fitsio.read('tmp2.fits',ext=1)
print "WRITE (fitsio); READ (fitsio)",a4[0]

WRITE (pyfits); READ (pyfits) ('a')
WRITE (pyfits); READ (fitsio) ('a',)
WRITE (fitsio); READ (pyfits) ('a')
WRITE (fitsio); READ (fitsio) ('a                                                           ',)


So, in summary:

|               | WRITE (pyfits) | WRITE (fitsio) |
| --- | --- | --- |
| **READ (pyfits)** | OK             | OK             |
| **READ (fitsio)** | OK             | **BAD**            |

This is unfortunate, because with `easyaccess` we are using `fitsio` to write data downloaded from the Oracle database and then using `fitsio` to read files to upload to the database.

What this means is that if you download a VARCHAR column from the DB into a FITS file and then upload it back to the DB, you will get different values (they will now be space padded). What a pain.

This investigation suggests that something is going wrong in `fitsio` (or more precisely, in `CFITSIO`). It would be best to fix this, but that would mean changing `CFITSIO` (which we aren't ready to do). There could also be a hack in `fitsio`, but the nice thing about `fitsio` is that it follows the same standard as `CFITSIO`. For the time being, maybe we can just hack it in `easyaccess`. At first glance, it seems like the hack would be to strip the trailing whitespace after reading with `fitsio`.

In [63]:
for name,tup in a4.dtype.fields.items():
    if tup[0].kind == 'S':
        a4[name] = np.char.rstrip(a4[name])
print "WRITE (fitsio); READ (fitsio)", a4[0]

WRITE (fitsio); READ (fitsio) ('a',)
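
The same idea can be wrapped in a small function. This is only a sketch: `strip_string_columns` is a hypothetical name, not part of `easyaccess` or `fitsio`, and the test array is built by hand to mimic the space-padded values that `fitsio` returned above. Unlike the loop above, it works on a copy and leaves the input untouched.

```python
import numpy as np

def strip_string_columns(arr):
    """Return a copy of a structured array with trailing whitespace
    removed from every byte-string ('S') column.

    Hypothetical helper for illustration only.
    """
    out = arr.copy()
    for name in out.dtype.names:
        if out.dtype[name].kind == 'S':
            out[name] = np.char.rstrip(out[name])
    return out

# Mimic the fitsio result: 'a' padded with spaces to 60 bytes,
# plus an integer column that should pass through unchanged
padded = np.array([(b'a'.ljust(60), 1)],
                  dtype=[('name', 'S60'), ('id', 'i4')])
clean = strip_string_columns(padded)

print(len(padded['name'][0]))  # length before stripping
print(len(clean['name'][0]))   # length after stripping
```

Only columns of kind `'S'` are touched, so numeric columns are copied as they are. Unicode (`'U'`) columns are not handled in this sketch.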


Another possibility might be to write Oracle columns as `CHAR` rather than `VARCHAR`. In that case, all columns will get space-padded on upload and space-stripped on download. This still doesn't fix the problem with CFITSIO reading.

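As an addendum, here's some investigation of numpy and null characters. In the output below, the trailing nulls are dropped (the 16-character input comes back with length 12), while the leading nulls are kept.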

In [75]:
import numpy

# Create string with nulls
DT = numpy.dtype([('hashcode', numpy.str_, 16)])
badstring = 4 * chr(0) + 'ABCDEFGH' + 4 * chr(0)
print 'badstring: ',repr(badstring),'\n'

# Create numpy array
arr = numpy.array((badstring,), dtype=DT)
print 'array: ', repr(arr),'\n'

# Get the value in the array
s = str(arr['hashcode'])
print 'array value: ',len(s), repr(s),'\n'



badstring:  '\x00\x00\x00\x00ABCDEFGH\x00\x00\x00\x00' 

array:  array(('\x00\x00\x00\x00ABCDEFGH',), 
      dtype=[('hashcode', 'S16')]) 

array value:  12 '\x00\x00\x00\x00ABCDEFGH' 

