Retrieving large frames with sparse data from hdf5 - 'NoneType' object is not iterable error

I have been using pandas in my scripts for some time now, especially to store large data sets in an easily accessible way. I stumbled upon this problem a couple of days ago and have not been able to solve it so far.

The problem is that after I store a huge data frame in an HDF5 file and later load it back, one or more of its columns (always object-dtype columns) are sometimes completely inaccessible and raise the error `'NoneType' object is not iterable`.
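To narrow down which columns are affected after loading, a small check like the one below may help. `find_unreadable_columns` is a hypothetical helper, not part of pandas: it simply tries to read each column into a list and records the names of those that raise `TypeError`, the error reported here.

```python
import pandas as pd


def find_unreadable_columns(df):
    """Return the names of columns that raise TypeError when read.

    Hypothetical diagnostic helper: it accesses each column and
    materializes its values as a list, collecting the failures.
    """
    broken = []
    for name in df.columns:
        try:
            list(df[name])
        except TypeError:
            broken.append(name)
    return broken


# On a healthy in-memory frame nothing should be reported.
frame = pd.DataFrame({"a": [1, 2], "b": ["x", None]})
print(find_unreadable_columns(frame))  # prints []
```

On a frame loaded from the store, the returned list would name the columns that fail.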

While I work with the frame in memory there are no problems, even with data sets moderately larger than the example below. It is worth mentioning that the frame contains either multiple datetime columns or multiple VMS timestamps (http://labs.hoffmanlabs.com/node/735), as well as string, char, and integer columns. All non-object columns can and do have missing values.
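For reference, a VMS timestamp counts 100-nanosecond ticks since 17 November 1858, which is what the reproduction script computes with `total_seconds() * 10000000`. The sketch below converts exactly in both directions using integer arithmetic; the function names are hypothetical, and precision is limited to microseconds because that is the resolution of Python's `datetime`.

```python
import datetime

# VMS epoch, the same start date used in the reproduction script.
VMS_EPOCH = datetime.datetime(1858, 11, 17)


def datetime_to_vms(dt):
    """Convert a naive datetime to an integer count of 100-ns VMS ticks."""
    delta = dt - VMS_EPOCH
    # Integer arithmetic avoids the float rounding of total_seconds().
    micros = (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds
    return micros * 10  # 10 ticks of 100 ns per microsecond


def vms_to_datetime(ticks):
    """Convert a VMS tick count back to a naive datetime (microsecond precision)."""
    return VMS_EPOCH + datetime.timedelta(microseconds=ticks // 10)


print(datetime_to_vms(datetime.datetime(1858, 11, 18)))  # prints 864000000000
```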

At first I thought I was saving 'NA' values in one of the object-type columns. Then I tried updating to the latest pandas version (0.9.1); I was using 0.9.0 when the problem first occurred. Neither seems to be the solution.
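A quick way to check the first suspicion is to count the missing entries in every object-dtype column before storing the frame. This is a sketch with a hypothetical helper name; it relies on `isnull`, which treats both `None` and `NaN` as missing. The example frame builds its string columns with an explicit `dtype=object` so they stay object columns on newer pandas versions as well.

```python
import pandas as pd


def missing_in_object_columns(df):
    """Count missing entries (None or NaN) in each object-dtype column."""
    obj_cols = [c for c in df.columns if df[c].dtype == object]
    return {c: int(df[c].isnull().sum()) for c in obj_cols}


frame = pd.DataFrame({
    "column_5": pd.Series(["B", "B", "B"], dtype=object),
    "column_6": pd.Series(["XXXXXXXXXX", None, "XXXXXXXXXX"], dtype=object),
    "column_8": [1, 2, 3],  # integer column, ignored by the helper
})
print(missing_in_object_columns(frame))  # prints {'column_5': 0, 'column_6': 1}
```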

I have been able to reproduce the error with the following code:

<pre lang="python"><code>
import pandas as pd
import numpy as np
import datetime

# Get VMS timestamps for today
time_now = datetime.datetime.today()
start_vms = datetime.datetime(1858, 11, 17)
t_delta = (time_now - start_vms)
vms_time = t_delta.total_seconds() * 10000000

# Generate Test Frame (dense)
test_records = []
vms_time1 = vms_time
vms_time2 = vms_time
for i in range(2000000):
    vms_time1 += 15 * np.random.randn()
    vms_time2 += 25 * np.random.randn()
    vms_time_diff = vms_time2 - vms_time1
    string1 = 'XXXXXXXXXX'
    string2 = 'XXXXXXXXXX'
    string3 = 'XXXXX'
    string4 = 'XXXXX'
    char1 = 'A'
    char2 = 'B'
    char3 = 'C'
    char4 = 'D'
    number1 = np.random.randint(1,10)
    number2 = np.random.randint(1,100)
    number3 = np.random.randint(1,1000)
    test_records.append((char1, string1, vms_time1, number1, char2, string2, vms_time2, number2, char3, string3, vms_time_diff, number3, char4, string4))

df = pd.DataFrame(test_records, columns = ["column_1", "column_2", "column_3", "column_4", "column_5", "column_6", "column_7", "column_8", "column_9", "column_10", "column_11", "column_12", "column_13", "column_14"])

# Generate Test Frame (sparse)
test_records = []
vms_time1 = vms_time
vms_time2 = vms_time
count = 0
for i in range(2000000):
    if (count%23 == 0):
        vms_time1 += 15 * np.random.randn()
        string1 = 'XXXXXXXXXX'
        string2 = ' '
        string3 = 'XXXXX'
        string4 = 'XXXXX'
        char1 = 'A'
        char2 = 'B'
        char3 = 'C'
        char4 = 'D'
        number1 = None
        number2 = np.random.randint(1,100)
        number3 = np.random.randint(1,1000)
        test_records.append((char1, string1, vms_time1, number1, char2, None, None, number2, char3, string3, None, number3, None, string4))
    else:
        vms_time1 += 15 * np.random.randn()
        vms_time2 += 25 * np.random.randn()
        vms_time_diff = vms_time2 - vms_time1
        string1 = 'XXXXXXXXXX'
        string2 = 'XXXXXXXXXX'
        string3 = 'XXXXX'
        string4 = 'XXXXX'
        char1 = 'A'
        char2 = 'B'
        char3 = 'C'
        char4 = 'D'
        number1 = np.random.randint(1,10)
        number2 = np.random.randint(1,100)
        number3 = np.random.randint(1,1000)
        test_records.append((char1, string1, vms_time1, number1, char2, string2, vms_time2, number2, char3, string3, vms_time_diff, number3, char4, string4))
    count += 1

df1 = pd.DataFrame(test_records, columns = ["column_1", "column_2", "column_3", "column_4", "column_5", "column_6", "column_7", "column_8", "column_9", "column_10", "column_11", "column_12", "column_13", "column_14"])

store_loc = "foo.h5"
# Store the dense frame under 'df1' and the sparse frame under 'df2'
h5_store = pd.HDFStore(store_loc)
h5_store['df1'] = df
h5_store['df2'] = df1
h5_store.close()
</code></pre>
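Building 2,000,000 rows one tuple at a time is slow, which makes it tedious to rerun the test across pandas versions. Below is a vectorized sketch that aims to produce a frame with the same 14 columns and the same sparsity pattern as the sparse loop (every 23rd row, starting at row 0, blanked in columns 4, 6, 7, 11 and 13). Assumptions: `make_sparse_frame` and `COLUMNS` are hypothetical names, `start` stands in for the initial VMS tick count, and numeric blanks are stored as `NaN` rather than `None`, so the resulting dtypes may differ from what the loop produces. It has not been confirmed that this variant triggers the same error.

```python
import numpy as np
import pandas as pd

COLUMNS = ["column_%d" % i for i in range(1, 15)]


def make_sparse_frame(n_rows, start=0.0, every=23, seed=0):
    """Vectorized sketch of the sparse test frame built by the loop."""
    rng = np.random.RandomState(seed)
    sparse = np.arange(n_rows) % every == 0

    vms1 = start + np.cumsum(15 * rng.randn(n_rows))
    # As in the loop, vms_time2 only advances on fully populated rows.
    vms2 = start + np.cumsum(np.where(sparse, 0.0, 25 * rng.randn(n_rows)))

    def text(value, blank=False):
        # Constant string column; optionally None on the sparse rows.
        col = np.array([value] * n_rows, dtype=object)
        if blank:
            col[sparse] = None
        return col

    def num(values):
        # Numeric column with NaN on the sparse rows.
        return np.where(sparse, np.nan, np.asarray(values, dtype=float))

    data = [
        text("A"), text("XXXXXXXXXX"), vms1,
        num(rng.randint(1, 10, n_rows)),
        text("B"), text("XXXXXXXXXX", blank=True), num(vms2),
        rng.randint(1, 100, n_rows),
        text("C"), text("XXXXX"), num(vms2 - vms1),
        rng.randint(1, 1000, n_rows),
        text("D", blank=True), text("XXXXX"),
    ]
    return pd.DataFrame(dict(zip(COLUMNS, data)), columns=COLUMNS)
```

Calling `make_sparse_frame(2000000)` gives a frame of the original size; a smaller `n_rows` can be used to check whether the failure depends on frame size.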


When I now load from this store, 'df1' behaves normally, but 'df2' raises the following error:
<code>
TypeError: 'NoneType' object is not iterable
</code>

Additionally, I just tried to reproduce this error on pandas 0.8.1, and it does not seem to be present there. So is it probably connected with the I/O changes introduced in 0.9.0?
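Until the regression is tracked down, one workaround that may be worth trying (untested here, and based only on the assumption that the `None` values in object columns are what trips the reader) is to replace them with a sentinel string before writing. The `None` values in object columns are one of the differences between the failing 'df2' and the working 'df1'. `fill_object_blanks` is a hypothetical helper; numeric columns are left untouched so their `NaN` values are preserved.

```python
import pandas as pd


def fill_object_blanks(df, fill=""):
    """Return a copy of df with missing values in object columns set to `fill`.

    Hypothetical pre-processing step before storing to HDFStore;
    non-object columns (and their NaNs) are left as they are.
    """
    out = df.copy()
    for name in out.columns:
        if out[name].dtype == object:
            out[name] = out[name].fillna(fill)
    return out


frame = pd.DataFrame({
    "column_6": pd.Series(["XXXXXXXXXX", None], dtype=object),
    "column_7": [1.5, float("nan")],
})
cleaned = fill_object_blanks(frame)
print(cleaned["column_6"].tolist())  # prints ['XXXXXXXXXX', '']
```

The sentinel has to be mapped back to a missing value after loading if the distinction matters.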


Retrieving large frames with sparse data from hdf5 - 'NoneType' object is not iterable error #2299

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Retrieving large frames with sparse data from hdf5 - 'NoneType' object is not iterable error #2299

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions