loadtxt is slow? #3019

Closed
bmu opened this Issue Feb 26, 2013 · 3 comments

Comments


bmu commented Feb 26, 2013

Based on this question on Stack Overflow, I tested some ways to parse a file:

In [24]: data = np.random.randint(1000, size=(5 * 10**6, 2))

In [25]: np.savetxt('testfile.txt', data, delimiter=' ', fmt='%d')

In [26]: def your_way(filename, delimiter=','):
   ...:     G = []
   ...:     with open(filename, 'r') as f:
   ...:         for line in f:
   ...:             G.append(list(map(int, line.split(delimiter))))
   ...:     return G
   ...: 

In [26]: %timeit your_way('testfile.txt', ' ')
1 loops, best of 3: 16.2 s per loop

In [27]: %timeit pd.read_csv('testfile.txt', delimiter=' ', dtype=int)
1 loops, best of 3: 1.57 s per loop
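(A side note on the pandas call, as a hedged sketch: `savetxt` writes no header row, so without `header=None`, `read_csv` consumes the first data row as column names; `to_numpy()` then recovers the plain ndarray.)

```python
import numpy as np
import pandas as pd

# Write a small space-delimited file of integers, as in the benchmark above.
data = np.random.randint(1000, size=(100, 2))
np.savetxt('testfile.txt', data, delimiter=' ', fmt='%d')

# savetxt writes no header line, so pass header=None or the first data
# row would be consumed as column names; to_numpy() gives the ndarray back.
df = pd.read_csv('testfile.txt', delimiter=' ', dtype=int, header=None)
arr = df.to_numpy()
```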

In [29]: %timeit np.loadtxt('testfile.txt', delimiter=' ', dtype=int)
1 loops, best of 3: 95.2 s per loop

So loadtxt is very slow compared to a simple loop! This was also observed in this answer to the same question (where genfromtxt is faster than loadtxt!).

I'm using numpy 1.7.
Was something changed in loadtxt between 1.6 and 1.7, or am I doing something wrong?

On 1.6.2 I get these times:

In [2]: %timeit np.loadtxt('testfile.txt', delimiter=' ', dtype=int)
1 loops, best of 3: 62.1 s per loop

On 1.7 I get

In [2]: %timeit np.loadtxt('testfile.txt', delimiter=' ', dtype=int)
1 loops, best of 3: 29.6 s per loop

So the performance has actually gotten better between releases. The remaining slowness is because loadtxt does more than `your_way` does:
it strips comments out of the file, which takes some time (5-6 seconds here), and its row appending is more elaborate, handling lists and tuples.
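The comment-handling overhead mentioned above is easy to see in a minimal sketch: loadtxt strips full-line and trailing comments before parsing each line, per-line work that a bare split-and-int loop never pays for.

```python
import io
import numpy as np

# loadtxt discards full-line comments and anything after the comment
# character on a data line, then parses what remains.
s = io.StringIO("1 2\n# a full-line comment\n3 4  # trailing comment\n")
arr = np.loadtxt(s, dtype=int, comments='#')
```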

Owner

charris commented Feb 21, 2014

Looks like fromfile would be a better comparison for simple stuff.
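A hedged sketch of that comparison: `np.fromfile` with a `sep` argument parses flat text of a single dtype and skips the comment and row handling, so the two-column shape has to be restored afterwards.

```python
import numpy as np

# Write a space-delimited integer file, as in the benchmark above.
data = np.random.randint(1000, size=(100, 2))
np.savetxt('testfile.txt', data, delimiter=' ', fmt='%d')

# In text mode (sep=' ') fromfile returns a flat 1-D array -- it knows
# nothing about rows -- so reshape it back to two columns.
arr = np.fromfile('testfile.txt', dtype=int, sep=' ').reshape(-1, 2)
```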

@charris charris closed this Feb 21, 2014
