# NumPy vs. pandas

A NumPy array is an n-dimensional grid of values that all share one data type, whereas a pandas DataFrame is a two-dimensional, size-mutable, tabular data structure with labeled rows and columns, in which each column can have its own data type.

NumPy arrays are generally faster and more memory-efficient for numerical operations, while pandas DataFrames are more flexible and more convenient for labeling, cleaning, and manipulating data.

One key difference between the two is that a NumPy array has a single data type for all of its elements, while a pandas DataFrame can hold a different data type in each column. This makes DataFrames more versatile for mixed tabular data, but the extra bookkeeping (row and column labels, per-column storage) means they are generally slower and use more memory than NumPy arrays.
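The single-dtype rule can be seen directly by inspecting `dtype` on an array and `dtypes` on a DataFrame. The following is a minimal sketch; the variable names and the column names (`name`, `age`, `height_m`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array holds one dtype: mixing ints and floats upcasts every element.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A DataFrame tracks a dtype per column, so text and numeric columns can coexist.
# The column names below are illustrative only.
people = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [31, 45], "height_m": [1.62, 1.80]}
)
print(people.dtypes)
```

The exact dtype reported for the text and integer columns can differ between platforms and pandas versions, so no output is shown for `people.dtypes`.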

Here is an example comparing the two:

In [1]:
import numpy as np   # numerical arrays with a single data type (faster)
import pandas as pd  # labeled tables with one data type per column (more versatile, slower)

# NumPy array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# pandas DataFrame
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])


In [2]:
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [3]:
df

   0  1  2
0  1  2  3
1  4  5  6



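To compare the performance of NumPy arrays and pandas DataFrames, you can use the timeit module to measure how long it takes to perform a particular operation on each.

Here is an example that times element-wise addition of two NumPy arrays and of two pandas DataFrames. Note that each array holds n values while each DataFrame holds n × 2, so the DataFrame addition processes twice as much data and the two timings are not a like-for-like comparison. Each operation is also timed only once, so the results are noisy and will vary from machine to machine.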

In [9]:
import timeit

In [11]:
# Set up the NumPy arrays
n = 1000000
a = np.random.randn(n)
b = np.random.randn(n)

In [13]:
# Set up the DataFrames (two columns each, so n * 2 values)
df1 = pd.DataFrame(np.random.randn(n, 2))
df2 = pd.DataFrame(np.random.randn(n, 2))

In [15]:
start = timeit.default_timer()
c = a + b
end = timeit.default_timer()
print("Time for Numpy array: ", end - start)

Time for Numpy array:  0.0019575001206249


In [16]:
start = timeit.default_timer()
df3 = df1 + df2
end = timeit.default_timer()
print("Time for DataFrame: ", end - start)

Time for DataFrame:  0.005039500072598457
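
The single-shot timings above are sensitive to noise, and the DataFrames hold twice as many values as the arrays. A somewhat fairer comparison might look like the sketch below, which assumes one-column DataFrames built from the same data and takes the best of several runs with `timeit.repeat`. The names `df_a`, `df_b`, `numpy_time`, and `pandas_time`, the seed, and the repeat counts are arbitrary choices for illustration.

```python
import timeit

import numpy as np
import pandas as pd

n = 1_000_000
rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# Same shape and same values on both sides, so only the container differs.
a = rng.standard_normal(n)
b = rng.standard_normal(n)
df_a = pd.DataFrame({"x": a})
df_b = pd.DataFrame({"x": b})

# Run each addition 10 times per repeat and keep the best of 5 repeats,
# which reduces noise from other processes on the machine.
runs = 10
numpy_time = min(timeit.repeat(lambda: a + b, number=runs, repeat=5)) / runs
pandas_time = min(timeit.repeat(lambda: df_a + df_b, number=runs, repeat=5)) / runs

print(f"NumPy array: {numpy_time:.6f} s per addition")
print(f"DataFrame:   {pandas_time:.6f} s per addition")
```

The absolute numbers depend on hardware and library versions, so treat them as a rough indication rather than a benchmark.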
