Is pandas really faster? The answer is: most of the time. Many of the more advanced Python libraries, such as numpy and pandas, use vectorisation to speed up their functions. In vectorisation, instead of running through the values one by one in interpreted Python, we perform the same operation on a whole array (vector) of values at once. The loop then runs in compiled code over a block of same-typed values, and modern CPUs have SIMD instructions that apply one operation to several values in a single step. Note that this is not the same thing as having multiple cores: a vectorised numpy or pandas operation typically still runs on a single core. Let's now demonstrate this with a simple example comparing two columns of data:

In [None]:
#A simple comparison of 2 columns as an example
dataset[dataset.PT1 != dataset.PT2]  #pandas using vectorisation
dataset.query('PT1 != PT2') #pandas query, which can use the numexpr library as its engine
dataset[[x != y for x, y in zip(dataset.PT1, dataset.PT2)]] #normal for loop (list comprehension)

![comparison](https://raw.githubusercontent.com/kraikisto/CERN_LEP_Z_boson/main/comparisonk.pdf)
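
To reproduce a comparison like this yourself, the sketch below times the three approaches with `timeit`. It is only an illustration: the `dataset` here is synthetic random data standing in for the real one, the row count and the helper function names are assumptions, and the timings will depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (assumption): two integer columns.
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "PT1": rng.integers(0, 10, size=100_000),
    "PT2": rng.integers(0, 10, size=100_000),
})


def vectorised():
    # Boolean mask computed by pandas in one vectorised operation
    return dataset[dataset.PT1 != dataset.PT2]


def with_query():
    # Same filter expressed as a query string
    return dataset.query("PT1 != PT2")


def with_loop():
    # Same filter with the mask built element by element in Python
    return dataset[[x != y for x, y in zip(dataset.PT1, dataset.PT2)]]


# All three approaches select exactly the same rows.
assert vectorised().equals(with_query())
assert vectorised().equals(with_loop())

for func in (vectorised, with_query, with_loop):
    seconds = timeit.timeit(func, number=5) / 5
    print(f"{func.__name__:>12}: {seconds * 1e3:.2f} ms per run")
```

In a notebook the `%timeit` magic does the same job with less typing.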




For string-type data, vectorisation is more difficult. If the data is string or mixed type, a for loop is faster:

In [None]:
#A simple comparison of 2 columns as an example
dataset[dataset.PT1 != dataset.PT2]  #pandas using vectorisation
dataset[[x != y for x, y in zip(dataset.PT1, dataset.PT2)]] #normal for loop (list comprehension)

It's important to note that pandas here is much easier to read than the for loop. 
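
If you want to check the string case on your own setup, here is a minimal sketch. The data is made up (three particle names drawn at random), the columns are forced to `object` dtype on purpose, and the names `strings`, `vectorised` and `with_loop` are just for this example. Which approach wins may vary with the pandas version and the string dtype in use, so treat the printed numbers as a local measurement only.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up string data (assumption), stored as plain Python objects.
rng = np.random.default_rng(0)
words = np.array(["muon", "electron", "tau"])
strings = pd.DataFrame({
    "A": pd.Series(rng.choice(words, size=50_000), dtype=object),
    "B": pd.Series(rng.choice(words, size=50_000), dtype=object),
})


def vectorised():
    # pandas comparison on object columns
    return strings[strings.A != strings.B]


def with_loop():
    # List comprehension comparing the strings pair by pair
    return strings[[x != y for x, y in zip(strings.A, strings.B)]]


# Both approaches select the same rows.
assert vectorised().equals(with_loop())

for func in (vectorised, with_loop):
    seconds = timeit.timeit(func, number=5) / 5
    print(f"{func.__name__:>10}: {seconds * 1e3:.2f} ms per run")
```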

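Even with numbers, pandas sometimes loses to numpy (which also uses vectorisation). Specifically, comparing the underlying arrays via .values seems to be faster, most likely because it skips the index handling and the construction of a new Series that pandas does on top of the numpy comparison.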

In [16]:
dataset[dataset.PT1.values != dataset.PT2.values] #.values returns a numpy array
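
The small sketch below shows what changes when you go through `.values`: the mask becomes a plain numpy array instead of a pandas Series, but it selects the same rows. The four-row `dataset` is made up for illustration. `Series.to_numpy()` is included as well, since the pandas documentation recommends it over `.values`.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataset (assumption) so the result is easy to inspect.
dataset = pd.DataFrame({"PT1": [1.0, 2.0, 3.0, 4.0], "PT2": [1.0, 5.0, 3.0, 0.0]})

series_mask = dataset.PT1 != dataset.PT2                        # pandas Series, carries the index
array_mask = dataset.PT1.values != dataset.PT2.values          # plain numpy array, no index
numpy_mask = dataset.PT1.to_numpy() != dataset.PT2.to_numpy()  # same idea via to_numpy()

print(type(series_mask).__name__, type(array_mask).__name__)  # Series ndarray

# The masks agree, so the filtered frames are identical.
assert np.array_equal(series_mask.to_numpy(), array_mask)
assert dataset[array_mask].equals(dataset[series_mask])
```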

For comparison, when counting how many times each value occurs, pandas is actually faster than numpy or vanilla Python.

In [None]:
#here we count the occurrences of each value in a pandas.Series object
series.value_counts(sort=False).to_dict() # value_counts from pandas
dict(zip(*np.unique(series, return_counts=True))) # np.unique from numpy
Counter(series) #counter from collections 
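
As a quick sanity check that the three counting approaches really are interchangeable, the sketch below runs them on a tiny made-up `series` and compares the results. The variable names are only for this example.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Tiny made-up series (assumption): 3 occurs three times, 1 twice, 2 once.
series = pd.Series([3, 1, 3, 2, 1, 3])

pandas_counts = series.value_counts(sort=False).to_dict()
numpy_counts = dict(zip(*np.unique(series, return_counts=True)))
python_counts = dict(Counter(series))

# All three give the same value -> count mapping.
expected = {3: 3, 1: 2, 2: 1}
assert pandas_counts == expected
assert numpy_counts == expected
assert python_counts == expected
```

Note that the numpy version returns numpy integers as keys and counts; they compare equal to plain Python integers, but you may want to convert them before serialising the dictionary.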

Note: some pandas functions don't use vectorisation. iterrows() and apply() both lose to a normal loop when that loop is written as a list comprehension; they only beat a loop written without one.
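
For reference, here is roughly what those row-wise alternatives look like for the same column comparison, next to the list comprehension and the vectorised version. This is a sketch on a tiny made-up `dataset`; it only shows that the four approaches agree, not how fast they are.

```python
import pandas as pd

# Tiny made-up dataset (assumption).
dataset = pd.DataFrame({"PT1": [1, 2, 3, 4], "PT2": [1, 5, 3, 0]})

# iterrows(): pandas builds a new Series object for every row
mask_iterrows = [row.PT1 != row.PT2 for _, row in dataset.iterrows()]

# apply(axis=1): calls the Python function once per row
mask_apply = dataset.apply(lambda row: row.PT1 != row.PT2, axis=1)

# List comprehension over the zipped columns
mask_listcomp = [x != y for x, y in zip(dataset.PT1, dataset.PT2)]

# Vectorised comparison for reference
mask_vectorised = dataset.PT1 != dataset.PT2

# All four masks select the same rows.
assert list(mask_iterrows) == mask_listcomp
assert mask_apply.tolist() == mask_listcomp
assert mask_vectorised.tolist() == mask_listcomp
```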