In my geoprocessing to make [transit-score heatmaps](https://johnfwhitesell.github.io/transit/) I had a step where I was simply concatenating two columns of a pandas dataframe to make a tuple I would use in my API requests.  Although this is a simple task, repeating it 120,000 times did produce a noticeable slowdown.  Previously I would have just shrugged and ignored it unless the delay got too big, but now I am aware of the nifty %timeit magic that IPython provides on top of Python's timeit module.  In a Jupyter notebook all you have to do is define your various functions and call them with %timeit at the start of the line to get a speed estimate.

In [3]:
#Speedtesting, fun!

def iat_assign():
    df['XY'] = None
    l = df.shape[0]
    for i in range(l):
        # iat is purely positional: 8 is the position of the new 'XY' column
        df.iat[i,8] = (df.iat[i,1],df.iat[i,0])
        
def at_assign():
    df['XY'] = None
    for i in df.index:
        df.at[i,'XY'] = (df.at[i,'lon'], df.at[i,'lat'])

def itertuples_at_assign():
    df['XY'] = None
    for row in df[['lon','lat']].itertuples():
        XY = (row[2],row[1])
        df.at[row[0], 'XY'] = XY
        
def itertuples_at_assign_2():
    df['XY'] = None
    for row in df[['lon','lat']].itertuples():
        df.at[row[0], 'XY'] = (row[2],row[1])
        
def itertuples_iat_assign():
    df['XY'] = None
    # drop=True keeps the old index out of the tuples, so row[1] and row[2] stay lon and lat
    for row in df[['lon','lat']].reset_index(drop=True).itertuples():
        XY = (row[2],row[1])
        df.iat[row[0], 8] = XY
        
%timeit iat_assign()
%timeit at_assign()
%timeit itertuples_at_assign()
%timeit itertuples_at_assign_2()
%timeit itertuples_iat_assign()

#  Results
# iat_assign:
# 546 ms ± 8.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# at_assign:
# 566 ms ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# itertuples_at_assign:
# 228 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# itertuples_at_assign_2:
# 223 ms ± 4.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# itertuples_iat_assign:
# 258 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
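
The comparison above can also be reproduced outside a notebook, where the %timeit magic is not available, with the standard-library timeit module.  The following is a minimal sketch, not the code used for the heatmaps: the dataframe is synthetic (2,000 random rows rather than 120,000), and the function names `at_lookup` and `row_then_assign` are made up for illustration.  Absolute timings will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption); the real dataframe had about 120,000 rows.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "lat": rng.uniform(38.8, 39.0, 2_000),
    "lon": rng.uniform(-77.2, -76.9, 2_000),
})

def at_lookup():
    # Two separate label lookups into the dataframe for every row.
    df["XY"] = None
    for i in df.index:
        df.at[i, "XY"] = (df.at[i, "lat"], df.at[i, "lon"])

def row_then_assign():
    # Load each row once as a namedtuple, then build the pair from it.
    df["XY"] = None
    for row in df[["lat", "lon"]].itertuples():
        df.at[row.Index, "XY"] = (row.lat, row.lon)

runs = 3
t_lookup = timeit.timeit(at_lookup, number=runs)
t_row = timeit.timeit(row_then_assign, number=runs)
print(f"per-cell .at lookups: {t_lookup / runs:.3f} s per run")
print(f"itertuples + .at:     {t_row / runs:.3f} s per run")
```

Reading the namedtuple fields by name (`row.lat`, `row.lon`) rather than by position makes it harder to mix up which element holds which column.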


These differences aren't huge, only roughly a factor of 2.  Still, there is a really useful lesson here.  It is faster to load a row once and then process it than to look up each value across the entire dataframe.  The same thing can be seen in a pairing algorithm used later in the program.  It finds which list an object should be assigned to, then adds the object to that list.

In [None]:


# def hex_pairing(self):
#         # This uses the fact that every point already has calculated the "id" of the hex in which it falls, using its coordinates.
#         points = self.points
#         hexes = self.hexes        
#         points['point_id'] = points.index
#         t = tqdm(points.itertuples(), total=points.shape[0])
#         print("Generating list of values within each hex")
        
#         for row in t:
#             id = row[2] # this is the id of the hex it matches
            
#             hex_row = hexes.loc[id]
#             hex_row['total_score'].apply(lambda l: l.append(row.total_score))
#             hex_row['bike_score'].apply(lambda l: l.append(row.bike_score))
#             hex_row['car_score'].apply(lambda l: l.append(row.car_score))
#             hex_row['mass_score'].apply(lambda l: l.append(row.mass_score))
#             hex_row['ride_score'].apply(lambda l: l.append(row.ride_score))
#             hex_row['points_list'].apply(lambda l: l.append(row.point_id))

# def hex_pairing_alternative(self):
#         # This uses the fact that every point already has calculated the "id" of the hex in which it falls, using its coordinates.
#         points = self.points
#         hexes = self.hexes        
#         points['point_id'] = points.index
#         t = tqdm_notebook(points.itertuples(), total=points.shape[0])
#         print("Generating list of values within each hex")
        
#         for row in t:
#             id = row[2] # this is the id of the hex it matches           

#             hexes.loc[id, 'total_score'].append(row.total_score)
#             hexes.loc[id, 'bike_score'].append(row.bike_score)
#             hexes.loc[id, 'car_score'].append(row.car_score)
#             hexes.loc[id, 'mass_score'].append(row.mass_score)
#             hexes.loc[id, 'ride_score'].append(row.ride_score)
#             hexes.loc[id, 'points_list'].append(row.point_id)

Here the calculations take much longer, so I use tqdm instead of %timeit.  Wrapping your iterator in tqdm gives you a progress bar that tracks the loop.  I found that loading the hex row once and then editing it took 2 minutes for 18,000 iterations, while looking up the row, finding the position and fetching the value separately for every field took 9 minutes.  That is a more significant difference than what I found before, and very important when it comes to making this process scalable.
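
The faster pattern, fetching the hex row once per point and appending to the lists it holds, can be sketched on toy data as follows.  The dataframes, the `hex_id` column name and the values are hypothetical stand-ins, not the real grid.  The sketch assumes the hex index is unique, so that `hexes.loc[...]` returns a single row whose cells are the list objects themselves, and for that reason it appends directly instead of going through `.apply` as the original does.  The try/except is only there so the sketch still runs where tqdm is not installed.

```python
import pandas as pd

try:
    from tqdm import tqdm
except ImportError:
    # Fallback (assumption): a no-op wrapper so the loop runs without a progress bar.
    def tqdm(iterable, **kwargs):
        return iterable

# Toy stand-ins: three hexes whose cells hold lists, and five scored points.
hexes = pd.DataFrame(
    {
        "total_score": [[] for _ in range(3)],
        "points_list": [[] for _ in range(3)],
    },
    index=[10, 11, 12],
)
points = pd.DataFrame(
    {
        "hex_id": [10, 12, 10, 11, 12],
        "total_score": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# total= is needed because itertuples() has no length for tqdm to read.
for row in tqdm(points.itertuples(), total=len(points)):
    hex_row = hexes.loc[row.hex_id]  # one row lookup per point
    # The cells are the same list objects stored in hexes, so appending
    # here updates the hexes dataframe in place.
    hex_row["total_score"].append(row.total_score)
    hex_row["points_list"].append(row.Index)

print(hexes)
```

The slower variant would instead call `hexes.loc[row.hex_id, column]` once per field, repeating the row lookup each time.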

In [None]:

# test = Washington_grid.hexes
# w = grid_rings

# %timeit test['integer_form'] = test['grid'].apply(lambda x: x[0]+w+1) + test['grid'].apply(lambda x: x[1]+w+3)*2*w
## 8.32 ms ± 193 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# %timeit test['integer_form']== 2550
## 133 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# %timeit test['grid']==(25,25,25)
## 1.24 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
