## Speed comparison: pandas.Series.apply(...) vs. pandas.Series.str.x vs. treating the Series as a NumPy array

[timeit](https://docs.python.org/3/library/timeit.html)

    Measure execution time of code snippets

[pandas.Series.apply](https://pandas.pydata.org/docs/reference/api/pandas.Series.apply.html)

    Invoke a function on a Series.

[pandas.Series.str.x](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.contains.html)

    Vectorized string functions for Series and Index.
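
Before timing any pandas code, it may help to see how `timeit` is typically driven from Python. The following is a minimal sketch using only the standard library; `mult`, `values`, and `workload` are illustrative names, and the `number` and `repeat` settings are arbitrary choices rather than recommendations.

```python
import timeit


def mult(x):
    # Stand-in workload; any callable can be timed the same way.
    return x * 100


values = list(range(1_000))


def workload():
    # Apply mult to every element with a plain Python loop.
    return [mult(v) for v in values]


# repeat=5 yields five independent measurements of number=200 calls each.
timings = timeit.repeat(workload, number=200, repeat=5)

# The minimum is usually the least noisy estimate of the achievable speed.
best_per_call = min(timings) / 200
print(f"best of 5: {best_per_call * 1e6:.1f} us per call")
```

Taking the minimum over several repeats reduces the influence of other processes running on the machine, which a single `time.perf_counter()` measurement cannot do.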

In [22]:
import pandas as pd

In [2]:
s = pd.Series([20, 21, 12], index = ['London', 'New York', 'Helsinki'])
s, s.iloc[0], s["London"]

(London      20
 New York    21
 Helsinki    12
 dtype: int64,
 20,
 20)

In [3]:
def mult(x):
    return x * 100

s.apply(mult)

London      2000
New York    2100
Helsinki    1200
dtype: int64

In [24]:
df = pd.read_csv("https://raw.githubusercontent.com/roualdes/data/refs/heads/master/penguins.csv")
df

Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,male,2007
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,female,2007
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,female,2007
3,Adelie,Torgersen,,,,,,2007
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,female,2007
...,...,...,...,...,...,...,...,...
339,Chinstrap,Dream,55.8,19.8,207.0,4000.0,male,2009
340,Chinstrap,Dream,43.5,18.1,202.0,3400.0,female,2009
341,Chinstrap,Dream,49.6,18.2,193.0,3775.0,male,2009
342,Chinstrap,Dream,50.8,19.0,210.0,4100.0,male,2009


In [25]:
df2 = df.copy()  # copy, so that later edits to df2 do not also change df
df2["year"].apply(mult)

0      200700
1      200700
2      200700
3      200700
4      200700
        ...  
339    200900
340    200900
341    200900
342    200900
343    200900
Name: year, Length: 344, dtype: int64

In [5]:
df2["year"] * 100

0      200700
1      200700
2      200700
3      200700
4      200700
        ...  
339    200900
340    200900
341    200900
342    200900
343    200900
Name: year, Length: 344, dtype: int64
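
The two cells above produce the same values, so they are natural candidates for the speed comparison. The sketch below times `apply`, the vectorized Series operation, and the NumPy-array route with `timeit`. It uses a synthetic `years` Series as a stand-in for `df2["year"]` so that it runs without the download; the Series size and the `number`/`repeat` settings are assumptions, and the measured times will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def mult(x):
    return x * 100


# Synthetic stand-in for the "year" column of the penguins data.
years = pd.Series(np.repeat([2007, 2008, 2009], 1000), name="year")

candidates = {
    "Series.apply": lambda: years.apply(mult),
    "Series * 100": lambda: years * 100,
    "NumPy array": lambda: years.to_numpy() * 100,
}

for label, func in candidates.items():
    # Best of 5 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=5)) / 100
    print(f"{label:>12}: {best * 1e6:10.1f} us per call")
```

The vectorized forms usually come out ahead because `apply` calls the Python function once per element, but it is worth running the comparison on your own data rather than assuming the ranking.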

In [15]:
df2["year"].apply(lambda y: y + 5)

0      2012
1      2012
2      2012
3      2012
4      2012
       ... 
339    2014
340    2014
341    2014
342    2014
343    2014
Name: year, Length: 344, dtype: int64

In [7]:
df["year"] + 5

0      2012
1      2012
2      2012
3      2012
4      2012
       ... 
339    2014
340    2014
341    2014
342    2014
343    2014
Name: year, Length: 344, dtype: int64

In [26]:
df2["island"]

0      Torgersen
1      Torgersen
2      Torgersen
3      Torgersen
4      Torgersen
         ...    
339        Dream
340        Dream
341        Dream
342        Dream
343        Dream
Name: island, Length: 344, dtype: object

In [9]:
df2["island"] = df2["island"].apply(lambda k: k.replace('Torgersen', 'Catalina'))
df2["island"]

0      Catalina
1      Catalina
2      Catalina
3      Catalina
4      Catalina
         ...   
339       Dream
340       Dream
341       Dream
342       Dream
343       Dream
Name: island, Length: 344, dtype: object

In [12]:
df2["island"].str.replace("Catalina", "Chico")

0      Chico
1      Chico
2      Chico
3      Chico
4      Chico
       ...  
339    Dream
340    Dream
341    Dream
342    Dream
343    Dream
Name: island, Length: 344, dtype: object

In [21]:
import time
start = time.perf_counter()

df2["island"] = df2["island"].str.replace("Catalina", "Chico")

end = time.perf_counter()
seconds = end - start
print(f"Replacing 'Catalina' with 'Chico' took {seconds} seconds.")

Replacing 'Catalina' with 'Chico' took 0.001759601989760995 seconds.
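
A single `perf_counter` measurement like the one above is noisy, and it only covers one of the three approaches. The following sketch times the same kind of replacement three ways with `timeit`. The `islands` Series is a synthetic stand-in for `df2["island"]`, the helper names are illustrative, and `np.char.replace` is one possible way to work on the underlying array, not necessarily the approach intended here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the "island" column.
islands = pd.Series(["Torgersen", "Biscoe", "Dream"] * 1000, name="island")


def with_apply():
    # Calls Python's str.replace once per element.
    return islands.apply(lambda k: k.replace("Torgersen", "Catalina"))


def with_str():
    # pandas string accessor; regex=False requests a literal replacement.
    return islands.str.replace("Torgersen", "Catalina", regex=False)


def with_numpy():
    # Convert to a fixed-width NumPy string array, then replace element-wise.
    arr = islands.to_numpy().astype(str)
    return np.char.replace(arr, "Torgersen", "Catalina")


for func in (with_apply, with_str, with_numpy):
    # Best of 5 repeats of 20 calls each, reported per call.
    best = min(timeit.repeat(func, number=20, repeat=5)) / 20
    print(f"{func.__name__:>10}: {best * 1e6:10.1f} us per call")
```

Note that `apply` with `k.replace(...)` raises an error on missing values, whereas the `.str` accessor passes them through, so the approaches are only interchangeable on columns without missing data.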


In [16]:
df = pd.read_csv("https://raw.githubusercontent.com/roualdes/data/refs/heads/master/islp/default.csv")
df

Unnamed: 0,default,student,balance,income
0,No,No,729.526495,44361.625074
1,No,Yes,817.180407,12106.134700
2,No,No,1073.549164,31767.138947
3,No,No,529.250605,35704.493935
4,No,No,785.655883,38463.495879
...,...,...,...,...
9995,No,No,711.555020,52992.378914
9996,No,No,757.962918,19660.721768
9997,No,No,845.411989,58636.156984
9998,No,No,1569.009053,36669.112365
