Summary Functions and Maps

Summary functions

In [1]:
import pandas as pd
pd.set_option('display.max_rows', 5)
df = pd.read_csv("../dataset/train_FD001.txt", sep=r"\s+", header=None)
df.head()


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,16,17,18,19,20,21,22,23,24,25
0,1,1,-0.0007,-0.0004,100.0,518.67,641.82,1589.7,1400.6,14.62,...,521.66,2388.02,8138.62,8.4195,0.03,392,2388,100.0,39.06,23.419
1,1,2,0.0019,-0.0003,100.0,518.67,642.15,1591.82,1403.14,14.62,...,522.28,2388.07,8131.49,8.4318,0.03,392,2388,100.0,39.0,23.4236
2,1,3,-0.0043,0.0003,100.0,518.67,642.35,1587.99,1404.2,14.62,...,522.42,2388.03,8133.23,8.4178,0.03,390,2388,100.0,38.95,23.3442
3,1,4,0.0007,0.0,100.0,518.67,642.35,1582.79,1401.87,14.62,...,522.86,2388.08,8133.83,8.3682,0.03,392,2388,100.0,38.88,23.3739
4,1,5,-0.0019,-0.0002,100.0,518.67,642.37,1582.85,1406.22,14.62,...,522.19,2388.04,8133.8,8.4294,0.03,393,2388,100.0,38.9,23.4044


Descriptive Statistics
The `describe()` method summarizes a numeric column by reporting its count, mean, standard deviation, minimum, quartiles, and maximum.

In [6]:
cols = ["engine_id", "cycle", "op1", "op2", "op3"] + [f"sensor{i}" for i in range(1, 22)]
df.columns = cols

In [7]:
df.sensor1.describe()

count    20631.00
mean       518.67
           ...   
75%        518.67
max        518.67
Name: sensor1, Length: 8, dtype: float64

In [8]:
df.engine_id.describe()

count    20631.000000
mean        51.506568
             ...     
75%         77.000000
max        100.000000
Name: engine_id, Length: 8, dtype: float64
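
The two outputs above are cut short because `display.max_rows` was set to 5 in the first cell. As a minimal sketch of the full set of statistics, the snippet below calls `describe()` on a small toy Series; the values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for a numeric column; the values are invented for illustration.
toy = pd.Series([1, 2, 3, 4, 5], name="cycle")

# describe() on a numeric Series returns a Series indexed by statistic name.
summary = toy.describe()

print(list(summary.index))
print(summary["mean"], summary["50%"])
```

Because the result is itself a Series, a single statistic can be picked out by label, for example `summary["std"]`.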

To see the mean of a column, such as the cycle count, we can use the `mean()` method. Selecting several columns first returns one mean per column.

In [9]:
df.cycle.mean()

np.float64(108.80786195530997)

In [10]:
df[["cycle", "sensor1"]].mean()

cycle      108.807862
sensor1    518.670000
dtype: float64

To see a list of unique values, we can use the `unique()` method.

In [11]:
df.engine_id.unique()

array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
        14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,
        27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,
        40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  52,
        53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,  65,
        66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,
        79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,  91,
        92,  93,  94,  95,  96,  97,  98,  99, 100])

To see a list of unique values and how often each occurs in the dataset, we can use the `value_counts()` method. The result is sorted from the most frequent value to the least frequent.

In [12]:
df.engine_id.value_counts()

engine_id
69    362
92    341
     ... 
91    135
39    128
Name: count, Length: 100, dtype: int64
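
To make the ordering of `value_counts()` concrete, here is a small sketch on a toy frame. The data is invented, and it assumes, as the preview of the dataset suggests, that each row records one cycle of one engine, so the count per `engine_id` is the number of recorded cycles for that engine.

```python
import pandas as pd

# Toy frame with invented values: engine 1 has 3 rows, engine 2 has 2, engine 3 has 1.
toy = pd.DataFrame({
    "engine_id": [1, 1, 1, 2, 2, 3],
    "cycle":     [1, 2, 3, 1, 2, 1],
})

# value_counts() orders the result by frequency, most common value first.
counts = toy.engine_id.value_counts()
print(counts)

# groupby(...).size() gives the same numbers, ordered by engine_id instead.
per_engine = toy.groupby("engine_id").size()
print(per_engine)
```

Which form to use is mostly a matter of the ordering you want: by frequency for a quick ranking, or by key when the counts will be joined back to other per-engine data.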

Maps

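Mapping transforms existing data into a new representation using a function, which is useful for data preprocessing and feature creation. The `map()` method of a Series calls the function once per value and returns a new Series of the results.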

In [13]:
cycle_mean = df.cycle.mean()
df["cycle_centered"] = df.cycle.map(lambda c: c - cycle_mean)

df["cycle_centered"].head()

0   -107.807862
1   -106.807862
2   -105.807862
3   -104.807862
4   -103.807862
Name: cycle_centered, dtype: float64

`apply()` Transformation
The `apply()` method is used to perform custom operations on entire rows or columns of a DataFrame.

In [14]:
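cycle_mean = df["cycle"].mean()

def remean_cycle(row):
    # Work on a copy: pandas does not support mutating the object passed in by apply().
    row = row.copy()
    row["cycle"] = row["cycle"] - cycle_mean
    return row

# axis=1 calls remean_cycle once per row.
df_remeaned = df.apply(remean_cycle, axis=1)
df_remeaned.head()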

Unnamed: 0,engine_id,cycle,op1,op2,op3,sensor1,sensor2,sensor3,sensor4,sensor5,...,sensor13,sensor14,sensor15,sensor16,sensor17,sensor18,sensor19,sensor20,sensor21,cycle_centered
0,1.0,-107.807862,-0.0007,-0.0004,100.0,518.67,641.82,1589.7,1400.6,14.62,...,2388.02,8138.62,8.4195,0.03,392.0,2388.0,100.0,39.06,23.419,-107.807862
1,1.0,-106.807862,0.0019,-0.0003,100.0,518.67,642.15,1591.82,1403.14,14.62,...,2388.07,8131.49,8.4318,0.03,392.0,2388.0,100.0,39.0,23.4236,-106.807862
2,1.0,-105.807862,-0.0043,0.0003,100.0,518.67,642.35,1587.99,1404.2,14.62,...,2388.03,8133.23,8.4178,0.03,390.0,2388.0,100.0,38.95,23.3442,-105.807862
3,1.0,-104.807862,0.0007,0.0,100.0,518.67,642.35,1582.79,1401.87,14.62,...,2388.08,8133.83,8.3682,0.03,392.0,2388.0,100.0,38.88,23.3739,-104.807862
4,1.0,-103.807862,-0.0019,-0.0002,100.0,518.67,642.37,1582.85,1406.22,14.62,...,2388.04,8133.8,8.4294,0.03,393.0,2388.0,100.0,38.9,23.4044,-103.807862


`apply()` Axis Behavior
Using `axis=1` (or `'columns'`) applies a function to each row, while `axis=0` (or `'index'`, the default) applies the function to each column of a DataFrame.
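
The cell above only uses `axis=1`, so the following sketch contrasts the two settings on a toy frame. The column names `a` and `b` and their values are hypothetical and chosen only to keep the results easy to check by hand.

```python
import pandas as pd

# Hypothetical toy frame, not part of the dataset.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: the function receives each column as a Series, giving one result per column.
col_ranges = toy.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series, giving one result per row.
row_sums = toy.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_ranges)
print(row_sums)
```

Here `col_ranges` is indexed by column name and `row_sums` by the row index, which is a quick way to tell which axis a call actually ran along.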

Non-destructive Transformations
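Both `map()` and `apply()` return new transformed data and do not modify the original DataFrame unless the result is assigned back to it. In the next cell, `cycle` in `df` still holds its original value of 1; the `cycle_centered` column is present only because it was assigned explicitly earlier.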


In [15]:
df.head(1)

Unnamed: 0,engine_id,cycle,op1,op2,op3,sensor1,sensor2,sensor3,sensor4,sensor5,...,sensor13,sensor14,sensor15,sensor16,sensor17,sensor18,sensor19,sensor20,sensor21,cycle_centered
0,1,1,-0.0007,-0.0004,100.0,518.67,641.82,1589.7,1400.6,14.62,...,2388.02,8138.62,8.4195,0.03,392,2388,100.0,39.06,23.419,-107.807862
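
The same check can be sketched on a toy frame with invented values. It also illustrates a side effect visible in the `df_remeaned` output earlier, where `engine_id` is displayed as `1.0`: with `axis=1` each row is passed as a single Series, so a row mixing integers and floats is upcast to float.

```python
import pandas as pd

# Toy frame with invented values; column names mirror the ones used in this notebook.
toy = pd.DataFrame({
    "engine_id": [1, 1, 1],
    "cycle": [1, 2, 3],
    "op1": [0.5, 0.25, 0.75],
})
cycle_mean = toy["cycle"].mean()

def remean_cycle(row):
    row = row.copy()  # avoid mutating the object handed in by apply()
    row["cycle"] = row["cycle"] - cycle_mean
    return row

result = toy.apply(remean_cycle, axis=1)

# The original frame is untouched: cycle is still 1, 2, 3 with an integer dtype.
print(toy["cycle"].tolist())
print(toy.dtypes)

# The transformed copy holds the centered values, and its integer columns became floats.
print(result.dtypes)
```

If the integer dtypes matter downstream, one option is to cast the affected columns back with `astype` after the row-wise `apply()`.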


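For simple arithmetic such as centering, we can also operate on the whole column at once instead of calling a function per value. This gives the same result as `map()` and is usually faster on large data.

In [16]: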
cycle_mean = df["cycle"].mean()
df["cycle_centered"] = df["cycle"] - cycle_mean
df["cycle_centered"].head()

0   -107.807862
1   -106.807862
2   -105.807862
3   -104.807862
4   -103.807862
Name: cycle_centered, dtype: float64
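
As a small sanity check that the `map()` version and the column arithmetic agree, the sketch below compares the two on a toy frame; the values are made up so the expected result is easy to verify by hand.

```python
import pandas as pd

# Toy frame with invented values; the mean of cycle is 2.5.
toy = pd.DataFrame({"cycle": [1, 2, 3, 4]})
cycle_mean = toy["cycle"].mean()

# Element-wise: the lambda is called once per value.
via_map = toy["cycle"].map(lambda c: c - cycle_mean)

# Vectorized: the subtraction is applied to the whole column in one operation.
via_arith = toy["cycle"] - cycle_mean

print(via_arith.tolist())
print(via_map.equals(via_arith))
```

`map()` remains the more flexible choice when the transformation cannot be written as column arithmetic, for example a lookup or a conditional on each value.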