# Import Packages and Mount Google Drive

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Mount your Google Drive
from google.colab import drive
drive.mount('/content/drive')

## Load Data

In [None]:
ExampleData = pd.read_csv('https://github.com/ljwg3000/UNT_MEEN/blob/main/AI_tutorial/Week2/ExampleData?raw=true', sep=',', header=None)
ExampleData.shape

## **Extract Features**

In real-world sensor data analysis, we rarely use the raw signals directly.

Instead, we calculate **features** — numerical summaries that capture important characteristics of the data.

### **Defining a function**

One commonly used feature is the Root Mean Square (RMS) value.

The **RMS** value of a signal $x = [x_1, x_2, \dots, x_N]$ is defined as:

$$RMS(x) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} x_i^2 }$$


where:  
- $N$: number of samples  
- $x_i$: the individual data points in the signal  

👉 In words: *square the values, take the mean, then take the square root.*


---

We can define a simple function to calculate RMS in Python using NumPy:

In [None]:
def rms(a):
    """Return the root mean square of a 1-D signal (NumPy array or pandas Series)."""
    return np.sqrt(np.mean(a**2))
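
A quick sanity check can build confidence in the function before it is applied to sensor data. The sketch below redefines `rms` so that it runs on its own and tries it on two made-up signals (not taken from `ExampleData`): a two-point signal whose RMS can be computed by hand, and one full period of a sine wave, whose RMS should equal the amplitude divided by $\sqrt{2}$.

```python
import numpy as np

def rms(a):
    return np.sqrt(np.mean(a**2))

# Hand-checkable case: sqrt((3^2 + 4^2) / 2) = sqrt(12.5)
small = np.array([3.0, 4.0])
rms_small = rms(small)

# One full period of a sine wave with amplitude 2 (illustrative values)
amplitude = 2.0
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
sine = amplitude * np.sin(2 * np.pi * t)
rms_sine = rms(sine)

print(rms_small)                          # approximately 3.5355
print(rms_sine, amplitude / np.sqrt(2))   # both approximately 1.4142
```

If both printed pairs agree, the function follows the formula: square, mean, square root.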


### **Extracting features from data**

Statistical features summarize a signal's overall behavior in a few numbers.  
They let us describe and compare signals compactly instead of inspecting every raw value.

In this example, we extract the following features from the first sensor signal:

- **Max (Maximum):** the largest value in the signal  
- **Min (Minimum):** the smallest value in the signal  
- **RMS (Root Mean Square):** the effective magnitude of the signal (its square is the average signal power)  
- **Var (Variance):** how spread out the values are around the mean  
- **Std (Standard Deviation):** the square root of the variance (easier to interpret because it has the same units as the data)  
- **Mean (Average):** the central tendency of the signal  


---

### 💡 Why are these features useful?

- They reduce a long sequence of data points into just a few representative numbers.  
- Engineers often use them to compare different signals quickly (e.g., checking machine vibration levels).  
- These are the **building blocks** for more advanced feature extraction in Machine Learning.


In [None]:
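# Column 0 is not treated as a sensor signal in this notebook;
# column 1 holds the first sensor signal.
signal = ExampleData.iloc[:,1]

Max  = np.max(signal)
Min  = np.min(signal)
RMS  =    rms(signal)
Var  = np.var(signal)   # population variance (ddof=0)
Std  = np.std(signal)   # population standard deviation (ddof=0)
Mean = np.mean(signal)

# Collect the six features into one array
Rep_values = np.array([Max, Min, RMS, Var, Std, Mean])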
Rep_values
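
These six features are not independent of each other. With NumPy's population variance (`ddof=0`), $RMS^2 = Var + Mean^2$ and $Std = \sqrt{Var}$, which gives a handy consistency check on any feature vector. The sketch below verifies both relations on a synthetic signal; `rng`, `signal`, and the offset and noise level are illustrative stand-ins, not values from `ExampleData`.

```python
import numpy as np

def rms(a):
    return np.sqrt(np.mean(a**2))

# Synthetic signal: constant offset 1.5 plus Gaussian noise (made-up values)
rng = np.random.default_rng(0)
signal = 1.5 + 0.3 * rng.standard_normal(5000)

Mean = np.mean(signal)
Var = np.var(signal)    # population variance (ddof=0)
Std = np.std(signal)    # population standard deviation (ddof=0)
RMS = rms(signal)

# RMS^2 splits into a "spread" part (Var) and an "offset" part (Mean^2)
identity_holds = np.isclose(RMS**2, Var + Mean**2)
std_matches = np.isclose(Std, np.sqrt(Var))
print(identity_holds, std_matches)
```

One caution when mixing libraries: the pandas method `Series.var()` defaults to the sample variance (`ddof=1`), so its result differs slightly from `np.var`, and the identity above holds exactly only for the `ddof=0` version.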


### **Extracting features from each sensor signal using a `for` loop**

So far, we extracted features (Max, Min, RMS, Variance, Std, Mean) from **one sensor signal**.  
But in real datasets, we usually have **multiple sensor channels** recorded at the same time.  
Instead of writing the same code again and again for each sensor, we can use a **`for` loop**.

---
### Steps

1. First, we preallocate an array `Rep_Values` filled with zeros.  
   - Its size is `(6, number_of_sensors)` because we want to store 6 feature values for each sensor.  

2. Then, for every sensor signal (looping over columns in the data):
   - Calculate **Max** and **Min** values.  
   - Calculate **RMS** value (effective magnitude of the signal).  
   - Calculate **Variance** and **Standard Deviation**.  
   - Calculate **Mean** value.  
   - Save these 6 values into the `Rep_Values` array.  

3. Finally, `Rep_Values` contains all the representative features for each sensor in the dataset.  

---

### 💡 Why is this useful?
- With one short loop, we can process **all sensor channels at once**.  
- This is much more **efficient and scalable** compared to manually writing separate code for each channel.  
- Such a feature table is exactly what we use as **input for Machine Learning models** later.


In [None]:
# Preallocate a zero-filled array: 6 features (rows) x number of sensor columns
Rep_Values = np.zeros((6 , ExampleData.shape[1]-1))
Rep_Values

In [None]:
for i in range(ExampleData.shape[1]-1): # i runs from 0 to (number of sensor columns - 1)
                                        # sensor i is stored in column i+1 of ExampleData

    Rep_Values[0,i] = np.max(ExampleData.iloc[:,i+1])
    Rep_Values[1,i] = np.min(ExampleData.iloc[:,i+1])
    Rep_Values[2,i] =    rms(ExampleData.iloc[:,i+1])
    Rep_Values[3,i] = np.var(ExampleData.iloc[:,i+1])
    Rep_Values[4,i] = np.std(ExampleData.iloc[:,i+1])
    Rep_Values[5,i] = np.mean(ExampleData.iloc[:,i+1])

Rep_Values
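
For comparison, NumPy can build the same feature table without an explicit loop by reducing along `axis=0` (down each column). This is an alternative sketch, not the notebook's method. It runs on a small synthetic DataFrame named `demo`, a hypothetical stand-in for `ExampleData` that assumes a time column followed by three sensor columns, and checks that the loop and vectorized versions agree.

```python
import numpy as np
import pandas as pd

def rms(a):
    return np.sqrt(np.mean(a**2))

# Synthetic stand-in for ExampleData (assumed layout: time, then 3 sensors)
rng = np.random.default_rng(1)
n_samples = 1000
demo = pd.DataFrame(np.column_stack([
    np.arange(n_samples) / 12800,          # column 0: time
    rng.standard_normal((n_samples, 3)),   # columns 1-3: synthetic sensors
]))

# Loop version, same pattern as the cell above
loop_values = np.zeros((6, demo.shape[1] - 1))
for i in range(demo.shape[1] - 1):
    col = demo.iloc[:, i + 1]
    loop_values[:, i] = [np.max(col), np.min(col), rms(col),
                         np.var(col), np.std(col), np.mean(col)]

# Vectorized version: each reduction runs over all sensor columns at once
sensors = demo.iloc[:, 1:].to_numpy()
vec_values = np.vstack([
    sensors.max(axis=0),
    sensors.min(axis=0),
    np.sqrt(np.mean(sensors**2, axis=0)),
    sensors.var(axis=0),
    sensors.std(axis=0),
    sensors.mean(axis=0),
])

print(vec_values.shape)
print(np.allclose(loop_values, vec_values))
```

The loop is easier to read when first learning; the vectorized form tends to pay off when there are many channels.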


### **Calculate RMS Value per Time Window using a `for` Loop**

So far, we calculated one RMS value for the **entire signal**.  
However, signals often change over time, and engineers want to observe how the **energy level (RMS)** evolves in smaller intervals.

---

### Steps
1. We divide the signal into **time windows of 0.01 seconds** (here, 128 samples per window).  
2. For each sensor channel:
   - Loop over each time window.  
   - Calculate the RMS value within that window.  
   - Save the results into the `RMS_Values` matrix.  
3. The result is a **time-series of RMS values**, showing how the signal’s effective energy changes over time.

---

### 💡 Why is this useful?
- Instead of just one number, we now have a **dynamic feature** that reflects time-varying behavior.  
- For example:
  - In vibration monitoring, a sudden increase in RMS in a certain time window can indicate an abnormal event.  
  - In welding or machining data, RMS over time can reveal process transitions.  
- This is a typical preprocessing step before feeding the data into machine learning models.

In [None]:
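n_windows = 21      # windows ending at 0.01 s, 0.02 s, ..., 0.21 s
window_size = 128   # samples per 0.01 s window
RMS_Values = np.zeros((n_windows , ExampleData.shape[1]-1))

for sensor in range(ExampleData.shape[1]-1):
    for time in range(n_windows):
        # Rows of window number `time`, taken from column sensor+1
        RMS_Values[time,sensor] = rms(ExampleData.iloc[window_size*time:window_size*(time+1),sensor+1])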

RMS_Values
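
The windowing arithmetic is worth spelling out. If 128 samples span 0.01 s, the implied sampling rate is 128 / 0.01 = 12,800 samples per second, and 21 windows cover the first 21 × 128 = 2,688 samples. The sketch below shows a loop-free alternative (not the notebook's method) that reshapes a signal into a `(windows, samples)` matrix and takes the RMS along each row. The signal is synthetic, and the names `windows`, `rms_loop`, and `rms_reshape` are introduced here for illustration only.

```python
import numpy as np

def rms(a):
    return np.sqrt(np.mean(a**2))

window_size = 128   # samples per window
n_windows = 21      # number of windows to analyze

# Synthetic signal, longer than the 21 * 128 = 2688 samples that are used
rng = np.random.default_rng(2)
signal = rng.standard_normal(3000)

# Loop version: slice out one window at a time
rms_loop = np.zeros(n_windows)
for w in range(n_windows):
    rms_loop[w] = rms(signal[window_size * w : window_size * (w + 1)])

# Reshape version: one row per window, then reduce along each row
windows = signal[: n_windows * window_size].reshape(n_windows, window_size)
rms_reshape = np.sqrt(np.mean(windows**2, axis=1))

print(np.allclose(rms_loop, rms_reshape))
```

The reshape approach only works for non-overlapping windows of equal length; leftover samples past the last full window are dropped in both versions.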


### **Visualizing RMS Trends over Time**

After calculating RMS values for each time window and sensor,  
the next step is to **plot them on a graph**.  
This allows us to see how the energy of each sensor signal changes over time.

---

### Steps
1. Create a **time axis** (`TimeArr`) that matches the number of windows (here, 0.01 s steps from 0.01 s to 0.21 s).  
2. Use `matplotlib` to plot the RMS values of different sensor signals against time.  
3. Each sensor signal is shown in a different color, making it easy to compare their behaviors.  

---

### 💡 Why visualization matters
- Numbers alone are hard to interpret, but a **graph reveals patterns** immediately.  
- You can spot **trends, peaks, or anomalies** in the sensor signals.  
- This is especially important in mechanical engineering, where sudden changes in RMS can indicate **faults or abnormal events**.  

In [None]:
# Create a time column
TimeArr = np.arange(1,22)/100
print(TimeArr.shape)
TimeArr
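
The time axis can also be derived from the window parameters instead of being typed in by hand, which keeps it consistent if the window length changes. The sketch below assumes the sampling rate of 12,800 Hz implied by 128 samples per 0.01 s; `fs`, `window_size`, and `n_windows` are names introduced here for illustration. Each entry marks the end time of its window.

```python
import numpy as np

fs = 12800          # samples per second (implied by 128 samples per 0.01 s)
window_size = 128   # samples per window
n_windows = 21      # number of windows

# End time of window k (k = 1..n_windows) is k * window_size / fs
time_from_params = np.arange(1, n_windows + 1) * window_size / fs

# Hand-written version used in the cell above
time_by_hand = np.arange(1, 22) / 100

print(time_from_params.shape)
print(np.allclose(time_from_params, time_by_hand))
```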

In [None]:
plt.figure(figsize=(10,5))
plt.plot(TimeArr, RMS_Values[:,0], ls = '--', c = 'r', marker = 'o', ms = 5, mfc = 'r', mec = 'r')
plt.plot(TimeArr, RMS_Values[:,1], ls = '--', c = 'g', marker = 'o', ms = 5, mfc = 'g', mec = 'g')
plt.plot(TimeArr, RMS_Values[:,2], ls = '--', c = 'b', marker = 'o', ms = 5, mfc = 'b', mec = 'b')
plt.grid()
plt.xlabel('time(s)')
plt.ylabel('RMS value')
plt.legend(['RMS_Acc','RMS_Vol', 'RMS_Cur'])
plt.show()
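
When the number of sensors grows, the repeated `plt.plot` lines can be folded into a loop that pairs each column with a label and a color. The sketch below is one possible arrangement under stated assumptions: `demo_rms` holds random placeholder values standing in for `RMS_Values`, and the label and color lists simply mirror the ones used above.

```python
import numpy as np
import matplotlib.pyplot as plt

TimeArr = np.arange(1, 22) / 100

# Placeholder data with the same shape as RMS_Values (21 windows x 3 sensors)
rng = np.random.default_rng(3)
demo_rms = rng.random((21, 3))

labels = ['RMS_Acc', 'RMS_Vol', 'RMS_Cur']
colors = ['r', 'g', 'b']

fig, ax = plt.subplots(figsize=(10, 5))
for k, (label, color) in enumerate(zip(labels, colors)):
    # One dashed line with circular markers per sensor column
    ax.plot(TimeArr, demo_rms[:, k], ls='--', c=color, marker='o', ms=5, label=label)
ax.grid()
ax.set_xlabel('time(s)')
ax.set_ylabel('RMS value')
ax.legend()   # picks up the label given to each line
plt.show()
```

Passing `label=` to each `plot` call keeps every legend entry tied to its own line, so reordering the plots cannot mislabel a sensor.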


# **Mini Quiz: Feature Extraction from ExampleData**

Try these short exercises (about 5 minutes).

**Q1. Calculate basic features**


* From the **first sensor column** of `ExampleData`:
  * Compute the **maximum**, **minimum**, and **mean** values.
* Store them in a NumPy array and print the result.

👉 _Hint:_ Use `np.max()`, `np.min()`, `np.mean()`.

In [None]:
# Complete the code

col1 =
features =

print("Max, Min, Mean:", features)

<details>
<summary>Click to see Answer Q1</summary>

```python
col1 = ExampleData.iloc[:,1]
features = np.array([np.max(col1),
                     np.min(col1),
                     np.mean(col1)])
print("Max, Min, Mean:", features)
```

</details>

**Q2. Automate feature extraction with a loop**

* Use a `for` loop to calculate the **mean value of each sensor column** (not just the first).
* Store all results in an array called `Mean_Values`.

👉 _Hint:_ Loop over the column index `i` and use `.iloc[:, i]`.

In [None]:
# Complete the code

Mean_Values =
for i in range(  ):


print("Mean values of each sensor:", Mean_Values)

<details>
<summary>Click to see Answer Q2</summary>

```python
Mean_Values = np.zeros(ExampleData.shape[1]-1)
for i in range(1, ExampleData.shape[1]):
    Mean_Values[i-1] = np.mean(ExampleData.iloc[:,i])

print("Mean values of each sensor:", Mean_Values)
```

</details>

**Q3. Time-window Max/Min trends**

Now let’s analyze how the **maximum and minimum values** of a sensor signal change over time.

1. Select the **first sensor (acceleration) column** (column index 1).  
2. Divide the signal into 21 consecutive windows, each corresponding to **0.01 s (128 samples)**.  
3. For each window, calculate the **maximum** and **minimum** values.  
4. Plot the results as trends (Max vs. Min) with respect to time.  

👉 _Hint:_ Use slicing like `ExampleData.iloc[start:end, 1]` for each window.  

In [None]:
# Complete the code

Max_Values =  # from 0.01 to 0.21 seconds
Min_Values =  # from 0.01 to 0.21 seconds

for time in range( ):
  Max_Values[time] =
  Min_Values[time] =

TimeArr = np.arange(1,22)/100

plt.figure(figsize=(10,5))
plt.plot()
plt.plot()
plt.grid()
plt.xlabel('time(s)')
plt.ylabel('Max/Min value')
plt.legend([])
plt.show()

<details>
<summary>Click to see Answer Q3</summary>

```python
Max_Values = np.zeros((21 , 1)) # from 0.01 to 0.21 seconds
Min_Values = np.zeros((21 , 1)) # from 0.01 to 0.21 seconds

for time in range(21):
  Max_Values[time] = np.max(ExampleData.iloc[128*(time):128*(time+1), 1])
  Min_Values[time] = np.min(ExampleData.iloc[128*(time):128*(time+1), 1])

TimeArr = np.arange(1,22)/100

plt.figure(figsize=(10,5))
plt.plot(TimeArr, Max_Values, marker = 'o', ms = 5)
plt.plot(TimeArr, Min_Values, marker = 'o', ms = 5)
plt.grid()
plt.xlabel('time(s)')
plt.ylabel('Max/Min value')
plt.legend(['Max','Min'])
plt.show()
```

</details>


# Summary of Data Analysis DA1_Code3

In this lab, you practiced **feature extraction** from sensor signals using Python in Google Colab.  
You learned how to condense raw data into meaningful statistical features and observe their trends over time.

---

🔹 **What you learned:**

1. **Defining custom functions**  
   - Create an RMS (Root Mean Square) function and understand its mathematical formula  

2. **Extracting statistical features**  
   - Calculate Max, Min, RMS, Variance, Standard Deviation, and Mean from sensor data  
   - Combine feature values into a compact representation array  

3. **Using loops for automation**  
   - Apply `for` loops to extract features from multiple sensor channels efficiently  
   - Build a feature matrix that summarizes all sensor signals  

4. **Time-windowed feature extraction**  
   - Divide the signal into 0.01 s segments and compute RMS for each window  
   - Track how RMS values change over time for each sensor  

5. **Data visualization**  
   - Plot RMS trends over time for multiple sensors in one graph  
   - Compare different sensor behaviors visually

---

💡 **Key Takeaway**  
By completing this lab, you now know how to:  
- Implement custom feature extraction functions (e.g., RMS)  
- Summarize large datasets into compact statistical descriptors  
- Automate feature extraction across multiple sensors and time windows  
- Visualize temporal changes in features for better interpretation of signal dynamics