# Which filter to select?

Determining which filtering technique is best for your application depends on several factors related to your specific data and the goals of your analysis. Since you're dealing with strain gage data, which likely contains noise picked up along long cable runs, the ideal filter would reduce that noise while preserving the true signal as much as possible. Here are some considerations for choosing the right filter:

- Nature of Noise:

If the noise is broadband (for example, white Gaussian noise) and most of its energy lies above the frequencies you care about, a low-pass filter such as a Gaussian or Butterworth filter might be more effective.
If the noise includes outliers or spikes, a median filter can be more suitable as it is robust against such outliers.

- Signal Characteristics:

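If preserving the sharpness of signal changes (like steps or edges) is crucial, a median filter is usually the better choice because it keeps steps sharp. A Savitzky-Golay filter preserves peak heights and widths better than a moving average of the same length, but it still rounds off abrupt steps and can overshoot slightly around them.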
For smoother transitions or if you're interested in trends rather than exact values, moving averages or exponential smoothing might be better.

- Computational Complexity:

Transform-based methods (such as FFT-based frequency-domain filtering or wavelet denoising) take more effort to set up and tune, and may not be necessary for simpler applications.
Filters like moving averages, exponential smoothing, or median filters are computationally simpler and easier to implement.

- Analysis Goals:

If you need a quick visualization to understand trends, simpler methods like moving averages could suffice.
For more detailed analysis or if you're preparing the data for machine learning models, more sophisticated methods might be required.

- Trial and Error:

Often the best approach is to try different filters and visually inspect the results to see which one best preserves the features of interest while reducing noise.
Compare the filtered data with your domain knowledge and expectations about what the data should realistically look like.
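
The first two considerations above (spikes, and sharp steps) can be checked on a toy signal before touching real data. The sketch below builds a hypothetical 20-sample unit step with one injected spike and compares a 5-sample median filter against a 5-sample centered moving average; the signal, spike size, and window length are illustrative assumptions, not values taken from the strain data.

```python
import numpy as np
from scipy.ndimage import median_filter

# Hypothetical toy signal: a unit step at sample 10 with a spike at sample 4
clean = np.r_[np.zeros(10), np.ones(10)]
noisy = clean.copy()
noisy[4] = 5.0

window = 5  # odd window so each output sample is centered on its input sample

# Median filter: mode='nearest' repeats the edge values at the boundaries
median_out = median_filter(noisy, size=window, mode='nearest')

# Centered moving average of the same length (zero-padded at the edges)
moving_avg_out = np.convolve(noisy, np.ones(window) / window, mode='same')

print("median:    ", median_out)
print("moving avg:", np.round(moving_avg_out, 2))
```

Here the median output matches the clean step exactly, while the moving average raises five samples around the spike to 1.0 and turns the step into a ramp.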

```python
import numpy as np
import pandas as pd
import plotly.express as px
from scipy.ndimage import median_filter

# Parameters
sampling_rate = 200  # Sampling rate in Hz
noise_level = 0.1  # Standard deviation of the additive Gaussian noise
window_size = 40  # Median filter window in samples (0.2 s at 200 Hz), can be adjusted
total_duration = 10  # Signal duration in seconds
num_steps = 5  # Number of load steps; the sample count should be divisible by this

# Generating time and strain data
# An integer sample count avoids floating-point rounding in np.arange(0, T, 1/fs)
n_samples = int(total_duration * sampling_rate)
time = np.arange(n_samples) / sampling_rate
# repeat() needs an integer count, hence the floor division
strain = np.linspace(0, 1, num=num_steps).repeat(n_samples // num_steps)
noise = np.random.normal(0, noise_level, size=strain.shape)
noisy_strain = strain + noise

# Creating a DataFrame
df = pd.DataFrame({'Time': time, 'Strain': noisy_strain})

# Apply median filter
df['Filtered_Strain'] = median_filter(df['Strain'], size=window_size)

# Set index for plotting
df.set_index('Time', inplace=True)

# Plotting
fig = px.line(df, y=['Strain', 'Filtered_Strain'], title='Strain Data with Median Filtering')
fig.show()
```
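
One caveat when sizing `window_size`: a median filter also removes genuine short events. Roughly speaking, a rectangular pulse survives only if it fills about half the window or more; narrower pulses are flattened to the baseline. The sketch below checks this with a hypothetical 9-sample window and pulses of 4 and 5 samples (illustrative values). By the same reasoning, the 40-sample window at 200 Hz would flatten isolated events shorter than about 0.1 s.

```python
import numpy as np
from scipy.ndimage import median_filter

window = 9  # hypothetical median window (samples)

def pulse(width, length=40, start=15):
    """Hypothetical test signal: zeros with a rectangular pulse of the given width."""
    x = np.zeros(length)
    x[start:start + width] = 1.0
    return x

narrow = median_filter(pulse(4), size=window)  # pulse shorter than half the window
wide = median_filter(pulse(5), size=window)    # pulse longer than half the window

print("narrow pulse max after filtering:", narrow.max())
print("wide pulse max after filtering:  ", wide.max())
```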


# Comparison of filtering methods


```python
import numpy as np
import pandas as pd
import plotly.express as px
from scipy.signal import butter, filtfilt

# Parameters
sampling_rate = 200  # Sampling rate in Hz
noise_level = 0.1  # Standard deviation of the additive Gaussian noise
cutoff_frequency = 0.5  # Cut-off frequency in Hz, can be adjusted
filter_order = 2  # Order of the Butterworth filter, can be adjusted
total_duration = 10  # Signal duration in seconds
num_steps = 5  # Number of load steps

# Generating time and strain data
n_samples = int(total_duration * sampling_rate)
time = np.arange(n_samples) / sampling_rate
strain = np.linspace(0, 1, num=num_steps).repeat(n_samples // num_steps)
noise = np.random.normal(0, noise_level, size=strain.shape)
noisy_strain = strain + noise

# Creating a DataFrame
df = pd.DataFrame({'Time': time, 'Strain': noisy_strain})

# Design the Butterworth filter (cut-off normalized to the Nyquist frequency)
nyquist_frequency = 0.5 * sampling_rate
normalized_cutoff = cutoff_frequency / nyquist_frequency
b, a = butter(filter_order, normalized_cutoff, btype='low', analog=False)

# Apply the filter forward and backward, which gives zero phase shift
df['Filtered_Strain'] = filtfilt(b, a, df['Strain'])

# Set index for plotting
df.set_index('Time', inplace=True)

# Plotting
fig = px.line(df, y=['Strain', 'Filtered_Strain'], title='Strain Data with Butterworth Filtering')
fig.show()
```
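
`filtfilt` runs the filter forward and then backward. That cancels the phase shift, but it also squares the magnitude response: for a Butterworth design the gain at `cutoff_frequency` becomes about -6 dB (a factor of 0.5) instead of the usual -3 dB, and the roll-off is twice as steep. A minimal sketch, reusing the 0.5 Hz, second-order, 200 Hz settings above, checks this with `scipy.signal.freqz` (passing `fs` so frequencies can be given in Hz):

```python
import numpy as np
from scipy.signal import butter, freqz

sampling_rate = 200  # Hz
cutoff_frequency = 0.5  # Hz
filter_order = 2

# Same design as above; with fs given, the cut-off is specified in Hz directly
b, a = butter(filter_order, cutoff_frequency, btype='low', fs=sampling_rate)

# Evaluate the single-pass frequency response at a few frequencies (in Hz)
freqs = np.array([0.1, cutoff_frequency, 5.0])
_, h = freqz(b, a, worN=freqs, fs=sampling_rate)

single_pass_gain = np.abs(h)
filtfilt_gain = single_pass_gain ** 2  # forward + backward pass squares the magnitude

for f, g1, g2 in zip(freqs, single_pass_gain, filtfilt_gain):
    print(f"{f:5.1f} Hz: single pass {20 * np.log10(g1):7.2f} dB, "
          f"filtfilt {20 * np.log10(g2):7.2f} dB")
```

If you want the -3 dB point of the zero-phase result to sit at a specific frequency, the cut-off passed to `butter` has to be set somewhat higher than that frequency.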

```python
import numpy as np
import pandas as pd
import plotly.express as px
from scipy.ndimage import median_filter
from scipy.signal import butter, filtfilt, savgol_filter

# Parameters
sampling_rate = 200  # Sampling rate in Hz
noise_level = 0.1  # Standard deviation of the additive Gaussian noise
total_duration = 100  # Total duration in seconds
num_steps = 5  # Number of steps in the static loading test
num_outliers = 10  # Number of outliers
outlier_magnitude = 0.5  # Magnitude of the outliers
window_size_median = 40  # Median filter window size (samples)
cutoff_frequency = 1  # Cut-off frequency for the Butterworth filter (Hz)
filter_order = 2  # Order of the Butterworth filter
window_size_savgol = 51  # Window size for Savitzky-Golay filter (samples, odd)
poly_order = 3  # Polynomial order for Savitzky-Golay filter
window_size_moving_avg = 40  # Window size for moving average (samples)

# Generating time and strain data
n_samples = int(total_duration * sampling_rate)
time = np.arange(n_samples) / sampling_rate
strain = np.linspace(0, 1, num=num_steps).repeat(n_samples // num_steps)
noise = np.random.normal(0, noise_level, size=strain.shape)
noisy_strain = strain + noise

# Adding outliers of +/- outlier_magnitude at random positions
outlier_indices = np.random.choice(n_samples, num_outliers, replace=False)
noisy_strain[outlier_indices] += np.random.choice([-1, 1], num_outliers) * outlier_magnitude

# Creating a DataFrame
df = pd.DataFrame({'Time': time, 'Strain': noisy_strain})

# Apply median filter
df['Median_Filtered'] = median_filter(df['Strain'], size=window_size_median)

# Design and apply the Butterworth filter (zero phase via filtfilt)
nyquist_frequency = 0.5 * sampling_rate
normalized_cutoff = cutoff_frequency / nyquist_frequency
b, a = butter(filter_order, normalized_cutoff, btype='low', analog=False)
df['Butterworth_Filtered'] = filtfilt(b, a, df['Strain'])

# Apply Savitzky-Golay filter
df['Savitzky_Golay'] = savgol_filter(df['Strain'], window_length=window_size_savgol, polyorder=poly_order)

# Apply moving average; center=True keeps it aligned in time with the other
# (centered or zero-phase) filters instead of lagging by about half a window
df['Moving_Average'] = df['Strain'].rolling(window=window_size_moving_avg, center=True).mean()

# Set index for plotting
df.set_index('Time', inplace=True)

# Plotting
fig = px.line(df, y=['Strain', 'Median_Filtered', 'Butterworth_Filtered', 'Savitzky_Golay', 'Moving_Average'],
              title='Comparison of Filtering Methods on Strain Data')
fig.show()
```
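
Because the synthetic data has a known clean signal, the visual comparison can be backed by a number. The sketch below regenerates a shorter hypothetical version of the test (10 s staircase with noise and spikes, fixed random seed) and reports the root-mean-square error of each filter against the clean staircase. The helper name `rmse` and the seed are assumptions for illustration, and the moving average is computed as a centered window with `scipy.ndimage.uniform_filter1d` rather than with pandas. With real strain gage data there is no clean reference, so this only helps tune settings on simulated data that resembles your measurements.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter1d
from scipy.signal import butter, filtfilt, savgol_filter

rng = np.random.default_rng(0)  # fixed seed so the comparison is repeatable

# Hypothetical synthetic test: 10 s staircase at 200 Hz with Gaussian noise and spikes
sampling_rate, total_duration, num_steps = 200, 10, 5
n_samples = total_duration * sampling_rate
clean = np.linspace(0, 1, num=num_steps).repeat(n_samples // num_steps)
noisy = clean + rng.normal(0, 0.1, size=n_samples)
spikes = rng.choice(n_samples, 10, replace=False)
noisy[spikes] += rng.choice([-1, 1], 10) * 0.5

def rmse(estimate, reference):
    """Root-mean-square error between two equal-length arrays."""
    return np.sqrt(np.mean((estimate - reference) ** 2))

b, a = butter(2, 1.0, btype='low', fs=sampling_rate)  # 2nd order, 1 Hz cut-off
results = {
    'Unfiltered': noisy,
    'Median': median_filter(noisy, size=40),
    'Butterworth': filtfilt(b, a, noisy),
    'Savitzky-Golay': savgol_filter(noisy, window_length=51, polyorder=3),
    # Centered moving average; mode='nearest' repeats edge values instead of zero-padding
    'Moving average': uniform_filter1d(noisy, size=41, mode='nearest'),
}
scores = {name: rmse(signal, clean) for name, signal in results.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:15s} RMSE = {score:.4f}")
```

Lower is better, but RMSE weighs every sample equally; if step timing or peak values matter most, compare those features directly as well.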



# Downsampling a dataset

```python
import ipywidgets as widgets
from IPython.display import display
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Function to find divisors of the original sampling rate
# (only integer step sizes are offered, so every new rate is exact)
def find_divisors(num):
    divisors = [i for i in range(1, num + 1) if num % i == 0]
    return divisors

# Function to downsample data by keeping every step_size-th sample.
# Plain slicing applies no anti-aliasing filter.
def downsample_data(df, original_rate, new_rate):
    if new_rate >= original_rate:
        return df
    step_size = original_rate // new_rate
    # iloc keeps the original Time index, so each retained sample keeps its true timestamp
    downsampled_df = df.iloc[::step_size, :].copy()
    return downsampled_df

# Function to create initial plot with two traces (original and downsampled data)
def create_initial_plot(original_df, downsampled_df):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=original_df.index, y=original_df['Strain'], mode='lines', name='Original Data', visible=False))
    fig.add_trace(go.Scatter(x=downsampled_df.index, y=downsampled_df['Strain'], mode='lines', name='Downsampled Data'))
    fig.update_layout(title='Data Plot', xaxis_title='Time (s)', yaxis_title='Strain')
    return fig

# Original data setup (replace this with your actual data setup)
sampling_rate = 200  # Original sampling rate in Hz
# ... [rest of your data setup] ...
# Ensure df is set up with 'Time' as index and 'Strain' as a column
# df = pd.DataFrame({'Time': time, 'Strain': noisy_strain}).set_index('Time')

# Make a copy of the original data
original_df = df.copy()

# Divisors of the original sampling rate
divisors = find_divisors(sampling_rate)

# Dropdown for new sampling rate
sampling_rate_dropdown = widgets.Dropdown(
    options=divisors,
    value=sampling_rate,
    description='New Rate (Hz):',
    style={'description_width': 'initial'}
)

# Button to toggle original data
toggle_button = widgets.Button(
    description='Toggle Original Data',
    button_style='info',
    tooltip='Click to show/hide original data'
)

# Initial plot setup with original and downsampled data (initially the same)
initial_plot = create_initial_plot(original_df, original_df)
plot_widget = go.FigureWidget(initial_plot)

# Function to update plot data when the dropdown changes
def update_plot(change):
    new_rate = sampling_rate_dropdown.value
    downsampled_df = downsample_data(original_df, sampling_rate, new_rate)
    with plot_widget.batch_update():
        plot_widget.data[1].x = downsampled_df.index
        plot_widget.data[1].y = downsampled_df['Strain']

# Function to toggle the visibility of the original data
def toggle_original_data(b):
    plot_widget.data[0].visible = not plot_widget.data[0].visible

# Link dropdown to update function
sampling_rate_dropdown.observe(update_plot, names='value')

# Link button to toggle function
toggle_button.on_click(toggle_original_data)

# Display widgets
display(sampling_rate_dropdown, toggle_button, plot_widget)
```
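
Keeping every n-th sample applies no anti-aliasing filter, so anything above the new Nyquist frequency (half the new rate) folds back into the band that is kept. `scipy.signal.decimate` low-pass filters before downsampling. The sketch below compares the two on a hypothetical 200 Hz signal made of a slow 0.5 Hz component plus a 45 Hz disturbance, decimated by 10 to 20 Hz; the signal and frequencies are illustrative assumptions. The first and last second are ignored because filter start-up transients can dominate there.

```python
import numpy as np
from scipy.signal import decimate

original_rate = 200  # Hz
factor = 10          # integer decimation factor -> 20 Hz
t = np.arange(10 * original_rate) / original_rate  # 10 s of samples

# Hypothetical test signal: slow 0.5 Hz "strain" plus a 45 Hz disturbance
slow = np.sin(2 * np.pi * 0.5 * t)
disturbance = 0.3 * np.sin(2 * np.pi * 45 * t)
x = slow + disturbance

naive = x[::factor]                                           # plain slicing, no filtering
filtered = decimate(x, factor, ftype='iir', zero_phase=True)  # low-pass, then downsample
reference = slow[::factor]  # what an ideal downsampler would return

# Ignore 1 s at each end (original_rate // factor samples at the new rate)
edge = original_rate // factor
interior = slice(edge, -edge)
naive_err = np.max(np.abs(naive - reference)[interior])
decimate_err = np.max(np.abs(filtered - reference)[interior])
print(f"naive max error:    {naive_err:.3f}")
print(f"decimate max error: {decimate_err:.3f}")
```

The naive result keeps the 45 Hz disturbance, aliased to 5 Hz at full amplitude, while `decimate` removes it. For large factors with the IIR option, the SciPy documentation recommends decimating in several smaller stages (it suggests this above a factor of about 13).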


```python
import numpy as np
import pandas as pd
import plotly.express as px

# Parameters
sampling_rate = 200  # Sampling rate in Hz
total_duration = 10  # Signal duration in seconds (example value)
num_steps = 5  # Number of load steps (example value)
num_channels = 3  # Number of strain channels (example value)
base_noise_level = 0.1  # Noise standard deviation of the first channel (example value)

# Generating time data
n_samples = int(total_duration * sampling_rate)
time = np.arange(n_samples) / sampling_rate
base_strain = np.linspace(0, 1, num=num_steps).repeat(n_samples // num_steps)

# Creating a DataFrame with multiple strain channels
df = pd.DataFrame({'Time': time})

for channel in range(num_channels):
    # Increase noise level and amplitude with the channel number
    noise_level = base_noise_level * (1 + 0.5 * channel)
    amplitude = 1 + 0.2 * channel

    # Scale the base signal and add random noise
    channel_noise = np.random.normal(0, noise_level, size=base_strain.shape)
    channel_strain = amplitude * base_strain + channel_noise

    # Optionally modulate with a cosine whose phase shifts per channel
    phase_shift = np.pi / 4 * channel
    channel_strain = channel_strain * np.cos(time + phase_shift)

    # Naming the columns Strain_Channel_1, Strain_Channel_2, etc.
    df[f'Strain_Channel_{channel + 1}'] = channel_strain

df.set_index('Time', inplace=True)
fig = px.line(df, title='Synthetic Multi-Channel Strain Data')
fig.show()
```
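
To apply one of the filters discussed above to every channel at once, the SciPy filter functions can operate along the time axis of the whole array. A minimal sketch, assuming a DataFrame shaped like the one above (time index, one column per channel) and reusing the Butterworth settings from the comparison (2nd order, 1 Hz); the helper name `butter_lowpass_channels` and the small two-channel demo frame are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

def butter_lowpass_channels(df, cutoff_hz, fs, order=2):
    """Zero-phase Butterworth low-pass applied to every column of df (hypothetical helper)."""
    b, a = butter(order, cutoff_hz, btype='low', fs=fs)
    # axis=0 filters down each column (the time axis) independently
    filtered = filtfilt(b, a, df.to_numpy(), axis=0)
    return pd.DataFrame(filtered, index=df.index, columns=df.columns)

# Hypothetical two-channel demo: constant levels 1 and 2 plus noise, 10 s at 200 Hz
fs = 200
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(1)
demo = pd.DataFrame(
    {f'Strain_Channel_{i + 1}': (i + 1) + rng.normal(0, 0.1, t.size) for i in range(2)},
    index=pd.Index(t, name='Time'),
)
smoothed = butter_lowpass_channels(demo, cutoff_hz=1.0, fs=fs)
print(smoothed.describe())
```

Returning a DataFrame with the original index and column names keeps the result plottable with the same `px.line` call.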


```python
df
```