# Lab 4: Data Cleaning and Filtering with Time Series Data

In this lab session, you will learn foundational techniques in data cleaning, filtering, and visualization. These skills are essential for analyzing environmental data collected from sensors in agricultural systems.

### What you will learn:
- How to create a time-indexed DataFrame
- How to identify and handle missing values
- How to apply Boolean filters to extract specific data
- How to visualize raw and processed data using Matplotlib
- How to apply your knowledge through hands-on exercises

## 1. Creating a Time Series DataFrame with an Index Column

In this section, we create a simple dataset that simulates sensor readings taken daily over a period of 10 days. The dataset includes Temperature (°C), Humidity (%), and CO2 concentration (ppm), with some missing values. We use `pandas.date_range` to generate timestamps and set them as the index of the DataFrame.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a time series of 10 days
date_rng = pd.date_range(start='2024-01-01', periods=10, freq='D')

# Create the data dictionary with some missing values
data = {
    'Date': date_rng,
    'Temperature': [22, 21, np.nan, 24, 25, 23, np.nan, 26, 27, 28],
    'Humidity': [55, 53, 56, np.nan, 60, 59, 57, 58, np.nan, 54],
    'CO2': [400, 405, 410, 415, np.nan, 425, 430, np.nan, 440, 445]
}

# Convert dictionary to DataFrame and set Date as index
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)
df
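One practical payoff of using the dates as the index is that rows can be selected by date label. The sketch below rebuilds the same DataFrame so it runs on its own. It then confirms the index type and slices three days with `.loc`. The particular dates chosen for the slice are only an example.

```python
import numpy as np
import pandas as pd

# Rebuild the lab's sensor data so this cell is self-contained
date_rng = pd.date_range(start='2024-01-01', periods=10, freq='D')
df = pd.DataFrame(
    {
        'Temperature': [22, 21, np.nan, 24, 25, 23, np.nan, 26, 27, 28],
        'Humidity': [55, 53, 56, np.nan, 60, 59, 57, 58, np.nan, 54],
        'CO2': [400, 405, 410, 415, np.nan, 425, 430, np.nan, 440, 445],
    },
    index=date_rng,
)
df.index.name = 'Date'

# The index is a DatetimeIndex, so date strings work as row labels
print(type(df.index).__name__)  # DatetimeIndex

# Label slicing on a DatetimeIndex includes both endpoints
jan_3_to_5 = df.loc['2024-01-03':'2024-01-05']
print(jan_3_to_5)

# A single reading can be looked up by date and column
print(df.loc['2024-01-04', 'CO2'])
```

Unlike position-based slicing with `iloc`, a label slice such as `'2024-01-03':'2024-01-05'` keeps the end date, so this returns three rows.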

## 2. Handling Missing Data

Sensor data often contains missing values (e.g., due to disconnection or sensor failure). Use the `isna()` method to detect missing values and `fillna()` to replace them.

- `df.isna()` returns a Boolean DataFrame of the same shape, with `True` wherever a value is missing.
- `df.isna().sum()` counts the missing values in each column (each `True` counts as 1).
- `df.fillna(df.mean(numeric_only=True))` replaces each missing value with the mean of its column; `numeric_only=True` restricts the calculation to numeric columns.

In [None]:
df.isna()

In [None]:
df.isna().sum()

In [None]:
df_filled = df.fillna(df.mean(numeric_only=True))
df_filled
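To check what the mean fill actually inserts, the sketch below rebuilds the lab's DataFrame and prints the column means. Because `mean()` skips NaN by default, the Temperature mean is (22 + 21 + 24 + 25 + 23 + 26 + 27 + 28) / 8 = 196 / 8 = 24.5, and both Temperature gaps receive that same value. As a point of comparison only (this lab does not otherwise use it), the last lines swap in a different technique: `interpolate(method='time')`, which estimates each gap from its neighbouring readings instead of from the whole column.

```python
import numpy as np
import pandas as pd

# Rebuild the lab's sensor data so this cell is self-contained
date_rng = pd.date_range(start='2024-01-01', periods=10, freq='D')
df = pd.DataFrame(
    {
        'Temperature': [22, 21, np.nan, 24, 25, 23, np.nan, 26, 27, 28],
        'Humidity': [55, 53, 56, np.nan, 60, 59, 57, 58, np.nan, 54],
        'CO2': [400, 405, 410, 415, np.nan, 425, 430, np.nan, 440, 445],
    },
    index=date_rng,
)
df.index.name = 'Date'

# Two missing values per column
print(df.isna().sum())

# Column means ignore NaN: Temperature 24.5, Humidity 56.5, CO2 421.25
means = df.mean(numeric_only=True)
print(means)

# Mean fill: every gap in a column gets that column's mean
df_filled = df.fillna(means)
print(df_filled.loc[df['Temperature'].isna(), 'Temperature'])

# Alternative technique (not part of this lab's method): linear
# interpolation weighted by the time between neighbouring readings
df_interp = df.interpolate(method='time')
print(df_interp.loc[df['Temperature'].isna(), 'Temperature'])
```

With interpolation, the 2024-01-03 Temperature gap becomes 22.5 (halfway between 21 and 24) rather than the column mean of 24.5. That difference matters when values drift over time.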

## 3. Filtering Rows Using Conditions

We often want to focus on data that meets specific criteria. For example, we might be interested in periods when the temperature exceeded 25°C.

- `df['Temperature'] > 25` creates a Boolean Series
- `df[df['Temperature'] > 25]` returns the rows that meet this condition
- Combine multiple conditions using `&` (AND) or `|` (OR)

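**Note:** Wrap each condition in parentheses when combining them. In Python, `&` and `|` bind more tightly than comparison operators such as `>`, so without parentheses `df['Temperature'] > 25 & df['Humidity'] > 55` would evaluate `25 & df['Humidity']` first. Also use `&` and `|` rather than `and` and `or`, which do not work element-wise on a Series.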

In [None]:
df[df['Temperature'] > 25]

In [None]:
df[(df['Temperature'] > 25) & (df['Humidity'] > 55)]
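The sketch below rebuilds the lab's DataFrame and stores each condition in a named Boolean mask, which avoids the precedence problem entirely. It also shows `~` (element-wise NOT) alongside `&` and `|`. The mask names `hot` and `humid` are illustrative. Note that any comparison with NaN evaluates to `False`, so the days with a missing Temperature never pass `hot` but do pass `~hot`.

```python
import numpy as np
import pandas as pd

# Rebuild the lab's sensor data so this cell is self-contained
date_rng = pd.date_range(start='2024-01-01', periods=10, freq='D')
df = pd.DataFrame(
    {
        'Temperature': [22, 21, np.nan, 24, 25, 23, np.nan, 26, 27, 28],
        'Humidity': [55, 53, 56, np.nan, 60, 59, 57, 58, np.nan, 54],
        'CO2': [400, 405, 410, 415, np.nan, 425, 430, np.nan, 440, 445],
    },
    index=date_rng,
)
df.index.name = 'Date'

# Named masks: each is a Boolean Series aligned with df.index
hot = df['Temperature'] > 25    # NaN compares as False
humid = df['Humidity'] > 55

both = df[hot & humid]     # AND: both conditions hold
either = df[hot | humid]   # OR: at least one condition holds
not_hot = df[~hot]         # NOT: also keeps the two days with missing Temperature

print(len(both), len(either), len(not_hot))  # 1 7 7
```

Only 2024-01-08 (26 °C, 58 %) satisfies both conditions. If missing readings should be excluded explicitly, combine the mask with `df['Temperature'].notna()`.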

## 4. Visualizing the Original and Filled Data

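Plotting the raw data helps us spot trends and anomalies. Passing `subplots=True` draws each column in its own panel, so sensors with different units and scales do not share an axis. In the original data, each missing value appears as a gap in the line; comparing it with the filled (cleaned) version shows where values were inserted.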

In [None]:
# Plot the original data with missing values
df.plot(subplots=True, figsize=(10, 8), title='Original Data with Missing Values')
plt.tight_layout()
plt.show()

In [None]:
# Plot the filled data to see how missing values were handled
df_filled.plot(subplots=True, figsize=(10, 8), title='Data After Filling Missing Values')
plt.tight_layout()
plt.show()
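Side-by-side panels make it hard to see exactly which points were imputed. The sketch below overlays one column (Temperature) on a single axis. It draws the filled series as a dashed line and the original as markers joined by lines, then uses `where()` to keep only the imputed values so they can be highlighted in red. The colours and labels are illustrative choices, not part of the lab.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the lab's data and the mean-filled version so this cell is self-contained
date_rng = pd.date_range(start='2024-01-01', periods=10, freq='D')
df = pd.DataFrame(
    {
        'Temperature': [22, 21, np.nan, 24, 25, 23, np.nan, 26, 27, 28],
        'Humidity': [55, 53, 56, np.nan, 60, 59, 57, 58, np.nan, 54],
        'CO2': [400, 405, 410, 415, np.nan, 425, 430, np.nan, 440, 445],
    },
    index=date_rng,
)
df.index.name = 'Date'
df_filled = df.fillna(df.mean(numeric_only=True))

# True where Temperature was originally missing
was_missing = df['Temperature'].isna()

fig, ax = plt.subplots(figsize=(10, 4))
# Filled series as a dashed background line
df_filled['Temperature'].plot(ax=ax, style='--', color='gray', label='After filling')
# Original series: the line breaks at each NaN
df['Temperature'].plot(ax=ax, style='o-', label='Original')
# where() keeps imputed values and sets everything else to NaN (not drawn)
df_filled['Temperature'].where(was_missing).plot(
    ax=ax, style='o', color='red', label='Filled in'
)
ax.set_ylabel('Temperature (°C)')
ax.legend()
plt.show()
```

The red dots sit at 24.5 °C, the column mean, on both 2024-01-03 and 2024-01-07. This makes it visible that mean filling ignores the local trend around each gap.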

## 5. Visualizing Filtered Data

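We can also visualize only the rows that meet a condition. Here, we keep the rows where `Temperature > 25` (°C) and plot them. The `style='o-'` option draws each point as a marker joined by lines. Because all three columns share one y-axis in this plot, the CO2 values (around 440 ppm) sit far above Temperature and Humidity.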

In [None]:
filtered_df = df[df['Temperature'] > 25]
filtered_df.plot(style='o-', figsize=(10, 6), title='Filtered Data (Temperature > 25)')
plt.ylabel('Sensor Values')
plt.xlabel('Date')
plt.grid(True)
plt.show()
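One way to keep the small-valued columns readable is to give each one its own panel, reusing `subplots=True` from Section 4. The sketch below rebuilds the lab's data, applies the same `Temperature > 25` filter, and labels each panel with the units from Section 1. The per-panel unit labels and grid are illustrative additions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the lab's sensor data so this cell is self-contained
date_rng = pd.date_range(start='2024-01-01', periods=10, freq='D')
df = pd.DataFrame(
    {
        'Temperature': [22, 21, np.nan, 24, 25, 23, np.nan, 26, 27, 28],
        'Humidity': [55, 53, 56, np.nan, 60, 59, 57, 58, np.nan, 54],
        'CO2': [400, 405, 410, 415, np.nan, 425, 430, np.nan, 440, 445],
    },
    index=date_rng,
)
df.index.name = 'Date'

filtered_df = df[df['Temperature'] > 25]

# One panel per column, so CO2 (ppm) no longer flattens the other two
axes = filtered_df.plot(subplots=True, style='o-', figsize=(10, 6),
                        title='Filtered Data (Temperature > 25)')
for ax, unit in zip(axes, ['°C', '%', 'ppm']):
    ax.set_ylabel(unit)
    ax.grid(True)
plt.tight_layout()
plt.show()
```

Only three days pass the filter (2024-01-08 to 2024-01-10), and two of them have a missing Humidity or CO2 reading, so some panels show isolated markers rather than a continuous line. Another option is pandas' `secondary_y=['CO2']` argument, which keeps a single panel but gives CO2 its own right-hand axis.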

## 6. 📝 Exercises

Try the following on your own to reinforce what you've learned:

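1. Fill missing values using **backward fill**, where each gap takes the next valid value in its column:
   ```python
   # df.bfill() replaces the deprecated df.fillna(method='bfill')
   df_bfill = df.bfill()
   ```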

2. Create a new filter condition:
   - Temperature >= 24 AND CO2 < 430
   ```python
   your_filtered_df = df[(df['Temperature'] >= 24) & (df['CO2'] < 430)]
   ```

3. Visualize the filtered result using a line plot:
   ```python
   your_filtered_df.plot(style='o-')
   ```

4. Save the filtered DataFrame as a CSV file:
   ```python
   your_filtered_df.to_csv('filtered_result.csv')
   ```
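To confirm the saved file works, you can read it back. Because `to_csv` writes the `Date` index as the first column, `read_csv` needs `index_col='Date'` and `parse_dates=True` to restore it as a `DatetimeIndex`. The sketch below rebuilds the lab's data, applies the Exercise 2 filter, and does the round trip. It writes to a temporary directory so no stray file is left behind; in the lab itself you would simply use `'filtered_result.csv'`. With this dataset, the Exercise 2 filter matches a single day, 2024-01-04.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Rebuild the lab's sensor data so this cell is self-contained
date_rng = pd.date_range(start='2024-01-01', periods=10, freq='D')
df = pd.DataFrame(
    {
        'Temperature': [22, 21, np.nan, 24, 25, 23, np.nan, 26, 27, 28],
        'Humidity': [55, 53, 56, np.nan, 60, 59, 57, 58, np.nan, 54],
        'CO2': [400, 405, 410, 415, np.nan, 425, 430, np.nan, 440, 445],
    },
    index=date_rng,
)
df.index.name = 'Date'

# Exercise 2 filter
your_filtered_df = df[(df['Temperature'] >= 24) & (df['CO2'] < 430)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'filtered_result.csv')
    # The named Date index is written as the first column
    your_filtered_df.to_csv(path)
    # Restore Date as a DatetimeIndex when reading the file back
    reloaded = pd.read_csv(path, index_col='Date', parse_dates=True)

print(reloaded)
```

Missing values are written as empty fields and come back as NaN, so the missing Humidity reading on 2024-01-04 survives the round trip.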