### Individual – feature exploration
We decided to drop the following features:
```python
import pandas as pd


def drop_columns(self, df: pd.DataFrame):
    drop = [
        # Wind speed vector u, available up to 20000 m, from 1000 hPa to 10 hPa
        # and on flight levels FL10-FL900 [m/s]; does not make sense at surface level.
        "wind_speed_w_1000hPa:ms",
        "wind_speed_u_10m:ms",  # same as above
        "wind_speed_v_10m:ms",  # same as above
        "snow_density:kgm3",
        "snow_drift:idx",  # newly added; we got 140.9 without it
    ]
    # Only drop the columns that are actually present in this dataframe.
    shared_columns = list(set(df.columns) & set(drop))
    df = df.drop(columns=shared_columns)
    return df
```

These features were dropped for the following reasons:

1. "wind_speed_w_1000hPa:ms": This wind speed vector is defined on pressure levels from 1000 hPa to 10 hPa (up to 20000 m) and on flight levels, so it does not make sense at surface level.
2. "wind_speed_u_10m:ms" and "wind_speed_v_10m:ms": Dropped on the same grounds as item 1. We also consider them redundant with the wind speed vectors at higher altitudes.
3. "snow_density:kgm3": No specific reason was recorded for dropping this feature.
4. "snow_drift:idx": Added to the drop list later, possibly because it was inconsistent or not relevant. Without this feature we got a result of 140.9.

The *model interpretations* we obtained by plotting all agree with our feature selection.

### Pairs of data – feature exploration

Examined as pairs, the wind speed and snow feature groups proved to be of no use.

### Clean up features
To clean the features, we replaced every value with its absolute value, which removes all negative values. This proved to be exceptionally effective.
```python
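def absolute_values(self, df: pd.DataFrame):
    # Take the absolute value of every column (assumes all columns are numeric).
    df[df.columns] = df[df.columns].abs()
    # Normalise any remaining negative zero (-0.0) to 0.0.
    df = df.replace(-0.0, 0.0)
    return df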
```
### Intuitive data
We expect the data to be seasonal. Looking at the individual plots of each feature against its timestamp, nearly all of the values seem to follow a sinusoidal pattern.

The exceptions are more random. These are either features we dropped or features like:
- prob_rime
- dew_or_rime
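
One common way to expose the seasonal, sinusoidal pattern described above to a model is to encode the timestamp cyclically with sine and cosine terms. The following is a minimal sketch of that idea, not our exact pipeline code: the function name `add_cyclical_time_features`, the column name `time`, and the chosen periods (24 hours and 365.25 days) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_cyclical_time_features(df: pd.DataFrame, time_col: str = "time") -> pd.DataFrame:
    """Hypothetical helper: add sine/cosine encodings of hour-of-day and day-of-year."""
    out = df.copy()
    t = out[time_col]

    # Fractional hour of the day, so quarter-hour rows get distinct values.
    hour = t.dt.hour + t.dt.minute / 60.0
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24.0)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24.0)

    # Day of the year, with an assumed period of 365.25 days.
    day = t.dt.dayofyear
    out["doy_sin"] = np.sin(2 * np.pi * day / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * day / 365.25)
    return out


# Small usage example on made-up timestamps.
example = pd.DataFrame(
    {"time": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 06:00", "2023-01-01 12:00"])}
)
encoded = add_cyclical_time_features(example)
```

Using both the sine and the cosine keeps the encoding unambiguous: each point in the cycle maps to a unique pair, and midnight ends up next to 23:45 instead of at the opposite end of the scale.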

### Feature engineering

In the different "pipeline classes" you will see how we tried different methods of dropping correlated features, adding lag features, applying sinusoidal transformations to the timestamps, and different ways of grouping and widening the dataframe by pivoting the quarter hours. A short description of the optimal pipeline:

The feature engineering process involves experimenting with various techniques, including:
   - Handling date-related features.
   - One-hot encoding estimated and observed categories.
   - Grouping data by hour.
   - Exploring different methods for removing consecutive measurements.
   - Lagging features by 1 hour.
   - Clipping predictions to be non-negative.
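
Some of the steps listed above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the actual pipeline class: the function names `sketch_feature_steps` and `clip_non_negative`, the column names `time` and `kind`, and the `_lag_1h` suffix are all hypothetical, and the hourly grouping is assumed to be a plain mean over the quarter-hour rows.

```python
import numpy as np
import pandas as pd


def sketch_feature_steps(
    df: pd.DataFrame, time_col: str = "time", kind_col: str = "kind"
) -> pd.DataFrame:
    """Hypothetical sketch: one-hot encode, group by hour, add 1-hour lags."""
    # One-hot encode the estimated/observed label.
    df = pd.get_dummies(df, columns=[kind_col], dtype=float)

    # Group the quarter-hour rows into one row per hour (assumed: mean).
    hour = df[time_col].dt.floor("60min")
    hourly = df.drop(columns=[time_col]).groupby(hour).mean()

    # Lag every column by one hour by moving the index forward one hour,
    # then join the lagged values back onto the original hours.
    lagged = hourly.copy()
    lagged.index = lagged.index + pd.Timedelta(hours=1)
    return hourly.join(lagged.add_suffix("_lag_1h"))


def clip_non_negative(predictions: np.ndarray) -> np.ndarray:
    """Clip model predictions so that none are negative."""
    return np.clip(predictions, 0.0, None)


# Small usage example on made-up data: two hours of quarter-hour rows.
raw = pd.DataFrame(
    {
        "time": pd.date_range("2023-01-01 00:00", periods=8, freq="15min"),
        "kind": ["observed"] * 4 + ["estimated"] * 4,
        "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)
features = sketch_feature_steps(raw)
clipped = clip_non_negative(np.array([-1.0, 0.5]))
```

Joining on a shifted timestamp index, instead of shifting by one row, keeps the lag correct even when some hours are missing: those rows simply get `NaN` in the lag columns.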

The optimal pipeline involves a combination of these steps to achieve the best results for the given data and problem domain. The feature selection is guided by the identification of important features that contribute significantly to the model's performance.
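
The widening of the dataframe by pivoting the quarter hours, mentioned at the start of this section, could look roughly like the sketch below. It is one possible reading of that step, not the exact implementation: the function name `widen_quarter_hours`, the column name `time`, the helper columns `hour` and `quarter`, and the `_q0` to `_q3` suffixes are assumptions, and the input is assumed to have at most one row per quarter hour.

```python
import pandas as pd


def widen_quarter_hours(df: pd.DataFrame, time_col: str = "time") -> pd.DataFrame:
    """Hypothetical sketch: one row per hour, one column per feature and quarter."""
    features = df.drop(columns=[time_col]).assign(
        hour=df[time_col].dt.floor("60min"),  # the hour each row belongs to
        quarter=df[time_col].dt.minute // 15,  # 0, 1, 2 or 3 within the hour
    )
    # Pivot so that each (feature, quarter) pair becomes its own column.
    wide = features.pivot(index="hour", columns="quarter")
    # Flatten the two-level column index into names like "x_q0".
    wide.columns = [f"{name}_q{quarter}" for name, quarter in wide.columns]
    return wide


# Small usage example on made-up data: two hours of quarter-hour rows.
quarters = pd.DataFrame(
    {
        "time": pd.date_range("2023-01-01 00:00", periods=8, freq="15min"),
        "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)
wide = widen_quarter_hours(quarters)
```

Compared with averaging over the hour, this keeps the within-hour variation available to the model, at the cost of four times as many columns per feature.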