# Exploratory Data Analysis — France Grid Stress Prediction (Day-Ahead, D+1)


## Notebook objectives


- Understand the temporal structure of national electricity load and generation  
- Validate data integrity (time axis, missingness, DST, outliers)  
- Analyze demand-side (consumption) and supply-side (production) separately  
- Translate EDA findings into clear modeling decisions for day-ahead forecasting  


# 0. Executive overview


## 0.1 Dataset snapshot


Describe:
- Period covered (start / end)
- Inferred frequency (hourly / half-hourly)
- Number of rows and columns
- Key variables detected (load, temperature, wind, solar, nuclear, hydro, thermal, etc.)


## 0.2 Top data issues


Summarize any critical issues:
- Missing timestamp blocks
- DST anomalies (23h / 25h days)
- High missing-rate variables
- Extreme or suspicious values


## 0.3 Key findings


Write 3–5 high-level findings in plain language after completing the EDA.


## 0.4 Weekly mini-dashboard


Single figure over a representative week:
- Load
- Temperature
- Wind generation
- Solar generation


# 1. Data loading and reproducibility


## 1.1 Data sources


Document:
- RTE consumption data: https://www.services-rte.com/en/download-data-published-by-rte.html?category=consumption&type=power_consumption
- Weather sources: https://open-meteo.com/
- Calendar data (holidays, school vacations if used)


## 1.2 Load processed dataset


Load the consolidated dataset.
Show head / tail and basic shape.


## 1.3 Structural sanity checks


Check:
- Datetime index type and timezone
- Sorted and unique timestamps
- Inferred frequency
- Column naming consistency and units
- Coverage (min / max dates)
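
The checks listed above could be gathered into one small helper. This is a minimal sketch, assuming the dataset is a pandas DataFrame with a `DatetimeIndex`; the function name `check_time_index` and the column `load_mw` are hypothetical.

```python
import pandas as pd


def check_time_index(df: pd.DataFrame) -> dict:
    """Summarize basic structural properties of a datetime-indexed frame."""
    idx = df.index
    is_dt = isinstance(idx, pd.DatetimeIndex)
    return {
        "is_datetime_index": is_dt,
        "timezone": str(idx.tz) if is_dt and idx.tz is not None else None,
        "is_sorted": bool(idx.is_monotonic_increasing),
        "is_unique": bool(idx.is_unique),
        # infer_freq needs at least three timestamps and returns None if irregular
        "inferred_freq": pd.infer_freq(idx) if is_dt and len(idx) >= 3 else None,
        "start": idx.min(),
        "end": idx.max(),
        "shape": df.shape,
    }


# Synthetic two-day hourly frame; "load_mw" is a hypothetical column name.
demo_index = pd.date_range(
    "2023-01-01", periods=48, freq=pd.Timedelta(hours=1), tz="UTC"
)
demo = pd.DataFrame({"load_mw": range(48)}, index=demo_index)
report = check_time_index(demo)
print(report)
```

Unit and naming consistency still need a manual pass against the data dictionary; the helper only covers the time axis and shape.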


# 2. Time axis integrity (global)


## 2.1 Frequency audit


Analyze datetime differences:
- Distribution of time deltas
- Abnormal steps
- Largest gaps
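
One possible way to run this audit is to difference the index and treat the most frequent step as the nominal one. The sketch below assumes a sorted `DatetimeIndex`; `audit_time_deltas` is a hypothetical helper name.

```python
import pandas as pd


def audit_time_deltas(index: pd.DatetimeIndex, top_n: int = 5):
    """Return step counts, the nominal step, abnormal steps, and the largest gaps."""
    # Each delta is labelled with the timestamp that ends the step.
    deltas = index.to_series().diff().dropna()
    counts = deltas.value_counts()
    nominal = counts.idxmax()  # assumption: the most frequent step is the true frequency
    abnormal = deltas[deltas != nominal]
    largest = deltas.sort_values(ascending=False).head(top_n)
    return counts, nominal, abnormal, largest


# Synthetic hourly index with three timestamps removed (05:00 to 07:00).
full = pd.date_range("2023-01-01", periods=24, freq=pd.Timedelta(hours=1), tz="UTC")
gappy = full.delete([5, 6, 7])
counts, nominal, abnormal, largest = audit_time_deltas(gappy)
print(counts)
print(abnormal)
```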


## 2.2 Missing timestamps


- Build the expected full time grid
- Count missing timestamps
- Identify contiguous missing blocks (start / end / duration)
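
The three steps above can be sketched as follows, assuming a timezone-aware index (UTC is the simplest choice, since it has no DST jumps) and a known nominal step. `missing_blocks` is a hypothetical helper name.

```python
import pandas as pd


def missing_blocks(index: pd.DatetimeIndex, freq: str = "1h") -> pd.DataFrame:
    """List contiguous runs of timestamps absent from the expected regular grid."""
    step = pd.Timedelta(freq)
    expected = pd.date_range(index.min(), index.max(), freq=step)
    missing = expected.difference(index)
    if len(missing) == 0:
        return pd.DataFrame(columns=["start", "end", "n_missing", "duration"])
    s = missing.to_series().reset_index(drop=True)
    # A new block starts whenever the jump from the previous missing stamp is not one step.
    block_id = (s.diff() != step).cumsum()
    blocks = s.groupby(block_id).agg(start="min", end="max", n_missing="count")
    blocks["duration"] = blocks["n_missing"] * step
    return blocks.reset_index(drop=True)


# Synthetic hourly index with one 3-hour hole and one isolated missing hour.
full = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1), tz="UTC")
gappy = full.delete([5, 6, 7, 20])
blocks = missing_blocks(gappy, freq="1h")
print(blocks)
```

Gaps before the first or after the last observed timestamp are not visible to this approach; compare the coverage against the expected study period separately.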


## 2.3 Daylight Saving Time (DST)


- Identify 23-hour and 25-hour days
- Decide and document the timezone policy (for example, store timestamps in UTC and derive local-time features in Europe/Paris)
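
A minimal sketch for spotting 23-hour and 25-hour days: convert the index to local time and count rows per local calendar day. It assumes hourly data and the `Europe/Paris` timezone; `irregular_days` is a hypothetical helper name. Days affected by missing timestamps will also show up, so read the result alongside the missing-block report.

```python
import pandas as pd


def irregular_days(
    index: pd.DatetimeIndex, tz: str = "Europe/Paris", steps_per_day: int = 24
) -> pd.Series:
    """Return local calendar days whose row count differs from steps_per_day."""
    local = index.tz_convert(tz)
    counts = pd.Series(1, index=local).groupby(local.date).sum()
    return counts[counts != steps_per_day]


# One full local year of hourly timestamps, stored in UTC as one plausible policy.
local_year = pd.date_range(
    "2023-01-01", periods=8760, freq=pd.Timedelta(hours=1), tz="Europe/Paris"
)
utc_index = local_year.tz_convert("UTC")
odd_days = irregular_days(utc_index)
print(odd_days)
```

For half-hourly data, pass `steps_per_day=48`.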


# 3. Data quality and anomalies (global)


## 3.1 Missing values analysis


- Missing rate per variable
- Missingness over time (year / month)
- Optional heatmap
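
The first two items could be computed as below. This is a sketch on synthetic data; `missing_summary` and the column names are hypothetical. The year-by-month table is also a convenient input for the optional heatmap.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame):
    """Return the overall missing rate per column and the rate per (year, month)."""
    is_missing = df.isna()
    rate = is_missing.mean().sort_values(ascending=False)
    by_month = is_missing.groupby([df.index.year, df.index.month]).mean()
    by_month.index.names = ["year", "month"]
    return rate, by_month


# Synthetic daily frame: temperature is missing for the first 10 days of January.
idx = pd.date_range("2023-01-01", "2023-02-28", freq="D")
demo = pd.DataFrame({"load_mw": 1.0, "temp_c": 10.0}, index=idx)
demo.loc[idx[:10], "temp_c"] = np.nan
rate, by_month = missing_summary(demo)
print(rate)
print(by_month)
```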


## 3.2 Outlier detection


- Extreme values in load and generation
- Negative or impossible values
- Flatlines and step changes
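
One way to screen a series for these three problems is sketched below. The thresholds (`flat_len=6`, `z_thresh=5.0`) and the use of a robust z-score based on the median absolute deviation are assumptions chosen for illustration, not a recommended policy; `flag_anomalies` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def flag_anomalies(
    s: pd.Series, min_value: float = 0.0, flat_len: int = 6, z_thresh: float = 5.0
) -> pd.DataFrame:
    """Flag impossible values, flatlines, and extreme values in one series."""
    negative = s < min_value
    # Flatline: a run of identical consecutive values at least flat_len long.
    run_id = (s != s.shift()).cumsum()
    run_len = s.groupby(run_id).transform("count")
    flatline = (run_len >= flat_len) & s.notna()
    # Robust z-score from the median and the median absolute deviation (MAD).
    med = s.median()
    mad = (s - med).abs().median()
    if mad > 0:
        robust_z = 0.6745 * (s - med) / mad
    else:
        robust_z = pd.Series(0.0, index=s.index)
    extreme = robust_z.abs() > z_thresh
    return pd.DataFrame({"negative": negative, "flatline": flatline, "extreme": extreme})


# Synthetic load with a stuck sensor, a negative reading, and a spike.
hours = np.arange(48)
values = 50_000 + 5_000 * np.sin(2 * np.pi * hours / 24)
values[10:18] = 50_000.0  # eight identical readings
values[30] = -100.0       # impossible negative load
values[40] = 500_000.0    # implausible spike
idx = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1), tz="UTC")
demo = pd.Series(values, index=idx, name="load_mw")
flags = flag_anomalies(demo)
print(flags.sum())
```

Generation series such as solar are legitimately flat at zero overnight, so the flatline rule needs per-variable tuning. A global z-score also ignores seasonality; a seasonal or rolling baseline may be more appropriate for load.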


## 3.3 Known event windows


Inspect known periods:
- COVID shock
- Extreme cold spells
- Other major system events (if known)


# Part A — Consumption (demand side)


# 4. National electricity load: global behavior


## 4.1 Long-term trend


- Multi-year load evolution
- Rolling averages
- Non-stationarity discussion


## 4.2 Load distribution and tails


- Histogram / density
- Quantiles
- Extreme demand days


# 5. Intra-day and intra-week patterns (load)


## 5.1 Daily profile


- Average load by hour
- Weekday vs weekend comparison
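
A sketch of the hourly profile split by day type. It assumes a timezone-aware load series and computes hours in local time (`Europe/Paris`), since daily habits follow the local clock; `daily_profile` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def daily_profile(load: pd.Series, tz: str = "Europe/Paris") -> pd.DataFrame:
    """Mean load by local hour of day, with one column per day type."""
    local = load.tz_convert(tz)
    hour = local.index.hour
    day_type = np.where(local.index.dayofweek >= 5, "weekend", "weekday")
    return local.groupby([hour, day_type]).mean().unstack()


# Synthetic two-week series: 100 on local weekdays, 80 on local weekends.
idx = pd.date_range("2023-01-02", periods=14 * 24, freq=pd.Timedelta(hours=1), tz="UTC")
local_dow = idx.tz_convert("Europe/Paris").dayofweek
demo = pd.Series(np.where(local_dow >= 5, 80.0, 100.0), index=idx, name="load_mw")
profile = daily_profile(demo)
print(profile.head())
```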


## 5.2 Weekly structure


- Mean load by weekday
- Hourly profiles for each weekday


# 6. Seasonal patterns (load)


## 6.1 Monthly seasonality


- Monthly averages
- Monthly boxplots


## 6.2 Year-to-year comparison


- Same-month comparisons across years
- Structural break detection


# 7. Thermo-sensitivity analysis


## 7.1 Load vs temperature


- Scatter plots
- Nonlinear smoothing
- Winter vs summer asymmetry


## 7.2 Degree-days


- Heating Degree Days (HDD)
- Cooling Degree Days (CDD)
- Comparison with raw temperature
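
Degree-days can be derived from the daily mean temperature as `HDD = max(0, T_base - T_mean)` and `CDD = max(0, T_mean - T_base)`. The sketch below uses 18 °C for both bases purely as an assumption; the appropriate thresholds for French load should come out of the scatter analysis in 7.1. `degree_days` is a hypothetical helper name.

```python
import pandas as pd


def degree_days(
    temp_c: pd.Series, base_heat: float = 18.0, base_cool: float = 18.0
) -> pd.DataFrame:
    """Daily heating and cooling degree-days from a sub-daily temperature series."""
    daily_mean = temp_c.resample("D").mean()
    hdd = (base_heat - daily_mean).clip(lower=0)
    cdd = (daily_mean - base_cool).clip(lower=0)
    return pd.DataFrame({"temp_mean": daily_mean, "hdd": hdd, "cdd": cdd})


# Synthetic example: one cold day at 5 °C, one hot day at 25 °C.
idx = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1), tz="UTC")
temps = pd.Series([5.0] * 24 + [25.0] * 24, index=idx, name="temp_c")
dd = degree_days(temps)
print(dd)
```

Daily bins follow the timezone of the index, so convert to local time first if local calendar days are wanted.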


## 7.3 Thermal inertia


- Lagged or smoothed temperature features
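
Candidate inertia features might include an exponentially smoothed temperature, a rolling mean, and a plain lag. The smoothing factor, window, and lag below are illustrative assumptions for hourly data; `thermal_inertia_features` is a hypothetical helper name.

```python
import pandas as pd


def thermal_inertia_features(
    temp_c: pd.Series, alpha: float = 0.1, window: int = 24, lag: int = 24
) -> pd.DataFrame:
    """Build smoothed and lagged temperature features (row-based, so hourly rows assumed)."""
    return pd.DataFrame(
        {
            "temp": temp_c,
            # Recursive smoothing: y_t = (1 - alpha) * y_{t-1} + alpha * x_t
            "temp_ewm": temp_c.ewm(alpha=alpha, adjust=False).mean(),
            "temp_roll_mean": temp_c.rolling(window, min_periods=1).mean(),
            "temp_lag": temp_c.shift(lag),
        }
    )


# Synthetic step change: 0 °C for one day, then 10 °C for one day.
idx = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1), tz="UTC")
temps = pd.Series([0.0] * 24 + [10.0] * 24, index=idx, name="temp_c")
features = thermal_inertia_features(temps)
print(features.iloc[22:28])
```

In the step example the smoothed series rises gradually after the jump, which is the behavior a building-stock inertia proxy is meant to capture. For day-ahead use, these features should be built from temperature forecasts or from values known at forecast time.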


# Part B — Production (supply side)


# 8. Electricity generation: overview


## 8.1 Total generation vs load


- Compare total generation and load
- Identify stress periods (imports/exports if available)


## 8.2 Generation mix


- Average mix by technology
- Mix evolution over time


# 9. Production by technology


## 9.1 Nuclear


- Stability
- Seasonal modulation
- Ramp constraints


## 9.2 Wind


- Volatility
- Seasonal behavior
- Ramp rates


## 9.3 Solar


- Diurnal cycle
- Seasonal amplitude


## 9.4 Hydro and thermal


- Dispatchable role
- Response during stress periods


# 10. Weather dependence (production)


## 10.1 Wind vs wind speed


- Scatter plots
- Binned averages
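
Binned averages give an empirical response curve that is easier to read than a dense scatter. The sketch below uses equal-width bins and a toy cubic relation that stands in for real data; it is not a turbine power curve. `binned_average` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def binned_average(x: pd.Series, y: pd.Series, bins: int = 10) -> pd.DataFrame:
    """Mean and count of y within equal-width bins of x."""
    bin_labels = pd.cut(x, bins=bins)
    return y.groupby(bin_labels, observed=True).agg(["mean", "count"])


# Synthetic data: hypothetical wind speed (m/s) and a toy cubic response.
wind_speed = pd.Series(np.linspace(0, 20, 200), name="wind_speed")
wind_power = (wind_speed ** 3).rename("wind_mw")
curve = binned_average(wind_speed, wind_power, bins=5)
print(curve)
```

The `count` column helps judge which bins have too few points to trust.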


## 10.2 Solar vs radiation


- Scatter plots
- Solar vs hour of day


## 10.3 Seasonal weather effects


- Winter vs summer comparisons


# 11. Residual load and grid stress


## 11.1 Definition


Residual load = load − (wind + solar), i.e. the demand left to be covered by dispatchable generation and imports.
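
In code, the definition is a single column operation. The column names `load_mw`, `wind_mw`, and `solar_mw` are hypothetical and should be replaced by those in the data dictionary.

```python
import pandas as pd


def residual_load(
    df: pd.DataFrame,
    load_col: str = "load_mw",
    wind_col: str = "wind_mw",
    solar_col: str = "solar_mw",
) -> pd.Series:
    """Residual load = load - (wind + solar), all in the same unit."""
    residual = df[load_col] - (df[wind_col] + df[solar_col])
    return residual.rename("residual_load_mw")


# Synthetic three-hour example in MW.
demo = pd.DataFrame(
    {
        "load_mw": [50_000.0, 60_000.0, 55_000.0],
        "wind_mw": [5_000.0, 2_000.0, 8_000.0],
        "solar_mw": [0.0, 3_000.0, 6_000.0],
    },
    index=pd.date_range("2023-01-01", periods=3, freq=pd.Timedelta(hours=1), tz="UTC"),
)
residual = residual_load(demo)
print(residual)
```

A missing wind or solar value propagates to a missing residual load, so the cleaning policy for generation columns affects this target directly.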


## 11.2 Residual load analysis


- Distribution
- Daily profile
- Extreme peaks


## 11.3 Comparison with raw load


- Tail behavior
- Interpretation for grid stress


# Cross-cutting analysis


# 12. Calendar effects


## 12.1 Weekday effects


- Mean differences by weekday
- Interaction with seasonality


## 12.2 Holidays and vacations


- Holiday vs non-holiday comparison
- Vacation-period effects


# 13. Temporal dependence


## 13.1 Autocorrelation


- ACF / PACF for load and residual load


## 13.2 Lag correlations


- Correlation at lags of 24 h, 48 h, and 168 h (one day, two days, one week)
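
These lag correlations can be read off with `Series.autocorr`. The sketch assumes gap-free hourly rows, so that a lag of 24 rows equals 24 hours; for half-hourly data the lags would double. `lag_correlations` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def lag_correlations(s: pd.Series, lags=(24, 48, 168)) -> pd.Series:
    """Pearson correlation between the series and itself shifted by each lag (in rows)."""
    return pd.Series({lag: s.autocorr(lag=lag) for lag in lags}, name="autocorr")


# Synthetic four-week series with a pure 24-hour cycle.
hours = np.arange(24 * 28)
idx = pd.date_range("2023-01-01", periods=len(hours), freq=pd.Timedelta(hours=1), tz="UTC")
load = pd.Series(50_000 + 5_000 * np.sin(2 * np.pi * hours / 24), index=idx, name="load_mw")
lag_corr = lag_correlations(load)
print(lag_corr)
```

On real load, comparing the 24-row and 168-row values is a quick way to see how much the weekly cycle adds beyond the daily one.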


# 14. Feature relationships


## 14.1 Correlation heatmaps


- Load, weather, generation variables


## 14.2 Redundancy checks


- Multicollinearity inspection
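
One common multicollinearity diagnostic is the variance inflation factor (VIF). For standardized predictors, the VIF of each feature equals the corresponding diagonal entry of the inverse correlation matrix, which the sketch below computes with NumPy. `vif_table` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def vif_table(features: pd.DataFrame) -> pd.Series:
    """Variance inflation factors from the inverse of the correlation matrix."""
    corr = features.dropna().corr().to_numpy()
    vif = np.diag(np.linalg.inv(corr))
    return pd.Series(vif, index=features.columns, name="vif").sort_values(ascending=False)


# Synthetic features: a near-duplicate of temperature plus an independent variable.
rng = np.random.default_rng(0)
temp = rng.normal(size=500)
demo = pd.DataFrame(
    {
        "temp_c": temp,
        "temp_smoothed": temp + 0.05 * rng.normal(size=500),
        "wind_speed": rng.normal(size=500),
    }
)
vif = vif_table(demo)
print(vif)
```

Perfectly collinear columns make the correlation matrix singular and `np.linalg.inv` will raise; drop exact duplicates first.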


# 15. EDA synthesis and modeling decisions


## 15.1 Main conclusions


Summarize confirmed patterns:
- Seasonality
- Nonlinearity
- Intermittency
- Temporal dependence


## 15.2 Modeling implications


- Target choice
- Feature set
- Data cleaning policy
- Backtesting strategy
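
For the backtesting strategy, one option is an expanding-window scheme in which each fold trains on all data before a cutoff and tests on the following block. The window lengths below are placeholders, and `expanding_window_splits` is a hypothetical helper name; this is a sketch of the idea, not a chosen protocol.

```python
import pandas as pd


def expanding_window_splits(
    index: pd.DatetimeIndex,
    initial_train_days: int = 365,
    test_days: int = 30,
    step_days: int = 30,
):
    """Yield (train_index, test_index) pairs with a growing training window."""
    end = index.max()
    train_end = index.min() + pd.Timedelta(days=initial_train_days)
    while train_end + pd.Timedelta(days=test_days) <= end:
        test_end = train_end + pd.Timedelta(days=test_days)
        train = index[index < train_end]
        test = index[(index >= train_end) & (index < test_end)]
        yield train, test
        train_end += pd.Timedelta(days=step_days)


# Synthetic two-year hourly index.
idx = pd.date_range("2022-01-01", periods=730 * 24, freq=pd.Timedelta(hours=1), tz="UTC")
folds = list(expanding_window_splits(idx))
for train, test in folds[:2]:
    print(train.max(), "->", test.min(), "to", test.max())
```

Every test block starts strictly after its training data ends, which avoids look-ahead across the split. Within each fold, day-ahead features must still be restricted to information available at forecast time.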


## 15.3 Next steps


- Load forecasting notebook
- Generation forecasting notebook
- Grid stress evaluation


# Appendix


## A. Data dictionary


Column definitions, units, and sources.


## B. Cleaning rules


Interpolation limits, outlier handling rules.


## C. Figure registry


List of exported figures for reports.
