# Preprocessing `eCO2mix`

## Importing modules

In [1]:
import sys
import os
import glob
import pandas as pd
# Add the parent folder to the path so that data_preprocessing.py can be imported
sys.path.append(os.path.abspath(os.path.join('..')))

## Importing the preprocessing functions

- `fetch_eCO2mix_data`: downloads the `eCO2mix` data source
- `convert_all_xls_eCO2mix_data`: converts the `.xls` files to `.csv`
- `concat_eCO2mix_annual_data`: concatenates the consumption data into a single DataFrame
- `concat_eCO2mix_tempo_data`: concatenates the TEMPO calendar data into a single DataFrame
- `preprocess_annual_data`: first cleaning pass on the consumption data
- `preprocess_tempo_data`: first cleaning pass on the TEMPO data
- `merge_eCO2mix_data`: left join of the consumption and TEMPO data on the date
- `preprocess_eCO2mix_data`: second cleaning pass, applied to the joined consumption and TEMPO data
- `preprocess_eCO2mix_data_engineered`: final cleaning pass, applied to the joined data after feature engineering

**Each processing function checks that the data source conforms to the expected format.**
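
The notebook does not show how these conformity checks are written. As a minimal sketch, assuming the check amounts to verifying that the expected columns are present, it could look like the following. The helper name `check_required_columns` and the column list are hypothetical and are not taken from `src.data_preprocessing`.

```python
import pandas as pd


def check_required_columns(df: pd.DataFrame, required: list[str]) -> None:
    """Raise a ValueError if any expected column is missing (hypothetical helper)."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")


# Toy frame using the consumption column names that appear in this notebook.
sample = pd.DataFrame(
    {"Date": ["2012-01-01"], "Heures": ["00:00"], "Consommation": [58315]}
)
check_required_columns(sample, ["Date", "Heures", "Consommation"])  # passes silently
```

Failing early with an explicit message makes it easier to notice when the upstream file layout changes.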

In [2]:
from src.data_preprocessing import (
    fetch_eCO2mix_data,
    convert_all_xls_eCO2mix_data,
    concat_eCO2mix_annual_data,
    concat_eCO2mix_tempo_data,
    preprocess_annual_data,
    preprocess_tempo_data,
    merge_eCO2mix_data,
    preprocess_eCO2mix_data,
    preprocess_eCO2mix_data_engineered
    )

## `eCO2mix` data

### Downloading the data

In [3]:
fetch_eCO2mix_data()

🌐 Chargement de https://www.rte-france.com/eco2mix/telecharger-les-indicateurs
🔗 24 liens détectés (ZIP + Tempo).
⬇️  Téléchargement ZIP : eCO2mix_RTE_En-cours-TR.zip
✅ Contenu de eCO2mix_RTE_En-cours-TR.zip extrait dans ../data/external/
⬇️  Téléchargement ZIP : eCO2mix_RTE_En-cours-Consolide.zip
✅ Contenu de eCO2mix_RTE_En-cours-Consolide.zip extrait dans ../data/external/
⬇️  Téléchargement ZIP : eCO2mix_RTE_Annuel-Definitif_2012.zip
✅ Contenu de eCO2mix_RTE_Annuel-Definitif_2012.zip extrait dans ../data/external/
⬇️  Téléchargement ZIP : eCO2mix_RTE_Annuel-Definitif_2013.zip
✅ Contenu de eCO2mix_RTE_Annuel-Definitif_2013.zip extrait dans ../data/external/
⬇️  Téléchargement ZIP : eCO2mix_RTE_Annuel-Definitif_2014.zip
✅ Contenu de eCO2mix_RTE_Annuel-Definitif_2014.zip extrait dans ../data/external/
⬇️  Téléchargement ZIP : eCO2mix_RTE_Annuel-Definitif_2015.zip
✅ Contenu de eCO2mix_RTE_Annuel-Definitif_2015.zip extrait dans ../data/external/
⬇️  Téléchargement ZIP : eCO2mix_RTE_Annue

### Defining the data directories

In [4]:
xls_folder = os.path.join('..', 'data', 'external')

In [5]:
# Folder containing the source files (the converted CSVs)
eco2mix_folder = os.path.join("..","data", "raw")

### Converting XLS to CSV

In [6]:
convert_all_xls_eCO2mix_data(xls_folder, eco2mix_folder)

✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2012.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2013.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2014.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2015.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2016.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2017.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2018.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2019.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2020.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2021.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2022.csv
✅ Fichier converti et nettoyé : ..\data\raw\eCO2mix_RTE_En-cours-Consolide.csv
✅ Fichier converti 

### Loading all hourly consumption data

- Concatenates the consumption data from several source files

In [7]:
df_annual = concat_eCO2mix_annual_data(eco2mix_folder)
df_annual

Lecture des fichiers annuels :
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2012.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2013.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2014.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2015.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2016.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2017.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2018.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2019.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2020.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2021.csv
  - ..\data\raw\eCO2mix_RTE_Annuel-Definitif_2022.csv
  - ..\data\raw\eCO2mix_RTE_En-cours-Consolide.csv
  - ..\data\raw\eCO2mix_RTE_En-cours-TR.csv


Unnamed: 0,Périmètre,Nature,Date,Heures,Consommation,Prévision J-1,Prévision J,Fioul,Charbon,Gaz,...,Hydraulique - Fil de l?eau + éclusée,Hydraulique - Lacs,Hydraulique - STEP turbinage,Bioénergies - Déchets,Bioénergies - Biomasse,Bioénergies - Biogaz,Stockage batterie,Déstockage batterie,Eolien terrestre,Eolien offshore
0,France,Données définitives,2012-01-01,00:00,58315,58200,58200,492,25,3816,...,ND,ND,ND,ND,ND,ND,,,,
1,France,Données définitives,2012-01-01,00:15,,57700,57550,,,,...,,,,,,,,,,
2,France,Données définitives,2012-01-01,00:30,58315,57200,56900,492,25,3816,...,ND,ND,ND,ND,ND,ND,,,,
3,France,Données définitives,2012-01-01,00:45,,56200,56000,,,,...,,,,,,,,,,
4,France,Données définitives,2012-01-01,01:00,56231,55200,55100,492,25,3834,...,ND,ND,ND,ND,ND,ND,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
467131,France,Données temps réel,2025-04-27,22:45,,ND,,,,,...,,,,,,,,,,
467132,France,Données temps réel,2025-04-27,23:00,,ND,,,,,...,,,,,,,,,,
467133,France,Données temps réel,2025-04-27,23:15,,ND,,,,,...,,,,,,,,,,
467134,France,Données temps réel,2025-04-27,23:30,,ND,,,,,...,,,,,,,,,,


### First preprocessing pass on the hourly consumption data

In [8]:
df_annual = preprocess_annual_data(df_annual)
df_annual

Unnamed: 0,Date,Heures,Consommation,Datetime
0,2012-01-01,00:00,58315,2012-01-01 00:00:00
2,2012-01-01,00:30,58315,2012-01-01 00:30:00
4,2012-01-01,01:00,56231,2012-01-01 01:00:00
6,2012-01-01,01:30,56075,2012-01-01 01:30:00
8,2012-01-01,02:00,55532,2012-01-01 02:00:00
...,...,...,...,...
466880,2025-04-25,08:00,50734,2025-04-25 08:00:00
466881,2025-04-25,08:15,51745,2025-04-25 08:15:00
466882,2025-04-25,08:30,52209,2025-04-25 08:30:00
466883,2025-04-25,08:45,52312,2025-04-25 08:45:00
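
The output above keeps only `Date`, `Heures`, `Consommation` and a new `Datetime` column, and the rows without a consumption value are gone. The body of `preprocess_annual_data` is not shown, so the following is only a sketch of steps that would produce such a result on a toy sample. The variable names `raw` and `clean` are illustrative.

```python
import pandas as pd

# Toy sample with the same layout as the first rows of the raw table.
raw = pd.DataFrame(
    {
        "Périmètre": ["France"] * 4,
        "Date": ["2012-01-01"] * 4,
        "Heures": ["00:00", "00:15", "00:30", "00:45"],
        "Consommation": [58315, None, 58315, None],
    }
)

# Keep only the columns needed downstream and drop rows without a measurement.
clean = raw[["Date", "Heures", "Consommation"]].dropna(subset=["Consommation"]).copy()
clean["Consommation"] = clean["Consommation"].astype(int)

# Combine date and time into a single timestamp column.
clean["Datetime"] = pd.to_datetime(
    clean["Date"] + " " + clean["Heures"], format="%Y-%m-%d %H:%M"
)
print(clean)
```

The original row labels are preserved after `dropna`, which is consistent with the gaps in the index column of the notebook output.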


### Loading all TEMPO data

- Concatenates the TEMPO calendar data from several source files

In [9]:
df_tempo = concat_eCO2mix_tempo_data(eco2mix_folder)
df_tempo

Lecture des fichiers tempo :
  - ..\data\raw\eCO2mix_RTE_tempo_2014-2015.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2015-2016.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2016-2017.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2017-2018.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2018-2019.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2019-2020.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2020-2021.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2021-2022.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2022-2023.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2023-2024.csv
  - ..\data\raw\eCO2mix_RTE_tempo_2024-2025.csv


Unnamed: 0,Date,Type de jour TEMPO
0,2014-09-01,BLEU
1,2014-09-02,BLEU
2,2014-09-03,BLEU
3,2014-09-04,BLEU
4,2014-09-05,BLEU
...,...,...
3886,2025-04-22,BLEU
3887,2025-04-23,BLEU
3888,2025-04-24,BLEU
3889,2025-04-25,BLEU


### First preprocessing pass on the TEMPO data

In [10]:
df_tempo = preprocess_tempo_data(df_tempo)
df_tempo

Unnamed: 0,Date,Type de jour TEMPO_BLANC,Type de jour TEMPO_BLEU,Type de jour TEMPO_ROUGE
0,2014-09-01,False,True,False
1,2014-09-02,False,True,False
2,2014-09-03,False,True,False
3,2014-09-04,False,True,False
4,2014-09-05,False,True,False
...,...,...,...,...
3886,2025-04-22,False,True,False
3887,2025-04-23,False,True,False
3888,2025-04-24,False,True,False
3889,2025-04-25,False,True,False
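
The three boolean columns in this output have the shape of a one-hot encoding of `Type de jour TEMPO`. Assuming that is what `preprocess_tempo_data` does (its code is not shown), `pandas.get_dummies` gives the same column names. The dates and day types below are illustrative and do not reflect the real TEMPO calendar.

```python
import pandas as pd

tempo = pd.DataFrame(
    {
        "Date": ["2014-09-01", "2014-12-02", "2015-01-20"],
        # Illustrative day types, not real calendar entries.
        "Type de jour TEMPO": ["BLEU", "BLANC", "ROUGE"],
    }
)

# One boolean column per category, named "<column>_<category>".
encoded = pd.get_dummies(tempo, columns=["Type de jour TEMPO"], dtype=bool)
print(encoded.columns.tolist())
```

Passing `dtype=bool` keeps the result boolean regardless of the pandas version in use.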


### Left join of the hourly consumption and TEMPO data

In [11]:
df_merged = merge_eCO2mix_data(df_annual, df_tempo)
df_merged

Unnamed: 0,Date,Heures,Consommation,Datetime,Type de jour TEMPO_BLANC,Type de jour TEMPO_BLEU,Type de jour TEMPO_ROUGE
0,2012-01-01,00:00,58315,2012-01-01 00:00:00,,,
1,2012-01-01,00:30,58315,2012-01-01 00:30:00,,,
2,2012-01-01,01:00,56231,2012-01-01 01:00:00,,,
3,2012-01-01,01:30,56075,2012-01-01 01:30:00,,,
4,2012-01-01,02:00,55532,2012-01-01 02:00:00,,,
...,...,...,...,...,...,...,...
272528,2025-04-25,08:00,50734,2025-04-25 08:00:00,False,True,False
272529,2025-04-25,08:15,51745,2025-04-25 08:15:00,False,True,False
272530,2025-04-25,08:30,52209,2025-04-25 08:30:00,False,True,False
272531,2025-04-25,08:45,52312,2025-04-25 08:45:00,False,True,False
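
The empty TEMPO columns for 2012 are what a left join produces when the right-hand table has no row for a given date, since the TEMPO files only start in September 2014. A minimal sketch with `DataFrame.merge`, assuming the join key is `Date` as stated in the function list (the real `merge_eCO2mix_data` is not shown):

```python
import pandas as pd

consumption = pd.DataFrame(
    {
        "Date": ["2012-01-01", "2012-01-01", "2014-09-01"],
        "Heures": ["00:00", "00:30", "00:00"],
        "Consommation": [58315, 58315, 43320],
    }
)
tempo = pd.DataFrame({"Date": ["2014-09-01"], "Type de jour TEMPO_BLEU": [True]})

# how="left" keeps every consumption row; unmatched dates get NaN on the TEMPO side.
merged = consumption.merge(tempo, on="Date", how="left")
print(merged)
```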


### Feature Engineering

- `create_date_features`: splits the `Date` column into separate time features (year, month, day, day of the week, ...)
- `create_lag_features`: adds lagged values of `Consommation` (*N-1*, *N-2*, *N-3*)
- `create_rolling_features`: adds a moving average (*mean of the last 3 consumption values*)
- `create_hour_features`: encodes the hour cyclically with sine and cosine
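
The code of `src.feature_engineering` is not part of this notebook. As a sketch of the lag and moving-average features, the following uses `shift` and `rolling` on the first five consumption values of the dataset. Shifting before averaging is an assumption, chosen because it reproduces the `Consommation_rolling_mean_3` values visible in the notebook output for these rows.

```python
import pandas as pd

df = pd.DataFrame({"Consommation": [58315, 58315, 56231, 56075, 55532]})

# Lag features: the value observed 1, 2 and 3 rows earlier.
for n in (1, 2, 3):
    df[f"Consommation_lag_{n}"] = df["Consommation"].shift(n)

# Mean of the previous (up to) three observations. Shifting first keeps the
# current value out of its own feature.
df["Consommation_rolling_mean_3"] = (
    df["Consommation"].shift(1).rolling(window=3, min_periods=1).mean()
)
print(df)
```

Excluding the current value matters when `Consommation` is also the prediction target, since a feature that contains the target would leak it.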

In [12]:
from src.feature_engineering import(
    create_date_features,
    create_lag_features,
    create_rolling_features,
    create_hour_features
)

### Applying the feature engineering

In [13]:
df_merged = create_date_features(df_merged)
df_merged = create_lag_features(df_merged, 'Consommation')
df_merged = create_rolling_features(df_merged, 'Consommation')
df_merged = create_hour_features(df_merged)
df_merged

Unnamed: 0,Date,Heures,Consommation,Datetime,Type de jour TEMPO_BLANC,Type de jour TEMPO_BLEU,Type de jour TEMPO_ROUGE,year,month,day,...,dayofyear,is_weekend,is_end_of_month,Consommation_lag_1,Consommation_lag_2,Consommation_lag_3,Consommation_rolling_mean_3,hour_transformed,hour_sin,hour_cos
0,2012-01-01,00:00,58315,2012-01-01 00:00:00,,,,2012,1,1,...,1,True,False,,,,,0,0.000000,1.000000
1,2012-01-01,00:30,58315,2012-01-01 00:30:00,,,,2012,1,1,...,1,True,False,58315,,,58315.000000,0,0.000000,1.000000
2,2012-01-01,01:00,56231,2012-01-01 01:00:00,,,,2012,1,1,...,1,True,False,58315,58315,,58315.000000,1,0.258819,0.965926
3,2012-01-01,01:30,56075,2012-01-01 01:30:00,,,,2012,1,1,...,1,True,False,56231,58315,58315,57620.333333,1,0.258819,0.965926
4,2012-01-01,02:00,55532,2012-01-01 02:00:00,,,,2012,1,1,...,1,True,False,56075,56231,58315,56873.666667,2,0.500000,0.866025
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
272528,2025-04-25,08:00,50734,2025-04-25 08:00:00,False,True,False,2025,4,25,...,115,False,False,50248,49795,48695,49579.333333,8,0.866025,-0.500000
272529,2025-04-25,08:15,51745,2025-04-25 08:15:00,False,True,False,2025,4,25,...,115,False,False,50734,50248,49795,50259.000000,8,0.866025,-0.500000
272530,2025-04-25,08:30,52209,2025-04-25 08:30:00,False,True,False,2025,4,25,...,115,False,False,51745,50734,50248,50909.000000,8,0.866025,-0.500000
272531,2025-04-25,08:45,52312,2025-04-25 08:45:00,False,True,False,2025,4,25,...,115,False,False,52209,51745,50734,51562.666667,8,0.866025,-0.500000
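
The `hour_sin` and `hour_cos` values above match a standard cyclic encoding over a 24-hour period. A minimal sketch, assuming `hour_transformed` is simply the hour part of `Heures` (the actual `create_hour_features` is not shown):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Heures": ["00:00", "01:30", "02:00", "08:45"]})

# Keep the hour part only, as in the hour_transformed column.
df["hour_transformed"] = df["Heures"].str.slice(0, 2).astype(int)

# Map the hour onto the unit circle so that 23:00 and 00:00 end up close together.
angle = 2 * np.pi * df["hour_transformed"] / 24
df["hour_sin"] = np.sin(angle)
df["hour_cos"] = np.cos(angle)
print(df)
```

Because only the hour is kept, all readings within the same hour (for example 08:00 to 08:45) share the same encoding, as the output shows.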


### Final cleaning and conformity pass

In [14]:
df_merged = preprocess_eCO2mix_data(df_merged)
df_merged = preprocess_eCO2mix_data_engineered(df_merged)
df_merged

Datetime,Date,Heures,Consommation,Type de jour TEMPO_BLANC,Type de jour TEMPO_BLEU,Type de jour TEMPO_ROUGE,year,month,day,weekday,...,dayofyear,is_weekend,is_end_of_month,Consommation_lag_1,Consommation_lag_2,Consommation_lag_3,Consommation_rolling_mean_3,hour_transformed,hour_sin,hour_cos
2014-09-01 00:00:00,2014-09-01,00:00,43320,False,True,False,2014,9,1,0,...,244,False,False,43938,44567,42056,43520.333333,0,0.000000,1.000000
2014-09-01 00:30:00,2014-09-01,00:30,41174,False,True,False,2014,9,1,0,...,244,False,False,43320,43938,44567,43941.666667,0,0.000000,1.000000
2014-09-01 01:00:00,2014-09-01,01:00,38430,False,True,False,2014,9,1,0,...,244,False,False,41174,43320,43938,42810.666667,1,0.258819,0.965926
2014-09-01 01:30:00,2014-09-01,01:30,37800,False,True,False,2014,9,1,0,...,244,False,False,38430,41174,43320,40974.666667,1,0.258819,0.965926
2014-09-01 02:00:00,2014-09-01,02:00,37137,False,True,False,2014,9,1,0,...,244,False,False,37800,38430,41174,39134.666667,2,0.500000,0.866025
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-04-25 08:00:00,2025-04-25,08:00,50734,False,True,False,2025,4,25,4,...,115,False,False,50248,49795,48695,49579.333333,8,0.866025,-0.500000
2025-04-25 08:15:00,2025-04-25,08:15,51745,False,True,False,2025,4,25,4,...,115,False,False,50734,50248,49795,50259.000000,8,0.866025,-0.500000
2025-04-25 08:30:00,2025-04-25,08:30,52209,False,True,False,2025,4,25,4,...,115,False,False,51745,50734,50248,50909.000000,8,0.866025,-0.500000
2025-04-25 08:45:00,2025-04-25,08:45,52312,False,True,False,2025,4,25,4,...,115,False,False,52209,51745,50734,51562.666667,8,0.866025,-0.500000
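
After this pass the table starts on 2014-09-01 and is indexed by `Datetime`, which suggests that rows without TEMPO information were dropped and the timestamp was promoted to the index. Which of the two functions performs which step is not visible here, so the sketch below only illustrates the combined effect on a toy frame.

```python
import pandas as pd

tempo_cols = [
    "Type de jour TEMPO_BLANC",
    "Type de jour TEMPO_BLEU",
    "Type de jour TEMPO_ROUGE",
]
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            ["2012-01-01 00:00", "2014-09-01 00:00", "2014-09-01 00:30"]
        ),
        "Consommation": [58315, 43320, 41174],
        "Type de jour TEMPO_BLANC": [None, False, False],
        "Type de jour TEMPO_BLEU": [None, True, True],
        "Type de jour TEMPO_ROUGE": [None, False, False],
    }
)

# Drop rows that have no TEMPO information, then index by timestamp.
final = df.dropna(subset=tempo_cols).set_index("Datetime").sort_index()
print(final)
```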


## Exporting the cleaned, transformed data

In [15]:
df_merged.to_csv(os.path.join('..', 'data', 'processed', 'eco2mix_data.csv'), index=False)
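
With `index=False`, the `Datetime` index shown in the previous output is not written to the file, although `Date` and `Heures` remain available as columns. If the timestamp index is needed when the file is read back, one option (a suggestion, not the method used in this notebook) is to write the index and restore it on load. The sketch uses an in-memory buffer instead of a file path.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"Consommation": [43320, 41174]},
    index=pd.DatetimeIndex(["2014-09-01 00:00", "2014-09-01 00:30"], name="Datetime"),
)

buffer = io.StringIO()
df.to_csv(buffer)  # index=True is the default, so the Datetime index is written
buffer.seek(0)

# Restore the index and parse it back into timestamps.
restored = pd.read_csv(buffer, index_col="Datetime", parse_dates=True)
print(restored.index)
```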

## `meteo.gouv.fr` data

In [16]:
from src.data_preprocessing import load_data

In [17]:
weather_folder = os.path.join("..", "data", "raw")

In [36]:
weather_pattern = os.path.join(weather_folder, "H_02_*.csv")
weather_files = glob.glob(weather_pattern)
list_df_weather = []

# Load each matching file and keep going if one of them fails to load
for file in weather_files:
    print(" -", file)
    try:
        df = load_data(file, encoding="utf-8", sep=";")
        list_df_weather.append(df)
    except Exception as e:
        print(f"Error while loading file {file}: {e}")
df_weather = pd.concat(list_df_weather, ignore_index=True, axis=0)
df_weather

 - ..\data\raw\H_02_2010-2019.csv
 - ..\data\raw\H_02_latest-2024-2025.csv
 - ..\data\raw\H_02_previous-2020-2023.csv


Unnamed: 0,NUM_POSTE,NOM_USUEL,LAT,LON,ALTI,AAAAMMJJHH,RR1,QRR1,DRR1,QDRR1,...,INS,QINS,INS2,QINS2,TLAGON,QTLAGON,TVEGETAUX,QTVEGETAUX,ECOULEMENT,QECOULEMENT
0,02031001,AUBENTON,49.834000,4.200500,177,2010120106,,,,,...,,,,,,,,,,
1,02031001,AUBENTON,49.834000,4.200500,177,2010120206,,,,,...,,,,,,,,,,
2,02031001,AUBENTON,49.834000,4.200500,177,2010120306,,,,,...,,,,,,,,,,
3,02031001,AUBENTON,49.834000,4.200500,177,2010120406,,,,,...,,,,,,,,,,
4,02031001,AUBENTON,49.834000,4.200500,177,2010120506,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1224904,02738001,TERGNIER,49.663833,3.320667,59,2023123119,0.0,1,,,...,,,,,,,,,,
1224905,02738001,TERGNIER,49.663833,3.320667,59,2023123120,0.0,1,,,...,,,,,,,,,,
1224906,02738001,TERGNIER,49.663833,3.320667,59,2023123121,0.0,1,,,...,,,,,,,,,,
1224907,02738001,TERGNIER,49.663833,3.320667,59,2023123122,0.0,1,,,...,,,,,,,,,,


### Feature descriptions

- `AAAAMMJJHH`: date of the measurement (year, month, day, hour)
- `RR1`: precipitation fallen in 1 hour (in mm and tenths)
- `FF`: wind speed averaged over 10 min, measured at 10 m (in m/s and tenths)
- `FXY`: maximum value of `FF` within the hour (in m/s and tenths)
- `FXI`: maximum instantaneous wind speed within the hour, measured at 10 m (in m/s and tenths)
- `T`: instantaneous temperature under shelter (in °C and tenths)
- `TN`: minimum temperature under shelter within the hour (in °C and tenths)
- `TX`: maximum temperature under shelter within the hour (in °C and tenths)
- `U`: relative humidity (in %)
- `UN`: minimum relative humidity within the hour (in %)
- `UX`: maximum relative humidity within the hour (in %)
- `LAT`: latitude
- `LON`: longitude

### Cleaning

In [37]:
df_weather['AAAAMMJJHH'] = pd.to_datetime(df_weather['AAAAMMJJHH'], format='%Y%m%d%H')

In [38]:
weather_features = [
    'T', 'TN', 'TX', 'U', 'UN', 'UX', 'FF', 'FXY', 'FXI', 'RR1','LAT','LON'
]

In [39]:
df_weather = df_weather[['AAAAMMJJHH'] + weather_features]

In [40]:
df_weather

Unnamed: 0,AAAAMMJJHH,T,TN,TX,U,UN,UX,FF,FXY,FXI,RR1,LAT,LON
0,2010-12-01 06:00:00,,,,,,,,,,,49.834000,4.200500
1,2010-12-02 06:00:00,,,,,,,,,,,49.834000,4.200500
2,2010-12-03 06:00:00,,,,,,,,,,,49.834000,4.200500
3,2010-12-04 06:00:00,,,,,,,,,,,49.834000,4.200500
4,2010-12-05 06:00:00,,,,,,,,,,,49.834000,4.200500
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1224904,2023-12-31 19:00:00,9.4,9.4,9.7,,,,,,,0.0,49.663833,3.320667
1224905,2023-12-31 20:00:00,8.7,8.7,9.4,,,,,,,0.0,49.663833,3.320667
1224906,2023-12-31 21:00:00,8.4,8.4,8.7,,,,,,,0.0,49.663833,3.320667
1224907,2023-12-31 22:00:00,8.1,8.0,8.4,,,,,,,0.0,49.663833,3.320667


In [41]:
df_weather = df_weather.dropna(subset=weather_features)
df_weather

Unnamed: 0,AAAAMMJJHH,T,TN,TX,U,UN,UX,FF,FXY,FXI,RR1,LAT,LON
122,2010-01-01 00:00:00,0.6,0.5,0.7,94,92,94,4.0,4.4,6.2,0.0,49.595667,3.610333
123,2010-01-01 01:00:00,0.5,0.4,0.6,93,93,95,4.3,4.7,6.5,0.0,49.595667,3.610333
124,2010-01-01 02:00:00,0.2,0.2,0.5,94,92,94,4.1,4.8,6.7,0.0,49.595667,3.610333
125,2010-01-01 03:00:00,0.0,0.0,0.3,95,94,95,4.2,5.3,7.4,0.0,49.595667,3.610333
126,2010-01-01 04:00:00,0.0,-0.1,0.0,93,93,95,4.3,4.7,7.0,0.0,49.595667,3.610333
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1196379,2023-12-31 19:00:00,8.8,8.6,8.9,71,70,72,8.9,10.6,14.7,0.0,49.566000,4.036500
1196380,2023-12-31 20:00:00,8.7,8.7,9.0,70,69,71,6.8,9.1,13.9,0.0,49.566000,4.036500
1196381,2023-12-31 21:00:00,8.2,8.2,9.0,72,70,73,6.3,7.4,12.0,0.0,49.566000,4.036500
1196382,2023-12-31 22:00:00,7.4,7.3,8.3,77,72,77,8.6,8.8,13.1,0.0,49.566000,4.036500


In [42]:
df_weather = df_weather.rename(columns={'AAAAMMJJHH': 'DateTime'})


In [43]:
df_weather.dtypes

DateTime    datetime64[ns]
T                   object
TN                  object
TX                  object
U                   object
UN                  object
UX                  object
FF                  object
FXY                 object
FXI                 object
RR1                 object
LAT                 object
LON                 object
dtype: object

In [44]:
# Columns to convert to a numeric dtype
cols_to_convert = ['T', 'TN', 'TX', 'U', 'UN', 'UX', 'FF', 'FXY', 'FXI', 'RR1']

# Safe conversion: invalid values become NaN
for col in cols_to_convert:
    df_weather[col] = pd.to_numeric(df_weather[col], errors='coerce')
df_weather

Unnamed: 0,DateTime,T,TN,TX,U,UN,UX,FF,FXY,FXI,RR1,LAT,LON
122,2010-01-01 00:00:00,0.6,0.5,0.7,94,92,94,4.0,4.4,6.2,0.0,49.595667,3.610333
123,2010-01-01 01:00:00,0.5,0.4,0.6,93,93,95,4.3,4.7,6.5,0.0,49.595667,3.610333
124,2010-01-01 02:00:00,0.2,0.2,0.5,94,92,94,4.1,4.8,6.7,0.0,49.595667,3.610333
125,2010-01-01 03:00:00,0.0,0.0,0.3,95,94,95,4.2,5.3,7.4,0.0,49.595667,3.610333
126,2010-01-01 04:00:00,0.0,-0.1,0.0,93,93,95,4.3,4.7,7.0,0.0,49.595667,3.610333
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1196379,2023-12-31 19:00:00,8.8,8.6,8.9,71,70,72,8.9,10.6,14.7,0.0,49.566000,4.036500
1196380,2023-12-31 20:00:00,8.7,8.7,9.0,70,69,71,6.8,9.1,13.9,0.0,49.566000,4.036500
1196381,2023-12-31 21:00:00,8.2,8.2,9.0,72,70,73,6.3,7.4,12.0,0.0,49.566000,4.036500
1196382,2023-12-31 22:00:00,7.4,7.3,8.3,77,72,77,8.6,8.8,13.1,0.0,49.566000,4.036500
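
Because `df_weather` was obtained by selecting columns and dropping rows from a larger frame, assigning to its columns in a loop can trigger a `SettingWithCopyWarning` in some pandas versions. As an alternative sketch, the same conversion can be done on an explicit copy and without a loop. The frame `weather` and its values are a toy example.

```python
import pandas as pd

# Toy frame with string-typed measurements, including one invalid entry.
weather = pd.DataFrame({"T": ["0.6", "0.5", "abc"], "U": ["94", "93", "95"]})
cols = ["T", "U"]

# Work on an explicit copy so the assignment does not target a view of another frame.
weather = weather.copy()

# Extra keyword arguments to apply() are forwarded to pd.to_numeric for each column.
weather[cols] = weather[cols].apply(pd.to_numeric, errors="coerce")
print(weather.dtypes)
```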


In [None]:

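# Divide the affected columns by 10.
# Caution: the values displayed before this step (e.g. T = 0.6, FF = 4.0)
# already look like °C, m/s and mm with one decimal, so this division may
# shrink them tenfold. Check the unit convention of the source before keeping it.
cols_div10 = ['T', 'TN', 'TX', 'FF', 'FXY', 'FXI', 'RR1']
df_weather[cols_div10] = df_weather[cols_div10] / 10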

In [35]:
df_weather

Unnamed: 0,DateTime,T,TN,TX,U,UN,UX,FF,FXY,FXI,RR1,LAT,LON
122,2010-01-01 00:00:00,0.06,0.05,0.07,94,92,94,0.40,0.44,0.62,0.0,49.595667,3.610333
123,2010-01-01 01:00:00,0.05,0.04,0.06,93,93,95,0.43,0.47,0.65,0.0,49.595667,3.610333
124,2010-01-01 02:00:00,0.02,0.02,0.05,94,92,94,0.41,0.48,0.67,0.0,49.595667,3.610333
125,2010-01-01 03:00:00,0.00,0.00,0.03,95,94,95,0.42,0.53,0.74,0.0,49.595667,3.610333
126,2010-01-01 04:00:00,0.00,-0.01,0.00,93,93,95,0.43,0.47,0.70,0.0,49.595667,3.610333
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1196379,2023-12-31 19:00:00,0.88,0.86,0.89,71,70,72,0.89,1.06,1.47,0.0,49.566000,4.036500
1196380,2023-12-31 20:00:00,0.87,0.87,0.90,70,69,71,0.68,0.91,1.39,0.0,49.566000,4.036500
1196381,2023-12-31 21:00:00,0.82,0.82,0.90,72,70,73,0.63,0.74,1.20,0.0,49.566000,4.036500
1196382,2023-12-31 22:00:00,0.74,0.73,0.83,77,72,77,0.86,0.88,1.31,0.0,49.566000,4.036500
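
A quick plausibility check can help confirm whether a rescaling step like the one above was appropriate. The sketch below flags a temperature series whose maximum is implausibly low for several years of observations. The helper `looks_downscaled`, the 20 °C threshold and the sample readings are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def looks_downscaled(temperatures: pd.Series, min_expected_max: float = 20.0) -> bool:
    """Return True when the hottest reading is below min_expected_max (hypothetical check)."""
    return bool(temperatures.max() < min_expected_max)


# Illustrative readings in °C, made up for this example and not taken from the dataset.
readings = pd.Series([0.6, 8.8, 31.2])

print(looks_downscaled(readings))       # readings on their original scale
print(looks_downscaled(readings / 10))  # same readings after an extra division by 10
```

On the real data, the same idea can be applied by inspecting `df_weather['T'].describe()` before and after the division.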
