## Data Cleaning: Weather Data

This notebook works through the steps to clean the weather data downloaded from `Open-Meteo`^, keeping only the data relevant to this project.

^Link to API: https://open-meteo.com/en/docs/historical-weather-api  
Daily and hourly weather data from 2012 to 2022 were downloaded to create this dataset.

In [1]:
from datetime import date
import pandas as pd

### Load the data

We have 2 weather csv files:
1. `weather.csv` which contains **daily** data for:
    * temperature (minimum, maximum, mean)
    * apparent temperature (minimum, maximum, mean)
    * sunrise and sunset time
    * shortwave radiation energy (sum; in mega-Joules per square meter)
    * precipitation amount (sum; in millimeters)
    * rain amount (sum; in millimeters)
    * number of hours of precipitation
    * wind speed (maximum; in kilometers per hour)
    * wind gusts (maximum; in kilometers per hour)
    * wind direction (in degrees)
    * evapotranspiration rate (mean; in millimeters)
    
2. `weather2.csv` which contains **hourly** data for:
    * relative humidity (mean of current hour; in percent)
    * direct radiation amount (mean of **preceding** hour; in Watts per square meter)
    * wind speed (mean of current hour; in kilometers per hour)
    
The API does not directly provide daily means of relative humidity or the daily number of hours of sunshine. Hourly relative humidity and hourly direct radiation (which is used to calculate the number of hours of sunshine) therefore had to be obtained through hourly queries.

Similarly, daily queries only provide the maximum wind speed, so the hourly mean wind speed was queried as well.

Hence, the daily and hourly data were queried separately and stored as 2 different csv files.

### Daily data

In [2]:
weather1 = pd.read_csv('../../data/weather/weather.csv', skiprows=3)
weather1

Unnamed: 0,time,weathercode (wmo code),temperature_2m_max (°C),temperature_2m_min (°C),temperature_2m_mean (°C),apparent_temperature_max (°C),apparent_temperature_min (°C),apparent_temperature_mean (°C),sunrise (iso8601),sunset (iso8601),shortwave_radiation_sum (MJ/m²),precipitation_sum (mm),rain_sum (mm),precipitation_hours (h),windspeed_10m_max (km/h),windgusts_10m_max (km/h),winddirection_10m_dominant (°),et0_fao_evapotranspiration (mm)
0,2012-01-01,53,31.0,24.4,27.1,38.7,29.3,32.7,2012-01-01T07:04,2012-01-01T19:09,20.33,1.9,1.9,7.0,10.4,23.0,30,4.08
1,2012-01-02,51,29.2,24.9,26.8,35.3,29.3,31.5,2012-01-02T07:05,2012-01-02T19:10,20.20,0.7,0.7,4.0,16.6,32.0,23,3.87
2,2012-01-03,51,29.4,25.2,27.0,34.7,29.4,31.3,2012-01-03T07:05,2012-01-03T19:10,19.81,0.5,0.5,2.0,19.2,34.6,27,3.98
3,2012-01-04,51,29.6,24.9,26.9,33.6,28.5,30.6,2012-01-04T07:06,2012-01-04T19:11,20.19,0.1,0.1,1.0,20.4,36.7,24,4.26
4,2012-01-05,51,29.2,24.7,26.6,34.6,28.1,30.3,2012-01-05T07:06,2012-01-05T19:11,19.09,0.6,0.6,2.0,20.0,37.4,18,3.91
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4013,2022-12-27,51,29.5,25.5,27.0,33.7,28.4,30.3,2022-12-27T07:02,2022-12-27T19:07,17.24,0.5,0.5,4.0,18.5,33.5,24,4.08
4014,2022-12-28,51,28.4,25.5,26.7,31.7,28.7,30.0,2022-12-28T07:03,2022-12-28T19:07,18.04,0.9,0.9,6.0,19.4,34.6,10,3.99
4015,2022-12-29,53,29.4,25.4,26.6,34.5,29.0,30.4,2022-12-29T07:03,2022-12-29T19:08,18.45,1.7,1.7,7.0,17.5,37.1,3,3.84
4016,2022-12-30,51,30.5,25.1,27.1,33.8,28.3,30.1,2022-12-30T07:04,2022-12-30T19:09,19.53,1.6,1.6,7.0,22.3,40.7,359,4.51
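
The `skiprows=3` argument above skips the lines that precede the real header row. The snippet below is a minimal sketch of that behaviour on an in-memory file; the three preamble lines are hypothetical placeholders, not the actual contents of the downloaded file.

```python
import io
import pandas as pd

# Hypothetical file contents: three preamble lines that are not part of the
# table, followed by the real CSV header and two data rows.
raw = (
    "source,example\n"
    "latitude,1.0\n"
    "longitude,104.0\n"
    "time,temperature_2m_max (°C)\n"
    "2012-01-01,31.0\n"
    "2012-01-02,29.2\n"
)

# skiprows=3 drops the first three lines, so the fourth line becomes the header
demo = pd.read_csv(io.StringIO(raw), skiprows=3)
print(demo.columns.tolist())
```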


In [3]:
# inspect the columns
cols = weather1.columns
cols

Index(['time', 'weathercode (wmo code)', 'temperature_2m_max (°C)',
       'temperature_2m_min (°C)', 'temperature_2m_mean (°C)',
       'apparent_temperature_max (°C)', 'apparent_temperature_min (°C)',
       'apparent_temperature_mean (°C)', 'sunrise (iso8601)',
       'sunset (iso8601)', 'shortwave_radiation_sum (MJ/m²)',
       'precipitation_sum (mm)', 'rain_sum (mm)', 'precipitation_hours (h)',
       'windspeed_10m_max (km/h)', 'windgusts_10m_max (km/h)',
       'winddirection_10m_dominant (°)', 'et0_fao_evapotranspiration (mm)'],
      dtype='object')

Rename the columns to make them more succinct and easier to access; the degree symbols are awkward to type in code.

In [4]:
# rename only the relevant columns
weather1.rename({cols[2]: 'temp_max',
                 cols[3]: 'temp_min',
                 cols[4]: 'temp_mean',
                 cols[5]: 'temp_app_max',
                 cols[6]: 'temp_app_min',
                 cols[7]: 'temp_app_mean',
                 cols[12]: 'rain_mm',
                 cols[13]: 'rain_hours',
                 cols[17]: 'evapotranspiration'}, axis=1, inplace=True)
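
As an alternative to renaming by position, the unit suffixes could be stripped from every column name with a regular expression. This is only a sketch on a hypothetical empty frame, and it keeps the API's longer names rather than the short names chosen above.

```python
import pandas as pd

# hypothetical frame with a few column names in the same style as the API output
demo = pd.DataFrame(columns=['time',
                             'temperature_2m_max (°C)',
                             'rain_sum (mm)',
                             'winddirection_10m_dominant (°)'])

# drop a trailing " (unit)" suffix from every column name
demo.columns = demo.columns.str.replace(r'\s*\([^)]*\)$', '', regex=True)
print(demo.columns.tolist())
# ['time', 'temperature_2m_max', 'rain_sum', 'winddirection_10m_dominant']
```

Renaming by name rather than by position also avoids silent mistakes if the API ever returns the columns in a different order.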

As mentioned in the notebook `data_analysis/data_cleaning/dengue.ipynb`, we standardize all data to epidemiological (epi) weeks. We need some helper functions to convert calendar dates to the correct epi year and week. For now, we keep the data at the daily level so that it can be merged with the rest of the weather measures below.

In [5]:
def year_epi_start(year):
    """Return the ordinal for the first day 
    of the first epi week of the calendar year.
    """
    jan1 = date(year, 1, 1)
    jan1_ord = jan1.toordinal()
    jan1_day = jan1.weekday()
    week1_start_ord = jan1_ord - jan1_day - 1
    if jan1_day > 2:
        week1_start_ord += 7
    return week1_start_ord

def date_to_epi(date_obj, year_only=False, week_only=False):
    """Return the epi year and/or week."""
    year = date_obj.year
    date_ord = date_obj.toordinal()
    year_start_ord = year_epi_start(year)
    week = (date_ord - year_start_ord) // 7
    
    if week < 0:  # epi week 1 starts from the previous calendar year
        year -= 1
        year_start_ord = year_epi_start(year)
        week = (date_ord - year_start_ord) // 7
    elif week >= 52:  # last epi week spills over to the next calendar year
        year_start_ord = year_epi_start(year + 1)
        if date_ord >= year_start_ord:
            year += 1
            week = 0
    
    week += 1
    
    if year_only:
        return year
    elif week_only:
        return week
    
    return (year, week)
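
The two functions above appear to follow the common convention in which epi weeks start on Sunday and week 1 is the first week with at least 4 days in the calendar year. Under that assumption, the rule can be restated compactly as below; `epi_week1_start` and `to_epi` are illustrative names, not part of the notebook, and the sketch works on `datetime.date` objects only.

```python
from datetime import date, timedelta

def epi_week1_start(year):
    """Return the Sunday that starts epi week 1 of `year` (assumed convention)."""
    jan1 = date(year, 1, 1)
    # days back to the Sunday on or before Jan 1 (Python: Monday=0 ... Sunday=6)
    days_since_sunday = (jan1.weekday() + 1) % 7
    start = jan1 - timedelta(days=days_since_sunday)
    # fewer than 4 days of that week fall in `year`: week 1 starts a week later
    if days_since_sunday > 3:
        start += timedelta(days=7)
    return start

def to_epi(d):
    """Return (epi_year, epi_week) for a datetime.date."""
    # try the following epi year first, then the current one, then the previous one
    for year in (d.year + 1, d.year, d.year - 1):
        start = epi_week1_start(year)
        if d >= start:
            return year, (d - start).days // 7 + 1

print(to_epi(date(2012, 1, 1)))    # (2012, 1)
print(to_epi(date(2022, 1, 1)))    # (2021, 52): a Saturday, still in the last week of 2021
print(to_epi(date(2022, 12, 30)))  # (2022, 52)
```

The edge cases worth checking are the first and last few days of a calendar year, which can belong to the neighbouring epi year.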

In [6]:
# extract epi year and epi week from `time` column 
weather1['time'] = pd.to_datetime(weather1['time'])
weather1['year'] = weather1['time'].apply(date_to_epi, year_only=True)
weather1['week'] = weather1['time'].apply(date_to_epi, week_only=True)
weather1

Unnamed: 0,time,weathercode (wmo code),temp_max,temp_min,temp_mean,temp_app_max,temp_app_min,temp_app_mean,sunrise (iso8601),sunset (iso8601),shortwave_radiation_sum (MJ/m²),precipitation_sum (mm),rain_mm,rain_hours,windspeed_10m_max (km/h),windgusts_10m_max (km/h),winddirection_10m_dominant (°),evapotranspiration,year,week
0,2012-01-01,53,31.0,24.4,27.1,38.7,29.3,32.7,2012-01-01T07:04,2012-01-01T19:09,20.33,1.9,1.9,7.0,10.4,23.0,30,4.08,2012,1
1,2012-01-02,51,29.2,24.9,26.8,35.3,29.3,31.5,2012-01-02T07:05,2012-01-02T19:10,20.20,0.7,0.7,4.0,16.6,32.0,23,3.87,2012,1
2,2012-01-03,51,29.4,25.2,27.0,34.7,29.4,31.3,2012-01-03T07:05,2012-01-03T19:10,19.81,0.5,0.5,2.0,19.2,34.6,27,3.98,2012,1
3,2012-01-04,51,29.6,24.9,26.9,33.6,28.5,30.6,2012-01-04T07:06,2012-01-04T19:11,20.19,0.1,0.1,1.0,20.4,36.7,24,4.26,2012,1
4,2012-01-05,51,29.2,24.7,26.6,34.6,28.1,30.3,2012-01-05T07:06,2012-01-05T19:11,19.09,0.6,0.6,2.0,20.0,37.4,18,3.91,2012,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4013,2022-12-27,51,29.5,25.5,27.0,33.7,28.4,30.3,2022-12-27T07:02,2022-12-27T19:07,17.24,0.5,0.5,4.0,18.5,33.5,24,4.08,2022,52
4014,2022-12-28,51,28.4,25.5,26.7,31.7,28.7,30.0,2022-12-28T07:03,2022-12-28T19:07,18.04,0.9,0.9,6.0,19.4,34.6,10,3.99,2022,52
4015,2022-12-29,53,29.4,25.4,26.6,34.5,29.0,30.4,2022-12-29T07:03,2022-12-29T19:08,18.45,1.7,1.7,7.0,17.5,37.1,3,3.84,2022,52
4016,2022-12-30,51,30.5,25.1,27.1,33.8,28.3,30.1,2022-12-30T07:04,2022-12-30T19:09,19.53,1.6,1.6,7.0,22.3,40.7,359,4.51,2022,52


Is it necessary to keep both the precipitation and rain columns? It never snows in Singapore!

In [7]:
# rain is the only likely form of precipitation in Singapore,
# so precipitation_sum and rain_sum should be the same; count the rows where they differ
(weather1['precipitation_sum (mm)'] != weather1['rain_mm']).sum()

0
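
The check above uses exact inequality, which works here because both columns hold the same rounded values. A slightly more defensive variant, sketched on hypothetical values, compares with a floating-point tolerance instead.

```python
import numpy as np
import pandas as pd

# hypothetical values in the same layout as the two columns compared above
demo = pd.DataFrame({'precipitation_sum (mm)': [1.9, 0.7, 0.5],
                     'rain_mm': [1.9, 0.7, 0.5]})

# count rows where the two columns differ by more than a tiny tolerance
n_diff = (~np.isclose(demo['precipitation_sum (mm)'], demo['rain_mm'])).sum()
print(n_diff)
```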

Finally, keep only the required and relevant columns, and reorder them for easy reading/comparison.

In [8]:
weather1 = weather1[['time',
                     'year',
                     'week',
                     'temp_max',
                     'temp_min',
                     'temp_mean',
                     'temp_app_max',
                     'temp_app_min',
                     'temp_app_mean',
                     'rain_mm',
                     'rain_hours',
                     'evapotranspiration']].copy()
weather1

Unnamed: 0,time,year,week,temp_max,temp_min,temp_mean,temp_app_max,temp_app_min,temp_app_mean,rain_mm,rain_hours,evapotranspiration
0,2012-01-01,2012,1,31.0,24.4,27.1,38.7,29.3,32.7,1.9,7.0,4.08
1,2012-01-02,2012,1,29.2,24.9,26.8,35.3,29.3,31.5,0.7,4.0,3.87
2,2012-01-03,2012,1,29.4,25.2,27.0,34.7,29.4,31.3,0.5,2.0,3.98
3,2012-01-04,2012,1,29.6,24.9,26.9,33.6,28.5,30.6,0.1,1.0,4.26
4,2012-01-05,2012,1,29.2,24.7,26.6,34.6,28.1,30.3,0.6,2.0,3.91
...,...,...,...,...,...,...,...,...,...,...,...,...
4013,2022-12-27,2022,52,29.5,25.5,27.0,33.7,28.4,30.3,0.5,4.0,4.08
4014,2022-12-28,2022,52,28.4,25.5,26.7,31.7,28.7,30.0,0.9,6.0,3.99
4015,2022-12-29,2022,52,29.4,25.4,26.6,34.5,29.0,30.4,1.7,7.0,3.84
4016,2022-12-30,2022,52,30.5,25.1,27.1,33.8,28.3,30.1,1.6,7.0,4.51


### Hourly data

In [9]:
# hourly data for RH, direct irradiation, windspeed
weather2 = pd.read_csv('../../data/weather/weather2.csv', skiprows=3)
weather2

Unnamed: 0,time,relativehumidity_2m (%),direct_radiation (W/m²),windspeed_10m (km/h)
0,2012-01-01T00:00,94,0.0,5.4
1,2012-01-01T01:00,95,0.0,5.2
2,2012-01-01T02:00,95,0.0,5.4
3,2012-01-01T03:00,96,0.0,5.9
4,2012-01-01T04:00,94,0.0,5.1
...,...,...,...,...
96451,2023-01-01T19:00,78,13.0,12.6
96452,2023-01-01T20:00,82,0.0,12.2
96453,2023-01-01T21:00,84,0.0,10.6
96454,2023-01-01T22:00,86,0.0,9.8


In [10]:
cols = weather2.columns
cols

Index(['time', 'relativehumidity_2m (%)', 'direct_radiation (W/m²)',
       'windspeed_10m (km/h)'],
      dtype='object')

Rename the columns to be more succinct.

In [11]:
weather2 = weather2.rename({cols[1]: 'rh', cols[2]: 'irad', cols[3]: 'windspeed'}, axis=1)
weather2

Unnamed: 0,time,rh,irad,windspeed
0,2012-01-01T00:00,94,0.0,5.4
1,2012-01-01T01:00,95,0.0,5.2
2,2012-01-01T02:00,95,0.0,5.4
3,2012-01-01T03:00,96,0.0,5.9
4,2012-01-01T04:00,94,0.0,5.1
...,...,...,...,...
96451,2023-01-01T19:00,78,13.0,12.6
96452,2023-01-01T20:00,82,0.0,12.2
96453,2023-01-01T21:00,84,0.0,10.6
96454,2023-01-01T22:00,86,0.0,9.8


Recall that the direct radiation values are means over the preceding hour, while the other variables refer to the current hour. Thus, we shift the `irad` column up by 1 row (`shift(-1)`) so that it lines up with `time` and the other variables. To account for this shift, the hourly data downloaded from the API include 1 extra day (Jan 1, 2023), so that no data are missing after the correction.

In [12]:
weather2['time'] = pd.to_datetime(weather2['time'])
# offset the data for irad by -1 hour
weather2['irad'] = weather2['irad'].shift(-1)
weather2

Unnamed: 0,time,rh,irad,windspeed
0,2012-01-01 00:00:00,94,0.0,5.4
1,2012-01-01 01:00:00,95,0.0,5.2
2,2012-01-01 02:00:00,95,0.0,5.4
3,2012-01-01 03:00:00,96,0.0,5.9
4,2012-01-01 04:00:00,94,0.0,5.1
...,...,...,...,...
96451,2023-01-01 19:00:00,78,0.0,12.6
96452,2023-01-01 20:00:00,82,0.0,12.2
96453,2023-01-01 21:00:00,84,0.0,10.6
96454,2023-01-01 22:00:00,86,0.0,9.8
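
The effect of `shift(-1)` is easier to see on a few rows. The values below are hypothetical and only illustrate the realignment.

```python
import pandas as pd

demo = pd.DataFrame({
    'time': pd.date_range('2012-01-01 06:00', periods=4, freq=pd.Timedelta(hours=1)),
    # the value stamped at hour t is the mean over the hour ending at t
    'irad': [0.0, 10.0, 150.0, 300.0],
})

# after shift(-1), the row stamped t holds the mean over the hour starting at t
demo['irad_aligned'] = demo['irad'].shift(-1)
print(demo)
```

The last row of `irad_aligned` is `NaN` because there is no following hour to pull from, which is why one extra day of data is needed at the end of the period.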


In [13]:
# now remove data beyond period of interest
# (the string '2022-12-31' is read as 2022-12-31 00:00, so only the first hour of Dec 31 is kept)
weather2 = weather2[(weather2['time'] >= '2012-01-01') & (weather2['time'] <= '2022-12-31')]
weather2

Unnamed: 0,time,rh,irad,windspeed
0,2012-01-01 00:00:00,94,0.0,5.4
1,2012-01-01 01:00:00,95,0.0,5.2
2,2012-01-01 02:00:00,95,0.0,5.4
3,2012-01-01 03:00:00,96,0.0,5.9
4,2012-01-01 04:00:00,94,0.0,5.1
...,...,...,...,...
96404,2022-12-30 20:00:00,78,0.0,15.7
96405,2022-12-30 21:00:00,81,0.0,15.2
96406,2022-12-30 22:00:00,81,0.0,14.4
96407,2022-12-30 23:00:00,83,0.0,13.7
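
One point to keep in mind with the filter above: when a datetime column is compared with a date-only string such as `'2022-12-31'`, the string is interpreted as midnight at the start of that day, so `<=` keeps only the 00:00 row of the final day. A half-open upper bound (`< '2023-01-01'`) keeps the whole day. The sketch below uses hypothetical hourly timestamps to show the difference.

```python
import pandas as pd

# 30 hourly timestamps from 2022-12-30 22:00 to 2023-01-01 03:00
demo = pd.DataFrame({
    'time': pd.date_range('2022-12-30 22:00', periods=30, freq=pd.Timedelta(hours=1))
})

# closed bound on a date-only string: stops at 2022-12-31 00:00
closed = demo[demo['time'] <= '2022-12-31']
# half-open bound: keeps every hour of 2022-12-31
half_open = demo[demo['time'] < '2023-01-01']

print(len(closed), len(half_open))  # 3 26
```

With the closed bound, any daily statistic for the last day would be computed from a single hour.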


#### Number of hours of sunshine data
The number of hours of sunshine in a day is the number of hours in which the direct irradiance from the sun is at least 120 Watts per square meter^. Thus, for each hour, we check whether `irad` is at least 120 and create a new column `sun` that is `1` if so and `0` otherwise.

In [14]:
# take an explicit copy of the filtered slice to avoid SettingWithCopyWarning
weather2 = weather2.copy()
# hour has sun if irad is at least 120
weather2['sun'] = (weather2['irad'] >= 120) * 1
weather2['irad']


0        0.0
1        0.0
2        0.0
3        0.0
4        0.0
        ... 
96404    0.0
96405    0.0
96406    0.0
96407    0.0
96408    0.0
Name: irad, Length: 96409, dtype: float64
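
The thresholding step can be illustrated on a few hypothetical values, including the boundary case of exactly 120.

```python
import pandas as pd

# hypothetical hourly direct irradiance values in W/m²
irad = pd.Series([0.0, 119.9, 120.0, 450.0])

# 1 if the hour counts as sunny (irad at least 120), else 0
sun = (irad >= 120).astype(int)
print(sun.tolist())  # [0, 0, 1, 1]
```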

Then, since we are only interested in statistics aggregated by epi week, we can first calculate the number of hours of sunshine per day, by adding up the number of `1`s in `sun` for each day.

In [15]:
# sum number of hours with sunshine for each day
weather2_daily = weather2.groupby(weather2['time'].dt.date)['sun'].sum().reset_index()
weather2_daily.rename({'sun': 'sun_hours'}, axis=1, inplace=True)
weather2_daily['time'] = pd.to_datetime(weather2_daily['time'])
weather2_daily

Unnamed: 0,time,sun_hours
0,2012-01-01,10
1,2012-01-02,10
2,2012-01-03,10
3,2012-01-04,10
4,2012-01-05,9
...,...,...
4013,2022-12-27,7
4014,2022-12-28,9
4015,2022-12-29,8
4016,2022-12-30,10


Similarly, for relative humidity and windspeed, we can calculate their corresponding daily means.

In [16]:
# get mean rh and windspeed for each day
daily_means = weather2.groupby(weather2['time'].dt.date)[['rh', 'windspeed']].mean()
daily_means = daily_means.reset_index()[['rh', 'windspeed']].round(decimals=1)
weather2_daily[['rh_mean', 'windspeed_mean']] = daily_means
weather2_daily

Unnamed: 0,time,sun_hours,rh_mean,windspeed_mean
0,2012-01-01,10,86.7,7.3
1,2012-01-02,10,87.9,13.5
2,2012-01-03,10,85.5,15.0
3,2012-01-04,10,81.9,15.7
4,2012-01-05,9,84.9,17.1
...,...,...,...,...
4013,2022-12-27,7,78.4,15.9
4014,2022-12-28,9,79.1,15.1
4015,2022-12-29,8,81.8,13.7
4016,2022-12-30,10,76.7,16.0
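
The daily sum and the daily means could also be computed in a single `groupby` using named aggregation. The following is a sketch on hypothetical hourly rows, not the notebook's own code.

```python
import pandas as pd

# hypothetical hourly rows covering two days
hourly = pd.DataFrame({
    'time': pd.to_datetime(['2012-01-01 10:00', '2012-01-01 11:00',
                            '2012-01-02 10:00', '2012-01-02 11:00']),
    'sun': [1, 1, 0, 1],
    'rh': [80, 90, 70, 75],
    'windspeed': [5.0, 7.0, 10.0, 12.0],
})

# group by calendar day and aggregate each column with its own function
daily = (hourly
         .groupby(hourly['time'].dt.normalize())
         .agg(sun_hours=('sun', 'sum'),
              rh_mean=('rh', 'mean'),
              windspeed_mean=('windspeed', 'mean'))
         .round(1)
         .reset_index())
print(daily)
```

Grouping on `dt.normalize()` keeps the key as a datetime, so no conversion back from Python `date` objects is needed before merging.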


### Merging the two together

Now, with data at the daily level for all the weather measures, we can combine them into a single dataframe.

In [17]:
# combine both dfs together
weather = weather1.merge(weather2_daily, on='time', how='inner')
weather.rename({'time': 'dt'}, axis=1, inplace=True)
weather

Unnamed: 0,dt,year,week,temp_max,temp_min,temp_mean,temp_app_max,temp_app_min,temp_app_mean,rain_mm,rain_hours,evapotranspiration,sun_hours,rh_mean,windspeed_mean
0,2012-01-01,2012,1,31.0,24.4,27.1,38.7,29.3,32.7,1.9,7.0,4.08,10,86.7,7.3
1,2012-01-02,2012,1,29.2,24.9,26.8,35.3,29.3,31.5,0.7,4.0,3.87,10,87.9,13.5
2,2012-01-03,2012,1,29.4,25.2,27.0,34.7,29.4,31.3,0.5,2.0,3.98,10,85.5,15.0
3,2012-01-04,2012,1,29.6,24.9,26.9,33.6,28.5,30.6,0.1,1.0,4.26,10,81.9,15.7
4,2012-01-05,2012,1,29.2,24.7,26.6,34.6,28.1,30.3,0.6,2.0,3.91,9,84.9,17.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4013,2022-12-27,2022,52,29.5,25.5,27.0,33.7,28.4,30.3,0.5,4.0,4.08,7,78.4,15.9
4014,2022-12-28,2022,52,28.4,25.5,26.7,31.7,28.7,30.0,0.9,6.0,3.99,9,79.1,15.1
4015,2022-12-29,2022,52,29.4,25.4,26.6,34.5,29.0,30.4,1.7,7.0,3.84,8,81.8,13.7
4016,2022-12-30,2022,52,30.5,25.1,27.1,33.8,28.3,30.1,1.6,7.0,4.51,10,76.7,16.0
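
An inner merge silently drops days that are missing from either table. One way to check for this, sketched here on hypothetical frames, is an outer merge with `indicator=True`, together with `validate` to confirm that each day appears only once on each side.

```python
import pandas as pd

# hypothetical daily frames; the right one is missing 2012-01-03
left = pd.DataFrame({'time': pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-03']),
                     'temp_mean': [27.1, 26.8, 27.0]})
right = pd.DataFrame({'time': pd.to_datetime(['2012-01-01', '2012-01-02']),
                      'rh_mean': [86.7, 87.9]})

# validate raises if 'time' is duplicated on either side;
# indicator adds a '_merge' column showing where each row came from
merged = left.merge(right, on='time', how='outer',
                    validate='one_to_one', indicator=True)
unmatched = merged[merged['_merge'] != 'both']
print(unmatched[['time', '_merge']])
```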


However, as mentioned, we need the data at the epi week level, so we aggregate by epi week. This is straightforward because the `year` and `week` columns already hold the epi year and week.

In [29]:
weather_weekly = weather.groupby(['year', 'week']).mean(numeric_only=True)
weather_weekly

Unnamed: 0_level_0,Unnamed: 1_level_0,temp_max,temp_min,temp_mean,temp_app_max,temp_app_min,temp_app_mean,rain_mm,rain_hours,evapotranspiration,sun_hours,rh_mean,windspeed_mean
year,week,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
2012,1,29.528571,24.728571,26.785714,34.714286,28.542857,30.857143,1.271429,3.571429,3.982857,9.714286,84.814286,14.957143
2012,2,28.014286,24.128571,25.642857,32.357143,27.571429,29.414286,6.114286,12.714286,2.622857,4.428571,88.157143,14.442857
2012,3,29.757143,24.700000,26.828571,36.985714,29.600000,32.371429,5.500000,6.714286,3.771429,8.857143,86.557143,6.171429
2012,4,29.557143,24.342857,26.371429,34.457143,27.842857,30.242857,4.985714,4.285714,3.855714,7.714286,84.400000,14.271429
2012,5,28.885714,24.614286,26.357143,34.385714,28.657143,30.914286,4.857143,11.285714,3.287143,6.714286,86.857143,11.428571
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022,48,30.214286,25.014286,26.842857,36.514286,29.771429,32.242857,14.557143,16.428571,3.401429,7.571429,86.714286,7.128571
2022,49,30.114286,24.985714,26.814286,36.271429,29.671429,32.114286,11.642857,16.714286,2.910000,5.571429,86.471429,7.185714
2022,50,28.542857,25.042857,26.328571,34.014286,29.628571,31.414286,15.171429,18.714286,2.588571,3.857143,88.114286,8.271429
2022,51,29.114286,24.842857,26.428571,34.285714,28.742857,30.828571,8.285714,14.285714,3.042857,6.142857,86.528571,12.314286


In [30]:
# also keep the total rainfall per epi week (rain_mm above is the mean daily rainfall)
weather_weekly['rain_sum'] = weather.groupby(['year', 'week'])['rain_mm'].sum()
weather_weekly

Unnamed: 0_level_0,Unnamed: 1_level_0,temp_max,temp_min,temp_mean,temp_app_max,temp_app_min,temp_app_mean,rain_mm,rain_hours,evapotranspiration,sun_hours,rh_mean,windspeed_mean,rain_sum
year,week,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
2012,1,29.528571,24.728571,26.785714,34.714286,28.542857,30.857143,1.271429,3.571429,3.982857,9.714286,84.814286,14.957143,8.9
2012,2,28.014286,24.128571,25.642857,32.357143,27.571429,29.414286,6.114286,12.714286,2.622857,4.428571,88.157143,14.442857,42.8
2012,3,29.757143,24.700000,26.828571,36.985714,29.600000,32.371429,5.500000,6.714286,3.771429,8.857143,86.557143,6.171429,38.5
2012,4,29.557143,24.342857,26.371429,34.457143,27.842857,30.242857,4.985714,4.285714,3.855714,7.714286,84.400000,14.271429,34.9
2012,5,28.885714,24.614286,26.357143,34.385714,28.657143,30.914286,4.857143,11.285714,3.287143,6.714286,86.857143,11.428571,34.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022,48,30.214286,25.014286,26.842857,36.514286,29.771429,32.242857,14.557143,16.428571,3.401429,7.571429,86.714286,7.128571,101.9
2022,49,30.114286,24.985714,26.814286,36.271429,29.671429,32.114286,11.642857,16.714286,2.910000,5.571429,86.471429,7.185714,81.5
2022,50,28.542857,25.042857,26.328571,34.014286,29.628571,31.414286,15.171429,18.714286,2.588571,3.857143,88.114286,8.271429,106.2
2022,51,29.114286,24.842857,26.428571,34.285714,28.742857,30.828571,8.285714,14.285714,3.042857,6.142857,86.528571,12.314286,58.0
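
The weekly means and the weekly rainfall total could also be produced in one pass with named aggregation, and a day count per week makes incomplete weeks easy to spot. This is a sketch on hypothetical daily rows; `n_days` and `rain_mean` are illustrative column names.

```python
import pandas as pd

# hypothetical daily rows for two epi weeks
daily = pd.DataFrame({
    'year': [2012] * 4,
    'week': [1, 1, 2, 2],
    'temp_mean': [27.0, 26.0, 25.0, 26.0],
    'rain_mm': [1.0, 3.0, 0.0, 5.0],
})

weekly = (daily
          .groupby(['year', 'week'])
          .agg(temp_mean=('temp_mean', 'mean'),
               rain_mean=('rain_mm', 'mean'),
               rain_sum=('rain_mm', 'sum'),
               n_days=('rain_mm', 'size'))   # number of daily rows in the week
          .reset_index())
print(weekly)
```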


In [31]:
weather_weekly = weather_weekly.reset_index()
weather_weekly

Unnamed: 0,year,week,temp_max,temp_min,temp_mean,temp_app_max,temp_app_min,temp_app_mean,rain_mm,rain_hours,evapotranspiration,sun_hours,rh_mean,windspeed_mean,rain_sum
0,2012,1,29.528571,24.728571,26.785714,34.714286,28.542857,30.857143,1.271429,3.571429,3.982857,9.714286,84.814286,14.957143,8.9
1,2012,2,28.014286,24.128571,25.642857,32.357143,27.571429,29.414286,6.114286,12.714286,2.622857,4.428571,88.157143,14.442857,42.8
2,2012,3,29.757143,24.700000,26.828571,36.985714,29.600000,32.371429,5.500000,6.714286,3.771429,8.857143,86.557143,6.171429,38.5
3,2012,4,29.557143,24.342857,26.371429,34.457143,27.842857,30.242857,4.985714,4.285714,3.855714,7.714286,84.400000,14.271429,34.9
4,2012,5,28.885714,24.614286,26.357143,34.385714,28.657143,30.914286,4.857143,11.285714,3.287143,6.714286,86.857143,11.428571,34.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
569,2022,48,30.214286,25.014286,26.842857,36.514286,29.771429,32.242857,14.557143,16.428571,3.401429,7.571429,86.714286,7.128571,101.9
570,2022,49,30.114286,24.985714,26.814286,36.271429,29.671429,32.114286,11.642857,16.714286,2.910000,5.571429,86.471429,7.185714,81.5
571,2022,50,28.542857,25.042857,26.328571,34.014286,29.628571,31.414286,15.171429,18.714286,2.588571,3.857143,88.114286,8.271429,106.2
572,2022,51,29.114286,24.842857,26.428571,34.285714,28.742857,30.828571,8.285714,14.285714,3.042857,6.142857,86.528571,12.314286,58.0


### Save the cleaned dataframe as a csv

Save the weekly data so that it can be read in easily during the analysis later on.

In [32]:
weather_weekly.to_csv('../../data/cleaned/weather_clean.csv', index=False)
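
A quick round trip, sketched here with an in-memory buffer and hypothetical values, shows what `index=False` does: the saved file contains only the data columns, so reading it back does not produce an extra unnamed index column.

```python
import io
import pandas as pd

# hypothetical weekly rows
demo = pd.DataFrame({'year': [2012, 2012], 'week': [1, 2], 'rain_sum': [8.9, 42.8]})

buffer = io.StringIO()
demo.to_csv(buffer, index=False)  # do not write the row index
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored.columns.tolist())  # ['year', 'week', 'rain_sum']
```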