## Identifying Heat Stress Periods Based on Milk Production

In this approach, we aim to identify periods of heat stress by analyzing the milk production data of dairy cows. Heat stress negatively impacts the health and productivity of cows, often resulting in a noticeable decline in milk yield.


In [10]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.optimize import curve_fit, OptimizeWarning
from tqdm import tqdm
import warnings

sns.set_theme()
sns.set_context("notebook")
%load_ext autoreload
%autoreload 2



In [11]:
dtype_dict = {
    'FarmName_Pseudo': 'str',
    'SE_Number': 'str',
    'AnimalNumber': 'Int64',          
    'StartDate': 'str',
    'StartTime': 'str',
    'DateTime': 'str',
    'LactationNumber': 'Int64',       
    'DaysInMilk': 'Int64', 
    'YearSeason': 'str',           
    'TotalYield': 'float',
    'BreedName': 'str',
    'Age': 'Int64',
    'Mother': 'str',
    'Father': 'str',
    'CullDecisionDate': 'str',
    'Temperature': 'float',
    'RelativeHumidity': 'float',      
    'THI_adj': 'float',
    'HW': 'Int64',                    
    'cum_HW': 'Int64',                
    'Temp15Threshold': 'Int64'        
}


# Load the CSV with specified dtypes
data = pd.read_csv('../Data/MergedData/CleanedYieldData.csv', dtype=dtype_dict)

# Convert date and time columns back to datetime and time objects
data['DateTime'] = pd.to_datetime(data['DateTime'], errors='coerce')
data['StartTime'] = pd.to_datetime(data['StartTime'], format='%H:%M:%S', errors='coerce').dt.time
data['StartDate'] = pd.to_datetime(data['StartDate'], errors='coerce')
data['CullDecisionDate'] = pd.to_datetime(data['CullDecisionDate'], errors='coerce')
data.head()

Unnamed: 0,FarmName_Pseudo,SE_Number,AnimalNumber,StartDate,StartTime,LactationNumber,DaysInMilk,TotalYield,DateTime,YearSeason,...,Mother,Father,CullDecisionDate,Temperature,RelativeHumidity,THI_adj,HW,cum_HW,Temp15Threshold,Age
0,a624fb9a,SE-064c0cec-1189,5189,2022-01-01,06:25:00,7,191,13.9,2022-01-01 06:25:00,2022-1,...,,,2022-12-20,-3.025,0.930917,28.012944,0,0,0,3095
1,a624fb9a,SE-064c0cec-1189,5189,2022-01-01,16:41:00,7,191,16.87,2022-01-01 16:41:00,2022-1,...,,,2022-12-20,-3.025,0.930917,28.012944,0,0,0,3095
2,a624fb9a,SE-064c0cec-1189,5189,2022-01-02,15:29:00,7,192,20.41,2022-01-02 15:29:00,2022-1,...,,,2022-12-20,-0.279167,0.990542,32.898193,0,0,0,3096
3,a624fb9a,SE-064c0cec-1189,5189,2022-01-02,03:31:00,7,192,16.28,2022-01-02 03:31:00,2022-1,...,,,2022-12-20,-0.279167,0.990542,32.898193,0,0,0,3096
4,a624fb9a,SE-064c0cec-1189,5189,2022-01-02,22:44:00,7,192,11.53,2022-01-02 22:44:00,2022-1,...,,,2022-12-20,-0.279167,0.990542,32.898193,0,0,0,3096


This approach works day by day, so we transform the data from one row per milking session to one row per cow per day.

In [12]:
# Calculate the DailyYield for each cow each day
data['DailyYield'] = data.groupby(['SE_Number', 'StartDate'])['TotalYield'].transform('sum')

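# Sort the data by AnimalNumber and StartDate
data.sort_values(['AnimalNumber', 'StartDate'], inplace=True)

# Previous row's yield for each cow. Rows are still milking sessions here,
# so the "previous" row can be an earlier session on the same day.
# For cows with a fitted curve, these columns are recomputed on daily rows
# in fit_woods_lactation_curve below.
data['PreviousDailyYield'] = data.groupby('AnimalNumber')['DailyYield'].shift(1)

# Calculate the daily yield change for each cow
data['DailyYieldChange'] = data['DailyYield'] - data['PreviousDailyYield']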

# Group and aggregate data
data = data.groupby(['SE_Number', 'FarmName_Pseudo', 'StartDate']).agg({
    'DailyYield': 'first',
    'PreviousDailyYield': 'first',
    'DailyYieldChange': 'first',
    'HW': 'max',
    'Temperature': 'mean',
    'THI_adj': 'mean',
    'DaysInMilk': 'first',
    'YearSeason': 'first',
    'cum_HW': 'max',
    'Temp15Threshold': 'max',
    'Age': 'first',
    'BreedName': 'first',
    'LactationNumber': 'first'
}).reset_index()

# Renaming and formatting
data.rename(columns={
    'Temperature': 'MeanTemperature',
    'THI_adj': 'MeanTHI_adj',
    'StartDate': 'Date'
}, inplace=True)
data['Date'] = pd.to_datetime(data['Date'])

# Display the first few rows of the transformed data
data.head()

Unnamed: 0,SE_Number,FarmName_Pseudo,Date,DailyYield,PreviousDailyYield,DailyYieldChange,HW,MeanTemperature,MeanTHI_adj,DaysInMilk,YearSeason,cum_HW,Temp15Threshold,Age,BreedName,LactationNumber
0,SE-064c0cec-1189,a624fb9a,2022-01-01,30.77,30.77,0.0,0,-3.025,28.012944,191,2022-1,0,0,3095,02 SLB,7
1,SE-064c0cec-1189,a624fb9a,2022-01-02,48.22,30.77,17.45,0,-0.279167,32.898193,192,2022-1,0,0,3096,02 SLB,7
2,SE-064c0cec-1189,a624fb9a,2022-01-03,30.53,48.22,-17.69,0,2.033333,36.760487,193,2022-1,0,0,3097,02 SLB,7
3,SE-064c0cec-1189,a624fb9a,2022-01-04,42.26,30.53,11.73,0,0.066667,31.939524,194,2022-1,0,0,3098,02 SLB,7
4,SE-064c0cec-1189,a624fb9a,2022-01-05,38.49,42.26,-3.77,0,-3.7,26.498206,195,2022-1,0,0,3099,02 SLB,7
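
In the cell above, `shift(1)` runs on milking-session rows before the aggregation, so the "previous" value can come from an earlier session on the same day (the first output row shows `PreviousDailyYield` equal to `DailyYield`). The following is a minimal sketch, on made-up toy data, of computing the change after collapsing to one row per cow per day. The frame `sessions` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two milking sessions per day for one cow
sessions = pd.DataFrame({
    "SE_Number": ["cow-1"] * 4,
    "StartDate": pd.to_datetime(
        ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"]
    ),
    "TotalYield": [14.0, 16.0, 15.0, 18.0],
})

# Collapse to one row per cow per day first
daily = (
    sessions.groupby(["SE_Number", "StartDate"], as_index=False)["TotalYield"]
    .sum()
    .rename(columns={"TotalYield": "DailyYield"})
    .sort_values(["SE_Number", "StartDate"])
)

# Now shift(1) within each cow refers to the previous recorded day
daily["PreviousDailyYield"] = daily.groupby("SE_Number")["DailyYield"].shift(1)
daily["DailyYieldChange"] = daily["DailyYield"] - daily["PreviousDailyYield"]
print(daily)
```

With this ordering the first day of each cow has no previous value (NaN) instead of a same-day figure. If days are missing from the records, "previous" still means the previous recorded day, not the previous calendar day.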


In [13]:
# Check whether DailyYield is centered around roughly the same value on each farm
print("Mean of DailyYield:", data.groupby('FarmName_Pseudo')['DailyYield'].mean())
print("Standard Deviation of DailyYield:", data.groupby('FarmName_Pseudo')['DailyYield'].std())

Mean of DailyYield: FarmName_Pseudo
5c06d92d    37.389675
752efd72    31.151716
a624fb9a    33.413694
f454e660    30.485127
Name: DailyYield, dtype: float64
Standard Deviation of DailyYield: FarmName_Pseudo
5c06d92d     9.960240
752efd72     7.799288
a624fb9a    11.050811
f454e660    11.833056
Name: DailyYield, dtype: float64


## Wood's Lactation Curve
$$
Y(t) = at^be^{-ct}
$$
- $Y(t)$ is the yield at time $t$ post-calving
- $a$ correlates with the initial production level post-calving
- $b$ governs the rate of increase in milk production
- $c$ dictates the rate of decline after peak production
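
Setting the derivative of $Y(t)$ to zero gives a peak at $t = b/c$ (for $b, c > 0$). The short check below compares that with the maximum on a daily grid. The parameter values are illustrative assumptions, not values fitted to this dataset.

```python
import numpy as np

def woods(t, a, b, c):
    """Wood's lactation curve Y(t) = a * t**b * exp(-c * t)."""
    t = np.asarray(t, dtype=float)
    return a * t**b * np.exp(-c * t)

# Illustrative parameters (assumed, not taken from the fitted data)
a, b, c = 20.0, 0.2, 0.004

days = np.arange(1, 306)           # days in milk 1..305
curve = woods(days, a, b, c)

peak_day_numeric = days[np.argmax(curve)]   # maximum on the daily grid
peak_day_analytic = b / c                   # from dY/dt = 0

print(peak_day_numeric, peak_day_analytic)
```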


Normalize the dataset using Wood's lactation curve and set thresholds for outliers with unreasonable values.
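
The notebook fits the curve with `scipy.optimize.curve_fit`. As a different and simpler technique, useful for a sanity check or for initial guesses, taking logarithms makes the model linear in its parameters: $\ln Y = \ln a + b \ln t - ct$. The sketch below applies ordinary least squares to synthetic, noise-free data and then normalizes the yield by the fitted curve. The parameter values are assumptions, and with noisy data the log transform weights errors differently from a direct fit.

```python
import numpy as np

# Synthetic, noise-free lactation (assumed parameters, for illustration only)
true_a, true_b, true_c = 20.0, 0.2, 0.004
dim = np.arange(1, 306, dtype=float)          # days in milk, must be > 0
observed = true_a * dim**true_b * np.exp(-true_c * dim)

# ln Y = ln(a) + b * ln(t) - c * t  ->  ordinary least squares
X = np.column_stack([np.ones_like(dim), np.log(dim), -dim])
coef, *_ = np.linalg.lstsq(X, np.log(observed), rcond=None)
a_hat, b_hat, c_hat = np.exp(coef[0]), coef[1], coef[2]

# Normalized yield: observed divided by expected
expected = a_hat * dim**b_hat * np.exp(-c_hat * dim)
normalized = observed / expected
print(a_hat, b_hat, c_hat, normalized.mean())
```

A normalized value near 1 means the cow produces what its own curve predicts for that day in milk.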

In [14]:
# Function to plot the DailyYield and ExpectedYield
def plot_daily_vs_expected(data, farm_id):
    farm_data = data[data['FarmName_Pseudo'] == farm_id]
    unique_animals = farm_data['SE_Number'].unique()

    for animal in unique_animals:
        animal_data = farm_data[farm_data['SE_Number'] == animal]
        plt.figure(figsize=(10, 6))
        plt.plot(animal_data['Date'], animal_data['DailyYield'], label='Daily Yield')
        plt.plot(animal_data['Date'], animal_data['ExpectedYield'], label='Expected Yield', linestyle='--')
        plt.title(f'Daily Yield vs Expected Yield for {animal}')
        plt.xlabel('Date')
        plt.ylabel('Yield')
        plt.legend()
        plt.show()

# Define the Wood's Lactation Curve function
def woods_lactation_curve(dim, a, b, c):
    dim = np.array(dim, dtype=float)
    return a * dim**b * np.exp(-c * dim)

# Function to fit the Wood's Lactation Curve to the dataset
def fit_woods_lactation_curve(dataset):
    # Initialize the 'ExpectedYield' column to NaN
    dataset['ExpectedYield'] = np.nan
    
    # Group the dataset by 'SE_Number' and fit the curve for each cow
    for animal_number, group in tqdm(dataset.groupby('SE_Number'), unit=" Cows"):
        # Prepare the data for fitting
        x_data = group['DaysInMilk'].values
        y_data = group['DailyYield'].values
        
        # Ensure there are no NaN or infinite values in the data
        if not np.isfinite(x_data).all() or not np.isfinite(y_data).all():
            print(f"Non-finite values found for cow {animal_number}, skipping.")
            continue
        
        # Ensure there are enough data points to fit the curve
        if len(x_data) < 50 or len(y_data) < 50:
            print(f"Insufficient data points for cow {animal_number}, skipping.")
            continue
        
        # Fit the model
        try:
            # Initial parameter guesses
            initial_guesses = [max(y_data), 0.4, 0.0001]
            # Bounds on the parameters to prevent overflow
            bounds = ([0, 0, 0], [np.inf, 1, 0.1])
            
            with warnings.catch_warnings():
                warnings.filterwarnings('error', category=OptimizeWarning)
                try:
                    popt, pcov = curve_fit(
                        woods_lactation_curve, x_data, y_data,
                        p0=initial_guesses, bounds=bounds, maxfev=10000
                    )
                    
                    # Predict the expected yield using the fitted model
                    group['ExpectedYield'] = woods_lactation_curve(group['DaysInMilk'], *popt)
                    
                    # Normalize the DailyYield
                    group['NormalizedDailyYield'] = group['DailyYield'] / group['ExpectedYield']
                    
                    # Calculate the daily yield change and normalize it
                    group['PreviousDailyYield'] = group['DailyYield'].shift(1)
                    group['DailyYieldChange'] = group['DailyYield'] - group['PreviousDailyYield']
                    group['NormalizedDailyYieldChange'] = group['DailyYieldChange'] / group['ExpectedYield']
                    
                    # Update the dataset with the fitted data
                    dataset.loc[group.index, 'ExpectedYield'] = group['ExpectedYield']
                    dataset.loc[group.index, 'NormalizedDailyYield'] = group['NormalizedDailyYield']
                    dataset.loc[group.index, 'PreviousDailyYield'] = group['PreviousDailyYield']
                    dataset.loc[group.index, 'DailyYieldChange'] = group['DailyYieldChange']
                    dataset.loc[group.index, 'NormalizedDailyYieldChange'] = group['NormalizedDailyYieldChange']
                
                except OptimizeWarning:
                    print(f"OptimizeWarning for cow {animal_number}, skipping.")
            
        except RuntimeError as e:
            print(f"Curve fit failed for cow {animal_number}: {e}")
        except ValueError as e:
            print(f"Value error for cow {animal_number}: {e}")
    
    # Fill any NaN values in the newly created columns with 0
    dataset['ExpectedYield'] = dataset['ExpectedYield'].fillna(0)
    dataset['NormalizedDailyYield'] = dataset['NormalizedDailyYield'].fillna(0)
    dataset['PreviousDailyYield'] = dataset['PreviousDailyYield'].fillna(0)
    dataset['DailyYieldChange'] = dataset['DailyYieldChange'].fillna(0)
    dataset['NormalizedDailyYieldChange'] = dataset['NormalizedDailyYieldChange'].fillna(0)
    
    return dataset

# Apply the curve fitting function to your dataset
data = fit_woods_lactation_curve(data)

  1%|          | 11/1277 [00:00<00:11, 107.19 Cows/s]

Insufficient data points for cow SE-5c06d92d-2327, skipping.
Insufficient data points for cow SE-5c06d92d-2328, skipping.


  4%|▎         | 45/1277 [00:00<00:07, 161.61 Cows/s]

Insufficient data points for cow SE-5c06d92d-2502, skipping.
Insufficient data points for cow SE-5c06d92d-2545, skipping.


 12%|█▏        | 153/1277 [00:00<00:05, 194.26 Cows/s]

Insufficient data points for cow SE-5c06d92d-2815, skipping.
Insufficient data points for cow SE-5c06d92d-2824, skipping.
Insufficient data points for cow SE-5c06d92d-2845, skipping.
Insufficient data points for cow SE-5c06d92d-2885, skipping.


 27%|██▋       | 346/1277 [00:01<00:03, 250.79 Cows/s]

Insufficient data points for cow SE-5c06d92d-3197, skipping.


 40%|███▉      | 507/1277 [00:02<00:02, 287.55 Cows/s]

Insufficient data points for cow SE-5c06d92d-3471, skipping.
Insufficient data points for cow SE-5c06d92d-3490, skipping.
Insufficient data points for cow SE-5c06d92d-3492, skipping.
Insufficient data points for cow SE-5c06d92d-3494, skipping.
Insufficient data points for cow SE-5c06d92d-3499, skipping.
Insufficient data points for cow SE-5c06d92d-3500, skipping.
Insufficient data points for cow SE-5c06d92d-3512, skipping.
Insufficient data points for cow SE-5c06d92d-3514, skipping.
Insufficient data points for cow SE-5c06d92d-3515, skipping.
Insufficient data points for cow SE-5c06d92d-3516, skipping.
Insufficient data points for cow SE-5c06d92d-3522, skipping.
Insufficient data points for cow SE-5c06d92d-3524, skipping.
Insufficient data points for cow SE-5c06d92d-3526, skipping.
Insufficient data points for cow SE-5c06d92d-3528, skipping.
Insufficient data points for cow SE-5c06d92d-3530, skipping.
Insufficient data points for cow SE-5c06d92d-3536, skipping.
Insufficient data points

 48%|████▊     | 619/1277 [00:02<00:02, 248.04 Cows/s]

Insufficient data points for cow SE-752efd72-0173, skipping.
Insufficient data points for cow SE-752efd72-0198, skipping.


 51%|█████     | 645/1277 [00:02<00:02, 212.27 Cows/s]

Insufficient data points for cow SE-752efd72-0257, skipping.
Insufficient data points for cow SE-752efd72-0298, skipping.
Insufficient data points for cow SE-752efd72-0303, skipping.


 58%|█████▊    | 740/1277 [00:03<00:01, 277.13 Cows/s]

Insufficient data points for cow SE-752efd72-0316, skipping.
Insufficient data points for cow SE-752efd72-0317, skipping.
Insufficient data points for cow SE-752efd72-0320, skipping.
Insufficient data points for cow SE-752efd72-0329, skipping.
Insufficient data points for cow SE-752efd72-0339, skipping.
Insufficient data points for cow SE-752efd72-0351, skipping.
Insufficient data points for cow SE-752efd72-0354, skipping.
Insufficient data points for cow SE-752efd72-0369, skipping.


 66%|██████▌   | 842/1277 [00:03<00:01, 315.71 Cows/s]

Insufficient data points for cow SE-752efd72-0452, skipping.
Insufficient data points for cow SE-752efd72-0453, skipping.
Insufficient data points for cow SE-752efd72-0454, skipping.
Insufficient data points for cow SE-752efd72-0484, skipping.
Insufficient data points for cow SE-752efd72-0491, skipping.
Insufficient data points for cow SE-752efd72-0497, skipping.
Insufficient data points for cow SE-752efd72-0498, skipping.
Insufficient data points for cow SE-752efd72-0506, skipping.
Insufficient data points for cow SE-752efd72-0508, skipping.
Insufficient data points for cow SE-752efd72-0510, skipping.
Insufficient data points for cow SE-752efd72-0516, skipping.
Insufficient data points for cow SE-752efd72-0526, skipping.
Insufficient data points for cow SE-752efd72-0527, skipping.
Insufficient data points for cow SE-752efd72-0531, skipping.
Insufficient data points for cow SE-752efd72-0533, skipping.
Insufficient data points for cow SE-752efd72-0534, skipping.
Insufficient data points

 71%|███████▏  | 910/1277 [00:03<00:01, 230.76 Cows/s]

Insufficient data points for cow SE-752efd72-2705, skipping.
Insufficient data points for cow SE-752efd72-2729, skipping.


 73%|███████▎  | 938/1277 [00:04<00:01, 208.51 Cows/s]

Insufficient data points for cow SE-7fd04cd3-679, skipping.
Insufficient data points for cow SE-a624fb9a-1232, skipping.
Insufficient data points for cow SE-a624fb9a-1249, skipping.
Insufficient data points for cow SE-a624fb9a-1285, skipping.
Insufficient data points for cow SE-a624fb9a-1296, skipping.


 78%|███████▊  | 1002/1277 [00:04<00:01, 251.44 Cows/s]

Insufficient data points for cow SE-a624fb9a-1320, skipping.
Insufficient data points for cow SE-a624fb9a-1368, skipping.


 95%|█████████▍| 1213/1277 [00:05<00:00, 250.42 Cows/s]

Insufficient data points for cow SE-f454e660-0779, skipping.
Insufficient data points for cow SE-f454e660-0800, skipping.
Insufficient data points for cow SE-f454e660-0829, skipping.


100%|██████████| 1277/1277 [00:05<00:00, 230.13 Cows/s]

Insufficient data points for cow SE-f454e660-665, skipping.
Insufficient data points for cow SE-f454e660-729, skipping.





In [15]:
# Check if NormalizedDailyYield is centered around 1 for each unique farm
print("Mean of NormalizedDailyYield:", data.groupby('FarmName_Pseudo')['NormalizedDailyYield'].mean())
print("Standard Deviation of NormalizedDailyYield:", data.groupby('FarmName_Pseudo')['NormalizedDailyYield'].std())

Mean of NormalizedDailyYield: FarmName_Pseudo
5c06d92d    0.995988
752efd72    0.993615
a624fb9a    0.996941
f454e660    0.998148
Name: NormalizedDailyYield, dtype: float64
Standard Deviation of NormalizedDailyYield: FarmName_Pseudo
5c06d92d    0.183936
752efd72    0.170016
a624fb9a    0.222119
f454e660    0.252622
Name: NormalizedDailyYield, dtype: float64
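
The fitting function fills `NormalizedDailyYield` with 0 for cows whose curve could not be fitted. Those rows pull the per-farm mean below 1, and they also count as "below threshold" in the heat stress step. The following is a sketch, on hypothetical toy values, of treating such rows as missing before summarizing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: cow "c" had no curve fitted, so its columns were filled with 0
toy = pd.DataFrame({
    "FarmName_Pseudo": ["f1", "f1", "f1"],
    "SE_Number": ["a", "b", "c"],
    "ExpectedYield": [30.0, 25.0, 0.0],
    "NormalizedDailyYield": [1.1, 0.9, 0.0],
})

# Treat "no fitted curve" as missing rather than as a yield of zero
fitted = toy["ExpectedYield"] > 0
toy["NormalizedDailyYield"] = toy["NormalizedDailyYield"].where(fitted, np.nan)

# mean() skips NaN by default, so only fitted cows contribute
farm_mean = toy.groupby("FarmName_Pseudo")["NormalizedDailyYield"].mean()
print(farm_mean)
```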


In [16]:
# Define the threshold for NormalizedDailyYield
yield_threshold = 0.975

# Flag a farm-day as heat stress when more than half of the cows fall below
# the yield threshold and Temp15Threshold is set for that day
def identify_heat_stress(group, yield_threshold):
    num_cows = len(group)
    num_cows_below_threshold = (group['NormalizedDailyYield'] < yield_threshold).sum()
    majority_below_threshold = num_cows_below_threshold > (num_cows / 2)
    
    # Define condition for heat stress based on Temp15Threshold
    temp_condition = group['Temp15Threshold'].any() == 1
    
    heat_stress = int(majority_below_threshold & temp_condition)
    return pd.Series({'HeatStress': heat_stress})

# Group data by FarmName_Pseudo and Date, then apply the function with include_groups=False
heat_stress_results = data.groupby(['FarmName_Pseudo', 'Date'], group_keys=False, as_index=False).apply(
    identify_heat_stress, yield_threshold=yield_threshold, include_groups=False
).reset_index(drop=True)

# Drop any existing HeatStress column so the merge does not create duplicate columns
if 'HeatStress' in data.columns:
    data.drop(columns=['HeatStress'], inplace=True)

# Merge the results back to the main dataset
data = pd.merge(data, heat_stress_results, on=['FarmName_Pseudo', 'Date'], how='left')
data.head()

Unnamed: 0,SE_Number,FarmName_Pseudo,Date,DailyYield,PreviousDailyYield,DailyYieldChange,HW,MeanTemperature,MeanTHI_adj,DaysInMilk,YearSeason,cum_HW,Temp15Threshold,Age,BreedName,LactationNumber,ExpectedYield,NormalizedDailyYield,NormalizedDailyYieldChange,HeatStress
0,SE-064c0cec-1189,a624fb9a,2022-01-01,30.77,0.0,0.0,0,-3.025,28.012944,191,2022-1,0,0,3095,02 SLB,7,29.739372,1.034655,0.0,0
1,SE-064c0cec-1189,a624fb9a,2022-01-02,48.22,30.77,17.45,0,-0.279167,32.898193,192,2022-1,0,0,3096,02 SLB,7,29.692059,1.624003,0.587699,0
2,SE-064c0cec-1189,a624fb9a,2022-01-03,30.53,48.22,-17.69,0,2.033333,36.760487,193,2022-1,0,0,3097,02 SLB,7,29.644756,1.029862,-0.596733,0
3,SE-064c0cec-1189,a624fb9a,2022-01-04,42.26,30.53,11.73,0,0.066667,31.939524,194,2022-1,0,0,3098,02 SLB,7,29.597463,1.427825,0.396318,0
4,SE-064c0cec-1189,a624fb9a,2022-01-05,38.49,42.26,-3.77,0,-3.7,26.498206,195,2022-1,0,0,3099,02 SLB,7,29.550181,1.30253,-0.12758,0
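
The majority rule used above can also be written without `apply`. The sketch below, on hypothetical toy data, flags a farm-day when the share of cows below the threshold exceeds one half and the day is marked by `Temp15Threshold`. The column names `BelowThreshold`, `share_below` and `warm_day` are introduced here for illustration only.

```python
import pandas as pd

yield_threshold = 0.975  # same value as in the cell above

# Hypothetical toy data: one farm, two days, three cows per day
toy = pd.DataFrame({
    "FarmName_Pseudo": ["f1"] * 6,
    "Date": pd.to_datetime(["2022-07-06"] * 3 + ["2022-07-07"] * 3),
    "NormalizedDailyYield": [0.90, 0.95, 1.05, 0.90, 1.00, 1.05],
    "Temp15Threshold": [1, 1, 1, 1, 1, 1],
})

toy["BelowThreshold"] = toy["NormalizedDailyYield"] < yield_threshold

per_day = toy.groupby(["FarmName_Pseudo", "Date"]).agg(
    share_below=("BelowThreshold", "mean"),   # fraction of cows below the threshold
    warm_day=("Temp15Threshold", "max"),      # 1 if any row is flagged as warm
).reset_index()

# Heat stress: strict majority of cows below threshold on a warm day
per_day["HeatStress"] = (
    (per_day["share_below"] > 0.5) & (per_day["warm_day"] == 1)
).astype(int)
print(per_day)
```

In this toy example two of three cows are below the threshold on the first day and only one of three on the second, so only the first day is flagged.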


In [17]:
# Filter the data for rows where HeatStress is 1
heat_stress_days = data[data['HeatStress'] == 1].copy()

# Convert 'Date' column to date only
heat_stress_days['Date'] = pd.to_datetime(heat_stress_days['Date']).dt.date

# Group by FarmName_Pseudo and list the heat stress days
heat_stress_summary = heat_stress_days.groupby('FarmName_Pseudo')['Date'].apply(list).reset_index()

# Print the heat stress days for each farm
for index, row in heat_stress_summary.iterrows():
    print(f"Farm: {row['FarmName_Pseudo']}")
    print(f"Heat Stress Days: {', '.join(map(str, row['Date']))}")
    print("")


Farm: 5c06d92d
Heat Stress Days: 2022-05-05, 2022-05-06, 2022-05-15, 2022-06-27, 2022-07-06, 2022-07-07, 2022-07-29, 2022-07-30, 2022-08-15, 2022-08-21, 2022-08-22, 2022-09-13, 2022-09-18, 2022-05-05, 2022-05-06, 2022-05-15, 2022-06-27, 2022-07-06, 2022-07-07, 2022-07-29, 2022-07-30, 2022-08-15, 2022-08-21, 2022-08-22, 2022-09-13, 2022-09-18, 2022-09-24, 2022-10-01, 2022-05-05, 2022-05-06, 2022-05-15, 2022-09-13, 2022-09-18, 2022-09-24, 2022-10-01, 2023-07-11, 2023-07-27, 2023-07-28, 2023-08-08, 2023-08-10, 2023-08-20, 2022-05-05, 2022-05-06, 2022-05-15, 2022-06-27, 2022-07-06, 2022-07-07, 2022-07-29, 2022-07-30, 2023-07-11, 2023-07-27, 2023-07-28, 2023-08-08, 2023-08-10, 2023-08-20, 2022-05-05, 2022-05-06, 2022-05-15, 2022-06-27, 2022-07-06, 2022-07-07, 2022-07-29, 2022-07-30, 2022-08-15, 2022-08-21, 2022-08-22, 2022-09-18, 2022-05-05, 2022-05-06, 2022-05-15, 2022-06-27, 2022-07-06, 2022-07-07, 2022-05-05, 2022-05-06, 2022-05-15, 2022-10-01, 2023-07-11, 2023-07-27, 2023-07-28, 2023-08
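
Because `data` holds one row per cow per day, each heat stress date appears once per cow in the list above. The following is a minimal sketch, on a hypothetical toy frame, of listing every date once per farm in chronological order.

```python
import pandas as pd

# Hypothetical toy data: two cows on the same farm share the same flagged dates
heat_stress_days = pd.DataFrame({
    "FarmName_Pseudo": ["f1", "f1", "f1", "f1"],
    "SE_Number": ["a", "b", "a", "b"],
    "Date": pd.to_datetime(
        ["2022-07-07", "2022-07-07", "2022-07-06", "2022-07-06"]
    ),
})

# Keep one row per farm and date, then sort chronologically
unique_days = (
    heat_stress_days[["FarmName_Pseudo", "Date"]]
    .drop_duplicates()
    .sort_values(["FarmName_Pseudo", "Date"])
)

summary = unique_days.groupby("FarmName_Pseudo")["Date"].apply(
    lambda s: [d.date().isoformat() for d in s]
)
for farm, days in summary.items():
    print(f"Farm: {farm}")
    print(f"Heat Stress Days: {', '.join(days)}")
```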

In [18]:
# Reorder columns
new_order = [
    "Date", "FarmName_Pseudo", "SE_Number", "Age", "BreedName", "LactationNumber", "DaysInMilk",'YearSeason', "DailyYield", "PreviousDailyYield", 
    "DailyYieldChange", "ExpectedYield", "NormalizedDailyYield", 
    "NormalizedDailyYieldChange", "HeatStress", "Temp15Threshold", "HW", 
    "cum_HW", "MeanTemperature", "MeanTHI_adj"
]
data = data[new_order]
data.head()

Unnamed: 0,Date,FarmName_Pseudo,SE_Number,Age,BreedName,LactationNumber,DaysInMilk,YearSeason,DailyYield,PreviousDailyYield,DailyYieldChange,ExpectedYield,NormalizedDailyYield,NormalizedDailyYieldChange,HeatStress,Temp15Threshold,HW,cum_HW,MeanTemperature,MeanTHI_adj
0,2022-01-01,a624fb9a,SE-064c0cec-1189,3095,02 SLB,7,191,2022-1,30.77,0.0,0.0,29.739372,1.034655,0.0,0,0,0,0,-3.025,28.012944
1,2022-01-02,a624fb9a,SE-064c0cec-1189,3096,02 SLB,7,192,2022-1,48.22,30.77,17.45,29.692059,1.624003,0.587699,0,0,0,0,-0.279167,32.898193
2,2022-01-03,a624fb9a,SE-064c0cec-1189,3097,02 SLB,7,193,2022-1,30.53,48.22,-17.69,29.644756,1.029862,-0.596733,0,0,0,0,2.033333,36.760487
3,2022-01-04,a624fb9a,SE-064c0cec-1189,3098,02 SLB,7,194,2022-1,42.26,30.53,11.73,29.597463,1.427825,0.396318,0,0,0,0,0.066667,31.939524
4,2022-01-05,a624fb9a,SE-064c0cec-1189,3099,02 SLB,7,195,2022-1,38.49,42.26,-3.77,29.550181,1.30253,-0.12758,0,0,0,0,-3.7,26.498206


In [19]:
# Save the reordered DataFrame to a CSV file
data.to_csv('../Data/MergedData/MilkApproachYieldData.csv', index=False)
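
When the saved file is read back, the nullable integer columns and the date column have to be declared again, as was done for the input file at the top of the notebook. The sketch below uses an in-memory CSV, with the header and first row copied from the output above, as a stand-in for the real file.

```python
import io
import pandas as pd

# Stand-in for '../Data/MergedData/MilkApproachYieldData.csv':
# header and first data row as shown in the output above
csv_text = (
    "Date,FarmName_Pseudo,SE_Number,Age,BreedName,LactationNumber,DaysInMilk,"
    "YearSeason,DailyYield,PreviousDailyYield,DailyYieldChange,ExpectedYield,"
    "NormalizedDailyYield,NormalizedDailyYieldChange,HeatStress,Temp15Threshold,"
    "HW,cum_HW,MeanTemperature,MeanTHI_adj\n"
    "2022-01-01,a624fb9a,SE-064c0cec-1189,3095,02 SLB,7,191,2022-1,30.77,0.0,0.0,"
    "29.739372,1.034655,0.0,0,0,0,0,-3.025,28.012944\n"
)

int_cols = ["Age", "LactationNumber", "DaysInMilk", "HeatStress",
            "Temp15Threshold", "HW", "cum_HW"]
dtypes = {col: "Int64" for col in int_cols}
dtypes.update({
    "FarmName_Pseudo": "str",
    "SE_Number": "str",
    "BreedName": "str",
    "YearSeason": "str",
})

reloaded = pd.read_csv(io.StringIO(csv_text), dtype=dtypes, parse_dates=["Date"])
print(reloaded.dtypes)
```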

### Variables Explanation for `MilkApproachYieldData.csv`

1. **Date**:
   - Description: The date when the milk yield was recorded.
   - Datatype: `datetime`
   - Format: `YYYY-MM-DD`
   - Example: `2022-01-01`

2. **FarmName_Pseudo**:
   - Description: A pseudo-identifier for the farm where the data was collected.
   - Datatype: `str`
   - Example: `a624fb9a`

3. **SE_Number**:
   - Description: A unique identifier for the cow, which has been formatted to include the farm and the animal number.
   - Datatype: `str`
   - Example: `SE-064c0cec-1189`

4. **Age**:
   - Description: The age of the cow in days.
   - Datatype: `Int64`
   - Example: `3095`

5. **LactationNumber**:
   - Description: The number assigned to the cow's lactation cycle.
   - Datatype: `Int64`
   - Example: `7`

6. **DaysInMilk**:
   - Description: The number of days the cow has been in milk (lactating) at the time of recording.
   - Datatype: `Int64`
   - Example: `191`

7. **YearSeason**:
   - Description: The seasonal period based on the year and the month range.
   - Datatype: `str`
   - Example: `2022-1`
   - YearSeason parameters in yield datasets:
     - 1: Dec-Feb
     - 2: Mar-May
     - 3: Jun-Aug
     - 4: Sep-Nov

8. **DailyYield**:
   - Description: The total amount of milk produced by the cow in a single day.
   - Datatype: `float`
   - Example: `30.77`

9. **PreviousDailyYield**:
   - Description: The total amount of milk produced by the cow on the previous day; `0.0` when there is no previous record.
   - Datatype: `float`
   - Example: `0.0`

10. **DailyYieldChange**:
    - Description: The change in daily milk yield from the previous day.
    - Datatype: `float`
    - Example: `0.0`

11. **ExpectedYield**:
    - Description: The yield predicted for that day by the Wood's lactation curve fitted to the cow; `0.0` if no curve was fitted.
    - Datatype: `float`
    - Example: `29.73937171388362`

12. **NormalizedDailyYield**:
    - Description: `DailyYield` divided by `ExpectedYield`; values near 1 mean the cow produces what its fitted curve predicts. `0.0` if no curve was fitted.
    - Datatype: `float`
    - Example: `1.0346553483386214`

13. **NormalizedDailyYieldChange**:
    - Description: `DailyYieldChange` divided by `ExpectedYield`.
    - Datatype: `float`
    - Example: `0.0`

14. **HeatStress**:
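    - Description: A binary variable set per farm and day: 1 if more than half of the farm's cows have a `NormalizedDailyYield` below 0.975 on a day where `Temp15Threshold` is set, otherwise 0.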
    - Datatype: `Int64`
    - Example: `0`

15. **Temp15Threshold**:
    - Description: A binary variable indicating if the temperature exceeded 15 degrees Celsius on the given day.
    - Datatype: `Int64`
    - Example: `0`

16. **HW**:
    - Description: A binary variable indicating the presence of a heatwave on the day.
    - Datatype: `Int64`
    - Example: `0`

17. **cum_HW**:
    - Description: Cumulative number of heatwave days up to the current date.
    - Datatype: `Int64`
    - Example: `0`

18. **MeanTemperature**:
    - Description: The mean temperature recorded on the day.
    - Datatype: `float`
    - Example: `-3.025`

19. **MeanTHI_adj**:
    - Description: The mean adjusted Temperature-Humidity Index for the day.
    - Datatype: `float`
    - Example: `28.012944166666667`

20. **BreedName**:
    - Description: The breed of the cow.
    - Datatype: `str`
    - Example: `02 SLB`