# Spandan's Data Analysis NB

To trade cocoa using satellite-derived agroclimatological data, the process is divided into the following sections:
1. Predicting Cocoa Yield Using Agroclimatological Data
2. Predicting Cocoa Prices Based on Predicted Yield
3. Trading Cocoa Futures Based on Predicted Cocoa Prices

## Predicting Cocoa Yield Using Agroclimatological Data

This section uses agroclimatological data from the NASA POWER Data Access Viewer (DAV) tool to predict cocoa yield with a neural network. The NASA agroclimatological data will be partitioned by cocoa harvest cycle.
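The partitioning by harvest cycle could be sketched roughly as below. The crop calendar is an assumption, not something stated in this notebook: the main crop is taken as October through March and the mid crop as April through September. The helper name `label_harvest_cycle` and the `Harvest_Cycle` column are hypothetical, and the toy dates are for illustration only.

```python
import pandas as pd

def label_harvest_cycle(dates: pd.Series) -> pd.Series:
    """Label each date with an assumed cocoa harvest cycle.

    Assumed calendar (not taken from this notebook):
    main crop = October through March, mid crop = April through September.
    """
    month = dates.dt.month
    # The main crop spans two calendar years, so January-March are assigned
    # to the season that started in the previous October.
    season_year = dates.dt.year.where(month >= 4, dates.dt.year - 1)
    # Start from "mid" everywhere, then overwrite non April-September months.
    crop = pd.Series("mid", index=dates.index).where(month.between(4, 9), "main")
    return season_year.astype(str) + "-" + crop

# Toy dates for illustration only.
toy = pd.DataFrame(
    {"DATE": pd.to_datetime(["2023-03-01", "2023-04-15", "2023-10-01", "2024-02-10"])}
)
toy["Harvest_Cycle"] = label_harvest_cycle(toy["DATE"])
print(toy)
```

Once a label like this exists, the daily records can be grouped per station and per cycle before being passed to a model. If the actual crop calendar differs, only the month boundaries need to change.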

### Agroclimatological Data Processing

#### Import Required Packages

In [12]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns

In [14]:
df = pd.read_csv('../Raw_DB/NASA_Agroclimatological_Data.csv', on_bad_lines='skip')
print(df.shape)
df.head()

(46848, 12)


Unnamed: 0,LAT,LON,YEAR,DOY,GWETPROF,T2M,T2M_MAX,T2M_MIN,TS,RH2M,GWETROOT,PRECTOTCORR
0,4.75,-8.25,2023,60,0.73,26.97,28.95,25.33,27.32,83.49,0.74,0.98
1,4.75,-7.75,2023,60,0.69,26.72,29.31,24.56,26.82,84.77,0.68,0.87
2,4.75,-7.25,2023,60,0.68,26.67,29.38,24.41,26.69,85.31,0.66,0.94
3,4.75,-6.75,2023,60,0.7,26.88,29.09,24.94,26.96,84.96,0.67,1.29
4,4.75,-6.25,2023,60,0.64,27.43,28.94,26.01,27.72,83.81,0.62,1.35


#### Data Cleaning & Wrangling

In [15]:
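# Combine YEAR and DOY (day of year) into a single DateTime column.
# DOY is zero-padded to three digits so it always matches the %j directive.
df['DATE'] = pd.to_datetime(df['YEAR'].astype(str) + df['DOY'].astype(str).str.zfill(3), format='%Y%j')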

# Reorder the columns to make DATE the first column and drop YEAR and DOY
df = df[['DATE'] + [col for col in df.columns if col not in ['YEAR', 'DOY', 'DATE']]]

# Rename cols
df.rename(columns={
    'TS': 'Earth_Skin_Temp',
    'RH2M': 'Rel_Humidity',
    'GWETROOT': 'Root_Soil_Wetness',
    'PRECTOTCORR': 'Precip_Corrected',
    'T2M_MIN': 'Temp_Min',
    'T2M_MAX': 'Temp_Max',
    'T2M': 'Temp_Avg',
    'GWETPROF': 'Soil_Moisture'
}, inplace=True)

# Drop rows where any cell has a value of -999
# -999 indicates NULL for that cell, as per NASA Power DAV Tool
df = df[(df != -999).all(axis=1)]

# Create a new df where we assign stations to each LAT, LON
station_df = df.copy()
station_df['Station_ID'] = station_df.groupby(['LAT', 'LON']).ngroup()
column_order = ['Station_ID'] + [col for col in station_df.columns if col != 'Station_ID']
station_df = station_df[column_order]

print(station_df.shape)
station_df.head()

(43920, 12)


Unnamed: 0,Station_ID,DATE,LAT,LON,Soil_Moisture,Temp_Avg,Temp_Max,Temp_Min,Earth_Skin_Temp,Rel_Humidity,Root_Soil_Wetness,Precip_Corrected
0,0,2023-03-01,4.75,-8.25,0.73,26.97,28.95,25.33,27.32,83.49,0.74,0.98
1,1,2023-03-01,4.75,-7.75,0.69,26.72,29.31,24.56,26.82,84.77,0.68,0.87
2,2,2023-03-01,4.75,-7.25,0.68,26.67,29.38,24.41,26.69,85.31,0.66,0.94
3,3,2023-03-01,4.75,-6.75,0.7,26.88,29.09,24.94,26.96,84.96,0.67,1.29
4,4,2023-03-01,4.75,-6.25,0.64,27.43,28.94,26.01,27.72,83.81,0.62,1.35
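Before dropping rows that contain the -999 fill value, it can help to see which variables are affected. The following is a minimal sketch on toy data, not the notebook's actual cleaning step: the helper `drop_sentinel_rows` is hypothetical, and it checks only numeric columns so that `DATE` is never compared against the sentinel.

```python
import pandas as pd

SENTINEL = -999  # fill value for missing cells, as described in the cleaning cell

def drop_sentinel_rows(frame: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return the frame without sentinel rows, plus sentinel counts per column."""
    numeric = frame.select_dtypes(include="number")
    is_missing = numeric == SENTINEL
    missing_per_column = is_missing.sum()
    # Keep only the rows in which no numeric column holds the sentinel.
    cleaned = frame.loc[~is_missing.any(axis=1)]
    return cleaned, missing_per_column

# Toy data for illustration only.
toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2023-03-01", "2023-03-01", "2023-03-02"]),
    "Temp_Avg": [26.97, -999.0, 26.67],
    "Precip_Corrected": [0.98, 0.87, -999.0],
})
cleaned, counts = drop_sentinel_rows(toy)
print(counts)
print(cleaned.shape)
```

The per-column counts show whether the dropped rows come from one variable or are spread across several, which matters when deciding whether to drop the rows or drop the column.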


#### Export Cleaned DF

In [16]:
# Define the directory and filename
directory = '../NASA_DB'  # This goes one level up from the current directory
filename = 'Clean_NASA_Agroclimatological_Data.csv'

# Create the directory if it doesn't exist
os.makedirs(directory, exist_ok=True)

# Save the DataFrame to the CSV file in the specified directory
file_path = os.path.join(directory, filename)
station_df.to_csv(file_path, index=False)

print(f"DataFrame saved to {file_path}")

DataFrame saved to ../NASA_DB/Clean_NASA_Agroclimatological_Data.csv
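CSV does not store column types, so `DATE` comes back as plain text unless it is parsed again on load. The sketch below shows the round trip on a toy frame written to a temporary directory. The file name `toy_clean.csv` is hypothetical and the real cleaned file is not touched.

```python
import os
import tempfile

import pandas as pd

# Toy frame for illustration only, mirroring a few cleaned columns.
toy = pd.DataFrame({
    "Station_ID": [0, 1],
    "DATE": pd.to_datetime(["2023-03-01", "2023-03-01"]),
    "Temp_Avg": [26.97, 26.72],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_clean.csv")
    toy.to_csv(path, index=False)

    plain = pd.read_csv(path)                         # DATE is read as text
    parsed = pd.read_csv(path, parse_dates=["DATE"])  # DATE is read as datetime

print(plain["DATE"].dtype)
print(parsed["DATE"].dtype)
```

Passing `parse_dates=["DATE"]` when the cleaned file is loaded in later sections avoids repeating the date conversion there.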


### Cocoa Yield Data Processing

Monthly cocoa yield data could not be found for now, so as a temporary measure ChatGPT was asked to generate a placeholder DataFrame of monthly cocoa yield for Ivory Coast.
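Because the yield data is monthly while the climate data is daily, the climate records would likely need to be aggregated to months before the two can be joined. The following is a minimal sketch on toy data, assuming temperature is averaged and precipitation is summed. These aggregation choices and the `Precip_Total` column name are assumptions, not the notebook's method.

```python
import pandas as pd

# Toy daily records for illustration only.
toy = pd.DataFrame({
    "Station_ID": [0, 0, 0, 1],
    "DATE": pd.to_datetime(["2023-03-01", "2023-03-02", "2023-04-01", "2023-03-01"]),
    "Temp_Avg": [26.0, 28.0, 27.0, 25.0],
    "Precip_Corrected": [1.0, 3.0, 2.0, 0.5],
})

# Group by station and calendar month ("MS" labels each group by month start).
monthly = (
    toy.groupby(["Station_ID", pd.Grouper(key="DATE", freq="MS")])
       .agg(
           Temp_Avg=("Temp_Avg", "mean"),             # assumed: mean of daily means
           Precip_Total=("Precip_Corrected", "sum"),  # assumed: monthly total
       )
       .reset_index()
)
print(monthly)
```

The resulting `DATE` values are month starts, so a monthly yield table keyed the same way could be merged on `DATE`.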