In [233]:
import os
import pandas as pd
import matplotlib as mpl
import matplotlib_inline
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Problem statement
Palm olein has grown over the last 60 years to be the single largest vegetable oil crop globally. It makes up XX% of total oil consumption globally, and is used in numerous industries.

Since late 2020, palm olein prices have surged, breaking one record after another, and prices today are at an all-time high. It is unclear what is driving the surge: there has been no obvious structural change to the market that would easily explain such a sustained bull market. This suggests that current prices may be unsustainable and that a market correction is likely at some point.

In the meantime, businesses need to plan, and uncertainty about future prices makes this challenging because margins are below 8% for most actors. The palm oil market affects many industries, including food, industrial applications (consumer goods such as soaps, detergents and cosmetics) and energy. In a market where the price has increased by 400% in 24 months, accurately forecasting future prices and making appropriate decisions becomes mission-critical.

The suppliers' pricing mechanism:

The goals of this project are:
- to understand the drivers of the palm oil price
- to understand the current price surge
- to model the supply and demand factors that drive price and quantity
- to forecast future prices over 30, 60 and 90 day horizons
- to use a naive forecast as the benchmark
- to try Fourier transform or wavelet methods
- to make the pipeline automatic and robust to duplication and failure
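
A minimal sketch of the naive benchmark, assuming "naive" means "the price in `h` days is forecast to equal today's price". The function name `naive_forecast_mae` and the synthetic random-walk series are illustrative assumptions, not part of the project data; any real model would need to beat these error levels at each horizon.

```python
import numpy as np
import pandas as pd

def naive_forecast_mae(prices: pd.Series, horizon: int) -> float:
    """Mean absolute error when price[t + horizon] is forecast as price[t]."""
    forecast = prices.shift(horizon)            # value observed `horizon` steps earlier
    errors = (prices - forecast).abs().dropna() # first `horizon` rows have no forecast
    return float(errors.mean())

# Synthetic daily series (random walk) standing in for the real price data
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-02", periods=400, freq="D")
prices = pd.Series(800 + rng.normal(0, 10, size=400).cumsum(), index=dates)

# Benchmark error for each forecast horizon named in the goals
benchmark = {h: naive_forecast_mae(prices, h) for h in (30, 60, 90)}
print(benchmark)
```

Mean absolute error is used here only because it is in the same units as the price (US$ per tonne); another metric could be swapped in.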

# Data Pipeline
A software automation solution to move data around: host the data in a data warehouse and move it to where it can be used.

http://tmgmdashboarding-env-1.eba-srqgw9ij.ap-southeast-2.elasticbeanstalk.com/

- data storage
- ETL
- preprocess data
- seasonality
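
As a rough illustration of the "automatic, duplication, failure" requirement, the sketch below shows one possible extract-and-clean step. The function name `load_prices` and the `date` / `price` column names are hypothetical; the real pipeline may be structured quite differently.

```python
import io
import pandas as pd

def load_prices(source) -> pd.DataFrame:
    """Read a price CSV, drop duplicate dates, and fail loudly on bad input."""
    required = {"date", "price"}  # hypothetical schema
    try:
        df = pd.read_csv(source)
    except (FileNotFoundError, pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        raise RuntimeError(f"extract step failed: {exc}") from exc
    missing = required - set(df.columns)
    if missing:
        raise RuntimeError(f"missing columns: {sorted(missing)}")
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")
    # Keep the last record per date so re-running a load never duplicates rows
    df = df.drop_duplicates(subset="date", keep="last").sort_values("date")
    return df.reset_index(drop=True)

# One date is deliberately repeated to show the de-duplication
raw = io.StringIO("date,price\n02/01/20,790.0\n03/01/20,790.0\n03/01/20,790.0\n06/01/20,785.0\n")
clean = load_prices(raw)
print(clean)
```

Raising a single error type from the extract step makes it easier for a scheduler to retry or alert, which is one way to read the "failure" requirement.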

# Data Sources
- https://agropost.wordpress.com/
- This has a lot of great market data going back to 2010. However, it doesn't seem to offer the data as CSV. This seems like a perfect opportunity to demonstrate my scraping skills and build a data pipeline.
- https://www.fao.org/home/en
- FAO: the Food and Agriculture Organization, part of the UN. Good data, high quality and easy to download, but lacking in granularity.

# Price Drivers

### General
- political policy
- inflation
- Russia-Ukraine war & geopolitics

### Market
- price of substitute oils

# Demand Side
- growth in demand as the economies of China and India grow

# Supply Side
- weather
- labor shortages
- land clearing
- output per ha

In [234]:
DATA = 'data'

# Visualisation
- **production of palm oil**: total world production & by country and region
- **vegetable oils production**: total world production and by oil type
- **oil yield by crop**:
- **price**: palm olein price and other edible oil prices
- **land use for palm oil**:
- **land use for vegetable oil**:
- **land area needed for vegetable oil production**:
- **imports**:
- **exports**:

# 1) Global palm oil production has increased to meet rising demand for vegetable oils
- Production by country over time. Stacked line chart

In [235]:
production = pd.read_csv(os.path.join(DATA, "palm-oil-production.csv"))
production.head()

Unnamed: 0,Entity,Code,Year,"Crops - Oil, palm - 257 - Production - 5510 - tonnes"
0,Africa,,1961,1131882
1,Africa,,1962,1111006
2,Africa,,1963,1145004
3,Africa,,1964,1160831
4,Africa,,1965,1138860


In [236]:
world_df = production.loc[production['Entity'] == 'World']
world_df.head()

Unnamed: 0,Entity,Code,Year,"Crops - Oil, palm - 257 - Production - 5510 - tonnes"
3410,World,OWID_WRL,1961,1478901
3411,World,OWID_WRL,1962,1475941
3412,World,OWID_WRL,1963,1535070
3413,World,OWID_WRL,1964,1570032
3414,World,OWID_WRL,1965,1576213


In [237]:
fig = px.line(world_df, x='Year', y='Crops - Oil, palm - 257 - Production - 5510 - tonnes', title='Oil Palm Production')
fig.show()

- 1961 = 1,478,901mt
- 2018 = 71,453,193mt
- 48x increase in 57 years
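
The growth multiple can be checked directly from the two figures, and converting it to a compound annual rate makes it easier to compare with other crops. This is a small arithmetic sketch using only the 1961 and 2018 values quoted here; it works out to roughly 48.3x, or about 7% a year.

```python
start_year, start_tonnes = 1961, 1_478_901
end_year, end_tonnes = 2018, 71_453_193

years = end_year - start_year
multiple = end_tonnes / start_tonnes   # total growth multiple
cagr = multiple ** (1 / years) - 1     # compound annual growth rate

print(f"{multiple:.1f}x over {years} years, about {cagr:.1%} per year")
```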

TODO: add a drop-down menu to select countries and regions.

# 2) Land used for Palm Oil Production

There should be a strong correlation between increased areas under cultivation for oil palm and increased production of palm oil.

Total production should effectively be the product of total hectares under cultivation and yield per hectare. Production increases are therefore driven by increases in land under cultivation and by improving (or deteriorating) yields per hectare.

- plot by country over time (stacked line plot)
- plot by country over time (geo heat map)
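
The relationship between area, yield and production can be sketched with the World rows for 1961 and 1965 from the two datasets. Note the assumption: production here is tonnes of palm oil while area is hectares of oil palm fruit harvested, so the implied yield is oil per harvested hectare. In log terms, production growth splits exactly into an area part and a yield part.

```python
import math

# World figures for 1961 and 1965 from the production and land-use datasets
production_t = {1961: 1_478_901, 1965: 1_576_213}   # tonnes of palm oil
area_ha = {1961: 3_621_037, 1965: 3_617_385}        # hectares harvested

# Implied yield = production / area, so production = area * yield
yield_t_per_ha = {y: production_t[y] / area_ha[y] for y in production_t}

# Log growth is additive: d ln(production) = d ln(area) + d ln(yield)
g_prod = math.log(production_t[1965] / production_t[1961])
g_area = math.log(area_ha[1965] / area_ha[1961])
g_yield = math.log(yield_t_per_ha[1965] / yield_t_per_ha[1961])

print(yield_t_per_ha)
print(f"production {g_prod:+.3f} = area {g_area:+.3f} + yield {g_yield:+.3f}")
```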

In [238]:
land = pd.read_csv(os.path.join(DATA, 'land-use-palm-oil.csv'))
land.head()

Unnamed: 0,Entity,Code,Year,Crops - Oil palm fruit - 254 - Area harvested - 5312 - ha
0,Africa,,1961,3444970
1,Africa,,1962,3237470
2,Africa,,1963,3325700
3,Africa,,1964,3333700
4,Africa,,1965,3394650


In [239]:
world_land = land.loc[land['Entity'] == 'World']
world_land.head()

Unnamed: 0,Entity,Code,Year,Crops - Oil palm fruit - 254 - Area harvested - 5312 - ha
3338,World,OWID_WRL,1961,3621037
3339,World,OWID_WRL,1962,3422412
3340,World,OWID_WRL,1963,3517630
3341,World,OWID_WRL,1964,3540825
3342,World,OWID_WRL,1965,3617385


In [240]:
fig = px.line(world_land, x='Year', y='Crops - Oil palm fruit - 254 - Area harvested - 5312 - ha', title='Land under Cultivation (Palm Oil)')
fig.show()

# 3) Vegetable Oil Production
- plot 1: production over time by crop. currently a simple line chart. Need to turn it into a stacked area chart
- geo-map plot showing production by country or region over time

In [241]:
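vegetable_oil_production = pd.read_csv(os.path.join(DATA, 'vegetable-oil-production.csv'))
# Keep only the 'World' aggregate (assumes this file has a 'World' Entity, as
# palm-oil-production.csv does). Summing every Entity per year would double-count,
# because countries, regions and world totals all sit in the same column.
vegetable_oil_production = (
    vegetable_oil_production
    .loc[vegetable_oil_production['Entity'] == 'World']
    .set_index('Year')
    .select_dtypes('number')
)
year = vegetable_oil_production.index  # x-axis values aligned with the series below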
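
A toy illustration of why aggregate rows matter when totalling by year. The production file shown earlier contains both regions ('Africa') and a 'World' row, and this sketch assumes the vegetable oil file is laid out the same way. The entities and numbers below are made up.

```python
import pandas as pd

# Two countries plus a 'World' aggregate that already equals their sum
toy = pd.DataFrame({
    "Entity": ["A", "B", "World", "A", "B", "World"],
    "Year": [1961, 1961, 1961, 1962, 1962, 1962],
    "tonnes": [10.0, 20.0, 30.0, 12.0, 24.0, 36.0],
})

# Summing every row counts each tonne twice: once in its country, once in 'World'
summed_all = toy.groupby("Year")["tonnes"].sum()
world_only = toy.loc[toy["Entity"] == "World"].set_index("Year")["tonnes"]

print(summed_all / world_only)
```

With continents and other regional groupings in the file as well, the overcount on the real data would be larger than a factor of two.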

In [242]:
soybean = vegetable_oil_production['Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes']
sesame = vegetable_oil_production['Crops processed - Oil, sesame - 290 - Production - 5510 - tonnes']
linseed = vegetable_oil_production['Crops processed - Oil, linseed - 334 - Production - 5510 - tonnes']
palm = vegetable_oil_production['Crops processed - Oil, palm - 257 - Production - 5510 - tonnes']
rapeseed = vegetable_oil_production['Crops processed - Oil, rapeseed - 271 - Production - 5510 - tonnes']
groundnut = vegetable_oil_production['Crops processed - Oil, groundnut - 244 - Production - 5510 - tonnes']
cottonseed = vegetable_oil_production['Crops processed - Oil, cottonseed - 331 - Production - 5510 - tonnes']
coconut = vegetable_oil_production['Crops processed - Oil, coconut (copra) - 252 - Production - 5510 - tonnes']
olive_oil = vegetable_oil_production['Crops processed - Oil, olive, virgin - 261 - Production - 5510 - tonnes']
safflower = vegetable_oil_production['Crops processed - Oil, safflower - 281 - Production - 5510 - tonnes']
sunflower = vegetable_oil_production['Crops processed - Oil, sunflower - 268 - Production - 5510 - tonnes']
maize = vegetable_oil_production['Crops processed - Oil, maize - 60 - Production - 5510 - tonnes']
palm_kernal = vegetable_oil_production['Crops processed - Oil, palm kernel - 258 - Production - 5510 - tonnes']

In [243]:
# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(go.Scatter(x=year, y=soybean, name="soybean"), secondary_y=True)  #(soybean)
fig.add_trace(go.Scatter(x=year, y=sesame, name="sesame"), secondary_y=False,)  # sesame
fig.add_trace(go.Scatter(x=year, y=linseed, name="linseed"), secondary_y=True)  # linseed
fig.add_trace(go.Scatter(x=year, y=palm, name="palm"), secondary_y=True)  # palm
fig.add_trace(go.Scatter(x=year, y=rapeseed, name="rapeseed"), secondary_y=True)  # rapeseed
fig.add_trace(go.Scatter(x=year, y=groundnut, name="groundnut"), secondary_y=True)  # groundnut
fig.add_trace(go.Scatter(x=year, y=cottonseed, name="cottonseed"), secondary_y=True)  # cottonseed
fig.add_trace(go.Scatter(x=year, y=coconut, name="coconut"), secondary_y=True)  # coconut
fig.add_trace(go.Scatter(x=year, y=olive_oil, name="olive_oil"), secondary_y=True)  # olive_oil
fig.add_trace(go.Scatter(x=year, y=safflower, name="safflower"), secondary_y=True)  # safflower
fig.add_trace(go.Scatter(x=year, y=sunflower, name="sunflower"), secondary_y=True)  # sunflower
fig.add_trace(go.Scatter(x=year, y=maize, name="maize"), secondary_y=True)  # maize
fig.add_trace(go.Scatter(x=year, y=palm_kernal, name="palm_kernal"), secondary_y=True)  # palm_kernel

# Add figure title
fig.update_layout(title_text="Production by Crop")

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles (sesame is on the primary axis, every other oil on the secondary)
fig.update_yaxes(title_text="<b>Tonnes</b> (sesame)", secondary_y=False)
fig.update_yaxes(title_text="<b>Tonnes</b> (all other oils)", secondary_y=True)

fig.show()

In [244]:
# this is meant to be a stacked area chart. The sum of all types of oil crop production = total global oil production
fig2 = go.Figure()

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes'],
    name='Soybean',
    mode='lines',
    line=dict(width=0.5, color='orange'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, sesame - 290 - Production - 5510 - tonnes'],
    name='Sesame',
    mode='lines',
    line=dict(width=0.5,color='lightgreen'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, linseed - 334 - Production - 5510 - tonnes'],
    name='Linseed',
    mode='lines',
    line=dict(width=0.5, color='blue'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, palm - 257 - Production - 5510 - tonnes'],
    name='Palm',
    mode='lines',
    line=dict(width=0.5, color='darkred'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, rapeseed - 271 - Production - 5510 - tonnes'],
    name='Rapeseed',
    mode='lines',
    line=dict(width=0.5, color='orange'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, groundnut - 244 - Production - 5510 - tonnes'],
    name='Groundnut',
    mode='lines',
    line=dict(width=0.5, color='black'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, cottonseed - 331 - Production - 5510 - tonnes'],
    name='Cottonseed',
    mode='lines',
    line=dict(width=0.5, color='red'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, coconut (copra) - 252 - Production - 5510 - tonnes'],
    name='Coconut',
    mode='lines',
    line=dict(width=0.5, color='brown'),  # white would be invisible on the default background
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, olive, virgin - 261 - Production - 5510 - tonnes'],
    name='Olive Oil',
    mode='lines',
    line=dict(width=0.5, color='pink'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, safflower - 281 - Production - 5510 - tonnes'],
    name='Safflower',
    mode='lines',
    line=dict(width=0.5, color='yellow'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, sunflower - 268 - Production - 5510 - tonnes'],
    name='Sunflower',
    mode='lines',
    line=dict(width=0.5, color='purple'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, maize - 60 - Production - 5510 - tonnes'],
    name='Maize',
    mode='lines',
    line=dict(width=0.5, color='green'),
    stackgroup='one'))

fig2.add_trace(go.Scatter(
    x=year,
    y=vegetable_oil_production['Crops processed - Oil, palm kernel - 258 - Production - 5510 - tonnes'],
    name='Palm Kernel',
    mode='lines',
    line=dict(width=0.5, color='lightblue'),  # CSS colour names contain no spaces
    stackgroup='one'))

fig2.update_layout(
     title="Total Vegetable oil production by crop",
     title_font_size=40, legend_font_size=20,
     width=1600, height=1400)

fig2.update_xaxes(
     title_text='Year',
     title_font=dict(size=30, family='Verdana', color='black'),
     tickfont=dict(family='Calibri', color='darkred', size=25))

fig2.update_yaxes(
     # No fixed range: the values are in tonnes (millions), so (0, 160) would hide every trace
     title_text="Metric tons",
     title_font=dict(size=30, family='Verdana', color='black'),
     tickfont=dict(family='Calibri', color='darkred', size=25))

fig2.show()

In [245]:
oil_yield = pd.read_csv(os.path.join(DATA, "oil-yield-by-crop.csv"))
oil_yield.head()

Unnamed: 0,Entity,Code,Year,Oil yield (t/ha)
0,Coconut Oil,,2018,0.257735
1,Cottonseed Oil,,2018,0.141688
2,Groundnut Oil,,2018,0.183447
3,Olive Oil,,2018,0.339773
4,Palm Oil,,2018,2.835088


# 4) Price

In [246]:
price = pd.read_csv(os.path.join(DATA, 'palm oil prices 021020 - 290422.csv'))
price.head()

Unnamed: 0,DAILY PRICES,Palm olein RBD Mal FOB US$
0,02/01/20,790.0
1,03/01/20,790.0
2,04/01/20,
3,05/01/20,
4,06/01/20,785.0


I need to find more data, ideally daily price data going back to 1961.

In [247]:
fig = px.line(price, x='DAILY PRICES', y='Palm olein RBD Mal FOB US$', title='Palm Olein Price')
fig.show()
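
The `DAILY PRICES` column is read as text, so the chart treats each date as a category. A possible clean-up step is sketched below on the first rows shown in the `head()` output above. The day/month/year format is an assumption, supported by the fact that 04/01/20 and 05/01/20 (a weekend in January 2020) have no quote; `price_ts`, `trading_days` and `filled` are illustrative names.

```python
import io
import pandas as pd

# First rows of the price file as shown in the head() output above
sample = io.StringIO(
    "DAILY PRICES,Palm olein RBD Mal FOB US$\n"
    "02/01/20,790.0\n03/01/20,790.0\n04/01/20,\n05/01/20,\n06/01/20,785.0\n"
)
price_ts = pd.read_csv(sample)

# Assumption: dates are day/month/year
price_ts["DAILY PRICES"] = pd.to_datetime(price_ts["DAILY PRICES"], format="%d/%m/%y")
price_ts = price_ts.set_index("DAILY PRICES").sort_index()

col = "Palm olein RBD Mal FOB US$"
trading_days = price_ts[col].dropna()   # option 1: keep quoted days only
filled = price_ts[col].ffill()          # option 2: carry the last quote forward
print(filled)
```

Which option is preferable depends on the model: forward-filling keeps a regular daily index, while dropping empty days avoids inventing prices for days with no trade.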

In [248]:
price.shape

(849, 2)

In [249]:
# line chart horizontal, no hover text

# 5) Export Volumes
Export volumes are a very good way to look at total volumes. Most palm oil is exported, as evidenced by comparing the importer country vs the

In [250]:
# stacked line chart with hover text

# 6) Key Import Markets

In [251]:
imports = pd.read_csv(os.path.join(DATA, 'palm-oil-imports.csv'))
imports.head()

Unnamed: 0,Entity,Code,Year,New Food Balances - Palm Oil - 2577 - Import Quantity - 5611 - 1000 tonnes
0,Afghanistan,AFG,2014,145000
1,Afghanistan,AFG,2015,153000
2,Afghanistan,AFG,2016,161000
3,Afghanistan,AFG,2017,175000
4,Albania,ALB,2014,1000


In [252]:
imports.shape

(688, 4)

In [265]:
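# Per-year totals of vegetable oil production. A separate name is used so the
# palm-oil imports frame loaded above is not overwritten.
# Note: this sums every Entity, so any regional or world aggregate rows are counted too.
oil_by_year = pd.read_csv(os.path.join(DATA, 'vegetable-oil-production.csv'))
oil_by_year = oil_by_year.groupby('Year').sum(numeric_only=True)
oil_by_year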

Unnamed: 0_level_0,"Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes","Crops processed - Oil, sesame - 290 - Production - 5510 - tonnes","Crops processed - Oil, linseed - 334 - Production - 5510 - tonnes","Crops processed - Oil, palm - 257 - Production - 5510 - tonnes","Crops processed - Oil, rapeseed - 271 - Production - 5510 - tonnes","Crops processed - Oil, groundnut - 244 - Production - 5510 - tonnes","Crops processed - Oil, cottonseed - 331 - Production - 5510 - tonnes","Crops processed - Oil, coconut (copra) - 252 - Production - 5510 - tonnes","Crops processed - Oil, olive, virgin - 261 - Production - 5510 - tonnes","Crops processed - Oil, safflower - 281 - Production - 5510 - tonnes","Crops processed - Oil, sunflower - 268 - Production - 5510 - tonnes","Crops processed - Oil, maize - 60 - Production - 5510 - tonnes","Crops processed - Oil, palm kernel - 258 - Production - 5510 - tonnes"
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
1961,12563566.0,1762131.0,3605247.0,6708734.0,4840867.0,11234153.0,9230397.0,7354670.0,6608670.0,388947.0,8131734.0,1465322.0,2392785.0
1962,13777553.0,2090167.0,4059024.0,6712690.0,5174853.0,11774274.0,9653376.0,8749060.0,4533310.0,641181.0,9552219.0,1540017.0,2378154.0
1963,14725863.0,2103291.0,3896389.0,6947784.0,4961239.0,12955374.0,10483243.0,8475221.0,8856432.0,635618.0,10056050.0,1584599.0,2199279.0
1964,14743503.0,2117241.0,3952174.0,7076191.0,4836472.0,13222081.0,11163961.0,8165172.0,4481156.0,516356.0,9881294.0,1669078.0,2396552.0
1965,15993252.0,2155306.0,4375128.0,7011976.0,6780248.0,12167658.0,11763107.0,8443615.0,6078829.0,524554.0,12519604.0,1831634.0,2469792.0
1966,18080003.0,2036072.0,3965364.0,7395521.0,6757745.0,12484300.0,11877619.0,9289872.0,6190790.0,748605.0,12591332.0,1852351.0,2441664.0
1967,19360549.0,2182103.0,3504699.0,7486559.0,7332706.0,13346223.0,10318935.0,8741430.0,6608902.0,766063.0,14621180.0,1883002.0,2044249.0
1968,19411049.0,2362568.0,3480347.0,8210069.0,8080160.0,12505530.0,10198331.0,8229023.0,7262915.0,528304.0,15323309.0,1892525.0,2127513.0
1969,21180907.0,2284582.0,3543701.0,8731743.0,7641357.0,12834120.0,11136005.0,8099855.0,6294838.0,578457.0,15288671.0,1987756.0,2340698.0
1970,26414015.0,2786629.0,4423724.0,8703692.0,8332219.0,14398952.0,10847887.0,8320713.0,6841166.0,648149.0,15321150.0,2080306.0,2337788.0


In [None]:
# stacked area plot

In [253]:
# world heat map

# 4) Palm Oil Production Volumes by Country & over time

In [254]:
production_time = pd.read_csv(os.path.join(DATA, 'palm-oil-production.csv'))
production_time.head()

Unnamed: 0,Entity,Code,Year,"Crops - Oil, palm - 257 - Production - 5510 - tonnes"
0,Africa,,1961,1131882
1,Africa,,1962,1111006
2,Africa,,1963,1145004
3,Africa,,1964,1160831
4,Africa,,1965,1138860


In [255]:
production_time.shape

(3468, 4)

In [256]:
# world heat map of producers
# line chart with drop box for country

# 5) land use for vegetable oil crops

In [257]:
crops = pd.read_csv(os.path.join(DATA, 'land-use-for-vegetable-oil-crops.csv'))
crops.head()

Unnamed: 0,Entity,Code,Year,Crops - Olives - 260 - Area harvested - 5312 - ha,Crops - Rapeseed - 270 - Area harvested - 5312 - ha,Crops - Soybeans - 236 - Area harvested - 5312 - ha,Crops - Safflower seed - 280 - Area harvested - 5312 - ha,Crops - Sunflower seed - 267 - Area harvested - 5312 - ha,Crops - Oil palm fruit - 254 - Area harvested - 5312 - ha,Crops - Coconuts - 249 - Area harvested - 5312 - ha,"Crops - Groundnuts, with shell - 242 - Area harvested - 5312 - ha",Crops - Linseed - 333 - Area harvested - 5312 - ha,Crops - Seed cotton - 328 - Area harvested - 5312 - ha,Crops - Sesame seed - 289 - Area harvested - 5312 - ha,Crops - Castor oil seed - 265 - Area harvested - 5312 - ha,Crops - Karite nuts (sheanuts) - 263 - Area harvested - 5312 - ha,Crops - Tung nuts - 275 - Area harvested - 5312 - ha
0,Afghanistan,AFG,1961,600.0,,,,8300.0,,,,46800.0,76892.0,33200.0,,,
1,Afghanistan,AFG,1962,600.0,,,,8300.0,,,,46800.0,91056.0,33200.0,,,
2,Afghanistan,AFG,1963,600.0,,,,8300.0,,,,46800.0,121408.0,33200.0,,,
3,Afghanistan,AFG,1964,600.0,,,,8300.0,,,,46800.0,121408.0,33200.0,,,
4,Afghanistan,AFG,1965,600.0,,,,8300.0,,,,46800.0,80939.0,33200.0,,,


In [258]:
crops.shape

(12191, 17)

In [259]:
# stacked line chart

# 6) Total vegetable oil production (substitutes) over time
Palm oil prices are affected by the price and supply of substitute vegetable oils.

In [260]:
oil_production = pd.read_csv(os.path.join(DATA, 'vegetable-oil-production.csv'))
oil_production.head()

Unnamed: 0,Entity,Code,Year,"Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes","Crops processed - Oil, sesame - 290 - Production - 5510 - tonnes","Crops processed - Oil, linseed - 334 - Production - 5510 - tonnes","Crops processed - Oil, palm - 257 - Production - 5510 - tonnes","Crops processed - Oil, rapeseed - 271 - Production - 5510 - tonnes","Crops processed - Oil, groundnut - 244 - Production - 5510 - tonnes","Crops processed - Oil, cottonseed - 331 - Production - 5510 - tonnes","Crops processed - Oil, coconut (copra) - 252 - Production - 5510 - tonnes","Crops processed - Oil, olive, virgin - 261 - Production - 5510 - tonnes","Crops processed - Oil, safflower - 281 - Production - 5510 - tonnes","Crops processed - Oil, sunflower - 268 - Production - 5510 - tonnes","Crops processed - Oil, maize - 60 - Production - 5510 - tonnes","Crops processed - Oil, palm kernel - 258 - Production - 5510 - tonnes"
0,Afghanistan,AFG,1961,,2253.0,3531.0,,,,4997.0,,82.0,,2938.0,,
1,Afghanistan,AFG,1962,,1876.0,3701.0,,,,7716.0,,90.0,,3138.0,,
2,Afghanistan,AFG,1963,,1831.0,2857.0,,,,11742.0,,82.0,,3138.0,,
3,Afghanistan,AFG,1964,,2722.0,3377.0,,,,7960.0,,90.0,,3138.0,,
4,Afghanistan,AFG,1965,,2821.0,4327.0,,,,7926.0,,82.0,,3238.0,,


In [261]:
oil_production.shape

(11064, 16)

# 7) Palm oil uses

# 8) oil yield by crop (2018)

In [262]:
_yield = pd.read_csv(os.path.join(DATA, 'oil-yield-by-crop.csv'))
_yield.head()

Unnamed: 0,Entity,Code,Year,Oil yield (t/ha)
0,Coconut Oil,,2018,0.257735
1,Cottonseed Oil,,2018,0.141688
2,Groundnut Oil,,2018,0.183447
3,Olive Oil,,2018,0.339773
4,Palm Oil,,2018,2.835088


In [263]:
_yield.shape

(9, 4)
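
Yield also answers the "land area needed" question from the visualisation list, since hectares per tonne of oil is simply the reciprocal of tonnes per hectare. The sketch uses only the five yields visible in the `head()` output above; the variable names are illustrative.

```python
import pandas as pd

# Yields copied from the head() output above (tonnes of oil per hectare, 2018)
oil_yield_2018 = pd.Series({
    "Coconut Oil": 0.257735,
    "Cottonseed Oil": 0.141688,
    "Groundnut Oil": 0.183447,
    "Olive Oil": 0.339773,
    "Palm Oil": 2.835088,
})

# Hectares needed per tonne of oil is the reciprocal of yield
ha_per_tonne = (1 / oil_yield_2018).sort_values()
# Land multiple relative to palm oil
vs_palm = ha_per_tonne / ha_per_tonne["Palm Oil"]

print(pd.DataFrame({"ha per tonne": ha_per_tonne, "x palm": vs_palm}).round(2))
```

On these figures palm oil needs the least land per tonne of the five crops shown, which is part of why substitution away from palm is land-intensive.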

# Palm oil terms
We're mainly interested in RBD palm olein, but it's worth mentioning some others:
- RBD: refined, bleached and deodorised
- CPO: crude palm oil
- CPKO: crude palm kernel oil

# Incoterms
We're mainly interested in FOB, but it's worth knowing some other Incoterms:
- EXW: ex works
- FAS: free alongside ship
- FOB: free on board
- CFR: cost and freight
- CIF: cost, insurance & freight

# Ports of origin
We will reference 'Malaysia'; in fact, however, there are two primary ports for exports of palm olein.
- Port Klang
- Kuala Lumpur
The often-quoted 'FOB Malaysia' is an index price, traded on Bursa Malaysia. In reality, prices will differ slightly between the two ports.