In [33]:
import os
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib_inline
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [34]:
DATA = 'data'

# Problem statement
Palm oil has grown over the last 60 years to become the world's single largest vegetable oil crop. It makes up 35% of edible oil consumption globally and is used in numerous industries.

This is at odds with the perception among consumers that palm oil is environmentally unfriendly, and that companies are increasingly turning to alternatives.

Since late 2020, palm olein prices have surged to break one record after another, and prices today are at an all-time high. It is unclear what is driving this surge: there has been no obvious structural change to the market that could easily explain such a sustained bull market. This suggests that current prices may be unsustainable and that a market correction is likely at some point.

In the meantime, businesses need to plan, and uncertainty about future prices makes this challenging because margins are below 8% for most actors. The palm oil market affects many industries, including foods, industrial applications (consumer goods like soaps, detergents & cosmetics) and energy. In a market where the price has increased by 400% in 24 months, accurately forecasting future prices and making appropriate decisions becomes mission-critical.

**The supplier's pricing mechanism**: the supplier sets the price for each month on the first day of that month and closes its order book on the 21st. By the 21st of the month, purchasing must therefore choose whether to place forward orders for 1-3 months at the current price or to wait about 10 days for the new price.
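The timing in the pricing mechanism can be sketched with the standard library. This is a minimal illustration only: the function names and the example date are hypothetical, and the only facts taken from the text are the order-book close on the 21st and the new price on the first day of the following month.

```python
from datetime import date

ORDER_BOOK_CLOSE_DAY = 21  # from the text: the order book closes on the 21st


def next_month_start(d: date) -> date:
    """First day of the month after d, when the supplier sets the new price."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def decision_window(today: date) -> dict:
    """Key dates for the current month's forward-order decision (hypothetical helper)."""
    close = date(today.year, today.month, ORDER_BOOK_CLOSE_DAY)
    new_price = next_month_start(today)
    return {
        "order_book_close": close,
        "new_price_date": new_price,
        "days_until_close": (close - today).days,  # negative once the book has closed
        "days_close_to_new_price": (new_price - close).days,
    }


# Example date chosen for illustration only
window = decision_window(date(2022, 4, 15))
print(window)
```

The gap between the order-book close and the new price is the period over which a forecast has to beat simply waiting, which is why the shortest forecast horizon matters most for this decision.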

# Goal
The goals of this project are:
- to understand the drivers of the palm oil price
- to understand the current price surge
- to model the supply and demand factors that drive price and quantity
- to forecast future prices over 30-, 60- and 90-day horizons
- to use a naive forecast as the benchmark
- to explore Fourier transform or wavelet methods (e.g. for seasonality)
- to build an automated pipeline that handles duplication and failure
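A naive benchmark can be sketched as follows, assuming "naive" means the usual last-value forecast: the price `horizon` days ahead is predicted to equal today's price. The function name and the synthetic price series are hypothetical; real scoring would run on the cleaned daily price series.

```python
import numpy as np


def naive_forecast_mae(prices, horizon):
    """Mean absolute error of the naive forecast at a given horizon (in steps)."""
    prices = np.asarray(prices, dtype=float)
    actual = prices[horizon:]       # what the price turned out to be
    predicted = prices[:-horizon]   # last known price, carried forward
    return float(np.mean(np.abs(actual - predicted)))


# Hypothetical daily series: a steady 1 US$/day upward drift starting at 800
prices = 800 + np.arange(200)

for h in (30, 60, 90):
    print(h, naive_forecast_mae(prices, h))
```

Any candidate model would need a lower error than this at the same horizon to justify its added complexity.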

# Price Drivers

We have major data challenges: we need more data over longer periods.

### General
- political policy
- inflation (quarterly)
- exchange rate movements (daily)
- war in Ukraine & geopolitics
- flow-on economic effects of COVID on palm oil

### Market
- price of palm oil (daily)
- price of substitute oils (daily)

# Demand Side
- general increase in the consumption of edible oils globally
- growth in demand driven by the growth of specific economies, particularly China and India

# Supply Side
- weather: daily
- labor shortages:
- land clearing:?
- output per ha:?

# Visualisation
- **production of palm oil**: total world production & by country and region over time. line plot
- **vegetable oils production**: production and by oil type, country and region over time. stacked area plot
- **imports**: which countries import the most oil? how has it changed over time? global map/heatmap
- **production by country**: palm oil production by country over time. global map/heatmap
- **production by country**: horizontal bar chart with national output at end of bar.
- **exports**: which countries export palm oil? how has it changed over time? global map/heatmap.
- **land used for palm oil**: how much land is used for palm oil cultivation over time? line plot.
- **land used for vegetable oil**: how much land is used for the cultivation of oil crops, by crop, by country and region, over time? stacked area plot.
- **oil yield by crop**: a comparison of oil yield per hectare of land cultivated by crop. shows that palm oil is the most productive per hectare. horizontal bar plot showing top 10 crops
- **price**: palm olein and other edible oil prices. line chart

# Introduction
There are a number of forces at play that make this an interesting topic:
- palm oil demand is growing continuously year-on-year
- the area under cultivation and total production have both increased
- increased land use is the primary driver of increased production volumes
- increased yields have also contributed
- there is an increasingly negative view of palm oil and its role in driving deforestation. "Palm oil free" is now a prominent issue and should put downward pressure on demand
- labour shortages due to restrictions imposed during COVID-19 had an effect in 2020, and the "stickiness" of agricultural production means this effect will take time to work through the supply chain. This may be one of the causes of lagging supply
- global shipping disruptions and delays have affected trade broadly, and palm oil is no exception. This should have a dampening effect on demand
- public policy in Indonesia constraining exports will have had a dampening effect on supply
- El Niño weather conditions will have a positive impact on supply; however, the timing of this impact is unclear at this time

# 1) Global palm oil production
Palm oil production has increased rapidly over the past 50 years. In 1970, the world was producing only 2 million tonnes. This is now 35 times higher: in 2018 the world produced 71 million tonnes. The change in global production is shown in the chart.

The rise of palm oil follows the rapid increase in demand for vegetable oils more broadly. The breakdown of global vegetable oil production by crop is shown in the stacked area chart. Global production increased ten-fold since the 1960s – from 17 to 170 million tonnes in 2014. As we will see later in this article, **more recent data for 2018 comes to 218 million tonnes**.

The story of palm oil is less about an isolated commodity and more about the rising demand for vegetable oils. Palm oil is a very productive crop. It produces 36% of the world’s oil, but uses less than 9% of croplands devoted to oil production. It has favourable production costs and is among the cheapest edible oils. Palm oil has therefore been a natural choice to meet this demand.

Production of palm oil has increased **about 48-fold between 1961 and 2018**, from roughly 1.5 million to 71 million tonnes. The growth has occurred to meet rising demand for vegetable oils in general. Palm oil's growth is a function of increased demand for edible oils, combined with palm oil's favourable cost of production.


In [35]:
production = pd.read_csv(os.path.join(DATA, "palm-oil-production.csv"))
world_df = production.loc[production['Entity'] == 'World']

In [83]:
palm_oil_prodn_fig = px.line(world_df, x="Year", y="Crops - Oil, palm - 257 - Production - 5510 - tonnes")

# Add figure title (the plotted column is palm oil production)
palm_oil_prodn_fig.update_layout(title_text="<b>Global Palm Oil Production</b>", title_font_size=40, legend_font_size=20, width=1400, height=1000)

# Format x-axis
palm_oil_prodn_fig.update_xaxes(title_text="Year", title_font=dict(size=30, family='Verdana', color='white'), tickfont=dict(family='Calibri', color='white', size=25))

# Format y-axis (values are in tonnes)
palm_oil_prodn_fig.update_yaxes(title_text="<b>Palm oil production (tonnes)</b>", title_font=dict(size=30, family='Verdana', color='white'), tickfont=dict(family='Calibri', color='white', size=25))

palm_oil_prodn_fig.show()
# To-do: format plot. button to add country or region. automation

- 1961 = 1,478,901 tonnes
- 2018 = 71,453,193 tonnes
- 48x increase in 57 years

To do: add a drop-down menu to select countries and regions.

# 2) Land used for Palm Oil Production

There should be a strong correlation between increased areas under cultivation for oil palm and increased production of palm oil.

Total production should effectively be the product of total hectares under cultivation and yield per hectare. Production increases are therefore driven by increases in land under cultivation and by improving (or deteriorating) yields per hectare.
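Treating production as area multiplied by yield, growth in production can be split into an area contribution and a yield contribution, because the logarithm of a product is the sum of the logarithms. The sketch below uses made-up numbers purely to show the arithmetic; the function name and all inputs are hypothetical, not values from the datasets.

```python
import math


def growth_decomposition(area_start, area_end, yield_start, yield_end):
    """Split production growth (production = area * yield) into area and yield shares."""
    prod_start = area_start * yield_start
    prod_end = area_end * yield_end
    log_total = math.log(prod_end / prod_start)
    log_area = math.log(area_end / area_start)
    log_yield = math.log(yield_end / yield_start)
    return {
        "production_ratio": prod_end / prod_start,
        "area_share": log_area / log_total,    # fraction of log-growth due to land
        "yield_share": log_yield / log_total,  # fraction of log-growth due to yield
    }


# Hypothetical inputs: area quadruples (4 -> 16 Mha) while yield doubles (2 -> 4 t/ha)
result = growth_decomposition(area_start=4.0, area_end=16.0, yield_start=2.0, yield_end=4.0)
print(result)
```

With these illustrative inputs production rises eight-fold, and two-thirds of the log-growth is attributed to land expansion and one-third to yield.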

- plot by country over time (stacked line plot)
- plot by country over time (geo heat map)

In [37]:
land = pd.read_csv(os.path.join(DATA, 'land-use-palm-oil.csv'))
world_land = land.loc[land['Entity'] == 'World']
oil_palm_fruit = world_land["Crops - Oil palm fruit - 254 - Area harvested - 5312 - ha"]

In [85]:
land_fig = px.line(world_land, x="Year", y=oil_palm_fruit)

# Add figure title
land_fig.update_layout(title_text="<b>Land under Cultivation (Palm Oil)</b>", title_font_size=40, legend_font_size=20, width=1400, height=1000)

# Format x-axis
land_fig.update_xaxes(title_text="<b>Year</b>", title_font=dict(size=30, family='Verdana', color='white'), tickfont=dict(family='Calibri', color='white', size=25))

# Format y-axis (the series is area harvested, in hectares)
land_fig.update_yaxes(title_text="<b>Area harvested (ha)</b>", title_font=dict(size=30, family='Verdana', color='white'), tickfont=dict(family='Calibri', color='white', size=25))

land_fig.show()


# 3) Vegetable Oil Production
- plot 1: production over time by crop. currently a simple line chart. Need to turn it into a stacked area chart
- geo-map plot showing production by country or region over time

In [39]:
vegetable_oil_production = pd.read_csv(os.path.join(DATA, 'vegetable-oil-production.csv'))
year = vegetable_oil_production['Year'].drop_duplicates(keep='first', inplace=False)
vegetable_oil_production.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11064 entries, 0 to 11063
Data columns (total 16 columns):
 #   Column                                                                     Non-Null Count  Dtype  
---  ------                                                                     --------------  -----  
 0   Entity                                                                     11064 non-null  object 
 1   Code                                                                       9323 non-null   object 
 2   Year                                                                       11064 non-null  int64  
 3   Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes          5739 non-null   float64
 4   Crops processed - Oil, sesame - 290 - Production - 5510 - tonnes           4212 non-null   float64
 5   Crops processed - Oil, linseed - 334 - Production - 5510 - tonnes          4387 non-null   float64
 6   Crops processed - Oil, palm - 257 - Production - 5510 

In [40]:
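# Use only the 'World' rows (assumed present, as in palm-oil-production.csv): summing every entity would double-count regional aggregates
world_veg_oil = vegetable_oil_production.loc[vegetable_oil_production['Entity'] == 'World']
veg_oil_yearly_production = world_veg_oil.groupby('Year').sum(numeric_only=True)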
veg_oil_yearly_production.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54 entries, 1961 to 2014
Data columns (total 13 columns):
 #   Column                                                                     Non-Null Count  Dtype  
---  ------                                                                     --------------  -----  
 0   Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes          54 non-null     float64
 1   Crops processed - Oil, sesame - 290 - Production - 5510 - tonnes           54 non-null     float64
 2   Crops processed - Oil, linseed - 334 - Production - 5510 - tonnes          54 non-null     float64
 3   Crops processed - Oil, palm - 257 - Production - 5510 - tonnes             54 non-null     float64
 4   Crops processed - Oil, rapeseed - 271 - Production - 5510 - tonnes         54 non-null     float64
 5   Crops processed - Oil, groundnut - 244 - Production - 5510 - tonnes        54 non-null     float64
 6   Crops processed - Oil, cottonseed - 331 - Production - 

In [41]:
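import re

# Extract the crop name between "Oil, " and the " - <item code>" suffix
pattern = r'(?<=Oil, ).+?(?= - \d)'
cols = [re.search(pattern, c, re.IGNORECASE)[0] for c in veg_oil_yearly_production]
cols = [re.sub(' ', '_', c) for c in cols]   # spaces -> underscores
cols = [re.sub(r'\W', '', c) for c in cols]  # drop punctuation such as commas and parentheses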
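To make the renaming step easier to check, the same three operations can be wrapped in a small helper and run on a few of the column names listed above. The helper name `clean_column_name` is hypothetical; the pattern and substitutions are the ones used in the cell.

```python
import re

PATTERN = r'(?<=Oil, ).+?(?= - \d)'


def clean_column_name(column: str) -> str:
    """Reduce an FAO-style column label to a short snake_case crop name."""
    crop = re.search(PATTERN, column, re.IGNORECASE)[0]  # text between "Oil, " and " - <code>"
    crop = re.sub(' ', '_', crop)                        # spaces -> underscores
    return re.sub(r'\W', '', crop)                       # strip commas, parentheses, etc.


samples = [
    "Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes",
    "Crops processed - Oil, coconut (copra) - 252 - Production - 5510 - tonnes",
    "Crops processed - Oil, olive, virgin - 261 - Production - 5510 - tonnes",
    "Crops processed - Oil, palm kernel - 258 - Production - 5510 - tonnes",
]
cleaned = [clean_column_name(s) for s in samples]
print(cleaned)  # ['soybean', 'coconut_copra', 'olive_virgin', 'palm_kernel']
```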

In [42]:
veg_oil_yearly_production.columns = cols
veg_oil_yearly_production.reset_index(inplace=True)
veg_oil_yearly_production.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54 entries, 0 to 53
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Year           54 non-null     int64  
 1   soybean        54 non-null     float64
 2   sesame         54 non-null     float64
 3   linseed        54 non-null     float64
 4   palm           54 non-null     float64
 5   rapeseed       54 non-null     float64
 6   groundnut      54 non-null     float64
 7   cottonseed     54 non-null     float64
 8   coconut_copra  54 non-null     float64
 9   olive_virgin   54 non-null     float64
 10  safflower      54 non-null     float64
 11  sunflower      54 non-null     float64
 12  maize          54 non-null     float64
 13  palm_kernel    54 non-null     float64
dtypes: float64(13), int64(1)
memory usage: 6.0 KB


In [80]:
veg_oil_prodn_fig = px.area(
    veg_oil_yearly_production,
    x='Year',
    y=veg_oil_yearly_production.columns[1:]
)
veg_oil_prodn_fig.update_traces(textfont_size=16, hovertemplate=None)
veg_oil_prodn_fig.update_layout(hovermode="x")
# Add figure title
veg_oil_prodn_fig.update_layout(title_text="<b>Vegetable Oil Production</b>", title_font_size=40, legend_font_size=20, width=1800, height=1400)
# Format x-axis
veg_oil_prodn_fig.update_xaxes(title_text="<b>Year</b>", title_font=dict(size=30, family='Verdana', color='white'), tickfont=dict(family='Calibri', color='white', size=25))
# Format y-axis (values are production in tonnes across all oil crops)
veg_oil_prodn_fig.update_yaxes(title_text="<b>Production (tonnes)</b>", title_font=dict(size=30, family='Verdana', color='white'), tickfont=dict(family='Calibri', color='white', size=25))
veg_oil_prodn_fig.show()

Demand has continued to grow for edible oils across the board. Palm oil is now the single largest source of edible oils, followed by soy and rapeseed (canola). The historical long-run trends suggest no massive changes to supply or demand.

There have, however, been a number of "shocks" that have affected the oil market:
- pandemic
- weather
- war in ukraine



# Who uses palm oil and what is it used for?

Why has the market for palm oil – and vegetable oils more broadly – increased so rapidly? What is it used for?

Palm oil is a versatile product which is used in a range of products across the world:
- Foods: over two-thirds (68%) is used in foods ranging from margarine to chocolate, pizzas, breads and cooking oils;
- Industrial applications: 27% is used in industrial applications and consumer products such as soaps, detergents, cosmetics and cleaning agents;
- Bioenergy: 5% is used as biofuels for transport, electricity or heat.

While food products dominate globally, this breakdown varies from country to country. Some countries use much more palm oil for biofuels than others. In Germany, for example, bioenergy is the largest use, accounting for 41% (more than food at 40%). A push towards increased biofuel consumption in the transport sector has been driving this, despite palm biodiesel being worse for the environment than normal diesel.

In the next section we will look at what countries produce palm oil, but here we see a map of palm oil imports. Although production is focused in only a few countries across the tropical belt, we see that palm oil is an important product across the world.


In [44]:
oil_yield = pd.read_csv(os.path.join(DATA, "oil-yield-by-crop.csv"))
oil_yield.head()

Unnamed: 0,Entity,Code,Year,Oil yield (t/ha)
0,Coconut Oil,,2018,0.257735
1,Cottonseed Oil,,2018,0.141688
2,Groundnut Oil,,2018,0.183447
3,Olive Oil,,2018,0.339773
4,Palm Oil,,2018,2.835088


# Where is palm oil grown?

Oil palm is a tropical plant species. It thrives on high rainfall, adequate sunlight and humid conditions – this means the best growing areas are along a narrow band around the equator. Palm oil is therefore grown in many countries across Africa, South America, and Southeast Asia. In the map we see the distribution of production across the world.

Small amounts of palm oil are grown in many countries, but the global market is dominated by only two: Indonesia and Malaysia. In 2018, the world produced 72 million tonnes of palm oil. Indonesia accounted for 57% of this (41 million tonnes), and Malaysia produced 27% (20 million tonnes).

84% of global palm oil production comes from Indonesia and Malaysia.

In the chart we see the production of the palm oil plant across a number of countries. Other producers include Thailand, Colombia, Nigeria, Guatemala, and Ecuador. As we’d expect, all of these countries lie along the zone of ‘optimal conditions’ around the equator.

# 4) Price

In [45]:
price = pd.read_csv(os.path.join(DATA, 'palm oil prices 021020 - 290422.csv'))
price.head()

Unnamed: 0,DAILY PRICES,Palm olein RBD Mal FOB US$
0,02/01/20,790.0
1,03/01/20,790.0
2,04/01/20,
3,05/01/20,
4,06/01/20,785.0


I need to find more data, ideally daily price data going back to 1961.

In [46]:
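# Parse the dd/mm/yy date strings so the x-axis is a true time axis
price['DAILY PRICES'] = pd.to_datetime(price['DAILY PRICES'], format='%d/%m/%y')
fig = px.line(price, x='DAILY PRICES', y='Palm olein RBD Mal FOB US$', title='Palm Olein Price')
fig.show()
# To-do: format this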

In [47]:
price.shape

(849, 2)
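The sample rows show missing prices on 04/01/20 and 05/01/20, which fall on a weekend, so the series appears to carry one row per calendar day with gaps on non-trading days. A minimal cleaning sketch follows, assuming dd/mm/yy dates and that carrying the last quoted price forward is acceptable for non-trading days; the `price_ffill` column name is hypothetical and the inline CSV simply re-types the five rows shown above.

```python
import io

import pandas as pd

# The five rows shown by price.head(), re-typed as an inline CSV for illustration
sample = io.StringIO(
    "DAILY PRICES,Palm olein RBD Mal FOB US$\n"
    "02/01/20,790.0\n"
    "03/01/20,790.0\n"
    "04/01/20,\n"
    "05/01/20,\n"
    "06/01/20,785.0\n"
)
prices = pd.read_csv(sample)

# Parse dd/mm/yy strings into real timestamps and use them as a sorted index
prices["DAILY PRICES"] = pd.to_datetime(prices["DAILY PRICES"], format="%d/%m/%y")
prices = prices.set_index("DAILY PRICES").sort_index()

# Carry the last quoted price forward over days with no quote (assumption)
prices["price_ffill"] = prices["Palm olein RBD Mal FOB US$"].ffill()
print(prices)
```

Forward-filling keeps the calendar-day index intact, which makes 30-, 60- and 90-day horizons easy to express; dropping the empty rows instead would give a trading-day index.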

In [48]:
# line chart horizontal, no hover text

# 5) Export Volumes
Export volumes are a very good way to look at total volumes. Most palm oil is exported, as evidenced by comparing the importer country vs the

In [49]:
# stacked line chart with hover text

# 6) Key Import Markets

In [50]:
imports = pd.read_csv(os.path.join(DATA, 'palm-oil-imports.csv'))
imports.head()

Unnamed: 0,Entity,Code,Year,New Food Balances - Palm Oil - 2577 - Import Quantity - 5611 - 1000 tonnes
0,Afghanistan,AFG,2014,145000
1,Afghanistan,AFG,2015,153000
2,Afghanistan,AFG,2016,161000
3,Afghanistan,AFG,2017,175000
4,Albania,ALB,2014,1000


In [51]:
imports.shape

(688, 4)

In [52]:
imports = pd.read_csv(os.path.join(DATA, 'vegetable-oil-production.csv'))
# imports.head()
year = imports['Year']
imports = imports.groupby(year).sum()
imports

Unnamed: 0_level_0,"Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes","Crops processed - Oil, sesame - 290 - Production - 5510 - tonnes","Crops processed - Oil, linseed - 334 - Production - 5510 - tonnes","Crops processed - Oil, palm - 257 - Production - 5510 - tonnes","Crops processed - Oil, rapeseed - 271 - Production - 5510 - tonnes","Crops processed - Oil, groundnut - 244 - Production - 5510 - tonnes","Crops processed - Oil, cottonseed - 331 - Production - 5510 - tonnes","Crops processed - Oil, coconut (copra) - 252 - Production - 5510 - tonnes","Crops processed - Oil, olive, virgin - 261 - Production - 5510 - tonnes","Crops processed - Oil, safflower - 281 - Production - 5510 - tonnes","Crops processed - Oil, sunflower - 268 - Production - 5510 - tonnes","Crops processed - Oil, maize - 60 - Production - 5510 - tonnes","Crops processed - Oil, palm kernel - 258 - Production - 5510 - tonnes"
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
1961,12563566.0,1762131.0,3605247.0,6708734.0,4840867.0,11234153.0,9230397.0,7354670.0,6608670.0,388947.0,8131734.0,1465322.0,2392785.0
1962,13777553.0,2090167.0,4059024.0,6712690.0,5174853.0,11774274.0,9653376.0,8749060.0,4533310.0,641181.0,9552219.0,1540017.0,2378154.0
1963,14725863.0,2103291.0,3896389.0,6947784.0,4961239.0,12955374.0,10483243.0,8475221.0,8856432.0,635618.0,10056050.0,1584599.0,2199279.0
1964,14743503.0,2117241.0,3952174.0,7076191.0,4836472.0,13222081.0,11163961.0,8165172.0,4481156.0,516356.0,9881294.0,1669078.0,2396552.0
1965,15993252.0,2155306.0,4375128.0,7011976.0,6780248.0,12167658.0,11763107.0,8443615.0,6078829.0,524554.0,12519604.0,1831634.0,2469792.0
1966,18080003.0,2036072.0,3965364.0,7395521.0,6757745.0,12484300.0,11877619.0,9289872.0,6190790.0,748605.0,12591332.0,1852351.0,2441664.0
1967,19360549.0,2182103.0,3504699.0,7486559.0,7332706.0,13346223.0,10318935.0,8741430.0,6608902.0,766063.0,14621180.0,1883002.0,2044249.0
1968,19411049.0,2362568.0,3480347.0,8210069.0,8080160.0,12505530.0,10198331.0,8229023.0,7262915.0,528304.0,15323309.0,1892525.0,2127513.0
1969,21180907.0,2284582.0,3543701.0,8731743.0,7641357.0,12834120.0,11136005.0,8099855.0,6294838.0,578457.0,15288671.0,1987756.0,2340698.0
1970,26414015.0,2786629.0,4423724.0,8703692.0,8332219.0,14398952.0,10847887.0,8320713.0,6841166.0,648149.0,15321150.0,2080306.0,2337788.0


In [53]:
# stacked area plot

In [54]:
# world heat map

# 7) Palm Oil Production Volumes by Country & over time

In [55]:
production_time = pd.read_csv(os.path.join(DATA, 'palm-oil-production.csv'))
production_time.head()

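# Note: this sums every entity in the file, including 'World' and regional aggregates,
# so the yearly totals are several times the 'World' series
year = production_time['Year']
production_time = production_time.groupby(year).sum()
production_time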

Unnamed: 0_level_0,"Crops - Oil, palm - 257 - Production - 5510 - tonnes"
Year,Unnamed: 1_level_1
1961,7121016
1962,7131496
1963,7389088
1964,7508695
1965,7418087
1966,7784970
1967,7936266
1968,8713890
1969,9248339
1970,9242471


In [56]:
production_time.shape

(58, 1)

In [57]:
# world heat map of producers
# line chart with drop box for country

# 8) Land use for vegetable oil crops

The amount of land under cultivation is a key driver of production volumes for agricultural products.

How has the world achieved such a rapid expansion of palm oil production? There are only two ways in which we can produce more of a given crop:
- increase yields (growing more on a given amount of land) or
- expand the amount of land we use to grow it.

Global palm yields have increased over time, but far short of the increase in demand. This means that over the last 50 years the amount of land devoted to growing palm oil has increased and is the main driver of increased production. In the chart here we see the change in land use. Since 1980 the amount of land the world uses to grow palm has more than quadrupled, from 4 million to 19 million hectares in 2018. Indonesia and Malaysia account for 63% of global land use for palm. This is low when we consider that the two countries account for 84% of **production**; the difference arises because both achieve high yields.

19 million hectares might sound like a lot of land. But we should consider this in the context of all land used to grow oil crops. The world devotes more than 300 million hectares for oilcrop production. Palm oil accounts for 6% of this land use, which is small when we consider that it produces 36% of the oil.


In [58]:
crops = pd.read_csv(os.path.join(DATA, 'land-use-for-vegetable-oil-crops.csv'))
crops.head()

Unnamed: 0,Entity,Code,Year,Crops - Olives - 260 - Area harvested - 5312 - ha,Crops - Rapeseed - 270 - Area harvested - 5312 - ha,Crops - Soybeans - 236 - Area harvested - 5312 - ha,Crops - Safflower seed - 280 - Area harvested - 5312 - ha,Crops - Sunflower seed - 267 - Area harvested - 5312 - ha,Crops - Oil palm fruit - 254 - Area harvested - 5312 - ha,Crops - Coconuts - 249 - Area harvested - 5312 - ha,"Crops - Groundnuts, with shell - 242 - Area harvested - 5312 - ha",Crops - Linseed - 333 - Area harvested - 5312 - ha,Crops - Seed cotton - 328 - Area harvested - 5312 - ha,Crops - Sesame seed - 289 - Area harvested - 5312 - ha,Crops - Castor oil seed - 265 - Area harvested - 5312 - ha,Crops - Karite nuts (sheanuts) - 263 - Area harvested - 5312 - ha,Crops - Tung nuts - 275 - Area harvested - 5312 - ha
0,Afghanistan,AFG,1961,600.0,,,,8300.0,,,,46800.0,76892.0,33200.0,,,
1,Afghanistan,AFG,1962,600.0,,,,8300.0,,,,46800.0,91056.0,33200.0,,,
2,Afghanistan,AFG,1963,600.0,,,,8300.0,,,,46800.0,121408.0,33200.0,,,
3,Afghanistan,AFG,1964,600.0,,,,8300.0,,,,46800.0,121408.0,33200.0,,,
4,Afghanistan,AFG,1965,600.0,,,,8300.0,,,,46800.0,80939.0,33200.0,,,


In [59]:
crops.shape

(12191, 17)

# stacked line chart

# Palm oil versus the alternatives

Palm oil has been an important driver of deforestation. But would the alternatives have fared any better?

There are several reasons why palm oil has been the favored crop to meet growing demand for vegetable oils. First, it has the lowest production costs. Second, its composition makes it versatile: it can be used for food and non-food purposes alike, whereas some oils are not suited to uses such as cosmetics, shampoos and detergents. Third, it achieves very high yields.

If we weren’t meeting global oil demand through palm oil, another oilcrop would have to take its place. Would the alternatives be any better for the environment?

We can compare crops in terms of their yields – how much oil we can produce from one hectare of land. This comparison is shown in the chart. Palm oil stands out immediately. It achieves a much higher yield than the alternatives. From each hectare of land, you can produce about 2.8 tonnes of palm oil. That’s around four times higher than alternatives such as sunflower or rapeseed oil (where you get about 0.7 tonnes per hectare); and 10 to 15 times higher than popular alternatives such as coconut or groundnut oil (where you get 0.2 tonnes per hectare).



Let’s take a look at how this comparison affects the global landscape of oilcrops in terms of production and land use. In the chart we see the breakdown of global vegetable oil production in 2018. On the left we have each crop’s share of global land use for vegetable oils; on the right we have its share of production.

We know from our yield comparison that palm oil achieves a much higher yield. What this means is that it accounts for a very high share of oil production without taking up much land. In 2017 it produced 36% of our vegetable oil, but took up only 8.6% of the land.

Sunflower oil was almost exactly proportional in terms of how much oil it produced relative to how much land it took up: it produced 9% of oil, and required 8.3% of land. Rape and mustardseed oil were also in proportion. The rest – soybean, olive, coconut, groundnut, and sesameseed – used more land than they gave back in oil production. Coconut oil, for example, provided only 1.4% of global oil but required 3.6% of the land.
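The comparison in the last two paragraphs comes down to one ratio per crop: share of global oil production divided by share of oil-crop land. A value above 1 means the crop gives back more oil than its share of land. The sketch below uses only the three pairs of figures quoted in the text; the variable names are illustrative.

```python
# (share of global vegetable oil production %, share of oil-crop land %), as quoted in the text
shares = {
    "palm": (36.0, 8.6),
    "sunflower": (9.0, 8.3),
    "coconut": (1.4, 3.6),
}

# Ratio > 1: more oil than land share; ratio < 1: more land than oil share
ratios = {crop: prod / land for crop, (prod, land) in shares.items()}

for crop, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{crop:10s} {ratio:.2f}")
```

Palm oil comes out at a little over 4, sunflower close to 1, and coconut well below 1, matching the ordering described above.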

It’s of course true that some crops provide co-products in the process. The non-oil fraction of soybeans, for example, can be allocated to other uses such as high-protein animal feed. Therefore using this land to grow the crop is meeting other food demand at the same time. But this doesn’t change the fact that if the world requires a given amount of vegetable oil, it is the oil yield per hectare of each crop that we care about – regardless of whether it provides co-products in the process.


Substitutes for palm oil do not always exist. As we’ve discussed, substituting palm oil with alternatives can do more harm than good. But it’s also true that alternative oils are not always suitable for the products we need. Palm oil is unique in its versatility, meaning it is suitable for a range of foods, cosmetics, industrial applications and biofuels. Substitution would be feasible for most food products. Substitution in industrial processes would be more difficult, especially if we want to replace it with oils grown in temperate countries: sunflower or rapeseed oil is not suited to products such as soaps, detergents or cosmetics. One sector where alternatives do exist is bioenergy, which brings us to our next point.

# 9) Total Vegetable Oil Production (substitutes) Over Time
Palm oil prices are affected by the price and supply of substitute vegetable oils.

In [61]:
oil_production = pd.read_csv(os.path.join(DATA, 'vegetable-oil-production.csv'))
oil_production.head()

Unnamed: 0,Entity,Code,Year,"Crops processed - Oil, soybean - 237 - Production - 5510 - tonnes","Crops processed - Oil, sesame - 290 - Production - 5510 - tonnes","Crops processed - Oil, linseed - 334 - Production - 5510 - tonnes","Crops processed - Oil, palm - 257 - Production - 5510 - tonnes","Crops processed - Oil, rapeseed - 271 - Production - 5510 - tonnes","Crops processed - Oil, groundnut - 244 - Production - 5510 - tonnes","Crops processed - Oil, cottonseed - 331 - Production - 5510 - tonnes","Crops processed - Oil, coconut (copra) - 252 - Production - 5510 - tonnes","Crops processed - Oil, olive, virgin - 261 - Production - 5510 - tonnes","Crops processed - Oil, safflower - 281 - Production - 5510 - tonnes","Crops processed - Oil, sunflower - 268 - Production - 5510 - tonnes","Crops processed - Oil, maize - 60 - Production - 5510 - tonnes","Crops processed - Oil, palm kernel - 258 - Production - 5510 - tonnes"
0,Afghanistan,AFG,1961,,2253.0,3531.0,,,,4997.0,,82.0,,2938.0,,
1,Afghanistan,AFG,1962,,1876.0,3701.0,,,,7716.0,,90.0,,3138.0,,
2,Afghanistan,AFG,1963,,1831.0,2857.0,,,,11742.0,,82.0,,3138.0,,
3,Afghanistan,AFG,1964,,2722.0,3377.0,,,,7960.0,,90.0,,3138.0,,
4,Afghanistan,AFG,1965,,2821.0,4327.0,,,,7926.0,,82.0,,3238.0,,


In [62]:
oil_production.shape

(11064, 16)

# 10) Palm oil uses

# 11) Oil yield by crop (2018)

In [63]:
_yield = pd.read_csv(os.path.join(DATA, 'oil-yield-by-crop.csv'))
_yield.head()
# need to pivot this

Unnamed: 0,Entity,Code,Year,Oil yield (t/ha)
0,Coconut Oil,,2018,0.257735
1,Cottonseed Oil,,2018,0.141688
2,Groundnut Oil,,2018,0.183447
3,Olive Oil,,2018,0.339773
4,Palm Oil,,2018,2.835088


In [64]:
_yield.shape

(9, 4)

# Palm oil terms
We're mainly interested in RBD palm olein, but it's worth mentioning some others:
- RBD: refined, bleached and deodorised
- CPO: crude palm oil
- CPKO: crude palm kernel oil

# Incoterms
We're mainly interested in FOB, but it's worth knowing some other Incoterms:
- EXW: ex works
- FAS: free alongside ship
- FOB: free on board
- CFR: cost and freight
- CIF: cost, insurance & freight

# Ports of origin
We will reference 'Malaysia'; in fact, there are two primary ports for exports of palm olein:
- Port Klang
- Kuala Lumpur

The often-quoted 'FOB Malaysia' is an index price, traded on Bursa Malaysia. In reality, prices will differ slightly between the two ports.

# Data Sources
- https://agropost.wordpress.com/
  - This has a lot of great market data going back to 2010. However, it doesn't seem to offer the data as CSV.
- https://www.fao.org/home/en
  - FAO: the Food and Agriculture Organization, part of the UN. Good data, high quality and easy to download, but lacking in granularity.
  - It's meant to have a Python API, but it's not clear whether it is maintained. I am struggling to make a connection.

# Data Pipeline
A software automation solution to move data around: host the data in a data warehouse and move it to where it can be used.

http://tmgmdashboarding-env-1.eba-srqgw9ij.ap-southeast-2.elasticbeanstalk.com/

- data storage
- ETL
- preprocess data
- seasonality
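For the seasonality step, one possible approach (in line with the Fourier transform idea in the goals) is to look for the strongest cycle in the frequency spectrum. The sketch below is an assumption about how that could be done, not a settled design: the function name is hypothetical and the input is a synthetic monthly series with a known 12-month cycle. A real price series would need its trend removed first, otherwise the low frequencies dominate.

```python
import numpy as np


def dominant_period(series):
    """Return the period (in samples) of the strongest non-zero frequency."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                        # remove the mean so frequency 0 does not dominate
    spectrum = np.abs(np.fft.rfft(x))       # magnitude of each frequency component
    freqs = np.fft.rfftfreq(len(x), d=1.0)  # cycles per sample
    k = int(np.argmax(spectrum[1:])) + 1    # skip the zero-frequency bin
    return 1.0 / freqs[k]


# Synthetic monthly series: 10 years with a pure 12-month cycle (illustration only)
months = np.arange(120)
series = 100 + 10 * np.sin(2 * np.pi * months / 12)

print(dominant_period(series))
```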