# 1. Introduction: Business Goal & Problem Definition

The goal of this project is to perform an exploratory data analysis (EDA) of worldwide commodity prices from 1980 to 2016, helping investors and companies gain visibility into how commodity prices have evolved over time. The dataset is summarized through several visual methods to highlight its main characteristics, and it covers 53 different commodities. See the conclusions in the last section.


# 2. Importing Basic Libraries

In [None]:
!pip install openpyxl  # Excel writer engine used by DataFrame.to_excel
import openpyxl
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# 3. Data Collection

In [None]:
commodity_ds = pd.read_csv("../input/usa-commodity-prices/commodity-prices-2016.csv", sep=",")

commodity_ds

# 4. Data Preliminary Exploration

In [None]:
#Checking a dataset sample

pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 100)
pd.options.display.float_format="{:,.2f}".format

commodity_ds.sample(n=10, random_state=0)

In [None]:
#Checking dataset info by feature

commodity_ds.info(verbose=True, show_counts=True)

In [None]:
#Counting zeros per feature (column) and exporting the counts to Excel

(commodity_ds==0).sum(axis=0).to_excel("commodity_ds_zeros_per_feature.xlsx")
(commodity_ds==0).sum(axis=0)
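Since the data preparation step states that there are no missing or zero values, the zero count can be paired with a missing-value count in a single table. A minimal sketch on a hypothetical toy frame (the column names mirror the dataset, but the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for commodity_ds (values are made up)
toy = pd.DataFrame({
    "Date": ["1980-01-01", "1980-02-01", "1980-03-01"],
    "Copper": [2500.0, 0.0, 2600.0],
    "Tin": [17000.0, np.nan, 16900.0],
})

# One row per feature: how many zeros and how many missing values it holds
# (NaN is not equal to 0, so the two counts never overlap)
quality = pd.DataFrame({
    "zeros": (toy == 0).sum(axis=0),
    "missing": toy.isna().sum(axis=0),
})
print(quality)
```

On the real data, `toy` would be replaced by `commodity_ds`.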

In [None]:
#Checking the existence of duplicated rows

commodity_ds.duplicated().sum()
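`duplicated()` only flags rows that are identical in every column. In a monthly price series, a repeated date with slightly different prices would slip through, so it may also be worth checking the date column on its own. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: the second and third rows share a date but differ in price
toy = pd.DataFrame({
    "Date": ["1980-01-01", "1980-02-01", "1980-02-01"],
    "Copper": [2500.0, 2550.0, 2560.0],
})

full_row_dups = toy.duplicated().sum()             # rows identical in every column
date_dups = toy.duplicated(subset=["Date"]).sum()  # repeated dates only
print(full_row_dups, date_dups)  # 0 1
```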

In [None]:
#Checking basic statistical data by feature

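describe = commodity_ds.describe(include="all")
#Extra row: standard deviation as a percentage of the mean (std / mean) for each feature
std_percentage = (describe.loc["std"] / describe.loc["mean"]).to_frame("std/mean").T
describe_with_percentage = pd.concat([describe, std_percentage])
describe_with_percentage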
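The extra row divides each feature's standard deviation by its mean, i.e. the coefficient of variation. Because it is scale-free, it lets us compare the dispersion of commodities quoted in very different units. A small sketch with hypothetical toy prices on two very different scales:

```python
import pandas as pd

# Hypothetical toy prices: "expensive" is "cheap" multiplied by 1,000
prices = pd.DataFrame({
    "cheap": [1.0, 2.0, 3.0],
    "expensive": [1000.0, 2000.0, 3000.0],
})

# pandas .std() is the sample standard deviation (ddof=1), the same one describe() reports
cv = prices.std() / prices.mean()
print(cv)  # both columns: 0.5
```

Both series get the same coefficient of variation (0.5), even though their standard deviations differ by a factor of 1,000.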

# 5. Data Preparation

    We'll perform the following:
    
    1. Remove the consolidated index columns; in this exercise we'll analyze every product individually


    2. Convert "Date" to the datetime data type
        2.1 Create a "Month/Year" column for plotting
    
    
    * No duplicates found
    * No missing, zero or invalid values to treat
    * No calculated columns to create
    * No outliers found
    * The entire dataset will be used

In [None]:
#1 Drop the consolidated index columns; every product is analyzed individually

commodity_ds.drop(["All Commodity Price Index", "Non-Fuel Price Index", "Food and Beverage Price Index", "Food Price Index", 
                   "Beverage Price Index", "Industrial Inputs Price Index", "Agricultural Raw Materials Index",
                   "Metals Price Index", "Fuel Energy Index", "Crude Oil petroleum"], axis=1, inplace=True)

#2 Parse "Date" as datetime and build a "Month/Year" label for the plot x-axes

commodity_ds["Date"] = pd.to_datetime(commodity_ds["Date"])
commodity_ds["Month/Year"] = commodity_ds["Date"].dt.strftime("%m/%Y")
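The "Month/Year" column is a plain string, so it only stays in the right order while the rows themselves remain chronological. If it is sorted as text, every January comes before every February, regardless of year. A small sketch with hypothetical toy dates compares text order with datetime order:

```python
import pandas as pd

# Hypothetical toy dates spanning a year boundary
toy = pd.DataFrame({"Date": pd.to_datetime(["1980-12-01", "1981-01-01", "1981-02-01"])})
toy["Month/Year"] = toy["Date"].dt.strftime("%m/%Y")

# Sorting the label as text vs. sorting by the underlying datetime
by_text = toy.sort_values("Month/Year")["Month/Year"].tolist()
by_date = toy.sort_values("Date")["Month/Year"].tolist()
print(by_text)  # ['01/1981', '02/1981', '12/1980']
print(by_date)  # ['12/1980', '01/1981', '02/1981']
```

If a continuous time axis is preferred, passing the datetime column (`x="Date"`) to `px.line` is an alternative.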

# 6. Data Exploration

# 6.1.1 Agricultural-Soft Market Prices Over Time - Line Chart

In [None]:
#Things you drink, such as sugar, cocoa, coffee, and orange juice. These are called the softs markets.

fig = px.line(commodity_ds, x="Month/Year", y=["Bananas", "Coffee Other Mild Arabicas", "Coffee Robusta", "Olive Oil", "Oranges",
                                               "Palm oil", "Soybean Oil", "Sugar European import price", "Sugar Free Market",
                                               "Sugar U.S. import price", "Sunflower oil", "Tea"], height=1000, width=2000)

fig.update_layout(title="Agricultural-Soft Market Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")
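Each chart in this section repeats a hand-typed column list. One way to keep the groups in one place is to store them in a dictionary and reshape each group to long format, which plotly express can then colour by product. A sketch assuming a hypothetical `COMMODITY_GROUPS` mapping (only two abbreviated groups shown) and a hypothetical helper `to_long`:

```python
import pandas as pd

# Hypothetical mapping of chart groups to dataset columns (abbreviated)
COMMODITY_GROUPS = {
    "Metal": ["Aluminum", "Copper", "Lead"],
    "Energy": ["Coal", "Oil Dubai"],
}

def to_long(df: pd.DataFrame, group: str) -> pd.DataFrame:
    """Reshape one group's columns into (Month/Year, Product, Price) rows."""
    cols = COMMODITY_GROUPS[group]
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return df.melt(id_vars="Month/Year", value_vars=cols,
                   var_name="Product", value_name="Price")

# Toy frame standing in for commodity_ds (values are made up)
toy = pd.DataFrame({
    "Month/Year": ["01/1980", "02/1980"],
    "Aluminum": [1700.0, 1750.0], "Copper": [2600.0, 2650.0], "Lead": [1000.0, 990.0],
    "Coal": [40.0, 41.0], "Oil Dubai": [30.0, 31.0],
})
long_metal = to_long(toy, "Metal")
print(long_metal.shape)  # (6, 3)
```

With the real data, a chart would then be `px.line(to_long(commodity_ds, "Metal"), x="Month/Year", y="Price", color="Product")`.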

# 6.1.2 Agricultural-Soft Market Prices Over Time - Histogram

In [None]:
fig = px.histogram(commodity_ds, x="Month/Year", y=["Bananas", "Coffee Other Mild Arabicas", "Coffee Robusta", "Olive Oil", "Oranges",
                                                    "Palm oil", "Soybean Oil", "Sugar European import price", "Sugar Free Market",
                                                    "Sugar U.S. import price", "Sunflower oil", "Tea"], height=1000, width=2000)

fig.update_layout(title="Agricultural-Soft Market Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")
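As we understand the plotly express defaults, when both `x` and `y` are given, `px.histogram` aggregates with `histfunc="sum"` and stacks the y columns. Since each "Month/Year" appears only once, each bar is therefore the stacked prices of the selected products for that month, not a distribution of prices. A pandas sketch on hypothetical toy data reproduces what the bar heights represent:

```python
import pandas as pd

# Hypothetical toy prices; each Month/Year appears once, as in commodity_ds
toy = pd.DataFrame({
    "Month/Year": ["01/1980", "02/1980"],
    "Tea": [2.0, 2.5],
    "Rice": [400.0, 410.0],
})
products = ["Tea", "Rice"]

# histfunc="sum": one value per product and month (equal to the price itself here)
per_product = toy.groupby("Month/Year", sort=False)[products].sum()

# Stacked bars: the total height of each month's bar is the sum across products
bar_height = per_product.sum(axis=1)
print(bar_height.tolist())  # [402.0, 412.5]
```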

# 6.1.3 Agricultural-Grains Prices Over Time - Line Chart

In [None]:
#Grains, such as wheat, soybeans, soybean oil, rice, oats, and corn.

fig = px.line(commodity_ds, x="Month/Year", y=["Barley", "Cocoa beans", "Groundnuts peanuts", "Maize corn", "Rice", "Soybean Meal",
                                               "Soybeans", "Wheat"], height=1000, width=2000)

fig.update_layout(title="Agricultural-Grains Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.1.4 Agricultural-Grains Prices Over Time - Histogram

In [None]:
fig = px.histogram(commodity_ds, x="Month/Year", y=["Barley", "Cocoa beans", "Groundnuts peanuts", "Maize corn", "Rice", "Soybean Meal",
                                                    "Soybeans", "Wheat"], height=1000, width=2000)

fig.update_layout(title="Agricultural-Grains Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.1.5 Agricultural-Non-Edible Prices Over Time - Line Chart

In [None]:
#Things you wouldn't eat, such as cotton and lumber.

fig = px.line(commodity_ds, x="Month/Year", y=["Cotton", "Soft Logs", "Hard Logs", "Rubber", "Soft Sawnwood"], 
                                               height=1000, width=2000)

fig.update_layout(title="Agricultural-Non-Edible Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.1.6 Agricultural-Non-Edible Prices Over Time - Histogram

In [None]:
fig = px.histogram(commodity_ds, x="Month/Year", y=["Cotton", "Soft Logs", "Hard Logs", "Rubber", "Soft Sawnwood"], 
                                                    height=1000, width=2000)

fig.update_layout(title="Agricultural-Non-Edible Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.2.1 Livestock Prices Over Time - Line Chart

In [None]:
#Domesticated animals raised in an agricultural setting to produce labor and commodities such as meat, eggs, milk, fur, leather, and wool.

fig = px.line(commodity_ds, x="Month/Year", y=["Beef", "Fishmeal", "Hides", "Lamb", "Swine - pork", "Poultry chicken",
                                               "Fish salmon", "Hard Sawnwood","Shrimp", "Wool coarse", "Wool fine"],
                                               height=1000, width=2000)

fig.update_layout(title="Livestock Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.2.2 Livestock Prices Over Time - Histogram

In [None]:
fig = px.histogram(commodity_ds, x="Month/Year", y=["Beef", "Fishmeal", "Hides", "Lamb", "Swine - pork", "Poultry chicken",
                                                    "Fish salmon", "Hard Sawnwood","Shrimp", "Wool coarse", "Wool fine"],
                                                    height=1000, width=2000)

fig.update_layout(title="Livestock Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.3.1 Metal Prices Over Time - Line Chart

In [None]:
#Metals include mined commodities, such as gold, copper, silver, and platinum.

fig = px.line(commodity_ds, x="Month/Year", y=["Aluminum", "Copper", "China import Iron Ore Fines 62% FE spot", "Lead",
                                               "Nickel", "Tin", "Uranium", "Zinc"], height=1000, width=2000)

fig.update_layout(title="Metal Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.3.2 Metal Prices Over Time - Histogram

In [None]:
fig = px.histogram(commodity_ds, x="Month/Year", y=["Aluminum", "Copper", "China import Iron Ore Fines 62% FE spot", "Lead",
                                                    "Nickel", "Tin", "Uranium", "Zinc"], height=1000, width=2000)

fig.update_layout(title="Metal Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.4.1 Energy Prices Over Time - Line Chart

In [None]:
#The energy category includes crude oil, RBOB gasoline, natural gas, and heating oil.

fig = px.line(commodity_ds, x="Month/Year", y=["Coal", "Rapeseed oil", "Natural Gas - Russian Natural Gas border price in Germany",
                                               "Natural Gas - Indonesian Liquefied Natural Gas in Japan", "Natural Gas - Spot price at the Henry Hub terminal in Louisiana",
                                               "Crude Oil - petroleum-simple average of three spot prices", "Crude Oil - petroleum - Dated Brent light blend",
                                               "Oil Dubai", "Crude Oil petroleum - West Texas Intermediate 40 API"], height=1000, width=2000)

fig.update_layout(title="Energy Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

# 6.4.2 Energy Prices Over Time - Histogram

In [None]:
fig = px.histogram(commodity_ds, x="Month/Year", y=["Coal", "Rapeseed oil", "Natural Gas - Russian Natural Gas border price in Germany",
                                                    "Natural Gas - Indonesian Liquefied Natural Gas in Japan", "Natural Gas - Spot price at the Henry Hub terminal in Louisiana",
                                                    "Crude Oil - petroleum-simple average of three spot prices", "Crude Oil - petroleum - Dated Brent light blend",
                                                    "Oil Dubai", "Crude Oil petroleum - West Texas Intermediate 40 API"], height=1000, width=2000)

fig.update_layout(title="Energy Prices", xaxis_title="Month/Year", yaxis_title="Price (USD)", legend_title="Product")

In [None]:
#Alternatively, use a profile report to see per-variable statistics and correlations (pandas-profiling has since been renamed ydata-profiling)

# from pandas_profiling import ProfileReport
# profile = ProfileReport(commodity_ds, title="Worldwide Commodity Prices EDA")
# profile.to_file(output_file="Worldwide Commodity Prices EDA.html")

# 7. Correlations Analysis

# 7.1 Agricultural

In [None]:
#Plotting a Heatmap

sns.set(font_scale=1)

fig, ax = plt.subplots(1, figsize=(20,20))
sns.heatmap(commodity_ds[["Bananas", "Coffee Other Mild Arabicas", "Coffee Robusta", "Olive Oil", "Oranges", "Palm oil",
                          "Soybean Oil", "Sugar European import price", "Sugar Free Market", "Sugar U.S. import price",
                          "Sunflower oil", "Tea", "Barley", "Cocoa beans", "Groundnuts peanuts", "Maize corn", "Rice",
                          "Soybean Meal", "Soybeans", "Wheat", "Cotton", "Soft Logs", "Hard Logs", "Rubber",
                          "Soft Sawnwood"]].corr(), annot=True, fmt=",.2f")
plt.title("Agricultural Heatmap Correlation", fontsize=20)
plt.tick_params(labelsize=12)
plt.xticks(rotation=90)
plt.yticks(rotation=45)

#Exporting the sorted correlation list to Excel
commodity_ds[["Bananas", "Coffee Other Mild Arabicas", "Coffee Robusta", "Olive Oil", "Oranges", "Palm oil",
                          "Soybean Oil", "Sugar European import price", "Sugar Free Market", "Sugar U.S. import price",
                          "Sunflower oil", "Tea", "Barley", "Cocoa beans", "Groundnuts peanuts", "Maize corn", "Rice",
                          "Soybean Meal", "Soybeans", "Wheat", "Cotton", "Soft Logs", "Hard Logs", "Rubber",
                          "Soft Sawnwood"]].corr().unstack().sort_values().to_excel("agricultural_corr.xlsx")


#Plotting a Pairplot

sns.pairplot(commodity_ds[["Bananas", "Coffee Other Mild Arabicas", "Coffee Robusta", "Olive Oil", "Oranges", "Palm oil",
                           "Soybean Oil", "Sugar European import price", "Sugar Free Market", "Sugar U.S. import price",
                           "Sunflower oil", "Tea", "Barley", "Cocoa beans", "Groundnuts peanuts", "Maize corn", "Rice",
                           "Soybean Meal", "Soybeans", "Wheat", "Cotton", "Soft Logs", "Hard Logs", "Rubber",
                           "Soft Sawnwood"]])

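The list exported from `corr().unstack()` contains each pair twice (A vs B and B vs A), plus every product paired with itself at 1.0. If we keep only the cells above the diagonal, we get one entry per pair, which is how the conclusions report the most and least correlated products. A sketch using a hypothetical helper, `top_pairs`:

```python
import numpy as np
import pandas as pd

def top_pairs(df: pd.DataFrame, cols: list) -> pd.Series:
    """Unique correlation pairs of `cols`, sorted from highest to lowest."""
    corr = df[cols].corr()
    # Boolean mask selecting cells strictly above the diagonal
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # Masked cells become NaN and are dropped, leaving one entry per pair
    return corr.where(upper).unstack().dropna().sort_values(ascending=False)

# Toy data: b rises with a, c falls with a (values are made up)
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                    "b": [2.0, 4.1, 5.9, 8.0],
                    "c": [4.0, 3.0, 2.0, 1.0]})
pairs = top_pairs(toy, ["a", "b", "c"])
print(pairs)
```

On the real data, `top_pairs(commodity_ds, cols).head(3)` and `.tail(3)` would give the strongest and weakest pairs for a group.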
# 7.2 Livestock

In [None]:
#Plotting a Heatmap

sns.set(font_scale=1)

fig, ax = plt.subplots(1, figsize=(20,20))
sns.heatmap(commodity_ds[["Beef", "Fishmeal", "Hides", "Lamb", "Swine - pork", "Poultry chicken", "Fish salmon", "Hard Sawnwood",
                          "Shrimp", "Wool coarse", "Wool fine"]].corr(), annot=True, fmt=",.2f")
plt.title("Livestock Heatmap Correlation", fontsize=20)
plt.tick_params(labelsize=12)
plt.xticks(rotation=90)
plt.yticks(rotation=45)

#Exporting the sorted correlation list to Excel
commodity_ds[["Beef", "Fishmeal", "Hides", "Lamb", "Swine - pork", "Poultry chicken", "Fish salmon", "Hard Sawnwood",
                          "Shrimp", "Wool coarse", "Wool fine"]].corr().unstack().sort_values().to_excel("livestock_corr.xlsx")


#Plotting a Pairplot

sns.pairplot(commodity_ds[["Beef", "Fishmeal", "Hides", "Lamb", "Swine - pork", "Poultry chicken", "Fish salmon", "Hard Sawnwood",
                          "Shrimp", "Wool coarse", "Wool fine"]])

# 7.3 Metal

In [None]:
#Plotting a Heatmap

sns.set(font_scale=1)

fig, ax = plt.subplots(1, figsize=(20,20))
sns.heatmap(commodity_ds[["Aluminum", "Copper", "China import Iron Ore Fines 62% FE spot", "Lead", "Nickel", "Tin", "Uranium",
                          "Zinc"]].corr(), annot=True, fmt=",.2f")
plt.title("Metal Heatmap Correlation", fontsize=20)
plt.tick_params(labelsize=12)
plt.xticks(rotation=90)
plt.yticks(rotation=45)

#Exporting the sorted correlation list to Excel
commodity_ds[["Aluminum", "Copper", "China import Iron Ore Fines 62% FE spot", "Lead", "Nickel", "Tin", "Uranium",
                          "Zinc"]].corr().unstack().sort_values().to_excel("metal_corr.xlsx")


#Plotting a Pairplot

sns.pairplot(commodity_ds[["Aluminum", "Copper", "China import Iron Ore Fines 62% FE spot", "Lead", "Nickel", "Tin", "Uranium",
                           "Zinc"]])

# 7.4 Energy

In [None]:
#Plotting a Heatmap

sns.set(font_scale=1)

fig, ax = plt.subplots(1, figsize=(20,20))
sns.heatmap(commodity_ds[["Coal", "Rapeseed oil", "Natural Gas - Russian Natural Gas border price in Germany",
                          "Natural Gas - Indonesian Liquefied Natural Gas in Japan", "Natural Gas - Spot price at the Henry Hub terminal in Louisiana",
                          "Crude Oil - petroleum-simple average of three spot prices", "Crude Oil - petroleum - Dated Brent light blend",
                          "Oil Dubai", "Crude Oil petroleum - West Texas Intermediate 40 API"]].corr(), annot=True, fmt=",.2f")
plt.title("Energy Heatmap Correlation", fontsize=20)
plt.tick_params(labelsize=12)
plt.xticks(rotation=90)
plt.yticks(rotation=45)

#Exporting the sorted correlation list to Excel
commodity_ds[["Coal", "Rapeseed oil", "Natural Gas - Russian Natural Gas border price in Germany",
                          "Natural Gas - Indonesian Liquefied Natural Gas in Japan", "Natural Gas - Spot price at the Henry Hub terminal in Louisiana",
                          "Crude Oil - petroleum-simple average of three spot prices", "Crude Oil - petroleum - Dated Brent light blend",
                          "Oil Dubai", "Crude Oil petroleum - West Texas Intermediate 40 API"]].corr().unstack().sort_values().to_excel("energy_corr.xlsx")


#Plotting a Pairplot

sns.pairplot(commodity_ds[["Coal", "Rapeseed oil", "Natural Gas - Russian Natural Gas border price in Germany",
                           "Natural Gas - Indonesian Liquefied Natural Gas in Japan", "Natural Gas - Spot price at the Henry Hub terminal in Louisiana",
                           "Crude Oil - petroleum-simple average of three spot prices", "Crude Oil - petroleum - Dated Brent light blend",
                           "Oil Dubai", "Crude Oil petroleum - West Texas Intermediate 40 API"]])

# 8. Conclusions

    
    Initial Considerations:
    
    * Price increases usually happen when there's a product shortage or an increase in demand, following the law of supply and demand. Price devaluation, on the other hand, usually happens when there's an excess of product in the market or a decrease in demand.
    
    * Volatile markets are usually characterized by wide price fluctuations and heavy trading. High volatility often results from an imbalance of trade orders in one direction (for example, all buys and no sells), or from increased speculation by short sellers and institutional investors. It's a riskier market, with the potential for higher profits or losses.
    
    * A positive correlation may indicate that the crops share common price-determining factors. Common price determinants for Soybeans and Corn, for example, are substitutability, demand, biofuels, the value of the U.S. dollar, weather, and crude oil. Conversely, investors looking to build a well-diversified portfolio will often look for assets with negative correlation, so that as some parts of the portfolio fall in price, others tend to rise.
    
    * It's important to understand and discuss these behaviors within the business team, along with what we could expect in the future.
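The diversification argument above follows from the variance of a two-asset portfolio, Var(w1·X + w2·Y) = w1²σ1² + w2²σ2² + 2·w1·w2·ρ·σ1·σ2: the lower the correlation ρ, the lower the combined volatility. A sketch with made-up volatilities (not estimated from this dataset) and a hypothetical helper `portfolio_std`:

```python
import math

def portfolio_std(w1: float, s1: float, s2: float, rho: float) -> float:
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    var = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2 * w1 * w2 * rho * s1 * s2
    return math.sqrt(var)

# Hypothetical: two assets, each with 20% volatility, held 50/50
results = {rho: round(portfolio_std(0.5, 0.20, 0.20, rho), 4) for rho in (0.9, 0.0, -0.5)}
print(results)  # {0.9: 0.1949, 0.0: 0.1414, -0.5: 0.1}
```

Each asset on its own has 20% volatility, but the 50/50 mix drops to about 10% when the correlation is -0.5.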
    
    
    1. Agricultural
        1.1 The most valuable products in 1980 were Cocoa beans at $ 3,167, Olive Oil at $ 2,272 and Groundnuts peanuts at $ 980, while in 2016 they were Olive Oil at $ 4,546, Cocoa beans at $ 2,916 and Groundnuts peanuts at $ 1,830.
        1.2 The largest price increases over the whole historical series were in Bananas at 162%, Soft Sawnwood at 125% and Olive Oil at 100%. The largest price devaluations were in Coffee Robusta at -50%, Sugar European at -26% and Cotton at -25%.
        1.3 The products with the highest standard deviation % were Cocoa beans at 59%, Olive Oil at 35% and Bananas at 8%. On the other hand, the most stable product prices were Sugar European and Sugar Import at 0%, followed by Sugar Free Market at 1%.
        1.4 The most positively correlated prices were Soybeans vs Maize corn at 93%, Barley vs Maize corn at 92% and Barley vs Soybean Oil at 90%. The most negatively correlated prices were Coffee Robusta vs Soft Logs and Soft Sawnwood at -43%, and Coffee Robusta vs Olive Oil at -29%.


    2. Livestock
        2.1 The most valuable products in 1980 were Fishmeal at $ 987, Wool fine at $ 684 and Wool coarse at $ 553, while in 2016 they were Fishmeal at $ 1,456, Wool fine at $ 1,017 and Wool coarse at $ 957.
        2.2 The largest price increases over the whole historical series were in Poultry chicken at 231%, Hard Sawnwood at 162% and Wool coarse at 73%. The largest price devaluations were in Lamb at -18%, Fish salmon at -18% and Shrimp at -17%.
        2.3 The products with the highest standard deviation % were Fishmeal at 50%, Hard Sawnwood at 34% and Wool coarse at 14%. On the other hand, the most stable product prices were Fish salmon and Shrimp at 0%, followed by Poultry chicken at 2%.
        2.4 The most positively correlated prices were Fishmeal vs Wool coarse and Beef vs Fishmeal at 85%, and Hard Sawnwood vs Poultry chicken at 82%. The most negatively correlated prices were Swine - pork vs Hard Sawnwood at -46%, Lamb vs Swine - pork at -40%, and Lamb vs Fish salmon at -35%.


    3. Metal
        3.1 The most valuable products in 1980 were Tin at $ 16,974, Nickel at $ 6,585 and Copper at $ 2,593, while in 2016 they were Tin at $ 15,610, Nickel at $ 6,585 and Copper at $ 2,593.
        3.2 The largest price increases over the whole historical series were in China import Iron Ore Fines 62% FE spot at 280%, Zinc at 121% and Copper at 77%. The largest price devaluations were in Aluminum at -25%, Uranium at -16% and Tin at -8%.
        3.3 The products with the highest standard deviation % were China import Iron Ore Fines 62% FE spot at 123%, Uranium at 82% and Lead at 71%. On the other hand, the most stable product prices were Aluminum at 28%, Zinc at 50% and Tin at 58%.
        3.4 The most positively correlated prices were Copper vs Lead at 94%, Nickel vs Zinc at 89% and Copper vs China import Iron Ore Fines 62% FE spot at 86%. The least correlated pairs were Aluminum vs Tin at 45%, Aluminum vs China import Iron Ore Fines 62% FE spot at 47%, and Tin vs Zinc at 50%.


    4. Energy
        4.1 The most valuable products in 1980 were Rapeseed oil at $ 592, Crude Oil - petroleum - Dated Brent light blend at $ 40 and Coal at $ 39.70, while in 2016 they were Rapeseed oil at $ 779, Coal at $ 55 and Crude Oil - petroleum - Dated Brent light blend at $ 33.
        4.2 The largest price increases over the whole historical series were in Natural Gas - Indonesian Liquefied Natural Gas in Japan at 137%, and in Coal and Natural Gas - Spot price at the Henry Hub terminal in Louisiana at 37%. The largest price devaluations were in Oil Dubai at -22%, Crude Oil petroleum - West Texas Intermediate 40 API at -18% and Crude Oil - petroleum - Dated Brent light blend at -17%.
        4.3 The products with the highest standard deviation % were Oil Dubai at 78%, Crude Oil - petroleum - Dated Brent light blend at 75% and Crude Oil - petroleum-simple average of three spot prices at 74%. On the other hand, the most stable product prices were Rapeseed oil at 44%, Coal at 57% and Natural Gas - Spot price at the Henry Hub terminal in Louisiana at 58%.
        4.4 The most positively correlated prices were Crude Oil - petroleum - Dated Brent light blend vs Crude Oil petroleum - West Texas Intermediate 40 API, and Crude Oil petroleum - West Texas Intermediate 40 API vs Oil Dubai, both at 99%, followed by Natural Gas - Indonesian Liquefied Natural Gas in Japan vs Oil Dubai at 93%. The least correlated pairs were Natural Gas - Spot price at the Henry Hub terminal in Louisiana vs Natural Gas - Indonesian Liquefied Natural Gas in Japan at 24%, Natural Gas - Spot price at the Henry Hub terminal in Louisiana vs Coal at 38% and Natural Gas - Spot price at the Henry Hub terminal in Louisiana vs Oil Dubai at 42%.
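The "most valuable product" and "largest price increase" figures above compare each commodity's first and last observations. The sketch below shows that computation, assuming the change is measured as last / first - 1. It uses a hypothetical helper, `first_last_summary`, on toy data:

```python
import pandas as pd

def first_last_summary(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """First value, last value and % change per column, with rows taken in date order."""
    ordered = df.sort_values("Date")
    first = ordered[cols].iloc[0]
    last = ordered[cols].iloc[-1]
    return pd.DataFrame({
        "first": first,
        "last": last,
        "change_%": (last / first - 1) * 100,
    }).sort_values("change_%", ascending=False)

# Toy data with made-up prices
toy = pd.DataFrame({
    "Date": pd.to_datetime(["1980-01-01", "1998-06-01", "2016-12-01"]),
    "Tin": [17000.0, 8000.0, 15640.0],
    "Zinc": [800.0, 1000.0, 1760.0],
})
summary = first_last_summary(toy, ["Tin", "Zinc"])
print(summary)
```

`summary["first"].nlargest(3)` and `summary["last"].nlargest(3)` would then list the most valuable products at the start and at the end of the series.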
