# Full Analysis of Philippine Food Prices from 2007 to 2025

## Aims:
1. Compare average prices across different regions for the same food item and year.
2. Examine how prices of a specific food item have changed over time within a region.
3. Identify regional price variations for different food items.
4. Predict future food prices based on historical data.

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import shapiro
from scipy.stats import kruskal
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import levene
from ipywidgets import interact
import dash
from dash import dcc, html, Input, Output
import plotly.express as px

In [2]:
#open the file that was already cleaned, analyzed, and saved.
file_path = "../Data/Processed/eda_results.csv"
df = pd.read_csv(file_path)

#### Inspect the data

In [3]:
# View the first few rows of the dataset
print(df.head())

                                 Region Province Food_Items  year  mean  \
0  Autonomous region in Muslim Mindanao  Basilan      beans  2007   NaN   
1  Autonomous region in Muslim Mindanao  Basilan      beans  2008   NaN   
2  Autonomous region in Muslim Mindanao  Basilan      beans  2009   NaN   
3  Autonomous region in Muslim Mindanao  Basilan      beans  2010   NaN   
4  Autonomous region in Muslim Mindanao  Basilan      beans  2011   NaN   

   median  Mode  Range  Variance  Standard Deviation  IQR  
0     NaN   NaN    NaN       NaN                 NaN  NaN  
1     NaN   NaN    NaN       NaN                 NaN  NaN  
2     NaN   NaN    NaN       NaN                 NaN  NaN  
3     NaN   NaN    NaN       NaN                 NaN  NaN  
4     NaN   NaN    NaN       NaN                 NaN  NaN  


In [4]:
# View the columns and data types of the dataset
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91200 entries, 0 to 91199
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Region              91200 non-null  object 
 1   Province            91200 non-null  object 
 2   Food_Items          91200 non-null  object 
 3   year                91200 non-null  int64  
 4   mean                78875 non-null  float64
 5   median              78875 non-null  float64
 6   Mode                36432 non-null  float64
 7   Range               78875 non-null  float64
 8   Variance            76125 non-null  float64
 9   Standard Deviation  76125 non-null  float64
 10  IQR                 78875 non-null  float64
dtypes: float64(7), int64(1), object(3)
memory usage: 7.7+ MB
None


In [5]:
# View summary statistics. These are of limited use here, because each column is already a summary statistic described in the EDA report.
print(df.describe())

               year          mean        median          Mode         Range  \
count  91200.000000  78875.000000  78875.000000  36432.000000  78875.000000   
mean    2016.000000    105.451058    104.863522     53.431795     19.928863   
std        5.477256     78.042087     77.886750     36.845196     29.255316   
min     2007.000000      3.913333      3.910000      4.000000      0.000000   
25%     2011.000000     45.180000     44.880000     21.550000      4.300000   
50%     2016.000000     83.240417     82.675000     50.880000     10.210000   
75%     2021.000000    157.035417    156.402500     82.030000     24.410000   
max     2025.000000    472.291667    470.625000    220.310000    524.850000   

           Variance  Standard Deviation           IQR  
count  76125.000000        76125.000000  78875.000000  
mean     122.240405            6.382062      7.540924  
std      563.636799            9.028331     10.880566  
min        0.000000            0.000000      0.000000  
25%     

In [6]:
#check the shape of the dataset
print(df.shape)

(91200, 11)


In [7]:
# Get the first and last year covered by the dataset
print(df['year'].agg(['min', 'max']))

min    2007
max    2025
Name: year, dtype: int64


For comparing average prices across regions for the same food item and year, the closing price (the items prefixed with `c_`) is generally the best price to use, for three reasons:

1. Consistency: The closing price is considered the most representative of the market's consensus for the period. It accounts for the entire trading session and reflects both supply and demand dynamics over time.
2. Standard Usage: The closing price is the most widely used price in financial markets, so data and analysis typically focus on it.
3. Simplicity: It filters out the noise of intra-day fluctuations by taking the price at the end of the trading session, which is more relevant for long-term comparisons.

In [8]:
# Filter the dataframe to keep only rows where 'Food_Items' starts with 'c_'
df_filtered = df[df['Food_Items'].str.startswith('c_')]

print(df_filtered)

                                     Region         Province  Food_Items  \
19     Autonomous region in Muslim Mindanao          Basilan     c_beans   
20     Autonomous region in Muslim Mindanao          Basilan     c_beans   
21     Autonomous region in Muslim Mindanao          Basilan     c_beans   
22     Autonomous region in Muslim Mindanao          Basilan     c_beans   
23     Autonomous region in Muslim Mindanao          Basilan     c_beans   
...                                     ...              ...         ...   
90302                           Region XIII  Surigao del Sur  c_tomatoes   
90303                           Region XIII  Surigao del Sur  c_tomatoes   
90304                           Region XIII  Surigao del Sur  c_tomatoes   
90305                           Region XIII  Surigao del Sur  c_tomatoes   
90306                           Region XIII  Surigao del Sur  c_tomatoes   

       year       mean  median  Mode  Range   Variance  Standard Deviation  \
19     20
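
As a rough illustration of the first aim, the closing-price rows can be pivoted so that regions sit side by side for each food item and year. This is only a sketch: the `toy` frame and its values are invented stand-ins for `df_filtered`, and averaging the provincial means within a region is an assumption about how regions should be compared.

```python
import pandas as pd

# Toy stand-in for df_filtered; region names and values are invented for illustration.
toy = pd.DataFrame({
    "Region": ["Region I", "Region I", "Region II", "Region II", "Region I", "Region II"],
    "Province": ["A", "B", "C", "D", "A", "C"],
    "Food_Items": ["c_rice"] * 6,
    "year": [2020, 2020, 2020, 2020, 2021, 2021],
    "mean": [40.0, 42.0, 45.0, 47.0, 41.0, 48.0],
})

# Average the provincial means within each region, then put one region per column
# so that each row compares regions for the same food item and year.
regional = toy.pivot_table(
    index=["Food_Items", "year"],
    columns="Region",
    values="mean",
    aggfunc="mean",
)
print(regional)
```

Replacing `toy` with `df_filtered` should give one row per food item and year, with missing combinations left as `NaN`.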

In [36]:
# With 18 regions, one graph per region would be hard to compare, so we plot one graph per food item
# and make it interactive, with a dropdown to select the food item.

# Initialize the Dash app
app = dash.Dash(__name__)

# Dropdown options for food items, removing c_ prefix
food_item_options = [{'label': item.replace('c_', ''), 'value': item} for item in df_filtered['Food_Items'].unique()]

# Dropdown options for mean/median
mean_median_options = [
    {'label': 'mean', 'value': 'mean'},
    {'label': 'median', 'value': 'median'}
]

# App layout
app.layout = html.Div([
    html.H1("Region-wise Food Data Visualization"),
    
    # Dropdown for selecting food item
    dcc.Dropdown(
        id='food_item_dropdown',
        options=food_item_options,
        value=food_item_options[0]['value'],  # Default to the first available food item
        style={'width': '50%'}
    ),
    
    # Dropdown for selecting mean or median
    dcc.Dropdown(
        id='mean_median_dropdown',
        options=mean_median_options,
        value='mean',  # Default selection is 'mean'
        style={'width': '50%'}
    ),
    
    # Graph to display the data
    dcc.Graph(id='region_graph')
])

# Callback to update the graph based on selected food item and mean/median
@app.callback(
    dash.dependencies.Output('region_graph', 'figure'),
    [
        dash.dependencies.Input('food_item_dropdown', 'value'),
        dash.dependencies.Input('mean_median_dropdown', 'value')
    ]
)
def update_graph(selected_food_item, selected_stat):
    # Handle the case where selected_food_item is None
    if not selected_food_item:
        return px.line(title="No food item selected")
    
    # Filter the data based on the selected food item
    filtered_df = df_filtered[df_filtered['Food_Items'] == selected_food_item]
    
    # Remove the 'c_' prefix for display purposes in the title
    display_food_item = selected_food_item.replace('c_', '')
    
    # Find the first and last year in the filtered dataset
    first_year = filtered_df['year'].min()
    last_year = filtered_df['year'].max()

    # Create a plot based on the selected statistic (mean or median)
    fig = px.line(
        filtered_df, 
        x='year', 
        y=selected_stat, 
        color='Region', 
        title=f'{selected_stat.capitalize()} for {display_food_item} by Region and Year',
        labels={selected_stat: selected_stat.capitalize(), 'year': 'Year', 'Region': 'Region'}
    )

    # Extend the x-axis to include the last year
    fig.update_xaxes(
        tickmode='linear',  # Ensure all years are shown on the x-axis
        tick0=first_year,   # Start from the first year
        dtick=1,            # Increment by 1 year
        range=[first_year, last_year + 1]  # Extend the range to include the last year
    )
    
    return fig


# Run the app inline; each Dash app in this notebook uses its own port so the URLs do not collide
if __name__ == '__main__':
    app.run_server(debug=True, port=8050, mode='inline', name="app")


The graph shows the mean or median price of the selected food item over time for each region in the Philippines.
#### Key Observations
1. Trend - Prices of all food items have generally increased over the years across most regions. Some items show fluctuations and periods of stability, but the overall trend is upward.
2. Regional Differences - Prices vary considerably across regions. Some regions consistently have higher prices than others, and some experience more volatile prices than others.

Factors such as weather patterns, government policies, agricultural changes, food transportation, and global market trends can influence these prices.
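
The regional differences noted above are read off the graph. One way they could be checked statistically is a Kruskal-Wallis test, which compares price distributions across regions without assuming normality. The sketch below uses `kruskal` from `scipy.stats`, which the notebook already imports. The `toy` prices are invented, and running the test on one food item at a time is an assumption, not something taken from the analysis above.

```python
import pandas as pd
from scipy.stats import kruskal

# Invented prices for a single food item in three regions (illustration only).
toy = pd.DataFrame({
    "Region": ["Region I"] * 4 + ["Region II"] * 4 + ["Region III"] * 4,
    "Price": [40.1, 41.3, 39.8, 40.6,
              45.2, 46.0, 44.7, 45.5,
              50.3, 49.9, 51.1, 50.6],
})

# Build one array of prices per region and pass them all to the test.
groups = [g["Price"].to_numpy() for _, g in toy.groupby("Region")]
stat, p_value = kruskal(*groups)

# A small p-value suggests that at least one region's prices differ from the others.
print(f"H = {stat:.2f}, p = {p_value:.4f}")
```

The test only indicates that some difference exists. It does not say which regions differ, so a follow-up pairwise comparison would still be needed.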

To find which regions have the highest and lowest price each year for every food item, the maximum and minimum are calculated. The range then shows how wide the difference between the highest and lowest price is.
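
Before building the interactive charts, the same idea can be sketched as a static table. The helper `extremes_by_year` below is hypothetical (it is not part of the notebook), and the `toy` data is invented. It assumes a long-format frame with `Region`, `year`, `Food_Items`, and `Price` columns, like the one produced in the next cell.

```python
import pandas as pd

# Invented long-format prices, shaped like df_range_filtered (illustration only).
toy = pd.DataFrame({
    "Region": ["Region I", "Region II", "Region III"] * 2,
    "year": [2020] * 3 + [2021] * 3,
    "Food_Items": ["rice"] * 6,
    "Price": [40.0, 45.0, 50.0, 42.0, 55.0, 48.0],
})

def extremes_by_year(frame):
    """Return the highest and lowest price, their regions, and the range per item and year."""
    # Drop missing prices so idxmax/idxmin always return a valid row label.
    frame = frame.dropna(subset=["Price"])
    grouped = frame.groupby(["Food_Items", "year"])["Price"]
    cols = ["Food_Items", "year", "Region", "Price"]
    highest = frame.loc[grouped.idxmax(), cols]
    lowest = frame.loc[grouped.idxmin(), cols]
    # Join the two tables so each row holds both extremes for one item and year.
    out = highest.merge(lowest, on=["Food_Items", "year"], suffixes=("_max", "_min"))
    out["Range"] = out["Price_max"] - out["Price_min"]
    return out

summary = extremes_by_year(toy)
print(summary)
```

Ties are resolved by taking the first row with the extreme price, which is how `idxmax` and `idxmin` behave.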

In [23]:
file_path = "../Data/Interim/cleaned_food_prices.csv"
df_range = pd.read_csv(file_path)

#dropping columns related to food price index
df_nofpi_range = df_range.drop(columns=['o_food_price_index', 'h_food_price_index', 'l_food_price_index', 'c_food_price_index', 'inflation_food_price_index', 'trust_food_price_index'])

# Convert 'Date' column to datetime format
df_nofpi_range['Date'] = pd.to_datetime(df_nofpi_range['Date'])

#dropping columns related to inflation
df_noinf_range = df_nofpi_range.drop(columns=['inflation_beans','inflation_cabbage', 'inflation_carrots', 'inflation_eggs', 'inflation_garlic', 'inflation_meat_beef_chops', 'inflation_meat_chicken_whole', 'inflation_meat_pork', 'inflation_onions', 'inflation_potatoes', 'inflation_rice', 'inflation_tomatoes'])

#dropping columns related to trust scores
df_cleaned_range = df_noinf_range.drop(columns=['trust_beans','trust_cabbage', 'trust_carrots', 'trust_eggs', 'trust_garlic', 'trust_meat_beef_chops', 'trust_meat_chicken_whole', 'trust_meat_pork', 'trust_onions', 'trust_potatoes', 'trust_rice', 'trust_tomatoes'])

#dropping unneeded columns
df_unneeded_range = df_cleaned_range.drop(columns=['country', 'City', 'lat', 'lon', 'Province', 'Date', 'month'])

# Reshaping from wide to long format (Region and year are the identifiers)
df_range = df_unneeded_range.melt(id_vars=['Region', 'year'], var_name='Food_Items', value_name='Price')

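# Keep only the closing-price rows; copy so the assignment below does not write to a slice
df_range_filtered = df_range[df_range['Food_Items'].str.startswith('c_')].copy()

# Strip only the leading 'c_' prefix
df_range_filtered['Food_Items'] = df_range_filtered['Food_Items'].str.replace('^c_', '', regex=True)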

print(df_range_filtered)


                                   Region  year Food_Items  Price
354795   Cordillera Administrative region  2007      beans  84.71
354796   Cordillera Administrative region  2007      beans  84.03
354797   Cordillera Administrative region  2007      beans  83.63
354798   Cordillera Administrative region  2007      beans  83.91
354799   Cordillera Administrative region  2007      beans  83.76
...                                   ...   ...        ...    ...
1419175                    Market Average  2024   tomatoes  83.80
1419176                    Market Average  2024   tomatoes  83.34
1419177                    Market Average  2024   tomatoes  87.72
1419178                    Market Average  2024   tomatoes  79.34
1419179                    Market Average  2025   tomatoes  78.71

[283836 rows x 4 columns]


In [38]:
# Dash app setup
apph = dash.Dash(__name__)

apph.layout = html.Div([
    html.H3("Highest Prices Per Year by Region"),
    
    # Dropdown for selecting food items
    dcc.Dropdown(
        id='food-item-dropdown',
        options=[{'label': item, 'value': item} for item in df_range_filtered['Food_Items'].unique()],
        value=df_range_filtered['Food_Items'].unique()[0],
        placeholder="Select a food item"
    ),
    
    # Graph for displaying highest prices
    dcc.Graph(id='highest-price-graph')
])

# Callback to update the graph based on selected food item
@apph.callback(
    Output('highest-price-graph', 'figure'),
    [Input('food-item-dropdown', 'value')]
)
def update_graph(selected_food_item):
    # Filter data for the selected food item
    filtered_dfr = df_range_filtered[df_range_filtered['Food_Items'] == selected_food_item]
    
    # Find the highest price per year and the corresponding region
    highest_prices = (
        filtered_dfr.loc[filtered_dfr.groupby('year')['Price'].idxmax()]
        .reset_index(drop=True)
    )
    
    # Create the bar chart
    fig = px.bar(
        highest_prices,
        x='year',
        y='Price',
        color='Region',  # Highlight the region in the bar color
        title=f'Highest Prices Per Year for {selected_food_item}',
        labels={'Price': 'Price', 'year': 'Year', 'Region': 'Region'}
    )
    
    return fig

# Run the app
if __name__ == '__main__':
    apph.run_server(debug=True, port=8051, mode='inline', name="apph")

In [37]:
# Dash app setup
appl = dash.Dash(__name__)

appl.layout = html.Div([
    html.H3("Lowest Prices Per Year by Region"),
    
    # Dropdown for selecting food items
    dcc.Dropdown(
        id='food-item-dropdown',
        options=[{'label': item, 'value': item} for item in df_range_filtered['Food_Items'].unique()],
        value=df_range_filtered['Food_Items'].unique()[0],
        placeholder="Select a food item"
    ),
    
    # Graph for displaying lowest prices
    dcc.Graph(id='lowest-price-graph')
])

# Callback to update the graph based on selected food item
@appl.callback(
    Output('lowest-price-graph', 'figure'),
    [Input('food-item-dropdown', 'value')]
)
def update_graph(selected_food_item):
    # Filter data for the selected food item
    filtered_dfr = df_range_filtered[df_range_filtered['Food_Items'] == selected_food_item]
    
    # Find the lowest price per year and the corresponding region
    lowest_prices = (
        filtered_dfr.loc[filtered_dfr.groupby('year')['Price'].idxmin()]
        .reset_index(drop=True)
    )
    
    # Create the bar chart
    fig = px.bar(
        lowest_prices,
        x='year',
        y='Price',
        color='Region',  # Highlight the region in the bar color
        title=f'Lowest Prices Per Year for {selected_food_item}',
        labels={'Price': 'Price', 'year': 'Year', 'Region': 'Region'}
    )
    
    return fig

# Run the app, changing port to make it have different URL and not interfere with other app
if __name__ == '__main__':
    appl.run_server(debug=True, port=8052, mode='inline', name="appl")

In [42]:
# Dash App Setup
apprange = dash.Dash(__name__)

apprange.layout = html.Div([
    html.H3("Price Range of Food Items per Year Across Regions"),
    
    # Dropdown for selecting food items
    dcc.Dropdown(
        id='food-item-dropdown',
        options=[{'label': item, 'value': item} for item in df_range_filtered['Food_Items'].unique()],
        value=df_range_filtered['Food_Items'].unique()[0],
        placeholder="Select a food item"
    ),
    
    # Graph for displaying the price range (min and max) per year
    dcc.Graph(id='price-range-graph')
])

# Callback to update the graph based on selected food item
@apprange.callback(
    Output('price-range-graph', 'figure'),
    [Input('food-item-dropdown', 'value')]
)
def update_graph(selected_food_item):
    # Filter data for the selected food item
    filtered_df = df_range_filtered[df_range_filtered['Food_Items'] == selected_food_item]
    
    # Get the minimum and maximum price per year
    price_range = (
        filtered_df.groupby('year')['Price']
        .agg(['min', 'max'])
        .reset_index()
    )
    
    # Create the graph (min and max prices as lines)
    fig = px.line(
        price_range, 
        x='year', 
        y=['min', 'max'], 
        title=f'Price Range (Min & Max) Per Year for {selected_food_item} Across Regions',
        labels={'year': 'Year', 'value': 'Price'},
        line_shape='linear'
    )
    
    return fig

# Run the app
if __name__ == '__main__':
    apprange.run_server(debug=True, port=8053, mode='inline', name="apprange")

Key Observations:
1. Overall Trend - Both the minimum and maximum prices show a general upward trend over the years, indicating an overall increase in the price of every food item. Some items recorded their highest price increase in 2020, likely due to the COVID-19 pandemic and its impact on supply chains.
2. Price Range - The gap between the minimum and maximum prices for most items widens markedly in 2020 and 2021, which suggests greater price variation across regions during these years.
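
The widening gap described above could be quantified rather than read off the chart. The following is a minimal sketch with invented numbers: it computes the yearly gap between the maximum and minimum price and its year-on-year change. The column names `gap` and `gap_change` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Invented prices for one food item across regions (illustration only).
toy = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2021, 2021],
    "Price": [40.0, 44.0, 41.0, 52.0, 43.0, 58.0],
})

# Minimum and maximum price per year, as in the price-range chart.
spread = toy.groupby("year")["Price"].agg(["min", "max"])

# Gap between the extremes, and how much that gap changed from the previous year.
spread["gap"] = spread["max"] - spread["min"]
spread["gap_change"] = spread["gap"].diff()

widest_year = spread["gap"].idxmax()               # year with the largest gap
biggest_jump_year = spread["gap_change"].idxmax()  # year in which the gap grew the most
print(spread)
print(widest_year, biggest_jump_year)
```

Applied to one food item from `df_range_filtered`, the same steps would show whether 2020 and 2021 really have the widest gaps for that item.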

Possible causes:
1. COVID-19 Pandemic - The pandemic disrupted supply chains and increased demand for certain food items, which could have led to price volatility.
2. Regional Differences - Differences in the impact of the pandemic on regional economies and agricultural production could have contributed to price variations.