<a href="https://colab.research.google.com/github/pratikagithub/All-About-Data-Analyst/blob/main/Consumer_Price_Index_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Consumer Price Index (CPI) Analysis involves tracking the average price change over time for a basket of goods and services typically consumed by households. It serves as a primary measure of inflation, which helps companies and governments understand purchasing power trends, inflationary pressures, and economic stability.
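As a rough illustration of the fixed-basket idea, the sketch below computes a textbook Laspeyres-type index: the cost of a base-period basket at current prices, divided by its cost at base-period prices, times 100. The function name `laspeyres_index` and the two-item basket are hypothetical. Official CPI compilation, including India's, involves weighting and aggregation steps that are not shown here.

```python
from typing import Sequence


def laspeyres_index(base_prices: Sequence[float],
                    current_prices: Sequence[float],
                    base_quantities: Sequence[float]) -> float:
    """Fixed-basket (Laspeyres-type) price index with the base period = 100."""
    if not (len(base_prices) == len(current_prices) == len(base_quantities)):
        raise ValueError("price and quantity lists must have the same length")
    # cost of the base-period basket at base-period prices
    base_cost = sum(p * q for p, q in zip(base_prices, base_quantities))
    # cost of the same basket (same quantities) at current prices
    current_cost = sum(p * q for p, q in zip(current_prices, base_quantities))
    return current_cost / base_cost * 100


# hypothetical basket: 2 units of item A and 1 unit of item B
index = laspeyres_index(base_prices=[10.0, 20.0], current_prices=[11.0, 22.0], base_quantities=[2, 1])
print(round(index, 1))  # → 110.0
```

Both prices rose by 10%, so the index reads 110. That is 10% inflation relative to the base period.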

In [2]:
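import io
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from statsmodels.tsa.seasonal import seasonal_decompose

try:
    # in Colab: read the bytes actually uploaded (Colab renames repeat uploads, e.g. "cpi (1).csv")
    from google.colab import files
    uploaded = files.upload()
    cpi_data = pd.read_csv(io.BytesIO(next(iter(uploaded.values()))))
except ImportError:
    # outside Colab: read cpi.csv from the working directory
    cpi_data = pd.read_csv("cpi.csv")
print(cpi_data.head())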

Saving cpi.csv to cpi (1).csv
        Sector  Year     Month  Cereals and products  Meat and fish    Egg  \
0        Rural  2013   January                 107.5          106.3  108.1   
1        Urban  2013   January                 110.5          109.1  113.0   
2  Rural+Urban  2013   January                 108.4          107.3  110.0   
3        Rural  2013  February                 109.2          108.7  110.2   
4        Urban  2013  February                 112.9          112.9  116.9   

   Milk and products  Oils and fats  Fruits  Vegetables  ...  Housing  \
0              104.9          106.1   103.9       101.9  ...      NaN   
1              103.6          103.4   102.3       102.9  ...    100.3   
2              104.4          105.1   103.2       102.2  ...    100.3   
3              105.4          106.7   104.0       102.4  ...      NaN   
4              104.0          103.5   103.1       104.9  ...    100.4   

   Fuel and light  Household goods and services  Health  \
0  

During the initial analysis of this dataset, I found that some Month values contain extra whitespace, which can break date parsing. I’ll strip that whitespace before converting Year and Month into dates. I also noticed typos in the Month column, such as “Marcrh” instead of “March”. I’ll check for such inconsistencies, correct them, and then proceed with the analysis:

In [3]:
cpi_data['Month'] = cpi_data['Month'].str.strip()
cpi_data['Month'] = cpi_data['Month'].replace('Marcrh', 'March')
cpi_data['Date'] = pd.to_datetime(cpi_data['Year'].astype(str) + '-' + cpi_data['Month'], format='%Y-%B')
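To catch inconsistencies beyond “Marcrh”, one option is to parse with `errors='coerce'` and list the Month values that fail to convert. The helper `find_unparsed_months` and the toy DataFrame below are hypothetical. The sketch assumes the same Year and Month columns and the `%Y-%B` format used above.

```python
import pandas as pd


def find_unparsed_months(df: pd.DataFrame) -> list:
    """Return the distinct (stripped) Month values that do not parse as full month names."""
    months = df['Month'].astype(str).str.strip()
    # errors='coerce' turns unparseable strings into NaT instead of raising
    parsed = pd.to_datetime(df['Year'].astype(str) + '-' + months, format='%Y-%B', errors='coerce')
    return sorted(months[parsed.isna()].unique())


# toy data with one stray space and one typo (not the real dataset)
sample = pd.DataFrame({'Year': [2013, 2013, 2013], 'Month': ['January', ' February ', 'Marcrh']})
print(find_unparsed_months(sample))  # → ['Marcrh']
```

An empty list means every row will convert. The whitespace is stripped inside the helper, so only genuine spelling problems are reported.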

**Inflation Trend Analysis**

Now, I will analyze the CPI General index over time for the Rural+Urban sector. This trend can help identify periods of inflationary spikes or stability:

In [4]:
# filter for "Rural+Urban" sector
rural_urban_cpi = cpi_data[cpi_data['Sector'] == 'Rural+Urban'].sort_values('Date')

# inflation trend analysis
fig = px.line(rural_urban_cpi, x='Date', y='General index', title='Inflation Trend Analysis (General CPI Index)')
fig.update_layout(xaxis_title='Date', yaxis_title='CPI - General Index')
fig.show()

From around 2013 to 2023, the CPI in India rises steadily. This reflects persistently positive inflation: the price level keeps climbing. A rising index on its own does not mean that the inflation rate itself is increasing. The general upward trend shows that the cost of goods and services has risen gradually over this period, with occasional fluctuations. The steeper rise in the last few years points to faster price growth, especially around and after 2020.
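The inflation rate is usually read from monthly CPI data as the year-over-year change, which is the percentage change from the same month one year earlier. A minimal sketch follows. It assumes a monthly, date-sorted Series such as `rural_urban_cpi.set_index('Date')['General index']`. The helper name `yoy_inflation` and the synthetic series are illustrative.

```python
import pandas as pd


def yoy_inflation(index_series: pd.Series) -> pd.Series:
    """Year-over-year inflation (%) from a monthly, date-sorted price index."""
    # compare each month with the observation 12 rows earlier (same month, previous year)
    return (index_series / index_series.shift(12) - 1) * 100


# synthetic index rising by exactly 0.5% per month (illustrative, not CPI data)
dates = pd.date_range('2013-01-01', periods=36, freq='MS')
cpi = pd.Series([100 * 1.005 ** i for i in range(36)], index=dates)
inflation = yoy_inflation(cpi).dropna()
print(inflation.round(2).unique())  # → [6.17]
```

A constant 0.5% monthly rise compounds to about 6.17% a year, so the inflation rate is flat even though the index keeps climbing. On the real data, the first 12 months have no prior-year value and come out as NaN.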

**Seasonal and Cyclical Patterns**

Now, I’ll decompose the CPI data into seasonal, trend, and residual components to identify patterns:

In [5]:
# seasonal and cyclical patterns
from plotly.subplots import make_subplots

rural_urban_cpi = rural_urban_cpi.set_index('Date')
# 'ME' (month end) replaces the deprecated 'M' alias in pandas >= 2.2
monthly_cpi = rural_urban_cpi['General index'].resample('ME').mean().interpolate(method='linear')
decomposition = seasonal_decompose(monthly_cpi, model='multiplicative', period=12)

# one panel per component: the multiplicative seasonal and residual factors hover around 1,
# so sharing a y-axis with the CPI level (100+) would flatten them into straight lines
components = {'Observed': decomposition.observed, 'Trend': decomposition.trend,
              'Seasonal': decomposition.seasonal, 'Residual': decomposition.resid}
fig = make_subplots(rows=4, cols=1, shared_xaxes=True, subplot_titles=list(components))
for row, (name, series) in enumerate(components.items(), start=1):
    fig.add_trace(go.Scatter(x=series.index, y=series, mode='lines', name=name), row=row, col=1)
fig.update_layout(title='Seasonal Decomposition of CPI (Observed, Trend, Seasonal, Residual)', height=800)
fig.update_xaxes(title_text='Date', row=4, col=1)
fig.show()

The trend line (in red) closely follows the observed CPI values, which indicates a steady upward trend over time. Because the decomposition is multiplicative, the seasonal and residual components are factors centred on 1, not on 0. The seasonal component (in green) stays close to 1, which suggests little seasonal fluctuation in the CPI. The residual component (in purple) also stays close to 1, which indicates little random variation. Together, these imply that the CPI trend is consistent and driven mainly by long-term factors rather than by seasonal or irregular influences.

**Comparison Across Sectors or Regions**

Now, let’s compare the average CPI across different sectors (Rural, Urban, Rural+Urban):

In [6]:
# comparison across sectors or regions
sector_cpi_means = cpi_data.groupby(['Sector'])['General index'].mean().reset_index()
fig = px.bar(sector_cpi_means, x='Sector', y='General index', title='Average CPI Comparison Across Sectors (Rural, Urban, Rural+Urban)')
fig.update_layout(xaxis_title='Sector', yaxis_title='Average CPI - General Index')
fig.show()

The average General index is similar across all three sectors, with only slight differences. This indicates that, over this period, prices as measured by the CPI have risen by similar amounts in rural and urban areas. It suggests that price changes in goods and services are fairly uniform across these regions.
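Comparing average index levels shows whether cumulative price growth differs between sectors. It does not show whether the pace of inflation differs. A complementary check is to compute year-over-year inflation within each sector and average it. The sketch assumes a long-format frame with `Sector`, `Date`, and `General index` columns, like `cpi_data`. The helper name and the toy values are illustrative.

```python
import pandas as pd


def average_yoy_by_sector(df: pd.DataFrame) -> pd.Series:
    """Mean year-over-year inflation (%) of the General index for each Sector."""
    ordered = df.sort_values(['Sector', 'Date'])
    # shift(12) within each sector compares a month with the same month one year earlier
    previous_year = ordered.groupby('Sector')['General index'].shift(12)
    yoy = (ordered['General index'] / previous_year - 1) * 100
    return yoy.groupby(ordered['Sector']).mean()  # NaNs from the first year are skipped


# toy data: two sectors growing at 0.5% and 0.25% per month (not the real dataset)
dates = pd.date_range('2013-01-01', periods=24, freq='MS')
toy = pd.concat([
    pd.DataFrame({'Sector': 'Rural', 'Date': dates, 'General index': [100 * 1.005 ** i for i in range(24)]}),
    pd.DataFrame({'Sector': 'Urban', 'Date': dates, 'General index': [100 * 1.0025 ** i for i in range(24)]}),
], ignore_index=True)
print(average_yoy_by_sector(toy).round(2))
```

Grouping before shifting matters here. `cpi_data` interleaves Rural, Urban, and Rural+Urban rows, so a plain `shift(12)` would compare rows from different sectors.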

**Correlation Between CPI Categories**

Now, let’s examine the correlation between various categories within the CPI (e.g., Cereals and products, Fuel and light, Health) and the overall General index:

In [7]:
# replace non-numeric values with NaN and ensure all columns are numeric
cpi_categories = cpi_data[['Cereals and products', 'Meat and fish', 'Egg', 'Milk and products', 'Oils and fats',
                           'Fruits', 'Vegetables', 'Fuel and light', 'Housing', 'Health', 'Transport and communication',
                           'Recreation and amusement', 'Education', 'Personal care and effects', 'Miscellaneous', 'General index']]
cpi_categories = cpi_categories.apply(pd.to_numeric, errors='coerce')  # convert to numeric

# calculate the correlation matrix
correlation_matrix = cpi_categories.corr()

# plot the correlation matrix as a heatmap
fig = px.imshow(correlation_matrix, text_auto='.2f', color_continuous_scale='RdBu_r', zmin=-1, zmax=1,
                title='Correlation between CPI Categories and General Index')
fig.update_layout(xaxis_title='CPI Category', yaxis_title='CPI Category')
fig.show()

Categories such as Housing, Transport and communication, and Miscellaneous show high positive correlations with each other and with the General index, which suggests that they move closely with the overall CPI. Conversely, categories like Egg and Vegetables show lower correlations with the other categories, which indicates more independent or volatile price movements.
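Series that all trend upward tend to show high correlations in levels even when their month-to-month movements are unrelated. Level correlations like these therefore largely capture the shared upward drift, and correlating month-over-month changes is a common complement. The sketch below shows the effect on two independent synthetic random walks with drift. `level_vs_change_corr` is a hypothetical helper. On the notebook’s data it should be applied to a single sector sorted by date (for example `rural_urban_cpi`), because `cpi_data` interleaves Rural, Urban, and Rural+Urban rows.

```python
import numpy as np
import pandas as pd


def level_vs_change_corr(a: pd.Series, b: pd.Series) -> tuple:
    """Pearson correlation of two date-sorted series in levels and in first differences."""
    level_corr = a.corr(b)
    change_corr = a.diff().corr(b.diff())  # the leading NaN from diff() is skipped by corr()
    return level_corr, change_corr


# two independent random walks that share only an upward drift (synthetic, not CPI data)
rng = np.random.default_rng(0)
a = pd.Series(np.cumsum(0.5 + rng.normal(0, 0.3, 500)))
b = pd.Series(np.cumsum(0.5 + rng.normal(0, 0.3, 500)))
level_corr, change_corr = level_vs_change_corr(a, b)
print(f"levels: {level_corr:.2f}, changes: {change_corr:.2f}")
```

The level correlation comes out near 1 even though the two series’ shocks are independent. The change correlation stays near 0.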

**CPI and Specific Sector Analysis**

Now, let’s analyze the inflation trends within specific sectors over time:

In [8]:
# CPI and specific sector analysis
sectors_to_analyze = ['Fuel and light', 'Health', 'Housing', 'Cereals and products']
sector_data = rural_urban_cpi[sectors_to_analyze].ffill().reset_index()  # forward-fill gaps

fig = go.Figure()
for sector in sectors_to_analyze:
    fig.add_trace(go.Scatter(x=sector_data['Date'], y=sector_data[sector], mode='lines', name=sector))
fig.update_layout(title='CPI Trends for Selected Sectors', xaxis_title='Date', yaxis_title='CPI Value')
fig.show()

Each sector shows a general upward trend over time, which indicates rising prices. Fuel and light has experienced the steepest increase, particularly after 2020, which reflects higher inflation in this category. Health and Housing have followed a more gradual, steady increase over the years, with Health rising especially consistently. Cereals and products, while generally increasing, show more fluctuations, particularly around 2020, which indicates price volatility in this category.
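Raw index levels show cumulative change since the index base period. To compare growth over a specific window instead, each series can be rebased to 100 at the window’s first date, so that growth paths start from the same point. The sketch below is one way to do this. `rebase_to_100` is a hypothetical helper, and the toy frame stands in for `sector_data.set_index('Date')`.

```python
import pandas as pd


def rebase_to_100(df: pd.DataFrame) -> pd.DataFrame:
    """Divide each column by its first non-missing value and scale to 100."""
    first_valid = df.apply(lambda col: col.dropna().iloc[0])  # one value per column
    return df / first_valid * 100  # the Series aligns with the DataFrame's columns


# toy values, not taken from the dataset
toy = pd.DataFrame({'Fuel and light': [120.0, 132.0, 150.0],
                    'Health': [110.0, 115.5, 121.0]},
                   index=pd.date_range('2013-01-01', periods=3, freq='MS'))
print(rebase_to_100(toy).round(1))
```

In this toy frame, Fuel and light ends at 125 (+25%) and Health at 110 (+10%), so the relative growth is read directly from the rebased values.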


**Event-Based Analysis (COVID-19 Period)**

Now, let’s analyze CPI trends specifically during the COVID-19 period (2020-2021):

In [9]:
# event-based analysis (COVID-19 period)
covid_mask = (rural_urban_cpi.index >= '2020-01-01') & (rural_urban_cpi.index <= '2021-12-31')
covid_period = rural_urban_cpi.loc[covid_mask, sectors_to_analyze + ['General index']].ffill().reset_index()

fig = go.Figure()
fig.add_trace(go.Scatter(x=covid_period['Date'], y=covid_period['General index'], mode='lines', name='General CPI Index', line=dict(width=2, color='black')))
for sector in sectors_to_analyze:
    fig.add_trace(go.Scatter(x=covid_period['Date'], y=covid_period[sector], mode='lines', name=sector))
fig.update_layout(title='CPI Trends During COVID-19 Period (2020-2021)', xaxis_title='Date', yaxis_title='CPI Value')
fig.show()

The Health and Housing sectors experienced notable increases, with Health rising steadily and Housing climbing more sharply from early 2021. Fuel and light declined significantly in early 2020, possibly due to reduced demand during lockdowns, and then rose steeply in 2021 as economic activity resumed. Cereals and products remained relatively stable, with minor fluctuations. Overall, the graph shows that COVID-19 affected inflation unevenly across these sectors, with prices for essentials like health and housing continuing to rise throughout the period.
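To put numbers on these observations, one approach is to compute, per column, the change over the whole window and the smallest (most negative) one-month change. The sketch assumes a date-indexed frame like `covid_period.set_index('Date')`. `window_summary` and the toy values are hypothetical.

```python
import pandas as pd


def window_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column % change over the whole window and the smallest one-month % change."""
    first, last = df.iloc[0], df.iloc[-1]
    monthly_change = (df / df.shift(1) - 1) * 100  # first row is NaN and ignored by min()
    return pd.DataFrame({
        'total_change_pct': (last / first - 1) * 100,
        'min_monthly_change_pct': monthly_change.min(),
    })


# toy values, not taken from the dataset: a dip followed by a rebound vs. steady growth
toy = pd.DataFrame({'Fuel and light': [150.0, 147.0, 156.0],
                    'Health': [150.0, 151.5, 153.0]},
                   index=pd.date_range('2020-01-01', periods=3, freq='MS'))
print(window_summary(toy).round(2))
```

A negative `min_monthly_change_pct` flags a dip like the one described for Fuel and light. A positive value means the series never fell from one month to the next within the window.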