This is the starting point for the Zara project. We begin by importing the NumPy and pandas packages.

In [1]:
import numpy as np
import pandas as pd

Next we load the dataset. The original version is by Varisha and is available on Kaggle as Zara Clothes US (2024).

In [2]:
zd = pd.read_csv("C:/Users/48679025/Downloads/zara_clothes.csv")
zd.head()

Unnamed: 0,Product ID,Product Position,Promotion,Product Category,Seasonal,Sales Volume,brand,url,sku,name,description,price,currency,scraped_at,terms,section
0,185102,Aisle,No,Clothing,No,2823,Zara,https://www.zara.com/us/en/basic-puffer-jacket...,272145190-250-2,BASIC PUFFER JACKET,Puffer jacket made of tear-resistant ripstop f...,19.99,USD,2024-02-19T08:50:05.654618,jackets,MAN
1,188771,Aisle,No,Clothing,No,654,Zara,https://www.zara.com/us/en/tuxedo-jacket-p0889...,324052738-800-46,TUXEDO JACKET,Straight fit blazer. Pointed lapel collar and ...,169.0,USD,2024-02-19T08:50:06.590930,jackets,MAN
2,180176,End-cap,Yes,Clothing,Yes,2220,Zara,https://www.zara.com/us/en/slim-fit-suit-jacke...,335342680-800-44,SLIM FIT SUIT JACKET,Slim fit jacket. Notched lapel collar. Long sl...,129.0,USD,2024-02-19T08:50:07.301419,jackets,MAN
3,112917,Aisle,Yes,Clothing,Yes,1568,Zara,https://www.zara.com/us/en/stretch-suit-jacket...,328303236-420-44,STRETCH SUIT JACKET,Slim fit jacket made of viscose blend fabric. ...,129.0,USD,2024-02-19T08:50:07.882922,jackets,MAN
4,192936,End-cap,No,Clothing,Yes,2942,Zara,https://www.zara.com/us/en/double-faced-jacket...,312368260-800-2,DOUBLE FACED JACKET,Jacket made of faux leather faux shearling wit...,139.0,USD,2024-02-19T08:50:08.453847,jackets,MAN


Let's start with a quick look at the distribution of Product Position. As the counts below show, Aisle is the most common position (97 of 252 products), ahead of End-cap (86) and Front of Store (69). We then check how sales change when a product is placed somewhere other than the Aisle.

In [4]:
zd['Product Position'].value_counts()

Aisle             97
End-cap           86
Front of Store    69
Name: Product Position, dtype: int64
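
The counts are easier to compare as shares. A minimal sketch, assuming only the three counts printed above; the `counts` Series is rebuilt by hand here for illustration rather than read from the CSV.

```python
import pandas as pd

# Counts copied from the value_counts() output above (rebuilt by hand for illustration).
counts = pd.Series(
    {"Aisle": 97, "End-cap": 86, "Front of Store": 69},
    name="Product Position",
)

# Share of each position among all products.
shares = (counts / counts.sum()).round(3)
print(shares)
```

On the real frame, `zd['Product Position'].value_counts(normalize=True)` should give the same shares directly.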

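Let's create binary variables. is_promotion equals one if the product is on promotion and zero otherwise, and is_seasonal is defined the same way for seasonal products. is_premium equals one if the product is placed at End-cap or Front of Store and zero if it is placed in the Aisle.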

In [5]:
zd['is_promotion'] = zd['Promotion'].map({'Yes': 1, 'No': 0})
zd['is_seasonal'] = zd['Seasonal'].map({'Yes': 1, 'No': 0})

In [6]:
zd['is_premium'] = zd['Product Position'].apply(lambda x: 1 if x in ['End-cap', 'Front of Store'] else 0)
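
One thing to watch with `map` and a dictionary: any label that is not a key (for example a stray 'yes') becomes NaN without warning. The sketch below counts such cases and builds the premium flag with `isin` instead of a lambda. The `toy` frame and its values are made up for illustration; they are not rows from the dataset.

```python
import pandas as pd

# Toy frame standing in for zd; values are illustrative only.
toy = pd.DataFrame({
    "Promotion": ["Yes", "No", "Yes"],
    "Product Position": ["Aisle", "End-cap", "Front of Store"],
})

yes_no = {"Yes": 1, "No": 0}
toy["is_promotion"] = toy["Promotion"].map(yes_no)

# map() yields NaN for labels missing from the dict, so count them explicitly.
n_unmapped = int(toy["is_promotion"].isna().sum())

# Vectorised equivalent of the lambda: True/False from isin, cast to 0/1.
premium_positions = ["End-cap", "Front of Store"]
toy["is_premium"] = toy["Product Position"].isin(premium_positions).astype(int)

print(n_unmapped)
print(toy["is_premium"].tolist())
```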

In [7]:
zd.head()

Unnamed: 0,Product ID,Product Position,Promotion,Product Category,Seasonal,Sales Volume,brand,url,sku,name,description,price,currency,scraped_at,terms,section,is_promotion,is_seasonal,is_premium
0,185102,Aisle,No,Clothing,No,2823,Zara,https://www.zara.com/us/en/basic-puffer-jacket...,272145190-250-2,BASIC PUFFER JACKET,Puffer jacket made of tear-resistant ripstop f...,19.99,USD,2024-02-19T08:50:05.654618,jackets,MAN,0,0,0
1,188771,Aisle,No,Clothing,No,654,Zara,https://www.zara.com/us/en/tuxedo-jacket-p0889...,324052738-800-46,TUXEDO JACKET,Straight fit blazer. Pointed lapel collar and ...,169.0,USD,2024-02-19T08:50:06.590930,jackets,MAN,0,0,0
2,180176,End-cap,Yes,Clothing,Yes,2220,Zara,https://www.zara.com/us/en/slim-fit-suit-jacke...,335342680-800-44,SLIM FIT SUIT JACKET,Slim fit jacket. Notched lapel collar. Long sl...,129.0,USD,2024-02-19T08:50:07.301419,jackets,MAN,1,1,1
3,112917,Aisle,Yes,Clothing,Yes,1568,Zara,https://www.zara.com/us/en/stretch-suit-jacket...,328303236-420-44,STRETCH SUIT JACKET,Slim fit jacket made of viscose blend fabric. ...,129.0,USD,2024-02-19T08:50:07.882922,jackets,MAN,1,1,0
4,192936,End-cap,No,Clothing,Yes,2942,Zara,https://www.zara.com/us/en/double-faced-jacket...,312368260-800-2,DOUBLE FACED JACKET,Jacket made of faux leather faux shearling wit...,139.0,USD,2024-02-19T08:50:08.453847,jackets,MAN,0,1,1


In [8]:
summary = zd.groupby('Product Position')['Sales Volume'].mean().sort_values(ascending=False)
print("Average Sales by Position:")
print(summary)

Average Sales by Position:
Product Position
Front of Store    1873.144928
Aisle             1828.824742
End-cap           1778.255814
Name: Sales Volume, dtype: float64


In [9]:
summary = zd.groupby('Product Position')['price'].mean().sort_values(ascending=False)
print("Average Price by Position:")
print(summary)

Average Price by Position:
Product Position
Front of Store    88.893478
Aisle             88.785773
End-cap           81.276395
Name: price, dtype: float64
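
The group means above are close to each other, so group size and spread help judge whether the gaps mean anything. A minimal sketch using named aggregation; the `toy` frame and its numbers are invented for illustration.

```python
import pandas as pd

# Toy data standing in for zd; the numbers are made up for illustration.
toy = pd.DataFrame({
    "Product Position": ["Aisle", "Aisle", "End-cap", "End-cap"],
    "Sales Volume": [1000, 2000, 1500, 1700],
})

# Named aggregation: one row per position with count, mean and standard deviation.
stats = toy.groupby("Product Position")["Sales Volume"].agg(
    n="count", mean="mean", std="std"
)
print(stats)
```

The same call on `zd` would put the counts, means and standard deviations for the three positions in one table.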


In [10]:
import statsmodels.formula.api as smf

In [11]:
zd.columns = [c.replace(' ', '_').lower() for c in zd.columns]

In [13]:
# Simple OLS with controls

model_with_controls = smf.ols(
    'sales_volume ~ C(product_position) + price + is_promotion + is_seasonal + C(terms)', 
    data=zd
).fit()

In [14]:
print(model_with_controls.summary())

                            OLS Regression Results                            
Dep. Variable:           sales_volume   R-squared:                       0.021
Model:                            OLS   Adj. R-squared:                 -0.015
Method:                 Least Squares   F-statistic:                    0.5752
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.817
Time:                        10:51:48   Log-Likelihood:                -2004.4
No. Observations:                 252   AIC:                             4029.
Df Residuals:                     242   BIC:                             4064.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                                            coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------------------
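
The header of this summary already says a lot: the adjusted R-squared is negative and the F-test does not reject the hypothesis that all slopes are zero. The sketch below recomputes both figures from the numbers printed above, using the standard formulas. Only the rounded R-squared is available, so the results are approximate.

```python
# Values copied from the summary above.
r2 = 0.021      # R-squared (rounded in the printout)
n = 252         # No. Observations
k = 9           # Df Model

df_resid = n - k - 1                      # 242, matches "Df Residuals"

# Adjusted R-squared penalises R-squared for the number of regressors.
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic for H0: all slope coefficients are zero.
f_stat = (r2 / k) / ((1 - r2) / df_resid)

print(round(adj_r2, 3))
print(round(f_stat, 2))
```

The adjusted R-squared comes out at the reported -0.015. The F-statistic lands close to the reported 0.5752, and the small gap comes from R-squared being rounded to three decimals.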

In [15]:
zd['log_sales'] = np.log(zd['sales_volume'])
zd['log_price'] = np.log(zd['price'])
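
`np.log` returns -inf for zero and NaN for negative inputs, and either would break the regressions that follow. Nothing shown so far suggests the data contains such values, so this is only a precaution. The `toy` frame is invented, with one zero planted on purpose, and dropping the affected rows is an assumption about how one might handle them.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for zd; the zero is planted to show the failure mode.
toy = pd.DataFrame({"sales_volume": [654, 2823, 0], "price": [169.0, 19.99, 129.0]})

# Count non-positive entries per column before transforming.
bad = (toy[["sales_volume", "price"]] <= 0).sum()
print(bad)

# Keep only rows where both values are strictly positive, then take logs.
ok = toy[(toy["sales_volume"] > 0) & (toy["price"] > 0)].copy()
ok["log_sales"] = np.log(ok["sales_volume"])
ok["log_price"] = np.log(ok["price"])
print(len(ok))
```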

In [17]:
# OLS with log price and log sales

log_model_with_controls = smf.ols(
    'log_sales ~ C(product_position) + log_price + is_promotion + is_seasonal + C(terms)', 
    data=zd
).fit()
print(log_model_with_controls.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.021
Model:                            OLS   Adj. R-squared:                 -0.016
Method:                 Least Squares   F-statistic:                    0.5635
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.826
Time:                        11:10:40   Log-Likelihood:                -152.99
No. Observations:                 252   AIC:                             326.0
Df Residuals:                     242   BIC:                             361.3
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                                            coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------------------
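
In this log-log specification the coefficient on log_price reads as a price elasticity, and a coefficient b on a 0/1 regressor corresponds to a change of 100*(exp(b) - 1) percent in sales. The sketch below shows the arithmetic. The coefficient table is cut off in the output above, so both coefficient values here are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical coefficients, NOT taken from the summary above (its table is cut off).
b_log_price = -0.05   # coefficient on log_price
b_dummy = 0.10        # coefficient on a 0/1 regressor such as is_promotion

# Log-log: a 1% rise in price goes with roughly b_log_price percent change in sales.
pct_per_1pct_price = b_log_price * 1.0

# Log-level dummy: exact percentage change when the dummy switches from 0 to 1.
pct_dummy = 100 * (math.exp(b_dummy) - 1)

print(pct_per_1pct_price)
print(round(pct_dummy, 2))
```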

In [20]:
# interaction: promotion*position

model_interaction = smf.ols(
    'log_sales ~ is_promotion * C(product_position) + log_price + is_seasonal + C(terms)', 
    data=zd
).fit()
print(model_interaction.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.031
Model:                            OLS   Adj. R-squared:                 -0.014
Method:                 Least Squares   F-statistic:                    0.6935
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.744
Time:                        11:23:49   Log-Likelihood:                -151.66
No. Observations:                 252   AIC:                             327.3
Df Residuals:                     240   BIC:                             369.7
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
                                                         coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------
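
Whether the two interaction terms add anything can be checked with a nested F-test against the log model without interactions (R-squared 0.021, Df Model 9). The sketch below computes it from the printed summaries. Both R-squared values are rounded, so the statistic is approximate.

```python
# Values copied from the two log_sales summaries (both R-squared values are rounded).
r2_restricted = 0.021     # model without the interaction, Df Model = 9
r2_unrestricted = 0.031   # model with is_promotion * C(product_position), Df Model = 11
df_resid_unrestricted = 240
q = 11 - 9                # number of added interaction terms

# F-statistic for H0: both interaction coefficients are zero.
f_stat = ((r2_unrestricted - r2_restricted) / q) / (
    (1 - r2_unrestricted) / df_resid_unrestricted
)
print(round(f_stat, 2))
```

The statistic is well below the 5% critical value of an F(2, 240) distribution, which is roughly 3.0, so the interaction terms do not improve the fit in any detectable way. With both fitted results still in memory, `model_interaction.compare_f_test(log_model_with_controls)` should return the exact statistic and its p-value.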

In [21]:
#interaction: seasonal*position
model_interaction = smf.ols(
    'log_sales ~ is_seasonal * C(product_position) + log_price + is_promotion + C(terms)', 
    data=zd
).fit()
print(model_interaction.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.027
Model:                            OLS   Adj. R-squared:                 -0.018
Method:                 Least Squares   F-statistic:                    0.5994
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.829
Time:                        11:26:28   Log-Likelihood:                -152.19
No. Observations:                 252   AIC:                             328.4
Df Residuals:                     240   BIC:                             370.7
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
                                                        coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------

In [27]:
log_model_with_controls = smf.ols(
    'log_sales ~ is_premium + log_price + is_promotion + is_seasonal + C(terms)', 
    data=zd
).fit()
print(log_model_with_controls.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.020
Model:                            OLS   Adj. R-squared:                 -0.012
Method:                 Least Squares   F-statistic:                    0.6269
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.755
Time:                        12:05:24   Log-Likelihood:                -153.03
No. Observations:                 252   AIC:                             324.1
Df Residuals:                     243   BIC:                             355.8
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept                7.8161 

In [28]:
#interaction:promotion*premium
log_model_with_controls = smf.ols(
    'log_sales ~ is_premium*is_promotion + log_price + is_seasonal + C(terms)', 
    data=zd
).fit()
print(log_model_with_controls.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.021
Model:                            OLS   Adj. R-squared:                 -0.015
Method:                 Least Squares   F-statistic:                    0.5797
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.813
Time:                        12:05:53   Log-Likelihood:                -152.92
No. Observations:                 252   AIC:                             325.8
Df Residuals:                     242   BIC:                             361.1
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                              coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                 

In [29]:
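# interaction: seasonal*promotion
# (as written, this formula leaves out is_premium; a seasonal*premium model would use is_seasonal*is_premium)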
log_model_with_controls = smf.ols(
    'log_sales ~ is_seasonal*is_promotion + log_price + is_promotion + C(terms)', 
    data=zd
).fit()
print(log_model_with_controls.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.020
Model:                            OLS   Adj. R-squared:                 -0.012
Method:                 Least Squares   F-statistic:                    0.6305
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.752
Time:                        12:07:12   Log-Likelihood:                -153.01
No. Observations:                 252   AIC:                             324.0
Df Residuals:                     243   BIC:                             355.8
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept               

In [30]:
#interaction:terms*premium
log_model_with_controls = smf.ols(
    'log_sales ~ is_premium*C(terms) + log_price + is_seasonal + is_promotion + C(terms)', 
    data=zd
).fit()
print(log_model_with_controls.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.030
Model:                            OLS   Adj. R-squared:                 -0.018
Method:                 Least Squares   F-statistic:                    0.6215
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.823
Time:                        12:43:38   Log-Likelihood:                -151.73
No. Observations:                 252   AIC:                             329.5
Df Residuals:                     239   BIC:                             375.3
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                                      coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------------
Intercept 

In [33]:
# promotion effect by product type (interaction: promotion*terms)
model_marginal = smf.ols(
    'log_sales ~ is_promotion * C(terms) + log_price + is_seasonal', 
    data=zd
).fit()
print(model_marginal.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.033
Model:                            OLS   Adj. R-squared:                 -0.011
Method:                 Least Squares   F-statistic:                    0.7457
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.694
Time:                        13:08:15   Log-Likelihood:                -151.37
No. Observations:                 252   AIC:                             326.7
Df Residuals:                     240   BIC:                             369.1
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
                                        coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------
Interc

In [35]:
summary = zd.groupby('terms')['sales_volume'].mean().sort_values(ascending=False)
print("Average sales by terms:")
print(summary)

Average sales by terms:
terms
shoes       1867.935484
jackets     1853.342857
sweaters    1835.170732
t-shirts    1676.156250
jeans       1665.000000
Name: sales_volume, dtype: float64


In [36]:
# seasonal effect by product type (interaction: seasonal*terms)
model_marginal = smf.ols(
    'log_sales ~ is_seasonal * C(terms) + log_price + is_promotion', 
    data=zd
).fit()
print(model_marginal.summary())

                            OLS Regression Results                            
Dep. Variable:              log_sales   R-squared:                       0.037
Model:                            OLS   Adj. R-squared:                 -0.007
Method:                 Least Squares   F-statistic:                    0.8439
Date:                Thu, 19 Feb 2026   Prob (F-statistic):              0.596
Time:                        13:24:22   Log-Likelihood:                -150.82
No. Observations:                 252   AIC:                             325.6
Df Residuals:                     240   BIC:                             368.0
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
                                       coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------------
Intercep
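
The cells above refit nearly the same specification many times and reuse variable names, which makes the runs hard to compare and makes it easy for a comment to drift from its formula. One possible way to tidy this up is to keep the formulas in a dictionary and collect the fit statistics in a single table. This is a sketch: the `toy` frame is synthetic random data so that the block runs on its own, and the labels and the choice of statistics are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for zd so the sketch runs on its own; not the real data.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "log_sales": rng.normal(7.5, 0.4, n),
    "log_price": rng.normal(4.3, 0.5, n),
    "is_promotion": rng.integers(0, 2, n),
    "is_seasonal": rng.integers(0, 2, n),
    "is_premium": rng.integers(0, 2, n),
    "terms": rng.choice(["jackets", "jeans", "shoes"], n),
})

# One entry per specification; the label documents what the formula does.
formulas = {
    "baseline": "log_sales ~ is_premium + log_price + is_promotion + is_seasonal + C(terms)",
    "premium x promotion": "log_sales ~ is_premium * is_promotion + log_price + is_seasonal + C(terms)",
    "premium x seasonal": "log_sales ~ is_premium * is_seasonal + log_price + is_promotion + C(terms)",
}

rows = []
for label, formula in formulas.items():
    fit = smf.ols(formula, data=toy).fit()
    rows.append({
        "model": label,
        "df_model": int(fit.df_model),
        "adj_r2": fit.rsquared_adj,
        "f_pvalue": fit.f_pvalue,
        "aic": fit.aic,
    })

comparison = pd.DataFrame(rows).set_index("model")
print(comparison.round(3))
```

Swapping `toy` for `zd` would put the real models side by side in the same table.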