# Sales Data Simulation and Analysis

**Scenario: You are a junior data analyst at an online retail company. Your manager wants to understand daily sales patterns. You need to simulate a month's worth of sales data and then use NumPy to extract insights about sales performance.**

In [116]:
import numpy as np
np.random.rand(100)

array([0.99940426, 0.09753425, 0.79456672, 0.82864675, 0.23285793,
       0.97685853, 0.13635933, 0.6753885 , 0.73428336, 0.49847251,
       0.39253266, 0.12996685, 0.42930479, 0.74045669, 0.59999004,
       0.20937775, 0.22412009, 0.28046453, 0.42285425, 0.69791041,
       0.09479626, 0.68440971, 0.41908736, 0.4464365 , 0.90113224,
       0.01934634, 0.91886811, 0.7443973 , 0.8790782 , 0.23896968,
       0.23021525, 0.66038315, 0.7029805 , 0.45925823, 0.70709092,
       0.68243358, 0.63065687, 0.83687274, 0.26491069, 0.89776469,
       0.4725346 , 0.12727415, 0.48493505, 0.97488529, 0.85831933,
       0.46922702, 0.43900809, 0.98384121, 0.34246639, 0.2068469 ,
       0.69541023, 0.21467665, 0.84952734, 0.37258057, 0.81917157,
       0.1715936 , 0.44411488, 0.027782  , 0.74774155, 0.72091612,
       0.33419896, 0.07136314, 0.7978935 , 0.26931099, 0.46637646,
       0.55507008, 0.04014182, 0.28667032, 0.28755867, 0.86225977,
       0.95512551, 0.71696424, 0.66965781, 0.04054911, 0.51667

### 1. Sales Data Generation:

###### 1.1. Simulate daily sales_revenue for 30 days. Assume a base daily revenue (e.g., $1000) with random fluctuations. Use np.random.rand() or np.random.normal() to add variability. Ensure no negative sales.

In [117]:
base_daily_revenue = 1000
days = 30

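# Uniform fluctuations in [0, 200) only add to the base, so revenue stays in [1000, 1200)
random_fluctuations = np.random.uniform(0, 200, days)
# np.maximum guards against negative sales; with non-negative fluctuations it changes nothing
sales_revenue = np.maximum(base_daily_revenue + random_fluctuations, 0)
print("Simulated daily sales_revenue for all 30 days:\n", np.round(sales_revenue, 0))

Simulated daily sales_revenue for all 30 days: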
 [1194. 1018. 1192. 1113. 1131. 1069. 1095. 1004. 1001. 1137. 1082. 1105.
 1026. 1072. 1031. 1053. 1086. 1116. 1188. 1078. 1031. 1169. 1045. 1162.
 1098. 1188. 1106. 1175. 1056. 1045.]
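
The task statement also allows `np.random.normal()`. A minimal sketch of that variant follows, assuming a standard deviation of 150 and a fixed seed of 42 (both illustrative choices, not values from the exercise); `rng`, `noise`, and `sales_revenue_normal` are hypothetical names. Because Gaussian noise is centred on zero, revenue can fall below as well as above the base, so the clip at zero acts as a real safeguard here.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed, used only for reproducibility

base_daily_revenue = 1000
days = 30

# Gaussian fluctuations centred on zero: revenue can land above or below the base
noise = rng.normal(loc=0, scale=150, size=days)  # scale=150 is an assumption

# Clip at zero so that no simulated day has negative sales
sales_revenue_normal = np.clip(base_daily_revenue + noise, 0, None)

print(np.round(sales_revenue_normal, 0))
```

With this kind of spread, targets such as $1200 or thresholds such as $900 become reachable, which the uniform version cannot produce.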


###### 2.1. Simulate units_sold for the same 30 days, correlated with sales revenue but with its own random fluctuations.

In [118]:
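# Roughly one unit per $10 of revenue, plus Gaussian noise (mean 0, standard deviation 5)
units_sold = (sales_revenue / 10) + np.random.normal(0, 5, days)
# Clip at zero, then truncate to whole units
units_sold = np.maximum(units_sold, 0).astype(int)
print("Units sold =\n", units_sold)

Units sold =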
 [120 101 113 107 114 112 111  92  94 111 108 113 101 105  98 100 102 111
 121 108  94 106 112 114 116 129 109 123 105 100]
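
One way to check that the simulated units really track revenue is to compute the Pearson correlation with `np.corrcoef`. The sketch below regenerates comparable data with an assumed seed of 0, so its numbers will not match the run above; `rng` and `correlation` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, used only for reproducibility
days = 30

# Same recipe as in the notebook: base of 1000 plus uniform fluctuations
sales_revenue = 1000 + rng.uniform(0, 200, days)
units_sold = np.maximum(sales_revenue / 10 + rng.normal(0, 5, days), 0).astype(int)

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is Pearson's r
correlation = np.corrcoef(sales_revenue, units_sold)[0, 1]
print("Correlation between revenue and units sold:", round(correlation, 3))
```

A value well above zero would indicate that the noise term has not drowned out the revenue signal.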


###### 3.1. Create two 1D NumPy arrays, one for sales_revenue and one for units_sold.

In [119]:
print("Sales_revenue = \n\n", np.round(sales_revenue[:days], 0),"\n")
print("Units sold = \n\n", units_sold[:days], "\n")

Sales_revenue = 

 [1194. 1018. 1192. 1113. 1131. 1069. 1095. 1004. 1001. 1137. 1082. 1105.
 1026. 1072. 1031. 1053. 1086. 1116. 1188. 1078. 1031. 1169. 1045. 1162.
 1098. 1188. 1106. 1175. 1056. 1045.] 

Units sold = 

 [120 101 113 107 114 112 111  92  94 111 108 113 101 105  98 100 102 111
 121 108  94 106 112 114 116 129 109 123 105 100] 



### 2. Combine Data:

###### 1.2. Create a 2D NumPy array where the first column is sales_revenue and the second is units_sold.

In [120]:
combined_data = np.column_stack((sales_revenue , units_sold))
print("Combined Data :\n\n", combined_data[:days])

Combined Data :

 [[1194.42743434  120.        ]
 [1018.42464307  101.        ]
 [1192.18731621  113.        ]
 [1113.37579938  107.        ]
 [1131.3454966   114.        ]
 [1068.62893243  112.        ]
 [1094.72633473  111.        ]
 [1004.11025777   92.        ]
 [1000.59003259   94.        ]
 [1136.68415636  111.        ]
 [1081.59385163  108.        ]
 [1104.74288845  113.        ]
 [1025.98571627  101.        ]
 [1072.1316489   105.        ]
 [1031.21610844   98.        ]
 [1053.19990878  100.        ]
 [1085.51702018  102.        ]
 [1115.71096089  111.        ]
 [1187.99354083  121.        ]
 [1078.47347789  108.        ]
 [1031.24098982   94.        ]
 [1168.58726763  106.        ]
 [1044.78886655  112.        ]
 [1161.8992608   114.        ]
 [1097.73727153  116.        ]
 [1188.41504544  129.        ]
 [1105.65615654  109.        ]
 [1175.2729335   123.        ]
 [1055.61455212  105.        ]
 [1044.65447987  100.        ]]
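
The output shows units such as `120.` rather than `120`: a NumPy array holds a single dtype, so stacking a float column with an integer column upcasts the integers to float. A small sketch of this, using the first three days from the output above (rounded for brevity); `combined` and `units_back` are hypothetical names.

```python
import numpy as np

# First three days from the output above, rounded for brevity
sales_revenue = np.array([1194.43, 1018.42, 1192.19])
units_sold = np.array([120, 101, 113])  # integer dtype

combined = np.column_stack((sales_revenue, units_sold))

print(combined.shape)  # one row per day, one column per metric
print(combined.dtype)  # the integer units were upcast to float

# Slicing a column gives floats back; cast explicitly if whole units are needed
units_back = combined[:, 1].astype(int)
print(units_back)
```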


### 3. Key Performance Indicators (KPIs):

###### 1.3. Calculate the total sales_revenue for the month.

In [121]:
sales_revenue = combined_data[:, 0]
units_sold = combined_data[:, 1]
total_revenue = sales_revenue.sum()
print("Total Sales Revenue = ",total_revenue,"$")

Total Sales Revenue =  32864.93234955431 $


###### 2.3. Calculate the average units_sold per day.

In [122]:
average_units_sold = units_sold.mean()
print("Average Units Sold per Day = ",average_units_sold)

Average Units Sold per Day =  108.33333333333333


###### 3.3. Determine the maximum daily sales_revenue and the day (index) it occurred.

In [123]:
max_revenue = sales_revenue.max()
# argmax returns a 0-based index; add 1 to report a 1-based day number
max_day_index = sales_revenue.argmax()
print("Maximum Daily Sales Revenue =",max_revenue ,"$","\nDay =",max_day_index + 1)

Maximum Daily Sales Revenue = 1194.4274343399463 $ 
Day = 1


###### 4.3. Calculate the average revenue per unit sold for the entire month (total revenue / total units sold).

In [124]:
total_units = units_sold.sum()
avg_revenue_per_unit = total_revenue / total_units
print("Average Revenue per Unit Sold =",avg_revenue_per_unit, "$")

Average Revenue per Unit Sold = 10.11228687678594 $
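
The task asks for total revenue divided by total units, which is not the same as averaging each day's revenue-per-unit figure. The sketch below contrasts the two on the first three days from the notebook (rounded for brevity); `ratio_of_totals` and `mean_of_ratios` are hypothetical names.

```python
import numpy as np

# First three days from the notebook output, rounded for brevity
sales_revenue = np.array([1194.43, 1018.42, 1192.19])
units_sold = np.array([120, 101, 113])

# What the task asks for: one ratio over the whole period
ratio_of_totals = sales_revenue.sum() / units_sold.sum()

# A different quantity: the unweighted mean of each day's revenue per unit
mean_of_ratios = (sales_revenue / units_sold).mean()

print("Total revenue / total units:", ratio_of_totals)
print("Mean of daily revenue per unit:", mean_of_ratios)
```

The first figure weights busy days more heavily, so it is usually the better answer to "how much revenue does a unit bring in over the month".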


### 4. Conditional Analysis: 

###### 1.4. Identify and count how many days had sales_revenue above a certain target (e.g., $1200).

In [125]:
target_revenue = 1200
high_revenue_days = sales_revenue > target_revenue
count_high_days = np.sum(high_revenue_days)
days_above_target = np.where(high_revenue_days)[0] + 1
print("Number of days with sales revenue >" ,target_revenue,"$:", count_high_days)
print("Days with high revenue = ",days_above_target)

Number of days with sales revenue > 1200 $: 0
Days with high revenue =  []
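
A count of zero is expected for this run: the fluctuations come from `np.random.uniform(0, 200, days)`, so revenue is always below $1200. To show the mask logic producing a non-empty result, here is a sketch on hand-written values (illustrative, not taken from the run above); `above_target`, `count_above`, and `days_above` are hypothetical names.

```python
import numpy as np

# Illustrative revenues (not from the run above) that straddle the target
sales_revenue = np.array([1150.0, 1230.5, 980.0, 1205.0, 1199.9])
target_revenue = 1200

above_target = sales_revenue > target_revenue   # boolean mask, one entry per day
count_above = np.count_nonzero(above_target)    # each True counts as 1
days_above = np.flatnonzero(above_target) + 1   # convert 0-based indices to day numbers

print("Days above target:", count_above, days_above)
```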


###### 2.4. Calculate the average units_sold only for days when sales_revenue was below a certain threshold (e.g., $900).

In [126]:
threshold = 900
low_revenue_days = sales_revenue < threshold
units_on_low_days = units_sold[low_revenue_days]

if len(units_on_low_days) > 0:
    avg_units_low_days = units_on_low_days.mean()
    print("Average units sold on days with sales revenue <",threshold ,"$:", avg_units_low_days)
else:
    print("No days with sales revenue below",threshold,"$")


No days with sales revenue below 900 $
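
The empty-selection guard matters because calling `.mean()` on an empty array returns `nan` and emits a `RuntimeWarning`. The same pattern can be wrapped in a small helper; `mean_where` below is a hypothetical function, and the data are illustrative values chosen so that the threshold is actually reached.

```python
import numpy as np

def mean_where(values, mask):
    """Mean of values[mask], or None when the mask selects nothing (hypothetical helper)."""
    selected = values[mask]
    # .size is 0 for an empty selection, which is the case we want to avoid averaging
    return selected.mean() if selected.size > 0 else None

sales_revenue = np.array([850.0, 1020.0, 880.0, 1100.0])  # illustrative values
units_sold = np.array([80, 101, 90, 112])

print(mean_where(units_sold, sales_revenue < 900))  # days 1 and 3 qualify
print(mean_where(units_sold, sales_revenue < 500))  # nothing qualifies
```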


### 5. Weekly Aggregations: 

###### 1.5. Assume the 30 days start on a Monday. Calculate the total sales_revenue for each of the 4 full weeks (days 1-7, 8-14, 15-21, 22-28). You will need to reshape or carefully slice your data.

In [127]:
week1_total = sales_revenue[0:7].sum()
week2_total = sales_revenue[7:14].sum()
week3_total = sales_revenue[14:21].sum()
week4_total = sales_revenue[21:28].sum()

print("Week 1 Total Revenue=",week1_total,"$")
print("\nWeek 2 Total Revenue=",week2_total,"$")
print("\nWeek 3 Total Revenue=",week3_total,"$")
print("\nWeek 4 Total Revenue=",week4_total,"$")

Week 1 Total Revenue= 7813.115956763169 $

Week 2 Total Revenue= 7425.838551960793 $

Week 3 Total Revenue= 7583.352006845067 $

Week 4 Total Revenue= 7942.356801993062 $
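
The task mentions reshaping as an alternative to slicing. A minimal sketch of that approach follows, on regenerated data with an assumed seed of 1 (so the totals differ from the run above); `weekly` and `weekly_totals` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(1)  # assumed seed, used only for reproducibility
sales_revenue = 1000 + rng.uniform(0, 200, 30)

# Keep the 28 days that form full weeks, then view them as 4 rows of 7 days
weekly = sales_revenue[:28].reshape(4, 7)
weekly_totals = weekly.sum(axis=1)  # one total per week

for week, total in enumerate(weekly_totals, start=1):
    print(f"Week {week} total revenue: ${total:,.2f}")
```

Days 29 and 30 are dropped before reshaping because 30 is not divisible by 7. The same `weekly` array also gives per-weekday figures via `weekly.mean(axis=0)`.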
