# Sales Force Training

**Company Background:**

Company X wants to increase its sales. Historical sales data shows an average sale of $100 per transaction. After the sales workers were trained, new sales data was collected from a sample of 25 sales workers; it is shown in the table below.

**Objective:**

Perform a hypothesis test to determine whether training the sales workers increased the average sale per transaction.

## Import Libraries & Generate Dataset

In [1]:
#Import Libraries
import pandas as pd
import numpy as np
import scipy.stats as stats

In [2]:
#Create DataFrame based on latest sales data
df = pd.DataFrame([100, 150, 50, 100, 130, 120,
                   100, 85, 70, 150, 150, 120,
                   50, 100, 100, 140, 90, 150,
                   50, 90, 120, 100, 110, 75, 65],
                  columns = ["TransactionAmount"])
df

index,TransactionAmount
0,100
1,150
2,50
3,100
4,130
5,120
6,100
7,85
8,70
9,150


## Check Central Tendency & Variability

In [3]:
#Central Tendency
import statistics
print("Mean = ", statistics.mean(df["TransactionAmount"]))
print("Median = ", statistics.median(df["TransactionAmount"]))

Mean =  102.6
Median =  100


#### **Analysis on `Central Tendency`:**

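The mean (102.6) is slightly higher than the median (100). A mean above the median can hint at a mild right (positive) skew, where a few large values pull the mean upward. Here the gap is only 2.6, which is small compared with the spread of the data, so the distribution is close to symmetric and both measures put the typical transaction at roughly $100.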
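Comparing the mean and median is only a rough indicator of skew. As a minimal sketch, the skewness coefficient can be computed directly with `scipy.stats.skew`; the variable names `amounts`, `mean_minus_median`, and `skewness` are illustrative and not part of the original notebook.

```python
import pandas as pd
from scipy import stats

# Same 25 transaction amounts as in the DataFrame created earlier
amounts = pd.Series([100, 150, 50, 100, 130, 120,
                     100, 85, 70, 150, 150, 120,
                     50, 100, 100, 140, 90, 150,
                     50, 90, 120, 100, 110, 75, 65])

# Gap between mean and median (positive when the mean is larger)
mean_minus_median = amounts.mean() - amounts.median()

# Skewness coefficient: > 0 suggests a longer right tail,
# < 0 a longer left tail, values near 0 suggest rough symmetry
skewness = stats.skew(amounts)

print("Mean - Median =", round(mean_minus_median, 2))
print("Skewness =", round(skewness, 3))
```

A skewness value close to zero would support reading the data as roughly symmetric, regardless of the small gap between mean and median.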

In [4]:
#Variability
#Note: np.var and np.std use ddof=0 by default (population formulas, dividing by n)
print("Variance = ", np.var(df["TransactionAmount"]))
print("Std Dev = ", np.std(df["TransactionAmount"]))
print("Range = ", np.max(df["TransactionAmount"]) - np.min(df["TransactionAmount"]))
print("Q1 = ", np.quantile(df["TransactionAmount"], 0.25))
print("Median = ", np.quantile(df["TransactionAmount"], 0.5))
print("Q3 = ", np.quantile(df["TransactionAmount"], 0.75))

Variance =  972.2399999999997
Std Dev =  31.180763300471007
Range =  100
Q1 =  85.0
Median =  100.0
Q3 =  120.0


#### **Analysis on `Variability`:**

The data are fairly spread out around the mean. The standard deviation (about 31.2) is roughly 30% of the mean (102.6), the values range from 50 to 150 (range = 100), and the middle half of the data lies between 85 and 120 (interquartile range = 35).
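The figures above use NumPy's default `ddof=0`, which divides by n. For a sample, the usual estimate divides by n - 1 (`ddof=1`), and that is the standard deviation a one-sample t-test relies on. The sketch below compares the two and adds the coefficient of variation and interquartile range; the variable names are illustrative.

```python
import numpy as np

# Same 25 transaction amounts as in the DataFrame created earlier
amounts = np.array([100, 150, 50, 100, 130, 120,
                    100, 85, 70, 150, 150, 120,
                    50, 100, 100, 140, 90, 150,
                    50, 90, 120, 100, 110, 75, 65])

pop_std = np.std(amounts)             # ddof=0: divides by n
sample_std = np.std(amounts, ddof=1)  # ddof=1: divides by n - 1

# Coefficient of variation: spread relative to the mean
cv = sample_std / np.mean(amounts)

# Interquartile range: width of the middle 50% of the data
q1, q3 = np.quantile(amounts, [0.25, 0.75])
iqr = q3 - q1

print("Population std =", round(pop_std, 2))
print("Sample std     =", round(sample_std, 2))
print("CV             =", round(cv, 3))
print("IQR            =", iqr)
```

With only 25 observations the two standard deviations differ slightly, so it helps to state which one is being reported.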

## Hypothesis Testing

#### Define H0 and H1:

* `H0: Avg sales = $100`
* `H1: Avg sales > $100`

#### Define alpha:

* `alpha = 0.05`

### T Statistics

In [5]:
#Calculate t-statistic and p-value
#Perform one-sample t-test (ttest_1samp is two-sided by default)

stats.ttest_1samp(a=df["TransactionAmount"], popmean=100)

TtestResult(statistic=0.4085001556802841, pvalue=0.6865284813438117, df=24)

#### Results:

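* `t-statistic = 0.41`
* `p-value = 0.69` (two-sided, the default for `ttest_1samp`)
* `degrees of freedom = 24`

Because H1 is one-sided (`> $100`) and the t-statistic is positive, the one-sided p-value is half of the two-sided value, about 0.34. Either way, the p-value is larger than alpha (0.05), so we fail to reject H0.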
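To make the link between the t-statistic and the one-sided hypothesis explicit, the sketch below recomputes the statistic by hand and obtains the right-tailed p-value in two ways: from the t distribution's survival function, and from `ttest_1samp` with `alternative="greater"` (this parameter assumes SciPy 1.6 or later). Names such as `t_manual` and `p_one_sided` are illustrative.

```python
import numpy as np
from scipy import stats

# Same 25 transaction amounts as in the DataFrame created earlier
amounts = np.array([100, 150, 50, 100, 130, 120,
                    100, 85, 70, 150, 150, 120,
                    50, 100, 100, 140, 90, 150,
                    50, 90, 120, 100, 110, 75, 65])
popmean = 100  # hypothesized mean under H0

n = amounts.size
sample_std = amounts.std(ddof=1)  # sample standard deviation

# t = (sample mean - hypothesized mean) / standard error of the mean
t_manual = (amounts.mean() - popmean) / (sample_std / np.sqrt(n))

# Right-tailed p-value: P(T >= t) with n - 1 degrees of freedom
p_one_sided = stats.t.sf(t_manual, df=n - 1)

# Same one-sided test through SciPy
result = stats.ttest_1samp(amounts, popmean=popmean, alternative="greater")

print("t (manual)          =", round(t_manual, 4))
print("p one-sided (manual) =", round(p_one_sided, 4))
print("p one-sided (SciPy)  =", round(result.pvalue, 4))
```

Both routes should agree, and the one-sided p-value remains well above alpha, so the decision is the same as with the two-sided value.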

#### Conclusion:

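At the 5% significance level, the data do not provide sufficient evidence that the training increased sales: the observed average ($102.60) is not significantly higher than $100. Failing to reject H0 does not prove that the training had no effect, only that this sample of 25 does not show one.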

**To double-check this result, let's see whether the t-statistic falls in the critical region.**

In [6]:
#Conclude with t statistics and critical region
#Find the critical region

#H1: larger (right side)
stats.t.ppf(1-0.05, 24)

1.7108820799094275

#### Results:
* `Critical Region: t > 1.71`

As the t-statistic (0.41) is not in the critical region (> 1.71), we fail to reject H0.
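The critical-region rule and the p-value rule are two views of the same right-tailed test, so they should always lead to the same decision. A minimal sketch of that equivalence, using the t-statistic and degrees of freedom reported above (variable names are illustrative):

```python
from scipy import stats

t_stat = 0.4085001556802841  # t-statistic reported by ttest_1samp
dof = 24                     # degrees of freedom (n - 1)
alpha = 0.05

# Critical value: the point with probability alpha in the right tail
t_crit = stats.t.ppf(1 - alpha, dof)

# Right-tailed p-value for the observed statistic
p_right = stats.t.sf(t_stat, dof)

# Both rules answer the same question
reject_by_critical_value = t_stat > t_crit
reject_by_p_value = p_right < alpha

print("Critical value =", round(t_crit, 2))
print("Reject H0 (critical value rule):", reject_by_critical_value)
print("Reject H0 (p-value rule):       ", reject_by_p_value)
```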

#### Conclusion:

The critical-region approach leads to the same decision: there is not sufficient evidence that the training increased the average sale above $100.