# Example 1: sampling theory

The task is to estimate the average age of people in a district whose total population is 10,000.

In [1]:
import numpy as np

In [2]:
# 10,000 people are assigned random integer ages from 0 to 79 (the upper bound 80 is exclusive).
population = np.random.randint(0,80,10000)
population[0:10]

array([16, 58, 45, 67, 75, 42, 46, 69, 59, 40])

In [3]:
# draw a sample
np.random.seed(10)  # fix the seed so the same sample is drawn on every run
sample = np.random.choice(a=population, size=100)  # sample of 100 observations, drawn with replacement by default
sample[0:10]

array([42, 41, 16, 37, 24, 29, 31, 29, 54,  7])

In [4]:
print(sample.mean())
print(population.mean())

44.99
39.458


In [5]:
# sampling distribution: draw ten samples of size 100
np.random.seed(10)
sample1  = np.random.choice(a=population, size=100)
sample2  = np.random.choice(a=population, size=100)
sample3  = np.random.choice(a=population, size=100)
sample4  = np.random.choice(a=population, size=100)
sample5  = np.random.choice(a=population, size=100)
sample6  = np.random.choice(a=population, size=100)
sample7  = np.random.choice(a=population, size=100)
sample8  = np.random.choice(a=population, size=100)
sample9  = np.random.choice(a=population, size=100)
sample10 = np.random.choice(a=population, size=100)

In [6]:
# average of the ten sample means, each sample counted exactly once
( sample1.mean() + sample2.mean() + sample3.mean() + sample4.mean() + sample5.mean()
+ sample6.mean() + sample7.mean() + sample8.mean() + sample9.mean() + sample10.mean() ) / 10
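
Writing out ten separate variables does not scale, so the same idea can be sketched with a loop. The version below is a minimal illustration that uses only Python's standard library in place of NumPy. The population, the seed, and the names `draw_sample_means` and `n_samples` are assumptions made for this example and are not part of the notebook cells.

```python
import random
import statistics

def draw_sample_means(population, n_samples, sample_size, rng):
    """Return the mean of each of n_samples samples drawn with replacement."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

rng = random.Random(10)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 integer ages from 0 to 79
population = [rng.randrange(80) for _ in range(10_000)]

sample_means = draw_sample_means(population, n_samples=1000, sample_size=100, rng=rng)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("population mean      :", round(pop_mean, 3))
print("mean of sample means :", round(statistics.fmean(sample_means), 3))
# The spread of the sample means should be close to sigma / sqrt(n)
print("sd of sample means   :", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n)      :", round(pop_sd / 100 ** 0.5, 3))
```

With many samples, the mean of the sample means settles near the population mean, and their spread is roughly the population standard deviation divided by the square root of the sample size.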

### Descriptive Statistics

**N:** number of observations

**SD:** standard deviation

**SE:** standard error of the mean (SD divided by the square root of N)

**Conf:** confidence interval for the mean

In [7]:
import seaborn as sns
tips = sns.load_dataset("tips")
df = tips.copy()
df.head()

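| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |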


In [8]:
df.describe().T

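| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| total_bill | 244.0 | 19.785943 | 8.902412 | 3.07 | 13.3475 | 17.795 | 24.1275 | 50.81 |
| tip | 244.0 | 2.998279 | 1.383638 | 1.0 | 2.0 | 2.9 | 3.5625 | 10.0 |
| size | 244.0 | 2.569672 | 0.9511 | 1.0 | 2.0 | 2.0 | 3.0 | 6.0 |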


In [9]:
#!pip install researchpy
import researchpy as rp

In [10]:
# for numeric variables
rp.summary_cont(df[["total_bill","tip","size"]])





| | Variable | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|---|
| 0 | total_bill | 244.0 | 19.7859 | 8.9024 | 0.5699 | 18.6633 | 20.9086 |
| 1 | tip | 244.0 | 2.9983 | 1.3836 | 0.0886 | 2.8238 | 3.1728 |
| 2 | size | 244.0 | 2.5697 | 0.9511 | 0.0609 | 2.4497 | 2.6896 |
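
The SE and interval columns follow from N, Mean, and SD. As a rough check on the `total_bill` row, the sketch below recomputes SE = SD / sqrt(N) and a 95% interval around the mean. It assumes the normal critical value from the standard library. The table's bounds are slightly wider, which is what a critical value from the t distribution would give, so only approximate agreement is expected.

```python
import math
from statistics import NormalDist

# Values copied from the total_bill row of the summary table
n, mean, sd = 244, 19.7859, 8.9024

se = sd / math.sqrt(n)             # standard error of the mean
z = NormalDist().inv_cdf(0.975)    # about 1.96 for a two-sided 95% interval
lower, upper = mean - z * se, mean + z * se

print(round(se, 4))
print(round(lower, 4), round(upper, 4))
```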


In [11]:
# for categorical variables
rp.summary_cat(df[["sex","smoker","day"]])

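| | Variable | Outcome | Count | Percent |
|---|---|---|---|---|
| 0 | sex | Male | 157 | 64.34 |
| 1 | | Female | 87 | 35.66 |
| 2 | smoker | No | 151 | 61.89 |
| 3 | | Yes | 93 | 38.11 |
| 4 | day | Sat | 87 | 35.66 |
| 5 | | Sun | 76 | 31.15 |
| 6 | | Thur | 62 | 25.41 |
| 7 | | Fri | 19 | 7.79 |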


In [12]:
# covariance calculation
df[["tip","total_bill"]].cov()

| | tip | total_bill |
|---|---|---|
| tip | 1.914455 | 8.323502 |
| total_bill | 8.323502 | 79.252939 |


In [13]:
# correlation calculation
df[["tip","total_bill"]].corr()

| | tip | total_bill |
|---|---|---|
| tip | 1.0 | 0.675734 |
| total_bill | 0.675734 | 1.0 |
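
Correlation is covariance rescaled by the two standard deviations: r = cov(x, y) / (sd_x * sd_y). The short check below applies that formula to the entries of the covariance matrix shown for `tip` and `total_bill`. The variable names are chosen for this example only.

```python
import math

# Entries copied from the covariance matrix of tip and total_bill
var_tip = 1.914455
var_bill = 79.252939
cov_tip_bill = 8.323502

# Pearson correlation = covariance / (product of the standard deviations)
r = cov_tip_bill / (math.sqrt(var_tip) * math.sqrt(var_bill))
print(round(r, 4))
```

The result agrees with the off-diagonal entry of the correlation matrix up to rounding of the inputs.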


# Example 2: price strategy


This example shows how a confidence interval can support a pricing decision.

**Detail:**
There is a seller, a buyer, and a product.
Buyers are asked how much they would pay for the product.
The goal is to find a confidence interval for the optimal price.

In [14]:
# Prices that 1000 people are willing to pay, drawn at random from the integers 10 to 109 (the upper bound 110 is exclusive).
prices = np.random.randint(10,110,1000)
prices.mean()

58.492

In [15]:
import statsmodels.stats.api as sms

In [16]:
# With 95% confidence, the mean price users are willing to pay lies between about 56.7 and 60.3.
sms.DescrStatsW(prices).tconfint_mean()

(56.67953887736034, 60.30446112263965)
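
A 95% confidence level is a statement about the procedure: if the survey were repeated many times, roughly 95% of the intervals built this way would contain the true mean. The simulation below sketches that idea under stated assumptions: prices are uniform integers from 10 to 109 (true mean 59.5), each survey has 200 respondents, and the interval uses the normal critical value rather than the t value used by `tconfint_mean`. The helper name `mean_confidence_interval` is hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return center - half_width, center + half_width

rng = random.Random(0)   # fixed seed so the run is repeatable
true_mean = 59.5         # mean of the integers 10..109
n_surveys = 1000
covered = 0
for _ in range(n_surveys):
    sample = [rng.randrange(10, 110) for _ in range(200)]
    lower, upper = mean_confidence_interval(sample)
    covered += lower <= true_mean <= upper

coverage = covered / n_surveys
print("share of intervals containing the true mean:", coverage)
```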

# Example 3: Bernoulli distribution

In this example, the probability of each outcome of a coin toss is calculated.

In [17]:
from scipy.stats import bernoulli

In [18]:
p = 0.6            #probability of heads
ht = bernoulli(p)  #heads or tails
ht.pmf(k = 0)      #probability mass function (k=0, probability of tails)

0.4
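
The Bernoulli probability mass function is P(X = k) = p^k (1 - p)^(1 - k) for k in {0, 1}. A hand-written version, shown here only as a check on the SciPy result, makes the formula explicit. `bernoulli_pmf` is a name made up for this sketch.

```python
def bernoulli_pmf(k, p):
    """P(X = k) for a Bernoulli(p) variable; zero unless k is 0 or 1."""
    if k not in (0, 1):
        return 0.0
    return p ** k * (1 - p) ** (1 - k)

p = 0.6  # probability of heads, as in the cell above
print(bernoulli_pmf(0, p))  # probability of tails
print(bernoulli_pmf(1, p))  # probability of heads
```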

# Example 4: law of large numbers

The law of large numbers is the theorem that describes the long-run stability of the average of a random variable.

**Result:** as the number of trials increases, the observed relative frequency of an event converges to its theoretical probability.

In [19]:
import numpy as np 
rng = np.random.RandomState(123)
for i in np.arange(1,21):
    experiments = 2**i
    heads_tails = rng.randint(0,2, size = experiments)
    tails_probability = np.mean(heads_tails)
    print("Number of tosses:", experiments, "-->", "Tails probability: %2.f" % (tails_probability * 100))

Number of tosses: 2 --> Tails probability: 50
Number of tosses: 4 --> Tails probability:  0
Number of tosses: 8 --> Tails probability: 62
Number of tosses: 16 --> Tails probability: 44
Number of tosses: 32 --> Tails probability: 47
Number of tosses: 64 --> Tails probability: 56
Number of tosses: 128 --> Tails probability: 51
Number of tosses: 256 --> Tails probability: 53
Number of tosses: 512 --> Tails probability: 53
Number of tosses: 1024 --> Tails probability: 50
Number of tosses: 2048 --> Tails probability: 49
Number of tosses: 4096 --> Tails probability: 49
Number of tosses: 8192 --> Tails probability: 50
Number of tosses: 16384 --> Tails probability: 50
Number of tosses: 32768 --> Tails probability: 50
Number of tosses: 65536 --> Tails probability: 50
Number of tosses: 131072 --> Tails probability: 50
Number of tosses: 262144 --> Tails probability: 50
Number of tosses: 524288 --> Tails probability: 50
Number of tosses: 1048576 --> Tails pr

# Example 5: ad spend optimization

This example shows how to apply the binomial distribution to a business problem.

**Problem:** Ads are run in various channels, and the goal is to optimize their click-through and conversion rates. To do that, the probability of clicks on an ad must be calculated for different scenarios in a given channel.


**Detail:**

An ad will be run in one channel.

The distribution (binomial) and the probability of a click on the ad (0.01) are known.

Question: What is the probability that the ad gets exactly 1, 5, or 10 clicks when 100 people see it?


In [20]:
from scipy.stats import binom

In [21]:
p = 0.01
n = 100
rv = binom(n,p)
print(rv.pmf(1))  # probability of exactly 1 click among the 100 viewers
print(rv.pmf(5))  # probability of exactly 5 clicks
print(rv.pmf(10)) # probability of exactly 10 clicks

0.36972963764971983
0.0028977871237616114
7.006035693977161e-08
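
These values come from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below recomputes them with the standard library and adds one quantity that is often useful for this kind of question, the probability of at least one click. `binom_pmf` is a hypothetical helper written for this example, not a SciPy function.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 100, 0.01
for k in (1, 5, 10):
    print(k, binom_pmf(k, n, p))

# Probability that at least one of the 100 viewers clicks
at_least_one = 1 - binom_pmf(0, n, p)
print("at least one click:", at_least_one)
```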


# Example 6: ad entry error probabilities

This example shows how to apply the Poisson distribution to a business problem.

**Problem:** 
The probability of erroneous listing entries is to be calculated.


**Detail:**

Errors were measured over one year.

The distribution is known to be Poisson with lambda = 0.1 (the mean number of errors).

What are the probabilities of no errors, exactly 3 errors, and exactly 5 errors?


In [22]:
from scipy.stats import poisson

In [23]:
lambda_ = 0.1
rv = poisson(mu = lambda_) 

print(rv.pmf(k = 0))   # probability of no errors
print(rv.pmf(k = 3))   # probability of exactly three errors
print(rv.pmf(k = 5))   # probability of exactly five errors

0.9048374180359595
0.00015080623633932676
7.54031181696634e-08
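
The Poisson probabilities follow from P(X = k) = e^(-lambda) lambda^k / k!. As a cross-check on the SciPy output, the sketch below evaluates the formula directly. `poisson_pmf` is a hypothetical helper name, and the final line adds the probability of at least one error as an illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 0.1
for k in (0, 3, 5):
    print(k, poisson_pmf(k, lam))

# Probability of at least one error in the period
at_least_one_error = 1 - poisson_pmf(0, lam)
print("at least one error:", at_least_one_error)
```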


# Example 7: calculation of sales probabilities


A business application is implemented using the normal distribution.

**Problem:** 
Before an investment decision or a meeting, we want the probability that next month's sales fall in certain ranges.


**Detail:**


The distribution is known to be normal.

The monthly average number of sales is 80K and the standard deviation is 5K.

What is the probability of selling more than 90K?

In [24]:
from scipy.stats import norm

In [25]:
# Probability that monthly sales exceed 90K
print(1-norm.cdf(90, 80, 5))   # 1 - cumulative distribution function

# Probability of more than 70K
print(1-norm.cdf(70,80,5))

# Probability of less than 73K
print(norm.cdf(73,80,5))

# Probability of between 85K and 90K
print(norm.cdf(90,80,5) - norm.cdf(85,80,5))

0.02275013194817921
0.9772498680518208
0.08075665923377107
0.13590512198327787
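
The same probabilities can be reproduced with `statistics.NormalDist` from the standard library, which is offered here as an independent check and not as a replacement for the SciPy calls. The last line sketches the inverse question, the sales level that is exceeded in only 5% of months. The variable name `sales` and the 5% threshold are assumptions for this example.

```python
from statistics import NormalDist

sales = NormalDist(mu=80, sigma=5)  # monthly sales in thousands

print(1 - sales.cdf(90))               # more than 90K
print(1 - sales.cdf(70))               # more than 70K
print(sales.cdf(73))                   # less than 73K
print(sales.cdf(90) - sales.cdf(85))   # between 85K and 90K

# Inverse question: the sales level exceeded in only 5% of months
threshold = sales.inv_cdf(0.95)
print(round(threshold, 2))
```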
