NumPy Data Distributions

What Is a Data Distribution?

A data distribution describes how the values in a dataset are spread: which values are possible and how often each one occurs.

In NumPy, the numpy.random module helps generate random samples that follow different probability distributions. These are widely used in:

Statistics

Data Science

Machine Learning

Simulation modeling
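
Random samples change on every run, which makes results hard to compare. The sketch below shows one way to get repeatable draws from numpy.random by fixing a seed first. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Seeding the global generator makes the following draws repeatable.
np.random.seed(42)  # 42 is an arbitrary seed chosen for illustration
first = np.random.normal(loc=0, scale=1, size=5)

np.random.seed(42)  # same seed again, so the same five values come back
second = np.random.normal(loc=0, scale=1, size=5)

print(np.array_equal(first, second))  # True
```

Newer NumPy code often creates a generator with np.random.default_rng(seed) instead. The examples in these notes keep the np.random functions used in the tasks.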

Types of data distributions (numpy.random function names):
1. normal
2. binomial
3. multinomial
4. uniform
5. chisquare
6. rayleigh
7. pareto
8. zipf
9. poisson
10. logistic
11. exponential

Types of Data Distributions in NumPy:

1. Normal Distribution (Gaussian Distribution)

- Definition:

A continuous distribution that forms a bell-shaped curve where most values cluster around the mean.

- When to Use:

Natural measurements (height, weight, marks)

Measurement errors

Modeling assumptions in machine learning (for example, normally distributed noise)

- Parameters:

loc → Mean (center)

scale → Standard deviation (spread)

size → Output shape
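
A minimal sketch of how loc and scale shape the samples, assuming simulated heights with a mean of 170 cm and a standard deviation of 10 cm (illustrative values, not taken from real data). With a large sample, the sample mean and standard deviation land close to loc and scale.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for repeatable output

# 100,000 simulated heights in cm (assumed mean 170, assumed std 10)
heights = np.random.normal(loc=170, scale=10, size=100_000)

print("Sample mean:", heights.mean())  # close to loc
print("Sample std :", heights.std())   # close to scale

# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = np.mean(np.abs(heights - 170) <= 10)
print("Share within 1 std:", within_one_sd)
```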


2. Binomial Distribution

- Definition:

A discrete distribution giving the number of successes in a fixed number of independent trials, where each trial has only two outcomes (success/failure) and the same probability of success.

- When to Use:

Tossing coins

Pass/fail results

Yes/No surveys

- Parameters:

n → Number of trials

p → Probability of success on each trial

size → Output shape
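
A short sketch of what n and p control, assuming an experiment of 10 fair coin tosses repeated many times (the numbers are illustrative). Each sample is a count of successes between 0 and n, and the average count approaches n * p.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only for repeatable output

# Each value is the number of heads in one experiment of 10 fair tosses
heads = np.random.binomial(n=10, p=0.5, size=50_000)

print("Smallest and largest count:", heads.min(), heads.max())
print("Average heads per experiment:", heads.mean())  # close to n * p = 5
```

Setting n=1 reduces each sample to a single 0/1 trial.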


3. Multinomial Distribution

- Definition:

An extension of the binomial distribution to trials with more than two possible outcomes. Each sample gives the count of every outcome across n trials.

- When to Use:

Voting results

Product preference surveys

Market choice modeling

- Parameters:

n → Total number of trials

pvals → Probability of each outcome (the values should sum to 1)

size → Output shape
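
A minimal sketch of the output shape, assuming five polls of 100 voters choosing among three hypothetical candidates. Each row holds the counts for one poll, so every row sums to n.

```python
import numpy as np

np.random.seed(2)  # arbitrary seed, only for repeatable output

# 5 simulated polls, 100 voters each, 3 hypothetical candidates
votes = np.random.multinomial(n=100, pvals=[0.5, 0.3, 0.2], size=5)

print(votes.shape)        # (5, 3): one row per poll, one column per candidate
print(votes.sum(axis=1))  # every row sums to n = 100
```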


4. Uniform Distribution

- Definition:

A continuous distribution in which all values within a range are equally likely to occur.

- When to Use:

Random reminders

Simulation sampling

Gaming systems

- Parameters:

low → Lower bound (included)

high → Upper bound (excluded)

size → Output shape
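
A small sketch that checks the "equal probability" idea, assuming a range of 0 to 10 split into five equal bins (both choices are illustrative). Each bin should hold about one fifth of the samples.

```python
import numpy as np

np.random.seed(3)  # arbitrary seed, only for repeatable output

values = np.random.uniform(low=0, high=10, size=100_000)

# Samples stay inside the range [low, high)
print(values.min() >= 0, values.max() < 10)

# Equal-width bins should hold roughly equal shares of the samples
counts, _ = np.histogram(values, bins=5, range=(0, 10))
shares = counts / values.size
print(shares)  # each share is close to 0.2
```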

5. Chi-Square Distribution

- Definition:

The distribution of the sum of the squares of df independent standard normal variables. It is used in statistical tests of variance and goodness of fit.

- When to Use:

Hypothesis testing

Feature selection

Variance analysis

- Parameters:

df → Degrees of freedom

size → Output shape
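
A sketch of what df means, assuming df = 3 for illustration. It compares samples from np.random.chisquare with samples built by hand as the sum of three squared standard normal draws. Both averages should be close to df.

```python
import numpy as np

np.random.seed(4)  # arbitrary seed, only for repeatable output

df = 3  # illustrative degrees of freedom
direct = np.random.chisquare(df=df, size=100_000)

# Built by hand: sum of df squared standard normal values per sample
z = np.random.normal(loc=0, scale=1, size=(100_000, df))
by_hand = (z ** 2).sum(axis=1)

print("Mean (chisquare):", direct.mean())   # close to df
print("Mean (by hand)  :", by_hand.mean())  # close to df
```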

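6. Rayleigh Distribution

- Definition:

A continuous distribution used to model the magnitude of a two-component vector, such as wind speed or signal strength, when the two components are independent, zero-mean normal variables with equal variance.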

- When to Use:

Signal processing

Wind speed prediction

Radar modeling

- Parameters:

scale → Spread of distribution

size → Output shape
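
A sketch of the vector-magnitude idea, assuming a scale of 2.0 (an illustrative value). The magnitude of a vector with two independent normal components (mean 0, standard deviation equal to scale) follows a Rayleigh distribution, so its average should match the average of np.random.rayleigh samples. The theoretical mean is scale * sqrt(pi / 2).

```python
import numpy as np

np.random.seed(5)  # arbitrary seed, only for repeatable output

sigma = 2.0  # illustrative scale
direct = np.random.rayleigh(scale=sigma, size=100_000)

# Magnitude of a 2-D vector with independent normal components
x = np.random.normal(loc=0, scale=sigma, size=100_000)
y = np.random.normal(loc=0, scale=sigma, size=100_000)
magnitude = np.sqrt(x ** 2 + y ** 2)

expected_mean = sigma * np.sqrt(np.pi / 2)
print("Theoretical mean:", expected_mean)
print("Mean (rayleigh) :", direct.mean())
print("Mean (magnitude):", magnitude.mean())
```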

7. Pareto Distribution

- Definition:

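A heavy-tailed distribution often associated with the 80-20 rule, where a small portion of the causes accounts for most of the results. Note that np.random.pareto samples the Pareto II (Lomax) form, whose values start at 0.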

- When to Use:

Wealth distribution

Business profits

Website traffic

- Parameters:

a → Shape parameter

size → Output shape
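
A sketch of how to move from the values returned by np.random.pareto to a classical Pareto distribution with a chosen minimum value. The shape a = 3.0 and minimum m = 2.0 are assumptions for illustration (m is not a parameter of np.random.pareto). Adding 1 and multiplying by m shifts the samples so that they start at m.

```python
import numpy as np

np.random.seed(6)  # arbitrary seed, only for repeatable output

a = 3.0  # illustrative shape parameter
m = 2.0  # illustrative minimum value of the classical Pareto

lomax = np.random.pareto(a=a, size=100_000)  # Pareto II (Lomax), starts at 0
classical = (lomax + 1) * m                  # classical Pareto, starts at m

print(lomax.min() >= 0, classical.min() >= m)
print("Sample mean:", classical.mean())  # close to a * m / (a - 1) = 3.0
```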

8. Zipf Distribution

- Definition:

A discrete distribution that models rank-frequency relationships, where a few items occur very frequently and most items occur rarely.

- When to Use:

Word frequency in text

Website popularity

Social media trends

- Parameters:

a → Distribution parameter (must be greater than 1)

size → Output shape
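
A short sketch of the rank-frequency pattern, assuming a = 2.0 for illustration. The samples are positive integers (ranks), and rank 1 appears far more often than rank 2, which in turn appears more often than rank 3.

```python
import numpy as np

np.random.seed(7)  # arbitrary seed, only for repeatable output

ranks = np.random.zipf(a=2.0, size=100_000)

# Share of samples that landed on the first three ranks
share_rank1 = np.mean(ranks == 1)
share_rank2 = np.mean(ranks == 2)
share_rank3 = np.mean(ranks == 3)

print("Smallest rank:", ranks.min())
print("Shares of ranks 1, 2, 3:", share_rank1, share_rank2, share_rank3)
```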


9. Poisson Distribution

- Definition:

A discrete distribution that models the number of events occurring in a fixed interval of time or space, given a constant average rate.

- When to Use:

Customers per hour

Call center arrivals

Website hits

- Parameters:

lam → Average number of events per interval

size → Output shape
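
A minimal sketch of the role of lam, assuming an average of 4 events per interval (an illustrative value). A known property of the Poisson distribution is that its mean and variance both equal lam, which a large sample reflects.

```python
import numpy as np

np.random.seed(8)  # arbitrary seed, only for repeatable output

# Number of events in each of 100,000 simulated intervals
arrivals = np.random.poisson(lam=4, size=100_000)

print("Sample mean    :", arrivals.mean())  # close to lam
print("Sample variance:", arrivals.var())   # also close to lam
```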

10. Logistic Distribution

- Definition:

A continuous distribution similar in shape to the normal distribution but with heavier tails.

- When to Use:

Logistic regression

Growth modeling

Machine learning classification

- Parameters:

loc → Mean

scale → Spread

size → Output shape
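
A sketch that makes "heavier tails" concrete. It compares logistic samples with normal samples of the same standard deviation and counts how often each one lands more than four standard deviations from the mean. The threshold of four and the sample size are illustrative choices. A logistic distribution with a given scale has standard deviation scale * pi / sqrt(3).

```python
import numpy as np

np.random.seed(9)  # arbitrary seed, only for repeatable output

n = 200_000
logistic = np.random.logistic(loc=0, scale=1, size=n)

# Normal samples matched to the logistic standard deviation
sd = np.pi / np.sqrt(3)
normal = np.random.normal(loc=0, scale=sd, size=n)

# Share of samples more than 4 standard deviations from the mean
tail_logistic = np.mean(np.abs(logistic) > 4 * sd)
tail_normal = np.mean(np.abs(normal) > 4 * sd)

print("Logistic tail share:", tail_logistic)
print("Normal tail share  :", tail_normal)  # noticeably smaller
```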

11. Exponential Distribution

- Definition:

Models the time between events in a Poisson process.

- When to Use:

Time between customer arrivals

System failure prediction

Waiting time analysis

- Parameters:

scale → Mean time between events (1 / average rate)

size → Output shape
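
A sketch of the link between the exponential and Poisson distributions, assuming a mean gap of 2 minutes between arrivals (an illustrative value and time unit). The gaps are added up into arrival times, and the arrivals are then counted per 1-minute window. The average count per window should be close to 1 / scale.

```python
import numpy as np

np.random.seed(10)  # arbitrary seed, only for repeatable output

scale = 2.0  # assumed mean gap of 2 minutes between arrivals
gaps = np.random.exponential(scale=scale, size=100_000)
print("Average gap:", gaps.mean())  # close to scale

# Turn gaps into arrival times, then count arrivals in each 1-minute window
arrival_times = np.cumsum(gaps)
per_minute = np.bincount(arrival_times.astype(int))
print("Average arrivals per minute:", per_minute.mean())  # close to 1 / scale
```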

Tasks:

1. Generate 100 Random Temperatures (25°C to 40°C) — Uniform Distribution

In [1]:
import numpy as np

# Generate temperatures
temps = np.random.uniform(low=25, high=40, size=100)

print(temps)


[28.708965   32.84068026 34.49771739 31.52508744 25.53955224 31.74107239
 25.92513218 31.30881796 34.7334089  26.08760938 39.94671253 26.31053408
 30.03457248 26.06947405 30.12785083 28.58349279 36.9420386  35.19205744
 30.6166828  35.56802837 33.34348026 33.46042438 30.07695237 35.90494202
 34.61704938 30.02854582 27.70839696 34.96856289 27.42913745 34.24819454
 25.1683289  32.94533439 35.90082068 31.88242927 29.46523613 30.30561942
 33.38731817 34.8350117  34.10996738 33.9780028  27.77291388 30.00712628
 38.51335522 38.74912422 34.91763357 30.49824183 28.25307822 36.45304418
 35.58637258 28.5170649  31.09996463 37.6134319  26.11386933 32.27129956
 32.02119738 29.9270746  30.03995407 27.53678869 31.6630941  39.07605931
 36.80762592 25.37864554 29.32420918 34.2344578  35.83537122 39.1388501
 34.48188146 37.78562366 34.26138093 37.24397159 34.1639998  39.55776119
 38.11296337 32.61313589 39.73200324 28.10993673 32.56880696 37.15263411
 35.33808812 34.14942264 26.69223009 31.40876718 30.

2. Simulate 50 Student Marks — Normal Distribution

In [2]:
import numpy as np

# 50 marks centered at 60 with a standard deviation of 10
marks = np.random.normal(loc=60, scale=10, size=50)

print(marks)


[63.40766872 72.19396349 68.2718624  59.43319974 47.2709065  65.68405606
 56.7930651  44.89288301 67.33969236 60.4670776  47.7880962  50.61821473
 66.08758594 53.33332961 53.26800255 55.36805012 61.06414198 58.13517703
 54.16346363 57.93916319 62.24784038 69.01517659 71.05500374 66.53011725
 57.67950143 39.41718456 65.54223698 39.73906373 55.69145264 42.6713531
 72.39986353 42.93625216 46.68072932 61.08006044 66.72885371 61.45202795
 55.94372087 54.73957099 61.28288235 56.60352794 69.43326005 69.12310225
 63.35229714 60.43844374 52.61167409 75.39240712 58.91847031 57.25744071
 57.69348582 47.22001592]


3. Simulate 100 Coin Toss Results — Binomial Distribution

In [3]:
import numpy as np

# n=1 gives one toss per sample; 1 and 0 stand for the two faces (for example, 1 = heads)
coin_toss = np.random.binomial(n=1, p=0.5, size=100)

print(coin_toss)


[1 0 0 1 1 0 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 0 1 0 0 0
 0 0 1 1 1 1 1 1 1 0 0 1 0 0 1 1 0 1 1 0 1 0 0 0 1 0 1 1 1 0 1 0 1 1 1 0 1
 1 0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0]


4. Customers Visiting Store per Hour — Poisson Distribution

In [4]:
import numpy as np

# lam=20 is the average number of customers per hour; 100 hours are simulated
customers = np.random.poisson(lam=20, size=100)

print(customers)


[25 20 16 26 20 15 18 20 19 19 14 17 19 14 16 20 32 20 18 33 23 21 23 14
 28 22 20 17 27 18 22 16 21  9 19 26 27 15 18 25 19 23 16 18 19 17 22 17
 17 20 16 22 28 14 18 22 25 25 21 22 14 26 24 13 17 23 16 19 25 25 21 18
 17 15 21 20 23 22 19 25 13 18 27 27 21 18 17 19 23 16 23 22 18 18 17 15
 19 22 15 21]


5. Generate 30 Waiting Times Between Customer Arrivals — Exponential Distribution

In [5]:
import numpy as np

# scale=2 is the mean waiting time between arrivals
waiting_times = np.random.exponential(scale=2, size=30)

print(waiting_times)


[0.93468005 2.69288009 1.05427298 0.60654444 1.26117981 0.09759692
 2.88411679 0.3479855  0.84282409 3.8468299  1.33293053 0.19401971
 3.89740033 5.94651956 4.45605618 0.06712425 0.79034926 0.57324889
 0.01681246 0.4026114  1.82856348 0.40175489 4.76212374 3.37009072
 0.29313691 0.5717684  3.06986946 0.45197615 0.2736143  0.51736914]


Mini Tasks:

1. Salary Analysis — Normal Distribution (Synthetic Data)

In [1]:
import numpy as np

# 100 synthetic salaries with mean 50,000 and standard deviation 8,000
salary = np.random.normal(loc=50000, scale=8000, size=100)

print("Mean Salary:", np.mean(salary))
print("Max Salary:", np.max(salary))
print("Total Salary:", np.sum(salary))


Mean Salary: 49155.616599718014
Max Salary: 68300.81447744416
Total Salary: 4915561.659971802


2. Website Visitors (365 Days) — Poisson Distribution

In [2]:
visitors = np.random.poisson(lam=500, size=365)

print("Average Visitors:", np.mean(visitors))
print("Peak Day Visitors:", np.max(visitors))


Average Visitors: 500.2931506849315
Peak Day Visitors: 562


3. Hospital Emergency Cases — Poisson Distribution

In [3]:
cases = np.random.poisson(lam=30, size=365)

print("Total Cases:", np.sum(cases))
print("Maximum Cases in a Day:", np.max(cases))


Total Cases: 11030
Maximum Cases in a Day: 45


4. Exam Pass/Fail — Binomial Distribution

In [4]:
results = np.random.binomial(n=1, p=0.7, size=100)

print("Total Passed:", np.sum(results))
print("Pass Percentage:", np.mean(results) * 100)


Total Passed: 70
Pass Percentage: 70.0


5. Product Selection — Multinomial Distribution

In [5]:
products = np.random.multinomial(n=100, pvals=[0.3, 0.25, 0.25, 0.2])

print("Product Selection [A,B,C,D]:", products)


Product Selection [A,B,C,D]: [28 24 26 22]


6. Server Response Time — Exponential Distribution

In [6]:
response = np.random.exponential(scale=2, size=100)

print("Average Response Time:", np.mean(response))
print("Max Response Time:", np.max(response))


Average Response Time: 2.119090676283461
Max Response Time: 10.958731653595752


7. Call Center Calls (24 Hours) — Poisson Distribution

In [7]:
calls = np.random.poisson(lam=40, size=24)

print("Total Calls:", np.sum(calls))
print("Peak Hour Calls:", np.max(calls))


Total Calls: 951
Peak Hour Calls: 54


8. Manufacturing Quality — Binomial Distribution

In [8]:
quality = np.random.binomial(n=1, p=0.95, size=100)

print("Good Items:", np.sum(quality))
print("Defective Items:", quality.size - np.sum(quality))


Good Items: 93
Defective Items: 7


9. Bank Loan Approval — Binomial Distribution

In [9]:
loans = np.random.binomial(n=1, p=0.6, size=200)

print("Total Approved:", np.sum(loans))
print("Approval Percentage:", np.mean(loans) * 100)


Total Approved: 129
Approval Percentage: 64.5


Mini Project:

In [11]:
# MINI PROJECT: E-Commerce Business Data Simulation (30 Days)
# Using Only NumPy (Synthetic Data Generation)


import numpy as np


# 1. Generate Synthetic Data


# Daily customers visiting website (Poisson distribution)
customers = np.random.poisson(lam=120, size=30)

# Daily purchase amount (Normal distribution)
purchase_amount = np.random.normal(loc=500, scale=100, size=30)

# Product category selection (Multinomial distribution)
# 4 products: Electronics, Clothes, Groceries, Accessories
product_selection = np.random.multinomial(
    n=100,
    pvals=[0.3, 0.25, 0.25, 0.2],
    size=30
)

# Delivery time in days (Poisson distribution)
delivery_time = np.random.poisson(lam=3, size=30)

# Waiting time between orders (Exponential distribution)
waiting_time = np.random.exponential(scale=2, size=30)

# Product quality check (Binomial distribution)
# 1 = Good product, 0 = Defective product
quality = np.random.binomial(n=1, p=0.95, size=30)



# 2. Analysis of Data


print("------ E-COMMERCE DATA ANALYSIS (30 DAYS) ------")

# Customers Analysis
print("\nCUSTOMERS ANALYSIS")
print("Average Daily Customers:", np.mean(customers))
print("Maximum Customers in a Day:", np.max(customers))
print("Total Customers in 30 Days:", np.sum(customers))

# Purchase Analysis
print("\nPURCHASE ANALYSIS")
print("Average Purchase Amount:", np.mean(purchase_amount))
print("Total Revenue:", np.sum(purchase_amount))
print("Highest Purchase Amount:", np.max(purchase_amount))

# Product Selection Analysis
print("\nPRODUCT CATEGORY ANALYSIS")
total_products = np.sum(product_selection, axis=0)
print("Electronics Sold:", total_products[0])
print("Clothes Sold:", total_products[1])
print("Groceries Sold:", total_products[2])
print("Accessories Sold:", total_products[3])

# Delivery Analysis
print("\nDELIVERY TIME ANALYSIS")
print("Average Delivery Time:", np.mean(delivery_time))
print("Maximum Delivery Time:", np.max(delivery_time))

# Waiting Time Analysis
print("\nORDER WAITING TIME ANALYSIS")
print("Average Waiting Time:", np.mean(waiting_time))
print("Longest Waiting Time:", np.max(waiting_time))

# Product Quality Analysis
print("\nPRODUCT QUALITY ANALYSIS")
print("Good Products:", np.sum(quality))
print("Defective Products:", quality.size - np.sum(quality))
print("Quality Percentage:", np.mean(quality) * 100)


------ E-COMMERCE DATA ANALYSIS (30 DAYS) ------

CUSTOMERS ANALYSIS
Average Daily Customers: 123.56666666666666
Maximum Customers in a Day: 146
Total Customers in 30 Days: 3707

PURCHASE ANALYSIS
Average Purchase Amount: 515.8161717320392
Total Revenue: 15474.485151961175
Highest Purchase Amount: 646.3302389426087

PRODUCT CATEGORY ANALYSIS
Electronics Sold: 924
Clothes Sold: 741
Groceries Sold: 738
Accessories Sold: 597

DELIVERY TIME ANALYSIS
Average Delivery Time: 2.6666666666666665
Maximum Delivery Time: 7

ORDER WAITING TIME ANALYSIS
Average Waiting Time: 1.7181877430729502
Longest Waiting Time: 4.688031807723898

PRODUCT QUALITY ANALYSIS
Good Products: 28
Defective Products: 2
Quality Percentage: 93.33333333333333
