# Proportional Stratified Sampling

**Proportional stratified sampling** is a statistical sampling technique in which a population is divided into distinct subgroups, called strata, and a sample is drawn from each stratum in proportion to its share of the overall population. Each subgroup is therefore represented in the sample at the same rate as in the population, which makes the sample more representative and typically makes estimates more precise.

**Key Steps:**

1. Identify Strata: The population is divided into strata based on specific characteristics (e.g., age, gender, income levels). The strata should be mutually exclusive and together cover the whole population, so that every individual belongs to exactly one stratum.

2. Determine Sample Size for Each Stratum: The sample size for each stratum is proportional to its size in the overall population. For example, if a stratum makes up 30% of the population, it should also constitute 30% of the sample.

3. Random Sampling within Strata: Within each stratum, random sampling is conducted to select individuals for the sample.
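
The three steps above can be sketched in a few lines of pandas. This is a minimal illustration rather than a definitive implementation: the `population` table, the `income_band` column, and the helper name `proportional_stratified_sample` are all made up for the example. Drawing the same fraction from every stratum is what keeps each stratum's share in the sample equal to its share in the population.

```python
import pandas as pd

def proportional_stratified_sample(df, stratum_col, frac, seed=0):
    """Draw the same fraction from every stratum (hypothetical helper)."""
    # groupby(...).sample(frac=...) samples without replacement inside each group
    return df.groupby(stratum_col, group_keys=False).sample(frac=frac, random_state=seed)

# Step 1: a hypothetical population of 1,000 people in three income bands
population = pd.DataFrame({
    "person_id": range(1000),
    "income_band": ["low"] * 500 + ["middle"] * 300 + ["high"] * 200,
})

# Steps 2 and 3: sample 10% of each band at random
sample = proportional_stratified_sample(population, "income_band", frac=0.1)

# Each band keeps its population share: 50% low, 30% middle, 20% high
print(sample["income_band"].value_counts(normalize=True))
```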

**Example:**

Suppose a school has 1,000 students, and you want to survey 100 students. The students are divided into strata by grade:

    400 students in grade 9
    300 students in grade 10
    200 students in grade 11
    100 students in grade 12

The sample size from each grade will be proportional to the total number of students in each grade:

    Grade 9: $\frac{400}{1000} \times 100 = 40$ students
    Grade 10: $\frac{300}{1000} \times 100 = 30$ students
    Grade 11: $\frac{200}{1000} \times 100 = 20$ students
    Grade 12: $\frac{100}{1000} \times 100 = 10$ students
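
The same arithmetic can be written as a short Python sketch. The variable names are illustrative only. In this example every result is a whole number; with other stratum sizes the rounded values may not add up exactly to the target sample size and would need a small manual adjustment.

```python
# Stratum sizes from the school example
grade_sizes = {"grade 9": 400, "grade 10": 300, "grade 11": 200, "grade 12": 100}
sample_size = 100

population_size = sum(grade_sizes.values())

# Each grade's sample size is its share of the population times the sample size
allocation = {
    grade: round(size * sample_size / population_size)
    for grade, size in grade_sizes.items()
}

for grade, n in allocation.items():
    print(f"{grade}: {n} students")
```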

**Benefits:**

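1. Increased precision: Because each stratum is represented in proportion to its size, sample estimates typically vary less from one sample to the next than they would under simple random sampling with the same sample size.
2. Improved accuracy: This is especially useful when the population has distinct subgroups that behave differently, since no subgroup can be over- or under-represented by chance.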
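
A small simulation can make the precision benefit concrete. The sketch below uses a synthetic population (an assumption made purely for illustration, with three strata whose means differ strongly) and compares how much the sample mean varies under simple random sampling and under proportional stratified sampling, both with 50 observations per sample.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic population: three strata of different sizes and different means
population = pd.DataFrame({
    "stratum": np.repeat(["A", "B", "C"], [500, 300, 200]),
    "value": np.concatenate([
        rng.normal(10, 2, 500),
        rng.normal(20, 2, 300),
        rng.normal(40, 2, 200),
    ]),
})

def srs_mean(seed):
    # Simple random sample of 50 values from the whole population
    return population["value"].sample(50, random_state=seed).mean()

def stratified_mean(seed):
    # 5% of each stratum: 25 + 15 + 10 = 50 values
    sample = population.groupby("stratum").sample(frac=0.05, random_state=seed)
    return sample["value"].mean()

srs_means = np.array([srs_mean(seed) for seed in range(200)])
stratified_means = np.array([stratified_mean(seed) for seed in range(200)])

# A smaller standard deviation means the estimate is more precise
print("Simple random sampling:", srs_means.std())
print("Proportional stratified:", stratified_means.std())
```

The gain is large here because the strata were constructed to differ a lot. When the strata have similar means, the two methods perform about the same.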

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the dataset (read_csv already returns a DataFrame)
df = pd.read_csv('wnba.csv')

# Split the players into three strata by number of games played
stratum_1 = df[df['Games Played'] <= 12]
stratum_2 = df[(df['Games Played'] > 12) & (df['Games Played'] <= 22)]
stratum_3 = df[df['Games Played'] > 22]

sample_means = []

# Draw 100 stratified samples of 10 players each
for i in range(100):
    # Per-stratum sizes (1, 2, 7) are intended to match each stratum's share of the players
    sample_1 = stratum_1.sample(1, random_state=i)
    sample_2 = stratum_2.sample(2, random_state=i)
    sample_3 = stratum_3.sample(7, random_state=i)

    final_sample = pd.concat([sample_1, sample_2, sample_3])

    # Record the mean points scored in this sample
    sample_means.append(final_sample['PTS'].mean())

# Population parameter: mean points scored across all players
population_mean = df['PTS'].mean()

# Compare each sample mean with the population mean (horizontal line)
plt.scatter(np.arange(1, 101), sample_means)
plt.axhline(y=population_mean)
plt.show()
```

This code snippet applies proportional stratified sampling to a dataset of WNBA players. It splits the players into three strata by number of games played, draws 100 samples of 10 players each (1, 2, and 7 players from the respective strata), and plots each sample's mean points scored (PTS) against the mean for the whole dataset, which is drawn as a horizontal line.
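
The per-stratum sample sizes in the snippet are hard-coded. One way they could be derived from the data is sketched below. Since `wnba.csv` is not available here, the `players` table is randomly generated stand-in data, so the resulting numbers say nothing about the real dataset. Only the `Games Played` column name and the cut points 12 and 22 come from the snippet above.

```python
import numpy as np
import pandas as pd

# Stand-in data (hypothetical); the notebook itself reads wnba.csv
rng = np.random.default_rng(1)
players = pd.DataFrame({"Games Played": rng.integers(2, 33, size=150)})

# Same boundaries as the snippet: <= 12, 13 to 22, > 22
bins = [0, 12, 22, np.inf]
labels = ["<=12", "13-22", ">22"]
players["stratum"] = pd.cut(players["Games Played"], bins=bins, labels=labels)

# Share of players in each stratum, in stratum order
shares = players["stratum"].value_counts(normalize=True).sort_index()

# Proportional allocation for a total sample of 10 players
total_sample_size = 10
allocation = (shares * total_sample_size).round().astype(int)

print(allocation)
```

Because each value is rounded separately, the allocation can occasionally add up to one more or one less than the target, in which case one stratum's size needs adjusting by hand.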