# Stratified Sampling

Stratified sampling is a sampling method in which a population is divided into smaller, non-overlapping subgroups, or "strata," whose members share similar characteristics. A sample is taken from each stratum, and these samples are combined to form a complete sample that represents the entire population. This approach ensures that specific subgroups within a population are adequately represented in the sample.

**Key Characteristics:**

1. Strata: The population is divided into groups (strata) based on certain characteristics like age, gender, income level, etc.
2. Sampling within strata: After dividing into strata, a random sample is taken from each stratum. This can be done proportionally to the size of the stratum relative to the population, or equal-sized samples can be taken.
3. Representation: This method is used to ensure that minority groups or specific segments of the population are not underrepresented in the sample.

**Example:**

Imagine you're conducting a survey on a population that consists of 60% men and 40% women. If you were to take a random sample without stratifying, you might end up with a sample that's not proportionally representative (e.g., too many men or too many women). By using stratified sampling, you would:

1. Divide the population into two strata: men and women.
2. Take a sample from each stratum proportionally to its size in the population (e.g., 60% of the sample from men and 40% from women).
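The two steps above can be sketched with pandas. This is a minimal illustration, not a prescribed method: the `population` DataFrame, its column names, and the 10% sampling fraction are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical population: 600 men and 400 women (60% / 40%).
population = pd.DataFrame({
    "person_id": range(1000),
    "gender": ["man"] * 600 + ["woman"] * 400,
})

# Step 1: group by the stratifying column.
# Step 2: draw the same fraction (10%) from every stratum,
# so the sample keeps the population's 60/40 split.
sample = population.groupby("gender").sample(frac=0.1, random_state=0)

print(sample["gender"].value_counts())
```

Because the same fraction is drawn from each group, the sample contains 60 men and 40 women no matter which individuals are selected.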

**Types of Stratified Sampling:**

1. Proportional Stratified Sampling: The sample size from each stratum is proportional to the size of the stratum in the population.
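2. Disproportionate Stratified Sampling: The sample size from each stratum is not tied to the stratum's share of the population. A common case is equal allocation, where the same number of samples is taken from each stratum, regardless of the stratum's size.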
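To make the difference concrete, the sketch below computes per-stratum sample sizes under proportional and equal allocation. The function names and the stratum sizes are hypothetical, chosen only for illustration.

```python
def proportional_allocation(stratum_sizes, total_n):
    """Sample size per stratum, proportional to its share of the population."""
    population_size = sum(stratum_sizes.values())
    # Rounding can make the sizes sum to slightly more or less than total_n.
    return {
        name: round(total_n * size / population_size)
        for name, size in stratum_sizes.items()
    }


def equal_allocation(stratum_sizes, total_n):
    """Same sample size for every stratum, capped at the stratum's size."""
    per_stratum = total_n // len(stratum_sizes)
    return {name: min(per_stratum, size) for name, size in stratum_sizes.items()}


# Hypothetical strata: 500, 300, and 200 units in a population of 1000.
sizes = {"A": 500, "B": 300, "C": 200}

print(proportional_allocation(sizes, 60))  # {'A': 30, 'B': 18, 'C': 12}
print(equal_allocation(sizes, 60))         # {'A': 20, 'B': 20, 'C': 20}
```

With equal allocation, stratum C contributes a third of the sample even though it is only a fifth of the population, which is useful when small groups need enough observations for their own analysis.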

**Advantages:**

1. Improved representativeness: Ensures that all subgroups are fairly represented.
2. Increased precision: When units within each stratum are more alike than units across the whole population, stratified sampling can give more precise estimates than a simple random sample of the same size.
3. More informative: Allows for better analysis of subgroups.
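The precision claim can be checked with a small simulation. The sketch below is only an illustration under invented assumptions: a synthetic population with two strata whose means differ a lot, and a sample size of 100. It compares how much the estimated mean varies across repeated simple random samples and repeated proportional stratified samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: two strata with different means, similar spread.
stratum_a = rng.normal(loc=50, scale=5, size=6000)
stratum_b = rng.normal(loc=80, scale=5, size=4000)
population = np.concatenate([stratum_a, stratum_b])


def srs_mean(n):
    """Mean of a simple random sample of size n from the whole population."""
    return rng.choice(population, size=n, replace=False).mean()


def stratified_mean(n):
    """Mean of a proportional stratified sample of size n."""
    n_a = round(n * len(stratum_a) / len(population))
    n_b = n - n_a
    sample_a = rng.choice(stratum_a, size=n_a, replace=False)
    sample_b = rng.choice(stratum_b, size=n_b, replace=False)
    # With proportional allocation, the pooled mean weights strata correctly.
    return np.concatenate([sample_a, sample_b]).mean()


# Repeat each design many times and compare the spread of the estimates.
srs_estimates = [srs_mean(100) for _ in range(500)]
stratified_estimates = [stratified_mean(100) for _ in range(500)]

print("Std. dev. of simple random sample means:", np.std(srs_estimates))
print("Std. dev. of stratified sample means:   ", np.std(stratified_estimates))
```

In this setup the stratified estimates should be noticeably less spread out, because the stratified design fixes how many units come from each group. The gain shrinks as the strata become more similar to one another.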

**When to Use Stratified Sampling:**

1. When the population has distinct subgroups.
2. When you want to ensure that all subgroups are represented in the sample.
3. When the population is heterogeneous (i.e., diverse), and you want to account for that diversity in your sampling.

```python
import pandas as pd

# Load the dataset (read_csv already returns a DataFrame)
df = pd.read_csv('wnba.csv')

# Create a new column for points per game
df['points_per_game'] = df['PTS'] / df['Games Played']

# Dictionary to store the sample mean points per game for each position
mean_points_per_position = {}

# Treat each unique player position as a stratum
for position in df['Pos'].unique():
    # Filter the dataset for the current position
    stratum = df[df['Pos'] == position]

    # Sample 10 players from the stratum, capped at the stratum's size
    # so that small strata do not raise an error
    sample = stratum.sample(min(10, len(stratum)), random_state=0)

    # Calculate and store the mean points per game for the sample
    mean_points_per_position[position] = sample['points_per_game'].mean()

# Find the position with the highest sample mean points per game
position_most_points = max(mean_points_per_position, key=mean_points_per_position.get)

# Print the result
print(f'The position with the highest sample mean points per game is: {position_most_points}')
```

**Summary:**

This code performs stratified sampling by dividing the dataset into groups (strata) based on player position and sampling 10 players from each position, which is an equal allocation. It calculates the mean points per game for each sample and determines which position has the highest sample mean. Because each mean is estimated from a small sample, the result can change with a different `random_state`.

**Key Concepts:**

1. Stratified Sampling: The dataset is divided by position (strata) before sampling.
2. Random Sampling: A sample of 10 players is randomly taken from each stratum (position).
3. Mean Calculation: The mean points per game is calculated for each sample of players.
4. Dictionary Usage: The mean points are stored in a dictionary for easy comparison across positions.
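Taking the same number of players from every position is fine for comparing positions, but the per-position means should not simply be averaged to estimate the league-wide mean, since small strata would count as much as large ones. One common approach is to weight each stratum's sample mean by that stratum's share of the population. The sketch below uses invented stratum sizes and sample means, not values from `wnba.csv`.

```python
import pandas as pd

# Hypothetical stratum sizes and per-stratum sample means (invented values).
stratum_sizes = pd.Series({"G": 60, "F": 30, "C": 10})
sample_means = pd.Series({"G": 10.0, "F": 6.0, "C": 4.0})

# Each stratum's weight is its share of the population.
weights = stratum_sizes / stratum_sizes.sum()

# Weighted estimate of the overall mean (the Series align on their index).
weighted_mean = (sample_means * weights).sum()

# Naive average of the stratum means, which ignores stratum sizes.
unweighted_mean = sample_means.mean()

print(f"Weighted estimate:   {weighted_mean:.2f}")
print(f"Unweighted estimate: {unweighted_mean:.2f}")
```

Here the weighted estimate is 8.20 while the unweighted average is 6.67, because the smallest stratum pulls the naive average down more than its size justifies.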