In [1]:
import pandas as pd

In [2]:
toyota_sales = pd.read_csv('data/car_sales/toyota_sales_data.csv')

# Range - The Simplest Measure of Spread

**Range = Maximum - Minimum**

## Advantages:
- Very easy to calculate and understand
- Shows the full span of your data

## Limitations:
- Only uses 2 values (ignores everything in between)
- Heavily affected by outliers
- Tells you nothing about how the values are distributed between the two extremes
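
The outlier limitation is easy to see on a tiny example. The sketch below uses made-up sale amounts (not taken from the Toyota data) and a hypothetical helper, `value_range`, to show how one extreme value can dominate the range.

```python
import pandas as pd

# Hypothetical sale amounts, invented for illustration (not from the Toyota data)
typical = pd.Series([21000.0, 22000.0, 23000.0, 24000.0, 25000.0])
with_outlier = pd.concat([typical, pd.Series([95000.0])], ignore_index=True)

def value_range(values: pd.Series) -> float:
    """Return max minus min for a numeric Series."""
    return float(values.max() - values.min())

range_typical = value_range(typical)
range_outlier = value_range(with_outlier)

print(f"Range without outlier: ${range_typical:,.0f}")
print(f"Range with outlier:    ${range_outlier:,.0f}")
```

Five of the six values are identical in both series, yet the range changes completely because it depends only on the minimum and maximum.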

In [3]:
# Range for sale_amount
sale_min = toyota_sales['sale_amount'].min()
sale_max = toyota_sales['sale_amount'].max()
sale_range = sale_max - sale_min

In [4]:
print(f"Minimum sale: ${sale_min:,.2f}")
print(f"Maximum sale: ${sale_max:,.2f}")
print(f"Range: ${sale_range:,.2f}")

Minimum sale: $20,000.94
Maximum sale: $49,995.68
Range: $29,994.74
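
As an aside, NumPy provides `np.ptp` ("peak to peak"), which computes max minus min in one call. The sketch below uses made-up values rather than the Toyota data, and checks that it agrees with the manual calculation.

```python
import numpy as np
import pandas as pd

# Hypothetical sale amounts, invented for illustration
amounts = pd.Series([20000.0, 35000.0, 50000.0])

manual_range = float(amounts.max() - amounts.min())
ptp_range = float(np.ptp(amounts.to_numpy()))  # peak-to-peak = max - min

print(f"Manual range: ${manual_range:,.2f}")
print(f"np.ptp range: ${ptp_range:,.2f}")
```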


In [5]:
# Range by car model
range_by_model = toyota_sales.groupby('car_model')['sale_amount'].agg([
    ('Min', 'min'),
    ('Max', 'max'),
    ('Range', lambda x: x.max() - x.min()),
    ('Count', 'count')
]).round(0)

In [6]:

range_by_model.sort_values('Range', ascending=False)

car_model,Min,Max,Range,Count
Tundra,35027.0,49996.0,14969.0,817
Tacoma,30012.0,39994.0,9982.0,826
Highlander,35009.0,44976.0,9967.0,814
RAV4,27003.0,34965.0,7962.0,860
Corolla,20001.0,24996.0,4995.0,827
Camry,25006.0,29982.0,4976.0,856


## Range vs Standard Deviation: Key Differences

**Range:**
- Uses only 2 values (min and max)
- Heavily affected by a single outlier
- Easy to calculate

**Standard Deviation:**
- Uses ALL values
- Shows typical variation from mean
- More informative about distribution
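
To make the contrast concrete, here is a minimal sketch with two made-up datasets that share the same minimum and maximum, and therefore the same range, but differ in how the values in between are arranged. The numbers are invented for illustration only.

```python
import pandas as pd

# Hypothetical values, invented for illustration; both series have min 20000 and max 30000
clustered = pd.Series([20000.0, 24900.0, 25000.0, 25100.0, 30000.0])
spread_out = pd.Series([20000.0, 21000.0, 25000.0, 29000.0, 30000.0])

range_clustered = float(clustered.max() - clustered.min())
range_spread = float(spread_out.max() - spread_out.min())

# Series.std() is the sample standard deviation (ddof=1) by default
std_clustered = float(clustered.std())
std_spread = float(spread_out.std())

print(f"Clustered:  range ${range_clustered:,.0f}, std dev ${std_clustered:,.0f}")
print(f"Spread out: range ${range_spread:,.0f}, std dev ${std_spread:,.0f}")
```

The range cannot tell the two datasets apart, while the standard deviation is smaller for the series whose middle values sit close to the mean.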

In [8]:
# Compare range and standard deviation
comparison = toyota_sales.groupby('car_model')['sale_amount'].agg([
    ('Mean', 'mean'),
    ('Range', lambda x: x.max() - x.min()),
    ('Std_Dev', 'std')
]).round(0)

In [9]:

comparison.sort_values('Range', ascending=False)

car_model,Mean,Range,Std_Dev
Tundra,42479.0,14969.0,4236.0
Tacoma,34998.0,9982.0,2907.0
Highlander,40003.0,9967.0,2912.0
RAV4,30986.0,7962.0,2356.0
Corolla,22442.0,4995.0,1460.0
Camry,27471.0,4976.0,1431.0


In [12]:
# Show why range can be misleading
print("=== CAMRY ===")
camry = toyota_sales[toyota_sales['car_model'] == 'Camry']['sale_amount']
print(f"Range: ${camry.max() - camry.min():,.2f}")
print(f"Std Dev: ${camry.std():,.2f}")
print(f"Mean: ${camry.mean():,.2f}")

=== CAMRY ===
Range: $4,975.67
Std Dev: $1,430.89
Mean: $27,470.87


In [None]:
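# Show the distribution: count values within $2,000 of the mean (bounds inclusive)
camry_mean = camry.mean()
within_2000 = camry.between(camry_mean - 2000, camry_mean + 2000).sum()
print(f"\nValues within $2,000 of mean: {within_2000}")
print(f"Total values: {len(camry)}")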


Values within $2,000 of mean: 694
Total values: 856
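
The two counts are easier to interpret as a proportion. A small sketch, using the counts copied from the output above:

```python
# Counts copied from the printed output above
within_2000 = 694
total = 856

share = within_2000 / total
print(f"Share of Camry sales within $2,000 of the mean: {share:.1%}")
```

So roughly four out of five Camry sales fall within a $4,000 band around the mean, compared with a full range of just under $5,000.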


## When to Use Range

**Range is useful for:**
- Quick, rough sense of data span
- Quality control (checking if values fall within acceptable limits)
- Simple reports for non-technical audiences

**Use Std Dev instead when:**
- You need to understand typical variation
- You want to account for all data points
- You're doing statistical analysis

In [14]:
# Practical example: Quality control check
print("=== QUALITY CONTROL CHECK ===")
print("Expected price range: $20,000 to $50,000")
print(f"Actual min: ${toyota_sales['sale_amount'].min():,.2f}")
print(f"Actual max: ${toyota_sales['sale_amount'].max():,.2f}")
# between() is inclusive on both ends, matching the >= and <= checks
all_in_range = toyota_sales['sale_amount'].between(20000, 50000).all()
print(f"\nAll sales within expected range: {all_in_range}")

=== QUALITY CONTROL CHECK ===
Expected price range: $20,000 to $50,000
Actual min: $20,000.94
Actual max: $49,995.68

All sales within expected range: True
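
When a check like this fails, it usually helps to see which rows are responsible. The sketch below is one possible way to do that, assuming the same $20,000 to $50,000 limits. The `sales` DataFrame is made up for illustration and is not part of the Toyota dataset.

```python
import pandas as pd

# Hypothetical sales, invented for illustration; limits match the check above
sales = pd.DataFrame({
    'car_model': ['Corolla', 'Camry', 'Tundra', 'RAV4'],
    'sale_amount': [19500.00, 27400.00, 52000.00, 31000.00],
})

lower, upper = 20000, 50000

# Boolean mask: True where the sale is inside the limits (inclusive on both ends)
in_range = sales['sale_amount'].between(lower, upper)
out_of_range = sales[~in_range]

print(f"All sales within expected range: {in_range.all()}")
print(out_of_range)
```

Filtering with the inverted mask returns only the offending rows, which is often more useful in a quality control report than a single True/False.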


## Summary: Range

**Formula:** Max - Min

**Pros:**
- Easy to calculate and understand
- Shows full data span
- Good for quick checks

**Cons:**
- Only uses 2 values
- Ignores distribution of data
- Heavily affected by outliers

**Bottom line:** Use range for quick checks, use standard deviation for real analysis.

---

## Section 1 of Statistics Complete! 🎉

You've now learned all the key descriptive statistics:
- **Central tendency:** Mean, Median, Mode
- **Spread:** Variance, Standard Deviation, Range
- **Summary:** the `.describe()` method

**Next up:** Section 2 - Advanced Statistical Analysis!