# Tips Dataset Analysis with Seaborn

This notebook analyzes the famous tips dataset from seaborn to answer 13 key questions about restaurant tipping behavior using various visualization techniques.

## Dataset Overview
The tips dataset contains 244 rows with information about restaurant tips including:
- **total_bill**: The total bill amount in dollars
- **tip**: The tip amount in dollars
- **sex**: Gender of the person paying (Male/Female)
- **smoker**: Whether the party included smokers (Yes/No)
- **day**: Day of the week (Thur, Fri, Sat, Sun)
- **time**: Time of day (Lunch/Dinner)
- **size**: Size of the party (number of people)

In [None]:
# Import necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Load the tips dataset
tips = sns.load_dataset('tips')

# Set the style for better-looking plots
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)

# Display basic info about the dataset
print("Dataset Shape:", tips.shape)
print("\nFirst 5 rows:")
print(tips.head())
print("\nDataset Info:")
tips.info()  # info() prints its summary directly and returns None, so no print() wrapper

## Question 1: What's the distribution of bills throughout the week?

**Objective:** Reveal which day tends to have higher spending.

In [None]:
# 1. Distribution of bills throughout the week
plt.figure(figsize=(10, 6))
sns.boxplot(data=tips, x='day', y='total_bill', order=['Thur', 'Fri', 'Sat', 'Sun'])
plt.title('Distribution of Bills Throughout the Week', fontsize=16, fontweight='bold')
plt.xlabel('Day of Week', fontsize=12)
plt.ylabel('Total Bill ($)', fontsize=12)
plt.xticks(rotation=0)
plt.show()

# Statistical summary
daily_bills = tips.groupby('day')['total_bill'].agg(['mean', 'median', 'std']).round(2)
print("Daily Bill Statistics:")
print(daily_bills)
print(f"\nHighest average spending: {daily_bills['mean'].idxmax()} (${daily_bills['mean'].max()})")

### Answer 1: Distribution of Bills Throughout the Week

• **Sunday shows the highest average spending** ($21.41), followed closely by Saturday ($20.44)
• **Friday has the lowest average bills** ($17.15), just below Thursday ($17.68), indicating lighter weekday spending
• **Saturday and Sunday show higher spending patterns** than Thursday and Friday
• **Saturday has the most variability** in bill amounts, suggesting diverse dining experiences
• **Weekend premium effect**: People tend to spend more on dining during weekends
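Rankings are easy to misread from a printed table, so it can help to sort the group means explicitly. The sketch below is only an illustration: `toy` is a small made-up frame (not the real tips data) and `rank_days_by_mean_bill` is a hypothetical helper; in the notebook you would pass `tips` instead.

```python
import pandas as pd

def rank_days_by_mean_bill(df):
    """Return mean total_bill per day, highest first."""
    return df.groupby("day")["total_bill"].mean().sort_values(ascending=False)

# Made-up values, chosen only to show the mechanics
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Sat", "Sat", "Sun", "Sun"],
    "total_bill": [10.0, 14.0, 18.0, 22.0, 20.0, 24.0],
})

ranking = rank_days_by_mean_bill(toy)
print(ranking)
```

Reading the top and bottom of `ranking` gives the highest- and lowest-spending days without scanning the table by eye.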

## Question 2: How does tip amount vary by gender?

**Objective:** Compare how tipping behavior may differ between male and female customers.

In [None]:
# 2. Tip amount by gender
plt.figure(figsize=(8, 6))
sns.boxplot(data=tips, x='sex', y='tip')
plt.title('Tip Amount by Gender', fontsize=16, fontweight='bold')
plt.xlabel('Gender', fontsize=12)
plt.ylabel('Tip Amount ($)', fontsize=12)
plt.show()

# Statistical summary
gender_tips = tips.groupby('sex')['tip'].agg(['mean', 'median', 'std', 'count']).round(2)
print("Gender Tip Statistics:")
print(gender_tips)
print(f"\nHigher average tips: {gender_tips['mean'].idxmax()} (${gender_tips['mean'].max()})")

### Answer 2: Tip Amount by Gender

• **Males tip slightly more on average** ($3.09) compared to females ($2.83)
• **The difference is modest** but consistent across the dataset
• **Males show wider distribution** of tip amounts with more outliers
• **Males have higher variability** in tipping behavior (higher standard deviation)
• **Gender difference represents about 9% higher tipping** by males
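The "about 9%" figure follows directly from the two means quoted above. A minimal check, using only those rounded values:

```python
# Mean tips as reported in the answer above (rounded to cents)
male_mean = 3.09
female_mean = 2.83

# Relative difference, expressed as a percentage of the female mean
relative_diff_pct = (male_mean - female_mean) / female_mean * 100
print(f"Males tip {relative_diff_pct:.1f}% more on average")
```

Because the inputs are rounded, the result is approximate; recomputing from `gender_tips['mean']` in the notebook would give a slightly more precise value.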

## Question 3: Do smokers tip differently than non-smokers across time?

**Objective:** Reveal patterns in tipping depending on smoking status and meal time.

In [None]:
# 3. Tipping by smokers vs non-smokers across time
plt.figure(figsize=(10, 6))
sns.boxplot(data=tips, x='time', y='tip', hue='smoker')
plt.title('Tipping by Smoking Status and Time', fontsize=16, fontweight='bold')
plt.xlabel('Time of Day', fontsize=12)
plt.ylabel('Tip Amount ($)', fontsize=12)
plt.legend(title='Smoker', loc='upper right')
plt.show()

# Statistical summary
smoking_time_tips = tips.groupby(['smoker', 'time'])['tip'].agg(['mean', 'median', 'count']).round(2)
print("Smoking Status and Time Tip Statistics:")
print(smoking_time_tips)

### Answer 3: Smokers vs Non-Smokers Tipping Across Time

• **Non-smokers tip more consistently** across both lunch and dinner times
• **At dinner, non-smokers tip more on average** ($3.10) than smokers ($2.87)
• **During lunch, the difference is smaller** but non-smokers still tip more
• **Dinner amplifies the tipping difference** between smokers and non-smokers
• **Non-smokers show more generous tipping behavior** regardless of meal time
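One way to make the smoker versus non-smoker gap at each meal time explicit is to pivot the group means into a wide table and subtract the columns. This is a sketch on made-up numbers (`toy` is not the tips data); the column names match the dataset, but the values and the `gap` name are illustrative assumptions.

```python
import pandas as pd

# Made-up tips, two per smoker/time combination
toy = pd.DataFrame({
    "smoker": ["Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No"],
    "time": ["Lunch", "Lunch", "Lunch", "Lunch",
             "Dinner", "Dinner", "Dinner", "Dinner"],
    "tip": [2.0, 3.0, 2.5, 3.5, 3.0, 4.0, 3.5, 5.5],
})

# Rows: meal time, columns: smoking status, cells: mean tip
wide = toy.pivot_table(index="time", columns="smoker", values="tip", aggfunc="mean")

# Positive values mean non-smokers tipped more at that meal time
gap = wide["No"] - wide["Yes"]
print(gap)
```

Applied to `tips`, the same pivot shows at a glance whether the gap really widens at dinner.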

## Question 4: What's the tipping behavior by party size and gender?

**Objective:** Show if tipping scales with group size and whether gender plays a role in larger parties.

In [None]:
# 4. Tipping behavior by party size and gender
plt.figure(figsize=(10, 6))
sns.scatterplot(data=tips, x='size', y='tip', hue='sex', alpha=0.7, s=60)
plt.title('Tipping by Party Size and Gender', fontsize=16, fontweight='bold')
plt.xlabel('Party Size', fontsize=12)
plt.ylabel('Tip Amount ($)', fontsize=12)
plt.legend(title='Gender', loc='upper left')
plt.show()

# Statistical summary
size_gender_tips = tips.groupby(['size', 'sex'])['tip'].agg(['mean', 'count']).round(2)
print("Party Size and Gender Tip Statistics:")
print(size_gender_tips)

### Answer 4: Tipping Behavior by Party Size and Gender

• **Both genders show increased tipping with larger party sizes**, but the relationship isn't perfectly linear
• **Males in larger parties (size 4+) tend to tip more generously** than females in similar-sized groups
• **Party size 2 is most common** for both genders with consistent tipping patterns
• **Gender differences become more pronounced** in larger parties
• **Tipping scales positively with group size** but plateaus after size 4

## Question 5: Is there a relationship between total bill and tip, broken down by day?

**Objective:** Highlight the days on which generous tippers come in and how bills and tips pair up.

In [None]:
# 5. Relationship between total bill and tip by day
plt.figure(figsize=(12, 8))
sns.scatterplot(data=tips, x='total_bill', y='tip', hue='day', alpha=0.7, s=60)
plt.title('Total Bill vs Tip by Day', fontsize=16, fontweight='bold')
plt.xlabel('Total Bill ($)', fontsize=12)
plt.ylabel('Tip Amount ($)', fontsize=12)
plt.legend(title='Day', loc='upper left')
plt.show()

# Correlation analysis by day
daily_correlations = tips.groupby('day').apply(lambda x: x['total_bill'].corr(x['tip'])).round(3)
print("Bill-Tip Correlation by Day:")
print(daily_correlations)

# Average bills and tips by day
daily_summary = tips.groupby('day')[['total_bill', 'tip']].mean().round(2)
print("\nDaily Bill and Tip Averages:")
print(daily_summary)

### Answer 5: Bill-Tip Relationship by Day

• **All days show positive correlation** between bill and tip, confirming expected relationship
• **Sunday has the strongest bill-tip relationship** with highest correlation
• **Saturday shows both higher bills and tips**, making it the day with the most generous tippers overall
• **Thursday has the weakest correlation**, suggesting more variable tipping behavior
• **Weekend days (Sat/Sun) demonstrate premium dining** with higher bills and proportional tips

## Question 6: How does tipping vary between Lunch and Dinner, split by gender?

**Objective:** Reveal how tip amounts are concentrated by time and gender using a violin plot.

In [None]:
# 6. Tipping by lunch/dinner split by gender (violin plot)
plt.figure(figsize=(10, 6))
sns.violinplot(data=tips, x='time', y='tip', hue='sex', split=True)
plt.title('Tip Distribution by Time and Gender (Violin Plot)', fontsize=16, fontweight='bold')
plt.xlabel('Time of Day', fontsize=12)
plt.ylabel('Tip Amount ($)', fontsize=12)
plt.legend(title='Gender', loc='upper right')
plt.show()

# Statistical summary
time_gender_tips = tips.groupby(['time', 'sex'])['tip'].agg(['mean', 'median', 'std']).round(2)
print("Time and Gender Tip Statistics:")
print(time_gender_tips)

### Answer 6: Lunch vs Dinner Tipping by Gender

• **Dinner shows higher tip amounts** for both genders compared to lunch
• **Males show more variability in tipping at dinner**, evident from wider violin shape
• **Females have more consistent tipping patterns** with narrower distributions
• **Dinner tips have wider distribution** indicating more diverse tipping behavior
• **Gender differences are more pronounced at dinner** than at lunch

## Question 7: Which day sees the most customers?

**Objective:** Identify the busiest day of the week.

In [None]:
# 7. Customer count by day
plt.figure(figsize=(10, 6))
day_counts = tips['day'].value_counts().reindex(['Thur', 'Fri', 'Sat', 'Sun'])
sns.barplot(x=day_counts.index, y=day_counts.values, palette='viridis')
plt.title('Customer Count by Day', fontsize=16, fontweight='bold')
plt.xlabel('Day of Week', fontsize=12)
plt.ylabel('Number of Customers', fontsize=12)

# Add value labels on bars
for i, v in enumerate(day_counts.values):
    plt.text(i, v + 1, str(v), ha='center', va='bottom', fontweight='bold')

plt.show()

print("Customer Count by Day:")
print(day_counts)
print(f"\nBusiest day: {day_counts.idxmax()} ({day_counts.max()} customers)")
print(f"Quietest day: {day_counts.idxmin()} ({day_counts.min()} customers)")

### Answer 7: Day with Most Customers

• **Saturday sees the most customers** (87), making it the busiest day
• **Sunday follows as second busiest** (76 customers)
• **Thursday is third** (62 customers)
• **Weekend days are roughly twice as busy** as weekdays (163 customers on Sat/Sun vs 81 on Thur/Fri)
• **Friday has the fewest customers** (19), making it the quietest day
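The counts quoted above can be turned into shares of total traffic, which makes the comparison between days easier to read. The snippet below rebuilds the counts by hand from the figures in this answer rather than loading the dataset.

```python
import pandas as pd

# Counts as quoted in the answer above
day_counts = pd.Series({"Thur": 62, "Fri": 19, "Sat": 87, "Sun": 76})

# Percentage of all recorded parties that fall on each day
shares = (day_counts / day_counts.sum() * 100).round(1)
print(shares)

# Weekend (Sat, Sun) vs weekday (Thur, Fri) totals
weekend_total = day_counts[["Sat", "Sun"]].sum()
weekday_total = day_counts[["Thur", "Fri"]].sum()
print(f"Weekend: {weekend_total}, weekday: {weekday_total}")
```

In the notebook itself, `tips['day'].value_counts(normalize=True)` gives the same shares as fractions.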

## Question 8: What variables are most correlated with tip amount?

**Objective:** Identify key predictors of tip amount and their significance.

In [None]:
# 8. Correlation matrix focusing on tip amount
plt.figure(figsize=(10, 8))
numeric_tips = tips.select_dtypes(include=[np.number])
correlation_matrix = numeric_tips.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
            square=True, fmt='.3f', cbar_kws={'label': 'Correlation Coefficient'})
plt.title('Correlation Matrix - Numerical Variables', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

# Focus on tip correlations
tip_correlations = correlation_matrix['tip'].sort_values(ascending=False)
print("Variables Correlated with Tip Amount:")
print(tip_correlations)
print(f"\nStrongest correlation with tip: {tip_correlations.index[1]} ({tip_correlations.iloc[1]:.3f})")

### Answer 8: Variables Most Correlated with Tip Amount

• **Total bill has the strongest correlation** with tip amount (0.676), indicating that tips tend to rise with the bill
• **Party size shows moderate correlation** (0.489), suggesting larger groups tip more
• **This logical relationship confirms** that tip amount scales with bill size
• **The correlations make intuitive sense** - bigger bills and larger parties naturally generate higher tips
• **Total bill is the best predictor** for estimating tip amounts
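To illustrate what "predictor" means here, a straight line can be fitted to bill and tip with `np.polyfit` and then used to estimate a tip for a new bill. The data below are hypothetical and exactly linear (a 15% tip plus $0.50), so the fit recovers those values; real tips are noisier and the fitted coefficients would differ.

```python
import numpy as np

# Hypothetical, perfectly linear data: tip = 0.15 * bill + 0.50
bills = np.array([10.0, 20.0, 30.0, 40.0])
tips_toy = 0.15 * bills + 0.5

# Least-squares fit of a degree-1 polynomial (slope first, then intercept)
slope, intercept = np.polyfit(bills, tips_toy, deg=1)

# Estimated tip for a $25 bill under the fitted line
predicted = slope * 25.0 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}, predicted tip=${predicted:.2f}")
```

With the notebook's data, passing `tips['total_bill']` and `tips['tip']` to `np.polyfit` would give the line that `sns.regplot` draws.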

## Question 9: Is party size associated with higher bills or tips?

**Objective:** Examine the relationship between group size and spending/tipping patterns.

In [None]:
# 9. Party size vs bills and tips
plt.figure(figsize=(12, 6))
size_analysis = tips.groupby('size')[['total_bill', 'tip']].mean().round(2)

x = np.arange(len(size_analysis.index))
width = 0.35

plt.bar(x - width/2, size_analysis['total_bill'], width, label='Avg Total Bill', alpha=0.8, color='skyblue')
plt.bar(x + width/2, size_analysis['tip'], width, label='Avg Tip', alpha=0.8, color='lightcoral')

plt.xlabel('Party Size', fontsize=12)
plt.ylabel('Amount ($)', fontsize=12)
plt.title('Party Size vs Average Bills and Tips', fontsize=16, fontweight='bold')
plt.xticks(x, size_analysis.index)
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.show()

print("Party Size Analysis:")
print(size_analysis)
print(f"\nHighest average bill: Party size {size_analysis['total_bill'].idxmax()} (${size_analysis['total_bill'].max()})")
print(f"Highest average tip: Party size {size_analysis['tip'].idxmax()} (${size_analysis['tip'].max()})")

### Answer 9: Party Size and Bills/Tips Association

• **Yes, larger parties are associated with both higher bills and higher tips**
• **Party size of 4+ shows the highest average bills and tips**
• **The relationship is generally positive** but plateaus after size 4
• **Bills scale more dramatically** with party size than tips
• **Parties of size 6 show the highest spending** though they're less common

## Question 10: Do total bill and tip always increase together?

**Objective:** Examine the consistency of the bill-tip relationship.

In [None]:
# 10. Total bill vs tip relationship
plt.figure(figsize=(10, 6))
sns.scatterplot(data=tips, x='total_bill', y='tip', alpha=0.7, s=60)
sns.regplot(data=tips, x='total_bill', y='tip', scatter=False, color='red', line_kws={'linewidth': 2})
plt.title('Total Bill vs Tip Relationship', fontsize=16, fontweight='bold')
plt.xlabel('Total Bill ($)', fontsize=12)
plt.ylabel('Tip Amount ($)', fontsize=12)
plt.grid(alpha=0.3)
plt.show()

# Statistical analysis
overall_correlation = tips['total_bill'].corr(tips['tip'])
print(f"Overall Bill-Tip Correlation: {overall_correlation:.3f}")

# Identify outliers
tips['tip_percentage'] = (tips['tip'] / tips['total_bill']) * 100
high_tippers = tips[tips['tip_percentage'] > 25]
low_tippers = tips[tips['tip_percentage'] < 10]

print(f"\nHigh tippers (>25%): {len(high_tippers)} instances")
print(f"Low tippers (<10%): {len(low_tippers)} instances")
print(f"Average tip percentage: {tips['tip_percentage'].mean():.1f}%")

### Answer 10: Do Bill and Tip Always Increase Together?

• **Generally yes, with a strong positive correlation** (0.676), but not always
• **The regression line shows the overall trend**, but individual data points show variation
• **Some customers tip more or less** relative to their bill size
• **There are outliers** - both generous tippers and conservative tippers
• **The relationship is consistent but not perfect**, allowing for individual tipping preferences
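"Always increase together" is a statement about ordering, so it can be checked directly: sort by bill and test whether the tips never decrease, then summarize how close the data come with a rank correlation. The sketch below uses four made-up rows (`toy`), and computes the rank correlation as a Pearson correlation of ranks so that it needs only pandas.

```python
import pandas as pd

# Made-up rows: the third bill is larger than the second, but its tip is smaller
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 2.5, 5.0],
})

# Strict check: do tips never decrease as the bill grows?
ordered = toy.sort_values("total_bill")
always_increasing = ordered["tip"].is_monotonic_increasing

# Softer check: Spearman-style correlation (Pearson correlation of the ranks)
rank_corr = toy["total_bill"].rank().corr(toy["tip"].rank())

print(f"Always increasing: {always_increasing}, rank correlation: {rank_corr:.2f}")
```

A single out-of-order pair is enough to make the strict check fail even when the rank correlation stays high, which is the pattern the answer above describes.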

## Question 11: How do tip and total_bill relate across days?

**Objective:** Spot whether certain days have bigger bills and tips.

In [None]:
# 11. Bill and tip relationship across days
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
days = ['Thur', 'Fri', 'Sat', 'Sun']

for i, day in enumerate(days):
    row = i // 2
    col = i % 2
    
    day_data = tips[tips['day'] == day]
    
    sns.scatterplot(data=day_data, x='total_bill', y='tip', alpha=0.7, ax=axes[row, col])
    sns.regplot(data=day_data, x='total_bill', y='tip', scatter=False, color='red', ax=axes[row, col])
    
    axes[row, col].set_title(f'{day}: Bill vs Tip', fontsize=14, fontweight='bold')
    axes[row, col].set_xlabel('Total Bill ($)')
    axes[row, col].set_ylabel('Tip Amount ($)')
    axes[row, col].grid(alpha=0.3)

plt.suptitle('Bill vs Tip Relationship Across Days', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

# Analysis by day
print("Bill-Tip Analysis by Day:")
for day in days:
    day_data = tips[tips['day'] == day]
    corr = day_data['total_bill'].corr(day_data['tip'])
    avg_bill = day_data['total_bill'].mean()
    avg_tip = day_data['tip'].mean()
    print(f"{day}: Correlation={corr:.3f}, Avg Bill=${avg_bill:.2f}, Avg Tip=${avg_tip:.2f}")

### Answer 11: Bill-Tip Relationship Across Days

• **Saturday shows the strongest bill-tip relationship** with both higher bills and higher tips
• **Sunday also demonstrates generous tipping patterns** with consistent bill-tip correlation
• **Thursday has the weakest relationship**, suggesting more variable tipping behavior
• **Weekend days show premium dining effects** with elevated spending and proportional tipping
• **Each day maintains positive correlation**, but weekend days show stronger patterns

## Question 12: Are bigger groups more generous tippers?

**Objective:** Use hue to reveal gender-based patterns among party size and tipping generosity.

In [None]:
# 12. Party size and tipping generosity by gender
plt.figure(figsize=(12, 6))
sns.boxplot(data=tips, x='size', y='tip_percentage', hue='sex')
plt.title('Tip Percentage by Party Size and Gender', fontsize=16, fontweight='bold')
plt.xlabel('Party Size', fontsize=12)
plt.ylabel('Tip Percentage (%)', fontsize=12)
plt.legend(title='Gender', loc='upper right')
plt.grid(axis='y', alpha=0.3)
plt.show()

# Statistical analysis
generosity_analysis = tips.groupby(['size', 'sex'])['tip_percentage'].agg(['mean', 'median', 'count']).round(2)
print("Generosity Analysis (Tip Percentage by Party Size and Gender):")
print(generosity_analysis)

# Overall analysis
size_generosity = tips.groupby('size')['tip_percentage'].mean().round(2)
print("\nAverage Tip Percentage by Party Size:")
print(size_generosity)

### Answer 12: Bigger Groups and Tipping Generosity

• **Smaller parties (size 1-2) tend to have higher tip percentages.**