# Data Reshaping - Part 3: Time Series Manipulation Basics

## Week 4, Day 1 (Wednesday) - April 30th, 2025

### Overview
This is the final part of our Data Reshaping session, focusing on basic time series manipulation. Understanding how to work with dates and times is crucial for e-commerce analysis, where you often need to analyze trends, seasonality, and time-based patterns in sales, customer behavior, and business metrics.

### Learning Objectives
- Understand pandas datetime data types and capabilities
- Master date parsing and datetime index creation
- Learn basic time series operations (filtering, resampling, shifting)
- Apply time series techniques to e-commerce scenarios
- Handle common time zone and date format issues
- Prepare data for time-based analysis and visualization

### Prerequisites
- Completed Part 1: Merge, Join, and Concatenate
- Completed Part 2: Melt and Pivot Operations
- Understanding of Pandas DataFrames and indexing
- Basic knowledge of date/time concepts

## 1. Introduction to Time Series Data

### What is Time Series Data?
Time series data consists of observations recorded at different points in time. In e-commerce, examples include:
- Daily sales figures
- Hourly website traffic
- Monthly customer acquisition
- Quarterly revenue reports

### Why Time Series Manipulation Matters
- **Trend Analysis**: Identify growth or decline patterns
- **Seasonality**: Understand cyclical patterns (holidays, weekends)
- **Forecasting**: Predict future performance
- **Reporting**: Create time-based dashboards and reports

In [None]:
# Import libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta, date
import matplotlib.pyplot as plt

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("Libraries imported successfully!")
print(f"Pandas version: {pd.__version__}")

## 2. Pandas Datetime Data Types

### Key Datetime Objects in Pandas
- **Timestamp**: A single point in time
- **DatetimeIndex**: An array of timestamps (used as DataFrame index)
- **Period**: A fixed time span (e.g., "January 2025")
- **Timedelta**: Duration between two points in time

In [None]:
# Creating different datetime objects

# 1. Timestamp - single point in time
timestamp = pd.Timestamp('2025-01-15 14:30:00')
print(f"Timestamp: {timestamp}")
print(f"Type: {type(timestamp)}")

# 2. DatetimeIndex - array of timestamps
date_index = pd.date_range('2025-01-01', periods=5, freq='D')
print(f"\nDatetimeIndex: {date_index}")
print(f"Type: {type(date_index)}")

# 3. Period - fixed time span
period = pd.Period('2025-01', freq='M')
print(f"\nPeriod: {period}")
print(f"Type: {type(period)}")

# 4. Timedelta - duration (named `duration` so it doesn't shadow the imported datetime.timedelta)
duration = pd.Timedelta(days=30, hours=5, minutes=30)
print(f"\nTimedelta: {duration}")
print(f"Type: {type(duration)}")
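These four types work together: subtracting two Timestamps gives a Timedelta, adding a Timedelta to a Timestamp gives a new Timestamp, and a Period knows its own start and end. A small self-contained sketch (the order and ship times are made up for illustration):

```python
import pandas as pd

order_time = pd.Timestamp('2025-01-15 14:30:00')
ship_time = pd.Timestamp('2025-01-17 09:00:00')

# Timestamp - Timestamp -> Timedelta
processing_time = ship_time - order_time
print(processing_time)                          # 1 days 18:30:00
print(processing_time.total_seconds() / 3600)   # 42.5 (hours)

# Timestamp + Timedelta -> Timestamp
delivery_deadline = order_time + pd.Timedelta(days=5)
print(delivery_deadline)                        # 2025-01-20 14:30:00

# A Period covers a whole interval; check whether a Timestamp falls inside it
jan_2025 = pd.Period('2025-01', freq='M')
print(jan_2025.start_time, jan_2025.end_time)
print(jan_2025.start_time <= order_time <= jan_2025.end_time)  # True
```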

## 3. Creating and Parsing Dates

### Converting Strings to Datetime

In [None]:
# Different ways to create datetime objects

# From string (automatic parsing)
date_str = '2025-01-15'
parsed_date = pd.to_datetime(date_str)
print(f"Parsed date: {parsed_date}")

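# From multiple string formats. Since pandas 2.0, to_datetime infers one format
# from the first element; format='mixed' parses each element separately instead
date_strings = ['2025-01-15', '01/16/2025', '2025-01-17 14:30:00', 'Jan 18, 2025']
parsed_dates = pd.to_datetime(date_strings, format='mixed')
print(f"\nMultiple formats: {parsed_dates}")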

# From components
component_dates = pd.to_datetime({
    'year': [2025, 2025, 2025],
    'month': [1, 2, 3],
    'day': [15, 16, 17]
})
print(f"\nFrom components: {component_dates}")

### Handling Different Date Formats

In [None]:
# E-commerce data often comes in various formats
messy_dates = [
    '15-Jan-2025',
    '2025/01/16',
    '17.01.2025',
    '18-1-25'
]

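# Mixed formats are parsed element by element (format='mixed', pandas >= 2.0).
# dayfirst=True resolves ambiguous strings like '17.01.2025' and '18-1-25' as day-month
clean_dates = pd.to_datetime(messy_dates, format='mixed', dayfirst=True)
print("Cleaned dates:")
for original, cleaned in zip(messy_dates, clean_dates):
    print(f"{original:12} -> {cleaned}")

# When you know the format, specify it: faster and unambiguous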
specific_format = pd.to_datetime('18-1-25', format='%d-%m-%y')
print(f"\nSpecific format: '18-1-25' -> {specific_format}")

# Handle errors gracefully
problematic_dates = ['2025-01-15', 'not-a-date', '2025-02-30']
safe_dates = pd.to_datetime(problematic_dates, errors='coerce')
print(f"\nWith errors: {safe_dates}")
print("Note: Invalid dates become NaT (Not a Time)")
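Because `errors='coerce'` silently turns bad values into NaT, it helps to check which inputs failed before moving on. A minimal sketch that keeps the raw strings next to the parsed values (the `raw_date` and `parsed` column names are just for illustration):

```python
import pandas as pd

raw = pd.DataFrame({'raw_date': ['2025-01-15', 'not-a-date', '2025-02-30', '2025-03-01']})

# Parse with one explicit format; anything that doesn't match becomes NaT
raw['parsed'] = pd.to_datetime(raw['raw_date'], format='%Y-%m-%d', errors='coerce')

# Rows that had a value but could not be parsed
failed = raw[raw['parsed'].isna() & raw['raw_date'].notna()]
print(failed)
print(f"{len(failed)} of {len(raw)} dates could not be parsed")
```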

## 4. Creating Sample E-commerce Time Series Data

Let's create realistic e-commerce datasets with time components:

In [None]:
# Create daily sales data for a month
np.random.seed(42)

# Generate date range
date_range = pd.date_range('2025-01-01', '2025-01-31', freq='D')

# Create sales data with some realistic patterns
base_sales = 1000
weekend_boost = 200  # Higher sales on weekends
random_variation = np.random.normal(0, 100, len(date_range))

daily_sales = []
for i, day in enumerate(date_range):  # `day`, not `date`, to avoid shadowing datetime.date
    # Base sales
    sales = base_sales
    
    # Weekend boost (Saturday=5, Sunday=6)
    if day.weekday() >= 5:
        sales += weekend_boost
    
    # Random variation
    sales += random_variation[i]
    
    # Floor at 100 so no day has implausibly low (or negative) sales
    sales = max(sales, 100)
    
    daily_sales.append(round(sales, 2))

# Create DataFrame
sales_data = pd.DataFrame({
    'date': date_range,
    'sales_amount': daily_sales,
    'day_of_week': date_range.day_name(),
    'is_weekend': date_range.weekday >= 5
})

print("Daily Sales Data:")
print(sales_data.head(10))
print(f"\nData shape: {sales_data.shape}")
print(f"Date range: {sales_data['date'].min()} to {sales_data['date'].max()}")
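The loop above is easy to follow, but the same pattern can be built without a Python loop using NumPy's vectorized operations, which scales better for long date ranges. A sketch under the same assumptions (base 1000, weekend boost 200, floor of 100); since it draws the random noise with the same seed and call, it should reproduce the loop's values:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
dates = pd.date_range('2025-01-01', '2025-01-31', freq='D')

base_sales = 1000
weekend_boost = 200
noise = np.random.normal(0, 100, len(dates))

is_weekend = dates.weekday >= 5  # boolean array: Saturday=5, Sunday=6

# Add the boost only where is_weekend is True, then floor at 100
sales = base_sales + np.where(is_weekend, weekend_boost, 0) + noise
sales = np.maximum(sales, 100).round(2)

sales_vec = pd.DataFrame({
    'date': dates,
    'sales_amount': sales,
    'day_of_week': dates.day_name(),
    'is_weekend': is_weekend,
})
print(sales_vec.head())
```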

In [None]:
# Create hourly order data for a few days
hourly_range = pd.date_range('2025-01-15', '2025-01-17 23:00:00', freq='h')  # 'h' = hourly ('H' is deprecated since pandas 2.2)

# Simulate hourly order patterns (more orders during business hours)
hourly_orders = []
for dt in hourly_range:
    hour = dt.hour
    base_orders = 5
    
    # Business hours boost (9 AM - 6 PM)
    if 9 <= hour <= 18:
        base_orders += 10
    
    # Lunch time peak (12 PM - 2 PM)
    if 12 <= hour <= 14:
        base_orders += 5
    
    # Add some randomness
    final_orders = base_orders + np.random.poisson(3)
    hourly_orders.append(final_orders)

hourly_data = pd.DataFrame({
    'datetime': hourly_range,
    'order_count': hourly_orders
})

print("Hourly Order Data (sample):")
print(hourly_data.head(12))
print(f"\nTotal hours: {len(hourly_data)}")

## 5. Setting DateTime as Index

### Why Use DateTime Index?
- Enables powerful time-based operations
- Simplifies filtering and slicing by dates
- Required for many time series functions
- Improves performance for time-based queries

In [None]:
# Set datetime as index
sales_indexed = sales_data.set_index('date')
print("Sales data with datetime index:")
print(sales_indexed.head())
print(f"\nIndex type: {type(sales_indexed.index)}")
print(f"Index name: {sales_indexed.index.name}")

# Do the same for the hourly data
hourly_indexed = hourly_data.set_index('datetime')
print("\nHourly data with datetime index:")
print(hourly_indexed.head(8))
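Setting the index is not the only option. On a regular datetime column, the `.dt` accessor exposes the same date components as a DatetimeIndex, and `resample()` accepts an `on=` argument naming the column to use. A minimal sketch on a small made-up frame:

```python
import pandas as pd

orders = pd.DataFrame({
    'order_date': pd.date_range('2025-01-01', periods=14, freq='D'),
    'amount': range(100, 1500, 100),  # 100, 200, ..., 1400
})

# .dt accessor: date components from a column, no index needed
orders['weekday_name'] = orders['order_date'].dt.day_name()

# Resample on a column instead of the index
weekly = orders.resample('W', on='order_date')['amount'].sum()
print(weekly)
```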

## 6. Time-Based Indexing and Filtering

### Selecting Data by Date

In [None]:
# Select specific date
specific_date = sales_indexed.loc['2025-01-15']
print("Sales on 2025-01-15:")
print(specific_date)

# Select date range
date_range_data = sales_indexed.loc['2025-01-10':'2025-01-15']
print("\nSales from Jan 10-15:")
print(date_range_data)

# Select by month
january_data = sales_indexed.loc['2025-01']
print(f"\nJanuary data shape: {january_data.shape}")
print("First few rows:")
print(january_data.head(3))
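Unlike Python list slicing, label-based slicing on a sorted DatetimeIndex includes both endpoints, and the endpoints do not have to exist in the index. A quick check on a small daily series (values are arbitrary):

```python
import pandas as pd

s = pd.Series(range(31), index=pd.date_range('2025-01-01', periods=31, freq='D'))

jan_10_to_15 = s.loc['2025-01-10':'2025-01-15']
print(len(jan_10_to_15))  # 6: both Jan 10 and Jan 15 are included

# An endpoint beyond the data is fine: the slice stops at the last available date
late_jan = s.loc['2025-01-28':'2025-02-15']
print(late_jan.index.min(), late_jan.index.max())
```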

### Boolean Indexing with Dates

In [None]:
# Filter by date conditions
recent_sales = sales_indexed[sales_indexed.index >= '2025-01-20']
print("Sales from Jan 20 onwards:")
print(recent_sales.head())

# Filter weekends (is_weekend is already boolean, so no `== True` is needed)
weekend_sales = sales_indexed[sales_indexed['is_weekend']]
print(f"\nWeekend sales count: {len(weekend_sales)}")
print("Weekend sales sample:")
print(weekend_sales.head())

# Complex date filtering: combine conditions with & and negate with ~
mid_month_weekdays = sales_indexed[
    (sales_indexed.index >= '2025-01-10') & 
    (sales_indexed.index <= '2025-01-20') & 
    (~sales_indexed['is_weekend'])
]
print(f"\nMid-month weekdays count: {len(mid_month_weekdays)}")

## 7. Time Series Operations

### Extracting Date Components

In [None]:
# Extract various date components
sales_components = sales_indexed.copy()

# Extract date parts
sales_components['year'] = sales_components.index.year
sales_components['month'] = sales_components.index.month
sales_components['day'] = sales_components.index.day
sales_components['weekday'] = sales_components.index.weekday  # 0=Monday, 6=Sunday
sales_components['week_of_year'] = sales_components.index.isocalendar().week
sales_components['quarter'] = sales_components.index.quarter

print("Sales data with extracted date components:")
print(sales_components[['sales_amount', 'year', 'month', 'day', 'weekday', 'quarter']].head(10))

# Summary by day of week
weekday_summary = sales_components.groupby('weekday')['sales_amount'].agg(['mean', 'count'])
weekday_summary.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print("\nSales by day of week:")
print(weekday_summary)
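Assigning `['Mon', ..., 'Sun']` to the index works above only because all seven weekdays appear in the data and `groupby` sorts them 0 to 6. A slightly more robust sketch groups by the day name and then reorders with `reindex`, so a missing weekday shows up as NaN instead of shifting the labels (the five-day series is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 5 days starting on a Wednesday, so Monday and Tuesday are missing
idx = pd.date_range('2025-01-01', periods=5, freq='D')
sales = pd.Series(np.arange(5, dtype=float) * 100, index=idx)

day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

by_day = (
    sales.groupby(sales.index.day_name())   # group by names like 'Wednesday'
         .agg(['mean', 'count'])
         .reindex(day_order)                # calendar order; absent days become NaN
)
print(by_day)
```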

### Shifting and Lagging Data

In [None]:
# Create lagged variables for comparison
sales_with_lags = sales_indexed[['sales_amount']].copy()

# Previous day sales
sales_with_lags['prev_day_sales'] = sales_with_lags['sales_amount'].shift(1)

# Sales from a week ago
sales_with_lags['week_ago_sales'] = sales_with_lags['sales_amount'].shift(7)

# Calculate day-over-day change
sales_with_lags['daily_change'] = sales_with_lags['sales_amount'] - sales_with_lags['prev_day_sales']
sales_with_lags['daily_change_pct'] = (sales_with_lags['daily_change'] / sales_with_lags['prev_day_sales'] * 100).round(2)

# Week-over-week change
sales_with_lags['weekly_change'] = sales_with_lags['sales_amount'] - sales_with_lags['week_ago_sales']

print("Sales with lagged variables:")
print(sales_with_lags.head(10))

# Show some interesting changes
print("\nLargest daily increases:")
print(sales_with_lags.nlargest(3, 'daily_change')[['sales_amount', 'prev_day_sales', 'daily_change', 'daily_change_pct']])
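The `shift(1)` calls above move values by one row, not one day. On a complete daily index those are the same, but if dates are missing, the "previous day" silently becomes the previous available row. Passing `freq=` shifts the index by calendar time instead. A small sketch with one deliberately missing date, also showing `diff()` as a shortcut for `x - x.shift(1)`:

```python
import pandas as pd

# Jan 3 is missing on purpose
idx = pd.to_datetime(['2025-01-01', '2025-01-02', '2025-01-04'])
sales = pd.Series([100.0, 120.0, 150.0], index=idx)

by_row = sales.shift(1)             # previous row
by_day = sales.shift(1, freq='D')   # index moved forward by one calendar day

# Align the calendar-shifted values back onto the original dates
prev_calendar_day = by_day.reindex(sales.index)

print(pd.DataFrame({'sales': sales, 'prev_row': by_row, 'prev_calendar_day': prev_calendar_day}))

# diff() gives the same result as sales - sales.shift(1)
print(sales.diff().equals(sales - sales.shift(1)))  # True
```

Here Jan 4's "previous row" is Jan 2 (120), while its previous calendar day, Jan 3, has no data and stays NaN.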

## 8. Resampling Time Series Data

### What is Resampling?
Resampling changes the frequency of your time series data:
- **Downsampling**: Higher to lower frequency (daily to weekly)
- **Upsampling**: Lower to higher frequency (monthly to daily)

### Downsampling Examples

In [None]:
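# Resample daily data to weekly.
# Dict-based renaming in a Series .agg() is legacy behavior that pandas has
# deprecated; aggregating with a list and renaming the columns is portable.
weekly_sales = (
    sales_indexed['sales_amount']
    .resample('W')
    .agg(['sum', 'mean', 'max', 'count'])
    .rename(columns={'sum': 'total_sales', 'mean': 'avg_daily_sales',
                     'max': 'max_daily_sales', 'count': 'days_count'})
)

print("Weekly sales summary:")
print(weekly_sales)

# Monthly summary ('ME' = month end in pandas >= 2.2; use 'M' on older versions)
monthly_sales = (
    sales_indexed['sales_amount']
    .resample('ME')
    .agg(['sum', 'mean', 'std'])
    .rename(columns={'sum': 'total_sales', 'mean': 'avg_daily_sales', 'std': 'std_daily_sales'})
)

print("\nMonthly sales summary:")
print(monthly_sales)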

In [None]:
# Resample hourly data to daily (list of functions, then rename the columns)
daily_from_hourly = (
    hourly_indexed['order_count']
    .resample('D')
    .agg(['sum', 'mean', 'max', 'count'])
    .rename(columns={'sum': 'total_orders', 'mean': 'avg_hourly_orders',
                     'max': 'peak_hourly_orders', 'count': 'hours_with_data'})
)

print("Daily summary from hourly data:")
print(daily_from_hourly)

# Business hours analysis (9 AM - 6 PM)
business_hours = hourly_indexed.between_time('09:00', '18:00')
business_daily = business_hours['order_count'].resample('D').sum()

print("\nDaily orders during business hours:")
print(business_daily)
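The examples so far only downsample. Upsampling goes the other way: the new, finer periods have no data of their own, so you must choose how to fill them. A minimal sketch that turns hypothetical weekly figures into a daily series, using `asfreq()` to create the empty daily slots, then forward fill or linear interpolation:

```python
import pandas as pd

# Hypothetical weekly figures, labelled by the week-ending Sunday
weekly_totals = pd.Series(
    [7000.0, 8400.0, 7700.0],
    index=pd.to_datetime(['2025-01-05', '2025-01-12', '2025-01-19']),
)

# Upsample to daily: the new days are NaN until filled
daily_empty = weekly_totals.resample('D').asfreq()

daily_ffill = weekly_totals.resample('D').ffill()         # repeat the last known value
daily_interp = weekly_totals.resample('D').interpolate()  # straight line between points

print(pd.DataFrame({'asfreq': daily_empty, 'ffill': daily_ffill, 'interpolate': daily_interp}))
```

Neither method recovers real daily detail: forward fill repeats the weekly figure on each day, so summing the result will not give back the weekly total.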

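### Common Resampling Frequencies
- **'D'**: Daily
- **'B'**: Business day (Monday to Friday)
- **'W'**: Weekly, bins ending on Sunday (alias for 'W-SUN'; use e.g. 'W-SAT' for weeks ending Saturday)
- **'ME'**: Month end ('M' before pandas 2.2)
- **'QE'**: Quarter end ('Q' before pandas 2.2)
- **'h'**: Hourly ('H' is deprecated since pandas 2.2)
- **'min'**: Minute ('T' is deprecated since pandas 2.2)
- **'s'**: Second ('S' is deprecated since pandas 2.2)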
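Because `'W'` is an alias for `'W-SUN'`, weekly bins run Monday through Sunday and are labelled with the Sunday that ends them. If your business week ends on another day, an anchored alias such as `'W-SAT'` changes that. A quick check on a series with one row per day of January 2025:

```python
import pandas as pd

days = pd.Series(1, index=pd.date_range('2025-01-01', '2025-01-31', freq='D'))

weekly_sun = days.resample('W').sum()      # same as 'W-SUN'
weekly_sat = days.resample('W-SAT').sum()  # weeks ending Saturday

print(weekly_sun.index.day_name().unique())  # only Sunday labels
print(weekly_sat.index.day_name().unique())  # only Saturday labels
print(weekly_sun)  # days per bin; first and last weeks are partial
```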

In [None]:
# Different resampling periods
print("Different resampling frequencies:")

# Every 3 days
three_day_sales = sales_indexed['sales_amount'].resample('3D').sum()
print(f"Every 3 days: {len(three_day_sales)} periods")
print(three_day_sales.head())

# Bi-weekly (every 2 weeks)
biweekly_sales = sales_indexed['sales_amount'].resample('2W').sum()
print(f"\nBi-weekly: {len(biweekly_sales)} periods")
print(biweekly_sales)

# Business-day frequency ('B'): bins start only on Mon-Fri, so each Friday bin
# also absorbs the following Saturday and Sunday rows
business_day_avg = sales_indexed['sales_amount'].resample('B').mean()
print("\nBusiness-day bins (first 10):")
print(business_day_avg.head(10))

## 9. Working with Time Zones

### Understanding Time Zones in E-commerce
- Global e-commerce operates across time zones
- Server time vs customer local time
- Important for accurate reporting and analysis

In [None]:
# Create timezone-aware data
utc_times = pd.date_range('2025-01-15 12:00:00', periods=5, freq='h', tz='UTC')
print("UTC times:")
print(utc_times)

# Convert to different time zones
eastern_times = utc_times.tz_convert('US/Eastern')
pacific_times = utc_times.tz_convert('US/Pacific')
london_times = utc_times.tz_convert('Europe/London')

print("\nSame moment in different time zones:")
for i in range(len(utc_times)):
    print(f"UTC: {utc_times[i]} | Eastern: {eastern_times[i]} | Pacific: {pacific_times[i]} | London: {london_times[i]}")

# Localize naive datetime to a specific timezone
naive_datetime = pd.Timestamp('2025-01-15 15:30:00')
eastern_localized = naive_datetime.tz_localize('US/Eastern')
print(f"\nNaive: {naive_datetime}")
print(f"Eastern localized: {eastern_localized}")
print(f"Converted to UTC: {eastern_localized.tz_convert('UTC')}")
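In a DataFrame, the same methods are reached through the `.dt` accessor on a column. One thing to watch: a zone like `'US/Eastern'` changes its UTC offset with daylight saving time, so the converted hour depends on the date. A sketch with two made-up order times, one in winter and one in summer:

```python
import pandas as pd

orders = pd.DataFrame({
    'local_time': pd.to_datetime(['2025-01-15 09:00:00', '2025-07-15 09:00:00']),
})

# Attach the zone the timestamps were recorded in, then convert to UTC
orders['eastern'] = orders['local_time'].dt.tz_localize('US/Eastern')
orders['utc'] = orders['eastern'].dt.tz_convert('UTC')

# UTC offset in hours: -5 in January (EST), -4 in July (EDT)
orders['utc_offset_h'] = orders['eastern'].apply(lambda ts: ts.utcoffset().total_seconds() / 3600)
print(orders)
```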

## 10. Real-World E-commerce Time Series Examples

### Example 1: Customer Activity Patterns

In [None]:
# Create customer activity data
activity_dates = pd.date_range('2025-01-01', '2025-01-31', freq='D')
np.random.seed(123)

customer_activity = pd.DataFrame({
    'date': activity_dates,
    'new_customers': np.random.poisson(15, len(activity_dates)),
    'returning_customers': np.random.poisson(45, len(activity_dates)),
    'page_views': np.random.normal(10000, 2000, len(activity_dates)).astype(int),
    'orders': np.random.poisson(85, len(activity_dates))
})

# Set date as index
customer_activity = customer_activity.set_index('date')

print("Daily Customer Activity:")
print(customer_activity.head(10))

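# Calculate per-customer metrics.
# Note: in this simulated data, orders (~85/day) exceed customers (~60/day), so
# "conversion_rate" is effectively orders per 100 customers and can exceed 100%
customer_activity['total_customers'] = customer_activity['new_customers'] + customer_activity['returning_customers']
customer_activity['conversion_rate'] = (customer_activity['orders'] / customer_activity['total_customers'] * 100).round(2)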
customer_activity['pages_per_customer'] = (customer_activity['page_views'] / customer_activity['total_customers']).round(1)

print("\nWith calculated metrics:")
print(customer_activity[['total_customers', 'orders', 'conversion_rate', 'pages_per_customer']].head())

### Example 2: Weekly and Monthly Performance Summaries

In [None]:
# Weekly performance analysis
weekly_performance = customer_activity.resample('W').agg({
    'new_customers': 'sum',
    'returning_customers': 'sum',
    'orders': 'sum',
    'page_views': 'sum',
    'conversion_rate': 'mean'
})

# Calculate weekly totals and rates
weekly_performance['total_customers'] = weekly_performance['new_customers'] + weekly_performance['returning_customers']
weekly_performance['weekly_conversion_rate'] = (weekly_performance['orders'] / weekly_performance['total_customers'] * 100).round(2)

print("Weekly Performance Summary:")
print(weekly_performance)

# Month-over-month comparison (if we had multiple months); 'ME' = month end (pandas >= 2.2, 'M' before)
monthly_summary = customer_activity.resample('ME').agg({
    'new_customers': 'sum',
    'returning_customers': 'sum',
    'orders': 'sum',
    'page_views': 'sum'
})

print("\nMonthly Summary:")
print(monthly_summary)

### Example 3: Moving Averages for Trend Analysis

In [None]:
# Calculate moving averages to smooth out daily fluctuations
trend_analysis = sales_indexed[['sales_amount']].copy()

# 3-day moving average
trend_analysis['sales_3day_ma'] = trend_analysis['sales_amount'].rolling(window=3).mean()

# 7-day moving average
trend_analysis['sales_7day_ma'] = trend_analysis['sales_amount'].rolling(window=7).mean()

# Exponential moving average (gives more weight to recent values)
trend_analysis['sales_ema'] = trend_analysis['sales_amount'].ewm(span=7).mean()

print("Trend Analysis with Moving Averages:")
print(trend_analysis.head(10))

# Show the smoothing effect
print("\nComparison of actual vs smoothed values (last 5 days):")
comparison = trend_analysis[['sales_amount', 'sales_7day_ma', 'sales_ema']].tail()
print(comparison.round(2))

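# Calculate trend direction: compare today's 7-day average with yesterday's.
# np.where would label the first rows (NaN comparisons) as 'Down', so use np.select
ma7 = trend_analysis['sales_7day_ma']
prev_ma7 = ma7.shift(1)
trend_analysis['trend_direction'] = np.select(
    [ma7 > prev_ma7, ma7 < prev_ma7],
    ['Up', 'Down'],
    default='N/A'  # no previous average yet (or no change)
)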

print("\nTrend directions (last 10 days):")
print(trend_analysis[['sales_amount', 'sales_7day_ma', 'trend_direction']].tail(10))
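`rolling(window=7)` counts rows and returns NaN until seven rows are available. Two options are worth knowing: `min_periods` lets the average start earlier, and on a DatetimeIndex a time-based window such as `'3D'` covers the last three calendar days even when dates are missing. A small sketch with a gap in the data (values made up):

```python
import pandas as pd

idx = pd.to_datetime(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-08', '2025-01-09'])
sales = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0], index=idx)

rows_3 = sales.rolling(window=3).mean()                      # last 3 rows; NaN for the first 2
rows_3_min1 = sales.rolling(window=3, min_periods=1).mean()  # average whatever rows exist
days_3 = sales.rolling('3D').mean()                          # rows within the last 3 calendar days

print(pd.DataFrame({'sales': sales, 'rows_3': rows_3, 'rows_3_min1': rows_3_min1, 'days_3': days_3}))
```

On Jan 8 the row-based window still mixes in Jan 2 and Jan 3 (mean 300), while the `'3D'` window only sees Jan 8 itself (400).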

## 11. Common Time Series Challenges and Solutions

### Challenge 1: Missing Dates

In [None]:
# Create data with missing dates
incomplete_dates = pd.date_range('2025-01-01', '2025-01-10', freq='D')
# Remove some dates to simulate missing data
incomplete_dates = incomplete_dates.delete([2, 5, 7])  # Remove 3rd, 6th, and 8th days

incomplete_sales = pd.DataFrame({
    'date': incomplete_dates,
    'sales': [1000, 1100, 950, 1200, 1050, 1150, 1080]
}).set_index('date')

print("Data with missing dates:")
print(incomplete_sales)

# Solution 1: Reindex to include all dates
full_date_range = pd.date_range('2025-01-01', '2025-01-10', freq='D')
complete_sales = incomplete_sales.reindex(full_date_range)

print("\nAfter reindexing (with NaN for missing dates):")
print(complete_sales)

# Solution 2: Forward-fill missing values (carry the last known value forward)
complete_sales_filled = complete_sales.ffill()
print("\nWith forward fill:")
print(complete_sales_filled)

# Solution 3: Interpolate missing values
complete_sales_interpolated = complete_sales.interpolate()
print("\nWith interpolation:")
print(complete_sales_interpolated)
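For a DatetimeIndex with a known regular frequency, `asfreq('D')` is a compact alternative to building the full date range and reindexing. Which fill to use is a business decision: if a missing date means the store sold nothing, filling with 0 is usually more honest than forward-filling the previous day. A sketch rebuilding the same gappy data:

```python
import pandas as pd

# Same gaps as above: Jan 3, Jan 6 and Jan 8 are missing
dates = pd.date_range('2025-01-01', '2025-01-10', freq='D').delete([2, 5, 7])
sales = pd.DataFrame({'sales': [1000, 1100, 950, 1200, 1050, 1150, 1080]}, index=dates)

# Equivalent to reindexing onto a full daily range: missing days become NaN
daily = sales.asfreq('D')

# If "no row" means "no sales", fill the new rows with 0
daily_zero = sales.asfreq('D', fill_value=0)

print(daily['sales'].isna().sum())   # 3 missing days
print(daily_zero['sales'].sum())     # total is unchanged by filling with 0
```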

### Challenge 2: Irregular Time Intervals

In [None]:
# Create irregular timestamp data (like real customer orders)
irregular_orders = pd.DataFrame({
    'timestamp': [
        '2025-01-15 09:15:23',
        '2025-01-15 09:47:12',
        '2025-01-15 11:23:45',
        '2025-01-15 14:56:33',
        '2025-01-15 16:12:09',
        '2025-01-15 18:34:21'
    ],
    'order_value': [45.99, 123.50, 67.25, 234.00, 89.99, 156.75]
})

irregular_orders['timestamp'] = pd.to_datetime(irregular_orders['timestamp'])
irregular_orders = irregular_orders.set_index('timestamp')

print("Irregular order timestamps:")
print(irregular_orders)

# Solution: Resample to regular intervals
hourly_totals = irregular_orders['order_value'].resample('h').sum()
print("\nResampled to hourly totals:")
print(hourly_totals[hourly_totals > 0])  # Show only hours with orders

# Count orders per hour
hourly_counts = irregular_orders['order_value'].resample('h').count()
print("\nOrders per hour:")
print(hourly_counts[hourly_counts > 0])
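Resampling is one way to handle irregular timestamps; another is to look at the gaps themselves. Differencing the timestamps gives the time between consecutive orders as Timedeltas, which helps spot quiet periods. A sketch using the same six order times as above:

```python
import pandas as pd

order_times = pd.to_datetime([
    '2025-01-15 09:15:23', '2025-01-15 09:47:12', '2025-01-15 11:23:45',
    '2025-01-15 14:56:33', '2025-01-15 16:12:09', '2025-01-15 18:34:21',
])

# Time since the previous order (the first value is NaT: there is no previous order)
gaps = pd.Series(order_times).diff()
print(gaps)

longest_gap = gaps.max()
gap_end = order_times[gaps.idxmax()]
print(f"Longest quiet period: {longest_gap} (ending at {gap_end})")
```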

## 12. Preparing Data for Visualization and Analysis

### Creating Analysis-Ready Time Series Data

In [None]:
# Create a comprehensive time series dataset for analysis
analysis_data = sales_indexed[['sales_amount']].copy()

# Add time components
analysis_data['year'] = analysis_data.index.year
analysis_data['month'] = analysis_data.index.month
analysis_data['day'] = analysis_data.index.day
analysis_data['weekday'] = analysis_data.index.weekday
analysis_data['is_weekend'] = analysis_data.index.weekday >= 5
analysis_data['day_name'] = analysis_data.index.day_name()

# Add lagged variables
analysis_data['sales_lag1'] = analysis_data['sales_amount'].shift(1)
analysis_data['sales_lag7'] = analysis_data['sales_amount'].shift(7)

# Add moving averages
analysis_data['sales_ma3'] = analysis_data['sales_amount'].rolling(3).mean()
analysis_data['sales_ma7'] = analysis_data['sales_amount'].rolling(7).mean()

# Add rolling statistics
analysis_data['sales_std7'] = analysis_data['sales_amount'].rolling(7).std()
analysis_data['sales_min7'] = analysis_data['sales_amount'].rolling(7).min()
analysis_data['sales_max7'] = analysis_data['sales_amount'].rolling(7).max()

# Add percentage changes
analysis_data['daily_change_pct'] = analysis_data['sales_amount'].pct_change() * 100
analysis_data['weekly_change_pct'] = analysis_data['sales_amount'].pct_change(periods=7) * 100

print("Comprehensive analysis dataset:")
print(analysis_data.head(10))
print(f"\nColumns: {list(analysis_data.columns)}")
print(f"Shape: {analysis_data.shape}")

In [None]:
# Summary statistics by different time periods
print("Weekly summary statistics:")
weekly_stats = analysis_data.groupby('weekday')['sales_amount'].agg([
    'count', 'mean', 'std', 'min', 'max'
]).round(2)
weekly_stats.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print(weekly_stats)

print("\nWeekend vs Weekday comparison:")
weekend_comparison = analysis_data.groupby('is_weekend')['sales_amount'].agg([
    'count', 'mean', 'std'
]).round(2)
weekend_comparison.index = ['Weekday', 'Weekend']
print(weekend_comparison)

## 13. Practice Exercises

### Exercise 1: Date Parsing and Cleaning
Clean and parse the following messy date data from an e-commerce system.

In [None]:
# Exercise data with messy dates
messy_order_data = pd.DataFrame({
    'order_id': ['ORD001', 'ORD002', 'ORD003', 'ORD004', 'ORD005'],
    'order_date': ['2025-01-15', '15/01/2025', 'Jan 16, 2025', '2025.01.17', '17-1-25'],
    'ship_date': ['2025-01-16 14:30', '16/01/2025 10:15', 'Jan 17, 2025 16:45', '2025.01.18 09:30', '18-1-25 11:20'],
    'order_value': [123.45, 67.89, 234.56, 89.12, 156.78]
})

print("Messy order data:")
print(messy_order_data)

# Your task:
# 1. Convert order_date and ship_date to proper datetime format
# 2. Calculate delivery_time (difference between ship_date and order_date)
# 3. Set order_date as the index
# 4. Add day_of_week column

# Your code here:


### Exercise 2: Time Series Resampling
Create hourly sales data and then analyze it at different time frequencies.

In [None]:
# Create hourly sales data for a week
np.random.seed(456)
hourly_sales_dates = pd.date_range('2025-01-13', '2025-01-19 23:00:00', freq='h')
hourly_sales_values = []

for dt in hourly_sales_dates:
    base_sales = 50
    hour = dt.hour
    weekday = dt.weekday()  # weekday() is a method on Timestamp; without () the comparison below fails
    
    # Business hours boost
    if 9 <= hour <= 17:
        base_sales += 30
    
    # Weekend reduction
    if weekday >= 5:
        base_sales *= 0.7
    
    # Add randomness
    final_sales = base_sales + np.random.normal(0, 15)
    hourly_sales_values.append(max(final_sales, 10))  # Minimum 10

hourly_sales_df = pd.DataFrame({
    'datetime': hourly_sales_dates,
    'sales': hourly_sales_values
}).set_index('datetime')

print(f"Hourly sales data created: {len(hourly_sales_df)} hours")
print(hourly_sales_df.head(12))

# Your tasks:
# 1. Resample to daily totals
# 2. Resample to 6-hour periods with average sales
# 3. Create business hours summary (9 AM - 5 PM)
# 4. Compare weekend vs weekday average sales

# Your code here:


### Exercise 3: Moving Averages and Trend Analysis
Analyze trends in the provided customer data.

In [None]:
# Customer registration data
registration_dates = pd.date_range('2025-01-01', '2025-01-31', freq='D')
np.random.seed(789)

# Simulate growing customer registrations with some noise
base_registrations = 20
growth_trend = np.linspace(0, 10, len(registration_dates))  # Linear growth
noise = np.random.normal(0, 5, len(registration_dates))
registrations = base_registrations + growth_trend + noise
registrations = [max(int(reg), 1) for reg in registrations]  # Minimum 1

customer_registrations = pd.DataFrame({
    'date': registration_dates,
    'new_customers': registrations
}).set_index('date')

print("Daily customer registrations:")
print(customer_registrations.head(10))

# Your tasks:
# 1. Calculate 3-day, 7-day, and 14-day moving averages
# 2. Identify days where actual registrations were >20% above the 7-day average
# 3. Calculate the overall trend (is it increasing or decreasing?)
# 4. Find the best and worst performing weeks

# Your code here:


### Exercise 4: Time Zone Conversion Challenge
Convert order timestamps from different time zones to UTC for analysis.

In [None]:
# Orders from different time zones
global_orders = pd.DataFrame({
    'order_id': ['US_001', 'EU_001', 'ASIA_001', 'US_002', 'EU_002'],
    'local_timestamp': [
        '2025-01-15 10:30:00',  # US Eastern
        '2025-01-15 16:30:00',  # Europe/London
        '2025-01-16 01:30:00',  # Asia/Tokyo
        '2025-01-15 14:45:00',  # US Pacific
        '2025-01-15 18:15:00'   # Europe/Berlin
    ],
    'timezone': ['US/Eastern', 'Europe/London', 'Asia/Tokyo', 'US/Pacific', 'Europe/Berlin'],
    'order_value': [123.45, 67.89, 234.56, 89.12, 156.78]
})

print("Global orders with local timestamps:")
print(global_orders)

# Your tasks:
# 1. Convert all local timestamps to UTC
# 2. Sort orders by UTC timestamp
# 3. Calculate the time difference between the first and last order
# 4. Group orders by UTC hour to see global activity patterns

# Your code here:


## Next Steps and Summary

Congratulations! You've completed the comprehensive Data Reshaping session covering:

### What We've Learned Today:

**Part 1: Merge, Join, and Concatenate**
- Combining DataFrames using `pd.merge()` and `pd.concat()`
- Different types of joins (inner, left, right, outer)
- Handling complex multi-table relationships

**Part 2: Melt and Pivot Operations**
- Converting between wide and long formats
- Creating pivot tables for analysis and reporting
- Advanced reshaping techniques

**Part 3: Time Series Manipulation Basics**
- Working with datetime data types and indices
- Time-based filtering and resampling
- Moving averages and trend analysis
- Time zone handling for global e-commerce

### Key Skills Acquired:
1. **Data Integration**: Combine data from multiple sources effectively
2. **Data Reshaping**: Transform data for analysis and visualization
3. **Time Series Preparation**: Handle temporal data for trend analysis
4. **E-commerce Analytics**: Apply techniques to real business scenarios
5. **Problem Solving**: Handle common data quality issues

### What's Next:
Tomorrow (Thursday, May 1st), we'll dive into the **Olist Brazilian E-commerce Dataset**:
- Understanding the database schema and relationships
- Loading and exploring multiple data tables
- Applying today's techniques to real e-commerce data
- Initial data exploration and quality assessment

### Practice Recommendations:
- Work through all the exercises in each part
- Try combining techniques (e.g., merge data, then pivot, then analyze trends)
- Experiment with your own datasets
- Review SQL equivalents to reinforce your existing knowledge

### Important Reminders:
- **Always verify your data** after reshaping operations
- **Plan your transformations** step by step
- **Handle missing values** appropriately for your analysis
- **Consider performance** with large datasets
- **Document your process** for reproducibility

These data transformation skills form the foundation for all advanced analytics work you'll do throughout the course!