# CHALLENGE 4: The Timezone Trap (Data Alignment)

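**The Problem:** Markets close at 3:30 PM IST. News published at 8:00 PM arrives after the close, so it can only affect the *next* trading day's price.

**The Goal:** Map each news timestamp to the first **Trading Day** whose close it can still influence, so that no price is paired with news published after it ("look-ahead bias").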
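
As a minimal illustration of the trap, the sketch below converts one made-up UTC timestamp to IST and checks which side of the close it falls on. The timestamp is an assumption chosen for the example; it is not part of the dataset used later.

```python
import pandas as pd
from datetime import time

MARKET_CLOSE = time(15, 30)  # 3:30 PM IST

# Hypothetical news item: 14:30 UTC on a Monday
ts_utc = pd.Timestamp('2024-01-15 14:30:00', tz='UTC')
ts_ist = ts_utc.tz_convert('Asia/Kolkata')  # IST is UTC+05:30

# The calendar date is still Jan 15, but the market has already closed
after_close = ts_ist.time() > MARKET_CLOSE
print(ts_ist, '| after close:', after_close)
```

Grouping by the raw date alone would attach this item to the Jan 15 close, a price that was fixed before the news existed.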

In [37]:
import pandas as pd
import numpy as np
from datetime import datetime, time
import pytz

# Define IST timezone and Indian Market Hours
IST = pytz.timezone('Asia/Kolkata')
# News published after 15:30:00 IST rolls over to the next trading day.
MARKET_CLOSE = time(15, 30)

In [38]:
news_data = {
    'headline': ['Apple Product', 'Tesla Earnings', 'Amazon Cloud', 'Microsoft AI', 'Google Quarter', 'Weekend Apple', 'Sat Tesla'],
    'timestamp_utc': [
        '2024-01-15 08:00:00', '2024-01-15 06:00:00', '2024-01-15 14:00:00',
        '2024-01-15 18:30:00', '2024-01-16 01:00:00', '2024-01-20 10:00:00', '2024-01-21 12:00:00'
    ],
    'sentiment_score': [0.8, 0.6, 0.7, 0.5, 0.9, 0.4, 0.3]
}

df_news = pd.DataFrame(news_data)
df_news['timestamp_utc'] = pd.to_datetime(df_news['timestamp_utc']).dt.tz_localize('UTC')
print("Input UTC Data Ready.")

Input UTC Data Ready.


In [39]:
def map_to_trading_day(ts):
    """Map an IST timestamp to the trading day whose close it can first affect.

    Exchange holidays are not handled: every Mon-Fri counts as a trading day.
    """
    # Timezone-naive midnight of the IST calendar date
    day = ts.tz_localize(None).normalize()

    # Rule 1: Weekend → following Monday
    if ts.weekday() >= 5:
        return day + pd.Timedelta(days=7 - ts.weekday())

    # Rule 2: After Market Close → Next Business Day (Friday evening → Monday)
    if ts.time() > MARKET_CLOSE:
        return day + pd.tseries.offsets.BusinessDay(1)

    # Rule 3: At or Before Market Close → Today
    return day

# Convert to IST and Apply Logic
df_news['timestamp_ist'] = df_news['timestamp_utc'].dt.tz_convert(IST)
df_news['trading_day'] = df_news['timestamp_ist'].apply(map_to_trading_day)

# Final clean-up: keep only the date (timezone-naive, time set to midnight) so it joins against stock dates
df_news['trading_day'] = pd.to_datetime(df_news['trading_day']).dt.normalize()

print(df_news[['headline', 'timestamp_ist', 'trading_day']])

         headline             timestamp_ist trading_day
0   Apple Product 2024-01-15 13:30:00+05:30  2024-01-15
1  Tesla Earnings 2024-01-15 11:30:00+05:30  2024-01-15
2    Amazon Cloud 2024-01-15 19:30:00+05:30  2024-01-16
3    Microsoft AI 2024-01-16 00:00:00+05:30  2024-01-16
4  Google Quarter 2024-01-16 06:30:00+05:30  2024-01-16
5   Weekend Apple 2024-01-20 15:30:00+05:30  2024-01-22
6       Sat Tesla 2024-01-21 17:30:00+05:30  2024-01-22
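
The mapping above treats every Monday to Friday as a trading day, so exchange holidays are not handled. One possible extension, sketched here under assumptions, uses `pd.offsets.CustomBusinessDay` with a holiday list. The `HOLIDAYS` list and the function name `map_to_trading_day_with_holidays` are hypothetical; a real pipeline would load the exchange's published calendar.

```python
import pandas as pd
from datetime import time

MARKET_CLOSE = time(15, 30)

# Hypothetical holiday list for illustration only
HOLIDAYS = ['2024-01-26']
TRADING_DAY = pd.offsets.CustomBusinessDay(holidays=HOLIDAYS)

def map_to_trading_day_with_holidays(ts_ist):
    # Start from the IST calendar date (timezone-naive midnight)
    day = ts_ist.tz_localize(None).normalize()
    # After the close, the earliest candidate is the next calendar day
    if ts_ist.time() > MARKET_CLOSE:
        day += pd.Timedelta(days=1)
    # Skip weekends and listed holidays; a valid trading day is returned unchanged
    return TRADING_DAY.rollforward(day)

thursday_evening = pd.Timestamp('2024-01-25 20:00:00', tz='Asia/Kolkata')
print(map_to_trading_day_with_holidays(thursday_evening))
```

Rolling forward from a single candidate date covers the weekend rule and the after-close rule in one step, so the separate weekend branch is no longer needed in this variant.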


### Step 2: Aggregation and Merging
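We aggregate the news items by `trading_day` (mean sentiment and article count), then merge the result onto the stock prices with a `left join` so that every trading day is kept, including days without news.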

In [40]:
# 1. Mock Stock Prices (random placeholders, so the values change on every run)
# freq='B' generates dates for Business Days only (Mon-Fri)
trading_dates = pd.date_range(start='2024-01-15', end='2024-01-25', freq='B')
df_stock = pd.DataFrame({'date': trading_dates, 'close': np.random.uniform(150, 160, len(trading_dates))})

# 2. Aggregate News
# Calculate the mean sentiment and count of articles per trading day
news_agg = df_news.groupby('trading_day').agg({'sentiment_score': ['mean', 'count']}).reset_index()
news_agg.columns = ['date', 'avg_sentiment', 'news_count']

# 3. Left-join onto the trading dates and fill days without news with 0
# (note: avg_sentiment == 0 then means "no news", not "neutral news")
df_merged = pd.merge(df_stock, news_agg, on='date', how='left').fillna(0)

print(df_merged.head(10))

        date       close  avg_sentiment  news_count
0 2024-01-15  152.306431           0.70         2.0
1 2024-01-16  157.442762           0.70         3.0
2 2024-01-17  159.202681           0.00         0.0
3 2024-01-18  150.454443           0.00         0.0
4 2024-01-19  155.319098           0.00         0.0
5 2024-01-22  156.286819           0.35         2.0
6 2024-01-23  154.813613           0.00         0.0
7 2024-01-24  150.232523           0.00         0.0
8 2024-01-25  154.815800           0.00         0.0
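
Because `fillna(0)` is applied to every column, a day without news looks the same as a day whose average sentiment is exactly 0, and `news_count` becomes a float. The sketch below shows one way to keep the two cases apart. The toy frames and the `has_news` column are assumptions made for the example.

```python
import pandas as pd

# Toy inputs with made-up values
stock = pd.DataFrame({
    'date': pd.date_range('2024-01-15', periods=3, freq='B'),
    'close': [152.0, 157.0, 159.0],
})
news = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-15']),
    'avg_sentiment': [0.0],   # genuinely neutral news
    'news_count': [1],
})

merged = stock.merge(news, on='date', how='left')

# Fill only the count, and restore its integer dtype
merged['news_count'] = merged['news_count'].fillna(0).astype(int)
# Explicit flag; avg_sentiment stays NaN on days without news
merged['has_news'] = merged['news_count'] > 0

print(merged)
```

Whether to leave the missing sentiment as NaN, fill it with 0, or carry the last value forward depends on the downstream model, so the choice is left open here.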
