# Data Collection – NIFTY 50 Pairs Trading

## Objective
Collect and clean historical price data for NIFTY 50 stocks
(this notebook uses a 10-stock subset) to prepare for
correlation and cointegration analysis.

## Data Source
Yahoo Finance (via yfinance)

## Frequency
Daily adjusted close prices (`Adj Close` when yfinance returns it, otherwise `Close`)

## Time Period
1 January 2015 to the latest available trading day


In [1]:
# ===============================
# Data Collection – NIFTY 50
# ===============================

import yfinance as yf
import pandas as pd
import numpy as np

print("Libraries imported successfully")

# -------------------------------
# NIFTY 50 tickers in Yahoo format (".NS" = NSE listing)
# Note: only a 10-stock subset of the index is listed here
# -------------------------------
nifty_50 = [
    "RELIANCE.NS", "TCS.NS", "INFY.NS", "HDFCBANK.NS",
    "ICICIBANK.NS", "SBIN.NS", "HINDUNILVR.NS",
    "ITC.NS", "LT.NS", "AXISBANK.NS"
]

# -------------------------------
# Time period
# -------------------------------
start_date = "2015-01-01"
end_date = None   # till today

# -------------------------------
# Download price data
# -------------------------------
raw_data = yf.download(
    tickers=nifty_50,
    start=start_date,
    end=end_date,
    progress=False,
    group_by="ticker"
)

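# -------------------------------
# Extract adjusted close prices, one column per ticker
# Prefer "Adj Close"; if yfinance returns only "Close"
# (auto_adjust=True, the default in recent releases),
# that column is already adjusted.
# -------------------------------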
close_prices = pd.DataFrame()

for ticker in nifty_50:
    try:
        if "Adj Close" in raw_data[ticker].columns:
            close_prices[ticker] = raw_data[ticker]["Adj Close"]
        else:
            close_prices[ticker] = raw_data[ticker]["Close"]
    except Exception as e:
        print(f"Skipping {ticker}: {e}")

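# -------------------------------
# Clean data
# -------------------------------
# 1) Drop tickers that returned no data at all
close_prices.dropna(axis=1, how="all", inplace=True)
# 2) Keep only dates on which every remaining ticker has a price
close_prices.dropna(inplace=True)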

# -------------------------------
# Final check
# -------------------------------
print("Final dataset shape:", close_prices.shape)
close_prices.head()


Libraries imported successfully
Final dataset shape: (2741, 10)


Date,RELIANCE.NS,TCS.NS,INFY.NS,HDFCBANK.NS,ICICIBANK.NS,SBIN.NS,HINDUNILVR.NS,ITC.NS,LT.NS,AXISBANK.NS
2015-01-01,189.999832,988.880005,367.770844,217.235352,294.023865,279.880554,633.171936,161.223831,848.333435,486.962921
2015-01-02,189.496948,1002.049133,374.998047,220.258682,302.36972,280.99472,631.084961,161.728806,866.226196,497.853119
2015-01-05,187.421265,986.820679,371.775604,218.399063,302.995636,278.766388,634.716309,162.365555,878.277222,500.999115
2015-01-06,178.915253,950.440186,364.008179,214.999222,290.143066,267.312653,646.737854,158.193893,848.982544,483.09079
2015-01-07,182.809845,939.213562,365.749817,215.626694,282.298004,267.535492,669.444946,155.25177,846.9505,482.703552
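
The extraction loop in the cell above picks one price column per ticker from a frame whose columns are a (ticker, field) MultiIndex. A minimal sketch of the same selection without a loop is shown below. It runs on a small synthetic frame rather than a real download, and `extract_close`, `raw_demo`, and the `AAA.NS`/`BBB.NS` tickers are hypothetical names used only for illustration. Unlike the loop, it chooses between `Adj Close` and `Close` once for the whole frame, not per ticker.

```python
import numpy as np
import pandas as pd


def extract_close(raw: pd.DataFrame) -> pd.DataFrame:
    """Return one price column per ticker from (ticker, field) columns."""
    fields = raw.columns.get_level_values(1)
    # Prefer "Adj Close" when present, otherwise fall back to "Close"
    field = "Adj Close" if "Adj Close" in fields else "Close"
    # xs selects that field for every ticker and drops the field level
    return raw.xs(field, axis=1, level=1)


# Synthetic stand-in for the downloaded data (values are made up)
dates = pd.date_range("2015-01-01", periods=3, freq="B")
columns = pd.MultiIndex.from_product([["AAA.NS", "BBB.NS"], ["Open", "Close"]])
raw_demo = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

close_demo = extract_close(raw_demo)
print(close_demo)
```

On the real data this would be called as `extract_close(raw_data)`, assuming the download used `group_by="ticker"` so that the ticker is the first column level.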

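
The cleaning step drops every date on which any ticker is missing, so a single stock with a late start can shorten the whole dataset. A rough sketch of an audit that could be run before `dropna()` follows. It uses synthetic prices, and `missing_report`, `demo`, and the ticker names are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_report(prices: pd.DataFrame) -> pd.DataFrame:
    """Summarise missing values per ticker before any rows are dropped."""
    return pd.DataFrame({
        # Number of NaN prices per ticker
        "missing": prices.isna().sum(),
        # First date on which each ticker has a price
        "first_valid": prices.apply(lambda s: s.first_valid_index()),
    })


# Synthetic prices: BBB.NS has no data on the first two days
dates = pd.date_range("2015-01-01", periods=4, freq="B")
demo = pd.DataFrame(
    {
        "AAA.NS": [10.0, 10.5, 11.0, 11.2],
        "BBB.NS": [np.nan, np.nan, 20.0, 20.4],
    },
    index=dates,
)

report = missing_report(demo)
rows_lost = len(demo) - len(demo.dropna())

print(report)
print("Rows removed by dropna():", rows_lost)
```

Applied to `close_prices` before the cleaning step, a report like this shows which tickers account for the rows that the row-wise `dropna()` removes.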