# 1. Data Collection #
## Historical Market Data Collection ##
Identify Relevant Data: Determine which financial instruments (e.g., stocks, futures, forex) are relevant to your strategy. For the E-mini S&P 500 futures (ES) example below, you'll collect price and volume data.

Data Granularity: Decide on the time frame for the data (e.g., 1-minute, 1-hour, daily). Higher-frequency data allows for more granular analysis but requires more storage and processing power.
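One practical consequence of the granularity choice is that fine-grained bars can always be aggregated into coarser ones, but not the other way around. The sketch below illustrates this with pandas on synthetic 1-minute bars; the prices and volumes are made up for illustration and are not market data.

```python
import pandas as pd

# Synthetic 1-minute bars (made-up values), used only to illustrate resampling.
index = pd.date_range("2024-01-30 09:00", periods=120, freq=pd.Timedelta(minutes=1))
minute_bars = pd.DataFrame(
    {
        "open": [4950.0 + i * 0.25 for i in range(120)],
        "high": [4950.5 + i * 0.25 for i in range(120)],
        "low": [4949.5 + i * 0.25 for i in range(120)],
        "close": [4950.25 + i * 0.25 for i in range(120)],
        "volume": [10] * 120,
    },
    index=index,
)

# Aggregate to hourly bars: first open, highest high, lowest low,
# last close, and summed volume within each hour.
hourly_bars = minute_bars.resample(pd.Timedelta(hours=1)).agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)
print(hourly_bars)
```

Storing the finest granularity you can afford and resampling on demand keeps the option of testing several time frames later.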

Historical Depth: Determine how far back you need historical data. Longer historical periods can provide more robust training but may not always reflect current market conditions.

Data Sources: Use the Interactive Brokers TWS API for market data. The cells below access it through the `ib_insync` library.

## News Headlines Collection ##
Relevance: Focus on news that directly impacts your financial instruments. For the S&P 500, this might include economic indicators, earnings reports, and major geopolitical events.

Sentiment Analysis: Consider performing sentiment analysis on news headlines to quantify the market sentiment. This can be a feature for your model.
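As a rough illustration of turning a headline into a number, the sketch below scores headlines with a tiny word list. The `POSITIVE` and `NEGATIVE` sets and the `headline_sentiment` function are hypothetical names invented for this example; a real project would more likely use a curated finance lexicon or a trained model.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"surges", "gains", "optimism", "rally", "beats"}
NEGATIVE = {"losses", "concerns", "plunges", "fears", "misses"}

def headline_sentiment(headline):
    """Return (positive - negative) / (positive + negative) word counts, in [-1, 1].

    Headlines with no lexicon words score 0.0.
    """
    words = re.findall(r"[a-z]+", headline.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

for text in [
    "Oracle Surges as Results Show AI Advancement",
    "U.S. Stock Market Optimism Amid Inflation Concerns",
]:
    print(f"{headline_sentiment(text):+.2f}  {text}")
```

Averaging these scores per day would give one candidate sentiment feature per trading date.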

API Integration: Use APIs provided by news sources or financial data providers. Make sure to handle rate limits and request quotas.
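One common way to handle rate limits is to retry with exponential backoff. The sketch below is a minimal version of that idea; `RateLimited` and `fetch_with_backoff` are hypothetical names for this example, and the `fetch` argument stands in for whatever function actually calls the news API.

```python
import time

class RateLimited(Exception):
    """Raised by a fetch function when the provider signals a rate limit (hypothetical)."""

def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait base_delay * 2**attempt seconds and retry.

    The last failed attempt re-raises RateLimited instead of sleeping again.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With `requests`, the `fetch` function could raise `RateLimited` when `response.status_code == 429` (the standard "Too Many Requests" status) and return `response.json()` otherwise.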


In [4]:
from ib_insync import IB, Future, util, Order

ib = IB()
ib.connect('127.0.0.1', 7497, clientId=2)

<IB connected to 127.0.0.1:7497 clientId=2>

In [18]:
# Update the contract details as needed
contract = Future(symbol='ES', lastTradeDateOrContractMonth='202403', exchange='CME')

# Attempt to qualify the contract
try:
    qualified_contract = ib.qualifyContracts(contract)[0]
    print("Contract qualified successfully:", qualified_contract)
except Exception as e:
    print(f"Error qualifying contract: {e}")
    qualified_contract = None

if qualified_contract:
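    # Request historical data for the qualified contract.
    # MIDPOINT bars carry prices only (volume, average and barCount come back as -1);
    # use whatToShow='TRADES' if volume is needed.
    historical_data = ib.reqHistoricalData(
        qualified_contract, endDateTime='', durationStr='30 D',
        barSizeSetting='1 hour', whatToShow='MIDPOINT', useRTH=True)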

    if historical_data:
        df = util.df(historical_data)
        print(df.head())
    else:
        print("No historical data returned.")
else:
    print("Unable to request historical data due to contract issues.")

Contract qualified successfully: Future(conId=533620665, symbol='ES', lastTradeDateOrContractMonth='20240315', multiplier='50', exchange='CME', currency='USD', localSymbol='ESH4', tradingClass='ES')
                       date     open     high      low    close  volume  \
0 2024-01-30 08:30:00-06:00  4946.25  4953.75  4946.00  4953.50    -1.0   
1 2024-01-30 09:00:00-06:00  4953.50  4954.00  4944.00  4947.25    -1.0   
2 2024-01-30 10:00:00-06:00  4947.25  4955.75  4945.25  4954.25    -1.0   
3 2024-01-30 11:00:00-06:00  4954.25  4955.75  4941.50  4944.75    -1.0   
4 2024-01-30 12:00:00-06:00  4944.75  4951.25  4942.00  4948.25    -1.0   

   average  barCount  
0     -1.0        -1  
1     -1.0        -1  
2     -1.0        -1  
3     -1.0        -1  
4     -1.0        -1  


In [19]:
ib.reqMarketDataType(3)  # 3 = delayed market data; use it when no live subscription is available

historical_data = ib.reqHistoricalData(
    contract, endDateTime='', durationStr='30 D',
    barSizeSetting='1 hour', whatToShow='MIDPOINT', useRTH=True)

df = util.df(historical_data)
print(df.head())

                       date     open     high      low    close  volume  \
0 2024-01-30 08:30:00-06:00  4946.25  4953.75  4946.00  4953.50    -1.0   
1 2024-01-30 09:00:00-06:00  4953.50  4954.00  4944.00  4947.25    -1.0   
2 2024-01-30 10:00:00-06:00  4947.25  4955.75  4945.25  4954.25    -1.0   
3 2024-01-30 11:00:00-06:00  4954.25  4955.75  4941.50  4944.75    -1.0   
4 2024-01-30 12:00:00-06:00  4944.75  4951.25  4942.00  4948.25    -1.0   

   average  barCount  
0     -1.0        -1  
1     -1.0        -1  
2     -1.0        -1  
3     -1.0        -1  
4     -1.0        -1  
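The `-1` values in the `volume`, `average`, and `barCount` columns above appear to be placeholders rather than real measurements. A minimal sketch of marking them as missing before any modelling, using two rows copied from the output above (the `bars` and `placeholder_cols` names are chosen for this example):

```python
import numpy as np
import pandas as pd

# Two rows copied from the printed output above.
bars = pd.DataFrame(
    {
        "close": [4953.50, 4947.25],
        "volume": [-1.0, -1.0],
        "average": [-1.0, -1.0],
        "barCount": [-1, -1],
    }
)

# Treat -1 as "not available" so it cannot be mistaken for a real value.
placeholder_cols = ["volume", "average", "barCount"]
bars[placeholder_cols] = bars[placeholder_cols].replace(-1, np.nan)
print(bars)
```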


In [29]:
import requests
from dateutil.parser import parse

def get_headlines(api_token):
    """Fetch the latest S&P 500 (^GSPC) headlines from the Marketaux news API."""
    url = "https://api.marketaux.com/v1/news/all"
    params = {
        "symbols": "^GSPC",
        "filter_entities": "true",
        "language": "en",
        "api_token": api_token,
    }

    # A single request is enough: repeating the identical request
    # only returns the same articles again.
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # Fail loudly on HTTP errors
    news_headlines = response.json()

    cleaned_data = []
    for article in news_headlines['data']:
        date = article['published_at']
        title = article['title']
        cleaned_data.append({'date': date, 'headline': title})

    return cleaned_data

def extract_headlines(data):
    headlines = {}  # Maps a date string (YYYY-MM-DD) to a list of headlines

    # 'data' is a list of dictionaries, each with a 'date' and a 'headline' key
    for item in data:
        # Parse the publication timestamp and keep only the calendar date
        date = parse(item['date']).date()
        date_str = date.strftime('%Y-%m-%d')

        # Ensure there is a list to append headlines for the given date
        if date_str not in headlines:
            headlines[date_str] = []

        # Append the headline to the list for the given date
        headlines[date_str].append(item['headline'])

    return headlines

In [30]:
api_token = 'YOUR_API_TOKEN'  # Replace with your actual API token; never commit a real token
headlines_data = get_headlines(api_token)
organized_headlines = extract_headlines(headlines_data)

# To print the organized headlines:
for date, titles in organized_headlines.items():
    print(f"Date: {date}")
    for title in titles:
        print(f" - {title}")

Date: 2024-03-12
 - S&P 500 Gains and Losses Today: Oracle Surges as Results Show AI Advancement
 - U.S. Stock Market Optimism Amid Inflation Concerns
 - Market Talk – March 12, 2024




# 2. Feature Extraction & Preprocessing #