# Add Sharadar Tickers Metadata - FAST VERSION

This notebook takes a SQL-based approach that is much faster than pre-expanding the metadata in Python, and it uses minimal memory.

Instead of creating 241M (ticker, date) rows in Python and writing them to disk, we:
1. Write ticker metadata once (60K rows)
2. Use SQL JOIN at query time (no pre-expansion needed)

This is **orders of magnitude faster** and uses almost no extra disk space.
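To make the idea concrete, here is a toy sketch on an in-memory SQLite database. The `TradingDates` table and the three symbols are hypothetical stand-ins (the notebook itself does not create a dates table). The point is that a `CROSS JOIN` produces the per-date rows on demand, so they never have to be stored.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Static metadata: one row per symbol (mirrors the SharadarTickers layout)
conn.execute('CREATE TABLE SharadarTickers (Symbol TEXT, exchange TEXT, is_adr INTEGER)')
conn.executemany(
    'INSERT INTO SharadarTickers VALUES (?, ?, ?)',
    [('AAPL', 'NASDAQ', 0), ('IBM', 'NYSE', 0), ('TSM', 'NYSE', 1)],
)

# Hypothetical calendar table: one row per trading date
conn.execute('CREATE TABLE TradingDates (Date TEXT)')
conn.executemany(
    'INSERT INTO TradingDates VALUES (?)',
    [('2024-01-02',), ('2024-01-03',), ('2024-01-04',), ('2024-01-05',)],
)

# Rows actually stored: 3 metadata rows + 4 dates
stored = sum(
    conn.execute(f'SELECT COUNT(*) FROM {t}').fetchone()[0]
    for t in ('SharadarTickers', 'TradingDates')
)

# Rows produced at query time: every (date, symbol) pair, computed on demand
expanded = conn.execute(
    '''
    SELECT d.Date, t.Symbol, t.exchange, t.is_adr
    FROM TradingDates d
    CROSS JOIN SharadarTickers t
    ORDER BY d.Date, t.Symbol
    '''
).fetchall()

print(f'Stored rows: {stored}, rows produced by the JOIN: {len(expanded)}')
```

Scaled up, 60K symbols × ~4K dates is the 241M-row table the old approach materialized; here only the two small inputs live on disk.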

In [1]:
import pandas as pd
import sqlite3

## 1. Load Sharadar Tickers Metadata

In [2]:
# Load Sharadar tickers
tickers_path = '/root/.zipline/data/sharadar/2025-11-23T04;09;32.033611/fundamentals/tickers.h5'
tickers = pd.read_hdf(tickers_path, key='tickers')

print(f'Loaded {len(tickers)} tickers')
tickers.head()

Loaded 60303 tickers


| | table | permaticker | ticker | name | exchange | isdelisted | category | cusips | siccode | sicsector | ... | currency | location | lastupdated | firstadded | firstpricedate | lastpricedate | firstquarter | lastquarter | secfilings | companysite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SFP | 645772 | IFLR | INNOVATOR INTERNATIONAL DEVELOPED MANAGED FLOO... | NYSEARCA | N | ETF | 45784N387 | | | ... | USD | Illinois; U.S.A | 2025-11-22 | 2025-11-20 | 2025-11-20 | 2025-11-21 | | | https://www.sec.gov/cgi-bin/browse-edgar?actio... | |
| 1 | SFP | 645771 | CAIQ | CALAMOS NASDAQ AUTOCALLABLE INCOME ETF | NASDAQ | N | ETF | 12811T530 | | | ... | USD | Illinois; U.S.A | 2025-11-21 | 2025-11-20 | 2025-11-20 | 2025-11-21 | | | https://www.sec.gov/cgi-bin/browse-edgar?actio... | |
| 2 | SFP | 645770 | ESBG | FIRST TRUST ENHANCED STOCKS BONDS & GOLD ETF | NYSEARCA | N | ETF | 33739H200 | | | ... | USD | Illinois; U.S.A | 2025-11-22 | 2025-11-20 | 2025-11-19 | 2025-11-21 | | | https://www.sec.gov/cgi-bin/browse-edgar?actio... | |
| 3 | SFP | 645769 | TXXD | 21SHARES 2X LONG DOGECOIN ETF | NASDAQ | N | ETF | 53656G175 | | | ... | USD | Wisconsin; U.S.A | 2025-11-21 | 2025-11-20 | 2025-11-20 | 2025-11-21 | | | https://www.sec.gov/cgi-bin/browse-edgar?actio... | |
| 4 | SFP | 645768 | MNZL | MANZIL RUSSELL HALAL USA BROAD MARKET ETF | NASDAQ | N | ETF | 02072Q317 | | | ... | USD | Pennsylvania; U.S.A | 2025-11-21 | 2025-11-20 | 2025-11-19 | 2025-11-21 | | | https://www.sec.gov/cgi-bin/browse-edgar?actio... | |


## 2. Prepare Ticker Metadata (Static Table)

In [3]:
# Columns to keep from the Sharadar tickers table
cols = [
    'ticker',
    'exchange',
    'category',
    'location',
    'sector',
    'industry',
    'sicsector',
    'sicindustry',
    'scalemarketcap'
]

tickers_subset = tickers[cols].copy()
tickers_subset = tickers_subset.rename(columns={'ticker': 'Symbol'})

# Create is_adr flag
tickers_subset['is_adr'] = tickers_subset['category'].str.contains('ADR', na=False).astype(int)

print(f'Prepared {len(tickers_subset)} ticker records')
print('\nExchange distribution:')
print(tickers_subset['exchange'].value_counts())
print(f'\nADR stocks: {tickers_subset["is_adr"].sum()}')

tickers_subset.head()

Prepared 60303 ticker records

Exchange distribution:
exchange
NASDAQ      26970
NYSE        13381
None        12092
NYSEARCA     3775
NYSEMKT      2627
BATS         1441
OTC            12
INDEX           5
Name: count, dtype: int64

ADR stocks: 4354


| | Symbol | exchange | category | location | sector | industry | sicsector | sicindustry | scalemarketcap | is_adr |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | IFLR | NYSEARCA | ETF | Illinois; U.S.A | | | | | | 0 |
| 1 | CAIQ | NASDAQ | ETF | Illinois; U.S.A | | | | | | 0 |
| 2 | ESBG | NYSEARCA | ETF | Illinois; U.S.A | | | | | | 0 |
| 3 | TXXD | NASDAQ | ETF | Wisconsin; U.S.A | | | | | | 0 |
| 4 | MNZL | NASDAQ | ETF | Pennsylvania; U.S.A | | | | | | 0 |
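The `is_adr` flag treats any category containing the substring `ADR` (case-sensitive) as an ADR, and `na=False` maps missing categories to 0 instead of NaN, which would otherwise break `.astype(int)`. A quick check on a few category strings; only `ETF` appears in the preview above, so the other labels are assumed examples:

```python
import pandas as pd

# Hypothetical sample of category values, including a missing one
sample = pd.DataFrame({
    'category': ['Domestic Common Stock', 'ADR Common Stock', 'ETF', None, 'ADR Preferred Stock'],
})

# Same expression as the notebook: substring match, missing -> False, then 0/1
sample['is_adr'] = sample['category'].str.contains('ADR', na=False).astype(int)
print(sample)
```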


## 3. Write to Database (Just 60K rows!)

In [4]:
# Connect to database
db_path = '/data/custom_databases/fundamentals.sqlite'
conn = sqlite3.connect(db_path)

# Write ticker metadata (static - no dates)
print('Writing ticker metadata to database...')
tickers_subset.to_sql('SharadarTickers', conn, index=False, if_exists='replace', chunksize=1000)

# Create index on Symbol for fast lookups
print('Creating index on Symbol...')
conn.execute('CREATE INDEX IF NOT EXISTS idx_sharadar_tickers_symbol ON SharadarTickers(Symbol)')
conn.commit()

print('Done! Ticker metadata table created.')
print(f'Rows: {len(tickers_subset):,}')

Writing ticker metadata to database...
Creating index on Symbol...
Done! Ticker metadata table created.
Rows: 60,303
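To confirm that Symbol lookups actually use the new index, SQLite's `EXPLAIN QUERY PLAN` can be run against the same query shape. This is a self-contained sketch on an in-memory stand-in with a few hypothetical rows; against the real file, reuse `conn` and skip the setup lines.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')

# Small stand-in for tickers_subset, written and indexed the same way as above
df = pd.DataFrame({'Symbol': ['AAPL', 'IBM', 'TSM'], 'exchange': ['NASDAQ', 'NYSE', 'NYSE']})
df.to_sql('SharadarTickers', conn, index=False, if_exists='replace')
conn.execute('CREATE INDEX IF NOT EXISTS idx_sharadar_tickers_symbol ON SharadarTickers(Symbol)')

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step description
plan = conn.execute(
    'EXPLAIN QUERY PLAN SELECT * FROM SharadarTickers WHERE Symbol = ?', ('AAPL',)
).fetchall()
details = [row[-1] for row in plan]
print(details)  # look for a "SEARCH ... USING INDEX idx_sharadar_tickers_symbol" step

uses_index = any('idx_sharadar_tickers_symbol' in d for d in details)
```

A plan step reading `SCAN` instead of `SEARCH ... USING INDEX` would mean the lookup is falling back to a full table scan.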


## 4. Test Query (SQL JOIN at Runtime)

In [5]:
# Test: look up AAPL's metadata (static, so no date filter is needed)
test_query = """
SELECT 
    t.Symbol,
    t.exchange,
    t.category,
    t.is_adr,
    t.sector,
    t.industry
FROM SharadarTickers t
WHERE t.Symbol = 'AAPL'
"""

result = pd.read_sql(test_query, conn)
print('Sample query result:')
print(result)

Sample query result:
  Symbol exchange               category  is_adr      sector  \
0   AAPL   NASDAQ  Domestic Common Stock       0  Technology   
1   AAPL   NASDAQ  Domestic Common Stock       0  Technology   

               industry  
0  Consumer Electronics  
1  Consumer Electronics  
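The AAPL lookup returns two identical rows, so `tickers.h5` appears to hold more than one row per ticker. The `table` column (`SFP` in the preview) suggests one row per Sharadar source table, though that is an assumption. Duplicates inflate the counts in Section 5 and would hand a per-symbol loader two rows. A minimal sketch, with a hypothetical helper name, that drops rows identical across the selected columns and reports any symbols that still conflict:

```python
import pandas as pd


def dedupe_ticker_metadata(df: pd.DataFrame, key: str = 'Symbol'):
    """Drop exact duplicate rows, then return (deduped, conflicting_symbols).

    conflicting_symbols lists keys that still map to more than one distinct row;
    those need a rule (e.g. which source table wins) before they can be collapsed.
    """
    deduped = df.drop_duplicates().reset_index(drop=True)
    conflicts = deduped.loc[deduped[key].duplicated(keep=False), key].unique().tolist()
    return deduped, conflicts


# Hypothetical rows: AAPL duplicated exactly, XYZ duplicated with different metadata
sample = pd.DataFrame({
    'Symbol': ['AAPL', 'AAPL', 'XYZ', 'XYZ', 'IBM'],
    'exchange': ['NASDAQ', 'NASDAQ', 'NYSE', 'NASDAQ', 'NYSE'],
    'is_adr': [0, 0, 0, 0, 0],
})

deduped, conflicts = dedupe_ticker_metadata(sample)
print(len(deduped), conflicts)
```

Applied to `tickers_subset` before `to_sql`, this would leave one row per symbol wherever the selected metadata agrees.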


## 5. Verify Data

In [6]:
# Check row count
count = pd.read_sql('SELECT COUNT(*) as count FROM SharadarTickers', conn)
print(f'Total rows: {count["count"][0]:,}')

# Check NYSE stocks
nyse = pd.read_sql("SELECT COUNT(*) as count FROM SharadarTickers WHERE exchange = 'NYSE'", conn)
print(f'NYSE stocks: {nyse["count"][0]:,}')

# Check Domestic Common Stock
domestic = pd.read_sql("SELECT COUNT(*) as count FROM SharadarTickers WHERE category = 'Domestic Common Stock'", conn)
print(f'Domestic Common Stock: {domestic["count"][0]:,}')

# Check ADRs
adr = pd.read_sql("SELECT COUNT(*) as count FROM SharadarTickers WHERE is_adr = 1", conn)
print(f'ADR stocks: {adr["count"][0]:,}')

# Check database file size (the whole fundamentals.sqlite file, not just SharadarTickers)
import os
size_mb = os.path.getsize(db_path) / (1024 ** 2)
print(f'\nDatabase size: {size_mb:.1f} MB')

conn.close()

Total rows: 60,303
NYSE stocks: 13,381
Domestic Common Stock: 26,649
ADR stocks: 4,354

Database size: 49543.9 MB
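The 49543.9 MB figure is the size of the entire `fundamentals.sqlite` file, not of `SharadarTickers`. To measure just the new table and its index, SQLite's `dbstat` virtual table can sum page sizes per object. `dbstat` exists only when SQLite was compiled with `SQLITE_ENABLE_DBSTAT_VTAB`, so this sketch (hypothetical helper name) returns `None` when it is unavailable:

```python
import sqlite3


def table_size_mb(conn, table):
    """Space used by `table` plus its indexes, in MB, or None if dbstat is missing."""
    query = '''
        SELECT SUM(pgsize)
        FROM dbstat
        WHERE name IN (SELECT name FROM sqlite_master WHERE tbl_name = ?)
    '''
    try:
        total = conn.execute(query, (table,)).fetchone()[0]
    except sqlite3.OperationalError:
        return None  # SQLite build without the dbstat virtual table
    return (total or 0) / (1024 ** 2)


# Self-contained demo on an in-memory database
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE SharadarTickers (Symbol TEXT, exchange TEXT)')
demo.executemany('INSERT INTO SharadarTickers VALUES (?, ?)', [('AAPL', 'NASDAQ'), ('IBM', 'NYSE')])
demo.execute('CREATE INDEX idx_sharadar_tickers_symbol ON SharadarTickers(Symbol)')

size = table_size_mb(demo, 'SharadarTickers')
print('dbstat unavailable' if size is None else f'SharadarTickers: {size:.3f} MB')
```

Against the real database, call `table_size_mb(conn, 'SharadarTickers')` before `conn.close()`.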


## 6. Summary

### What This Approach Does Differently:

**Old approach (slow):**
- 60K tickers × 4K dates = 241M rows
- Takes 10+ minutes to write
- Uses 10+ GB disk space
- 5-10 minutes to index

**New approach (fast):**
- Just 60K rows (ticker metadata only)
- Takes ~2 seconds to write
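- Adds only ~10 MB to the database (the size printed in Section 5 covers the whole `fundamentals.sqlite` file, which also holds other tables)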
- ~1 second to index
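The row counts in this comparison follow from simple arithmetic. A quick check, using the ticker count from Section 1 and the summary's ~4K trading dates (a round figure, not an exact calendar count):

```python
n_tickers = 60_303   # rows in tickers.h5 (Section 1 output)
n_dates = 4_000      # approximate number of trading dates assumed by the summary

expanded_rows = n_tickers * n_dates  # old approach: one row per (ticker, date)
static_rows = n_tickers              # new approach: one row per ticker

print(f'{expanded_rows:,} vs {static_rows:,} rows ({expanded_rows // static_rows:,}x fewer)')
```

The reduction factor is exactly the number of dates, which is why the static table stays small no matter how long the price history grows.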

### How to Use in Pipeline:

The custom loader no longer reads a pre-expanded daily table. Because the metadata is static per symbol, it can look each symbol up once and pair it with dates at query time:

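```python
import sqlite3

conn = sqlite3.connect('/data/custom_databases/fundamentals.sqlite')
symbol = 'AAPL'

# Old: one row per (Symbol, Date) in a pre-expanded daily table
#   SELECT * FROM SharadarTickersDaily WHERE Symbol = ? AND Date = ?

# New: one row per Symbol in the static metadata table
query = """
SELECT t.*
FROM SharadarTickers t
WHERE t.Symbol = ?
"""
rows = conn.execute(query, (symbol,)).fetchall()
# Date filtering happens in the Pipeline; metadata is static per symbol
```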
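On the loader side, "metadata is static per symbol" means each symbol's value can simply be repeated across every date the Pipeline requests. A pandas sketch of that broadcast step; the function name and the dates × symbols output shape are assumptions about what a custom loader would want, not the interface of any specific loader:

```python
import sqlite3

import pandas as pd


def broadcast_metadata(conn, column, symbols, dates):
    """Return a (dates x symbols) frame holding each symbol's static `column` value.

    `column` is interpolated into the SQL, so it must be a trusted column name.
    """
    placeholders = ', '.join('?' for _ in symbols)
    meta = (
        pd.read_sql(
            f'SELECT Symbol, {column} FROM SharadarTickers WHERE Symbol IN ({placeholders})',
            conn,
            params=list(symbols),
        )
        .drop_duplicates(subset='Symbol')  # keep the first row if a symbol repeats
        .set_index('Symbol')[column]
    )
    # Same value on every date; symbols missing from the table become NaN
    row = meta.reindex(list(symbols))
    return pd.DataFrame(
        [row.values] * len(dates), index=pd.DatetimeIndex(dates), columns=list(symbols)
    )


# Demo on an in-memory stand-in for the SharadarTickers table
conn = sqlite3.connect(':memory:')
pd.DataFrame({'Symbol': ['AAPL', 'IBM'], 'exchange': ['NASDAQ', 'NYSE']}).to_sql(
    'SharadarTickers', conn, index=False
)
frame = broadcast_metadata(
    conn, 'exchange', ['AAPL', 'IBM', 'ZZZZ'], pd.date_range('2024-01-02', periods=3)
)
print(frame)
```

The database is touched once per batch of symbols regardless of how many dates are requested, which is the property that makes the static table cheap.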

### Database Class Definition:

```python
class SharadarTickers(Database):
    CODE = "fundamentals"
    LOOKBACK_WINDOW = 1  # Metadata doesn't change, use latest
    
    exchange = Column(str)
    category = Column(str)
    is_adr = Column(bool)
    location = Column(str)
    sector = Column(str)
    industry = Column(str)
    sicsector = Column(str)
    sicindustry = Column(str)
    scalemarketcap = Column(str)
```

### Usage in Pipeline:

```python
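# Assumes .latest returns standard zipline terms:
# string columns -> Classifier, bool columns -> Filter
exchange = SharadarTickers.exchange.latest
category = SharadarTickers.category.latest
is_adr = SharadarTickers.is_adr.latest

# Classifiers have no `in_` method and `==` is not overloaded on terms,
# so build the Filters with element_of() and eq()
base_universe = (
    exchange.element_of(['NYSE', 'NASDAQ', 'NYSEMKT']) &
    category.eq('Domestic Common Stock') &
    ~is_adr
)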
```
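Before wiring this into a Pipeline, the same universe rule can be previewed directly on the metadata table with pandas to see roughly how many symbols pass. A sketch mirroring the filter above; the helper name is hypothetical, and since it counts rows, deduplicate first if the table repeats symbols:

```python
import pandas as pd


def base_universe_mask(df: pd.DataFrame) -> pd.Series:
    """Boolean mask mirroring the Pipeline base_universe rule on static metadata."""
    return (
        df['exchange'].isin(['NYSE', 'NASDAQ', 'NYSEMKT'])
        & (df['category'] == 'Domestic Common Stock')
        & (df['is_adr'] == 0)
    )


# Hypothetical sample rows covering each way a symbol can fail the rule
sample = pd.DataFrame({
    'Symbol': ['AAPL', 'IFLR', 'TSM', 'OTCX'],
    'exchange': ['NASDAQ', 'NYSEARCA', 'NYSE', 'OTC'],
    'category': ['Domestic Common Stock', 'ETF', 'ADR Common Stock', 'Domestic Common Stock'],
    'is_adr': [0, 0, 1, 0],
})
print(sample.loc[base_universe_mask(sample), 'Symbol'].tolist())
```

Against the real table, `base_universe_mask(pd.read_sql('SELECT * FROM SharadarTickers', conn)).sum()` would give the candidate count.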