A simple tool for downloading and storing historical equity data in a local SQLite database. Currently supports equities and major indices via YFinance.
- Downloads historical OHLCV data from YFinance
- Stores it in a centralized SQLite database (
~/.finbase/timeseries.db) - Tracks index constituents (S&P 500, DOW 30, NASDAQ-100, FTSE 100, DAX)
- Provides a Python API for querying the data
- Index Constituent Tracking: Historical point-in-time composition for 5 major indices
- Smart Loading: Skips existing data, resumable downloads
- Rate Limiting: Conservative throttling to avoid hitting YFinance limits
- DataClient API: Simple read API for use by other projects
| Index | Constituents | Country | Data Source |
|---|---|---|---|
| S&P 500 | 503 | πΊπΈ US | Wikipedia |
| DOW 30 | 30 | πΊπΈ US | Wikipedia |
| NASDAQ-100 | 101 | πΊπΈ US | Wikipedia |
| FTSE 100 | 100 | π¬π§ UK | Wikipedia |
| DAX | 41 | π©πͺ Germany | Wikipedia |
- YFinance: Equity and index data
git clone https://github.com/shoom1/finbase.git
cd finbase
conda env create -f environment.yml
conda activate finbasegit clone https://github.com/shoom1/finbase.git
cd finbase
pip install -e .
# Or with extras
pip install -e ".[dev,dashboard]"# Creates ~/.finbase/timeseries.db and ~/.finbaserc
python scripts/setup_database.py --init# Get current index memberships from Wikipedia
python scripts/setup_database.py --update-index SP500
python scripts/setup_database.py --update-index DOW30
# Or update all at once
python scripts/setup_database.py --update-all-indices# Load price data for all DOW30 constituents
python scripts/setup_database.py --load-index-data DOW30
# Load SP500 from 2020 (faster than full history)
python scripts/setup_database.py --load-index-data SP500 --index-start-date 2020-01-01
# Test with first 10 stocks
python scripts/setup_database.py --load-index-data SP500 --index-max-symbols 10from finbase import DataClient
client = DataClient()
# Get closing prices for portfolio
portfolio = ['AAPL', 'MSFT', 'GOOGL', 'AMZN']
prices = client.get_closes(portfolio, start='2020-01-01')
# Get all DOW30 constituents
dow30 = client.get_index_constituents('DOW30')
dow30_prices = client.get_closes(dow30['symbol'].tolist())
# Calculate returns
returns = prices.pct_change()finbase/
βββ src/ # Source code
β βββ client/ # DataClient API for external projects
β βββ config/ # Configuration management
β βββ data/
β β βββ database/ # TimeSeriesDB, IndexDB, schema
β β βββ loaders/ # EquityLoader (YFinance)
β β βββ parsers/ # Wikipedia parsers
β β βββ risk_factor_groups/ # Risk factor group management
β β βββ validators/ # Data validation
β βββ dashboard/ # Optional Streamlit dashboard
β βββ utils/ # Logging utilities
β
βββ scripts/ # Command-line scripts
β βββ setup_database.py # Main data loading script
β
βββ data/ # Data files (created on init)
β βββ risk_factor_groups/ # JSON group definitions
β βββ index_configs/ # Index configuration files
β
βββ examples/ # Usage examples
β βββ client_api_examples.py
β βββ index_management_example.py
β βββ load_index_data_example.py
β
βββ tests/ # Unit tests
βββ docs/ # Quick start guides
User space (created on init):
~/.finbase/
βββ timeseries.db # SQLite database (shared with other projects)
~/.finbaserc # User configuration (YAML)
- QUICK_START_INDEX_DATA.md - Loading index data guide
- QUICKSTART_INDEX_MANAGEMENT.md - Managing indices
- DASHBOARD.md - Running the web dashboard
- CHANGELOG.md - Version history
risk_factors: Master table with metadata
- symbol, asset_class, asset_subclass
- description, country, currency, sector
- data_source (yfinance, fred, etc.)
- frequency, start_date, end_date
timeseries_data: OHLCV price data
- risk_factor_id (FK), date
- open, high, low, close, adj_close, volume
- Optimized indexes for fast queries
indices: Index metadata
- index_code, index_name, country
- data_source, last_updated
index_constituents: Temporal membership tracking
- index_id, symbol, effective_date, end_date
- Slowly changing dimension pattern for historical queries
The recommended way to access data from external projects:
from finbase import DataClient
client = DataClient()
# Discovery
stats = client.get_stats()
symbols = client.list_symbols(asset_class='equity', sector='Technology')
info = client.get_symbol_info('AAPL')
# Data Retrieval (long format)
df = client.get_data(['AAPL', 'MSFT'], start='2020-01-01')
# Data Retrieval (wide format for analysis)
prices = client.get_closes(['AAPL', 'MSFT'], start='2020-01-01')
# Index Queries
sp500 = client.get_index_constituents('SP500')
sp500_2020 = client.get_index_constituents('SP500', as_of_date='2020-01-01')
# Bulk Retrieval
tech_stocks = client.get_by_sector('Technology')See examples/client_api_examples.py for comprehensive usage.
Create a config file in data/index_configs/:
{
"index_code": "FTSE250",
"index_name": "FTSE 250",
"url": "https://en.wikipedia.org/wiki/FTSE_250_Index",
"country": "GB",
"asset_class": "equity",
"data_source": "wikipedia",
"constituents_table": {
"table_index": 2,
"column_mapping": {
"Company": "company_name",
"Ticker": "symbol"
}
}
}Then run: python scripts/setup_database.py --update-index FTSE250
Create JSON files in data/risk_factor_groups/:
{
"group_name": "tech_giants",
"asset_class": "equity",
"asset_subclass": "stock",
"data_source": "yfinance",
"frequency": "daily",
"risk_factors": [
{
"symbol": "AAPL",
"description": "Apple Inc.",
"country": "US",
"currency": "USD",
"sector": "Technology"
}
]
}# Install dashboard dependencies
pip install -e ".[dashboard]"
# Run Streamlit dashboard
streamlit run dashboard_app.py# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# With coverage
pytest --cov=src tests/- Core database system
- Index management (5 major indices)
- DataClient API
- Smart loading with rate limiting
- Streamlit dashboard
This project is licensed under the MIT License - see the LICENSE file for details.