🍒 Cherry Picker - Historical Quantitative Trading System

A snapshot of retail algorithmic trading before it became mainstream

Overview • Architecture • Features • Documentation • Building • Historical Context


⚠️ Important Notice

This is a historical archive from 2006-2007, preserved for educational purposes.

This codebase contains outdated libraries with known security vulnerabilities and practices that were acceptable in 2006 but are now considered insecure. It is NOT suitable for production use without significant modernization.

Why archive this? To document the evolution of retail quantitative trading and demonstrate algorithmic approaches that predated their mainstream adoption.


📖 Overview

Cherry Picker is a Java-based automated stock trading analysis system built independently in 2006-2007. It implements ensemble forecasting using multiple statistical models, processes intraday market data at 15-second intervals, and generates professional PDF reports with time-series visualizations.

What Makes This Historically Interesting?

In 2006-2007, this represented cutting-edge retail trading technology:

| Concept | Status in 2006 | Industry Adoption |
|---|---|---|
| ✅ Ensemble forecasting | Novel for retail traders | Mainstream by 2010s |
| ✅ Adaptive rolling windows | Rare in personal systems | Standard by 2012+ |
| ✅ Automated trading systems | Primarily institutional | Retail adoption 2015+ |
| ✅ High-frequency data analysis | Innovative at 15-sec granularity | Common by 2018+ |
| ✅ Systematic approach | Growing among quants | Ubiquitous by 2020+ |

The achievement: Building a complete end-to-end quantitative pipeline (data → analysis → forecasting → visualization) as a solo developer, before algorithmic trading platforms became accessible to retail traders.


🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Market Data Source                           │
│                  (SQL Server Database)                          │
│          15-second interval tick data with metadata             │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              CpForecasting.java - Ensemble Engine               │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────┐  │
│  │ Simple Exp.      │  │ Double Exp.      │  │ Moving Avg   │  │
│  │ Smoothing (SES)  │  │ Smoothing (DES)  │  │ Models (MAM) │  │
│  └──────────────────┘  └──────────────────┘  └──────────────┘  │
│  ┌──────────────────┐  ┌──────────────────┐                    │
│  │ Weighted Moving  │  │ Polynomial       │                    │
│  │ Averages (WMA)   │  │ Regression (BRR) │                    │
│  └──────────────────┘  └──────────────────┘                    │
│                                                                 │
│  • Rolling window adaptation (10-200 observations)             │
│  • Multi-horizon forecasting (5 periods ahead)                 │
│  • Custom weighting schemes                                    │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│               Predictions Database Table                        │
│          (cp_processed_results_st)                             │
│     Stores all model outputs for evaluation                     │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│          CpCharting.java - Report Generator                     │
├─────────────────────────────────────────────────────────────────┤
│  • JFreeChart: Time-series visualization                        │
│  • iText PDF: Multi-page document generation                   │
│  • 4 charts per page, 100 pages per file                        │
│  • Min/Max markers for entry/exit signals                       │
│  • Bookmark navigation by symbol                                │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              Professional Trading Reports                       │
│                  (Multi-file PDFs)                             │
│        Ready for analysis and trading decisions                 │
└─────────────────────────────────────────────────────────────────┘

Component Details

1️⃣ CpForecasting.java - The Forecasting Engine

  • Purpose: Generate predictions using multiple statistical models
  • Input: Historical price data from cp_processed_results table
  • Processing:
    • Maintains rolling windows of observations (configurable per model)
    • Removes oldest observations as new data arrives (concept drift handling)
    • Runs every model on the same dataset (one after another; processing is single-threaded)
    • Forecasts H=5 periods ahead (~75 seconds)
  • Output: Predictions stored in cp_processed_results_st
  • Lines of Code: ~545
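
The H=5 horizon can be illustrated with Holt's double exponential smoothing, where the h-step-ahead forecast is the last level plus h times the last trend. The following is a minimal sketch in plain modern Java (8+), not the archive's Java 1.4 code and not the OpenForecast implementation; the class name, the smoothing constants alpha and beta, and the initial trend estimate are all assumptions made for illustration.

```java
/** Hypothetical sketch: Holt's double exponential smoothing with a multi-step horizon. */
final class HoltForecastSketch {

    /**
     * Updates level and trend across the series, then projects `horizon` steps ahead.
     * alpha smooths the level and beta smooths the trend (both assumed to lie in (0, 1]).
     */
    static double[] forecast(double[] series, double alpha, double beta, int horizon) {
        if (series.length < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double level = series[0];
        double trend = series[1] - series[0];   // simple initial trend estimate (assumption)
        for (int t = 1; t < series.length; t++) {
            double previousLevel = level;
            level = alpha * series[t] + (1 - alpha) * (previousLevel + trend);
            trend = beta * (level - previousLevel) + (1 - beta) * trend;
        }
        double[] out = new double[horizon];
        for (int h = 1; h <= horizon; h++) {
            out[h - 1] = level + h * trend;     // h-step-ahead forecast
        }
        return out;
    }

    public static void main(String[] args) {
        double[] prices = {100.0, 101.0, 102.0, 103.0, 104.0};
        // Five periods ahead corresponds to roughly 75 seconds at 15-second intervals.
        for (double p : forecast(prices, 0.5, 0.5, 5)) {
            System.out.println(p);
        }
    }
}
```

On a perfectly linear series the level and trend track the data exactly, so the five forecasts simply continue the line; on real prices the choice of alpha and beta matters far more.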

2️⃣ CpCharting.java - The Visualization Engine

  • Purpose: Generate PDF reports with time-series charts
  • Input: Chart data from cp_chartvalues table
  • Processing:
    • Queries all symbols and dates in database
    • Generates 4 charts per page
    • Adds markers for min/max price points
    • Creates bookmarks for navigation
    • Splits into multiple files at 100 pages
  • Output: Professional PDF reports (cpcharts1.pdf, cpcharts2.pdf, ...)
  • Lines of Code: ~605
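
The marker logic in CpCharting.java is not reproduced in this README. As a rough sketch of the idea, locating the points to mark reduces to finding the positions of the lowest and highest price in a series. The class and method names below are hypothetical, and the code uses plain Java arrays rather than JFreeChart types.

```java
/** Hypothetical sketch: find the observations that would receive min/max markers. */
final class MinMaxMarkerSketch {

    /** Returns {indexOfMin, indexOfMax}; ties resolve to the earliest observation. */
    static int[] minMaxIndices(double[] prices) {
        if (prices.length == 0) {
            throw new IllegalArgumentException("empty series");
        }
        int minIdx = 0;
        int maxIdx = 0;
        for (int i = 1; i < prices.length; i++) {
            if (prices[i] < prices[minIdx]) {
                minIdx = i;
            }
            if (prices[i] > prices[maxIdx]) {
                maxIdx = i;
            }
        }
        return new int[] {minIdx, maxIdx};
    }
}
```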

3️⃣ Configuration System

  • Location: cpconfig/ directory
  • Format: XML files using SIXX serialization
  • Contents: 150+ configuration files containing:
    • Symbol-specific parameters
    • Trading time windows
    • Price thresholds
    • Skip flags for problematic symbols
  • Purpose: Fine-tune system behavior per stock

✨ Key Features

Ensemble Forecasting Methodology

Rather than relying on a single model, Cherry Picker runs 5 different algorithms over the same data:

// Model implementations from CpForecasting.java

1. Simple Exponential Smoothing (SES)
   → Best for: Mean-reverting price action
   → α parameter auto-optimized via getBestFitModel()

2. Double Exponential Smoothing (DES)
   → Best for: Trending markets with momentum
   → Captures both level and rate of change

3. Moving Average Models (MAM)
   → Best for: Noise reduction in volatile data
   → Configurable periods: 10, 25, 50, 100, 150

4. Weighted Moving Averages (WMA)
   → Best for: Emphasizing recent observations
   → Custom weight schemes: [0.1754, 0.1754, 0.1754, 0.0877, ...]

5. Polynomial Regression (BRR)
   → Best for: Nonlinear pattern detection
   → 5th degree polynomial fitting

Why ensemble? Different models excel in different market conditions. Running all models provides:

  • Robustness against model-specific failures
  • Diversity of perspectives on price movements
  • Ability to select best performer ex-post
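
The "select best performer ex-post" idea can be sketched as follows: score each model's one-step-ahead forecasts against what actually happened, then pick the lowest error. This is an illustration in plain modern Java, not the OpenForecast API or the code in CpForecasting.java. The `Model` interface, the mean-squared-error criterion, the alpha value, and the three-element weight vector are assumptions chosen for the example (the README's own weight list is elided, so it is not reproduced here).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: run several forecasters on the same series and compare them ex-post. */
final class EnsembleSketch {

    /** One-step-ahead forecaster: predicts series[end] from series[0..end-1]. */
    interface Model {
        double predict(double[] series, int end);
    }

    /** Simple exponential smoothing with a fixed alpha. */
    static Model ses(double alpha) {
        return (s, end) -> {
            double level = s[0];
            for (int t = 1; t < end; t++) {
                level = alpha * s[t] + (1 - alpha) * level;
            }
            return level;
        };
    }

    /** Unweighted moving average over the last `period` observations. */
    static Model movingAverage(int period) {
        return (s, end) -> {
            int start = Math.max(0, end - period);
            double sum = 0;
            for (int t = start; t < end; t++) {
                sum += s[t];
            }
            return sum / (end - start);
        };
    }

    /** Weighted moving average; weights[0] applies to the most recent observation. */
    static Model weightedAverage(double[] weights) {
        return (s, end) -> {
            double sum = 0;
            double used = 0;
            for (int k = 0; k < weights.length && end - 1 - k >= 0; k++) {
                sum += weights[k] * s[end - 1 - k];
                used += weights[k];
            }
            return sum / used;   // renormalise when history is shorter than the weights
        };
    }

    /** Mean squared one-step-ahead error, skipping the first `warmup` points. */
    static double mse(Model m, double[] s, int warmup) {
        if (warmup < 1 || warmup >= s.length) {
            throw new IllegalArgumentException("warmup must be in [1, series length)");
        }
        double total = 0;
        for (int end = warmup; end < s.length; end++) {
            double err = m.predict(s, end) - s[end];
            total += err * err;
        }
        return total / (s.length - warmup);
    }

    /** Scores every model on the same data and returns the name with the lowest error. */
    static String bestModel(Map<String, Model> models, double[] s, int warmup) {
        String best = null;
        double bestErr = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Model> e : models.entrySet()) {
            double err = mse(e.getValue(), s, warmup);
            if (err < bestErr) {
                bestErr = err;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Model> models = new LinkedHashMap<>();
        models.put("SES", ses(0.9));
        models.put("MAM", movingAverage(10));
        models.put("WMA", weightedAverage(new double[] {0.5, 0.3, 0.2}));
        double[] series = {10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21};
        System.out.println(bestModel(models, series, 3));
    }
}
```

Ex-post selection of this kind says which model would have worked on past data; it does not by itself guarantee the same ranking going forward.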

Adaptive Learning via Rolling Windows

// From CpForecasting.java, lines 274-305
if (ctr > (periodstouse + 5)) {
    DataPoint rdp = (DataPoint) observedData.toArray()[0];
    observedData.remove(rdp);  // Remove oldest observation

    // Re-index all points to maintain temporal consistency
    ctr = 0;
    Iterator it = observedData.iterator();
    while (it.hasNext()) {
        ctr = ctr + 1;
        Observation dpf = (Observation) it.next();
        dpf.setIndependentValue("t", ctr);
    }
}

This approach:

  • Automatically discards stale data
  • Maintains fixed-size observation window
  • Adapts to changing market conditions (concept drift)
  • Predates modern online learning frameworks by years
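
For comparison, the same fixed-size window can be expressed with a double-ended queue, which evicts the oldest value in constant time and makes the explicit re-indexing step unnecessary because position in the queue implies the time index. This is a sketch in modern Java, not the archive's code; the class name and the capacity parameter are hypothetical, and it stores raw doubles rather than OpenForecast observations.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: fixed-capacity rolling window over the most recent observations. */
final class RollingWindowSketch {
    private final int capacity;
    private final Deque<Double> values = new ArrayDeque<>();

    RollingWindowSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Appends the newest observation, evicting the oldest once the window is full. */
    void add(double value) {
        if (values.size() == capacity) {
            values.removeFirst();
        }
        values.addLast(value);
    }

    /** Snapshot in time order; element i corresponds to t = i + 1. */
    double[] toArray() {
        double[] out = new double[values.size()];
        int i = 0;
        for (double v : values) {
            out[i++] = v;
        }
        return out;
    }
}
```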

High-Granularity Data Processing

// Millisecond-precision timestamps
observations.add(new Millisecond(rs.getTimestamp("entrydate")),
                 rs.getFloat("value_value"));

  • Granularity: 15-second intervals
  • Precision: Millisecond timestamps
  • Volume: Thousands of observations per symbol per day
  • Context: Appropriate for 2006 retail trading (vs microsecond HFT today)

Automated Professional Reporting

// CpCharting.java generates multi-page PDFs with:
- 4 charts per page (optimized layout)
- 100 pages per file (manageable size)
- Custom pagination with headers/footers
- Bookmark navigation by symbol
- Min/Max price markers with color coding
- Automated filename generation
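
The layout rules above (4 charts per page, 100 pages per file, numbered file names) amount to simple integer arithmetic on a running chart counter. The sketch below shows one way to express that mapping; it assumes each file is filled completely before the next one starts, which may differ from the exact boundary handling in CpCharting.java. The class and method names are hypothetical.

```java
/** Hypothetical sketch: map a running chart counter to file, page, and slot. */
final class ChartPaginationSketch {
    static final int CHARTS_PER_PAGE = 4;
    static final int PAGES_PER_FILE = 100;

    /** File name for a zero-based chart index, given a prefix such as "cpcharts". */
    static String fileFor(String prefix, int chartIndex) {
        int fileNumber = chartIndex / (CHARTS_PER_PAGE * PAGES_PER_FILE) + 1;
        return prefix + fileNumber + ".pdf";
    }

    /** One-based page number within its file. */
    static int pageFor(int chartIndex) {
        return (chartIndex / CHARTS_PER_PAGE) % PAGES_PER_FILE + 1;
    }

    /** Zero-based slot (0 to 3) on the page. */
    static int slotFor(int chartIndex) {
        return chartIndex % CHARTS_PER_PAGE;
    }
}
```

Under these assumptions each file holds 400 charts, so the 401st chart (index 400) opens the second file.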

📚 Documentation

This repository includes comprehensive documentation:

Document Purpose Audience
README.md (this file) Overview and getting started Everyone
PROJECT_OVERVIEW.md Deep technical analysis (10,000+ words) Technical deep-dive
DATABASE_SCHEMA.md Complete database design Database/backend devs
CONFIGURATION.md XML configuration format guide Configuration management
SECURITY.md Security considerations Security-conscious users
CONTRIBUTING.md Contribution guidelines Contributors
LICENSE MIT License + disclaimers Legal compliance

🔧 Building & Running

Prerequisites

# Required software
- Java 1.4+ JDK (tested with 1.4/1.5)
- Microsoft SQL Server 2000/2005
- Windows OS (hardcoded paths, though fixable)

# Required libraries (obtain separately and place in javajars/)
- OpenForecast 0.4.0
- JFreeChart 1.0.1
- jcommon 1.0.5
- iText 1.4.3
- Microsoft JDBC drivers (sqljdbc.jar, mssqlserver.jar, msbase.jar, msutil.jar)

Database Setup

-- Create database
CREATE DATABASE cherrypicker;

-- Create tables (see DATABASE_SCHEMA.md for complete DDL)
-- Key tables: cp_processed_results, cp_processed_results_st, cp_chartvalues

-- Import sample data
-- (Historical data not included in repository)

Compilation

cd javajars

# Compile forecasting engine
javac -classpath "OpenForecast-0.4.0.jar:sqljdbc.jar:mssqlserver.jar:msbase.jar:msutil.jar:." \
  CpForecasting.java

# Compile charting engine
javac -classpath "itext-1.4.3.jar:jfreechart-1.0.1.jar:jcommon-1.0.5.jar:sqljdbc.jar:servlet.jar:." \
  CpCharting.java

Note: Use semicolons (;) instead of colons (:) as classpath separator on Windows.

Running

# Set database credentials via environment variables (recommended)
export DB_URL="jdbc:microsoft:sqlserver://localhost:1433;databasename=cherrypicker"
export DB_USER="your_username"
export DB_PASSWORD="your_password"
export OUTPUT_DIR="/path/to/output/"

# Run forecasting for date range
java -classpath "OpenForecast-0.4.0.jar:sqljdbc.jar:mssqlserver.jar:msbase.jar:msutil.jar:shiftone-jrat.jar:." \
  CpForecasting "10/25/05" "10/25/05"

# Generate PDF charts
java -classpath "itext-1.4.3.jar:jfreechart-1.0.1.jar:jcommon-1.0.5.jar:sqljdbc.jar:servlet.jar:." \
  CpCharting cpcharts

Output:

  • Forecasts written to cp_processed_results_st table
  • PDFs generated: cpcharts1.pdf, cpcharts2.pdf, etc.
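
This README does not show how the programs read the environment variables set above. One possible approach, sketched here in modern Java, is to collect them into a small settings object that fails fast when a required value is missing. The class name, the choice of which variables are mandatory, and the "." default for OUTPUT_DIR are assumptions for illustration only.

```java
import java.util.Map;

/** Hypothetical sketch: load connection settings from environment variables. */
final class DbSettingsSketch {
    final String url;
    final String user;
    final String password;
    final String outputDir;

    private DbSettingsSketch(String url, String user, String password, String outputDir) {
        this.url = url;
        this.user = user;
        this.password = password;
        this.outputDir = outputDir;
    }

    /** Builds settings from an environment map such as System.getenv(). */
    static DbSettingsSketch fromEnv(Map<String, String> env) {
        return new DbSettingsSketch(
                require(env, "DB_URL"),
                require(env, "DB_USER"),
                require(env, "DB_PASSWORD"),
                env.getOrDefault("OUTPUT_DIR", "."));   // default is an assumption
    }

    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        DbSettingsSketch settings = fromEnv(System.getenv());
        // The password is deliberately never printed.
        System.out.println("Connecting as " + settings.user + " to " + settings.url);
    }
}
```

Passing the map in as a parameter keeps the loader testable without touching the real process environment.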

📊 Technology Stack

Core Technologies

Language:   Java 1.4/1.5 (2002-2004 era)
Database:   Microsoft SQL Server 2000/2005
Platform:   Windows (hardcoded C:\ paths)
Build:      Manual javac (pre-Maven/Gradle)
VCS:        File-based backups (pre-Git ubiquity)

Dependencies

| Library | Version | Purpose | License |
|---|---|---|---|
| OpenForecast | 0.4.0 | Statistical forecasting models | LGPL |
| JFreeChart | 1.0.1 | Chart generation | LGPL |
| jcommon | 1.0.5 | JFreeChart dependency | LGPL |
| iText | 1.4.3 | PDF generation | MPL/LGPL |
| MS JDBC Driver | 1.0 | Database connectivity | Proprietary |

Note: These libraries are from 2005-2006 and have known security vulnerabilities. Dependencies are not included in this repository due to licensing. Users must obtain them independently.


🕰️ Historical Context

Market Environment (2006-2007)

Understanding the era this was built in:

Trading Landscape:

  • 📈 Pre-financial crisis bull market
  • 💰 Commission costs: $7-10/trade (vs $0 today)
  • 🤖 Algorithmic trading: Primarily institutional
  • 📊 Real-time data: Just becoming accessible to retail
  • 🎯 Market efficiency: More exploitable patterns than today

Technology Landscape:

  • ☕ Java 5 was cutting-edge (released 2004)
  • 🗄️ SQL Server 2005 was latest version
  • 💻 Most developers used Eclipse or NetBeans
  • 📦 Maven was new (2004), Gradle didn't exist
  • 🌐 Stack Overflow didn't exist until 2008
  • 🔧 Git was brand new (2005), SVN was standard

Development Challenges:

  • Finding documentation for statistical forecasting
  • Managing classpath dependencies manually
  • Debugging JDBC connection issues without good tooling
  • Few online communities for troubleshooting (Stack Overflow did not exist yet)
  • Limited examples of retail algo trading systems

What Was Novel in 2006

Ensemble methods in retail trading - Most retail traders used single indicators

Adaptive learning - Rolling windows weren't standard practice

Automated end-to-end pipeline - Most analysis was manual

High-frequency data processing - 15-second granularity was ambitious

Systematic approach - Data-driven decisions vs discretionary trading


🎯 What This Demonstrates

Technical Skills

Software Engineering:

  • ✅ Object-oriented design in Java
  • ✅ Complex library integration (12 dependencies)
  • ✅ Database design for time-series data
  • ✅ Multi-threaded data processing concepts
  • ✅ File I/O and resource management

Quantitative Analysis:

  • ✅ Statistical forecasting methodology
  • ✅ Time-series analysis techniques
  • ✅ Model parameter optimization
  • ✅ Signal generation from predictions
  • ✅ Understanding of financial market dynamics

Data Engineering:

  • ✅ Schema design for high-frequency data
  • ✅ ETL pipeline for market data
  • ✅ Query optimization for large datasets
  • ✅ Batch processing architecture

Domain Expertise:

  • ✅ Market microstructure understanding
  • ✅ Intraday price dynamics
  • ✅ Statistical forecasting in finance
  • ✅ Trading signal generation

Soft Skills

Self-Directed Learning:

  • Taught myself statistical forecasting while building the system
  • Navigated complex library integration without Stack Overflow
  • Bridged multiple domains independently

End-to-End Ownership:

  • Requirements gathering (implicit)
  • System architecture and design
  • Implementation and testing
  • Deployment and operation

Problem-Solving:

  • Selected appropriate models for stock price data
  • Handled timestamp precision across layers
  • Optimized database queries for performance
  • Debugged complex issues in production

⚠️ Known Limitations

Being transparent about what could be improved:

Security Issues ❌

❌ Hardcoded database credentials (now removed for archive)
❌ SQL injection vulnerabilities in dynamic queries
❌ No input validation
❌ Outdated libraries with known CVEs
❌ No encryption for sensitive data
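
To make the input-validation gap concrete, the sketch below checks the two kinds of external input this README mentions: date arguments in the "10/25/05" form and stock symbols. It is a modern-Java illustration, not code from the archive. The class name and the symbol pattern (one to ten characters, uppercase letters and dots) are assumptions. Validated values should still be bound through prepared-statement placeholders rather than concatenated into SQL.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;
import java.util.regex.Pattern;

/** Hypothetical sketch: validate command-line dates and symbols before they reach SQL. */
final class InputValidationSketch {
    // Two-digit years resolve against base year 2000; STRICT rejects dates like 02/30/05.
    private static final DateTimeFormatter TRADE_DATE =
            DateTimeFormatter.ofPattern("MM/dd/uu").withResolverStyle(ResolverStyle.STRICT);

    // Assumed symbol shape: starts with a letter, then up to nine letters or dots.
    private static final Pattern SYMBOL = Pattern.compile("[A-Z][A-Z.]{0,9}");

    /** Parses a date such as "10/25/05"; throws DateTimeParseException on anything else. */
    static LocalDate parseTradeDate(String text) {
        return LocalDate.parse(text, TRADE_DATE);
    }

    /** True only when the whole string matches the assumed symbol pattern. */
    static boolean isValidSymbol(String symbol) {
        return symbol != null && SYMBOL.matcher(symbol).matches();
    }
}
```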

Architecture Limitations ❌

❌ No formal backtesting framework
❌ Missing risk management (stop-loss, position sizing)
❌ Transaction costs not modeled
❌ Single feature (price only, no volume/volatility)
❌ No systematic model selection logic
❌ Hardcoded parameters throughout

Code Quality Issues ❌

❌ Massive methods (490+ lines)
❌ Limited error handling
❌ Commented-out code left in place
❌ No unit tests
❌ Magic numbers
❌ Windows-specific paths

What I'd Do Differently Today

If rebuilding in 2025:

# Modern Python implementation

Technology Stack:
- Language: Python 3.12+
- Data: pandas, numpy
- ML: scikit-learn, TensorFlow/PyTorch
- Backtesting: Backtrader, Zipline
- Database: PostgreSQL or TimescaleDB
- Execution: Alpaca/IB API
- Infrastructure: Docker, Kubernetes

Architecture:
- Microservices for each component
- Event-driven with message queues
- Cloud-native (AWS Lambda, etc.)
- Real-time streaming (Kafka)
- Proper CI/CD pipeline

Improvements:
✅ Feature engineering (volume, volatility, sentiment)
✅ Deep learning models (LSTM, Transformers)
✅ Comprehensive backtesting with walk-forward
✅ Risk management (Kelly criterion, stop-losses)
✅ Transaction cost modeling
✅ Model selection via cross-validation
✅ A/B testing framework
✅ Monitoring and alerting
✅ Unit tests with >80% coverage

📈 Performance Considerations

Optimization Techniques Used

// 1. Prepared Statements (bound parameters guard this insert against SQL injection and let the driver reuse the statement)
insertForecast = connection.prepareStatement(
    "insert into cp_processed_results_st (...) values (?, ?, ?, ?, ?, ?, ?, 50, ?)"
);

// 2. Batch Processing (single query retrieves all data)
String searchUserQuery = "select ... order by symbol, trandate, entrydate";
ResultSet rs = stmt.executeQuery(searchUserQuery);

// 3. In-Memory Processing (no disk I/O during forecasting)
DataSet observedData = new DataSet();
model.init(observedData);

// 4. File Splitting (manageable PDF sizes)
if (page == 100) {
    document.close();
    // Create new file
}

Scalability Characteristics

| Dimension | Capability | Bottleneck |
|---|---|---|
| Symbols | Sequential processing | Single-threaded |
| Date Range | Configurable | Memory for large ranges |
| Chart Generation | Thousands per run | Disk I/O for PDFs |
| Forecasting | Real-time capable | Model training time |

Modern Improvement: Parallelize symbol processing, use distributed computing for backtesting.
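
As a sketch of what parallel symbol processing could look like in modern Java, the helper below submits one task per symbol to a fixed thread pool and collects the results in input order. It is not part of the archived code. The class name, the `forecaster` function, and the thread count are hypothetical, and a real version would likely need a separate JDBC connection (or a connection pool) per worker, which is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: forecast each symbol on its own worker thread. */
final class ParallelSymbolsSketch {

    /** Applies `forecaster` once per symbol and returns results keyed by symbol, in input order. */
    static <R> Map<String, R> forecastAll(List<String> symbols,
                                          Function<String, R> forecaster,
                                          int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String symbol : symbols) {
                Callable<R> task = () -> forecaster.apply(symbol);
                futures.add(pool.submit(task));
            }
            Map<String, R> results = new LinkedHashMap<>();
            for (int i = 0; i < symbols.size(); i++) {
                results.put(symbols.get(i), futures.get(i).get());   // waits for each task
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while forecasting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("forecast task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because symbols are independent of one another in this pipeline, per-symbol tasks are a natural unit of parallelism; the per-symbol rolling window stays confined to a single thread.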


🎓 Educational Value

What You Can Learn

1. Classical Time-Series Forecasting

  • Exponential smoothing theory and practice
  • Moving average techniques
  • Regression-based forecasting
  • Rolling window adaptation

2. Financial Data Processing

  • Handling tick-by-tick data
  • Time-series database design
  • Signal generation from predictions
  • Backtesting concepts (implicit)

3. Java Software Engineering (2006 era)

  • JDBC database connectivity
  • Third-party library integration
  • PDF generation techniques
  • Chart rendering pipelines

4. Evolution of Trading Technology

  • Compare 2006 approaches to 2025 methods
  • See what concepts remain relevant
  • Understand why certain practices were retired
  • Appreciate modern tooling improvements

Academic Use

This codebase could support:

  • Finance courses: Example of quantitative trading system
  • Statistics courses: Applied time-series forecasting
  • Software engineering courses: Legacy code analysis
  • History of computing: Evolution of retail FinTech

🤝 Contributing

Status: This is a historical archive and is not actively maintained.

  • ❌ No bug fixes planned
  • ❌ No feature additions
  • ❌ No support provided
  • ✅ Documentation improvements welcome
  • ✅ Historical context additions welcome

If you find this interesting:

  • ⭐ Star the repository
  • 🔍 Study the code
  • 💬 Open discussions for historical questions
  • 📝 Cite in academic work if useful

See CONTRIBUTING.md for more details.


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Important: This license includes additional disclaimers specific to this historical archive:

  • ⚠️ Not suitable for production use
  • ⚠️ Contains known security vulnerabilities
  • ⚠️ No warranties or support
  • ⚠️ Educational purposes only

Third-party libraries (OpenForecast, JFreeChart, iText, JDBC drivers) have their own licenses and are not included.


🙏 Acknowledgments

Built independently using:

  • OpenForecast library for statistical forecasting models
  • JFreeChart for professional chart generation
  • iText for PDF document creation
  • Microsoft SQL Server for data storage
  • Microsoft JDBC Driver for database connectivity

Special thanks to the open-source community of 2006 for making these libraries available.


📬 Contact & Links

Repository: https://github.com/kbadinger/cherry-picker

Author: Kevin Badinger

Created: 2006-2007

Archived: 2025

Purpose: Historical documentation and educational reference

Related Documentation: see the files listed under 📚 Documentation above.


🏆 Project Statistics

📊 Codebase Metrics:
   Lines of Code:        ~1,150
   Java Files:           2 primary classes
   Configuration Files:  150+ XML files
   Dependencies:         12 JAR files (~13MB)
   Database Tables:      3 primary tables

🎯 Functionality:
   Forecasting Models:   5 distinct algorithms
   Data Granularity:     15-second intervals
   Chart Layouts:        4 per page
   PDF Batch Size:       100 pages per file
   Date Range:           Configurable (tested 2005-2007)

⏱️ Development:
   Timeline:             2006-2007 (exact dates unknown)
   Team Size:            1 (solo developer)
   Architecture:         Designed from scratch
   Testing:              Manual validation via PDF reports

🌟 Star History

If you find this historical archive interesting or educational, please consider starring the repository!


"Built when algorithmic trading was the domain of hedge funds, not hobbyists."


Made with ☕ in 2006-2007 | Archived with 📚 in 2025
