🍒 Cherry Picker - Historical Quantitative Trading System

A snapshot of retail algorithmic trading before it became mainstream

Overview • Architecture • Features • Documentation • Building • Historical Context

⚠️ Important Notice

This is a historical archive from 2006-2007, preserved for educational purposes.

This codebase contains outdated libraries with known security vulnerabilities and practices that were acceptable in 2006 but are now considered insecure. It is NOT suitable for production use without significant modernization.

Why archive this? To document the evolution of retail quantitative trading and demonstrate algorithmic approaches that predated their mainstream adoption.

📖 Overview

Cherry Picker is a Java-based automated stock trading analysis system built independently in 2006-2007. It implements ensemble forecasting using multiple statistical models, processes intraday market data at 15-second intervals, and generates professional PDF reports with time-series visualizations.

What Makes This Historically Interesting?

In 2006-2007, this represented cutting-edge retail trading technology:

Concept	Status in 2006	Industry Adoption
✅ Ensemble forecasting	Novel for retail traders	Mainstream by 2010s
✅ Adaptive rolling windows	Rare in personal systems	Standard by 2012+
✅ Automated trading systems	Primarily institutional	Retail adoption 2015+
✅ High-frequency data analysis	Innovative at 15-sec granularity	Common by 2018+
✅ Systematic approach	Growing among quants	Ubiquitous by 2020+

The achievement: Building a complete end-to-end quantitative pipeline (data → analysis → forecasting → visualization) as a solo developer, before algorithmic trading platforms became accessible to retail traders.

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Market Data Source                           │
│                  (SQL Server Database)                          │
│          15-second interval tick data with metadata             │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              CpForecasting.java - Ensemble Engine               │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────┐  │
│  │ Simple Exp.      │  │ Double Exp.      │  │ Moving Avg   │  │
│  │ Smoothing (SES)  │  │ Smoothing (DES)  │  │ Models (MAM) │  │
│  └──────────────────┘  └──────────────────┘  └──────────────┘  │
│  ┌──────────────────┐  ┌──────────────────┐                    │
│  │ Weighted Moving  │  │ Polynomial       │                    │
│  │ Averages (WMA)   │  │ Regression (BRR) │                    │
│  └──────────────────┘  └──────────────────┘                    │
│                                                                 │
│  • Rolling window adaptation (10-200 observations)             │
│  • Multi-horizon forecasting (5 periods ahead)                 │
│  • Custom weighting schemes                                    │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│               Predictions Database Table                        │
│          (cp_processed_results_st)                             │
│     Stores all model outputs for evaluation                     │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│          CpCharting.java - Report Generator                     │
├─────────────────────────────────────────────────────────────────┤
│  • JFreeChart: Time-series visualization                        │
│  • iText PDF: Multi-page document generation                   │
│  • 4 charts per page, 100 pages per file                        │
│  • Min/Max markers for entry/exit signals                       │
│  • Bookmark navigation by symbol                                │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────────┐
│              Professional Trading Reports                       │
│                  (Multi-file PDFs)                             │
│        Ready for analysis and trading decisions                 │
└─────────────────────────────────────────────────────────────────┘

Component Details

1️⃣ CpForecasting.java - The Forecasting Engine

Purpose: Generate predictions using multiple statistical models
Input: Historical price data from cp_processed_results table
Processing:
- Maintains rolling windows of observations (configurable per model)
- Removes oldest observations as new data arrives (concept drift handling)
- Runs every model against the same dataset, one after another (processing is single-threaded)
- Forecasts H=5 periods ahead (~75 seconds)
Output: Predictions stored in cp_processed_results_st
Lines of Code: ~545

2️⃣ CpCharting.java - The Visualization Engine

Purpose: Generate PDF reports with time-series charts
Input: Chart data from cp_chartvalues table
Processing:
- Queries all symbols and dates in database
- Generates 4 charts per page
- Adds markers for min/max price points
- Creates bookmarks for navigation
- Splits into multiple files at 100 pages
Output: Professional PDF reports (cpcharts1.pdf, cpcharts2.pdf, ...)
Lines of Code: ~605
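
To make the min/max marker step concrete, here is a minimal sketch of how the extreme points of a price series could be located before being drawn on a chart. The class and method names are hypothetical and the tie-breaking rule is an assumption; the archived CpCharting.java may do this differently.

```java
/** Hypothetical helper that locates the extreme points a chart would mark. */
final class MinMaxMarkers {

    /** Returns {indexOfMin, indexOfMax}; the first occurrence wins on ties (an assumed rule). */
    static int[] find(double[] prices) {
        if (prices.length == 0) {
            throw new IllegalArgumentException("prices must not be empty");
        }
        int min = 0;
        int max = 0;
        for (int i = 1; i < prices.length; i++) {
            if (prices[i] < prices[min]) min = i;   // strictly lower price becomes the new minimum
            if (prices[i] > prices[max]) max = i;   // strictly higher price becomes the new maximum
        }
        return new int[] { min, max };
    }
}
```

The returned indices can then be mapped back to timestamps when placing the markers.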

3️⃣ Configuration System

Location: cpconfig/ directory
Format: XML files using SIXX serialization
Contents: 150+ configuration files containing:
- Symbol-specific parameters
- Trading time windows
- Price thresholds
- Skip flags for problematic symbols
Purpose: Fine-tune system behavior per stock

✨ Key Features

Ensemble Forecasting Methodology

Rather than relying on a single model, Cherry Picker runs five different algorithms against the same data:

// Model implementations from CpForecasting.java

1. Simple Exponential Smoothing (SES)
   → Best for: Mean-reverting price action
   → α parameter auto-optimized via getBestFitModel()

2. Double Exponential Smoothing (DES)
   → Best for: Trending markets with momentum
   → Captures both level and rate of change

3. Moving Average Models (MAM)
   → Best for: Noise reduction in volatile data
   → Configurable periods: 10, 25, 50, 100, 150

4. Weighted Moving Averages (WMA)
   → Best for: Emphasizing recent observations
   → Custom weight schemes: [0.1754, 0.1754, 0.1754, 0.0877, ...]

5. Polynomial Regression (BRR)
   → Best for: Nonlinear pattern detection
   → 5th degree polynomial fitting
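
As a rough illustration of the first model in the list, the sketch below implements simple exponential smoothing in plain Java and picks α by a grid search over one-step-ahead squared error. All names are hypothetical, and the grid search is only an assumption about how a best-fit α could be chosen; it is not OpenForecast's getBestFitModel() logic.

```java
/** Hypothetical sketch of simple exponential smoothing (SES); not the OpenForecast code. */
final class SesSketch {

    /** Returns the smoothed level after processing every observation with the given alpha. */
    static double finalLevel(double[] series, double alpha) {
        double level = series[0];                 // initialise the level with the first observation
        for (int i = 1; i < series.length; i++) {
            level = alpha * series[i] + (1 - alpha) * level;
        }
        return level;                             // SES forecasts this level for all future periods
    }

    /** Sum of squared one-step-ahead errors for a given alpha. */
    static double sse(double[] series, double alpha) {
        double level = series[0];
        double sum = 0.0;
        for (int i = 1; i < series.length; i++) {
            double error = series[i] - level;     // the forecast for period i is the previous level
            sum += error * error;
            level = alpha * series[i] + (1 - alpha) * level;
        }
        return sum;
    }

    /** Grid search over alpha in {0.05, 0.10, ..., 0.95}; an assumed selection rule. */
    static double bestAlpha(double[] series) {
        double best = 0.05;
        double bestSse = Double.MAX_VALUE;
        for (int step = 1; step <= 19; step++) {
            double alpha = step * 0.05;
            double candidate = sse(series, alpha);
            if (candidate < bestSse) {            // keep the first alpha with the lowest error
                bestSse = candidate;
                best = alpha;
            }
        }
        return best;
    }
}
```

On a steadily rising series the search settles on the largest α in the grid, because a higher α lets the level track the trend more closely. That is also why the double exponential model is listed for trending data.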

Why ensemble? Different models excel in different market conditions. Running all models provides:

Robustness against model-specific failures
Diversity of perspectives on price movements
Ability to select best performer ex-post
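
The ex-post selection idea can be sketched as follows: run several simple forecasters over the same series, score each by its one-step-ahead error, and keep the best. Everything here (the Forecaster interface, the scoring rule, the weight ordering) is a hypothetical illustration and not the structure of CpForecasting.java.

```java
import java.util.Map;

/** Hypothetical sketch: run several simple forecasters and pick the best one after the fact. */
final class EnsembleSketch {

    /** One-step-ahead forecaster over the first `length` values of `history`. */
    interface Forecaster {
        double forecast(double[] history, int length);
    }

    /** Simple moving average of the last `period` values (assumes length >= 1). */
    static Forecaster movingAverage(final int period) {
        return (h, n) -> {
            int start = Math.max(0, n - period);
            double sum = 0.0;
            for (int i = start; i < n; i++) sum += h[i];
            return sum / (n - start);
        };
    }

    /** Weighted moving average; weights[0] applies to the most recent value (assumed ordering). */
    static Forecaster weightedAverage(final double[] weights) {
        return (h, n) -> {
            double sum = 0.0;
            double total = 0.0;
            for (int k = 0; k < weights.length && k < n; k++) {
                sum += weights[k] * h[n - 1 - k];
                total += weights[k];
            }
            return sum / total;   // renormalise when fewer points than weights are available
        };
    }

    /** Mean squared one-step-ahead error; assumes 1 <= warmup < series.length. */
    static double mse(Forecaster f, double[] series, int warmup) {
        double sum = 0.0;
        int count = 0;
        for (int t = warmup; t < series.length; t++) {
            double error = series[t] - f.forecast(series, t);   // forecast uses values before t only
            sum += error * error;
            count++;
        }
        return sum / count;
    }

    /** Returns the name of the model with the lowest error (ex-post selection). */
    static String bestModel(Map<String, Forecaster> models, double[] series, int warmup) {
        String best = null;
        double bestMse = Double.MAX_VALUE;
        for (Map.Entry<String, Forecaster> e : models.entrySet()) {
            double score = mse(e.getValue(), series, warmup);
            if (score < bestMse) {
                bestMse = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Selecting on past error like this is only meaningful after the fact; choosing a model this way and then trading on it would need out-of-sample validation, which the Known Limitations section notes is missing.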

Adaptive Learning via Rolling Windows

// From CpForecasting.java, lines 274-305
if (ctr > (periodstouse + 5)) {
    DataPoint rdp = (DataPoint) observedData.toArray()[0];
    observedData.remove(rdp);  // Remove oldest observation

    // Re-index all points to maintain temporal consistency
    ctr = 0;
    Iterator it = observedData.iterator();
    while (it.hasNext()) {
        ctr = ctr + 1;
        Observation dpf = (Observation) it.next();
        dpf.setIndependentValue("t", ctr);
    }
}

This approach:

Automatically discards stale data
Maintains fixed-size observation window
Adapts to changing market conditions (concept drift)
Predates modern online learning frameworks by years
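
For comparison, the same remove-oldest-and-re-index idea can be written with a double-ended queue. This is a hypothetical modern sketch in plain Java, not the archived code, and it stores bare prices rather than OpenForecast observations.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed-size rolling window; a sketch of the idea, not the original code. */
final class RollingWindow {
    private final int capacity;
    private final Deque<Double> values = new ArrayDeque<Double>();

    RollingWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a new observation and evicts the oldest one once the window is full. */
    void add(double value) {
        if (values.size() == capacity) {
            values.removeFirst();
        }
        values.addLast(value);
    }

    /** Returns the window as (t, value) pairs with t re-indexed from 1, oldest first. */
    double[][] indexed() {
        double[][] out = new double[values.size()][2];
        int t = 0;
        for (double v : values) {
            out[t][0] = t + 1;   // time index restarts at 1 for the oldest retained point
            out[t][1] = v;
            t++;
        }
        return out;
    }

    int size() {
        return values.size();
    }
}
```

Evicting from the head of an ArrayDeque avoids copying the whole collection to an array on every removal, which is what the toArray()[0] call in the excerpt does.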

High-Granularity Data Processing

// Millisecond-precision timestamps
observations.add(new Millisecond(rs.getTimestamp("entrydate")),
                 rs.getFloat("value_value"));

Granularity: 15-second intervals
Precision: Millisecond timestamps
Volume: Thousands of observations per symbol per day
Context: Appropriate for 2006 retail trading (vs microsecond HFT today)
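
The interval arithmetic behind these figures is simple enough to show directly. The helper below is a hypothetical sketch: it snaps a millisecond timestamp to the start of its 15-second bucket and converts a forecast horizon into seconds (5 periods × 15 s = 75 s). Whether the original pipeline bucketed timestamps this way is not stated in this README.

```java
/** Hypothetical helpers for 15-second intervals; names are illustrative only. */
final class IntervalMath {
    static final long INTERVAL_MILLIS = 15_000L;

    /** Start of the 15-second bucket containing the given epoch-millisecond timestamp. */
    static long bucketStart(long epochMillis) {
        // floorDiv keeps the rounding direction consistent for negative timestamps too
        return Math.floorDiv(epochMillis, INTERVAL_MILLIS) * INTERVAL_MILLIS;
    }

    /** Wall-clock span covered by a forecast horizon of `periods` intervals, in seconds. */
    static long horizonSeconds(int periods) {
        return periods * INTERVAL_MILLIS / 1000L;
    }
}
```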

Automated Professional Reporting

// CpCharting.java generates multi-page PDFs with:
- 4 charts per page (optimized layout)
- 100 pages per file (manageable size)
- Custom pagination with headers/footers
- Bookmark navigation by symbol
- Min/Max price markers with color coding
- Automated filename generation
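
The pagination rules reduce to integer arithmetic. The following is a minimal sketch with hypothetical names, assuming charts are numbered from zero and files from one, as the cpcharts1.pdf naming suggests.

```java
/** Hypothetical pagination helper mirroring the 4-charts-per-page, 100-pages-per-file layout. */
final class ChartPagination {
    static final int CHARTS_PER_PAGE = 4;
    static final int PAGES_PER_FILE = 100;

    /** 1-based file number for a 0-based chart index. */
    static int fileNumber(int chartIndex) {
        return chartIndex / (CHARTS_PER_PAGE * PAGES_PER_FILE) + 1;
    }

    /** 1-based page number within its file for a 0-based chart index. */
    static int pageInFile(int chartIndex) {
        return (chartIndex / CHARTS_PER_PAGE) % PAGES_PER_FILE + 1;
    }

    /** Output filename such as cpcharts1.pdf for a given prefix and chart index. */
    static String fileName(String prefix, int chartIndex) {
        return prefix + fileNumber(chartIndex) + ".pdf";
    }
}
```

With these constants each file holds 400 charts, so the 401st chart (index 400) opens page 1 of the second file.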

📚 Documentation

This repository includes comprehensive documentation:

Document	Purpose	Audience
README.md (this file)	Overview and getting started	Everyone
PROJECT_OVERVIEW.md	Deep technical analysis (10,000+ words)	Technical deep-dive
DATABASE_SCHEMA.md	Complete database design	Database/backend devs
CONFIGURATION.md	XML configuration format guide	Configuration management
SECURITY.md	Security considerations	Security-conscious users
CONTRIBUTING.md	Contribution guidelines	Contributors
LICENSE	MIT License + disclaimers	Legal compliance

🔧 Building & Running

Prerequisites

# Required software
- Java 1.4+ JDK (tested with 1.4/1.5)
- Microsoft SQL Server 2000/2005
- Windows OS (hardcoded paths, though fixable)

# Required libraries (expected in javajars/; not bundled, obtain them separately)
- OpenForecast 0.4.0
- JFreeChart 1.0.1
- jcommon 1.0.5
- iText 1.4.3
- Microsoft JDBC drivers (sqljdbc.jar, mssqlserver.jar, msbase.jar, msutil.jar)

Database Setup

-- Create database
CREATE DATABASE cherrypicker;

-- Create tables (see DATABASE_SCHEMA.md for complete DDL)
-- Key tables: cp_processed_results, cp_processed_results_st, cp_chartvalues

-- Import sample data
-- (Historical data not included in repository)

Compilation

cd javajars

# Compile forecasting engine
javac -classpath "OpenForecast-0.4.0.jar:sqljdbc.jar:mssqlserver.jar:msbase.jar:msutil.jar:." \
  CpForecasting.java

# Compile charting engine
javac -classpath "itext-1.4.3.jar:jfreechart-1.0.1.jar:jcommon-1.0.5.jar:sqljdbc.jar:servlet.jar:." \
  CpCharting.java

Note: Use semicolons (;) instead of colons (:) as classpath separator on Windows.

Running

# Set database credentials via environment variables (recommended)
# (Unix shell syntax shown; in Windows cmd use `set NAME=value` instead of `export`)
export DB_URL="jdbc:microsoft:sqlserver://localhost:1433;databasename=cherrypicker"
export DB_USER="your_username"
export DB_PASSWORD="your_password"
export OUTPUT_DIR="/path/to/output/"

# Run forecasting for date range
java -classpath "OpenForecast-0.4.0.jar:sqljdbc.jar:mssqlserver.jar:msbase.jar:msutil.jar:shiftone-jrat.jar:." \
  CpForecasting "10/25/05" "10/25/05"

# Generate PDF charts
java -classpath "itext-1.4.3.jar:jfreechart-1.0.1.jar:jcommon-1.0.5.jar:sqljdbc.jar:servlet.jar:." \
  CpCharting cpcharts

Output:

Forecasts written to cp_processed_results_st table
PDFs generated: cpcharts1.pdf, cpcharts2.pdf, etc.
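
A minimal sketch of how a Java program could pick up the DB_* variables and fail fast when one is missing. The DbConfig class is hypothetical; how the archived classes actually read these settings is not shown in this README.

```java
import java.util.Map;

/** Hypothetical loader for the DB_* environment variables; not part of the original code. */
final class DbConfig {
    final String url;
    final String user;
    final String password;

    private DbConfig(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /** Builds a config from the given environment map, failing fast on missing values. */
    static DbConfig fromEnv(Map<String, String> env) {
        return new DbConfig(
                require(env, "DB_URL"),
                require(env, "DB_USER"),
                require(env, "DB_PASSWORD"));
    }

    /** Returns the value for `key`, or throws if it is absent or blank. */
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + key);
        }
        return value;
    }
}
```

In a real run this would be called as DbConfig.fromEnv(System.getenv()), with the resulting fields passed to java.sql.DriverManager.getConnection(url, user, password). Taking the map as a parameter keeps the loader testable without touching the process environment.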

📊 Technology Stack

Core Technologies

Language:   Java 1.4/1.5 (2002-2004 era)
Database:   Microsoft SQL Server 2000/2005
Platform:   Windows (hardcoded C:\ paths)
Build:      Manual javac (pre-Maven/Gradle)
VCS:        File-based backups (pre-Git ubiquity)

Dependencies

Library	Version	Purpose	License
OpenForecast	0.4.0	Statistical forecasting models	LGPL
JFreeChart	1.0.1	Chart generation	LGPL
jcommon	1.0.5	JFreeChart dependency	LGPL
iText	1.4.3	PDF generation	MPL/LGPL
MS JDBC Driver	1.0	Database connectivity	Proprietary

Note: These libraries are from 2005-2006 and have known security vulnerabilities. Dependencies are not included in this repository due to licensing. Users must obtain them independently.

🕰️ Historical Context

Market Environment (2006-2007)

Understanding the era this was built in:

Trading Landscape:

📈 Pre-financial crisis bull market
💰 Commission costs: $7-10/trade (vs $0 today)
🤖 Algorithmic trading: Primarily institutional
📊 Real-time data: Just becoming accessible to retail
🎯 Market efficiency: More exploitable patterns than today

Technology Landscape:

☕ Java 5 was still recent (released 2004; Java 6 followed in December 2006)
🗄️ SQL Server 2005 was latest version
💻 Most developers used Eclipse or NetBeans
📦 Maven was new (2004), Gradle didn't exist
🌐 Stack Overflow didn't exist until 2008
🔧 Git was brand new (2005), SVN was standard

Development Challenges:

Finding documentation for statistical forecasting
Managing classpath dependencies manually
Debugging JDBC connection issues without good tooling
Few online communities for troubleshooting
Limited examples of retail algo trading systems

What Was Novel in 2006

✅ Ensemble methods in retail trading - Most retail traders used single indicators

✅ Adaptive learning - Rolling windows weren't standard practice

✅ Automated end-to-end pipeline - Most analysis was manual

✅ High-frequency data processing - 15-second granularity was ambitious

✅ Systematic approach - Data-driven decisions vs discretionary trading

🎯 What This Demonstrates

Technical Skills

Software Engineering:

✅ Object-oriented design in Java
✅ Complex library integration (12 dependencies)
✅ Database design for time-series data
✅ Multi-threaded data processing concepts
✅ File I/O and resource management

Quantitative Analysis:

✅ Statistical forecasting methodology
✅ Time-series analysis techniques
✅ Model parameter optimization
✅ Signal generation from predictions
✅ Understanding of financial market dynamics

Data Engineering:

✅ Schema design for high-frequency data
✅ ETL pipeline for market data
✅ Query optimization for large datasets
✅ Batch processing architecture

Domain Expertise:

✅ Market microstructure understanding
✅ Intraday price dynamics
✅ Statistical forecasting in finance
✅ Trading signal generation

Soft Skills

Self-Directed Learning:

Taught myself statistical forecasting while building system
Navigated complex library integration without Stack Overflow
Bridged multiple domains independently

End-to-End Ownership:

Requirements gathering (implicit)
System architecture and design
Implementation and testing
Deployment and operation

Problem-Solving:

Selected appropriate models for stock price data
Handled timestamp precision across layers
Optimized database queries for performance
Debugged complex issues in production

⚠️ Known Limitations

Being transparent about what could be improved:

Security Issues ❌

❌ Hardcoded database credentials (now removed for archive)
❌ SQL injection vulnerabilities in dynamic queries
❌ No input validation
❌ Outdated libraries with known CVEs
❌ No encryption for sensitive data
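
As one example of what the missing input validation could look like, the sketch below strictly parses the MM/dd/yy date arguments that CpForecasting takes on the command line. It is a hypothetical modern (java.time) illustration, not a patch to the archived code. It also does not remove the SQL injection risk on its own, which still calls for bound parameters in every query.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

/** Hypothetical validation of the MM/dd/yy date arguments; not part of the archived code. */
final class DateArgs {
    // 'uu' is the two-digit year field that resolves under STRICT mode without an era
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/uu").withResolverStyle(ResolverStyle.STRICT);

    /** Parses one argument such as "10/25/05", rejecting malformed or impossible dates. */
    static LocalDate parse(String arg) {
        try {
            return LocalDate.parse(arg, FORMAT);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Expected a date like 10/25/05 but got: " + arg, e);
        }
    }

    /** Validates a start/end pair and ensures the range is not reversed. */
    static LocalDate[] parseRange(String start, String end) {
        LocalDate from = parse(start);
        LocalDate to = parse(end);
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("End date precedes start date");
        }
        return new LocalDate[] { from, to };
    }
}
```

Parsing into a LocalDate first means only a typed date value, never the raw argument string, would reach the database layer.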

Architecture Limitations ❌

❌ No formal backtesting framework
❌ Missing risk management (stop-loss, position sizing)
❌ Transaction costs not modeled
❌ Single feature (price only, no volume/volatility)
❌ No systematic model selection logic
❌ Hardcoded parameters throughout

Code Quality Issues ❌

❌ Massive methods (490+ lines)
❌ Limited error handling
❌ Commented-out code left in place
❌ No unit tests
❌ Magic numbers
❌ Windows-specific paths

What I'd Do Differently Today

If rebuilding in 2025:

# Modern Python implementation

Technology Stack:
- Language: Python 3.12+
- Data: pandas, numpy
- ML: scikit-learn, TensorFlow/PyTorch
- Backtesting: Backtrader, Zipline
- Database: PostgreSQL or TimescaleDB
- Execution: Alpaca/IB API
- Infrastructure: Docker, Kubernetes

Architecture:
- Microservices for each component
- Event-driven with message queues
- Cloud-native (AWS Lambda, etc.)
- Real-time streaming (Kafka)
- Proper CI/CD pipeline

Improvements:
✅ Feature engineering (volume, volatility, sentiment)
✅ Deep learning models (LSTM, Transformers)
✅ Comprehensive backtesting with walk-forward
✅ Risk management (Kelly criterion, stop-losses)
✅ Transaction cost modeling
✅ Model selection via cross-validation
✅ A/B testing framework
✅ Monitoring and alerting
✅ Unit tests with >80% coverage

📈 Performance Considerations

Optimization Techniques Used

// 1. Prepared statements (bound parameters avoid SQL injection on this insert path and let the statement be reused)
insertForecast = connection.prepareStatement(
    "insert into cp_processed_results_st (...) values (?, ?, ?, ?, ?, ?, ?, 50, ?)"
);

// 2. Batch Processing (single query retrieves all data)
String searchUserQuery = "select ... order by symbol, trandate, entrydate";
ResultSet rs = stmt.executeQuery(searchUserQuery);

// 3. In-Memory Processing (no disk I/O during forecasting)
DataSet observedData = new DataSet();
model.init(observedData);

// 4. File Splitting (manageable PDF sizes)
if (page == 100) {
    document.close();
    // Create new file
}

Scalability Characteristics

Dimension	Capability	Bottleneck
Symbols	Sequential processing	Single-threaded
Date Range	Configurable	Memory for large ranges
Chart Generation	Thousands per run	Disk I/O for PDFs
Forecasting	Real-time capable	Model training time

Modern Improvement: Parallelize symbol processing, use distributed computing for backtesting.
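
A rough sketch of what per-symbol parallelism could look like with a fixed thread pool. The class, the method names, and the placeholder "forecast" (a plain average) are all hypothetical; the archived code processes symbols sequentially.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of per-symbol parallelism; not how the archived code works. */
final class ParallelSymbols {

    /** Placeholder for per-symbol work; here it just averages the prices. */
    static double forecastFor(double[] prices) {
        double sum = 0.0;
        for (double p : prices) sum += p;
        return sum / prices.length;
    }

    /** Runs forecastFor on each symbol's series using a fixed-size thread pool. */
    static Map<String, Double> forecastAll(Map<String, double[]> seriesBySymbol, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per symbol, remembering the futures in insertion order
            Map<String, Future<Double>> pending = new LinkedHashMap<String, Future<Double>>();
            for (Map.Entry<String, double[]> e : seriesBySymbol.entrySet()) {
                final double[] prices = e.getValue();
                pending.put(e.getKey(), pool.submit(new Callable<Double>() {
                    @Override
                    public Double call() {
                        return forecastFor(prices);
                    }
                }));
            }
            Map<String, Double> results = new LinkedHashMap<String, Double>();
            for (Map.Entry<String, Future<Double>> p : pending.entrySet()) {
                try {
                    results.put(p.getKey(), p.getValue().get());   // blocks until that symbol is done
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for " + p.getKey(), ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException("Forecast failed for " + p.getKey(), ex.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();   // stop accepting work; submitted tasks are allowed to finish
        }
    }
}
```

Because each symbol's series is independent, no shared state needs locking here. In a real port the database connection handling would be the part that needs care.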

🎓 Educational Value

What You Can Learn

1. Classical Time-Series Forecasting

Exponential smoothing theory and practice
Moving average techniques
Regression-based forecasting
Rolling window adaptation

2. Financial Data Processing

Handling tick-by-tick data
Time-series database design
Signal generation from predictions
Backtesting concepts (implicit)

3. Java Software Engineering (2006 era)

JDBC database connectivity
Third-party library integration
PDF generation techniques
Chart rendering pipelines

4. Evolution of Trading Technology

Compare 2006 approaches to 2025 methods
See what concepts remain relevant
Understand why certain practices were retired
Appreciate modern tooling improvements

Academic Use

This codebase could support:

Finance courses: Example of quantitative trading system
Statistics courses: Applied time-series forecasting
Software engineering courses: Legacy code analysis
History of computing: Evolution of retail FinTech

🤝 Contributing

Status: This is a historical archive and is not actively maintained.

❌ No bug fixes planned
❌ No feature additions
❌ No support provided
✅ Documentation improvements welcome
✅ Historical context additions welcome

If you find this interesting:

⭐ Star the repository
🔍 Study the code
💬 Open discussions for historical questions
📝 Cite in academic work if useful

See CONTRIBUTING.md for more details.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Important: This license includes additional disclaimers specific to this historical archive:

⚠️ Not suitable for production use
⚠️ Contains known security vulnerabilities
⚠️ No warranties or support
⚠️ Educational purposes only

Third-party libraries (OpenForecast, JFreeChart, iText, JDBC drivers) have their own licenses and are not included.

🙏 Acknowledgments

Built independently using:

OpenForecast library for statistical forecasting models
JFreeChart for professional chart generation
iText for PDF document creation
Microsoft SQL Server for data storage
Microsoft JDBC Driver for database connectivity

Special thanks to the open-source community of 2006 for making these libraries available.

📬 Contact & Links

Repository: https://github.com/kbadinger/cherry-picker

Author: Kevin Badinger

Created: 2006-2007
Archived: 2025
Purpose: Historical documentation and educational reference

🏆 Project Statistics

📊 Codebase Metrics:
   Lines of Code:        ~1,150
   Java Files:           2 primary classes
   Configuration Files:  150+ XML files
   Dependencies:         12 JAR files (~13MB)
   Database Tables:      3 primary tables

🎯 Functionality:
   Forecasting Models:   5 distinct algorithms
   Data Granularity:     15-second intervals
   Chart Layouts:        4 per page
   PDF Batch Size:       100 pages per file
   Date Range:           Configurable (tested 2005-2007)

⏱️ Development:
   Timeline:             2006-2007 (exact dates unknown)
   Team Size:            1 (solo developer)
   Architecture:         Designed from scratch
   Testing:              Manual validation via PDF reports

🌟 Star History

If you find this historical archive interesting or educational, please consider starring the repository!

"Built when algorithmic trading was the domain of hedge funds, not hobbyists."

Made with ☕ in 2006-2007 | Archived with 📚 in 2025
