# China Stock Data Library - Comprehensive Example

This notebook demonstrates how to use the China Stock Data library after the recent refactoring. 

## Features Covered:
- Stock historical data fetching
- Real-time stock data
- Stock information
- Chip distribution analysis
- Index data and components
- Market sentiment analysis
- Data visualization and analysis

Let's explore the capabilities of our improved stock data fetching system!

## 1. Import Required Libraries

First, let's import all the necessary libraries for data analysis and visualization.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Import our refactored stock data library
from china_stock_data import StockData, StockMarket
from china_stock_data.fetchers import stock_fetchers, index_fetchers, market_fetchers

# Configure plotting
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.sans-serif'] = ['SimHei']  # Support Chinese characters
plt.rcParams['axes.unicode_minus'] = False

print("All libraries imported successfully!")
print(f"Available stock fetchers: {[f.name for f in stock_fetchers]}")
print(f"Available index fetchers: {[f.name for f in index_fetchers]}")
print(f"Available market fetchers: {[f.name for f in market_fetchers]}")

## 2. Basic Stock Data Usage

Let's start by fetching data for a popular stock - Moutai (贵州茅台) with symbol "600519".

In [None]:
# Create a StockData instance for Moutai
moutai = StockData(
    symbol="600519",
    start_date="2023-01-01",
    end_date="2023-12-31",
    period="daily",
    adjust="qfq"  # Forward adjusted
)

print(f"Created StockData instance for: {moutai.symbol}")
print(f"Date range: {moutai.start_date} to {moutai.end_date}")
print(f"Available fetchers: {list(moutai.fetchers.keys())}")

### 2.1 Historical Price Data (K-line Data)

Get historical price data including OHLCV and calculated technical indicators.

In [None]:
# Get historical data using the new fetcher system
hist_data = moutai.get_data("kline")

print("Historical Data Shape:", hist_data.shape)
print("\nColumn Names:", list(hist_data.columns))
print("\nFirst 5 rows:")
print(hist_data.head())

print("\nLast 5 rows:")
print(hist_data.tail())

# Access calculated factors using the improved fetcher
print(f"\nCalculated Factors:")
try:
    print(f"Highest Price: {moutai['最高']}")
    print(f"Lowest Price: {moutai['最低']}")
    print(f"Average Price: {moutai['平均']}")
    print(f"Weighted Average: {moutai['加权平均']}")
    print(f"Price Change: {moutai['涨跌额']}")
    print(f"Price Change %: {moutai['涨跌幅']:.2f}%")
except KeyError as e:
    print(f"Factor not available: {e}")
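
The factor names above come from the library, and this notebook does not show how each one is defined. As a rough cross-check, the conventional period-level definitions can be computed by hand from a list of closing prices. The sketch below is an assumption about what such factors usually mean, not the library's implementation; the function name `summarize_closes` and the sample prices are hypothetical.

```python
def summarize_closes(closes):
    """Return conventional period statistics for a list of closing prices."""
    if not closes:
        raise ValueError("closes must not be empty")
    first, last = closes[0], closes[-1]
    change = last - first              # absolute change over the period (涨跌额)
    change_pct = change / first * 100  # percentage change over the period (涨跌幅)
    return {
        "highest": max(closes),
        "lowest": min(closes),
        "average": sum(closes) / len(closes),
        "change": change,
        "change_pct": change_pct,
    }


# Hypothetical closing prices, not real market data
stats = summarize_closes([100.0, 104.0, 98.0, 110.0])
print(stats)
```

If the values reported by the library differ from a hand calculation like this, the library may be using high/low columns or a different reference price.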

In [None]:
# Create price visualization
if not hist_data.empty:
    plt.figure(figsize=(14, 8))
    
    # Use the date column for the x-axis when present; otherwise fall back to the index
    if '日期' in hist_data.columns:
        hist_data['日期'] = pd.to_datetime(hist_data['日期'])
        dates = hist_data['日期']
    else:
        dates = hist_data.index
    
    # Plot price data
    plt.subplot(2, 1, 1)
    plt.plot(dates, hist_data['收盘'], label='Close Price', linewidth=2)
    plt.plot(dates, hist_data['开盘'], label='Open Price', alpha=0.7)
    plt.plot(dates, hist_data['最高'], label='High Price', alpha=0.5)
    plt.plot(dates, hist_data['最低'], label='Low Price', alpha=0.5)
    plt.title('Moutai (600519) - Price Trends', fontsize=14)
    plt.ylabel('Price (CNY)')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    # Plot volume
    plt.subplot(2, 1, 2)
    plt.bar(dates, hist_data['成交量'], alpha=0.7, color='orange')
    plt.title('Trading Volume')
    plt.ylabel('Volume')
    plt.xlabel('Date')
    plt.xticks(rotation=45)
    plt.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("No historical data available for plotting")

### 2.2 Stock Information

Get detailed company information and financial metrics.

In [None]:
# Get stock information
info_data = moutai.get_data("info")

print("Stock Information:")
print("=" * 50)
if not info_data.empty:
    print(info_data.to_string())
    
    # Display key metrics if available
    if len(info_data) > 0:
        print("\nKey Information Summary:")
        print("-" * 30)
        for index, row in info_data.iterrows():
            print(f"{row.iloc[0]}: {row.iloc[1]}")
else:
    print("No stock information available")

### 2.3 Real-time Trading Data

Get current bid/ask prices and trading information.

In [None]:
# Get real-time bid/ask data
realtime_data = moutai.get_data("bid_ask")

print("Real-time Trading Data:")
print("=" * 50)
if not realtime_data.empty:
    print(realtime_data.to_string())
    
    # Create a simple visualization of bid/ask spread
    if '买一' in realtime_data.columns and '卖一' in realtime_data.columns:
        bid_price = realtime_data['买一'].iloc[0] if len(realtime_data) > 0 else 0
        ask_price = realtime_data['卖一'].iloc[0] if len(realtime_data) > 0 else 0
        spread = ask_price - bid_price
        
        print(f"\nBid-Ask Analysis:")
        print(f"Bid Price: ¥{bid_price}")
        print(f"Ask Price: ¥{ask_price}")
        print(f"Spread: ¥{spread:.3f}")
        print(f"Spread %: {(spread/bid_price)*100:.3f}%" if bid_price > 0 else "N/A")
else:
    print("No real-time data available (may be outside trading hours)")
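
The cell above measures the spread relative to the bid. Spreads are also commonly quoted relative to the mid price and expressed in basis points, which makes them easier to compare across stocks with very different price levels. A minimal sketch, using hypothetical quotes and a hypothetical helper named `spread_metrics`:

```python
def spread_metrics(bid, ask):
    """Return the absolute spread, the mid price, and the spread in basis points of mid."""
    if bid <= 0 or ask <= 0:
        raise ValueError("bid and ask must be positive")
    if ask < bid:
        raise ValueError("ask must not be below bid")
    spread = ask - bid
    mid = (bid + ask) / 2
    spread_bps = spread / mid * 10_000  # 1 basis point = 0.01%
    return spread, mid, spread_bps


# Hypothetical quotes, not real market data
spread, mid, bps = spread_metrics(bid=1700.00, ask=1700.50)
print(f"spread={spread:.2f} mid={mid:.2f} spread_bps={bps:.2f}")
```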

### 2.4 Chip Distribution Analysis

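Chip distribution (筹码分布) estimates the price levels at which current shareholders acquired their shares. It is commonly used to judge where holders are in profit or at a loss.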

In [None]:
# Get chip distribution data
chip_data = moutai.get_data("chip")

print("Chip Distribution Data:")
print("=" * 50)
if not chip_data.empty:
    print(f"Data shape: {chip_data.shape}")
    print(f"Columns: {list(chip_data.columns)}")
    print("\nFirst 10 rows:")
    print(chip_data.head(10))
    
    # Visualize chip distribution if data is available
    if len(chip_data) > 0:
        plt.figure(figsize=(12, 6))
        
        # Plot chip distribution
        if '价格' in chip_data.columns and '成本分布' in chip_data.columns:
            plt.subplot(1, 2, 1)
            plt.plot(chip_data['价格'], chip_data['成本分布'])
            plt.title('Chip Distribution by Price')
            plt.xlabel('Price (CNY)')
            plt.ylabel('Chip Distribution')
            plt.grid(True, alpha=0.3)
        
        # Plot any other relevant columns
        if len(chip_data.columns) > 2:
            plt.subplot(1, 2, 2)
            for col in chip_data.columns[2:5]:  # Plot up to 3 additional columns
                plt.plot(chip_data.index, chip_data[col], label=col)
            plt.title('Additional Chip Metrics')
            plt.xlabel('Index')
            plt.ylabel('Value')
            plt.legend()
            plt.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
else:
    print("No chip distribution data available")
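
Two summary figures are often derived from a chip distribution: the weighted average cost, and the share of chips whose cost lies below the current price (the profit ratio). The sketch below assumes the distribution is available as `(price, weight)` pairs; that layout, the helper name `chip_summary`, and the numbers are hypothetical and may not match the columns the library returns.

```python
def chip_summary(distribution, current_price):
    """Summarize a chip distribution given as (price, weight) pairs.

    Weights do not need to sum to 1; they are normalized by their total.
    Returns (average_cost, profit_ratio).
    """
    total = sum(weight for _, weight in distribution)
    if total <= 0:
        raise ValueError("total weight must be positive")
    avg_cost = sum(price * weight for price, weight in distribution) / total
    # Chips bought below the current price are in profit
    in_profit = sum(weight for price, weight in distribution if price < current_price)
    return avg_cost, in_profit / total


# Hypothetical distribution, not real market data
chips = [(1600.0, 0.2), (1700.0, 0.5), (1800.0, 0.3)]
avg_cost, profit_ratio = chip_summary(chips, current_price=1750.0)
print(f"average cost={avg_cost:.2f} profit ratio={profit_ratio:.0%}")
```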

## 3. Market and Index Data

Now let's explore the market-level data using our improved StockMarket class.

In [None]:
# Create StockMarket instance
market = StockMarket(symbol="SH000001", index="000300")  # index "000300" is the CSI 300

print(f"Created StockMarket instance")
print(f"Symbol: {market.symbol}")
print(f"Index: {market.index}")
print(f"Key: {market.key()}")
print(f"Available fetchers: {list(market.fetchers.keys())}")

### 3.1 Index Components

Get the constituent stocks of an index (e.g., CSI 300).

In [None]:
# Get index components
components = market.get_data("index_components")

print("Index Components Data:")
print("=" * 50)
if not components.empty:
    print(f"Total components: {len(components)}")
    print(f"Columns: {list(components.columns)}")
    print("\nFirst 10 components:")
    print(components.head(10))
    
    # Analyze sector distribution if available
    if '行业' in components.columns:
        sector_counts = components['行业'].value_counts()
        print(f"\nTop 10 sectors by stock count:")
        print(sector_counts.head(10))
        
        # Visualize sector distribution
        plt.figure(figsize=(12, 6))
        plt.subplot(1, 2, 1)
        sector_counts.head(10).plot(kind='bar')
        plt.title('Top 10 Sectors in Index')
        plt.xlabel('Sector')
        plt.ylabel('Number of Stocks')
        plt.xticks(rotation=45)
        
        # Pie chart for top 5 sectors
        plt.subplot(1, 2, 2)
        plt.pie(sector_counts.head(5), labels=sector_counts.head(5).index, autopct='%1.1f%%')
        plt.title('Top 5 Sectors Distribution')
        
        plt.tight_layout()
        plt.show()
else:
    print("No index components data available")

### 3.2 Market Sentiment Analysis

Analyze market sentiment using news sentiment data (new feature!).

In [None]:
# Get market sentiment data
sentiment_data = market.get_data("market_sentiment")

print("Market Sentiment Data:")
print("=" * 50)
if not sentiment_data.empty:
    print(f"Data shape: {sentiment_data.shape}")
    print(f"Columns: {list(sentiment_data.columns)}")
    print("\nSample data:")
    print(sentiment_data.head())
    
    # Visualize sentiment trends if date column exists
    if '日期' in sentiment_data.columns or 'date' in sentiment_data.columns:
        date_col = '日期' if '日期' in sentiment_data.columns else 'date'
        sentiment_data[date_col] = pd.to_datetime(sentiment_data[date_col])
        
        # Find sentiment-related columns
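        # Keep only numeric columns so the plots and statistics below do not fail on text
        sentiment_cols = [col for col in sentiment_data.columns if 
                         any(keyword in col.lower() for keyword in ['sentiment', '情绪', 'score', '指数'])
                         and pd.api.types.is_numeric_dtype(sentiment_data[col])]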
        
        if sentiment_cols:
            plt.figure(figsize=(14, 6))
            for col in sentiment_cols[:3]:  # Plot first 3 sentiment columns
                plt.plot(sentiment_data[date_col], sentiment_data[col], label=col, marker='o')
            
            plt.title('Market Sentiment Trends')
            plt.xlabel('Date')
            plt.ylabel('Sentiment Score')
            plt.legend()
            plt.grid(True, alpha=0.3)
            plt.xticks(rotation=45)
            plt.tight_layout()
            plt.show()
            
            # Calculate sentiment statistics
            print(f"\nSentiment Statistics:")
            for col in sentiment_cols:
                print(f"{col}:")
                print(f"  Mean: {sentiment_data[col].mean():.3f}")
                print(f"  Std:  {sentiment_data[col].std():.3f}")
                print(f"  Min:  {sentiment_data[col].min():.3f}")
                print(f"  Max:  {sentiment_data[col].max():.3f}")
else:
    print("No market sentiment data available")

### 3.3 US Stock Index Data

Get US market index data for international comparison.

In [None]:
# Get US index data (S&P 500)
us_index_data = market.get_data("us_index")

print("US Index Data (S&P 500):")
print("=" * 50)
if not us_index_data.empty:
    print(f"Data shape: {us_index_data.shape}")
    print(f"Columns: {list(us_index_data.columns)}")
    print("\nLatest data:")
    print(us_index_data.tail())
    
    # Plot US index trends
    if '日期' in us_index_data.columns or 'date' in us_index_data.columns:
        date_col = '日期' if '日期' in us_index_data.columns else 'date'
        us_index_data[date_col] = pd.to_datetime(us_index_data[date_col])
        
        plt.figure(figsize=(14, 6))
        
        # Plot price data
        price_cols = [col for col in us_index_data.columns if 
                     any(keyword in col for keyword in ['close', '收盘', 'price', '价格'])]
        
        if price_cols:
            plt.subplot(1, 2, 1)
            plt.plot(us_index_data[date_col], us_index_data[price_cols[0]], 
                    linewidth=2, color='blue', label='S&P 500')
            plt.title('S&P 500 Index Trend')
            plt.xlabel('Date')
            plt.ylabel('Index Value')
            plt.grid(True, alpha=0.3)
            plt.xticks(rotation=45)
            
        # Plot volume if available
        volume_cols = [col for col in us_index_data.columns if 
                      any(keyword in col for keyword in ['volume', '成交量', 'vol'])]
        
        if volume_cols:
            plt.subplot(1, 2, 2)
            plt.plot(us_index_data[date_col], us_index_data[volume_cols[0]], 
                    color='orange', label='Volume')
            plt.title('S&P 500 Trading Volume')
            plt.xlabel('Date')
            plt.ylabel('Volume')
            plt.grid(True, alpha=0.3)
            plt.xticks(rotation=45)
        
        plt.tight_layout()
        plt.show()
        
        # Calculate performance metrics
        if price_cols:
            latest_price = us_index_data[price_cols[0]].iloc[-1]
            first_price = us_index_data[price_cols[0]].iloc[0]
            return_pct = ((latest_price - first_price) / first_price) * 100
            
            print(f"\nS&P 500 Performance:")
            print(f"Period Return: {return_pct:.2f}%")
            print(f"Latest Value: {latest_price:.2f}")
else:
    print("No US index data available")
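
The period return printed above depends on how long the period is, so it is hard to compare across date ranges. One common adjustment is to annualize it as a compound annual growth rate. The sketch below is an illustration with hypothetical values; the helper name `annualized_return` and the 365.25-day year are assumptions.

```python
from datetime import date


def annualized_return(first_value, last_value, start, end):
    """Compound annual growth rate between two dated observations."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    if first_value <= 0:
        raise ValueError("first_value must be positive")
    years = days / 365.25
    return (last_value / first_value) ** (1 / years) - 1


# Hypothetical index levels: a 21% gain over two years is roughly 10% per year
cagr = annualized_return(100.0, 121.0, date(2021, 1, 1), date(2023, 1, 1))
print(f"Annualized return: {cagr:.2%}")
```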

## 4. Advanced Analysis Examples

Let's perform some advanced analysis using the data we've collected.

### 4.1 Technical Indicator Calculation

Calculate common technical indicators from the historical data.

In [None]:
# Calculate technical indicators
def calculate_sma(data, window):
    """Simple Moving Average"""
    return data.rolling(window=window).mean()

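def calculate_rsi(data, window=14):
    """Relative Strength Index using simple rolling means of gains and losses.

    Note: Wilder's original RSI uses recursive smoothing, so values may differ
    slightly from charting platforms.
    """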
    delta = data.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss
    return 100 - (100 / (1 + rs))

def calculate_bollinger_bands(data, window=20, num_std=2):
    """Bollinger Bands"""
    sma = calculate_sma(data, window)
    std = data.rolling(window=window).std()
    upper_band = sma + (std * num_std)
    lower_band = sma - (std * num_std)
    return upper_band, sma, lower_band

# Apply technical analysis to our Moutai data
if not hist_data.empty and '收盘' in hist_data.columns:
    print("Calculating Technical Indicators...")
    
    # Calculate indicators
    hist_data['SMA_20'] = calculate_sma(hist_data['收盘'], 20)
    hist_data['SMA_50'] = calculate_sma(hist_data['收盘'], 50)
    hist_data['RSI'] = calculate_rsi(hist_data['收盘'])
    
    upper_bb, middle_bb, lower_bb = calculate_bollinger_bands(hist_data['收盘'])
    hist_data['BB_Upper'] = upper_bb
    hist_data['BB_Middle'] = middle_bb
    hist_data['BB_Lower'] = lower_bb
    
    print("Technical indicators calculated successfully!")
    
    # Plot technical analysis chart, using trading dates on the x-axis when available
    x = pd.to_datetime(hist_data['日期']) if '日期' in hist_data.columns else hist_data.index
    fig, axes = plt.subplots(3, 1, figsize=(15, 12))
    
    # Price, moving averages, and Bollinger Bands
    axes[0].plot(x, hist_data['收盘'], label='Close Price', linewidth=2)
    axes[0].plot(x, hist_data['SMA_20'], label='SMA 20', alpha=0.7)
    axes[0].plot(x, hist_data['SMA_50'], label='SMA 50', alpha=0.7)
    axes[0].fill_between(x, hist_data['BB_Upper'], hist_data['BB_Lower'], 
                        alpha=0.2, color='gray', label='Bollinger Bands')
    axes[0].set_title('Moutai - Price and Moving Averages')
    axes[0].set_ylabel('Price (CNY)')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # RSI
    axes[1].plot(x, hist_data['RSI'], color='orange', linewidth=2)
    axes[1].axhline(y=70, color='r', linestyle='--', alpha=0.7, label='Overbought (70)')
    axes[1].axhline(y=30, color='g', linestyle='--', alpha=0.7, label='Oversold (30)')
    axes[1].set_title('Relative Strength Index (RSI)')
    axes[1].set_ylabel('RSI')
    axes[1].set_ylim(0, 100)
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    # Volume
    axes[2].bar(x, hist_data['成交量'], alpha=0.7, color='purple')
    axes[2].set_title('Trading Volume')
    axes[2].set_ylabel('Volume')
    axes[2].set_xlabel('Date')
    axes[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print current indicator values
    latest_data = hist_data.iloc[-1]
    print(f"\nLatest Technical Indicators:")
    print(f"Price: ¥{latest_data['收盘']:.2f}")
    print(f"SMA 20: ¥{latest_data['SMA_20']:.2f}")
    print(f"SMA 50: ¥{latest_data['SMA_50']:.2f}")
    print(f"RSI: {latest_data['RSI']:.2f}")
    print(f"Bollinger Upper: ¥{latest_data['BB_Upper']:.2f}")
    print(f"Bollinger Lower: ¥{latest_data['BB_Lower']:.2f}")
else:
    print("No historical data available for technical analysis")
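
The `calculate_rsi` function above averages gains and losses with a simple rolling mean. Wilder's original formulation instead smooths them recursively, so the two versions generally give different values. A minimal pure-Python sketch of the Wilder variant follows; the name `wilder_rsi` and the choice to report 50 for a completely flat window are assumptions made here for illustration.

```python
def wilder_rsi(closes, window=14):
    """RSI with Wilder's smoothing.

    Returns a list the same length as `closes`; entries are None until
    `window` price changes are available.
    """
    if len(closes) <= window:
        return [None] * len(closes)

    deltas = [curr - prev for prev, curr in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0:
            # No losses in the window: 100 if there were gains, 50 if prices were flat
            return 100.0 if avg_gain > 0 else 50.0
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    # Seed with a simple average over the first `window` changes
    avg_gain = sum(gains[:window]) / window
    avg_loss = sum(losses[:window]) / window
    rsi = [None] * window
    rsi.append(to_rsi(avg_gain, avg_loss))

    # Recursive smoothing: each new change gets weight 1/window
    for gain, loss in zip(gains[window:], losses[window:]):
        avg_gain = (avg_gain * (window - 1) + gain) / window
        avg_loss = (avg_loss * (window - 1) + loss) / window
        rsi.append(to_rsi(avg_gain, avg_loss))
    return rsi


# Hypothetical prices with a short window so the arithmetic is easy to follow
example_rsi = wilder_rsi([10.0, 11.0, 10.0, 11.0], window=2)
print(example_rsi)
```

To apply it to the notebook's data, a call such as `wilder_rsi(hist_data['收盘'].tolist())` should work, assuming the close column is numeric.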

### 4.2 Multi-Stock Comparison

Compare multiple stocks using our improved fetcher system.

In [None]:
# Compare multiple popular stocks
stocks_to_compare = {
    "600519": "贵州茅台",  # Moutai
    "000858": "五粮液",    # Wuliangye
    "600036": "招商银行",  # China Merchants Bank
    "000001": "平安银行"   # Ping An Bank
}

comparison_data = {}
comparison_returns = {}

print("Fetching data for multiple stocks...")
for symbol, name in stocks_to_compare.items():
    try:
        stock = StockData(
            symbol=symbol,
            start_date="2023-01-01",
            end_date="2023-12-31",
            period="daily",
            adjust="qfq"
        )
        
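        hist = stock.get_data("kline")
        if not hist.empty and '收盘' in hist.columns:
            # Index closes by trading date so stocks with different trading days align correctly
            if '日期' in hist.columns:
                hist = hist.set_index(pd.to_datetime(hist['日期']))
            comparison_data[name] = hist['收盘']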
            
            # Calculate returns
            first_price = hist['收盘'].iloc[0]
            last_price = hist['收盘'].iloc[-1]
            total_return = ((last_price - first_price) / first_price) * 100
            comparison_returns[name] = total_return
            
            print(f"✓ {name} ({symbol}): {total_return:.2f}% return")
        else:
            print(f"✗ {name} ({symbol}): No data available")
            
    except Exception as e:
        print(f"✗ {name} ({symbol}): Error - {e}")

# Create comparison visualizations
if comparison_data:
    # Normalize all prices to start at 100 for comparison
    comparison_df = pd.DataFrame(comparison_data)
    normalized_df = comparison_df.div(comparison_df.iloc[0]) * 100
    
    plt.figure(figsize=(15, 10))
    
    # Normalized price comparison
    plt.subplot(2, 2, 1)
    for stock in normalized_df.columns:
        plt.plot(normalized_df.index, normalized_df[stock], label=stock, linewidth=2)
    plt.title('Normalized Price Comparison (Starting at 100)')
    plt.ylabel('Normalized Price')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    # Returns bar chart
    plt.subplot(2, 2, 2)
    returns_df = pd.Series(comparison_returns)
    colors = ['green' if x > 0 else 'red' for x in returns_df.values]
    plt.bar(returns_df.index, returns_df.values, color=colors, alpha=0.7)
    plt.title('Total Returns Comparison (%)')
    plt.ylabel('Return (%)')
    plt.xticks(rotation=45)
    plt.grid(True, alpha=0.3)
    
    # Volatility comparison (rolling std)
    plt.subplot(2, 2, 3)
    rolling_vol = comparison_df.pct_change().rolling(window=20).std() * np.sqrt(252) * 100
    for stock in rolling_vol.columns:
        plt.plot(rolling_vol.index, rolling_vol[stock], label=stock, alpha=0.7)
    plt.title('20-Day Rolling Volatility (Annualized %)')
    plt.ylabel('Volatility (%)')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    # Correlation heatmap
    plt.subplot(2, 2, 4)
    correlation_matrix = comparison_df.pct_change().corr()
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0,
                square=True, linewidths=0.5)
    plt.title('Daily Return Correlation Matrix')
    
    plt.tight_layout()
    plt.show()
    
    # Statistical summary
    print(f"\nStatistical Summary:")
    print("=" * 50)
    returns_series = comparison_df.pct_change().dropna()
    # Volatility and Sharpe ratio are annualized with 252 trading days;
    # the Sharpe ratio assumes a zero risk-free rate
    stats_summary = pd.DataFrame({
        'Mean Daily Return (%)': returns_series.mean() * 100,
        'Annualized Volatility (%)': returns_series.std() * np.sqrt(252) * 100,
        'Sharpe Ratio (rf=0)': (returns_series.mean() / returns_series.std()) * np.sqrt(252),
        'Max Drawdown (%)': ((comparison_df / comparison_df.cummax() - 1).min()) * 100
    })
    print(stats_summary.round(3))
else:
    print("No comparison data available")
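
The risk metrics in the summary table can be checked by hand on a short price series. The sketch below reimplements maximum drawdown and the annualized Sharpe ratio in plain Python; the helper names and the sample prices are hypothetical, and the Sharpe ratio assumes a zero risk-free rate by default, as the cell above does.

```python
import math
import statistics


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a negative fraction (0.0 if prices never fall)."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = min(worst, price / peak - 1)
    return worst


def annualized_sharpe(prices, risk_free_daily=0.0, periods=252):
    """Mean excess daily return divided by its sample standard deviation, annualized."""
    returns = [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]
    excess = [r - risk_free_daily for r in returns]
    spread = statistics.stdev(excess)  # sample std (ddof=1), matching pandas' default
    if spread == 0:
        return float("nan")
    return statistics.mean(excess) / spread * math.sqrt(periods)


# Hypothetical prices, not real market data
prices = [100.0, 120.0, 90.0, 108.0]
drawdown = max_drawdown(prices)
sharpe = annualized_sharpe(prices)
print(f"max drawdown={drawdown:.2%} sharpe={sharpe:.3f}")
```

Here the peak is 120 and the following trough is 90, so the maximum drawdown is 90 / 120 - 1 = -25%.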

## 5. Summary and Next Steps

### What We've Accomplished

In this comprehensive example, we've demonstrated the full capabilities of the improved China Stock Data library:

#### ✅ **Data Fetching Capabilities**
- **Historical Data**: OHLCV data with calculated technical factors
- **Real-time Data**: Bid/ask prices and current trading information  
- **Company Information**: Fundamental company data and metrics
- **Chip Distribution**: Estimated cost-basis distribution of holdings across price levels
- **Index Data**: Components and constituent analysis
- **Market Sentiment**: News sentiment analysis (new feature!)
- **International Data**: US market indices for comparison

#### ✅ **Technical Analysis**
- Moving averages (SMA 20, SMA 50)
- Relative Strength Index (RSI)
- Bollinger Bands
- Volume analysis
- Price volatility calculations

#### ✅ **Advanced Analytics**
- Multi-stock performance comparison
- Correlation analysis
- Risk metrics (volatility, Sharpe ratio, max drawdown)
- Normalized price comparisons

#### ✅ **Improved Architecture**
- Modular fetcher system organized by category
- Type-safe code with comprehensive annotations
- Robust error handling and data validation
- Consistent API across all data types

### Next Steps for Your Analysis

1. **Portfolio Analysis**: Use the multi-stock comparison to build and analyze portfolios
2. **Backtesting**: Implement trading strategies using the historical data
3. **Risk Management**: Expand the risk metrics and implement VaR calculations
4. **Machine Learning**: Use the data for predictive modeling and algorithmic trading
5. **Real-time Monitoring**: Build dashboards using the real-time data capabilities
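
As a starting point for the VaR calculations mentioned in step 3, a historical (non-parametric) Value at Risk can be read directly off the sorted daily returns. This is a minimal sketch under stated assumptions: the helper name `historical_var` is hypothetical, the returns are made up, and the quantile convention (take the observation at position `floor((1 - confidence) * n)` in the sorted list) is only one of several in use.

```python
def historical_var(returns, confidence=0.95):
    """One-period historical VaR, reported as a positive loss fraction."""
    if not returns:
        raise ValueError("returns must not be empty")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    ordered = sorted(returns)  # worst returns first
    # Round before truncating to avoid floating-point artifacts such as 0.9999999
    k = int(round((1 - confidence) * len(ordered), 9))
    k = min(k, len(ordered) - 1)
    return -ordered[k]


# Twenty hypothetical daily returns, not real market data
daily_returns = [
    -0.05, -0.03, -0.02, -0.01, 0.0, 0.0, 0.005, 0.005, 0.01, 0.01,
    0.01, 0.012, 0.015, 0.015, 0.02, 0.02, 0.025, 0.03, 0.03, 0.04,
]
var_95 = historical_var(daily_returns, confidence=0.95)
var_90 = historical_var(daily_returns, confidence=0.90)
print(f"95% VaR: {var_95:.2%}, 90% VaR: {var_90:.2%}")
```

With real data, the input could be something like `hist_data['收盘'].pct_change().dropna().tolist()`, assuming the close column is numeric.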

### Key Features of the Refactored Library

- **Better Organization**: Fetchers organized by category (stock/, index/, market/)
- **Enhanced Type Safety**: Comprehensive type annotations throughout
- **Improved Error Handling**: Robust exception handling and data validation
- **Extended Functionality**: New market sentiment analysis capabilities
- **Better Testing**: Comprehensive test suite with 44+ test cases
- **Documentation**: Complete English documentation and examples

The library is now production-ready for serious financial analysis and algorithmic trading applications!