# Factor Management System

This tutorial demonstrates how to use the Frozen Factor Management System, a comprehensive framework for managing quantitative factors in financial analysis. The system provides:

- **Factor Registration**: Register and manage factors with metadata, dependencies, and lifecycle tracking
- **Dependency Management**: Handle complex factor dependencies with automatic validation and visualization
- **Lifecycle Management**: Track factor development stages from draft to production
- **Storage & Retrieval**: Efficient storage and retrieval of factor data with multiple backends
- **Performance Monitoring**: Monitor factor performance and health

## Key Components

1. **FactorManager**: Main interface for factor operations
2. **FactorRegistry**: Manages factor metadata and dependencies
3. **LifecycleManager**: Handles factor status transitions and versioning
4. **Storage Handlers**: Backend storage implementations (DuckDB, etc.)

Let's start by setting up the environment and exploring the system capabilities.

## Environment Setup

First, we need to import the necessary modules and set up our data providers and loaders. The Factor Management System integrates with Frozen's data infrastructure to provide seamless factor computation and storage.

In [None]:
from frozen.factor.library.inventory import FactorManager
from frozen.factor.expression import *
from frozen.data.etl.dataload import DataLoadManager, DatabaseTypes
from frozen.data.provider.factory import ProviderFactory, ProviderTypes
from frozen.data.provider import TickerType
from frozen.utils import GL

Now we initialize the data provider (Tushare for market data) and data loader (DuckDB for storage). We also enable logging to see the system's operations.

In [None]:
provider = ProviderFactory.create_data_feed(ProviderTypes.TUSHARE)
dataloader = DataLoadManager(DatabaseTypes.DUCKDB)
GL.unmute()

Create the FactorManager instance with DuckDB as the storage backend. The system automatically loads any existing factors from the database.

In [None]:
factor_manager = FactorManager(DatabaseTypes.DUCKDB)

[INFO] frozen.factor.library.inventory [inventory.py:431]: Loaded 5 factors from database


## 1. Factor Registration

Factor registration is the foundation of the management system. Each factor is registered with metadata including name, description, category, tags, and dependencies. The system automatically generates unique identifiers (UIDs) and tracks the factor's lifecycle status.

### 1.1 Basic Factor Registration

Let's start by registering basic factors without dependencies. These are typically raw data factors like price and volume that serve as building blocks for more complex factors.

In [None]:
factor_manager.register_factor(
    name="close",
    description="daily close price",
    category="volume-price",
    tags=["basic", "price"]
)

factor_manager.register_factor(
    name="volume",
    description="daily volume",
    category="volume-price",
    tags=["basic", "volume"]
)

[INFO] frozen.factor.library.lifecycle [lifecycle.py:304]: Initialized lifecycle for factor '7d6808fb_9eab_45be_87db_54e31e88df56' with status development
[INFO] frozen.factor.library.inventory [inventory.py:204]: Factor 'close' registered successfully with UID 7d6808fb_9eab_45be_87db_54e31e88df56 and status development
[INFO] frozen.factor.library.lifecycle [lifecycle.py:304]: Initialized lifecycle for factor '4cb9d8dc_e5e6_4b97_af48_dbb63d306654' with status development
[INFO] frozen.factor.library.inventory [inventory.py:204]: Factor 'volume' registered successfully with UID 4cb9d8dc_e5e6_4b97_af48_dbb63d306654 and status development


'4cb9d8dc_e5e6_4b97_af48_dbb63d306654'

### 1.2 Registering Dependent Factors

Now we'll register factors that depend on other factors. The system automatically tracks these dependencies and ensures proper execution order. Notice how factors like 'returns' depend on 'close', and 'vwap' depends on both 'close' and 'volume'.

In [5]:
# Factor with dependency
factor_manager.register_factor(
    name="returns",
    description="daily returns",
    dependencies=["close"],
    category="basic",
    tags=["basic"]
)
    
factor_manager.register_factor(
    name="ma5",
    description="5-day moving average",
    dependencies=["close"],
    category="technical",
    tags=["moving_average", "technical"]
)
    
factor_manager.register_factor(
    name="vwap",
    description="volume-weighted average price",
    dependencies=["close", "volume"],
    category="technical",
    tags=["volume", "price", "weighted"]
)

[INFO] frozen.factor.library.lifecycle [lifecycle.py:304]: Initialized lifecycle for factor '5696ca0b_de34_45e2_a29a_dc1c22f15e49' with status development
[INFO] frozen.factor.library.inventory [inventory.py:204]: Factor 'returns' registered successfully with UID 5696ca0b_de34_45e2_a29a_dc1c22f15e49 and status development
[INFO] frozen.factor.library.lifecycle [lifecycle.py:304]: Initialized lifecycle for factor '6eb9bdef_773f_40b8_b8a9_02d6bf61c594' with status development
[INFO] frozen.factor.library.inventory [inventory.py:204]: Factor 'ma5' registered successfully with UID 6eb9bdef_773f_40b8_b8a9_02d6bf61c594 and status development
[INFO] frozen.factor.library.lifecycle [lifecycle.py:304]: Initialized lifecycle for factor '2856c44f_9a70_460f_b8ec_ef7b5c078e74' with status development
[INFO] frozen.factor.library.inventory [inventory.py:204]: Factor 'vwap' registered successfully with UID 2856c44f_9a70_460f_b8ec_ef7b5c078e74 and status development


'2856c44f_9a70_460f_b8ec_ef7b5c078e74'

### 1.3 Error Detection and Validation

The system includes built-in validation to prevent logical errors. For example, it rejects a factor whose name is already registered and blocks removal of a factor that other factors depend on. The examples below are commented out because running them would raise an error.

In [6]:
# Register factor with same name
# This should raise an error
# factor_manager.register_factor(
#     name="close",
#     description=""
# )

In [17]:
# Remove a factor that other factors depend on
# This will raise an error
# factor_manager.registry.remove_factor("close")
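
The two checks described above can be sketched with a small in-memory registry. This is a minimal illustration, not Frozen's implementation: `TinyRegistry` and its methods are hypothetical names, and the `ValueError` exception type is an assumption.

```python
class TinyRegistry:
    """Hypothetical in-memory registry; not Frozen's FactorRegistry."""

    def __init__(self):
        # Maps factor name -> list of factor names it depends on.
        self._deps = {}

    def register(self, name, dependencies=()):
        # Reject a second registration under the same name.
        if name in self._deps:
            raise ValueError(f"Factor '{name}' is already registered")
        self._deps[name] = list(dependencies)

    def dependents(self, name):
        # Factors that list `name` among their dependencies.
        return [f for f, deps in self._deps.items() if name in deps]

    def remove(self, name, force=False):
        # Block removal while other factors still need this one.
        blockers = self.dependents(name)
        if blockers and not force:
            raise ValueError(f"Cannot remove '{name}': required by {blockers}")
        del self._deps[name]


registry = TinyRegistry()
registry.register("close")
registry.register("returns", dependencies=["close"])

# Both operations are rejected and print the reason.
for action in (lambda: registry.register("close"),
               lambda: registry.remove("close")):
    try:
        action()
    except ValueError as err:
        print(err)
```

Keeping the dependency map as the single source of truth means both checks reduce to dictionary lookups.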

### 1.4 Factor Metadata and Statistics

Once factors are registered, you can query various metadata and statistics about your factor library. This includes factor summaries, execution orders, dependency information, and validation status.

**Factor Summary**: Get an overview of all registered factors with their UIDs, status, version, category, and author information.

In [6]:
factor_manager.registry.get_factor_summary()

{'close': {'uid': 'dca52787_20dc_4cc4_950a_065fd79e0d52',
  'status': 'development',
  'version': '1.0.0',
  'category': 'volume-price',
  'author': ''},
 'volume': {'uid': '84aff8d1_9997_49e4_b292_adafa4eceb1a',
  'status': 'development',
  'version': '1.0.0',
  'category': 'volume-price',
  'author': ''},
 'returns': {'uid': 'b882a592_ce1c_4a70_a7e8_c264b58cb616',
  'status': 'development',
  'version': '1.0.0',
  'category': 'basic',
  'author': ''},
 'ma5': {'uid': '2ef97207_e704_47cf_a28e_27f4496228f6',
  'status': 'development',
  'version': '1.0.0',
  'category': 'technical',
  'author': ''},
 'vwap': {'uid': 'c1dc083c_c86b_4197_9a26_4c33c2d588f9',
  'status': 'development',
  'version': '1.0.0',
  'category': 'technical',
  'author': ''}}

**Execution Order**: The system automatically determines the correct order for computing factors based on their dependencies. This ensures that dependent factors are computed after their prerequisites.

In [7]:
factor_manager.registry.get_execution_order()

['fb6c37b5_07f4_4257_8bd4_75a5bc5dc30b',
 'd8f035c5_3868_4f01_9ffc_2d55cc4f1bb7',
 'b7a25a73_926d_4a37_a8a8_d9451946fe26',
 '707cf7ea_f05d_4bba_95ea_ca341c4ad975',
 '3dfba257_8e3e_4a69_9e96_41fbf3f2f4f2']
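
A dependency-respecting order like this is a topological sort of the dependency graph. The sketch below uses Python's standard-library `graphlib` on the five factors registered earlier. It works on factor names rather than UIDs, and it is not necessarily how Frozen computes the order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Factor name -> names it depends on (its predecessors in the graph).
dependencies = {
    "close": [],
    "volume": [],
    "returns": ["close"],
    "ma5": ["close"],
    "vwap": ["close", "volume"],
}

# static_order() yields each node only after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orders exist for this graph. The only guarantee is that `close` precedes `returns`, `ma5`, and `vwap`, and that `volume` precedes `vwap`.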

**Factor Listing**: List all factor names or filter by category. You can also retrieve detailed metadata for specific factors.

In [6]:
factor_manager.registry.get_all_factor_names()

['close', 'volume', 'returns', 'ma5', 'vwap']

**Listing with `list_factors`**: The `list_factors` method produces the same listing. Here `return_uids=False` requests factor names rather than UIDs.

In [12]:
factor_manager.registry.list_factors(return_uids=False)

['close', 'volume', 'returns', 'ma5', 'vwap']

**Filtering by Category**: Pass `category` to restrict the listing to a single category.

In [10]:
factor_manager.registry.list_factors(category="volume-price", return_uids=False)

['close', 'volume']

**Individual Factor Information**: Retrieve detailed metadata for specific factors by name or UID.

In [8]:
factor_manager.registry.get_factor_by_name("close")

FactorMetadata(name='close', uid='factor_a266e428_9c36_4ff1_a158_db0d927d8eb6', description='daily close price', dependencies=[], category='volume-price', tags=['basic', 'price'], created_time=datetime.datetime(2025, 9, 13, 19, 39, 11, 863780), updated_time=datetime.datetime(2025, 9, 13, 19, 39, 11, 920785), lifecycle_status=<FactorLifecycleStatus.DRAFT: 'draft'>, version=FactorVersion(major=1, minor=0, patch=0), author='', maintainer='', last_performance_check=None, performance_score=None, is_active=True, monitoring_enabled=False, namespace='default')

In [15]:
factor_manager.registry.get_factor_by_uid("factor_469c6895_4281_4e8b_b826_7b7383565412")

FactorMetadata(name='vwap', uid='factor_469c6895_4281_4e8b_b826_7b7383565412', description='volume-weighted average price', dependencies=['factor_a266e428_9c36_4ff1_a158_db0d927d8eb6', 'factor_8fad08b5_2745_4b7c_8458_dbb5969ca77f'], category='technical', tags=['volume', 'price', 'weighted'], created_time=datetime.datetime(2025, 9, 13, 19, 39, 13, 191101), updated_time=datetime.datetime(2025, 9, 13, 19, 39, 13, 226900), lifecycle_status=<FactorLifecycleStatus.DRAFT: 'draft'>, version=FactorVersion(major=1, minor=0, patch=0), author='', maintainer='', last_performance_check=None, performance_score=None, is_active=True, monitoring_enabled=False, namespace='default')

In [9]:
factor_manager.registry.get_factor_info("ma5")

FactorMetadata(name='ma5', uid='factor_c3895200_c73b_42d1_981e_eddfa313be76', description='5-day moving average', dependencies=['factor_666872bc_033b_4f86_8245_eaf52b6f7875'], category='technical', tags=['moving_average', 'technical'], created_time=datetime.datetime(2025, 9, 13, 18, 27, 24, 561829), updated_time=datetime.datetime(2025, 9, 13, 18, 27, 24, 561830), lifecycle_status=<FactorLifecycleStatus.DRAFT: 'draft'>, version=FactorVersion(major=1, minor=0, patch=0), author='', maintainer='', last_performance_check=None, performance_score=None, is_active=True, monitoring_enabled=False, namespace='default')

In [12]:
# This will fail because other factors depend on close
# factor_manager.registry.remove_factor("close")

# If removal is really needed, pass force=True
# factor_manager.registry.remove_factor("close", force=True)

**Pipeline Validation**: Check if a factor pipeline is ready for production by validating dependencies and lifecycle status.

In [8]:
factor_manager.registry.validate_factor_pipeline(["close", "vwap"])

 'factors': {'close': {'lifecycle_status': 'development',
   'production_ready': False,
   'needs_attention': False,
   'dependencies_ready': True},
  'volume': {'lifecycle_status': 'development',
   'production_ready': False,
   'needs_attention': False,
   'dependencies_ready': True},
  'vwap': {'lifecycle_status': 'development',
   'production_ready': False,
   'needs_attention': False,
   'dependencies_ready': False}},
 'issues': ["Dependency 'close' of factor 'vwap' is not production ready",
  "Dependency 'volume' of factor 'vwap' is not production ready"]}

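The validation report above can be approximated in a few lines. This is a rough sketch under two assumptions that the output suggests but does not confirm: requested factors are expanded to include their dependencies, and only the `active` status counts as production ready. `validate_pipeline` is a hypothetical helper, not part of Frozen.

```python
def validate_pipeline(names, dependencies, status):
    """Hypothetical validator: expand dependencies, then check readiness."""
    # Collect the requested factors plus everything they depend on.
    pending, selected = list(names), []
    while pending:
        name = pending.pop(0)
        if name not in selected:
            selected.append(name)
            pending.extend(dependencies[name])

    report, issues = {}, []
    for name in selected:
        # Assumption: a dependency is ready only when its status is "active".
        not_ready = [d for d in dependencies[name] if status[d] != "active"]
        issues += [
            f"Dependency '{d}' of factor '{name}' is not production ready"
            for d in not_ready
        ]
        report[name] = {
            "lifecycle_status": status[name],
            "production_ready": status[name] == "active",
            "dependencies_ready": not not_ready,
        }
    return {"factors": report, "issues": issues}


dependencies = {"close": [], "volume": [], "vwap": ["close", "volume"]}
status = {name: "development" for name in dependencies}
result = validate_pipeline(["close", "vwap"], dependencies, status)
print(result["issues"])
```

With every factor still in `development`, the sketch reports the same two issues for `vwap` as the output above.
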
### 1.5 Factor Dependency Visualization

The registry can visualize factor relationships and dependencies. This helps when managing a large factor library and when checking that factors are computed in the right order.

**Graph Statistics**: Get comprehensive statistics about the factor dependency graph including total factors, edges, categories, and dependency metrics.

In [20]:
factor_manager.registry.get_graph_statistics()

{'total_factors': 8,
 'total_nodes': 8,
 'total_edges': 6,
 'categories': {'volume-price': 2, 'basic': 1, 'technical': 2, 'default': 3},
 'dependency_stats': {'max_in_degree': 2,
  'max_out_degree': 3,
  'avg_in_degree': 0.75,
  'avg_out_degree': 0.75,
  'isolated_nodes': 0}}
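
To make the degree metrics concrete, the sketch below derives them by hand for the five factors registered in section 1 (`close`, `volume`, `returns`, `ma5`, `vwap`). It assumes edges point from a dependency to the factor that uses it. That direction is an assumption, chosen because it gives `close` an out-degree of 3.

```python
# Factor name -> names it depends on.
dependencies = {
    "close": [],
    "volume": [],
    "returns": ["close"],
    "ma5": ["close"],
    "vwap": ["close", "volume"],
}

# One directed edge per (dependency -> dependent) pair.
edges = [(dep, name) for name, deps in dependencies.items() for dep in deps]

in_degree = {name: 0 for name in dependencies}
out_degree = {name: 0 for name in dependencies}
for source, target in edges:
    out_degree[source] += 1
    in_degree[target] += 1

stats = {
    "total_nodes": len(dependencies),
    "total_edges": len(edges),
    "max_in_degree": max(in_degree.values()),
    "max_out_degree": max(out_degree.values()),
    # Every edge adds one in-degree and one out-degree, so both averages match.
    "avg_in_degree": len(edges) / len(dependencies),
    "avg_out_degree": len(edges) / len(dependencies),
    "isolated_nodes": sum(
        1 for n in dependencies if in_degree[n] == 0 and out_degree[n] == 0
    ),
}
print(stats)
```

For this five-factor graph the sketch yields 4 edges, a maximum in-degree of 2 (`vwap`), a maximum out-degree of 3 (`close`), and an average degree of 0.8.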

**Basic Dependency Graph**: Generate a simple visualization of factor dependencies and save it as a PNG file.

In [5]:
factor_manager.registry.visualize_dependency_graph()

[INFO] frozen.factor.library.inventory [inventory.py:563]: Dependency graph visualized and saved to dependency_graph.png


**Advanced Dependency Graph**: Create a more sophisticated visualization with different layout algorithms (spring, circular, etc.) and optional interactive features.

In [None]:
factor_manager.registry.visualize_dependency_graph_advanced(layout="spring")

[INFO] frozen.factor.library.inventory [inventory.py:858]: Advanced dependency graph saved to dependency_graph_advanced.png


**Interactive Mode**: The advanced visualization also accepts `interactive=True`. The call is commented out here.

In [4]:
# factor_manager.registry.visualize_dependency_graph_advanced(layout="spring", interactive=True)

**Subgraph Visualization**: Visualize a subset of factors and their relationships, useful for focusing on specific parts of a large factor library.

In [6]:
factor_manager.registry.visualize_subgraph(["ma5", "volume"])

[INFO] frozen.factor.library.inventory [inventory.py:944]: Subgraph visualization saved to subgraph.png


**Graph Data Export**: Export the dependency graph data to JSON format for external analysis or integration with other tools.

In [7]:
factor_manager.registry.export_graph_data()

[INFO] frozen.factor.library.inventory [inventory.py:991]: Graph data exported to dependency_graph.json


## 2. Factor Lifecycle Management

The lifecycle management system tracks factors through different stages of development and deployment. Each factor progresses through states like development, review, active, deprecated, and archived, with automatic versioning and monitoring capabilities.

First, we import the lifecycle status enum to work with different factor states.

In [13]:
from frozen.factor.library.inventory import FactorLifecycleStatus

**Dependency Status Check**: Check the lifecycle status of factors that other factors depend on. This helps ensure that dependent factors are only activated when their prerequisites are production-ready.

In [23]:
factor_manager.registry.get_factor_dependencies_status("vwap")

{'close': {'status': 'active', 'version': '1.0.1', 'production_ready': True},
 'volume': {'status': 'development',
  'version': '1.0.0',
  'production_ready': False}}

**Status Transition**: Move factors between lifecycle states. Here we transition the 'close' factor to 'active' status, which enables monitoring and makes it available for production use.

In [15]:
factor_manager.transition_factor_status("close", FactorLifecycleStatus.ACTIVE)

[INFO] frozen.factor.library.lifecycle [lifecycle.py:383]: Enabled monitoring for factor 'close'
[INFO] frozen.factor.library.lifecycle [lifecycle.py:374]: Factor 'close' transitioned from active to active
[INFO] frozen.factor.library.handlers.duck [duck.py:417]: Saved lifecycle metadata for factor 'close'


True

**Version Management**: Create new versions of factors with automatic versioning. The system supports semantic versioning (major.minor.patch) and tracks version history.

In [17]:
factor_manager.create_factor_version("close", "patch", "close = close + 1e-8")

[INFO] frozen.factor.library.lifecycle [lifecycle.py:191]: Created version 1.0.3 for factor close
[INFO] frozen.factor.library.lifecycle [lifecycle.py:419]: Created version 1.0.3 for factor 'close'


FactorVersion(major=1, minor=0, patch=3)
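
The bump rules of semantic versioning are easy to state in code. The `Version` class below is a hypothetical stand-in written for illustration. It is not Frozen's `FactorVersion`, whose internals are not shown in this tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical semantic version; not Frozen's FactorVersion."""
    major: int
    minor: int
    patch: int

    def bump(self, part):
        # Bumping a part resets every less significant part to zero.
        if part == "major":
            return Version(self.major + 1, 0, 0)
        if part == "minor":
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"Unknown version part: {part!r}")

    def __str__(self):
        return f"{self.major}.{self.minor}.{self.patch}"


v = Version(1, 0, 2)
print(v.bump("patch"))  # 1.0.3
print(v.bump("minor"))  # 1.1.0
print(v.bump("major"))  # 2.0.0
```

Making the class immutable means each bump returns a new object, so earlier versions can be kept in a history list unchanged.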

**Lifecycle Information**: Retrieve detailed lifecycle information including status history, version history, and monitoring settings for a specific factor.

In [20]:
factor_manager.registry.get_lifecycle_info("close")

{'status': 'active',
 'version': '1.0.1',
 'status_history': [{'from_status': 'development',
   'to_status': 'development',
   'timestamp': '2025-09-14T19:09:49.303587',
   'operator': 'system',
   'reason': 'Factor lifecycle initialization',
   'metadata': {}},
  {'from_status': 'development',
   'to_status': 'review',
   'timestamp': '2025-09-14T19:09:49.305058',
   'operator': 'system',
   'reason': '',
   'metadata': {}},
  {'from_status': 'review',
   'to_status': 'active',
   'timestamp': '2025-09-14T19:25:08.507745',
   'operator': 'system',
   'reason': '',
   'metadata': {}}],
 'monitoring_enabled': True,
 'alert_contacts': [],
 'version_history': [{'version': '1.0.1',
   'created_at': '2025-09-14T19:13:58.199082',
   'operator': 'system',
   'parent_version': '1.0.0'}],
 'parent_version': None}
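
The `status_history` records above follow a simple audit-log pattern: each transition appends a record with the old status, the new status, a timestamp, an operator, and a reason. The sketch below reproduces that pattern. `Status` and `Lifecycle` are hypothetical names, and the sketch does not enforce any transition rules because the tutorial does not state them.

```python
from datetime import datetime
from enum import Enum


class Status(Enum):
    DEVELOPMENT = "development"
    REVIEW = "review"
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"


class Lifecycle:
    """Hypothetical lifecycle tracker; not Frozen's LifecycleManager."""

    def __init__(self):
        self.status = Status.DEVELOPMENT
        self.history = []

    def transition(self, new_status, operator="system", reason=""):
        # Append an audit record, then switch to the new status.
        self.history.append({
            "from_status": self.status.value,
            "to_status": new_status.value,
            "timestamp": datetime.now().isoformat(),
            "operator": operator,
            "reason": reason,
        })
        self.status = new_status


lifecycle = Lifecycle()
lifecycle.transition(Status.REVIEW)
lifecycle.transition(Status.ACTIVE, reason="passed review")
print([(h["from_status"], h["to_status"]) for h in lifecycle.history])
```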

**Bulk Operations**: Update multiple factors to the same lifecycle status simultaneously. This is useful for promoting a set of related factors to production together.

In [19]:
factor_manager.registry.bulk_status_update(["close", "vwap"], FactorLifecycleStatus.ACTIVE)

[INFO] frozen.factor.library.lifecycle [lifecycle.py:383]: Enabled monitoring for factor 'close'
[INFO] frozen.factor.library.lifecycle [lifecycle.py:374]: Factor 'close' transitioned from review to active
[INFO] frozen.factor.library.handlers.duck [duck.py:417]: Saved lifecycle metadata for factor 'close'
[INFO] frozen.factor.library.lifecycle [lifecycle.py:304]: Initialized lifecycle for factor 'vwap' with status development
[INFO] frozen.factor.library.lifecycle [lifecycle.py:383]: Enabled monitoring for factor 'vwap'
[INFO] frozen.factor.library.lifecycle [lifecycle.py:374]: Factor 'vwap' transitioned from development to active
[INFO] frozen.factor.library.handlers.duck [duck.py:417]: Saved lifecycle metadata for factor 'vwap'


{'close': True, 'vwap': True}

**Lifecycle Dashboard**: Get a comprehensive overview of all factors' lifecycle status, including statistics and factors that need attention.

In [21]:
factor_manager.get_lifecycle_dashboard()

{'summary': {'total_factors': 5,
  'total_nodes': 5,
  'total_edges': 4,
  'categories': {'volume-price': 2, 'basic': 1, 'technical': 2},
  'dependency_stats': {'max_in_degree': 2,
   'max_out_degree': 3,
   'avg_in_degree': 0.8,
   'avg_out_degree': 0.8,
   'isolated_nodes': 0}},
 'factors_by_status': {'development': ['4cb9d8dc_e5e6_4b97_af48_dbb63d306654',
   '5696ca0b_de34_45e2_a29a_dc1c22f15e49',
   '6eb9bdef_773f_40b8_b8a9_02d6bf61c594'],
  'review': [],
  'active': ['7d6808fb_9eab_45be_87db_54e31e88df56',
   '2856c44f_9a70_460f_b8ec_ef7b5c078e74'],
  'deprecated': [],
  'archived': []},
 'factors_needing_attention': [],
 'recent_transitions': []}

**Portfolio Export**: Export the entire factor portfolio including metadata, lifecycle information, and dependencies to a JSON file for backup or external analysis.

In [20]:
factor_manager.export_factor_portfolio("factor_portfolio.json")

[INFO] frozen.factor.library.inventory [inventory.py:1598]: Factor portfolio exported to factor_portfolio.json


{'export_timestamp': '2025-09-14T23:13:19.675674',
 'total_factors': 5,
 'statistics': {'total_factors': 5,
  'total_nodes': 5,
  'total_edges': 4,
  'categories': {'volume-price': 2, 'basic': 1, 'technical': 2},
  'dependency_stats': {'max_in_degree': 2,
   'max_out_degree': 3,
   'avg_in_degree': 0.8,
   'avg_out_degree': 0.8,
   'isolated_nodes': 0}},
 'factors': {'close': {'basic_info': {'name': 'close',
    'uid': '7d6808fb_9eab_45be_87db_54e31e88df56',
    'description': 'daily close price',
    'category': 'volume-price',
    'tags': ['basic', 'price'],
    'author': '',
    'created_time': '2025-09-14T19:09:38.609855',
    'updated_time': '2025-09-14T23:11:04.256155'},
   'lifecycle': {'status': 'active',
    'version': '1.0.3',
    'is_production_ready': True,
    'needs_attention': False},
   'dependencies': [],
   'detailed_lifecycle': {'status': 'active',
    'version': '1.0.3',
    'status_history': [{'from_status': 'development',
      'to_status': 'development',
      't

## 3. Factor Inventory Storage

The storage system provides multiple ways to store and retrieve factor data. It supports different storage backends (DuckDB, etc.) and offers various methods for factor computation and storage, including direct factor objects, string expressions, and computation functions.

First, we load the universe of stocks and market data that we'll use for factor computation and storage.

In [4]:
universe = provider.get_instrument_list(TickerType.LISTED_STOCK)

Load the market data (open, high, low, close, volume, and amount) for the specified universe and date range. This data will be used as the foundation for factor computation.

In [5]:
# `open_` avoids shadowing Python's built-in open().
open_, high, low, close, volume, amount = dataloader.load_volume_price(
    "stock_daily_hfq",
    col=("open", "high", "low", "close", "volume", "amount"),
    universe=universe,
    start_date="20230101",
    end_date="20231231",
)

### 3.1 Storage by Factor Object

Store factors directly as Factor objects. This method is useful when you already have computed factor data and want to store it in the factor library.

In [6]:
factors = {
    "close": Factor(close)
}

Create a dictionary of Factor objects and store them in the database. The system automatically handles the storage format and metadata.

In [7]:
factor_manager.store_factors(factors=factors, table_name="basic_factors", force_recompute=True)

[INFO] frozen.factor.library.inventory [inventory.py:1308]: Storing factor 'close'
[INFO] frozen.factor.library.handlers.duck [duck.py:177]: Factor 'close' data updated.
[INFO] frozen.factor.library.inventory [inventory.py:1311]: Factor 'close' stored successfully


Retrieve the stored factor data from the database. The system returns the factor as a Factor object that can be used for further computation.

In [None]:
factor_manager.handler.read_factor(table_name="basic_factors", factor_name="close")

<frozen.factor.expression.base.Factor at 0x149e8ee90>

### 3.2 Storage by String Expression

Store factors using string expressions. This method allows you to define factors using mathematical expressions and automatically compute them from the provided variables.

In [10]:
factor_strings = {
    "volume": "volume"
}
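
As a rough illustration of evaluating an expression string against named variables, the sketch below uses Python's built-in `eval` with a restricted namespace. `evaluate_expression` is a hypothetical helper and the inputs are plain floats. Frozen's own parser may work quite differently.

```python
def evaluate_expression(expression, variables):
    """Hypothetical evaluator; Frozen's own parser may work differently."""
    code = compile(expression, "<factor>", "eval")
    # An empty __builtins__ removes access to built-in functions, so the
    # expression can only refer to the supplied variables.
    # This is not a security sandbox.
    return eval(code, {"__builtins__": {}}, dict(variables))


mid_price = evaluate_expression("(high + low) / 2", {"high": 12.0, "low": 10.0})
print(mid_price)  # 11.0
```

A name that is missing from `variables` raises `NameError`, which surfaces typos in an expression early.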

Compute and store the factors from their string expressions. The system parses each expression and evaluates it against the variables supplied in `additional_vars`.

In [11]:
factor_manager.store_string_factors(factor_strings=factor_strings, additional_vars={"volume": Factor(volume)}, table_name="basic_factors", force_recompute=True)

[INFO] frozen.factor.library.inventory [inventory.py:1347]: Computing factor 'volume' from string expression
[INFO] frozen.factor.library.handlers.duck [duck.py:166]: Column 'volume' added.
[INFO] frozen.factor.library.handlers.duck [duck.py:177]: Factor 'volume' data updated.
[INFO] frozen.factor.library.inventory [inventory.py:1371]: Factor 'volume' computed from string and stored successfully


Verify that the factor was stored correctly by reading it back from the database.

In [12]:
factor_manager.handler.read_factor(table_name="basic_factors", factor_name="volume")

<frozen.factor.expression.base.Factor at 0x1492a03d0>

### 3.3 Storage by Computation Function

Store factors using Python functions. This method provides the most flexibility for complex factor computations and allows you to define custom logic for factor calculation.

Define a computation function for the moving average factor. This function takes the close price as input and returns the 5-day moving average.

In [9]:
def compute_ma5(close):
    return ts_mean(close, 5)
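
For readers unfamiliar with rolling operators, the sketch below shows a trailing moving average in plain Python. It assumes that `ts_mean(close, 5)` computes a trailing 5-period mean, as the factor description suggests. How Frozen treats positions without a full window is not shown in this tutorial, so returning `None` there is an arbitrary choice.

```python
def rolling_mean(values, window):
    """Trailing mean; positions without a full window yield None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            # Average the current value and the (window - 1) values before it.
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
print(rolling_mean(prices, 5))  # [None, None, None, None, 12.0, 13.0]
```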


Create a dictionary mapping factor names to their computation functions. The system will use these functions to compute and store the factors.

In [10]:
factor_functions = {
    "ma5": compute_ma5
}

Compute and store factors using the provided functions. The system automatically handles the computation process and stores the results in the database.

In [11]:
factor_manager.compute_and_store_factors(
    factor_functions=factor_functions,
    table_name="basic_factors",
    force_recompute=True
)

[INFO] frozen.factor.library.inventory [inventory.py:1421]: Computing factor 'ma5'
[INFO] frozen.factor.library.handlers.duck [duck.py:166]: Column 'ma5' added.
[INFO] frozen.factor.library.handlers.duck [duck.py:177]: Factor 'ma5' data updated.
[INFO] frozen.factor.library.inventory [inventory.py:1445]: Factor 'ma5' computed and stored successfully


**Smart Backfill**: Automatically backfill historical data for all registered factors. This ensures that your factor library has complete historical coverage for analysis and backtesting.

In [7]:
factor_manager.smart_backfill(
    table_name="basic_factors",
    target_days=3650
)

[INFO] frozen.factor.library.inventory [inventory.py:1481]: Backfilling 5 factors: ['close', 'volume', 'returns', 'ma5', 'vwap']
