First Steps

Raphael Constantinis edited this page Jul 23, 2025 · 1 revision

Welcome to the entropic_measurement library! This guide will help you get started with measuring entropy and applying bias correction techniques to your data.

Quick Start

Installation

First, ensure you have the library installed:

pip install entropic-measurement

Basic Measurement Class Instantiation

The core of the library is the Measurement class, which handles data collection and basic statistical operations:

from entropic_measurement import Measurement

# Create a measurement instance with your data
data = [1, 2, 3, 4, 5, 2, 3, 1, 4, 2]  # Example discrete data
measurement = Measurement(data)

# Basic statistics
print(f"Sample size: {measurement.size}")
print(f"Unique values: {measurement.unique_count}")
print(f"Data range: {measurement.range}")

EntropyEstimator Usage

The EntropyEstimator class provides various entropy estimation methods:

from entropic_measurement import EntropyEstimator

# Initialize the estimator
estimator = EntropyEstimator(measurement)

# Calculate different entropy estimates
naive_entropy = estimator.naive()  # Simple frequency-based
mle_entropy = estimator.mle()      # Maximum likelihood estimate
jackknife_entropy = estimator.jackknife()  # Jackknife bias correction

print(f"Naive entropy: {naive_entropy:.3f}")
print(f"MLE entropy: {mle_entropy:.3f}")
print(f"Jackknife entropy: {jackknife_entropy:.3f}")
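For intuition about what estimators of this kind compute, here is a minimal standalone sketch that does not use entropic_measurement. It assumes the textbook definitions: plug-in entropy measured in nats, and the standard leave-one-out jackknife. The helper names plugin_entropy and jackknife_entropy are hypothetical, and the library's own implementation and units may differ.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in Shannon entropy (nats) from empirical frequencies."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def jackknife_entropy(samples):
    """Leave-one-out jackknife: n * H - (n - 1) * mean(H without one sample)."""
    n = len(samples)
    if n < 2:
        raise ValueError("jackknife needs at least two samples")
    counts = Counter(samples)
    h_full = plugin_entropy(samples)
    loo_sum = 0.0
    # Removing any one observation of a given value yields the same
    # leave-one-out estimate, so compute it once per value and weight by count.
    for value, c in counts.items():
        reduced = Counter(counts)
        reduced[value] -= 1
        h_loo = -sum(
            (k / (n - 1)) * math.log(k / (n - 1))
            for k in reduced.values()
            if k > 0
        )
        loo_sum += c * h_loo
    return n * h_full - (n - 1) * (loo_sum / n)


data = [1, 2, 3, 4, 5, 2, 3, 1, 4, 2]
print(f"Plug-in:   {plugin_entropy(data):.3f} nats")
print(f"Jackknife: {jackknife_entropy(data):.3f} nats")
```

Because entropy is concave, the jackknife value in this sketch is never below the plug-in value, which matches the direction of the small-sample bias it is meant to offset.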

Bias Correction Workflow

Frequency-based entropy estimates tend to underestimate the true entropy, especially when the sample is small relative to the number of distinct values. Here's a typical bias correction workflow:

from entropic_measurement import Measurement, EntropyEstimator, BiasCorrector

# 1. Prepare your data
data = [1, 2, 3, 4, 5, 2, 3, 1, 4, 2]  # Example data; replace with your own discrete observations
measurement = Measurement(data)

# 2. Initialize estimator and bias corrector
estimator = EntropyEstimator(measurement)
corrector = BiasCorrector(estimator)

# 3. Apply bias correction methods
corrected_entropy = corrector.apply_correction(
    method='bootstrap',  # Options: 'bootstrap', 'jackknife', 'analytical'
    iterations=1000      # For bootstrap methods
)

# 4. Compare results
original = estimator.naive()
print(f"Original estimate: {original:.3f}")
print(f"Bias-corrected estimate: {corrected_entropy:.3f}")
print(f"Bias correction: {corrected_entropy - original:.3f}")
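To make the idea behind a bootstrap correction concrete, the following standalone sketch resamples the data with replacement, estimates the bias as the gap between the average resampled estimate and the original estimate, and subtracts it. This is the generic bootstrap bias correction, written without the library; the function name bootstrap_corrected_entropy and the seed parameter are hypothetical, and nothing here is claimed to match what BiasCorrector does internally.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in Shannon entropy (nats) from empirical frequencies."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def bootstrap_corrected_entropy(samples, iterations=1000, seed=0):
    """Bootstrap bias correction: 2 * H(original) - mean(H(resamples))."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(samples)
    h_hat = plugin_entropy(samples)
    boot_mean = sum(
        plugin_entropy(rng.choices(samples, k=n)) for _ in range(iterations)
    ) / iterations
    # Estimated bias is (boot_mean - h_hat); subtracting it from h_hat
    # gives 2 * h_hat - boot_mean.
    return 2 * h_hat - boot_mean


data = [1, 2, 3, 4, 5, 2, 3, 1, 4, 2]
original = plugin_entropy(data)
corrected = bootstrap_corrected_entropy(data, iterations=1000)
print(f"Original estimate:       {original:.3f} nats")
print(f"Bias-corrected estimate: {corrected:.3f} nats")
```

More iterations reduce the Monte Carlo noise in the correction but do not remove the approximation inherent in treating the sample as the population.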

Working with Continuous Data

For continuous data, you'll need to discretize first:

from entropic_measurement import Measurement, EntropyEstimator, discretize

# Continuous data example
continuous_data = [1.23, 2.45, 1.67, 3.21, 2.89, 1.45]

# Discretize using equal-width binning; keep the bin count small relative to the sample size
discrete_data = discretize(continuous_data, method='equal_width', bins=3)
measurement = Measurement(discrete_data)

# Proceed with entropy estimation
estimator = EntropyEstimator(measurement)
entropy = estimator.jackknife()
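Equal-width binning itself is simple enough to sketch by hand. The version below is a plain-Python illustration of the general technique, not the library's discretize function: it splits the observed range into bins of equal width and maps each value to a bin index. The name equal_width_bins is hypothetical, and details such as edge handling are assumptions.

```python
def equal_width_bins(values, bins):
    """Map each value to a bin index in 0..bins-1 over the observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in a single bin.
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would land on the right edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


continuous_data = [1.23, 2.45, 1.67, 3.21, 2.89, 1.45]
labels = equal_width_bins(continuous_data, bins=3)
print(labels)  # → [0, 1, 0, 2, 2, 0]
```

The resulting integer labels are discrete data in the same sense as the earlier examples, so they can be passed on to any entropy estimator.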

Key Concepts

Sample Size Considerations

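  • Small samples (n < 100): Use a bias correction method such as jackknife or bootstrap
  • Medium samples (100 ≤ n < 1000): Analytical corrections may suffice
  • Large samples (n ≥ 1000): Naive estimates are typically reliable, provided the number of distinct values is small relative to n

These thresholds are rules of thumb. The size of the bias depends on the sample size relative to the number of distinct values, not on n alone.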
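A common analytical correction is the Miller-Madow adjustment, which adds (K - 1) / (2n) nats to the plug-in estimate, where K is the number of distinct values observed and n is the sample size. The standalone sketch below uses that standard formula to show how the first-order bias shrinks with n and grows with K. It is not taken from the library, and whether the library's 'analytical' option uses this correction is not stated in this guide; the function names are hypothetical.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in Shannon entropy (nats) from empirical frequencies."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())


def first_order_bias(k, n):
    """Approximate magnitude of the plug-in bias in nats: (K - 1) / (2n)."""
    return (k - 1) / (2 * n)


def miller_madow_entropy(samples):
    """Plug-in entropy plus the Miller-Madow correction term."""
    k = len(set(samples))
    return plugin_entropy(samples) + first_order_bias(k, len(samples))


# Compare a small alphabet (K = 5) with a large one (K = 200).
for n in (50, 500, 5000):
    print(f"n={n:5d}  K=5: {first_order_bias(5, n):.4f}  K=200: {first_order_bias(200, n):.4f}")
```

With 200 distinct values, the approximate bias at n = 500 is still about 0.2 nats, which is why a sample that counts as "medium" by n alone may need correction.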

Choosing Estimation Methods

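  • Naive: Fast, but biased downward for small samples
  • MLE: The plug-in estimate built from maximum-likelihood (empirical) probabilities; it needs no knowledge of the true distribution, but it also underestimates entropy for small samples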
  • Jackknife: Good general-purpose bias correction
  • Bootstrap: Robust for complex bias patterns

Data Quality

  • Ensure your data represents the distribution you want to measure
  • Check for outliers that might affect discretization
  • Consider the appropriate number of bins for continuous data

Next Steps

Once you're comfortable with these basics, explore these advanced topics:

Common Pitfalls

  1. Over-binning continuous data: Too many bins can lead to sparse data and biased estimates
  2. Under-binning: Too few bins lose important distributional information
  3. Ignoring sample size: Small samples require bias correction
  4. Wrong estimation method: Choose methods appropriate for your data characteristics
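The binning pitfalls are easy to see numerically. The standalone sketch below draws 50 values from a standard normal distribution, bins them with an increasing number of equal-width bins, and prints the plug-in entropy of the bin labels. It uses only the standard library; the helper names are hypothetical and the binning rule is an assumption, not the library's discretize function.

```python
import math
import random
from collections import Counter


def plugin_entropy(labels):
    """Plug-in Shannon entropy (nats) from empirical frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def equal_width_bins(values, bins):
    """Map each value to a bin index in 0..bins-1 over the observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy_by_bin_count(values, bin_counts):
    """Return {bins: plug-in entropy of the binned values}."""
    return {b: plugin_entropy(equal_width_bins(values, b)) for b in bin_counts}


rng = random.Random(42)  # fixed seed for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
ceiling = math.log(len(sample))  # entropy when every sample sits in its own bin

for bins, h in entropy_by_bin_count(sample, (2, 4, 16, 256, 4096)).items():
    print(f"bins={bins:5d}  entropy={h:.3f} nats  (ceiling log(n)={ceiling:.3f})")
```

The plug-in entropy of n samples can never exceed log(n). As the bin count grows, the estimate approaches that ceiling and ends up reflecting the sample size rather than the shape of the distribution, while very few bins hide most of the shape.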

Getting Help


This guide provides a foundation for using the entropic_measurement library. For production use, always validate your results and consider the specific requirements of your application.
