First Steps
Welcome to the entropic_measurement library! This guide will help you get started with measuring entropy and applying bias correction techniques to your data.
First, ensure you have the library installed:
pip install entropic-measurement
The core of the library is the Measurement class, which handles data collection and basic statistical operations:
from entropic_measurement import Measurement
# Create a measurement instance with your data
data = [1, 2, 3, 4, 5, 2, 3, 1, 4, 2] # Example discrete data
measurement = Measurement(data)
# Basic statistics
print(f"Sample size: {measurement.size}")
print(f"Unique values: {measurement.unique_count}")
print(f"Data range: {measurement.range}")
The EntropyEstimator class provides various entropy estimation methods:
from entropic_measurement import EntropyEstimator
# Initialize the estimator
estimator = EntropyEstimator(measurement)
# Calculate different entropy estimates
naive_entropy = estimator.naive() # Simple frequency-based
mle_entropy = estimator.mle() # Maximum likelihood estimate
jackknife_entropy = estimator.jackknife() # Jackknife bias correction
print(f"Naive entropy: {naive_entropy:.3f}")
print(f"MLE entropy: {mle_entropy:.3f}")
print(f"Jackknife entropy: {jackknife_entropy:.3f}")
Entropy estimates can be biased, especially with small sample sizes. Here's a typical bias correction workflow:
from entropic_measurement import Measurement, EntropyEstimator, BiasCorrector
# 1. Prepare your data
data = your_discrete_data # Replace with your actual data
measurement = Measurement(data)
# 2. Initialize estimator and bias corrector
estimator = EntropyEstimator(measurement)
corrector = BiasCorrector(estimator)
# 3. Apply bias correction methods
corrected_entropy = corrector.apply_correction(
    method='bootstrap',  # Options: 'bootstrap', 'jackknife', 'analytical'
    iterations=1000      # For bootstrap methods
)
# 4. Compare results
original = estimator.naive()
print(f"Original estimate: {original:.3f}")
print(f"Bias-corrected estimate: {corrected_entropy:.3f}")
print(f"Bias correction: {corrected_entropy - original:.3f}")
For continuous data, you'll need to discretize first:
from entropic_measurement import Measurement, EntropyEstimator, discretize
# Continuous data example
continuous_data = [1.23, 2.45, 1.67, 3.21, 2.89, 1.45]
# Discretize using binning
discrete_data = discretize(continuous_data, method='equal_width', bins=3)  # Few bins: only 6 data points
measurement = Measurement(discrete_data)
# Proceed with entropy estimation
estimator = EntropyEstimator(measurement)
entropy = estimator.jackknife()
Choose an approach based on sample size:
- Small samples (n < 100): Use bias correction methods like jackknife or bootstrap
- Medium samples (100 ≤ n < 1000): Analytical corrections may suffice
- Large samples (n ≥ 1000): Naive estimates are typically reliable, provided the number of distinct values is small relative to n
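One way to see why sample size matters: to first order, a frequency-based (plug-in) estimate is too low by about (K - 1) / (2n) nats, where K is the number of distinct values observed. The Miller-Madow correction adds that term back. The sketch below is plain Python and does not use entropic_measurement; the function names are illustrative, and this guide does not state whether the library's 'analytical' option uses this particular correction.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (empirical-frequency) Shannon entropy in bits."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def miller_madow_term(k, n):
    """First-order bias term (K - 1) / (2n), converted from nats to bits."""
    return (k - 1) / (2 * n * math.log(2))


def miller_madow_entropy(samples):
    """Plug-in entropy plus the Miller-Madow first-order bias term."""
    k = len(set(samples))  # number of distinct values actually observed
    return plugin_entropy(samples) + miller_madow_term(k, len(samples))


data = [1, 2, 3, 4, 5, 2, 3, 1, 4, 2]
print(f"Plug-in:      {plugin_entropy(data):.3f} bits")
print(f"Miller-Madow: {miller_madow_entropy(data):.3f} bits")

# The correction shrinks in proportion to 1/n for a fixed number of values.
for n in (50, 500, 5000):
    print(f"n={n:5d}, K=5: bias term ~ {miller_madow_term(5, n):.4f} bits")
```

Because the term scales with K / n, a sample of 1000 points can still be "small" when the data take hundreds of distinct values.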
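Each estimation method has trade-offs:
- Naive: Fast but biased for small samples
- MLE: The maximum likelihood (plug-in) estimate; despite the name it is not unbiased and tends to underestimate entropy when samples are small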
- Jackknife: Good general-purpose bias correction
- Bootstrap: Robust for complex bias patterns
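To make the jackknife and bootstrap entries concrete, here is a minimal sketch of both corrections applied to a plug-in entropy estimate. It uses only the standard library; the function names, the `seed` parameter, and the specific bootstrap formula (twice the estimate minus the mean of the resampled estimates) are assumptions for illustration and may differ from what EntropyEstimator and BiasCorrector do internally.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (empirical-frequency) Shannon entropy in bits."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def jackknife_entropy(samples):
    """Jackknife estimate: n * H - (n - 1) * mean(leave-one-out estimates)."""
    n = len(samples)
    if n < 2:
        raise ValueError("jackknife needs at least two observations")
    counts = Counter(samples)
    h_full = plugin_entropy(samples)
    # Dropping any one occurrence of a value gives the same leave-one-out
    # estimate, so compute it once per distinct value and weight by its count.
    loo_sum = 0.0
    for value, c in counts.items():
        reduced = counts.copy()
        reduced[value] -= 1
        h_loo = -sum((m / (n - 1)) * math.log2(m / (n - 1))
                     for m in reduced.values() if m > 0)
        loo_sum += c * h_loo
    return n * h_full - (n - 1) * (loo_sum / n)


def bootstrap_corrected_entropy(samples, iterations=1000, seed=0):
    """Bootstrap bias correction: H - (mean(H*) - H) = 2H - mean(H*)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n = len(samples)
    h_full = plugin_entropy(samples)
    boot_mean = sum(
        plugin_entropy(rng.choices(samples, k=n)) for _ in range(iterations)
    ) / iterations
    return 2 * h_full - boot_mean


data = [1, 2, 3, 4, 5, 2, 3, 1, 4, 2]
print(f"Plug-in:   {plugin_entropy(data):.3f} bits")
print(f"Jackknife: {jackknife_entropy(data):.3f} bits")
print(f"Bootstrap: {bootstrap_corrected_entropy(data):.3f} bits")
```

Both corrections push the estimate upward on this sample, which is the expected direction because the plug-in estimate tends to be too low. A corrected value can exceed log2 of the number of observed values; that is a known side effect of bias correction on very small samples, not a bug in the sketch.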
Check data quality before estimating:
- Ensure your data represents the distribution you want to measure
- Check for outliers that might affect discretization
- Consider the appropriate number of bins for continuous data
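The outlier point is easy to demonstrate with equal-width binning. The helper below, `equal_width_bins`, is a hypothetical stand-in written for this example; the library's `discretize` function may handle edges and labels differently.

```python
def equal_width_bins(values, bins):
    """Map each value to a bin index in [0, bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single bin
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin; clamp it back in.
    return [min(int((v - lo) / width), bins - 1) for v in values]


continuous_data = [1.23, 2.45, 1.67, 3.21, 2.89, 1.45]
print(equal_width_bins(continuous_data, bins=3))
# → [0, 1, 0, 2, 2, 0]

# A single extreme value stretches the range so every other point shares a bin.
with_outlier = continuous_data + [50.0]
print(equal_width_bins(with_outlier, bins=3))
# → [0, 0, 0, 0, 0, 0, 2]
```

After the outlier is added, the six original points become indistinguishable, so any entropy computed from the binned data mostly reflects the outlier rather than the distribution of interest.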
Once you're comfortable with these basics, explore these advanced topics:
- Advanced Estimation Methods: Learn about sophisticated entropy estimators
- Bias Correction Techniques: Deep dive into different correction approaches
- Working with Time Series: Apply entropy methods to temporal data
- Comparative Analysis: Tools for comparing entropy across datasets
- Performance Optimization: Speed up computations for large datasets
Common pitfalls to avoid:
- Over-binning continuous data: Too many bins can lead to sparse data and biased estimates
- Under-binning: Too few bins lose important distributional information
- Ignoring sample size: Small samples require bias correction
- Wrong estimation method: Choose methods appropriate for your data characteristics
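The two binning pitfalls can be seen by discretizing one sample at several bin counts and comparing the plug-in entropy with its two hard limits: log2(bins) and log2(n). This is a sketch in plain Python with illustrative helper names and an arbitrary seeded Gaussian sample; it does not call entropic_measurement.

```python
import math
import random
from collections import Counter


def plugin_entropy(labels):
    """Plug-in (empirical-frequency) Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_bins(values, bins):
    """Map each value to a bin index in [0, bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


rng = random.Random(42)  # seeded so the run is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]
ceiling = math.log2(len(sample))  # entropy if every point had its own bin

for bins in (2, 8, 64, 4096):
    h = plugin_entropy(equal_width_bins(sample, bins))
    print(f"bins={bins:5d}  H={h:.3f} bits  "
          f"(log2(bins)={math.log2(bins):.3f}, log2(n)={ceiling:.3f})")
```

With very few bins the estimate is capped by log2(bins) and hides the shape of the data; with very many bins almost every point sits alone, so the estimate drifts toward log2(n) and reports the sample size rather than the distribution.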
If you need help:
- Check the API Reference for detailed function documentation
- See Examples for more use cases
- Visit the FAQ for common questions
- Report issues on our GitHub repository
This guide provides a foundation for using the entropic_measurement library. For production use, always validate your results and consider the specific requirements of your application.