rskworld/statsmodels-statistical

Statsmodels Statistical Modeling

Statistical modeling with Statsmodels, covering regression analysis, time series models, and hypothesis testing.

Description

This project demonstrates Statsmodels, a Python library for statistical modeling and econometrics. It covers linear and generalized linear models, time series analysis, hypothesis testing, and diagnostic tools, and is intended for statistical analysis and econometric modeling work.

Features

  • Linear and GLM regression - OLS, GLM with multiple families, comprehensive diagnostics
  • Time series analysis - ARIMA, SARIMA, exponential smoothing, decomposition, forecasting
  • Advanced time series - Auto ARIMA selection, SARIMA models, comprehensive stationarity tests
  • Hypothesis testing - T-tests, ANOVA, chi-square, normality tests, non-parametric tests
  • Statistical diagnostics - Multicollinearity, heteroscedasticity, autocorrelation, influential points
  • Econometric modeling - VAR, VARMAX, cointegration tests, impulse response functions, Granger causality
  • Model selection - Stepwise selection, model comparison, information criteria
  • Model evaluation - Cross-validation, time series CV, multiple metrics, learning curves
  • Feature selection - VIF-based removal, correlation filtering
  • Data preprocessing - Missing value handling, outlier detection/removal, scaling, stationarity transformation
  • Visualization utilities - Comprehensive plotting functions for all analyses
  • Bayesian statistics - Bayesian inference, posterior distributions, Bayes factors
  • Panel data analysis - Fixed effects, random effects, Hausman test
  • Model persistence - Save/load models, model serialization, metadata management
  • Automated reporting - Generate comprehensive reports in TXT and HTML formats
  • Performance benchmarking - Model comparison, execution time profiling, memory usage

Technologies

  • Python 3.8+
  • Statsmodels
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn
  • SciPy
  • Jupyter Notebook

Installation

pip install -r requirements.txt

Usage

Linear Regression

from regression_analysis import LinearRegressionModel

# Create and fit model
model = LinearRegressionModel()
model.fit(X, y)
model.summary()

Time Series Analysis

from time_series_analysis import TimeSeriesModel

# Create and fit time series model
ts_model = TimeSeriesModel()
ts_model.fit(data)
ts_model.forecast(steps=10)

Hypothesis Testing

from hypothesis_testing import StatisticalTests

# Perform statistical tests
tests = StatisticalTests()
tests.t_test(data)
tests.chi_square_test(data)
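
The two tests above have direct counterparts in SciPy, which is listed under Technologies. The sketch below runs a two-sample Welch t-test and a chi-square test of independence on made-up data. The inputs that `StatisticalTests` expects may differ.

```python
import numpy as np
from scipy import stats

# Two synthetic groups whose means differ by 1 (assumption).
rng = np.random.default_rng(2)
group_a = rng.normal(loc=0.0, scale=1.0, size=100)
group_b = rng.normal(loc=1.0, scale=1.0, size=100)

# Welch's t-test: does not assume equal variances.
t_stat, t_pvalue = stats.ttest_ind(group_a, group_b, equal_var=False)

# Chi-square test of independence on a 2x2 contingency table of counts.
table = np.array([[30, 10],
                  [20, 40]])
chi2_stat, chi2_pvalue, dof, expected = stats.chi2_contingency(table)

print("t-test p-value:", t_pvalue)
print("chi-square p-value:", chi2_pvalue, "dof:", dof)
```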

Model Selection

from model_selection import ModelSelection

# Compare multiple models
selector = ModelSelection()
comparison = selector.compare_models(X, y, models_dict)

# Stepwise feature selection
features, model = selector.stepwise_selection(X, y)

Model Evaluation

from model_evaluation import ModelEvaluation

# Cross-validation
evaluator = ModelEvaluation()
cv_results = evaluator.cross_validate(X, y, model_func, cv_folds=5)

# Calculate metrics
metrics = evaluator.calculate_metrics(y_true, y_pred)

Advanced Time Series

from advanced_time_series import SARIMAModel, AutoARIMA

# SARIMA model
sarima = SARIMAModel()
sarima.fit(data, order=(1,1,1), seasonal_order=(1,1,1,12))

# Auto ARIMA selection
auto_arima = AutoARIMA()
best_model = auto_arima.auto_select(data)

Data Preprocessing

from data_preprocessing import DataPreprocessor

# Handle missing values and outliers
preprocessor = DataPreprocessor()
cleaned_data = preprocessor.remove_outliers(data)
scaled_data = preprocessor.scale_data(data, method='standard')
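
Outlier removal and standard scaling can be sketched with pandas alone. This example uses the common 1.5 x IQR rule and z-score scaling; both are assumptions, since the rule applied by `remove_outliers` is not documented here.

```python
import pandas as pd

# Small example series with one obvious outlier (assumption).
data = pd.Series([10, 11, 12, 11, 10, 12, 11, 100], dtype=float)

# IQR rule: keep values within 1.5 * IQR of the quartiles.
q1, q3 = data.quantile(0.25), data.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = data[(data >= lower) & (data <= upper)]

# Standard scaling: zero mean, unit (population) standard deviation.
scaled = (cleaned - cleaned.mean()) / cleaned.std(ddof=0)

print(cleaned.tolist())
print(scaled.round(3).tolist())
```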

Visualization

from visualization_utils import StatisticalVisualizations

# Create comprehensive plots
viz = StatisticalVisualizations()
viz.plot_correlation_matrix(data)
viz.plot_residual_analysis(residuals, fitted_values)

Bayesian Statistics

from bayesian_statistics import BayesianAnalysis

# Bayesian t-test
result = BayesianAnalysis.bayesian_ttest(sample1, sample2)

# Bayesian linear regression
bayesian_result = BayesianAnalysis.bayesian_linear_regression(X, y)
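
One simple form of Bayesian linear regression has a closed-form posterior: a zero-mean Gaussian prior on the coefficients with a known noise variance. The NumPy sketch below computes that posterior. The prior, the known noise level, and the data are all assumptions, and `bayesian_linear_regression` may use a different model or a sampler.

```python
import numpy as np

# Synthetic data with known noise standard deviation (assumption).
rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([1.0, 2.0])
noise_sd = 0.5
y = X @ true_beta + rng.normal(scale=noise_sd, size=n)

# Prior: beta ~ Normal(0, prior_sd^2 * I).
prior_sd = 10.0
prior_precision = np.eye(X.shape[1]) / prior_sd**2

# Conjugate update: precisions add, and the mean is precision-weighted.
post_precision = prior_precision + X.T @ X / noise_sd**2
post_cov = np.linalg.inv(post_precision)
post_mean = post_cov @ (X.T @ y) / noise_sd**2
post_sd = np.sqrt(np.diag(post_cov))

print("posterior mean:", post_mean)
print("posterior sd:", post_sd)
```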

Panel Data Analysis

from panel_data_analysis import PanelDataAnalysis

# Prepare and analyze panel data
panel = PanelDataAnalysis()
panel.prepare_panel_data(df, 'entity', 'time', ['X1', 'X2', 'y'])
fe_model = panel.fixed_effects_regression('y', ['X1', 'X2'])
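
A fixed effects regression can be reproduced in Statsmodels by adding one dummy per entity (the least squares dummy variable approach). The sketch below does this through the formula interface on a synthetic panel. The column names mirror the snippet above, but the data and the estimation route are assumptions about what `fixed_effects_regression` does.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic balanced panel: 20 entities observed over 10 periods (assumption).
rng = np.random.default_rng(8)
n_entities, n_periods = 20, 10
df = pd.DataFrame({
    "entity": np.repeat(np.arange(n_entities), n_periods),
    "time": np.tile(np.arange(n_periods), n_entities),
})

# Entity effects are correlated with X1, which biases a pooled regression.
effect = rng.normal(scale=2.0, size=n_entities)[df["entity"].to_numpy()]
df["X1"] = rng.normal(size=len(df)) + 0.5 * effect
df["X2"] = rng.normal(size=len(df))
df["y"] = 1.5 * df["X1"] - 0.7 * df["X2"] + effect + rng.normal(scale=0.5, size=len(df))

# C(entity) adds one dummy per entity, absorbing the entity effects.
fe_fit = smf.ols("y ~ X1 + X2 + C(entity)", data=df).fit()
pooled_fit = smf.ols("y ~ X1 + X2", data=df).fit()

print("fixed effects X1:", fe_fit.params["X1"])
print("pooled X1:", pooled_fit.params["X1"])
```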

Model Persistence

from model_persistence import ModelPersistence

# Save and load models
persistence = ModelPersistence()
persistence.save_model(model, 'my_model', metadata={'r_squared': 0.95})
loaded_model, metadata = persistence.load_model('saved_models/my_model.pkl')

Automated Reporting

from automated_reporting import AutomatedReport

# Generate comprehensive reports
reporter = AutomatedReport()
reporter.generate_regression_report(model, X, y)
reporter.save_report('analysis_report', format='html')

Performance Benchmarking

from performance_benchmarking import PerformanceBenchmark

# Benchmark model performance
benchmark = PerformanceBenchmark()
comparison = benchmark.compare_models(models_dict, X, y)

Project Structure

statsmodels-statistical/
├── README.md
├── requirements.txt
├── LICENSE
├── index.html
├── regression_analysis.py          # Linear and GLM regression
├── time_series_analysis.py         # Basic time series models
├── advanced_time_series.py         # SARIMA, Auto ARIMA
├── hypothesis_testing.py           # Statistical tests
├── statistical_diagnostics.py      # Model diagnostics
├── econometric_modeling.py         # VAR, cointegration
├── model_selection.py              # Model comparison, stepwise selection
├── model_evaluation.py             # Cross-validation, metrics
├── data_preprocessing.py           # Data cleaning, scaling
├── visualization_utils.py          # Advanced plotting
├── bayesian_statistics.py          # Bayesian inference
├── panel_data_analysis.py          # Panel data models
├── model_persistence.py            # Model saving/loading
├── automated_reporting.py          # Report generation
├── performance_benchmarking.py     # Performance profiling
├── notebooks/
│   ├── 01_linear_regression.ipynb
│   ├── 02_time_series.ipynb
│   ├── 03_hypothesis_testing.ipynb
│   └── 04_econometric_modeling.ipynb
├── data/
│   └── sample_data.csv
└── examples/
    ├── regression_example.py
    ├── time_series_example.py
    └── hypothesis_testing_example.py

Author

RSK World

License

This project is provided as educational material for statistical modeling and analysis.
