Statsmodels Statistical Modeling

Statistical modeling with Statsmodels: regression analysis, time series models, and hypothesis testing.

Description

This project demonstrates Statsmodels, a Python library for statistical modeling and econometrics. It covers linear and generalized linear models, time series analysis, hypothesis testing, and diagnostic tools, and is intended as a worked reference for statistical analysis and econometric modeling.

Features

Linear and GLM regression - OLS, GLM with multiple families, comprehensive diagnostics
Time series analysis - ARIMA, SARIMA, exponential smoothing, decomposition, forecasting
Advanced time series - Automatic ARIMA order selection, seasonal (SARIMA) models, stationarity tests
Hypothesis testing - T-tests, ANOVA, chi-square, normality tests, non-parametric tests
Statistical diagnostics - Multicollinearity, heteroscedasticity, autocorrelation, influential points
Econometric modeling - VAR, VARMAX, cointegration tests, impulse response functions, Granger causality
Model selection - Stepwise selection, model comparison, information criteria
Model evaluation - Cross-validation, time series CV, multiple metrics, learning curves
Feature selection - VIF-based removal, correlation filtering
Data preprocessing - Missing value handling, outlier detection/removal, scaling, stationarity transformation
Visualization utilities - Comprehensive plotting functions for all analyses
Bayesian statistics - Bayesian inference, posterior distributions, Bayes factors
Panel data analysis - Fixed effects, random effects, Hausman test
Model persistence - Save/load models, model serialization, metadata management
Automated reporting - Generate comprehensive reports in TXT and HTML formats
Performance benchmarking - Model comparison, execution time profiling, memory usage

Technologies

Python 3.8+
Statsmodels
Pandas
NumPy
Matplotlib
Seaborn
Scikit-learn
SciPy
Jupyter Notebook

Installation

pip install -r requirements.txt

Usage

Linear Regression

from regression_analysis import LinearRegressionModel

# Create and fit model
model = LinearRegressionModel()
model.fit(X, y)
model.summary()

Time Series Analysis

from time_series_analysis import TimeSeriesModel

# Create and fit time series model
ts_model = TimeSeriesModel()
ts_model.fit(data)
ts_model.forecast(steps=10)

Hypothesis Testing

from hypothesis_testing import StatisticalTests

# Perform statistical tests
tests = StatisticalTests()
tests.t_test(data)
tests.chi_square_test(data)
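
The two tests named above can also be run with SciPy, which is listed under Technologies. This sketch uses made-up samples and a made-up 2x2 contingency table; the choice of Welch's t-test (equal_var=False) is an assumption, since the README does not say which variant StatisticalTests applies.

```python
import numpy as np
from scipy import stats

# Two synthetic samples (assumption) whose true means differ by 1
rng = np.random.default_rng(2)
group_a = rng.normal(loc=0.0, scale=1.0, size=100)
group_b = rng.normal(loc=1.0, scale=1.0, size=100)

# Welch's two-sample t-test (does not assume equal variances)
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Chi-square test of independence on a 2x2 table of counts (made up)
table = np.array([[30, 10], [20, 40]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

print(f"t = {t_stat:.2f}, p = {t_p:.4g}")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {chi_p:.4g}")
```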

Model Selection

from model_selection import ModelSelection

# Compare multiple models
selector = ModelSelection()
comparison = selector.compare_models(X, y, models_dict)

# Stepwise feature selection
features, model = selector.stepwise_selection(X, y)

Model Evaluation

from model_evaluation import ModelEvaluation

# Cross-validation
evaluator = ModelEvaluation()
cv_results = evaluator.cross_validate(X, y, model_func, cv_folds=5)

# Calculate metrics
metrics = evaluator.calculate_metrics(y_true, y_pred)

Advanced Time Series

from advanced_time_series import SARIMAModel, AutoARIMA

# SARIMA model
sarima = SARIMAModel()
sarima.fit(data, order=(1,1,1), seasonal_order=(1,1,1,12))

# Auto ARIMA selection
auto_arima = AutoARIMA()
best_model = auto_arima.auto_select(data)

Data Preprocessing

from data_preprocessing import DataPreprocessor

# Remove outliers and scale the data
preprocessor = DataPreprocessor()
cleaned_data = preprocessor.remove_outliers(data)
scaled_data = preprocessor.scale_data(data, method='standard')
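
To make the two steps concrete, here is a minimal pandas sketch of outlier removal followed by standard scaling. The 1.5 x IQR rule is an assumption; the README does not state which criterion remove_outliers applies.

```python
import pandas as pd

# Small made-up sample with one obvious outlier
values = pd.Series([10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 55.0])

# Keep points within 1.5 * IQR of the quartiles
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
mask = values.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = values[mask]

# Standard scaling: zero mean, unit (population) standard deviation
scaled = (cleaned - cleaned.mean()) / cleaned.std(ddof=0)

print(cleaned.tolist())
print(scaled.round(3).tolist())
```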

Visualization

from visualization_utils import StatisticalVisualizations

# Create comprehensive plots
viz = StatisticalVisualizations()
viz.plot_correlation_matrix(data)
viz.plot_residual_analysis(residuals, fitted_values)

Bayesian Statistics

from bayesian_statistics import BayesianAnalysis

# Bayesian t-test
result = BayesianAnalysis.bayesian_ttest(sample1, sample2)

# Bayesian linear regression
bayesian_result = BayesianAnalysis.bayesian_linear_regression(X, y)

Panel Data Analysis

from panel_data_analysis import PanelDataAnalysis

# Prepare and analyze panel data
panel = PanelDataAnalysis()
panel.prepare_panel_data(df, 'entity', 'time', ['X1', 'X2', 'y'])
fe_model = panel.fixed_effects_regression('y', ['X1', 'X2'])

Model Persistence

from model_persistence import ModelPersistence

# Save and load models
persistence = ModelPersistence()
persistence.save_model(model, 'my_model', metadata={'r_squared': 0.95})
loaded_model, metadata = persistence.load_model('saved_models/my_model.pkl')

Automated Reporting

from automated_reporting import AutomatedReport

# Generate comprehensive reports
reporter = AutomatedReport()
reporter.generate_regression_report(model, X, y)
reporter.save_report('analysis_report', format='html')

Performance Benchmarking

from performance_benchmarking import PerformanceBenchmark

# Benchmark model performance
benchmark = PerformanceBenchmark()
comparison = benchmark.compare_models(models_dict, X, y)

Project Structure

statsmodels-statistical/
├── README.md
├── requirements.txt
├── LICENSE
├── index.html
├── regression_analysis.py          # Linear and GLM regression
├── time_series_analysis.py         # Basic time series models
├── advanced_time_series.py         # SARIMA, Auto ARIMA
├── hypothesis_testing.py           # Statistical tests
├── statistical_diagnostics.py      # Model diagnostics
├── econometric_modeling.py         # VAR, cointegration
├── model_selection.py              # Model comparison, stepwise selection
├── model_evaluation.py             # Cross-validation, metrics
├── data_preprocessing.py           # Data cleaning, scaling
├── visualization_utils.py          # Advanced plotting
├── bayesian_statistics.py          # Bayesian inference
├── panel_data_analysis.py          # Panel data models
├── model_persistence.py            # Model saving/loading
├── automated_reporting.py          # Report generation
├── performance_benchmarking.py     # Performance profiling
├── notebooks/
│   ├── 01_linear_regression.ipynb
│   ├── 02_time_series.ipynb
│   ├── 03_hypothesis_testing.ipynb
│   └── 04_econometric_modeling.ipynb
├── data/
│   └── sample_data.csv
└── examples/
    ├── regression_example.py
    ├── time_series_example.py
    └── hypothesis_testing_example.py

Author

RSK World

Website: https://rskworld.in
Email: help@rskworld.in
Phone: +91 93305 39277

License

This project is provided as educational material for statistical modeling and analysis.