Statistical modeling with Statsmodels, including regression analysis, time series models, and hypothesis testing.
This project demonstrates Statsmodels, a Python library for statistical modeling and econometrics. It covers linear and generalized linear models, time series analysis, hypothesis tests, and diagnostic tools, and is intended for statistical analysis and econometric modeling.
- Linear and GLM regression - OLS, GLM with multiple families, comprehensive diagnostics
- Time series analysis - ARIMA, SARIMA, exponential smoothing, decomposition, forecasting
- Advanced time series - Auto ARIMA selection, SARIMA models, comprehensive stationarity tests
- Hypothesis testing - T-tests, ANOVA, chi-square, normality tests, non-parametric tests
- Statistical diagnostics - Multicollinearity, heteroscedasticity, autocorrelation, influential points
- Econometric modeling - VAR, VARMAX, cointegration tests, impulse response functions, Granger causality
- Model selection - Stepwise selection, model comparison, information criteria
- Model evaluation - Cross-validation, time series CV, multiple metrics, learning curves
- Feature selection - VIF-based removal, correlation filtering
- Data preprocessing - Missing value handling, outlier detection/removal, scaling, stationarity transformation
- Visualization utilities - Plotting functions for the analyses above, such as correlation matrices and residual plots
- Bayesian statistics - Bayesian inference, posterior distributions, Bayes factors
- Panel data analysis - Fixed effects, random effects, Hausman test
- Model persistence - Save/load models, model serialization, metadata management
- Automated reporting - Generate comprehensive reports in TXT and HTML formats
- Performance benchmarking - Model comparison, execution time profiling, memory usage
- Python 3.8+
- Statsmodels
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- SciPy
- Jupyter Notebook
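
To see which of these requirements are already present in the current environment, a short check along the following lines may help. The distribution names in `REQUIRED` are assumptions (the contents of `requirements.txt` are not shown here), and `installed_versions` is a hypothetical helper, not part of this project.

```python
import sys
from importlib import metadata

# Assumed PyPI distribution names for the requirements listed above
REQUIRED = [
    "statsmodels", "pandas", "numpy", "matplotlib",
    "seaborn", "scikit-learn", "scipy", "notebook",
]


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if sys.version_info < (3, 8):
    print("Python 3.8+ is required, found", sys.version.split()[0])

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```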
pip install -r requirements.txt

from regression_analysis import LinearRegressionModel
# Create and fit model
model = LinearRegressionModel()
model.fit(X, y)
model.summary()

from time_series_analysis import TimeSeriesModel
# Create and fit time series model
ts_model = TimeSeriesModel()
ts_model.fit(data)
ts_model.forecast(steps=10)

from hypothesis_testing import StatisticalTests
# Perform statistical tests
tests = StatisticalTests()
tests.t_test(data)
tests.chi_square_test(data)

from model_selection import ModelSelection
# Compare multiple models
selector = ModelSelection()
comparison = selector.compare_models(X, y, models_dict)
# Stepwise feature selection
features, model = selector.stepwise_selection(X, y)

from model_evaluation import ModelEvaluation
# Cross-validation
evaluator = ModelEvaluation()
cv_results = evaluator.cross_validate(X, y, model_func, cv_folds=5)
# Calculate metrics
metrics = evaluator.calculate_metrics(y_true, y_pred)

from advanced_time_series import SARIMAModel, AutoARIMA
# SARIMA model
sarima = SARIMAModel()
sarima.fit(data, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
# Auto ARIMA selection
auto_arima = AutoARIMA()
best_model = auto_arima.auto_select(data)

from data_preprocessing import DataPreprocessor
# Handle missing values and outliers
preprocessor = DataPreprocessor()
cleaned_data = preprocessor.remove_outliers(data)
scaled_data = preprocessor.scale_data(data, method='standard')

from visualization_utils import StatisticalVisualizations
# Create comprehensive plots
viz = StatisticalVisualizations()
viz.plot_correlation_matrix(data)
viz.plot_residual_analysis(residuals, fitted_values)

from bayesian_statistics import BayesianAnalysis
# Bayesian t-test
result = BayesianAnalysis.bayesian_ttest(sample1, sample2)
# Bayesian linear regression
bayesian_result = BayesianAnalysis.bayesian_linear_regression(X, y)

from panel_data_analysis import PanelDataAnalysis
# Prepare and analyze panel data
panel = PanelDataAnalysis()
panel.prepare_panel_data(df, 'entity', 'time', ['X1', 'X2', 'y'])
fe_model = panel.fixed_effects_regression('y', ['X1', 'X2'])

from model_persistence import ModelPersistence
# Save and load models
persistence = ModelPersistence()
persistence.save_model(model, 'my_model', metadata={'r_squared': 0.95})
loaded_model, metadata = persistence.load_model('saved_models/my_model.pkl')

from automated_reporting import AutomatedReport
# Generate comprehensive reports
reporter = AutomatedReport()
reporter.generate_regression_report(model, X, y)
reporter.save_report('analysis_report', format='html')

from performance_benchmarking import PerformanceBenchmark
# Benchmark model performance
benchmark = PerformanceBenchmark()
comparison = benchmark.compare_models(models_dict, X, y)

statsmodels-statistical/
├── README.md
├── requirements.txt
├── LICENSE
├── index.html
├── regression_analysis.py # Linear and GLM regression
├── time_series_analysis.py # Basic time series models
├── advanced_time_series.py # SARIMA, Auto ARIMA
├── hypothesis_testing.py # Statistical tests
├── statistical_diagnostics.py # Model diagnostics
├── econometric_modeling.py # VAR, cointegration
├── model_selection.py # Model comparison, stepwise selection
├── model_evaluation.py # Cross-validation, metrics
├── data_preprocessing.py # Data cleaning, scaling
├── visualization_utils.py # Advanced plotting
├── bayesian_statistics.py # Bayesian inference
├── panel_data_analysis.py # Panel data models
├── model_persistence.py # Model saving/loading
├── automated_reporting.py # Report generation
├── performance_benchmarking.py # Performance profiling
├── notebooks/
│   ├── 01_linear_regression.ipynb
│   ├── 02_time_series.ipynb
│   ├── 03_hypothesis_testing.ipynb
│   └── 04_econometric_modeling.ipynb
├── data/
│   └── sample_data.csv
└── examples/
    ├── regression_example.py
    ├── time_series_example.py
    └── hypothesis_testing_example.py
RSK World
- Website: https://rskworld.in
- Email: help@rskworld.in
- Phone: +91 93305 39277
This project is provided as educational material for statistical modeling and analysis.