Statistics Open For All the Python Library.
sofastats_lib is a Python library for statistical analysis and reporting
based on the design of the SOFA Statistics package.
The goal was to keep all the tried-and-true elements of the original package, but without any of the original design mistakes.
We also wanted to make SOFA more friendly for beginners.
Let us know if you think we succeeded: grant@sofastatistics.com.
The sofastats distribution package sits on top of the sofastats_lib distribution package and adds a web GUI.
See "How UX Can Improve Your Python Project" by Grant and Charlotte Paton-Simpson
for how we're trying to improve the User Experience (UX) for SOFA users, and for a peek at the GUI.
Configure a Design object, e.g. a CrossTabDesign, a SimpleBarChartDesign, or an AnovaDesign.
See API reference for Charts, Tables, and Statistical Tests
for the full list.
Use the Design's make_output() method to make the output as an HTML file.
If the Design is for a statistical test (rather than a chart or table), you can also use the to_result() method
to generate a results dataclass,
which you can extract details from or just print.
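Because the result is a Python dataclass, it can be printed whole or inspected field by field. The sketch below uses a hypothetical DemoAnovaResult class with made-up field names and placeholder values (assumptions for illustration only, not the library's actual result class) to show both ways of working with such an object.

```python
from dataclasses import dataclass, asdict

# Hypothetical stand-in for a results dataclass. The real class returned by
# to_result(), and its field names, are documented in the API reference.
@dataclass(frozen=True)
class DemoAnovaResult:
    f_statistic: float
    p_value: float
    group_labels: tuple[str, ...]

# Placeholder values only, not real analysis output
result = DemoAnovaResult(f_statistic=1.5, p_value=0.25, group_labels=("A", "B", "C"))

print(result)              # the dataclass repr shows every field
p = result.p_value         # extract a single detail by attribute
details = asdict(result)   # or convert everything to a plain dict
```

With a real Design, the object would come from to_result() rather than being built by hand.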
There are three main settings types:
- Inputs - where is the data source for the design? What sort of data source is it?
- Outputs - where should the output go and what should it look like?
- Analysis Details - which variables are involved, and is any special sorting required? For example, so that '<20' comes before '20-39' even though the default alphabetical order puts them the other way round (see Sorting).
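The sorting problem mentioned above is easy to reproduce in plain Python. This is a minimal sketch of the general idea, not of how sofastats implements custom sort orders. The '40-64' and '65+' labels are extra illustrative values that are not in the text above.

```python
# Age-group labels as they might appear in a data source (illustrative values)
age_groups = ["<20", "20-39", "40-64", "65+"]

# Default alphabetical order compares characters, and digits sort before "<"
alphabetical = sorted(age_groups)

# A custom order is simply the values listed in the sequence we want
custom_order = ["<20", "20-39", "40-64", "65+"]
rank = {label: position for position, label in enumerate(custom_order)}
custom_sorted = sorted(alphabetical, key=rank.__getitem__)

print(alphabetical)   # '<20' ends up last
print(custom_sorted)  # '<20' comes first, as intended
```

In sofastats the custom order is supplied through the sort order YAML file rather than written inline like this (see Sorting).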
There are three alternatives, so you need to select one and provide the necessary details:
- CSV - data will be ingested into the internal sofastats SQLite database (source_table_name is optional - later analyses might be referring to that ingested table, so you might as well give it a friendly name)
- cur, database_engine_name, and source_table_name
- or just a source_table_name (assumed to be using the internal sofastats SQLite database)
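To make the options above more concrete, here is a rough sketch of what ingesting a CSV into a SQLite table involves, and of what a cursor and a table name look like. It uses only the Python standard library. It is not the sofastats ingestion code, and the table name and data are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file on disk (illustrative data only)
csv_text = "Age Group,Country\n<20,NZ\n20-39,USA\n20-39,NZ\n"

con = sqlite3.connect(":memory:")  # in-memory database for the demo
cur = con.cursor()
source_table_name = "demo_people"  # hypothetical friendly table name

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
# Quote column names because they may contain spaces
columns = ", ".join(f'"{name}" TEXT' for name in header)
cur.execute(f'CREATE TABLE "{source_table_name}" ({columns})')
placeholders = ", ".join("?" for _ in header)
cur.executemany(f'INSERT INTO "{source_table_name}" VALUES ({placeholders})', reader)
con.commit()

# The table can now be queried by name through the cursor
cur.execute(f'SELECT COUNT(*) FROM "{source_table_name}"')
n_rows = cur.fetchone()[0]
print(n_rows)
```

A cursor and table name like these correspond to the second alternative. The accepted values for database_engine_name are listed in the API reference.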
Full API here: API reference for Charts, Tables, and Statistical Tests
The main setting needed is the specific file location for the HTML output. This is optional, but you may want to control where the file goes.
Full API here: API reference for Charts, Tables, and Statistical Tests
For example, the SimpleBarChartDesign requires variable_name at the minimum.
This is the hardest step, and we've tried really hard to provide useful information in the output
to make this as easy as possible. If you can think of better ways of providing output or explaining results
let us know at grant@sofastatistics.com.
from sofastats.conf.main import SortOrder
from sofastats.output.charts.bar import SimpleBarChartDesign

chart_design = SimpleBarChartDesign(
    csv_file_path='/path/to/csv',
    output_file_path='/path/to/output/demo_simple_bar_chart_from_csv.html',
    output_title="Simple Bar Chart (Frequencies)",
    show_in_web_browser=True,
    sort_orders_yaml_file_path='/path/to/sort_orders_yaml_file_path',
    style_name='default',
    category_field_name='Age Group',
    category_sort_order=SortOrder.CUSTOM,
    rotate_x_labels=False,
    show_borders=False,
    show_n_records=True,
    x_axis_font_size=12,
)
# Write the chart to the HTML file given by output_file_path
chart_design.make_output()

from sofastats.conf.main import SortOrder
from sofastats.output.stats.anova import AnovaDesign

stats_design = AnovaDesign(
    csv_file_path='/path/to/csv',
    output_file_path='/path/to/output/demo_anova_age_by_country.html',
    output_title='ANOVA',
    show_in_web_browser=True,
    sort_orders_yaml_file_path='/path/to/sort_orders_yaml_file_path',
    style_name='prestige_screen',
    grouping_field_name='Country',
    group_values=['South Korea', 'NZ', 'USA'],
    measure_field_name='Age',
    high_precision_required=False,
    decimal_points=3,
)
# Write the report to the HTML file given by output_file_path
stats_design.make_output()
# Statistical test Designs can also return a results dataclass
print(stats_design.to_result())

See sofastats_lib/examples
Install the sofastats_examples library and run the demo scripts - fake data and a sort order YAML file are included so you can see the code in operation for every chart, table, and statistical report Design type.
API reference for Charts, Tables, and Statistical Tests
If you don't have an existing preference, the best choice for a modern Python package manager is uv.
Start by installing uv as per: Installing uv
Create a project. If you’re using uv, the commands would be:
cd /my/projects/folder
For example:
cd ~/projects
Then initialise the project. Continuing with uv:
uv init my_project_name
For example:
uv init demo
We can check what uv has made by looking at the contents of the new project folder, in this case using the tree command:
As you can see, a lot of boilerplate has been set up.
Don’t worry if you can’t use the tree command on your machine – we just wanted to show you what has been made by uv.
Inside the demo folder, add sofastats_lib to the demo project. Here’s how you do it with uv:
uv add sofastats_lib
Now sofastats_lib is installed as a library, ready to use, somewhere under demo, for example:
demo/.venv/lib/python3.13/site-packages/sofastats_lib/
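One quick way to confirm the install is to ask Python whether it can locate the package from inside the project's environment. This is a minimal sketch using only the standard library. The module name "sofastats" is an assumption taken from the import statements in the examples earlier in this document, and is_importable is a hypothetical helper.

```python
import importlib.util

def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module or package."""
    return importlib.util.find_spec(module_name) is not None

# "sofastats" is assumed from the imports used in the examples above
print(is_importable("sofastats"))
```

Run it with the project's interpreter (for example via uv run) so that the check uses demo/.venv and not the system Python.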


