Tea is a domain-specific programming language that automates statistical test selection and execution. Tea is currently written in/for Python.
Tea has an academic research paper.
Users provide 5 pieces of information:
- the dataset of interest,
- the variables in the dataset they want to analyze,
- the study design (e.g., independent, dependent variables),
- the assumptions they make about the data based on domain knowledge (e.g., a variable is normally distributed), and
- a hypothesis.
Tea then "compiles" these into logical constraints to select valid statistical tests. A test is considered valid if and only if all the assumptions it makes about the data (e.g., normal distribution, equal variance between groups) hold. Finally, Tea executes the valid tests.
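To make the validity rule concrete, here is a minimal sketch of "a test is valid if and only if all of its assumptions hold." It is not Tea's implementation: Tea compiles to logical constraints, while this sketch uses a plain set check. The catalog `TEST_ASSUMPTIONS`, the assumption names, and `select_valid_tests` are all hypothetical and simplified for illustration.

```python
# Hypothetical, simplified catalog: each test maps to the assumptions it
# makes about the data. These names are illustrative, not Tea's own.
TEST_ASSUMPTIONS = {
    "students_t": {"normal_distribution", "equal_variance", "independent_observations"},
    "welchs_t": {"normal_distribution", "independent_observations"},
    "mann_whitney_u": {"independent_observations"},
}


def select_valid_tests(holding, catalog=TEST_ASSUMPTIONS):
    """Return, sorted by name, the tests whose assumptions all hold.

    `holding` is the set of properties known (or assumed) to hold for the data.
    A test is kept only if its required assumptions are a subset of `holding`.
    """
    holding = set(holding)
    return sorted(name for name, required in catalog.items() if required <= holding)


# Equal variance is not known to hold, so Student's t-test is excluded.
valid = select_valid_tests({"normal_distribution", "independent_observations"})
print(valid)
```

In this sketch, dropping a single property from `holding` removes every test that depends on it, which mirrors the "if and only if" rule described above.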
Tea currently provides a module to conduct Null Hypothesis Significance Testing (NHST).
We are actively working on expanding the kinds of analyses Tea can support. Ideas we are considering include linear modeling and Bayesian inference.
pip install tealang
See community examples here. If you have trouble using Tea with your use case, feel free to open an issue, and we'll try to help.
Step through a more guided, thorough documentation and a worked example.
For now, please cite:
@article{JunEtAl2019:Tea,
title={Tea: A High-level Language and Runtime System for Automating Statistical Analysis},
author={Jun, Eunice and Daum, Maureen and Roesch, Jared and Chasins, Sarah E. and Berger, Emery D. and Just, Rene and Reinecke, Katharina},
journal={Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology (UIST)},
year={2019}
}
Tea is currently a research prototype. Our constraint solver is based on statistical texts (see our paper for more info).
If you find any bugs, please let us know (email Eunice at emjun [at] cs.washington.edu)!
This is great! We're excited to have new collaborators. :)
To contribute code, please see docs and guidelines and open an issue or pull request.
If you want to use Tea for a project, talk about Tea's design, or anything else, please get in touch: emjun [at] cs.washington.edu!
Please find more information at our website.
Please reach out! We are nice :) Email Eunice at emjun [at] cs.washington.edu!
Python is a common language for data science. We hope Tea can easily integrate into user workflows.
Tea accepts data either as a CSV or a Pandas DataFrame. Tea assumes data is in "long format."
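As an illustration of what "long format" commonly means (one row per observation, one column per variable), the sketch below reshapes a small "wide" table into long form using only the standard library. The column names, the sample values, and the helper `wide_to_long` are hypothetical; check Tea's documentation for the exact layout it expects. With a Pandas DataFrame, `pandas.melt` performs the same kind of reshaping.

```python
def wide_to_long(rows, id_column, variable_name="condition", value_name="value"):
    """Turn one-row-per-subject records into one-row-per-observation records.

    Every column other than `id_column` is treated as a separate measurement
    and becomes its own row in the output.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each output row instead
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


# Made-up wide data: one row per participant, one column per condition.
wide = [
    {"participant": "p1", "drug_a": 5.1, "drug_b": 6.3},
    {"participant": "p2", "drug_a": 4.8, "drug_b": 5.9},
]

long_rows = wide_to_long(wide, id_column="participant", value_name="score")
for record in long_rows:
    print(record)
```

Each participant now contributes two rows, one per condition, with the measured value in a single `score` column.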