quickstatandeda

quickstatandeda is a Python library for quick, automatic exploratory data analysis and preliminary statistical analysis. The main edaFeatures() function produces a folder of visualizations and an HTML file that contains all analyses. The library is built on mainstream libraries such as numpy, pandas, scipy, statsmodels, matplotlib, and seaborn.

Make sure every column of your input dataframe has the correct data type before running the analysis! Use pd.to_datetime() and astype() to convert columns as needed. Here is a simple example:

import pandas as pd
x = pd.read_csv('xxx.csv')

x['string_column'] = x['string_column'].astype('string')
x['int_column'] = x['int_column'].astype('int')
x['float_column'] = x['float_column'].astype('float')
x['date_time_column'] = pd.to_datetime(x['date_time_column'])
x['binary_column'] = x['binary_column'].replace({'True':True, 'False':False}).astype('bool')
x['categorical_column'] = x['categorical_column'].astype('category')
x['date_column'] = pd.to_datetime(x['date_column'])
x['datetime_column'] = pd.to_datetime(x['datetime_column'])
x['datetime_tz_column'] = x['datetime_column'].dt.tz_localize('UTC')
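
Before calling edaFeatures(), it can help to confirm that the conversions took effect. The following is a minimal sketch on a small in-memory dataframe; the column names and values are made up for illustration. It passes errors='coerce' to pd.to_datetime() so that unparseable dates become NaT instead of raising, then inspects the resulting data types.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of a CSV file
x = pd.DataFrame({
    'int_column': ['1', '2', '3'],
    'date_column': ['2024-01-01', '2024-02-15', 'not a date'],
    'categorical_column': ['low', 'high', 'low'],
})

x['int_column'] = x['int_column'].astype('int')
# errors='coerce' turns values that cannot be parsed into NaT
x['date_column'] = pd.to_datetime(x['date_column'], errors='coerce')
x['categorical_column'] = x['categorical_column'].astype('category')

# Inspect the resulting data types and count the dates that failed to parse
print(x.dtypes)
n_bad_dates = int(x['date_column'].isna().sum())
print(n_bad_dates)
```

A nonzero count of NaT values is a hint that some rows need cleaning before the analysis is run.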

Note that t tests are conducted only for binary variables (columns with data type object that have exactly two unique values). If you have categorical variables with more than two unique values, use pd.get_dummies() to split them into binary indicator columns, then use loc[] to relabel each indicator with two string values. Here is a simple example:

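import pandas as pd

df = pd.DataFrame({
    'a':['a','b','c']
    })

# One-hot encode: column 'a' is replaced by indicators a_a, a_b, a_c
df = pd.get_dummies(data=df)

# Relabel the a_a indicator as a two-valued string column
is_a = df['a_a'] == 1
df['a_a'] = 'not a'
df.loc[is_a, 'a_a'] = 'a'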
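
To see which columns meet the description above (data type object, exactly two unique values), a small check can be run before the analysis. This is a sketch only: binary_object_columns is a hypothetical helper written for this example, not part of quickstatandeda, and it mirrors the README's description rather than the library's internal logic.

```python
import pandas as pd

def binary_object_columns(df):
    """Return names of object-dtype columns with exactly two unique non-null values."""
    return [
        col for col in df.columns
        if df[col].dtype == object and df[col].nunique(dropna=True) == 2
    ]

# Hypothetical sample data; string columns are cast to object explicitly
df = pd.DataFrame({
    'group': ['treated', 'control', 'treated', 'control'],
    'city': ['A', 'B', 'C', 'A'],
    'score': [1.0, 2.5, 3.0, 4.5],
}).astype({'group': object, 'city': object})

binary_cols = binary_object_columns(df)
print(binary_cols)
```

Here only group qualifies: city has three unique values and score is numeric.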

Installation

Use the package manager pip to install quickstatandeda.

python3 -m pip install quickstatandeda

If you run into version conflicts, try installing into a fresh virtual environment, or run pip install --upgrade <package_name> to upgrade the conflicting package.

Usage

Here is a simple example to generate an analysis report using the edaFeatures function:

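import pandas as pd
from quickstatandeda import edaFeatures

x = pd.read_csv('xxx.csv')                  # input dataframe
y = 'target_column'                         # name of the target column
id = 'id_column_for_paired_t_test'          # ID column used for paired t tests
save_path = 'path_to_save_the_output_files' # folder for the report and visuals
significant_level = 0.05                    # significance level for the tests
file_name = 'name_of_the_output_html_file'  # name of the HTML report

edaFeatures(x, y, id, save_path, significant_level, file_name)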

The outputs are structured as follows:

├── <file_name>.html
├── _visuals
│   ├── <plot1>.png
│   ├── <plot2>.png
│   ├── <plot3>.png
│   └── ...

A _visuals folder is created automatically to hold all the visuals used in the HTML output file. Both the HTML file and the _visuals folder are written to the location given by the save_path input parameter.
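
After a run, the layout above can be checked from Python. The sketch below assumes only the folder name _visuals and the <file_name>.html naming shown in the tree; list_report_outputs is a hypothetical helper for illustration, not a function provided by quickstatandeda.

```python
from pathlib import Path

def list_report_outputs(save_path, file_name):
    """Return the expected HTML report path and the PNG files found under _visuals."""
    root = Path(save_path)
    html_file = root / f'{file_name}.html'
    # Sorted so the listing is stable across runs
    visuals = sorted((root / '_visuals').glob('*.png'))
    return html_file, visuals

html_file, visuals = list_report_outputs(
    'path_to_save_the_output_files', 'name_of_the_output_html_file'
)
print(html_file.exists(), len(visuals))
```

If the report has not been generated yet, the HTML path will not exist and the list of visuals will be empty.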

Contributing

If you find a bug 🐛 or want to propose a change, major or minor, please open an issue in the GitHub repository to discuss it. You are also more than welcome to contact me directly. Feel free to fork the project, make your changes, and submit a pull request.

Note that a simple test file is provided in the test folder. After making changes, run pytest test/ from the top-level folder of the repository to test the package. The tests might take more than 8 minutes to run.

License

MIT
