Dataprep lets you prepare your data using a single library with a few lines of code.
Currently, you can use Dataprep to:
- Collect data from common data sources (through the Data Connector module)
- Do your exploratory data analysis (through the EDA module)
- ...more modules are coming
pip install dataprep
Examples & Usages
The following examples can give you an impression of what dataprep can do:
- Documentation: Data Connector
- Documentation: EDA
- EDA Case Study: Titanic
- EDA Case Study: House Price
Common tasks during the exploratory data analysis stage include taking a quick look at the column distributions and understanding the correlations between columns.
The EDA module groups these tasks into functions, so you can finish each one with a single function call.
- Want to understand the distributions for each DataFrame column? Use `plot`.
- Want to understand the correlation between columns? Use `plot_correlation`.
- Or, if you want to understand the impact of the missing values for each column, use `plot_missing`.
- You can drill down to get more information by giving `plot_missing` a column name.
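To make the questions in the list above concrete, here is a minimal plain-Python sketch of two of the underlying quantities: the share of missing values in a column and the Pearson correlation between two columns. It does not use Dataprep's API and is not how the EDA module is implemented; the toy table and the helper names `missing_rate` and `pearson` are hypothetical, chosen only for illustration.

```python
from math import sqrt

# Toy table stored as a dict of columns; None marks a missing value (hypothetical data).
table = {
    "age": [22, 38, None, 35, 54],
    "fare": [7.0, 71.0, 8.0, 53.0, 52.0],
}


def missing_rate(column):
    """Return the fraction of entries in a column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def pearson(xs, ys):
    """Return the Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# One missing-value rate per column, and one correlation for the column pair.
rates = {name: missing_rate(column) for name, column in table.items()}
age_fare_corr = pearson(table["age"], table["fare"])

print(rates)
print(round(age_fare_corr, 3))
```

The sketch drops rows with a missing value before computing the correlation, which is one common convention among several; the EDA functions cover these tasks in a single call and present the results visually.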
Don't forget to check out the examples folder for detailed demonstrations!
You can download Yelp business search results into a pandas DataFrame with two lines of code, without digging deep into the Yelp documentation! Moreover, Data Connector automatically handles pagination for you, so you can specify the desired number of returned results without worrying about the API's count-per-request limit.
The code requests 120 records even though Yelp allows you to fetch only 50 per request.
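The idea behind automatic pagination can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not Data Connector's actual implementation: `fetch_page` is a hypothetical stand-in for a single API request, and `fetch_records`, `page_size`, and `request_sizes` are made-up names.

```python
# Sizes of the individual requests, recorded so the paging pattern is visible.
request_sizes = []


def fetch_page(offset, limit):
    """Hypothetical stand-in for one API request.

    Returns at most `limit` fake records starting at position `offset`.
    """
    total_available = 1000  # pretend the remote service holds 1000 records
    end = min(offset + limit, total_available)
    request_sizes.append(limit)
    return [{"id": i} for i in range(offset, end)]


def fetch_records(count, page_size=50):
    """Collect `count` records while never asking for more than `page_size` at once."""
    records = []
    while len(records) < count:
        # Ask only for what is still missing, capped at the per-request limit.
        limit = min(page_size, count - len(records))
        page = fetch_page(offset=len(records), limit=limit)
        if not page:
            break  # the service ran out of records early
        records.extend(page)
    return records


results = fetch_records(120)
print(len(results))    # 120
print(request_sizes)   # [50, 50, 20]
```

With a per-request limit of 50, a request for 120 records is split into three calls of 50, 50, and 20 records, and the caller only ever sees the combined result.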
There are many ways to contribute to Dataprep.
- Submit bugs and help us verify fixes as they are checked in.
- Review the source code changes.
- Engage with other Dataprep users and developers on Stack Overflow.
- Help each other in the Dataprep Community Discord and Mailing list & Forum.
- Contribute bug fixes.
- Provide use cases and write down your user experience.
Please take a look at our wiki for development documentation!