
Validating an agricultural dataset with Python before it is used, and building a data pipeline that ingests and cleans the data at the press of a button.


ngangawairimu/Data-Validation-using-python


Data Validation Project

Objective:

To validate the MD_agric_df dataset against weather station data, ensuring its accuracy and reliability for agricultural insights.

Key Steps:

Data Pipeline Development:

Built an automated data pipeline that ingests and cleans the MD_agric_df and weather datasets, which also made the code easier to read and maintain.

Hypothesis Testing:

Conducted hypothesis testing to evaluate how well the MD_agric_df dataset represents actual weather conditions, focusing on both the means and the variances of the distributions. This involved:

Creating a null hypothesis.
Cleaning and importing the MD_agric_df dataset.
Mapping and comparing it with nearby weather station data.
Performing t-tests to interpret results and validate data reliability.

Data Quality Checks:

Implemented rigorous data validation tests using Python and pytest, checking for:

Correct DataFrame shapes.
Valid column names.
Non-negative elevation values.
Valid crop types and positive rainfall measurements.
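The checks listed above could be written as plain pytest test functions along the following lines. This is a minimal sketch, not the project's actual test suite: the column names, the set of valid crops, and the load_sample_df helper are all hypothetical stand-ins, since the README does not state the real schema.

```python
import pandas as pd

# Hypothetical schema and allowed values, for illustration only.
EXPECTED_COLUMNS = ["Field_ID", "Elevation", "Crop_type", "Rainfall"]
VALID_CROPS = {"maize", "wheat", "rice"}


def load_sample_df() -> pd.DataFrame:
    """Stand-in for the cleaned dataset that the pipeline would produce."""
    return pd.DataFrame(
        {
            "Field_ID": [1, 2, 3],
            "Elevation": [120.5, 0.0, 843.2],
            "Crop_type": ["maize", "wheat", "rice"],
            "Rainfall": [910.0, 1250.5, 640.3],
        }
    )


def test_shape():
    # The frame should have the expected number of rows and columns.
    df = load_sample_df()
    assert df.shape == (3, len(EXPECTED_COLUMNS))


def test_column_names():
    # Column names must match the expected schema, in order.
    assert list(load_sample_df().columns) == EXPECTED_COLUMNS


def test_elevation_non_negative():
    # Elevation may be zero but never negative.
    assert (load_sample_df()["Elevation"] >= 0).all()


def test_crop_types_valid():
    # Every crop type must belong to the allowed set.
    assert load_sample_df()["Crop_type"].isin(VALID_CROPS).all()


def test_rainfall_positive():
    # Rainfall measurements must be strictly positive.
    assert (load_sample_df()["Rainfall"] > 0).all()
```

Saved in a file whose name starts with test_, these functions would be collected and run by the pytest command. In a real setup the helper would load the pipeline's output instead of building a small frame inline.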

Tools Used:

Python, Pandas, pytest, Jupyter Notebook for exploratory data analysis.
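The t-test step described under Hypothesis Testing can be illustrated with a small, dependency-free sketch. It computes Welch's t statistic (which does not assume equal variances) and approximates the two-sided p-value with a normal distribution, which is only reasonable for large samples. The function name welch_t_test and the sample values are hypothetical, and the README does not say which t-test variant or library the project used. A library routine such as scipy.stats.ttest_ind with equal_var=False would give an exact p-value.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, approximate two-sided p-value) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means, without assuming equal variances.
    se = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / se
    # Normal approximation to the t distribution (an assumption of this sketch).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Hypothetical rainfall readings, for illustration only (not project data).
field_rainfall = [812.0, 790.5, 805.2, 798.7, 820.1, 801.3]
station_rainfall = [808.4, 795.0, 810.9, 799.6, 815.3, 803.8]

t_stat, p_value = welch_t_test(field_rainfall, station_rainfall)

# Null hypothesis: the field data and the station data have the same mean.
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: the means differ.")
else:
    print("Fail to reject the null hypothesis: no evidence the means differ.")
```

Failing to reject the null hypothesis here would be consistent with the field data matching the weather station, though it does not prove the two agree.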
