Objective: validate the MD_agric_df dataset against weather station data to ensure its accuracy and reliability for agricultural insights.
Built an automated data pipeline that ingests and cleans the MD_agric_df and weather datasets, making the code more readable and maintainable.
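The pipeline code itself is not shown here. The following is a minimal sketch, assuming the raw data arrive as CSV files. The function names (`ingest_csv`, `clean_dataframe`, `run_pipeline`), the sample rows, and the cleaning rules (normalising column names, dropping duplicate rows and rows with missing values) are illustrative assumptions, not the project's actual code.

```python
import io

import pandas as pd


def ingest_csv(source) -> pd.DataFrame:
    """Read raw CSV data from a file path or file-like object."""
    return pd.read_csv(source)


def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Apply simple, illustrative (hypothetical) cleaning rules and return a new DataFrame."""
    cleaned = df.copy()
    # Normalise column names: trim whitespace and replace inner spaces with underscores.
    cleaned.columns = cleaned.columns.str.strip().str.replace(" ", "_", regex=False)
    # Drop exact duplicate rows, then any row that still has a missing value.
    cleaned = cleaned.drop_duplicates().dropna().reset_index(drop=True)
    return cleaned


def run_pipeline(source) -> pd.DataFrame:
    """Ingest and clean one dataset in a single call."""
    return clean_dataframe(ingest_csv(source))


# Usage with a small in-memory CSV (hypothetical sample rows).
raw = io.StringIO(
    "Field ID,Elevation,Crop Type,Rainfall\n"
    "1,500.5,maize,1100.0\n"
    "1,500.5,maize,1100.0\n"
    "2,,wheat,900.0\n"
)
print(run_pipeline(raw))
```

Keeping ingestion and cleaning as separate functions lets each step be tested on its own, which is one way a pipeline like this can support readability and maintainability.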
Conducted hypothesis testing to evaluate how well the MD_agric_df dataset represents actual weather conditions, comparing both the means and the variances of the distributions. This involved:
- Formulating a null hypothesis.
- Importing and cleaning the MD_agric_df dataset.
- Mapping the dataset to nearby weather stations and comparing it with their data.
- Performing t-tests and interpreting the results to validate the data's reliability.
Data Quality Checks: Implemented data validation tests in Python with pytest, checking for:
- Valid column names.
- Non-negative elevation values.
- Valid crop types.
- Positive rainfall measurements.
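The hypothesis-testing step above can be sketched as follows. This is not the project's code; it assumes the MD_agric_df values (for example, rainfall) have already been matched to a nearby weather station and extracted as two numeric samples. Means are compared with SciPy's `ttest_ind` using `equal_var=False` (Welch's t-test). For the variances, Levene's test (`scipy.stats.levene`) is swapped in as an assumption, since the source names only t-tests. The function name, the sample numbers, and the 0.05 significance level are illustrative.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # Assumed significance level (illustrative).


def compare_distributions(field_values, station_values, alpha=ALPHA):
    """Test whether two samples plausibly share the same mean and variance.

    H0 (means): the field data and the station data have equal means.
    H0 (variances): the field data and the station data have equal variances.
    """
    field = np.asarray(field_values, dtype=float)
    station = np.asarray(station_values, dtype=float)

    # Welch's t-test: compares means without assuming equal variances.
    t_stat, t_p = stats.ttest_ind(field, station, equal_var=False)
    # Levene's test (median-centred by default): compares variances.
    lev_stat, lev_p = stats.levene(field, station)

    return {
        "t_statistic": float(t_stat),
        "t_p_value": float(t_p),
        "reject_equal_means": bool(t_p < alpha),
        "levene_statistic": float(lev_stat),
        "levene_p_value": float(lev_p),
        "reject_equal_variances": bool(lev_p < alpha),
    }


# Usage with made-up rainfall values (hypothetical, not project data).
field_rainfall = [1100.0, 1150.0, 1200.0, 1250.0, 1300.0]
station_rainfall = [1000.0, 1050.0, 1100.0, 1150.0, 1200.0]
print(compare_distributions(field_rainfall, station_rainfall))
```

A p-value below `alpha` leads to rejecting the corresponding null hypothesis, which would suggest MD_agric_df does not match the station data for that property; a larger p-value means the data are consistent with the station data, not that they are proven equal.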
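The data quality checks listed above could be written as plain pytest test functions, roughly like this sketch. The column names, the allowed crop list, and the `load_dataset` helper (which stands in for the real pipeline output with a few made-up rows) are all hypothetical. pytest collects any function whose name starts with `test_` in a `test_*.py` file, so no pytest import is needed; in a real suite the data would more typically be supplied by a fixture.

```python
import pandas as pd

# Hypothetical schema and allowed values; replace with the project's real ones.
EXPECTED_COLUMNS = {"Field_ID", "Elevation", "Crop_Type", "Rainfall"}
VALID_CROPS = {"maize", "wheat", "tea", "potato"}


def load_dataset() -> pd.DataFrame:
    """Hypothetical stand-in for the cleaned MD_agric_df (made-up sample rows)."""
    return pd.DataFrame(
        {
            "Field_ID": [1, 2, 3],
            "Elevation": [500.5, 1100.0, 0.0],
            "Crop_Type": ["maize", "wheat", "tea"],
            "Rainfall": [1100.0, 900.0, 1250.0],
        }
    )


def test_column_names():
    # The dataset must have exactly the expected columns.
    assert set(load_dataset().columns) == EXPECTED_COLUMNS


def test_elevation_non_negative():
    # Elevation may be zero but never negative (NaN also fails this check).
    assert (load_dataset()["Elevation"] >= 0).all()


def test_crop_types_valid():
    # Every crop type must come from the allowed list.
    assert load_dataset()["Crop_Type"].isin(VALID_CROPS).all()


def test_rainfall_positive():
    # Rainfall measurements must be strictly positive.
    assert (load_dataset()["Rainfall"] > 0).all()
```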
Tools: Python, pandas, pytest, and Jupyter Notebook (for exploratory data analysis).