
### **1. General Data Integrity**
#### **Test Case 1.1: Missing Values**
- **Objective**: Check for missing values in critical fields.
- **Critical Fields**: `farmer_name`, `program_year`, `plantation_type_dense_fruit`, `total_land_area_acre`, `area_f4f_acre`, `water_available`, `electricity_available`.
- **Expected Result**: No missing values in critical fields.
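
A minimal pandas sketch of this check. It assumes the data is already loaded into a DataFrame `df`. The two-row frame below is hypothetical sample data, used only so the example runs on its own.

```python
import pandas as pd

# Critical fields listed in Test Case 1.1
CRITICAL_FIELDS = [
    "farmer_name", "program_year", "plantation_type_dense_fruit",
    "total_land_area_acre", "area_f4f_acre",
    "water_available", "electricity_available",
]

# Hypothetical sample data; in the notebook this would be the loaded `df`
df = pd.DataFrame({
    "farmer_name": ["farmer_a", None],
    "program_year": [2023, 2023],
    "plantation_type_dense_fruit": ["Fruit Tree", "Fruit Tree"],
    "total_land_area_acre": [5, 7],
    "area_f4f_acre": [4, None],
    "water_available": ["Yes", "No"],
    "electricity_available": ["Yes", "Yes"],
})

# Count missing values per critical field; the test passes only if every count is 0
missing_counts = df[CRITICAL_FIELDS].isnull().sum()
print(missing_counts[missing_counts > 0])
test_passed = bool((missing_counts == 0).all())
```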

#### **Test Case 1.2: Duplicate Records**
- **Objective**: Verify that there are no duplicate entries for the same `uid` or for the same combination of `farmer_name` and `District`.
- **Expected Result**: Each `uid` is unique, and no combination of `farmer_name` and `District` appears more than once.
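
One way to sketch this with `DataFrame.duplicated`. The sample rows are hypothetical: they contain one repeated `uid` and one repeated (`farmer_name`, `District`) pair. Passing `keep=False` reports every member of a duplicate group, so both copies can be reviewed.

```python
import pandas as pd

# Hypothetical sample data with one repeated uid and one repeated (farmer_name, District) pair
df = pd.DataFrame({
    "uid": ["id_1", "id_2", "id_2", "id_4"],
    "farmer_name": ["farmer_b", "farmer_c", "farmer_d", "farmer_b"],
    "District": ["A", "A", "B", "A"],
})

# keep=False marks every member of a duplicate group, not just the later copies
dup_uid = df[df.duplicated(subset=["uid"], keep=False)]
dup_name_district = df[df.duplicated(subset=["farmer_name", "District"], keep=False)]

print(dup_uid)
print(dup_name_district)
```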

#### **Test Case 1.3: Data Type Validation**
- **Objective**: Ensure that key columns have the expected data types:
  - `program_year`: Integer
  - `farmer_name`: String
  - `total_land_area_acre` and `area_f4f_acre`: Numeric
- **Expected Result**: Each listed column's data type matches the type given above.
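
A sketch using the dtype helpers in `pandas.api.types`. Mapping each column to a checker function is an illustrative choice, not the author's method. In the hypothetical sample below, `program_year` was read as text, so it should be reported as a mismatch.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype, is_string_dtype

# Expected type per column, as listed in Test Case 1.3
EXPECTED_TYPES = {
    "program_year": is_integer_dtype,
    "farmer_name": is_string_dtype,
    "total_land_area_acre": is_numeric_dtype,
    "area_f4f_acre": is_numeric_dtype,
}

# Hypothetical sample data: program_year was read as text, so it should fail
df = pd.DataFrame({
    "program_year": ["2023", "2023"],
    "farmer_name": ["farmer_b", "farmer_c"],
    "total_land_area_acre": [5, 7],
    "area_f4f_acre": [4.0, 5.5],
})

# Map each column to True (type OK) or False (type mismatch)
type_ok = {col: bool(check(df[col])) for col, check in EXPECTED_TYPES.items()}
mismatches = [col for col, ok in type_ok.items() if not ok]
print(mismatches)
```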

---

### **2. Logical Consistency**
#### **Test Case 2.1: Total Land Area vs. Plantation Area**
- **Objective**: Verify that `area_f4f_acre` does not exceed `total_land_area_acre`.
- **Expected Result**: For every record, `area_f4f_acre ≤ total_land_area_acre`.
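
A minimal sketch that flags violating rows instead of dropping them. The sample is hypothetical. Its second row reuses the `id_2` values shown in the notebook's `df.head()` output (5 acres total, 30 acres under F4F). Its third row has equal areas, which the rule allows.

```python
import pandas as pd

# Hypothetical sample rows; id_2 reuses the values from the notebook's df.head()
df = pd.DataFrame({
    "uid": ["id_1", "id_2", "id_3"],
    "total_land_area_acre": [5, 5, 7],
    "area_f4f_acre": [4, 30, 7],
})

# Valid when area_f4f_acre <= total_land_area_acre, so only a strict excess is a violation
violations = df[df["area_f4f_acre"] > df["total_land_area_acre"]]
print(violations)
```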

#### **Test Case 2.2: Water and Electricity Availability**
- **Objective**: Validate that the `water_available` and `electricity_available` fields contain only valid values (`Yes` or `No`).
- **Expected Result**: Only `Yes` or `No` values are present.
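
A sketch using `DataFrame.isin`, on hypothetical sample data. The match is exact and case-sensitive, so `yes` is flagged. Missing values also fail `isin`, so they are flagged as invalid as well.

```python
import pandas as pd

VALID_VALUES = ["Yes", "No"]

# Hypothetical sample data: a lowercase "yes" and a missing value
df = pd.DataFrame({
    "uid": ["id_1", "id_2", "id_3"],
    "water_available": ["Yes", "yes", "No"],
    "electricity_available": ["No", "Yes", None],
})

cols = ["water_available", "electricity_available"]
# isin() is False for missing values, so they are flagged too
invalid_mask = ~df[cols].isin(VALID_VALUES)
invalid_rows = df[invalid_mask.any(axis=1)]
print(invalid_rows)
```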

#### **Test Case 2.3: Plant Count Validation**
- **Objective**: Ensure the total plant count across all species is between 350 and 450.
- **Columns to Check**: Sum of columns like `bhendi`, `shirish`, `ain`, etc.
- **Expected Result**: Total plant count lies between 350 and 450.
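
A sketch under two assumptions. First, the 350 to 450 range applies to each record's total. Second, the species list is incomplete: it holds only the species columns visible in the notebook's `df.head()` output, because the rest are elided there (`...`). The sample data is hypothetical.

```python
import pandas as pd

# Species columns visible in the notebook's df.head() output; the full list is
# elided there ("..."), so this list must be extended before relying on the check
SPECIES_COLS = ["bhendi", "shirish", "ain", "pimpal", "vad",
                "tamhan", "waval", "palas", "babhul", "bakul"]
MIN_PLANTS, MAX_PLANTS = 350, 450

# Hypothetical sample data: first record totals 400, second totals 100
df = pd.DataFrame([[40] * 10, [10] * 10], columns=SPECIES_COLS)
df.insert(0, "uid", ["id_1", "id_2"])

# Assumption: the range applies per record; between() is inclusive on both ends
df["total_plants"] = df[SPECIES_COLS].sum(axis=1)
out_of_range = df[~df["total_plants"].between(MIN_PLANTS, MAX_PLANTS)]
print(out_of_range[["uid", "total_plants"]])
```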

---

### **3. Program Workflow Validation**
#### **Test Case 3.1: Stage Progression**
- **Objective**: Verify that the program stages are completed in the correct sequence.
  - Payment is collected before training (`payment_status` is recorded before `training_status`).
  - Training is completed before the contract is signed (`training_status = Completed` before `contract_status = Signed`).
  - Distribution happens only after the contract is signed.
- **Expected Result**: Workflow follows the expected sequence.
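
A minimal sketch of a sequence check. The stages are walked in order, and a record is flagged if a stage is done while an earlier stage is not. `Completed` and `Signed` come from this test case. The `payment_status` value `Collected` and the `distribution_status` column and its `Distributed` value are hypothetical placeholders.

```python
import pandas as pd

# Ordered stages and the value that marks each as done; "Completed" and "Signed"
# come from the test case, while "Collected", "distribution_status" and
# "Distributed" are hypothetical placeholders
STAGES = [
    ("payment_status", "Collected"),
    ("training_status", "Completed"),
    ("contract_status", "Signed"),
    ("distribution_status", "Distributed"),
]

df = pd.DataFrame({
    "uid": ["id_1", "id_2", "id_3"],
    "payment_status": ["Collected", "Pending", "Collected"],
    "training_status": ["Completed", "Completed", "Completed"],
    "contract_status": ["Signed", "Pending", "Pending"],
    "distribution_status": ["Distributed", "Pending", "Distributed"],
})

prior_done = pd.Series(True, index=df.index)   # have all earlier stages been done?
violation = pd.Series(False, index=df.index)
for col, done_value in STAGES:
    stage_done = df[col] == done_value
    # Done now while some earlier stage is not done -> out of sequence
    violation |= stage_done & ~prior_done
    prior_done &= stage_done

out_of_sequence = df[violation]
print(out_of_sequence["uid"].tolist())
```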

#### **Test Case 3.2: Dates Validation**
- **Objective**: Ensure all recorded dates follow a logical order.
  - `payment_date` ≤ `training_date` ≤ `contract_date` ≤ `distribution_date` ≤ `plantation_date`.
- **Expected Result**: Dates are sequential and do not violate logical order.
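
A sketch that compares each adjacent pair of date columns. The column names follow this test case. The notebook's dataset uses names such as `farmer_payment_date` and `cc_training_date`, so the list would need adjusting there. The sample dates are hypothetical. A comparison against a missing date (`NaT`) is False, so gaps are left to the missing-value checks.

```python
import pandas as pd

# Date columns in the order required by Test Case 3.2
DATE_COLS = ["payment_date", "training_date", "contract_date",
             "distribution_date", "plantation_date"]

# Hypothetical sample data: id_2 has training before payment
df = pd.DataFrame({
    "uid": ["id_1", "id_2"],
    "payment_date": ["2023-01-05", "2023-01-05"],
    "training_date": ["2023-01-10", "2023-01-02"],
    "contract_date": ["2023-01-15", "2023-01-15"],
    "distribution_date": ["2023-02-01", "2023-02-01"],
    "plantation_date": ["2023-02-10", "2023-02-10"],
})
dates = df[DATE_COLS].apply(pd.to_datetime)

# A later-stage date earlier than its predecessor breaks the order
order_broken = pd.Series(False, index=df.index)
for earlier, later in zip(DATE_COLS, DATE_COLS[1:]):
    order_broken |= dates[later] < dates[earlier]

broken_uids = df.loc[order_broken, "uid"].tolist()
print(broken_uids)
```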

---

### **4. Specific Field Validations**
#### **Test Case 4.1: Program Year Validation**
- **Objective**: Ensure `program_year` is within a valid range (e.g., 2020–2025).
- **Expected Result**: `program_year` falls within the specified range.
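
A short sketch using `Series.between`. It uses the example range above and hypothetical sample years.

```python
import pandas as pd

# Example range from Test Case 4.1
MIN_YEAR, MAX_YEAR = 2020, 2025

# Hypothetical sample data
df = pd.DataFrame({"uid": ["id_1", "id_2", "id_3"],
                   "program_year": [2023, 2019, 2025]})

# between() is inclusive on both ends by default, so 2025 is valid
bad_year = df[~df["program_year"].between(MIN_YEAR, MAX_YEAR)]
print(bad_year)
```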

#### **Test Case 4.2: UID Format**
- **Objective**: Validate that `uid` follows a specific pattern (e.g., `id_<number>`).
- **Expected Result**: All `uid` values match the required format.
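
A sketch with `Series.str.fullmatch`. The regex `id_\d+` is an assumed reading of the `id_<number>` pattern. The match is case-sensitive, and missing uids count as invalid. The sample values are hypothetical.

```python
import pandas as pd

# Assumed reading of "id_<number>": literal "id_" followed by one or more digits
UID_PATTERN = r"id_\d+"

# Hypothetical sample values
df = pd.DataFrame({"uid": ["id_1", "id_22", "ID_3", "id_", "id_4a", None]})

# fullmatch() anchors at both ends; na=False treats missing uids as invalid
valid = df["uid"].str.fullmatch(UID_PATTERN, na=False)
print(df.loc[~valid, "uid"].tolist())
```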

#### **Test Case 4.3: District and Block Validation**
- **Objective**: Verify that `District` and `Block` values are valid entries from a predefined list.
- **Expected Result**: All `District` and `Block` values exist in the reference list.
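
A sketch against reference lists. The document does not give the predefined lists, so `VALID_DISTRICTS` and `VALID_BLOCKS` below are hypothetical. Each column is checked independently. Whether a block must also belong to its district is not stated, so that is not checked.

```python
import pandas as pd

# Hypothetical reference lists; the real predefined lists are not given here
VALID_DISTRICTS = ["A", "B"]
VALID_BLOCKS = ["p", "q", "r"]

df = pd.DataFrame({
    "uid": ["id_1", "id_2", "id_3"],
    "District": ["A", "C", "B"],
    "Block": ["p", "q", "z"],
})

bad_district = ~df["District"].isin(VALID_DISTRICTS)
bad_block = ~df["Block"].isin(VALID_BLOCKS)
invalid_locations = df[bad_district | bad_block]
print(invalid_locations)
```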

---

### **5. Document Upload Status**
#### **Test Case 5.1: Document Upload Completeness**
- **Objective**: Ensure all required documents are uploaded.
  - `document_upload_status` should be `Complete` before the record can proceed.
- **Expected Result**: `document_upload_status = Complete` for all valid records.
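
The notebook's column list has no `document_upload_status` column. It does have per-document flags such as `kml_uploaded`, `contract uploaded` and `land_record_uploaded`. The sketch below derives the status from those flags under two assumptions: the flags hold `Yes`/`No` (their contents are not shown), and these three are the required documents.

```python
import pandas as pd

# Per-document flags from the notebook's column list; treating these three as
# the required documents, and "Yes"/"No" as their values, are assumptions
UPLOAD_COLS = ["kml_uploaded", "contract uploaded", "land_record_uploaded"]

# Hypothetical sample data
df = pd.DataFrame({
    "uid": ["id_1", "id_2"],
    "kml_uploaded": ["Yes", "Yes"],
    "contract uploaded": ["Yes", "No"],
    "land_record_uploaded": ["Yes", "Yes"],
})

# A record is Complete only when every required document is marked "Yes"
all_uploaded = (df[UPLOAD_COLS] == "Yes").all(axis=1)
df["document_upload_status"] = all_uploaded.map({True: "Complete", False: "Incomplete"})
not_ready = df[df["document_upload_status"] != "Complete"]
print(not_ready[["uid", "document_upload_status"]])
```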

---

### **6. Summary Checks**
#### **Test Case 6.1: Farmer Data Completeness**
- **Objective**: Verify that no critical data is missing for any farmer.
- **Critical Columns**: `farmer_name`, `District`, `total_land_area_acre`, `area_f4f_acre`, `water_available`, `electricity_available`.
- **Expected Result**: Records with missing critical data are flagged.
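
Unlike the column-level count in Test Case 1.1, this check flags individual records. A minimal sketch on hypothetical sample data keeps every row. It adds a boolean flag (`is_flagged`) and a list of the empty critical fields (`missing_critical`). Both column names are illustrative.

```python
import pandas as pd

CRITICAL_COLS = ["farmer_name", "District", "total_land_area_acre",
                 "area_f4f_acre", "water_available", "electricity_available"]

# Hypothetical sample data
df = pd.DataFrame({
    "uid": ["id_1", "id_2"],
    "farmer_name": ["farmer_b", "farmer_c"],
    "District": ["A", None],
    "total_land_area_acre": [5, 7],
    "area_f4f_acre": [4, None],
    "water_available": ["Yes", "Yes"],
    "electricity_available": ["Yes", "No"],
})

# Flag rather than drop: record which critical fields are empty in each row
missing = df[CRITICAL_COLS].isnull()
df["missing_critical"] = missing.apply(
    lambda row: ", ".join(col for col, is_missing in row.items() if is_missing), axis=1
)
df["is_flagged"] = missing.any(axis=1)
print(df.loc[df["is_flagged"], ["uid", "missing_critical"]])
```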

#### **Test Case 6.2: Logical Area Usage**
- **Objective**: Ensure the sum of `area_f4f_acre` across all farmers does not exceed the sum of `total_land_area_acre`.
- **Expected Result**: Aggregate plantation area ≤ aggregate total land area.
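
A sketch of the aggregate comparison on hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "total_land_area_acre": [5, 5, 7],
    "area_f4f_acre": [4, 30, 5],
})

total_land = df["total_land_area_acre"].sum()
total_plantation = df["area_f4f_acre"].sum()
aggregate_ok = bool(total_plantation <= total_land)
print(f"plantation {total_plantation} acres vs. land {total_land} acres -> ok={aggregate_ok}")
```

Sums can hide offsetting rows: a dataset can pass this aggregate check while individual records still fail. This check therefore complements the per-record check in Test Case 2.1 rather than replacing it.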

```python
# import essential library
import pandas as pd
```

```python
# importing dataset
df = pd.read_excel("q1_data.xlsx")
df.head()
```

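| | sr_no | uid | program_year | farmer_name | plantation_type_dense_fruit | total_land_area_acre | area_f4f_acre | District | Block | water_available | ... | bhendi | shirish | ain | pimpal | vad | tamhan | waval | palas | babhul | bakul |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | id_1 | 2023 | farmer_b | Fruit Tree | 5 | 4 | A | p | Yes | ... | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 2 | id_2 | 2023 | farmer_c | Fruit Tree | 5 | 30 | A | p | Yes | ... | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| 2 | 3 | id_3 | 2023 | farmer_d | Fruit Tree | 7 | 5 | A | p | Yes | ... | 0 | 2 | 0 | 3 | 0 | 0 | 3 | 3 | 0 | 3 |
| 3 | 4 | id_4 | 2023 | farmer_e | Fruit Tree | 9 | 4 | A | p | Yes | ... | 0 | 4 | 0 | 4 | 0 | 0 | 3 | 3 | 0 | 4 |
| 4 | 5 | id_5 | 2023 | farmer_f | Fruit Tree | 7 | 4 | A | p | Yes | ... | 3 | 3 | 3 | 0 | 0 | 3 | 0 | 3 | 0 | 3 |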


```python
# checking the length of DataFrame
len(df)
```

```text
55
```

```python
# checking data integrity
# checking null values
df.isnull().sum().head(35)
```

```text
sr_no                                     0
uid                                       0
program_year                              0
farmer_name                               0
plantation_type_dense_fruit               0
total_land_area_acre                      0
area_f4f_acre                             0
District                                  0
Block                                     0
water_available                           0
electricity_available                     0
kml_uploaded                              0
contract uploaded                         0
land_record_uploaded                      0
cc_training_uploaded?                     0
soil_sample_collected?                    0
cc_training_date                          0
drone_ortho_taken                         0
farmer_payment_collected                  0
farmer_payment_date                      35
amount                                   35
mode_collection_cash_upi_banktransfer    35
contract_date
```

```python
# checking datatypes
df.dtypes.head(30)
```

```text
sr_no                                             int64
uid                                              object
program_year                                      int64
farmer_name                                      object
plantation_type_dense_fruit                      object
total_land_area_acre                              int64
area_f4f_acre                                     int64
District                                         object
Block                                            object
water_available                                  object
electricity_available                            object
kml_uploaded                                     object
contract uploaded                                object
land_record_uploaded                             object
cc_training_uploaded?                            object
soil_sample_collected?                           object
cc_training_date                         datetime64[ns]
drone_ortho_taken
```

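```python
# checking Logical consistency
# Test Case 2.1: area_f4f_acre must not exceed total_land_area_acre (equal areas are valid).
# Collect the violating rows instead of overwriting df, so no records are silently dropped.
invalid_area = df[df['area_f4f_acre'] > df['total_land_area_acre']]
invalid_area[['uid', 'farmer_name', 'total_land_area_acre', 'area_f4f_acre']]
```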