# Data Visualization

This notebook focuses on visualizing the survey data to provide insights into the socioeconomic and agricultural characteristics of the surveyed households.

Additional visualizations accompany the data analysis in the previous notebooks.

Overview of the visualization process in this notebook:

- **Renaming variables:** standardizing and renaming columns for clarity in presentation and figures.

- **Exploratory figures:** creating visualizations that summarize land use, production, household demographics, and other key survey variables.

- **Access analysis:** generating plots to explore relationships between access to agri-climate information and advisory services, and factors such as education level and maize yield.

In [33]:
# import libraries
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import plotly.express as px

Load cleaned merged dataset

In [34]:
merged = pd.read_csv('../Data/cleaned/merged_std_all.csv')
merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 25 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   hhid                           100 non-null    int64  
 1   land_total_acres               100 non-null    float64
 2   land_cultivated_acres          100 non-null    float64
 3   maize_harvest_qty              100 non-null    float64
 4   maize_unit                     100 non-null    object 
 5   livestock_owned                100 non-null    object 
 6   livestock_types                100 non-null    object 
 7   usedom                         100 non-null    object 
 8   climate_info_received          100 non-null    object 
 9   hh_head_gender                 100 non-null    object 
 10  hh_size_total                  100 non-null    int64  
 11  hh_head_education              100 non-null    object 
 12  latitude                       100 non-null    floa

The first step is to rename the variables in the dataset so that axis labels, legends, and hover text are clear and readable on the plots.

In [35]:
# Map raw column names to human-readable labels (with spaces)
rename_dict_spaces = {
    'hhid': 'Household ID',
    'land_total_acres': 'Total Land Acres',
    'land_cultivated_acres': 'Cultivated Land Acres',
    'maize_harvest_qty': 'Maize Harvest Quantity',
    'maize_unit': 'Maize Quantity Unit',
    'livestock_owned': 'Owns Livestock',
    'livestock_types': 'Livestock Types',
    'usedom': 'Organic Matter Used',
    'climate_info_received': 'Received Climate Info',
    'hh_head_gender': 'Household Head Gender',
    'hh_size_total': 'Household Size',
    'hh_head_education': 'Household Head Education',
    'latitude': 'Latitude',
    'longitude': 'Longitude',
    'orgfert_quantity_2024': 'Organic Fertilizer Quantity 2024',
    'maize_harvest_qty_kg': 'Maize Harvested (Kg)',
    'climate_info_received_std': 'Climate Info Received Standardized',
    'maize_yield_ton_acre': 'Maize Yield (Ton/Acre)',
    'orgfert_qty_2024_kg_total': 'Total Organic Fertilizer (Kg) 2024',
    'orgfert_names_2024': 'Organic Fertilizer Names 2024',
    'orgfert_qty_2024_kg': 'Organic Fertilizer Qty (Kg) 2024',
    'climate_info_received_num': 'Climate Info Received (Binary)',
    'livestock_owned_bin': 'Owns Livestock (Binary)',
    'hh_head_education_class': 'Household Head Education Classified',
    'orgfert_qty_2024_kg_total_std': 'Total Organic Fertilizer (Kg) Standardized'
}

# Apply renaming
merged.rename(columns=rename_dict_spaces, inplace=True)

# Check result
print(merged.columns)


Index(['Household ID', 'Total Land Acres', 'Cultivated Land Acres',
       'Maize Harvest Quantity', 'Maize Quantity Unit', 'Owns Livestock',
       'Livestock Types', 'Organic Matter Used', 'Received Climate Info',
       'Household Head Gender', 'Household Size', 'Household Head Education',
       'Latitude', 'Longitude', 'Organic Fertilizer Quantity 2024',
       'Maize Harvested (Kg)', 'Climate Info Received Standardized',
       'Maize Yield (Ton/Acre)', 'Total Organic Fertilizer (Kg) 2024',
       'Organic Fertilizer Names 2024', 'Organic Fertilizer Qty (Kg) 2024',
       'Total Organic Fertilizer (Kg) Standardized',
       'Climate Info Received (Binary)', 'Owns Livestock (Binary)',
       'Household Head Education Classified'],
      dtype='object')
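
Because `DataFrame.rename` ignores mapping keys that match no column by default, a misspelled key leaves the raw column name in place without any warning. The sketch below shows one way to check a mapping before plotting. `check_rename_coverage` is a hypothetical helper (it is not part of the notebook or of pandas), and the column list is a made-up toy example.

```python
def check_rename_coverage(columns, rename_dict):
    """Return (unmapped_columns, unused_keys) for a rename mapping.

    Hypothetical helper for illustration; not part of pandas.
    """
    columns = list(columns)
    # Columns that would keep their raw name after renaming
    unmapped = [c for c in columns if c not in rename_dict]
    # Mapping keys that match no column (often a typo in the key)
    unused = [k for k in rename_dict if k not in columns]
    return unmapped, unused


# Toy example; with the notebook's data one would pass the raw
# column names (before renaming) and rename_dict_spaces instead.
example_columns = ['hhid', 'latitude', 'longitude']
example_map = {'hhid': 'Household ID', 'latitud': 'Latitude'}

unmapped, unused = check_rename_coverage(example_columns, example_map)
print(unmapped)  # ['latitude', 'longitude']
print(unused)    # ['latitud']
```

Two empty lists would mean that every column has a label and every key matched a column.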


The composite figure below provides an overview of the socioeconomic and agricultural characteristics of the surveyed households.

The first row shows the distribution of household head education levels, the distribution of household size (indicating the typical family size in the sample), and the gender distribution by education level, which reveals differences in educational attainment between male and female household heads.

The second row explores agricultural patterns, with points colored by whether the household received climate information and agricultural advisories ("Yes" or "No"): total versus cultivated land shows how much land is actively farmed; maize yield versus cultivated land illustrates productivity trends; and maize yield versus organic fertilizer use highlights the relationship between input use and crop performance.

Overall, the plots show how household characteristics, land use, and farming outcomes relate to one another, and hint at how access to advisory services may influence agricultural practices.

The figure is interactive: hover over any plot to inspect individual values, or click a legend entry to show or hide its trace.

In [36]:

# ---- Education Order ----
education_order = [
    'Primary',
    'Secondary',
    'Undergraduate degree',
    'Diploma',
    'Certificate',
    'Postgraduate degree'
]

merged['Household Head Education'] = pd.Categorical(
    merged['Household Head Education'],
    categories=education_order,
    ordered=True
)

# ---- Custom color mapping ----
climate_color_map = {'Yes': 'lightblue', 'No': 'orange'}
gender_color_map = {'Male': 'steelblue', 'Female': 'orange'}

# ---- Create Subplot Layout (2 rows × 3 cols) ----
fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=[
        "Distribution of Education Levels",
        "Distribution of Household Size",
        "Household Gender by Education",
        "Cultivated vs Total Land by Climate Info",
        "Yield vs Cultivated Land by Climate Info",
        "Yield vs Organic Fertilizer by Climate Info"
    ]
)

# ================= ROW 1 =================

# Col 1: Distribution of Education Levels
edu_counts = merged['Household Head Education'].value_counts().reindex(education_order)
fig.add_trace(
    go.Bar(
        x=edu_counts.index,
        y=edu_counts.values,
        marker=dict(color='navajowhite', line=dict(width=1, color='black')),
        name="Education Levels"
    ),
    row=1, col=1
)

# Col 2: Distribution of Household Size
fig.add_trace(
    go.Histogram(
        x=merged['Household Size'],
        marker=dict(color='navajowhite', line=dict(width=1, color='black')),
        name="Household Size"
    ),
    row=1, col=2
)

# Col 3: Household Gender by Education
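# observed=False keeps the current behavior (all education categories are
# listed) and avoids the FutureWarning raised by recent pandas versions
gender_counts = merged.groupby(
    ['Household Head Education', 'Household Head Gender'],
    observed=False
).size().reset_index(name='Count')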

for gender in gender_counts['Household Head Gender'].unique():
    subset = gender_counts[gender_counts['Household Head Gender'] == gender]
    fig.add_trace(
        go.Bar(
            x=subset['Household Head Education'],
            y=subset['Count'],
            name=gender,
            marker=dict(color=gender_color_map.get(gender, 'gray'),
                        line=dict(width=0.5, color='black'))
        ),
        row=1, col=3
    )

# ================= ROW 2 =================

# Col 1: Cultivated vs Total Land
for val in merged['Climate Info Received Standardized'].unique():
    subset = merged[merged['Climate Info Received Standardized'] == val]
    fig.add_trace(
        go.Scatter(
            x=subset['Total Land Acres'],
            y=subset['Cultivated Land Acres'],
            mode='markers',
            name=f"Cultivated land vs total land: {val}",
            marker=dict(color=climate_color_map.get(val, 'gray'),
                        size=6,
                        line=dict(width=0.5, color='black'))
        ),
        row=2, col=1
    )

# Col 2: Yield vs Cultivated Land
for val in merged['Climate Info Received Standardized'].unique():
    subset = merged[merged['Climate Info Received Standardized'] == val]
    fig.add_trace(
        go.Scatter(
            x=subset['Cultivated Land Acres'],
            y=subset['Maize Yield (Ton/Acre)'],
            mode='markers',
            name=f"Yield vs cultivated land: {val}",
            marker=dict(color=climate_color_map.get(val, 'gray'),
                        size=6,
                        line=dict(width=0.5, color='black'))
        ),
        row=2, col=2
    )

# Col 3: Yield vs Organic Fertilizer
for val in merged['Climate Info Received Standardized'].unique():
    subset = merged[merged['Climate Info Received Standardized'] == val]
    fig.add_trace(
        go.Scatter(
            x=subset['Total Organic Fertilizer (Kg) 2024'],
            y=subset['Maize Yield (Ton/Acre)'],
            mode='markers',
            name=f"Yield vs org. fertilizer: {val}",
            marker=dict(color=climate_color_map.get(val, 'gray'),
                        size=6,
                        line=dict(width=0.5, color='black'))
        ),
        row=2, col=3
    )

# ================= LAYOUT SETTINGS =================
fig.update_layout(
    plot_bgcolor='white',
    paper_bgcolor='white',
    title="Socioeconomic and Agricultural Data Overview",
    height=800, width=1400,
    legend=dict(
        x=1.1,            # slightly outside the right edge of the plot area
        y=0.3,            # vertical position in paper coordinates (0 = bottom, 1 = top)
        xanchor='left',   # align the legend's left side to x
        yanchor='bottom', # align the legend's bottom edge to y
        bgcolor='rgba(255,255,255,0.8)',  # optional semi-transparent background
        bordercolor='white',
        borderwidth=1
    ),
    margin=dict(r=200)
)

# Axis labels
fig.update_xaxes(title_text="Education Level", row=1, col=1)
fig.update_yaxes(title_text="Number of Households", row=1, col=1)

fig.update_xaxes(title_text="Household Size", row=1, col=2)
fig.update_yaxes(title_text="Frequency", row=1, col=2)

fig.update_xaxes(title_text="Education Level", row=1, col=3)
fig.update_yaxes(title_text="Number of Households", row=1, col=3)

fig.update_xaxes(title_text="Total Land Acres", row=2, col=1)
fig.update_yaxes(title_text="Cultivated Land Acres", row=2, col=1)

fig.update_xaxes(title_text="Cultivated Land Acres", row=2, col=2)
fig.update_yaxes(title_text="Maize Yield (Ton/Acre)", row=2, col=2)

fig.update_xaxes(title_text="Organic Fertilizer (Kg)", row=2, col=3)
fig.update_yaxes(title_text="Maize Yield (Ton/Acre)", row=2, col=3)

# Black axis lines
fig.update_xaxes(showline=True, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linecolor='black', mirror=True)

fig.show()







The figure below illustrates the distribution of maize yield (tons per acre) across households, grouped by the education level of the household head. Each box summarizes the yield distribution for one group, and individual points mark outlying households. Colors indicate whether climate information and agricultural advisories were received (“Yes” in light blue, “No” in orange). This visualization highlights how maize productivity varies with education and access to advisory services.

In [37]:

# education order
education_order = [
    'Primary',
    'Secondary',
    'Undergraduate degree',
    'Diploma',
    'Certificate',
    'Postgraduate degree'
]

# Convert the column to categorical with the correct order
merged['Household Head Education'] = pd.Categorical(
    merged['Household Head Education'],
    categories=education_order,
    ordered=True
)

# Custom color mapping
color_map = {
    'Yes': 'lightblue',
    'No': 'orange'
}

# Box plot of yield by education level (in the order above), split by climate info
fig = px.box(
    merged,
    x='Household Head Education',
    y='Maize Yield (Ton/Acre)',
    color='Climate Info Received Standardized',
    color_discrete_map=color_map,
    category_orders={'Household Head Education': education_order},
    hover_data=['Household ID', 'Total Land Acres', 'Cultivated Land Acres']
)

fig.update_layout(
    plot_bgcolor='white',
    title='Maize Yield by Household Head Education and Access to Agri-Climate Information',
    xaxis_title='Household Head Education Level',
    yaxis_title='Maize Yield (Ton/Acre)',
    legend_title='Climate Info Received',
    width=700,
    height=400
)
# Add black lines to axes
fig.update_xaxes(showline=True, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linecolor='black', mirror=True)

fig.show()
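
A box plot is read mainly through its median and interquartile range (IQR), so a small summary table can back up the visual comparison between groups. The sketch below computes both per group with pandas. The data frame holds made-up values that stand in for `merged`, and `iqr` is a helper defined here for illustration only.

```python
import pandas as pd

# Toy data standing in for the survey table (values are made up)
toy = pd.DataFrame({
    'Climate Info Received Standardized': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
    'Maize Yield (Ton/Acre)': [1.0, 1.2, 1.4, 0.2, 0.8, 2.0],
})


def iqr(series):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return series.quantile(0.75) - series.quantile(0.25)


# One row per group: number of households, median yield, and spread
summary = (
    toy.groupby('Climate Info Received Standardized')['Maize Yield (Ton/Acre)']
       .agg(['count', 'median', iqr])
)
print(summary)
```

A higher median together with a smaller IQR in one group would correspond to a box that sits higher and is shorter in the figure.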


### Insights from the survey data

- **Agricultural practices and productivity:** **Cultivated land acres** and **maize yield** both vary widely across households. The scatter plots show no clear positive correlation between the two, so having more land does not by itself guarantee a higher yield. This suggests that **land management practices** may matter more than land size. Business cases could be built around training farmers in efficient farming techniques such as crop rotation, soil management, or improved irrigation.

- **Role of climate information and agri-advisories:** Households with access to **climate information and advisories** generally appear to have more consistent, and potentially higher, maize yields in the scatter plots. Although this does not establish a causal link, it is consistent with the idea that timely and accurate climate data helps farmers make better-informed decisions about planting, irrigation, and harvesting, thereby improving crop yields. Policy recommendations should include investing in accessible weather forecasting services and developing easy-to-use platforms to disseminate climate information and agricultural advice to farmers.

- **Organic fertilizer usage:** The graph showing **Maize Yield vs Organic Fertilizer** suggests that while many households use organic fertilizer, there isn't a strong positive correlation between the amount used and the maize yield. This may indicate a lack of understanding regarding the correct application or type of organic fertilizer. Advisories could focus on educating farmers about proper fertilizer application rates and methods to maximize their effectiveness.

- **Education and its impact on livelihoods:** The most common education level among household heads is **Secondary**, followed by **Primary** and **Undergraduate degree**. There is a noticeable gender disparity: male household heads outnumber female household heads across all education levels, particularly at the Secondary and Diploma levels. These results suggest the need for targeted, gender-sensitive advisory services to ensure equitable access to and benefits from information.

- **Impact of advisory services on maize yield:** The final box plot indicates that households with access to advisory services tend to have higher and less variable maize yields than those without. At the same time, yield does not appear to be directly related to the education level of the household head. This is not surprising, since yield outcomes are influenced by many other factors, as shown in the results presented earlier.
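
The statements above about weak or absent correlations are based on reading the scatter plots. One way to put numbers on them is a correlation matrix, sketched below with pandas. The data frame holds made-up values that stand in for `merged`, so the printed coefficients say nothing about the survey itself.

```python
import pandas as pd

# Toy data standing in for the survey table (values are made up)
toy = pd.DataFrame({
    'Cultivated Land Acres': [1.0, 2.0, 3.0, 4.0, 5.0],
    'Maize Yield (Ton/Acre)': [1.1, 0.7, 1.3, 0.6, 1.0],
    'Total Organic Fertilizer (Kg) 2024': [50.0, 200.0, 120.0, 300.0, 80.0],
})

# Pearson correlation (the pandas default) between all numeric columns
corr = toy.corr()

# Correlation of each variable with maize yield, rounded for display
print(corr['Maize Yield (Ton/Acre)'].round(2))
```

Coefficients near zero would support the "no clear correlation" reading. Pearson correlation only captures linear relationships and is sensitive to outliers, so a rank-based alternative such as `toy.corr(method='spearman')` may be worth comparing.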


This notebook marks the end of the project. 
For more such informative notebooks, contact me. 🙂 