# Neophilic Gardener - Analysis 1

## Research question
Is gardening experience a mitigating strategy for food neophobia?

## Hypotheses
- **H1**: Higher gardening experience is associated with **lower levels of food neophobia**.
- **H2**: Higher gardening experience is associated with a **greater perceived reduction in neophobia**.

## Notes
- Perceived-reduction items (`g06q49`–`g06q52`) have high missingness (122 of 157 responses are missing for each item); all H2 outputs report the effective sample size (N).


In [1]:
# Core imports
import warnings
from pathlib import Path

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

from IPython.display import display
from scipy.stats import chi2_contingency, kendalltau, kruskal, mannwhitneyu, spearmanr
import statsmodels.formula.api as smf

warnings.filterwarnings("ignore")

# Visualization defaults
pio.templates.default = "plotly_white"
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
pd.set_option("display.max_rows", 60)
pd.options.plotting.backend = "plotly"

# Output directory for figures
images_path = Path("../data/final_outputs/new_analysis_1")
images_path.mkdir(parents=True, exist_ok=True)
print(f"Figures will be saved to: {images_path.resolve()}")


Figures will be saved to: /media/nas-elias/pesquisas/papers/neophilic-gardener/data/final_outputs/new_analysis_1


## Data loading
We use the cleaned dataset produced by the data-cleaning notebook (`data/processed/data_cleaned.json`).

In [2]:
df_raw = pd.read_json("../data/processed/data_cleaned.json", lines=True)
print("Dataset shape:", df_raw.shape)
display(df_raw.head(3))


Dataset shape: (157, 89)


Unnamed: 0,id,date_birth,zip_code,urbanization_degree,gender,education_level,education_field,gardener,garden_type_communitary,garden_type_pedagogical,garden_type_other,garden_type_other_description,years_experience,frequency_experience,experience_description,g03q11,g03q12,g03q13,g03q14,g03q15,g03q16,g03q17,g03q18,g03q19,g03q20,recognition_grade,g03q21,g03q22,g03q23,g03q24,g03q25,g03q26,g03q27,g03q28,g03q29,g03q30,classification_grade,recall_plants,plants_valid,plants_total,recall_algae,algae_valid,algae_total,recall_mushrooms,mushrooms_valid,mushrooms_total,species_total,diet_type,consumption_plants_time_scale,consumption_plants_frequency,consumption_mushrooms_time_scale,consumption_mushrooms_frequency,consumption_algae_time_scale,consumption_algae_frequency,social_model_plants,social_model_mushrooms,social_model_algae,social_model_other,social_trigger_plants,social_trigger_mushrooms,social_trigger_algae,meals_friends,meals_family_older,meals_family_same_generation,meals_not_family_older,barrier_plant,barrier_mushroom,barrier_algae,barrier_other,g06q39,g06q40,g06q41,g06q42,g06q43,g06q44,g06q45,g06q46,g06q47,g06q48,g06q49,g06q50,g06q51,g06q52,extra_info,authorization_new_contact,interview_time,age,state,region
0,15,1986,59082110,Zona urbana,Mulher cis - pessoa que nasceu com sexo femini...,Mestrado concluído,Ciências Biológicas,Sim,Não,Sim,Não,,0.0,1.0,"Participei de algumas oficinas, mas não tive c...",Nori,Não sei,Não sei,Shitake,Portobello,Wakame,Kombu,Taioba,Shimeji,Mangaba,8,Alga,Não sei,Não sei,Cogumelo,Cogumelo,Alga,Alga,"Planta - incluindo frutas, verduras e legumes",Cogumelo,"Planta - incluindo frutas, verduras e legumes",8,"rambutan, cupuaçu, mangostim, pitaya, almeirão...","rambutan, cupuaçu, mangostim, pitaya, almeirão...",6,As algas que conheço já foram mencionadas nest...,0,0.0,Os cogumelos que conheço já foram mencionados ...,0,0,6,Onívora - Possuo uma alimentação diversificada...,Nenhum,0,Mês,1,Mês,2,Não consumi ou consumo esse tipo de alimento,Familiares da mesma idade ou mais novos do que eu,Familiares da mesma idade ou mais novos do que eu,,Familiares da mesma idade ou mais novos do que eu,Familiares da mesma idade ou mais novos do que eu,Familiares da mesma idade ou mais novos do que eu,0,0,16,0,Falta de conhecimento: não conheço esses alime...,Falta de acesso econômico: acho esse tipo de a...,Falta de acesso econômico: acho esse tipo de a...,,1 - Discordo totalmente,1 - Discordo totalmente,1 - Discordo totalmente,1 - Discordo totalmente,6 - Concordo bastante,5 - Concordo pouco,1 - Discordo totalmente,1 - Discordo totalmente,7 - Concordo totalmente,5 - Concordo pouco,,,,,Me considero uma pessoa muito interessada em n...,"Sim, podem me contatar!",1432.75,39,RN,Northeast
1,17,1997,59090324,Zona urbana,Homem cis - pessoa que nasceu com sexo masculi...,Mestrado concluído,Nutrição,Sim,Sim,Sim,Não,,7.0,5.0,"Participei de atividades de ensino, pesquisa e...",Nori,Beldroega,Tucumã,Shitake,Portobello,Wakame,Kombu,Taioba,Shimeji,Mangaba,10,Alga,"Planta - incluindo frutas, verduras e legumes","Planta - incluindo frutas, verduras e legumes",Cogumelo,"Planta - incluindo frutas, verduras e legumes",Alga,Alga,"Planta - incluindo frutas, verduras e legumes",Cogumelo,"Planta - incluindo frutas, verduras e legumes",9,"Urucum, taioba, ora-pro-nobis,fruta pão,jambú,...","urucum, ora-pro-nóbis, fruta-pão, jambu, cubiu...",14,Alga nori,0,0.0,"Shitake,shimeji, portubelo, champignon",0,0,14,Flexitariano - Tenho um cardápio flexível e co...,Semana,1,Mês,1,Mês,1,Professores ou pessoas de referência mais velh...,Familiares da mesma idade ou mais novos do que eu,Não consumi ou consumo esse tipo de alimento,,Familiares da mesma idade ou mais novos do que eu,Professores ou pessoas de referência mais velh...,Meus amigos ou colegas de trabalho,3,2,0,2,Falta de acesso físico: não é fácil de achar e...,Outro,Outro,,7 - Concordo totalmente,1 - Discordo totalmente,7 - Concordo totalmente,7 - Concordo totalmente,1 - Discordo totalmente,7 - Concordo totalmente,"4 - Nem discordo, nem concordo",7 - Concordo totalmente,2 - Discordo bastante,7 - Concordo totalmente,6 - Concordo bastante,6 - Concordo bastante,6 - Concordo bastante,1 - Discordo totalmente,,"Sim, podem me contatar!",871.56,28,RN,Northeast
2,19,1993,53200000,Zona urbana,Mulher cis - pessoa que nasceu com sexo femini...,Mestrado em curso,Ciências Biológicas,Sim,Não,Sim,Não,,1.0,3.0,Já respondi,Nori,Beldroega,Tucumã,Shitake,Portobello,Wakame,Kombu,Taioba,Shimeji,Mangaba,10,Alga,"Planta - incluindo frutas, verduras e legumes","Planta - incluindo frutas, verduras e legumes",Cogumelo,Cogumelo,Alga,Alga,"Planta - incluindo frutas, verduras e legumes",Cogumelo,"Planta - incluindo frutas, verduras e legumes",10,Já respondi,"couvinha, monguba, margaridinha-cosmos, ararut...",17,Já respondi,0,0.0,Já respondi,"orelha-de-judas, ostra-rosa, porcini",3,20,Onívora - Possuo uma alimentação diversificada...,Ano,5,Ano,1,Ano,3,Familiares mais velhos que eu,Outro,Outro,"No tempo que fui vegetariana pesquisei, por co...",Meus amigos ou colegas de trabalho,Professores ou pessoas de referência mais velh...,Meus amigos ou colegas de trabalho,3,7,0,0,Falta de acesso físico: não é fácil de achar e...,Falta de acesso econômico: acho esse tipo de a...,Falta de acesso físico: não é fácil de achar e...,,6 - Concordo bastante,1 - Discordo totalmente,3 - Discordo pouco,6 - Concordo bastante,2 - Discordo bastante,6 - Concordo bastante,1 - Discordo totalmente,2 - Discordo bastante,6 - Concordo bastante,7 - Concordo totalmente,,,,,,"Sim, podem me contatar!",1287.96,32,PE,Northeast


In [27]:
df_raw["g06q49"].value_counts(dropna=False)

g06q49
NaN                               122
6 - Concordo bastante              11
7 - Concordo totalmente             7
2 - Discordo bastante               4
3 - Discordo pouco                  4
4 - Nem discordo, nem concordo      4
5 - Concordo pouco                  4
1 - Discordo totalmente             1
Name: count, dtype: int64

In [28]:
df_raw["g06q50"].value_counts(dropna=False)

g06q50
NaN                               122
6 - Concordo bastante              15
7 - Concordo totalmente            11
4 - Nem discordo, nem concordo      4
5 - Concordo pouco                  4
2 - Discordo bastante               1
Name: count, dtype: int64

In [30]:
df_raw["g06q51"].value_counts(dropna=False)

g06q51
NaN                               122
5 - Concordo pouco                 11
6 - Concordo bastante               7
4 - Nem discordo, nem concordo      7
3 - Discordo pouco                  4
2 - Discordo bastante               3
1 - Discordo totalmente             2
7 - Concordo totalmente             1
Name: count, dtype: int64

In [31]:
df_raw["g06q52"].value_counts(dropna=False)

g06q52
NaN                               122
2 - Discordo bastante              10
1 - Discordo totalmente             6
4 - Nem discordo, nem concordo      5
6 - Concordo bastante               5
3 - Discordo pouco                  4
5 - Concordo pouco                  3
7 - Concordo totalmente             2
Name: count, dtype: int64
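
The four value counts above can be condensed into a single missingness table. The following is a minimal sketch, assuming a hypothetical helper `missingness_summary` (not part of the original notebook) and a toy dataframe standing in for `df_raw`:

```python
import numpy as np
import pandas as pd


def missingness_summary(frame: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Return valid/missing counts and percent missing for each column."""
    n_total = len(frame)
    n_missing = frame[cols].isna().sum()
    return pd.DataFrame(
        {
            "n_valid": n_total - n_missing,
            "n_missing": n_missing,
            "pct_missing": (n_missing / n_total * 100).round(1),
        }
    )


# Toy data standing in for df_raw (values are illustrative only).
toy = pd.DataFrame(
    {
        "g06q49": ["6 - Concordo bastante", np.nan, np.nan, np.nan],
        "g06q50": ["7 - Concordo totalmente", "5 - Concordo pouco", np.nan, np.nan],
    }
)
summary = missingness_summary(toy, ["g06q49", "g06q50"])
print(summary)
```

Applied to `df_raw` with the four perceived-reduction items, each row should show 35 valid and 122 missing responses, in line with the counts above.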

## Helper functions
These utility functions keep the analysis reproducible and readable.

In [3]:
def parse_likert_1_to_7(series: pd.Series) -> pd.Series:
    """Parse 1–7 Likert strings (e.g., '1 - Discordo totalmente') into integers.

    Args:
        series: Raw survey responses stored as strings.

    Returns:
        A numeric series with values in [1, 7] and NaN for non-parsable values.
    """
    extracted = series.astype(str).str.extract(r"^(?P<n>\d+)", expand=True)
    return pd.to_numeric(extracted["n"], errors="coerce")


def reverse_1_to_7(series: pd.Series) -> pd.Series:
    """Reverse-code a 1–7 Likert scale: 1↔7, 2↔6, 3↔5, 4↔4.

    Args:
        series: Numeric series on a 1–7 Likert scale.

    Returns:
        Reverse-coded numeric series.
    """
    return 8 - series


def safe_normalize_0_1(series: pd.Series) -> pd.Series:
    """Normalize a non-negative series to [0, 1] using max scaling.

    Args:
        series: Numeric series.

    Returns:
        A series scaled to [0, 1]. If max is 0, returns 0 for all non-missing values.
    """
    # Force numeric dtype; avoids statsmodels treating the regressor as categorical.
    s = pd.to_numeric(series, errors="coerce")
    max_val = s.max()
    if pd.isna(max_val) or max_val == 0:
        # Multiply instead of fillna(0) so missing values stay NaN, as the docstring states.
        return s * 0
    return s / max_val


def make_equal_frequency_bins_with_zero_as_none(series: pd.Series, n_bins_nonzero: int = 4) -> tuple[pd.Series, dict]:
    """Create experience bins: 'none' for zeros + equal-frequency bins for non-zero values.

    Args:
        series: Numeric series where 0 means no experience.
        n_bins_nonzero: Number of equal-frequency bins for non-zero values.

    Returns:
        A tuple of:
            - binned series with labels: none, bin_1..bin_k, missing
            - dict with bin ranges for non-zero bins
    """
    valid_mask = series.notna()
    zero_mask = (series == 0) & valid_mask
    non_zero_mask = (series > 0) & valid_mask

    binned = pd.Series(index=series.index, dtype="object")
    binned.loc[~valid_mask] = "missing"
    binned.loc[zero_mask] = "none"

    non_zero = series.loc[non_zero_mask].sort_values()
    bin_ranges: dict[str, dict] = {}

    if len(non_zero) == 0:
        return binned, bin_ranges

    n_samples = len(non_zero)
    samples_per_bin = n_samples // n_bins_nonzero
    remainder = n_samples % n_bins_nonzero

    start = 0
    for bin_num in range(1, n_bins_nonzero + 1):
        current_size = samples_per_bin + (1 if bin_num <= remainder else 0)
        end = start + current_size

        idx = non_zero.index[start:end]
        vals = non_zero.iloc[start:end]
        label = f"bin_{bin_num}"

        binned.loc[idx] = label
        bin_ranges[label] = {"min": float(vals.min()), "max": float(vals.max()), "count": int(current_size)}

        start = end

    return binned, bin_ranges


def descriptive_table_by_bin(df_in: pd.DataFrame, group_col: str, value_col: str, max_categories: int = 12) -> pd.DataFrame:
    """Create a by-group table with count and row-wise percentage.

    Args:
        df_in: Input dataframe.
        group_col: Grouping column (e.g., experience bin).
        value_col: Variable to tabulate within each group.
        max_categories: If exceeded, collapse long tail into 'Other (collapsed)'.

    Returns:
        A dataframe with entries formatted as 'count (pct%)'.
    """
    tab = pd.crosstab(df_in[group_col], df_in[value_col])

    if tab.shape[1] > max_categories:
        top_cols = tab.sum(axis=0).sort_values(ascending=False).head(max_categories - 1).index.tolist()
        other = tab.drop(columns=top_cols).sum(axis=1)
        tab = tab[top_cols].copy()
        tab["Other (collapsed)"] = other

    tab_pct = tab.div(tab.sum(axis=1), axis=0) * 100
    return tab.astype(int).astype(str) + " (" + tab_pct.round(1).astype(str) + "%)"


def cramers_v_from_crosstab(crosstab: pd.DataFrame) -> float:
    """Compute Cramér's V effect size from a contingency table.

    Args:
        crosstab: Contingency table (rows = groups, columns = categories).

    Returns:
        Cramér's V in [0, 1].
    """
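    # Check degenerate tables first: chi2_contingency raises on an all-zero table.
    n = crosstab.to_numpy().sum()
    if n == 0:
        return np.nan
    r, k = crosstab.shape
    denom = min(r - 1, k - 1)
    if denom <= 0:
        return np.nan
    # Note: chi2_contingency applies Yates' continuity correction to 2x2 tables by default.
    chi2, _, _, _ = chi2_contingency(crosstab)
    return float(np.sqrt((chi2 / n) / denom))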
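
As a quick sanity check of the Likert helpers, the sketch below repeats their logic on toy responses (the values are illustrative, not taken from the dataset) and shows that missing answers stay missing after parsing and reverse-coding:

```python
import numpy as np
import pandas as pd

# Toy responses; the survey columns follow the same "<digit> - <label>" pattern.
raw = pd.Series(["1 - Discordo totalmente", "6 - Concordo bastante", np.nan])

# Same idea as parse_likert_1_to_7: keep the leading digits, coerce the rest to NaN.
parsed = pd.to_numeric(
    raw.astype(str).str.extract(r"^(\d+)", expand=False), errors="coerce"
)

# Same idea as reverse_1_to_7: 1 <-> 7, 2 <-> 6, and so on.
reversed_scores = 8 - parsed

print(parsed.tolist())
print(reversed_scores.tolist())
```

The missing response is parsed as NaN and remains NaN after reversal, so it cannot be mistaken for a valid scale point.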


## Preprocessing and derived variables
We derive (1) a continuous garden experience score (gardener flag × years of experience × frequency, max-normalized to 0–1), (2) experience bins for descriptive comparisons, (3) a food neophobia score, and (4) perceived neophobia change.

In [4]:
df = df_raw.copy()

# --- X: Garden experience score ---
# map() avoids the downcasting FutureWarning that replace() emits in recent pandas versions.
df["gardener_binary"] = df["gardener"].map({"Sim": 1, "Não": 0}).astype(float)

# Decision: missing years_experience, or values outside [0, 50] => 0
years = pd.to_numeric(df["years_experience"], errors="coerce").fillna(0)
years = years.where((years >= 0) & (years <= 50), 0)
df["years_experience_clean"] = years

freq = pd.to_numeric(df["frequency_experience"], errors="coerce").fillna(0)
df["frequency_experience_clean"] = freq

df["garden_experience_raw"] = (df["gardener_binary"] * (df["years_experience_clean"] * df["frequency_experience_clean"])).astype(float)
df["garden_experience_score"] = safe_normalize_0_1(df["garden_experience_raw"]).astype(float)

df["experience_bin"], experience_bin_ranges = make_equal_frequency_bins_with_zero_as_none(df["garden_experience_score"], n_bins_nonzero=4)

print("Garden experience score summary:")
display(df["garden_experience_score"].describe())
print("Experience bin counts:")
display(df["experience_bin"].value_counts(dropna=False))


Garden experience score summary:


count    157.000000
mean       0.047492
std        0.121287
min        0.000000
25%        0.000000
50%        0.000000
75%        0.037500
max        1.000000
Name: garden_experience_score, dtype: float64

Experience bin counts:


experience_bin
none     92
bin_1    17
bin_4    16
bin_2    16
bin_3    16
Name: count, dtype: int64
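
To make the score construction concrete, here is a small worked example on toy rows. The first three rows reuse the years and frequency values visible in the `head(3)` output; the fourth row is a hypothetical non-gardener. The normalization uses the toy maximum, not the maximum of the full dataset:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "gardener_binary": [1.0, 1.0, 1.0, 0.0],
        "years_experience_clean": [0.0, 7.0, 1.0, 0.0],
        "frequency_experience_clean": [1.0, 5.0, 3.0, 0.0],
    }
)

# Raw score: gardener flag x years x frequency.
toy["raw"] = (
    toy["gardener_binary"]
    * toy["years_experience_clean"]
    * toy["frequency_experience_clean"]
)

# Max scaling to [0, 1] (here the toy maximum is 35).
toy["score"] = toy["raw"] / toy["raw"].max()
print(toy)
```

Because the score is a product, a respondent who reports being a gardener with 0 years of experience (first row) scores 0 and is therefore placed in the `none` bin together with non-gardeners.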

In [5]:
# --- Y1: Food neophobia score (10–70) ---
neophilic_items = ["g06q39", "g06q42", "g06q44", "g06q47", "g06q48"]
neophobic_items = ["g06q40", "g06q41", "g06q43", "g06q45", "g06q46"]

for col in neophilic_items + neophobic_items:
    df[f"{col}_num"] = parse_likert_1_to_7(df[col])

for col in neophilic_items:
    df[f"{col}_rev"] = reverse_1_to_7(df[f"{col}_num"])

df["food_neophobia_score"] = df[[f"{c}_rev" for c in neophilic_items]].sum(axis=1) + df[[f"{c}_num" for c in neophobic_items]].sum(axis=1)

neophobia_mean = df["food_neophobia_score"].mean()
neophobia_sd = df["food_neophobia_score"].std()


def classify_neophobia(score: float) -> str:
    if pd.isna(score):
        return "missing"
    if score < (neophobia_mean - neophobia_sd):
        return "neophilic"
    if score > (neophobia_mean + neophobia_sd):
        return "neophobic"
    return "neutral"


df["food_neophobia_group"] = df["food_neophobia_score"].apply(classify_neophobia)

print("Food neophobia score summary (expected 10–70):")
display(df["food_neophobia_score"].describe())


Food neophobia score summary (expected 10–70):


count    157.000000
mean      31.713376
std       10.391103
min       11.000000
25%       24.000000
50%       32.000000
75%       39.000000
max       61.000000
Name: food_neophobia_score, dtype: float64

In [6]:
df["food_neophobia_group"].value_counts(dropna=False)

food_neophobia_group
neutral      107
neophobic     27
neophilic     23
Name: count, dtype: int64
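
One caveat on the summed score: `DataFrame.sum(axis=1)` skips NaN by default, so a respondent with an unanswered item would receive a total based on fewer items. The sketch below, on toy items with hypothetical names, shows how `min_count` can be used to return NaN for incomplete rows instead:

```python
import numpy as np
import pandas as pd

# Three toy items; the second respondent skipped item_b.
items = pd.DataFrame(
    {
        "item_a": [1.0, 7.0, 4.0],
        "item_b": [2.0, np.nan, 4.0],
        "item_c": [3.0, 5.0, 4.0],
    }
)

# Default behavior: NaN is skipped, so the second row sums only two items.
default_total = items.sum(axis=1)

# Requiring all items to be present yields NaN for incomplete rows.
strict_total = items.sum(axis=1, min_count=items.shape[1])

print(default_total.tolist())
print(strict_total.tolist())
```

In this notebook the same idea would apply to the ten `_rev` and `_num` item columns summed in a single call; whether any respondent actually has missing items is not shown in the outputs above.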

In [7]:
# --- Y2: Perceived reduction in neophobia (0 to -12) ---
for col in ["g06q49", "g06q50", "g06q51", "g06q52"]:
    df[f"{col}_num"] = parse_likert_1_to_7(df[col])

df["g06q49_rev"] = reverse_1_to_7(df["g06q49_num"])
df["g06q50_rev"] = reverse_1_to_7(df["g06q50_num"])

df["delta_neophilic"] = df["g06q50_rev"] - df["g06q49_rev"]
df["delta_neophobic"] = df["g06q52_num"] - df["g06q51_num"]
df["perceived_neophobia_change"] = df["delta_neophilic"] + df["delta_neophobic"]

# Spec: positive values are not meaningful; clip to 0
df.loc[df["perceived_neophobia_change"] > 0, "perceived_neophobia_change"] = 0

print("Perceived neophobia change summary (expected 0 to -12):")
display(df["perceived_neophobia_change"].describe())

Perceived neophobia change summary (expected 0 to -12):


count    35.000000
mean     -2.314286
std       2.564135
min      -8.000000
25%      -4.000000
50%      -1.000000
75%       0.000000
max       0.000000
Name: perceived_neophobia_change, dtype: float64
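
The arithmetic behind this variable can be traced on a few rows. In the sketch below, the first row uses the four answers of the second respondent in the `head(3)` output (6, 6, 6, 1); the other two rows are hypothetical extremes added to show the lower bound and the clipping:

```python
import pandas as pd

answers = pd.DataFrame(
    {
        "g06q49_num": [6, 1, 7],
        "g06q50_num": [6, 7, 1],
        "g06q51_num": [6, 7, 1],
        "g06q52_num": [1, 1, 7],
    },
    index=["row_1_from_head", "hypothetical_min", "hypothetical_positive"],
)

# Neophilic items are reverse-coded (8 - x) before taking the difference.
delta_neophilic = (8 - answers["g06q50_num"]) - (8 - answers["g06q49_num"])
delta_neophobic = answers["g06q52_num"] - answers["g06q51_num"]

# clip(upper=0) has the same effect as assigning 0 to positive values.
change = (delta_neophilic + delta_neophobic).clip(upper=0)
print(change)
```

The respondent from the data gets 0 + (-5) = -5, the hypothetical minimum reaches -12, and the hypothetical positive case (+12 before clipping) is set to 0.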

## Experience distribution (score + bins)

In [8]:
fig_hist = px.histogram(df, x="garden_experience_score", nbins=30, title="Distribution of garden experience score (0–1)", labels={"garden_experience_score": "Garden experience score (normalized)"}, color_discrete_sequence=["seagreen"])
fig_hist.update_layout(bargap=0.02, showlegend=False)
fig_hist.show()

# Save
fig_hist.write_html(images_path / "experience_score_hist.html")
fig_hist.write_image(images_path / "experience_score_hist.png", width=800, height=500, scale=3)
print("Saved experience_score_hist.html and .png")

Saved experience_score_hist.html and .png


In [9]:
bin_counts = df["experience_bin"].value_counts()
bin_order = ["none", "bin_1", "bin_2", "bin_3", "bin_4", "missing"]
ordered_bins = [b for b in bin_order if b in bin_counts.index]
ordered_counts = [int(bin_counts[b]) for b in ordered_bins]
ordered_colors = ["red" if x == "none" else "gray" if x == "missing" else px.colors.sequential.Greens[5] for x in ordered_bins]

# Plotly renders line breaks in tick labels with <br>, not "\n".
x_labels = []
for b in ordered_bins:
    if b == "none":
        x_labels.append("none<br>(0.0000)")
    elif b == "missing":
        x_labels.append("missing")
    else:
        r = experience_bin_ranges.get(b, {})
        x_labels.append(f"{b}<br>[{r.get('min', np.nan):.4f}–{r.get('max', np.nan):.4f}]")

fig_bins = go.Figure()
fig_bins.add_trace(go.Bar(x=x_labels, y=ordered_counts, text=ordered_counts, textposition="auto", marker_color=ordered_colors))
fig_bins.update_layout(
    title="Distribution of experience bins<br><sub>Equal-frequency binning of non-zero scores; zero scores shown as 'none'</sub>",
    xaxis_title="Experience bins (with ranges)",
    yaxis_title="Number of respondents",
    showlegend=False,
    xaxis_tickangle=-45,  # Rotate labels for better readability
)
fig_bins.show()

# Save
fig_bins.write_html(images_path / "experience_bins.html")
fig_bins.write_image(images_path / "experience_bins.png", width=800, height=500, scale=3)
print("Saved experience_bins.html and .png")

Saved experience_bins.html and .png
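
Because the binning helper assigns bins by position in the sorted series, respondents with identical scores can end up on both sides of a bin boundary. The sketch below uses toy scores (not the real data) and a ranges dictionary shaped like `experience_bin_ranges` to flag adjacent bins that share a boundary value:

```python
import pandas as pd

# Toy non-zero scores with a tie: 0.2 appears three times.
scores = pd.Series([0.1, 0.2, 0.2, 0.2, 0.5, 0.9]).sort_values()
n_bins = 3
size = len(scores) // n_bins  # 6 values -> 2 per bin, no remainder

# Position-based slicing, mirroring the equal-frequency helper.
ranges = {}
for i in range(n_bins):
    chunk = scores.iloc[i * size:(i + 1) * size]
    ranges[f"bin_{i + 1}"] = {"min": float(chunk.min()), "max": float(chunk.max())}

# Adjacent bins whose max and min coincide contain a split tie.
labels = list(ranges)
split_ties = [
    (left, right)
    for left, right in zip(labels, labels[1:])
    if ranges[left]["max"] == ranges[right]["min"]
]
print(split_ties)  # [('bin_1', 'bin_2')]
```

Running the same comparison on `experience_bin_ranges` would show whether any of the four non-zero bins overlap at their edges, which matters when interpreting differences between neighbouring bins.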


## Food neophobia (H1 outcome)

In [10]:
fig_neophobia = px.histogram(df, x="food_neophobia_score", nbins=25, title="Distribution of food neophobia score (10–70)", labels={"food_neophobia_score": "Food neophobia score (sum of 10 items)"}, color_discrete_sequence=[px.colors.sequential.Greens[2]])
fig_neophobia.update_layout(bargap=0.02, showlegend=False)
fig_neophobia.show()

# Save
fig_neophobia.write_html(images_path / "food_neophobia_hist.html")
fig_neophobia.write_image(images_path / "food_neophobia_hist.png", width=800, height=500, scale=3)
print("Saved food_neophobia_hist.html and .png")

Saved food_neophobia_hist.html and .png


In [11]:
df_box = df[df["experience_bin"].isin(["none", "bin_1", "bin_2", "bin_3", "bin_4"])].copy()
df_box["experience_bin"] = pd.Categorical(df_box["experience_bin"], categories=["none", "bin_1", "bin_2", "bin_3", "bin_4"], ordered=True)

fig_box = px.box(df_box, x="experience_bin", y="food_neophobia_score", points="all", title="Food neophobia by experience bin", labels={"experience_bin": "Experience bin", "food_neophobia_score": "Food neophobia score"}, color="experience_bin", color_discrete_sequence=px.colors.sequential.Greens[2:])
fig_box.update_layout(showlegend=False)
fig_box.show()

# Save
fig_box.write_html(images_path / "food_neophobia_by_bin_box.html")
fig_box.write_image(images_path / "food_neophobia_by_bin_box.png", width=800, height=500, scale=3)
print("Saved food_neophobia_by_bin_box.html and .png")

Saved food_neophobia_by_bin_box.html and .png


In [12]:
fig_scatter = px.scatter(
    df,
    x="garden_experience_score",
    y="food_neophobia_score",
    trendline="ols",
    title="Garden experience vs. food neophobia",
    labels={"garden_experience_score": "Garden experience score (0–1)", "food_neophobia_score": "Food neophobia score"},
    opacity=0.6,
    color_discrete_sequence=[px.colors.sequential.Greens[5]],
)
fig_scatter.show()

# Save
fig_scatter.write_html(images_path / "experience_vs_food_neophobia_scatter.html")
fig_scatter.write_image(images_path / "experience_vs_food_neophobia_scatter.png", width=800, height=500, scale=3)
print("Saved experience_vs_food_neophobia_scatter.html and .png")

Saved experience_vs_food_neophobia_scatter.html and .png


## Perceived reduction in neophobia (H2 outcome)
The theoretical range is **0 (no change) to -12 (strongest perceived reduction)**; positive raw values are clipped to 0 during preprocessing.

In [13]:
df_h2_plot = df.dropna(subset=["perceived_neophobia_change"]).copy()
print("Non-missing perceived change N =", len(df_h2_plot))

fig_p = px.histogram(df_h2_plot, x="perceived_neophobia_change", nbins=20, title="Distribution of perceived neophobia change (0 to -12)", labels={"perceived_neophobia_change": "Perceived neophobia change"}, color_discrete_sequence=[px.colors.sequential.Greens[4]])
fig_p.update_layout(bargap=0.02, showlegend=False)
fig_p.show()

# Save
fig_p.write_html(images_path / "perceived_change_hist.html")
fig_p.write_image(images_path / "perceived_change_hist.png", width=800, height=500, scale=3)
print("Saved perceived_change_hist.html and .png")

Non-missing perceived change N = 35


Saved perceived_change_hist.html and .png


In [14]:
fig_p_scatter = px.scatter(
    df_h2_plot,
    x="garden_experience_score",
    y="perceived_neophobia_change",
    trendline="ols",
    title="Garden experience vs. perceived neophobia change",
    labels={"garden_experience_score": "Garden experience score (0–1)", "perceived_neophobia_change": "Perceived neophobia change"},
    opacity=0.6,
    color_discrete_sequence=[px.colors.sequential.Greens[4]],
)
fig_p_scatter.show()

# Save
fig_p_scatter.write_html(images_path / "experience_vs_perceived_change_scatter.html")
fig_p_scatter.write_image(images_path / "experience_vs_perceived_change_scatter.png", width=800, height=500, scale=3)
print("Saved experience_vs_perceived_change_scatter.html and .png")

Saved experience_vs_perceived_change_scatter.html and .png


## Descriptive and comparative tables by experience bin
This section produces descriptive tables (counts and row-wise percentages) by experience bins.

Core variables (current dataset): `zip_code`, `gender`, `education_level`, `education_field`, `diet_type`.
For geographic comparisons, we report `state` and `region` instead of raw `zip_code` to avoid extremely sparse tables.

Extra exploratory variables: `social_model_*`, `social_trigger_*`, `meals_*`.

In [15]:
# English mappings
gender_map = {
    "Mulher cis - pessoa que nasceu com sexo feminino e se identifica com o gênero feminino": "Female",
    "Homem cis - pessoa que nasceu com sexo masculino e se identifica com o gênero masculino": "Male",
    "Pessoa não binária - pessoa que não se identifica estritamente com o gênero masculino ou feminino": "Non-binary",
    "Não sei/prefiro não responder": "Prefer not to answer",
}
df["gender_en"] = df["gender"].map(gender_map).fillna("Other / missing")

diet_map = {
    "Onívora - Possuo uma alimentação diversificada consumindo carne, frango, peixe, verduras, frutas, leite, entre outros alimentos": "Omnivore",
    "Flexitariano - Tenho um cardápio flexível e como carne eventualmente ou, pelo menos, tento reduzir as quantidades": "Flexitarian",
    "Vegetariana - Excluo todos os tipos de carnes, aves, peixes e, posso incluir, ovos, laticínios e seus produtos": "Vegetarian",
    "Vegana - Não me alimento de nenhum produto que contenha carne, ovos, leite, mel ou outros ingredientes derivados de animais": "Vegan",
}
df["diet_type_en"] = df["diet_type"].map(diet_map).fillna("Other / missing")

education_level_map = {
    "Graduação em curso": "Undergraduate (in progress)",
    "Graduação concluída": "Undergraduate (completed)",
    "Mestrado em curso": "MSc (in progress)",
    "Mestrado concluído": "MSc (completed)",
    "Doutorado em curso": "PhD (in progress)",
    "Doutorado concluído": "PhD (completed)",
}
df["education_level_en"] = df["education_level"].map(education_level_map).fillna("Other / missing")


def map_education_field_to_broad(field: str) -> str:
    if pd.isna(field):
        return "Missing"
    f = str(field).strip().lower()

    if any(k in f for k in ["nutri", "medicin", "enferm", "odont", "biomed", "psicolog", "farmac", "gastron"]):
        return "Health Sciences"
    if any(k in f for k in ["ecolog", "biolog", "quim", "ambient", "agronom", "florest", "pesca", "ciências biológicas"]):
        return "Natural & Environmental Sciences"
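    # Caution: the bare substring "ti" also matches unrelated words (e.g. "política"), so review these matches.
    if any(k in f for k in ["engenhar", "arquitet", "comput", "sistemas", "tecnolog", "redes", "ti", "química", "desenho industrial"]):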
        return "Engineering & Technology"
    if any(k in f for k in ["direito", "hist", "letras", "admin", "turismo", "geograf", "comunic", "sociais", "relig", "serviço social", "servico social", "relações internacionais", "pedagogia", "produção cultural", "contábeis", "teatro"]):
        return "Humanities & Social Sciences"

    return "Other"


df["education_field_broad"] = df["education_field"].apply(map_education_field_to_broad)
df["education_field_broad"].value_counts()

education_field_broad
Health Sciences                     61
Humanities & Social Sciences        43
Engineering & Technology            37
Natural & Environmental Sciences    14
Other                                2
Name: count, dtype: int64

In [16]:
df.query('education_field_broad == "Other"')["education_field"]

86     ainda em curso
137              2023
Name: education_field, dtype: object
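
The two `Other` rows are non-answers rather than real fields. A separate point worth checking is the keyword matching itself: it is substring-based and accent-sensitive, so a short keyword can match inside unrelated words and accented spellings can miss unaccented keywords. The sketch below illustrates an accent-insensitive, whole-word alternative; `strip_accents` and `has_token` are hypothetical helpers, not part of the original notebook:

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Lowercase and remove diacritics (e.g. 'Química' -> 'quimica')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def has_token(text: str, token: str) -> bool:
    """True only if `token` appears as a whole word in `text`."""
    return token in re.findall(r"[a-z0-9]+", strip_accents(text))


# Substring matching finds 'ti' inside an unrelated word.
print("ti" in "ciência política")           # True
print(has_token("ciência política", "ti"))  # False
print(has_token("TI", "ti"))                # True

# Accent stripping lets one keyword cover both spellings.
print("quim" in "química")                  # False
print("quim" in strip_accents("Química"))   # True
```

Whether any response in this dataset is actually affected is not visible from the outputs above, so this is a check to run rather than a confirmed misclassification.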

In [17]:
df_bins = df[df["experience_bin"].isin(["none", "bin_1", "bin_2", "bin_3", "bin_4"])].copy()
df_bins["experience_bin"] = pd.Categorical(df_bins["experience_bin"], categories=["none", "bin_1", "bin_2", "bin_3", "bin_4"], ordered=True)

core_tables = {
    "Gender": descriptive_table_by_bin(df_bins, "experience_bin", "gender_en"),
    "Diet type": descriptive_table_by_bin(df_bins, "experience_bin", "diet_type_en"),
    "Education level": descriptive_table_by_bin(df_bins, "experience_bin", "education_level_en"),
    "Education field (broad)": descriptive_table_by_bin(df_bins, "experience_bin", "education_field_broad"),
    "State": descriptive_table_by_bin(df_bins, "experience_bin", "state", max_categories=10),
    "Region": descriptive_table_by_bin(df_bins, "experience_bin", "region", max_categories=10),
}

for title, tbl in core_tables.items():
    print("\n" + "=" * 90)
    print(title)
    print("=" * 90)
    display(tbl)

extra_cols = [
    "social_model_plants",
    "social_model_mushrooms",
    "social_model_algae",
    "social_model_other",
    "social_trigger_plants",
    "social_trigger_mushrooms",
    "social_trigger_algae",
    "meals_friends",
    "meals_family_older",
    "meals_family_same_generation",
    "meals_not_family_older",
]

for col in extra_cols:
    print("\n" + "=" * 90)
    print(f"Extra variable: {col}")
    print("=" * 90)
    display(descriptive_table_by_bin(df_bins, "experience_bin", col, max_categories=10))



Gender


experience_bin,Female,Male,Non-binary,Prefer not to answer
none,47 (51.1%),44 (47.8%),0 (0.0%),1 (1.1%)
bin_1,7 (41.2%),9 (52.9%),1 (5.9%),0 (0.0%)
bin_2,12 (75.0%),4 (25.0%),0 (0.0%),0 (0.0%)
bin_3,13 (81.2%),3 (18.8%),0 (0.0%),0 (0.0%)
bin_4,9 (56.2%),7 (43.8%),0 (0.0%),0 (0.0%)



Diet type


experience_bin,Flexitarian,Omnivore,Vegan,Vegetarian
none,9 (9.8%),81 (88.0%),1 (1.1%),1 (1.1%)
bin_1,2 (11.8%),15 (88.2%),0 (0.0%),0 (0.0%)
bin_2,2 (12.5%),11 (68.8%),0 (0.0%),3 (18.8%)
bin_3,2 (12.5%),13 (81.2%),1 (6.2%),0 (0.0%)
bin_4,1 (6.2%),13 (81.2%),1 (6.2%),1 (6.2%)



Education level


experience_bin,MSc (completed),MSc (in progress),PhD (completed),PhD (in progress),Undergraduate (completed),Undergraduate (in progress)
none,2 (2.2%),8 (8.7%),6 (6.5%),6 (6.5%),21 (22.8%),49 (53.3%)
bin_1,3 (17.6%),0 (0.0%),2 (11.8%),0 (0.0%),4 (23.5%),8 (47.1%)
bin_2,0 (0.0%),2 (12.5%),0 (0.0%),1 (6.2%),7 (43.8%),6 (37.5%)
bin_3,2 (12.5%),1 (6.2%),2 (12.5%),2 (12.5%),3 (18.8%),6 (37.5%)
bin_4,2 (12.5%),1 (6.2%),4 (25.0%),1 (6.2%),4 (25.0%),4 (25.0%)



Education field (broad)


experience_bin,Engineering & Technology,Health Sciences,Humanities & Social Sciences,Natural & Environmental Sciences,Other
none,30 (32.6%),26 (28.3%),30 (32.6%),4 (4.3%),2 (2.2%)
bin_1,1 (5.9%),7 (41.2%),7 (41.2%),2 (11.8%),0 (0.0%)
bin_2,0 (0.0%),9 (56.2%),3 (18.8%),4 (25.0%),0 (0.0%)
bin_3,0 (0.0%),11 (68.8%),2 (12.5%),3 (18.8%),0 (0.0%)
bin_4,6 (37.5%),8 (50.0%),1 (6.2%),1 (6.2%),0 (0.0%)



State


experience_bin,RN,PB,SP,RJ,RS,PE,CE,DF,PI,Other (collapsed)
none,73 (79.3%),5 (5.4%),2 (2.2%),3 (3.3%),3 (3.3%),2 (2.2%),2 (2.2%),1 (1.1%),1 (1.1%),0 (0.0%)
bin_1,10 (58.8%),1 (5.9%),2 (11.8%),0 (0.0%),0 (0.0%),0 (0.0%),1 (5.9%),0 (0.0%),1 (5.9%),2 (11.8%)
bin_2,12 (75.0%),1 (6.2%),2 (12.5%),0 (0.0%),0 (0.0%),1 (6.2%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%)
bin_3,13 (81.2%),0 (0.0%),0 (0.0%),1 (6.2%),1 (6.2%),0 (0.0%),0 (0.0%),1 (6.2%),0 (0.0%),0 (0.0%)
bin_4,7 (43.8%),1 (6.2%),0 (0.0%),2 (12.5%),2 (12.5%),1 (6.2%),1 (6.2%),0 (0.0%),0 (0.0%),2 (12.5%)



Region


experience_bin,Central-West,Northeast,South,Southeast
none,1 (1.1%),83 (90.2%),3 (3.3%),5 (5.4%)
bin_1,0 (0.0%),15 (88.2%),0 (0.0%),2 (11.8%)
bin_2,0 (0.0%),14 (87.5%),0 (0.0%),2 (12.5%)
bin_3,1 (6.2%),13 (81.2%),1 (6.2%),1 (6.2%)
bin_4,0 (0.0%),10 (62.5%),4 (25.0%),2 (12.5%)



Extra variable: social_model_plants


experience_bin,Familiares da mesma idade ou mais novos do que eu,Familiares mais velhos que eu,Meus amigos ou colegas de trabalho,Não consumi ou consumo esse tipo de alimento,Outro,Professores ou pessoas de referência mais velhas do que eu (desde que não sejam meus familiares)
none,7 (7.6%),22 (23.9%),18 (19.6%),24 (26.1%),7 (7.6%),14 (15.2%)
bin_1,1 (5.9%),2 (11.8%),4 (23.5%),4 (23.5%),1 (5.9%),5 (29.4%)
bin_2,0 (0.0%),4 (25.0%),3 (18.8%),3 (18.8%),0 (0.0%),6 (37.5%)
bin_3,0 (0.0%),2 (12.5%),2 (12.5%),0 (0.0%),1 (6.2%),11 (68.8%)
bin_4,1 (6.2%),7 (43.8%),3 (18.8%),1 (6.2%),1 (6.2%),3 (18.8%)



Extra variable: social_model_mushrooms


social_model_mushrooms,Family members my age or younger,Family members older than me,My friends or coworkers,I have not consumed or do not consume this type of food,Other,Teachers or role models older than me (provided they are not family members)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
none,7 (7.6%),23 (25.0%),15 (16.3%),27 (29.3%),12 (13.0%),8 (8.7%)
bin_1,2 (11.8%),4 (23.5%),2 (11.8%),5 (29.4%),3 (17.6%),1 (5.9%)
bin_2,1 (6.2%),3 (18.8%),4 (25.0%),5 (31.2%),2 (12.5%),1 (6.2%)
bin_3,2 (12.5%),5 (31.2%),4 (25.0%),2 (12.5%),1 (6.2%),2 (12.5%)
bin_4,5 (31.2%),4 (25.0%),3 (18.8%),1 (6.2%),1 (6.2%),2 (12.5%)



Extra variable: social_model_algae


social_model_algae,Family members my age or younger,Family members older than me,My friends or coworkers,I have not consumed or do not consume this type of food,Other,Teachers or role models older than me (provided they are not family members)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
none,12 (13.0%),18 (19.6%),14 (15.2%),35 (38.0%),5 (5.4%),8 (8.7%)
bin_1,2 (11.8%),1 (5.9%),3 (17.6%),6 (35.3%),4 (23.5%),1 (5.9%)
bin_2,0 (0.0%),1 (6.2%),2 (12.5%),7 (43.8%),3 (18.8%),3 (18.8%)
bin_3,4 (25.0%),1 (6.2%),5 (31.2%),3 (18.8%),2 (12.5%),1 (6.2%)
bin_4,3 (18.8%),2 (12.5%),2 (12.5%),7 (43.8%),2 (12.5%),0 (0.0%)



Extra variable: social_model_other


social_model_other,The main influence on my consumption of mushrooms and algae was social media and cooking shows,"Based on recipes and cooking shows, such as Masterchef, for example, I became curious to try foods I did not know.",Japanese cuisine,"Curiosity, I like to try new things. But I also respect myself if I don't like something, as is the case with algae; I didn't enjoy what I have tried.","It was out of curiosity. Discovering something at a restaurant or hearing about something through the media. But mainly the curiosity to discover foods.",Reading articles about algae consumption,"I marked 'other' for algae; my consumption is influenced by curiosity and research on the internet and in books.","My parents, mainly my father",Girlfriend,Other (collapsed)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
none,0 (0.0%),1 (14.3%),0 (0.0%),1 (14.3%),1 (14.3%),0 (0.0%),0 (0.0%),1 (14.3%),1 (14.3%),2 (28.6%)
bin_1,0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),3 (100.0%)
bin_2,0 (0.0%),0 (0.0%),1 (33.3%),0 (0.0%),0 (0.0%),0 (0.0%),1 (33.3%),0 (0.0%),0 (0.0%),1 (33.3%)
bin_3,1 (33.3%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),1 (33.3%),0 (0.0%),0 (0.0%),0 (0.0%),1 (33.3%)
bin_4,0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),1 (100.0%)



Extra variable: social_trigger_plants


social_trigger_plants,Family members my age or younger,Family members older than me,My friends or coworkers,Teachers or role models older than me (provided they are not family members)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
none,25 (27.2%),26 (28.3%),32 (34.8%),9 (9.8%)
bin_1,3 (17.6%),7 (41.2%),5 (29.4%),2 (11.8%)
bin_2,4 (25.0%),4 (25.0%),4 (25.0%),4 (25.0%)
bin_3,2 (12.5%),4 (25.0%),7 (43.8%),3 (18.8%)
bin_4,6 (37.5%),3 (18.8%),4 (25.0%),3 (18.8%)



Extra variable: social_trigger_mushrooms


social_trigger_mushrooms,Family members my age or younger,Family members older than me,My friends or coworkers,Teachers or role models older than me (provided they are not family members)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
none,24 (26.1%),27 (29.3%),32 (34.8%),9 (9.8%)
bin_1,3 (17.6%),5 (29.4%),7 (41.2%),2 (11.8%)
bin_2,7 (43.8%),4 (25.0%),2 (12.5%),3 (18.8%)
bin_3,3 (18.8%),4 (25.0%),8 (50.0%),1 (6.2%)
bin_4,7 (43.8%),5 (31.2%),2 (12.5%),2 (12.5%)



Extra variable: social_trigger_algae


social_trigger_algae,Family members my age or younger,Family members older than me,My friends or coworkers,Teachers or role models older than me (provided they are not family members)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
none,25 (27.2%),25 (27.2%),34 (37.0%),8 (8.7%)
bin_1,5 (29.4%),5 (29.4%),5 (29.4%),2 (11.8%)
bin_2,5 (31.2%),2 (12.5%),5 (31.2%),4 (25.0%)
bin_3,2 (12.5%),4 (25.0%),8 (50.0%),2 (12.5%)
bin_4,6 (37.5%),3 (18.8%),5 (31.2%),2 (12.5%)



Extra variable: meals_friends


meals_friends,0,1,2,5,3,4,10,7,15,Other (collapsed)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
none,34 (37.0%),14 (15.2%),15 (16.3%),11 (12.0%),8 (8.7%),3 (3.3%),4 (4.3%),1 (1.1%),0 (0.0%),2 (2.2%)
bin_1,2 (11.8%),4 (23.5%),3 (17.6%),5 (29.4%),3 (17.6%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%)
bin_2,2 (12.5%),3 (18.8%),2 (12.5%),3 (18.8%),3 (18.8%),2 (12.5%),0 (0.0%),1 (6.2%),0 (0.0%),0 (0.0%)
bin_3,2 (12.5%),4 (25.0%),2 (12.5%),2 (12.5%),0 (0.0%),0 (0.0%),0 (0.0%),1 (6.2%),2 (12.5%),3 (18.8%)
bin_4,6 (37.5%),1 (6.2%),2 (12.5%),3 (18.8%),2 (12.5%),1 (6.2%),1 (6.2%),0 (0.0%),0 (0.0%),0 (0.0%)



Extra variable: meals_family_older


meals_family_older,0,2,1,5,4,7,3,14,6,Other (collapsed)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
none,37 (40.2%),12 (13.0%),6 (6.5%),5 (5.4%),8 (8.7%),6 (6.5%),4 (4.3%),1 (1.1%),1 (1.1%),12 (13.0%)
bin_1,2 (11.8%),3 (17.6%),3 (17.6%),1 (5.9%),0 (0.0%),0 (0.0%),2 (11.8%),3 (17.6%),1 (5.9%),2 (11.8%)
bin_2,2 (12.5%),2 (12.5%),1 (6.2%),4 (25.0%),2 (12.5%),1 (6.2%),0 (0.0%),0 (0.0%),1 (6.2%),3 (18.8%)
bin_3,3 (18.8%),3 (18.8%),4 (25.0%),2 (12.5%),0 (0.0%),0 (0.0%),1 (6.2%),1 (6.2%),0 (0.0%),2 (12.5%)
bin_4,8 (50.0%),1 (6.2%),4 (25.0%),0 (0.0%),1 (6.2%),1 (6.2%),0 (0.0%),0 (0.0%),1 (6.2%),0 (0.0%)



Extra variable: meals_family_same_generation


meals_family_same_generation,0,2,3,5,7,1,10,14,8,Other (collapsed)
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
none,46 (50.0%),7 (7.6%),6 (6.5%),6 (6.5%),8 (8.7%),3 (3.3%),1 (1.1%),1 (1.1%),2 (2.2%),12 (13.0%)
bin_1,3 (17.6%),2 (11.8%),2 (11.8%),2 (11.8%),0 (0.0%),2 (11.8%),0 (0.0%),3 (17.6%),0 (0.0%),3 (17.6%)
bin_2,6 (37.5%),1 (6.2%),1 (6.2%),2 (12.5%),1 (6.2%),0 (0.0%),2 (12.5%),1 (6.2%),2 (12.5%),0 (0.0%)
bin_3,6 (37.5%),2 (12.5%),3 (18.8%),1 (6.2%),1 (6.2%),1 (6.2%),1 (6.2%),0 (0.0%),0 (0.0%),1 (6.2%)
bin_4,5 (31.2%),2 (12.5%),0 (0.0%),1 (6.2%),1 (6.2%),3 (18.8%),1 (6.2%),0 (0.0%),0 (0.0%),3 (18.8%)



Extra variable: meals_not_family_older


meals_not_family_older,0,1,2,3,4,5,NENHUMA
experience_bin,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
none,76 (82.6%),10 (10.9%),3 (3.3%),1 (1.1%),1 (1.1%),0 (0.0%),1 (1.1%)
bin_1,14 (82.4%),3 (17.6%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%)
bin_2,11 (68.8%),3 (18.8%),0 (0.0%),1 (6.2%),0 (0.0%),0 (0.0%),1 (6.2%)
bin_3,9 (56.2%),3 (18.8%),2 (12.5%),1 (6.2%),0 (0.0%),1 (6.2%),0 (0.0%)
bin_4,11 (68.8%),3 (18.8%),2 (12.5%),0 (0.0%),0 (0.0%),0 (0.0%),0 (0.0%)


## Statistical analysis
We report:
- Correlations between continuous experience and outcomes
- Group comparisons across experience bins
- Regression models (OLS with robust SE) for H1 and H2

In [18]:
# --- Correlations (continuous X vs outcomes) ---
df_h1 = df.dropna(subset=["garden_experience_score", "food_neophobia_score"]).copy()
df_h2 = df.dropna(subset=["garden_experience_score", "perceived_neophobia_change"]).copy()

print("H1 usable N:", len(df_h1))
print("H2 usable N:", len(df_h2))


H1 usable N: 157
H2 usable N: 35


In [19]:
print("\n=== CORRELATION ANALYSIS ===")


def interpret_p(p: float) -> str:
    if p < 0.001:
        return "Highly significant (p < 0.001)"
    if p < 0.05:
        return "Significant (p < 0.05)"
    return "Not significant (p >= 0.05)"


def corr_report(x: pd.Series, y: pd.Series, label: str) -> None:
    sp_r, sp_p = spearmanr(x, y, nan_policy="omit")
    kt_r, kt_p = kendalltau(x, y, nan_policy="omit")

    print(f"\n--- {label} ---")
    print(f"Spearman: ρ = {sp_r:.3f}, p = {sp_p:.4f} ({interpret_p(sp_p)})")
    print(f"Kendall:  τ = {kt_r:.3f}, p = {kt_p:.4f} ({interpret_p(kt_p)})")

    if sp_p < 0.05:
        direction = "positive" if sp_r > 0 else "negative"
        meaning = "associated with higher" if direction == "positive" else "associated with lower"
        print(f"Interpretation: There is a statistically significant {direction} correlation.")
        print(f"  -> Higher garden experience is {meaning} {y.name.replace('_', ' ')}.")
    else:
        print("Interpretation: No statistically significant correlation observed.")


corr_report(df_h1["garden_experience_score"], df_h1["food_neophobia_score"], "H1: Experience vs Food Neophobia Score")
if len(df_h2) > 5:
    corr_report(df_h2["garden_experience_score"], df_h2["perceived_neophobia_change"], "H2: Experience vs Perceived Change")
else:
    print("\nH2: Insufficient data for Perceived Change correlations.")


=== CORRELATION ANALYSIS ===

--- H1: Experience vs Food Neophobia Score ---
Spearman: ρ = -0.274, p = 0.0005 (Highly significant (p < 0.001))
Kendall:  τ = -0.201, p = 0.0009 (Highly significant (p < 0.001))
Interpretation: There is a statistically significant negative correlation.
  -> Higher garden experience is associated with lower food neophobia score.

--- H2: Experience vs Perceived Change ---
Spearman: ρ = -0.278, p = 0.1060 (Not significant (p >= 0.05))
Kendall:  τ = -0.218, p = 0.0907 (Not significant (p >= 0.05))
Interpretation: No statistically significant correlation observed.


In [20]:
print("\n=== GROUP COMPARISONS (Experience Bins) ===")

# H1 Comparisons
df_bins_h1 = df_bins.dropna(subset=["food_neophobia_score"]).copy()
groups_h1 = [df_bins_h1[df_bins_h1["experience_bin"] == g]["food_neophobia_score"] for g in ["none", "bin_1", "bin_2", "bin_3", "bin_4"]]
kw_stat, kw_p = kruskal(*groups_h1)

print("\n--- H1: Food Neophobia Score across Bins ---")
print(f"Kruskal–Wallis Test: H = {kw_stat:.3f}, p = {kw_p:.4f}")

if kw_p < 0.05:
    print("Interpretation: There is a statistically significant difference in food neophobia scores between at least two experience groups.")
else:
    print("Interpretation: We cannot reject the null hypothesis; no significant difference in food neophobia found across experience bins.")


=== GROUP COMPARISONS (Experience Bins) ===

--- H1: Food Neophobia Score across Bins ---
Kruskal–Wallis Test: H = 15.299, p = 0.0041
Interpretation: There is a statistically significant difference in food neophobia scores between at least two experience groups.


In [21]:
none_group = df_bins_h1[df_bins_h1["experience_bin"] == "none"]["food_neophobia_score"]
high_group = df_bins_h1[df_bins_h1["experience_bin"].isin(["bin_3", "bin_4"])]["food_neophobia_score"]
mw_u, mw_p = mannwhitneyu(none_group, high_group, alternative="two-sided")

print(f"Mann–Whitney U (None vs High[Bins 3+4]): U = {mw_u:.1f}, p = {mw_p:.4f}")
print(f"  Mean (None): {none_group.mean():.2f}")
print(f"  Mean (High): {high_group.mean():.2f}")

if mw_p < 0.05:
    diff = high_group.mean() - none_group.mean()
    direction = "higher" if diff > 0 else "lower"
    print(f"Interpretation: High-experience gardeners have significantly {direction} food neophobia scores than non-gardeners (avg diff: {diff:.2f}).")
else:
    print("Interpretation: No statistically significant difference between high-experience gardeners and non-gardeners.")

Mann–Whitney U (None vs High[Bins 3+4]): U = 2073.5, p = 0.0006
  Mean (None): 34.02
  Mean (High): 26.44
Interpretation: High-experience gardeners have significantly lower food neophobia scores than non-gardeners (avg diff: -7.58).


In [22]:
# H2 Comparisons
df_bins_h2 = df_bins.dropna(subset=["perceived_neophobia_change"]).copy()
print(f"\n--- H2: Perceived Neophobia Change across Bins (N={len(df_bins_h2)}) ---")

if len(df_bins_h2) >= 10 and df_bins_h2["experience_bin"].nunique() >= 2:
    groups_h2 = [df_bins_h2[df_bins_h2["experience_bin"] == g]["perceived_neophobia_change"] for g in ["none", "bin_1", "bin_2", "bin_3", "bin_4"] if (df_bins_h2["experience_bin"] == g).any()]
    if len(groups_h2) > 1:
        kw2_stat, kw2_p = kruskal(*groups_h2)
        print(f"Kruskal–Wallis Test: H = {kw2_stat:.3f}, p = {kw2_p:.4f}")

        if kw2_p < 0.05:
            print("Interpretation: Statistically significant difference in perceived change across experience bins.")
        else:
            print("Interpretation: No significant difference in perceived change across experience bins.")
    else:
        print("Insufficient groups for Kruskal-Wallis.")
else:
    print("Insufficient data for H2 group comparisons.")


--- H2: Perceived Neophobia Change across Bins (N=35) ---
Kruskal–Wallis Test: H = 6.026, p = 0.1972
Interpretation: No significant difference in perceived change across experience bins.


In [23]:
print("\n=== REGRESSION ANALYSIS: H1 (Food Neophobia) ===")

m_h1 = smf.ols("food_neophobia_score ~ garden_experience_score", data=df_h1).fit(cov_type="HC3")
print("\n--- Model: Simple OLS (Robust SE) ---")

# Interpretation
coef = m_h1.params["garden_experience_score"]
pval = m_h1.pvalues["garden_experience_score"]
r2 = m_h1.rsquared

print(f"R-squared: {r2:.3f} (explains {r2 * 100:.1f}% of variance)")
print(f"Coefficient (experience): {coef:.4f}")
print(f"P-value: {pval:.4f}")

if pval < 0.05:
    effect = "decreases" if coef < 0 else "increases"
    print("Interpretation: Garden experience significantly predicts food neophobia.")
    print(f"  -> As experience increases from 0 to 1, food neophobia score {effect} by {abs(coef):.2f} points.")
else:
    print("Interpretation: Garden experience is not a statistically significant predictor of food neophobia in this model.")

display(m_h1.summary())


=== REGRESSION ANALYSIS: H1 (Food Neophobia) ===

--- Model: Simple OLS (Robust SE) ---
R-squared: 0.019 (explains 1.9% of variance)
Coefficient (experience): -11.7957
P-value: 0.3111
Interpretation: Garden experience is not a statistically significant predictor of food neophobia in this model.


0,1,2,3
Dep. Variable:,food_neophobia_score,R-squared:,0.019
Model:,OLS,Adj. R-squared:,0.013
Method:,Least Squares,F-statistic:,1.026
Date:,"Tue, 03 Feb 2026",Prob (F-statistic):,0.313
Time:,09:16:12,Log-Likelihood:,-588.3
No. Observations:,157,AIC:,1181.0
Df Residuals:,155,BIC:,1187.0
Df Model:,1,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,32.2736,0.957,33.741,0.000,30.399,34.148
garden_experience_score,-11.7957,11.645,-1.013,0.311,-34.620,11.028

0,1,2,3
Omnibus:,3.215,Durbin-Watson:,1.659
Prob(Omnibus):,0.2,Jarque-Bera (JB):,3.239
Skew:,0.317,Prob(JB):,0.198
Kurtosis:,2.697,Cond. No.,8.29


In [24]:
# Adjusted model (optional; helps control for obvious confounders)
df_h1_adj = df_h1.dropna(subset=["age", "gender_en", "diet_type_en", "education_level_en", "education_field_broad"]).copy()
print(f"\n--- Model: Adjusted OLS (Robust SE) [N={len(df_h1_adj)}] ---")

if len(df_h1_adj) >= 30:
    formula = "food_neophobia_score ~ garden_experience_score + age + C(gender_en) + C(diet_type_en) + C(education_level_en) + C(education_field_broad)"

    m_h1_adj = None
    err_type = ""

    # Attempt 1: Robust HC3
    try:
        res = smf.ols(formula, data=df_h1_adj).fit(cov_type="HC3")  # HC3: heteroskedasticity-robust covariance estimator
        # Check stability: if SE is NaN or if p-value is exactly 1.0 for a large coefficient (sign of instability)
        if res.bse.isna().any() or (res.pvalues["garden_experience_score"] > 0.999 and abs(res.params["garden_experience_score"]) > 0.01):
            raise ValueError("Unstable HC3 results")
        m_h1_adj = res
        err_type = "HC3 (Robust)"
    except Exception as e:
        print(f"Notice: Robust covariance failed or unstable ({e}). Falling back to standard errors.")
        # Attempt 2: Standard Errors
        try:
            m_h1_adj = smf.ols(formula, data=df_h1_adj).fit(cov_type="nonrobust")
            err_type = "Standard (Non-robust)"
        except Exception as e2:
            print(f"Standard model also failed: {e2}")

    if m_h1_adj is not None:
        # Interpretation
        coef = m_h1_adj.params["garden_experience_score"]
        pval = m_h1_adj.pvalues["garden_experience_score"]

        print(f"Covariance Type: {err_type}")
        print(f"Coefficient (experience): {coef:.4f}, P-value: {pval:.4f}")

        if pval < 0.05:
            print("Interpretation: Even after controlling for age, gender, diet, and education, garden experience remains a significant predictor.")
        else:
            print("Interpretation: After controlling for confounders, garden experience is NOT a significant predictor.")

        display(m_h1_adj.summary())
else:
    print("Skipping adjusted model due to low N.")


--- Model: Adjusted OLS (Robust SE) [N=157] ---
Notice: Robust covariance failed or unstable (Unstable HC3 results). Falling back to standard errors.


Covariance Type: Standard (Non-robust)
Coefficient (experience): -8.4518, P-value: 0.2402
Interpretation: After controlling for confounders, garden experience is NOT a significant predictor.


0,1,2,3
Dep. Variable:,food_neophobia_score,R-squared:,0.154
Model:,OLS,Adj. R-squared:,0.05
Method:,Least Squares,F-statistic:,1.484
Date:,"Tue, 03 Feb 2026",Prob (F-statistic):,0.109
Time:,09:16:12,Log-Likelihood:,-576.71
No. Observations:,157,AIC:,1189.0
Df Residuals:,139,BIC:,1244.0
Df Model:,17,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,22.0470,5.650,3.902,0.000,10.876,33.218
C(gender_en)[T.Male],2.8200,1.857,1.519,0.131,-0.851,6.491
C(gender_en)[T.Non-binary],1.5075,10.351,0.146,0.884,-18.958,21.973
C(gender_en)[T.Prefer not to answer],0.1528,10.423,0.015,0.988,-20.456,20.761
C(diet_type_en)[T.Omnivore],4.5971,2.820,1.630,0.105,-0.979,10.173
C(diet_type_en)[T.Vegan],2.1991,6.675,0.329,0.742,-10.998,15.396
C(diet_type_en)[T.Vegetarian],1.5733,5.369,0.293,0.770,-9.043,12.190
C(education_level_en)[T.MSc (in progress)],0.9783,4.638,0.211,0.833,-8.192,10.149
C(education_level_en)[T.PhD (completed)],2.5593,4.892,0.523,0.602,-7.114,12.232

0,1,2,3
Omnibus:,2.387,Durbin-Watson:,1.724
Prob(Omnibus):,0.303,Jarque-Bera (JB):,2.118
Skew:,0.283,Prob(JB):,0.347
Kurtosis:,3.063,Cond. No.,448.0


In [25]:
print("\n=== REGRESSION ANALYSIS: H2 (Perceived Change) ===")

if len(df_h2) >= 10:
    # Simple Model
    try:
        m_h2 = smf.ols("perceived_neophobia_change ~ garden_experience_score", data=df_h2).fit(cov_type="HC3")
        print("\n--- Model: Simple OLS (Robust SE) ---")

        coef = m_h2.params["garden_experience_score"]
        pval = m_h2.pvalues["garden_experience_score"]

        print(f"Coefficient: {coef:.4f}, P-value: {pval:.4f}")
        if pval < 0.05:
            print("Interpretation: Garden experience is significantly associated with a perceived reduction in neophobia.")
            if coef < 0:
                print("  -> Higher experience is associated with a stronger perceived reduction (more negative score).")
        else:
            print("Interpretation: No significant association found.")

        display(m_h2.summary())
    except Exception as e:
        print(f"Simple model failed: {e}")

    # Adjusted Model
    df_h2_adj = df_h2.dropna(subset=["age", "gender_en", "diet_type_en", "education_level_en", "education_field_broad"]).copy()
    print(f"\n--- Model: Adjusted OLS (Robust SE) [N={len(df_h2_adj)}] ---")

    if len(df_h2_adj) >= 30:
        formula_h2 = "perceived_neophobia_change ~ garden_experience_score + age + C(gender_en) + C(diet_type_en) + C(education_level_en) + C(education_field_broad)"
        m_h2_adj = None

        try:
            res = smf.ols(formula_h2, data=df_h2_adj).fit(cov_type="HC3")
            if res.bse.isna().any() or (res.pvalues["garden_experience_score"] > 0.999 and abs(res.params["garden_experience_score"]) > 0.01):
                raise ValueError("Unstable HC3")
            m_h2_adj = res
        except Exception:
            print("Notice: Robust covariance unstable. Falling back to standard errors.")
            m_h2_adj = smf.ols(formula_h2, data=df_h2_adj).fit(cov_type="nonrobust")

        if m_h2_adj is not None:
            coef_adj = m_h2_adj.params["garden_experience_score"]
            pval_adj = m_h2_adj.pvalues["garden_experience_score"]

            print(f"Coefficient (adj): {coef_adj:.4f}, P-value (adj): {pval_adj:.4f}")
            if pval_adj < 0.05:
                print("Interpretation: Association holds after controlling for confounders.")
            else:
                print("Interpretation: Association is not significant after controlling for confounders.")

            display(m_h2_adj.summary())
    else:
        print("Skipping adjusted model due to low N.")
else:
    print("\nH2 model not run: insufficient non-missing responses for g06q49–g06q52.")


=== REGRESSION ANALYSIS: H2 (Perceived Change) ===

--- Model: Simple OLS (Robust SE) ---
Coefficient: -1.2364, P-value: 0.4979
Interpretation: No significant association found.


0,1,2,3
Dep. Variable:,perceived_neophobia_change,R-squared:,0.005
Model:,OLS,Adj. R-squared:,-0.025
Method:,Least Squares,F-statistic:,0.4595
Date:,"Tue, 03 Feb 2026",Prob (F-statistic):,0.503
Time:,09:16:12,Log-Likelihood:,-82.03
No. Observations:,35,AIC:,168.1
Df Residuals:,33,BIC:,171.2
Df Model:,1,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-2.1710,0.530,-4.099,0.000,-3.209,-1.133
garden_experience_score,-1.2364,1.824,-0.678,0.498,-4.812,2.339

0,1,2,3
Omnibus:,5.029,Durbin-Watson:,1.247
Prob(Omnibus):,0.081,Jarque-Bera (JB):,4.763
Skew:,-0.87,Prob(JB):,0.0924
Kurtosis:,2.512,Cond. No.,7.23



--- Model: Adjusted OLS (Robust SE) [N=35] ---
Coefficient (adj): -2.8351, P-value (adj): 0.2657
Interpretation: Association is not significant after controlling for confounders.


0,1,2,3
Dep. Variable:,perceived_neophobia_change,R-squared:,0.683
Model:,OLS,Adj. R-squared:,0.46
Method:,Least Squares,F-statistic:,2.88
Date:,"Tue, 03 Feb 2026",Prob (F-statistic):,0.0153
Time:,09:16:12,Log-Likelihood:,-62.035
No. Observations:,35,AIC:,154.1
Df Residuals:,20,BIC:,177.4
Df Model:,14,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-3.8502,4.653,-0.828,0.408,-12.969,5.269
C(gender_en)[T.Male],-1.0021,1.167,-0.859,0.390,-3.289,1.285
C(diet_type_en)[T.Omnivore],-0.7755,1.052,-0.737,0.461,-2.838,1.287
C(diet_type_en)[T.Vegan],-5.3270,2.978,-1.789,0.074,-11.163,0.509
C(diet_type_en)[T.Vegetarian],-2.1877,2.169,-1.009,0.313,-6.439,2.064
C(education_level_en)[T.MSc (in progress)],0.6847,3.366,0.203,0.839,-5.913,7.283
C(education_level_en)[T.PhD (completed)],0.0129,2.804,0.005,0.996,-5.483,5.509
C(education_level_en)[T.PhD (in progress)],-2.7414,3.584,-0.765,0.444,-9.766,4.283
C(education_level_en)[T.Undergraduate (completed)],1.4235,1.512,0.941,0.346,-1.540,4.387

0,1,2,3
Omnibus:,1.913,Durbin-Watson:,2.336
Prob(Omnibus):,0.384,Jarque-Bera (JB):,1.269
Skew:,-0.465,Prob(JB):,0.53
Kurtosis:,3.066,Cond. No.,539.0


---

# Summary Report: Gardening Experience and Food Neophobia

## Overview

This analysis examined whether gardening experience serves as a mitigating strategy for food neophobia, testing two hypotheses:

| Hypothesis | Description | Result |
|------------|-------------|--------|
| **H1** | Higher gardening experience → Lower food neophobia | ✅ **Partially Supported** |
| **H2** | Higher gardening experience → Greater perceived reduction in neophobia | ❌ **Not Supported** |

---

## Sample Characteristics

| Variable | N | Key Statistics |
|----------|---|----------------|
| Total participants | 157 | — |
| Garden experience score | 157 | Mean = 0.047 (SD = 0.12), Median = 0 |
| Food neophobia score | 157 | Mean = 31.7 (SD = 10.4), Range: 11–61 |
| Perceived neophobia change | 35 | Mean = -2.3 (SD = 2.6), Range: -8 to 0 |

**Note:** The majority of participants (N=92, 59%) reported **no gardening experience** (score = 0). The remaining 65 participants were distributed across four roughly equal-frequency bins (n = 16 or 17 each).

---

## Hypothesis 1: Garden Experience and Food Neophobia

### Key Findings

| Analysis | Statistic | p-value | Interpretation |
|----------|-----------|---------|----------------|
| **Spearman Correlation** | ρ = -0.274 | p < 0.001 | Significant negative correlation |
| **Kendall Correlation** | τ = -0.201 | p < 0.001 | Significant negative correlation |
| **Kruskal-Wallis (5 bins)** | H = 15.30 | p = 0.004 | Significant group differences |
| **Mann-Whitney (None vs High)** | U = 2073.5 | p < 0.001 | Significant difference |
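
The Kruskal-Wallis result only indicates that at least two bins differ. If pairwise follow-up tests were run across bins, their p-values would need a multiplicity correction. The block below is a minimal sketch of the Holm step-down adjustment; the function name `holm_adjust` and the example p-values are hypothetical and are not results from this dataset.

```python
def holm_adjust(p_values: dict[str, float]) -> dict[str, float]:
    """Return Holm step-down adjusted p-values keyed by comparison label."""
    m = len(p_values)
    # Sort comparisons from smallest to largest raw p-value.
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    adjusted = {}
    running_max = 0.0
    for rank, (label, p) in enumerate(ordered):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p)
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[label] = running_max
    return adjusted


# Hypothetical raw p-values, for illustration only (NOT computed from the survey data).
example = {"none vs bin_1": 0.04, "none vs bin_3": 0.01, "none vs bin_4": 0.03}
adjusted = holm_adjust(example)
for label, p_adj in adjusted.items():
    print(f"{label}: adjusted p = {p_adj:.3f}")
```

In practice the raw p-values would come from `mannwhitneyu` calls on each pair of bins; the adjustment itself needs nothing beyond the list of p-values.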

### Descriptive Comparison
- **Non-gardeners (None):** Mean neophobia = **34.02**
- **High-experience gardeners (Bins 3+4):** Mean neophobia = **26.44**
- **Difference:** High-experience gardeners score **7.58 points lower** on the food neophobia scale
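
The mean difference can be complemented with rank-based effect sizes derived from the reported test statistics. The sketch below assumes that the reported U refers to the non-gardener group (the first sample passed to `mannwhitneyu`), that the group sizes are 92 and 32 as read from the crosstab rows, and that epsilon-squared is computed as H / (N - 1). The variable names are illustrative.

```python
# Reported statistics from the Mann-Whitney and Kruskal-Wallis outputs.
u_none = 2073.5           # U for the "none" group (assumed to be the first sample)
n_none, n_high = 92, 32   # none; bin_3 + bin_4 (16 + 16), read from the crosstabs
h_stat, n_total = 15.299, 157

# Common-language effect size: probability that a randomly chosen non-gardener
# scores higher on neophobia than a randomly chosen high-experience gardener
# (ties count as one half).
cles = u_none / (n_none * n_high)

# Rank-biserial correlation, a rescaling of the same quantity to [-1, 1].
rank_biserial = 2 * cles - 1

# Epsilon-squared for Kruskal-Wallis (one common effect-size convention).
epsilon_sq = h_stat / (n_total - 1)

print(f"CLES = {cles:.3f}")
print(f"Rank-biserial r = {rank_biserial:.3f}")
print(f"Epsilon-squared = {epsilon_sq:.3f}")
```

Under these assumptions, a non-gardener outscores a high-experience gardener in roughly 70% of pairs (rank-biserial r of about 0.41), and epsilon-squared is about 0.10.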

### Regression Analysis

| Model | Coefficient | p-value | R² | Significant? |
|-------|-------------|---------|-----|--------------|
| Simple OLS | -11.80 | 0.311 | 1.9% | No |
| Adjusted OLS (with confounders) | -8.45 | 0.240 | 15.4% | No |

### Interpretation

**The non-parametric tests strongly support H1**: There is a statistically significant negative association between gardening experience and food neophobia. Participants with higher gardening experience tend to have lower food neophobia scores.

**However, the linear regression models are not significant.** This apparent contradiction may be explained by:

1. **Non-linear relationship**: The effect may be categorical/threshold-based rather than strictly linear.
2. **Zero-inflated predictor**: 59% of participants have zero experience, so the slope is estimated from a small number of high-experience observations, which likely contributes to the wide robust standard error (SE = 11.6).
3. **Small effect size**: R² = 1.9% indicates that a linear term for experience explains only a small portion of variance.
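
As a consistency check on the simple OLS model, the unstandardized slope can be converted to a standardized one using the sample standard deviations reported in the Sample Characteristics table. In a one-predictor regression, the squared standardized slope equals R². The sketch below uses the rounded SDs (0.12 and 10.4), so the result is approximate.

```python
# Values taken from the simple OLS output and the Sample Characteristics table.
slope = -11.7957   # points of neophobia per unit of experience score (0 to 1 scale)
sd_x = 0.12        # SD of garden_experience_score (rounded in the report)
sd_y = 10.4        # SD of food_neophobia_score (rounded in the report)

# Expected change in neophobia for a one-SD increase in experience.
change_per_sd = slope * sd_x

# Standardized slope; in simple regression its square equals R-squared.
beta_std = slope * sd_x / sd_y
r2_implied = beta_std ** 2

print(f"Change per 1 SD of experience: {change_per_sd:.2f} points")
print(f"Standardized slope: {beta_std:.3f}")
print(f"Implied R-squared: {r2_implied:.3f}")
```

The implied R² of about 0.019 matches the model output, and a one-SD increase in experience corresponds to only about 1.4 points on the neophobia scale, far less than the 7.58-point gap between non-gardeners and the top two bins.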

**Conclusion for H1:** ✅ Partially supported. Non-parametric evidence indicates a monotonic negative association, but the linear relationship is weak and not robust to regression modeling.

---

## Hypothesis 2: Garden Experience and Perceived Neophobia Change

### Key Findings

| Analysis | Statistic | p-value | Interpretation |
|----------|-----------|---------|----------------|
| **Spearman Correlation** | ρ = -0.278 | p = 0.106 | Not significant |
| **Kendall Correlation** | τ = -0.218 | p = 0.091 | Not significant |
| **Kruskal-Wallis (bins)** | H = 6.03 | p = 0.197 | No significant group differences |
| **Simple OLS** | β = -1.24 | p = 0.498 | Not significant |
| **Adjusted OLS** | β = -2.84 | p = 0.266 | Not significant |

### Interpretation

**H2 is not supported.** None of the statistical tests found a significant relationship between gardening experience and perceived reduction in food neophobia.

**Important caveat:** The H2 analysis suffers from **severe data limitation**:
- Only **35 participants** (22% of sample) had non-missing values for the perceived change items (`g06q49`–`g06q52`).
- This drastically reduces statistical power to detect effects.
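
To make the loss of power concrete, an approximate 95% confidence interval for the H2 Spearman correlation can be sketched with the Fisher z-transformation. This approximation was derived for Pearson correlations and is only rough for rank correlations with ties, so the interval below is illustrative. The helper name `fisher_ci` is hypothetical.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Reported H2 Spearman correlation and effective sample size.
low, high = fisher_ci(-0.278, 35)
print(f"Approximate 95% CI for rho: [{low:.3f}, {high:.3f}]")
```

Under this approximation the interval runs from roughly -0.56 to 0.06. It includes zero but also includes moderately strong negative correlations, which is why the null result for H2 is inconclusive rather than evidence of no association.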

**Conclusion for H2:** ❌ Not supported, but results should be interpreted cautiously due to high missingness.

---

## Limitations

1. **Cross-sectional design:** Cannot establish causality (does gardening reduce neophobia, or do less neophobic people garden more?).
2. **High missingness for H2:** Only 22% of participants completed the perceived change items.
3. **Unbalanced groups:** 59% of participants are non-gardeners, limiting between-group comparisons.
4. **Self-reported measures:** Both gardening experience and food neophobia rely on self-report.
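5. **Model instability:** Robust standard errors (HC3) were unstable for the adjusted H1 model, which fell back to non-robust standard errors, likely because of sparse categorical covariates. The adjusted H2 model retained HC3 but fits 14 predictors to only 35 observations.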
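
The cost of the many categorical covariates shows up in the gap between R² and adjusted R² in the regression outputs. The sketch below reproduces the adjusted values from the reported R², number of observations, and model degrees of freedom, using the standard adjusted R² formula. The function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values read from the statsmodels summaries (R-squared, No. Observations, Df Model).
h1_adj = adjusted_r2(0.154, 157, 17)   # adjusted H1 model
h2_adj = adjusted_r2(0.683, 35, 14)    # adjusted H2 model

print(f"H1 adjusted R-squared: {h1_adj:.3f}")
print(f"H2 adjusted R-squared: {h2_adj:.3f}")
print(f"H2 observations per estimated parameter: {35 / (14 + 1):.2f}")
```

For the adjusted H2 model, R² drops from 0.683 to about 0.46 after the penalty, with just over two observations per estimated parameter, so its coefficients should be read as unstable.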

---

## Conclusions

1. **Gardening experience is associated with lower food neophobia scores** (H1 partially supported).
   - The correlation is modest (ρ = -0.27) but highly significant (p < 0.001).
   - High-experience gardeners score ~7.6 points lower on the Food Neophobia Scale than non-gardeners.

2. **No evidence that gardening experience is associated with greater perceived reduction in neophobia** (H2 not supported).
   - However, the small sample size (N=35) limits confidence in this null finding.

3. **Practical implication:** While gardening appears to be associated with lower food neophobia, the effect size is small. Gardening alone may not be a sufficient intervention strategy for food neophobia, but it could be a component of broader dietary exposure programs.