# Food supply of world regions

## Introduction

This document describes how we produced a time series of food supply and agricultural land use for each of [the world's regions](https://ourworldindata.org/grapher/continents-according-to-our-world-in-data).

The resulting data is used in the following data insight: Food supplies have grown even faster than the population, on every continent.

You can run this notebook on [Google Colab](https://colab.research.google.com/github/owid/etl/blob/docs-technical-publication-agriculturalindicators/docs/analyses/food_supply_of_world_regions/food_supply_of_world_regions.ipynb), or download the folder to run it on your computer: [this notebook](food_supply_of_world_regions.ipynb) | [full folder](https://catalog.owid.io/analyses/food-supply-of-world-regions.zip) (including notebook, input data and results).

For this analysis, we rely on data from FAOSTAT.
Unfortunately, if you inspect food supply and agricultural land for different world regions, you may observe abrupt changes in the data.
These abrupt changes are mostly due to a combination of the following reasons:
* Changes in FAOSTAT's methodology: the Food Balances dataset is split into a historical and a current dataset, with different methodologies.
* Changes in data coverage: some countries have data only for certain years, e.g. for Papua New Guinea there is only data from 2010 onwards.
* Changes in historical regions: the dissolution of the USSR significantly altered the geographical areas classified as Europe and Asia.


The objective is not to have very accurate absolute numbers, but rather to be able to visualize historical trends.
By the end of this notebook, we will have a data table with population, food supply (in kilocalories), and agricultural land use (in hectares), for every continent, fulfilling the following conditions:
* Regions refer to the same set of countries each year and for each indicator.
* The data does not show abrupt, spurious jumps.

## Initial setup
### Install and import requirements

To be able to run the code, there are some libraries you may need to install.

In [None]:
# If running on Colab, install all dependencies.
%pip install --upgrade owid-catalog plotly > /dev/null 2>&1

Import necessary libraries.

In [2]:
import json
import plotly.express as px
from owid.catalog import find, processing as pr

Set common variables.

In [3]:
# Variables that should only be modified by OWID in case there is a data update.
FAOSTAT_VERSION = "2025-03-17"
GARDEN_VERSION = "2025-07-30"
GARDEN_STEP = f"https://github.com/owid/etl/blob/master/etl/steps/data/garden/agriculture/{GARDEN_VERSION}/long_term_food_and_agriculture_trends.py"
REGIONS = ["Europe", "Asia", "Oceania", "Africa", "North America", "South America"]

## Data processing

### Origin of the data

The data comes from the Food and Agriculture Organization of the United Nations (FAO). More specifically, we import the data from the following FAOSTAT datasets:
* [Food Balances (-2013, old methodology and population)](https://www.fao.org/faostat/en/#data/FBSH).
* [Food Balances (2010-)](https://www.fao.org/faostat/en/#data/FBS).
* [Land Use](https://www.fao.org/faostat/en/#data/RL).

As you can see, the Food Balances dataset is split into two, due to methodological changes in FAO: a historical dataset (containing data from 1961 until 2013), and a current dataset (with data from 2010 until the latest available year). In this document, we will not explain how we combine them, or more generally, how we process FAOSTAT raw data. If you are interested in that detailed pipeline, you can visit our [FAOSTAT data documentation](../../data/faostat.md).

One important technical note is that FAOSTAT only provides Food supply (total in kcal) in the current dataset; the historical one only provides kilocalories per capita.
So we create the data for total food supply (in kilocalories) since 1961 by multiplying per capita food supply by FAOSTAT's population data.
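
As a minimal sketch of that multiplication (the countries, values, and the `food_supply_per_capita` column name below are hypothetical, chosen only for illustration), total daily food supply is the per capita supply times the number of people:

```python
import pandas as pd

# Hypothetical per capita food supply (kcal per person per day) and population (persons).
df = pd.DataFrame(
    {
        "country": ["Country A", "Country B"],
        "year": [1961, 1961],
        "food_supply_per_capita": [2000.0, 3000.0],
        "population": [1_000_000, 500_000],
    }
)

# Total daily food supply (kcal per day) = per capita supply x population.
df["food_supply"] = df["food_supply_per_capita"] * df["population"]
print(df[["country", "year", "food_supply"]])
```

Note that population must be expressed in persons (not thousands) for the product to be in kilocalories per day.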

In [4]:
from IPython.display import Markdown
Markdown(f"The full code used to generate the data used in our charts can be found on [this file of our ETL repository]({GARDEN_STEP}).")

The full code used to generate the data used in our charts can be found on [this file of our ETL repository](https://github.com/owid/etl/blob/master/etl/steps/data/garden/agriculture/2025-07-30/long_term_food_and_agriculture_trends.py).

### Data loading

Instead of starting off with the raw FAOSTAT data, we will load the corresponding curated datasets from the OWID catalog. We will load the combined Food Balances dataset (which puts together the historical and current Food Balances datasets), from which we will load data on food supply (as well as population); and a Land Use dataset, from which we will load the land area dedicated to agriculture.

In [5]:
# Load Food Balances data from the OWID catalog.
# NOTE: This may take some time (less than a minute), and some memory (at peak, around 5 GB; once loaded, less than 300 MB).
# We use "tb_" for Tables, which are similar to pandas DataFrames, but with metadata.
tb_food_balances = find("faostat_fbsc_flat", version=FAOSTAT_VERSION).load().reset_index()
# Load Land Use data from the OWID catalog.
tb_land_use = find("faostat_rl_flat", version=FAOSTAT_VERSION).load().reset_index()
# Load regions dataset from the OWID catalog.
tb_regions = find("regions", version="2023-01-01").load().reset_index()

### Minor data processing

Select only necessary columns, combine Food Balances and Land Use data, and do some minor cleaning.

In [6]:
# Select necessary columns from the curated Food Balances dataset, and rename them conveniently.
COLUMNS_FBSC = {
    "country": "country",
    "year": "year",
    "population__00002501__total_population__both_sexes__000511__thousand_number": "population",
    "total__00002901__food_available_for_consumption__000664__kilocalories_per_day": "food_supply",
}
tb_food_balances = tb_food_balances[list(COLUMNS_FBSC)].rename(columns=COLUMNS_FBSC, errors="raise")

# Select and rename columns in Land Use data.
COLUMNS_RL = {
    "country": "country",
    "year": "year",
    "agricultural_land__00006610__area__005110__hectares": "agricultural_land",
}
tb_land_use = tb_land_use[list(COLUMNS_RL)].rename(columns=COLUMNS_RL, errors="raise")

# Combine both tables.
tb = tb_food_balances.merge(tb_land_use, on=["country", "year"], how="outer", short_name="food_supply_of_world_regions")

# Convert population from thousands to persons.
tb["population"] *= 1000

# Remove empty rows.
tb = tb.dropna(subset=["population", "food_supply", "agricultural_land"], how="all").reset_index(drop=True)

We now have a table with FAOSTAT data. For each country and year, it has three indicators:
* Food supply (in daily kilocalories).
* Agricultural land use (in hectares).
* Population (in persons).

### Consistency in data coverage among indicators


If we inspect a random sample of rows in the table, we see that different indicators have different data coverage.

In [7]:
tb.sample(10, random_state=10).format(["country", "year"], short_name="Food supply of world regions")

| country | year | population | food_supply | agricultural_land |
|---|---|---|---|---|
| Albania | 2012 | 2892190.0 | 9678135342.0 | 1201300.0 |
| Aruba | 2022 | | | 2000.0 |
| Brazil | 1994 | 159432992.0 | 449282194000.0 | 229220800.0 |
| Channel Islands | 1996 | | | 8400.0 |
| Dominica | 2014 | 69370.0 | 209029855.0 | 25000.0 |
| Falkland Islands | 2013 | | | 1119800.0 |
| Libya | 2018 | 6477790.0 | 20370511496.0 | 15350000.0 |
| Mexico | 2019 | 125085312.0 | 408982675650.0 | 96106000.0 |
| Sri Lanka | 1972 | 13001000.0 | 28160166000.0 | 2345000.0 |
| Tuvalu | 2011 | | | 1800.0 |


Population data is missing exactly where food supply is missing; this is by construction, since we extracted population data from the Food Balances dataset. However, there are rows where we have food supply (and population) data, but no agricultural land data, and vice versa.

We want to ensure that the same countries are considered in all indicators. So we will only keep rows for which there is data for all indicators. This causes some small data loss, but, for our purposes, it's more important to have consistency between indicators, to be able to visualize historical trends, than exact absolute values.
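
The effect of keeping only rows with data for all indicators can be sketched with pandas' `dropna`. The first two rows below reuse values from the sample above; the third row is hypothetical, added to show the opposite case (food supply without land data):

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame(
    {
        "country": ["Albania", "Aruba", "Hypothetical country"],
        "year": [2012, 2022, 2000],
        "food_supply": [9678135342.0, nan, 1.0e9],
        "agricultural_land": [1201300.0, 2000.0, nan],
    }
)

# how="all" would only drop rows where both indicators are missing (none here).
kept_if_any_data = df.dropna(subset=["food_supply", "agricultural_land"], how="all")

# how="any" drops rows where at least one indicator is missing.
kept_if_all_data = df.dropna(subset=["food_supply", "agricultural_land"], how="any").reset_index(drop=True)
print(kept_if_all_data)
```

Only the Albania row survives the `how="any"` filter, which is the behavior we want for consistent coverage across indicators.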

In [8]:
# Check that population is missing wherever food supply is missing.
# NOTE: We compare the null values themselves; the index of an isnull() mask is always the full table index, so comparing indexes would be trivially true.
assert tb[tb["food_supply"].isnull()]["population"].isnull().all()

# Now keep only rows for which we have data for both food supply, and agricultural land.
# This way we ensure that the data coverage of food supply and agricultural land use is the same.
tb = tb.dropna(subset=["agricultural_land", "food_supply"], how="any").reset_index(drop=True)

# After dropping rows with no food supply or agricultural land use data, population still has missing data.
# However, those missing regions are aggregates (continents and income groups), for which we don't need population data.
assert set(tb[tb["population"].isnull()]["country"]) == {'Africa', 'Asia', 'Europe', 'High-income countries', 'Low-income countries', 'Lower-middle-income countries', 'North America', 'Oceania', 'South America', 'Upper-middle-income countries'}

### Changes in historical regions and data coverage

We will first create a dictionary of regions and members, using OWID's Regions dataset.

In [9]:
# Create a dictionary of regions and their member countries.
regions = {
    region: sorted(
        set(
            tb_regions[
                (tb_regions["code"].isin(json.loads(tb_regions[tb_regions["name"] == region]["members"].item())))
            ]["name"]
        )
    )
    for region in REGIONS
}

# Add USSR as a region.
# NOTE: It is treated differently because it's a different type of region in the OWID dataset; it does not have "members" but "successors", since it's a historical region.
regions["USSR"] = sorted(
    set(
        tb_regions[
            (tb_regions["code"].isin(json.loads(tb_regions[tb_regions["name"] == "USSR"]["successors"].item())))
        ]["name"]
    )
)
# For convenience, create also a region for the USSR successor countries in Asia.
regions["USSR Asia"] = sorted(set(regions["USSR"]) & set(regions["Asia"]))

The resulting dictionary contains the members (both historical and current) of each region.
For example, for South America:

In [10]:
regions["South America"]

['Argentina',
 'Bolivia',
 'Brazil',
 'Chile',
 'Colombia',
 'Ecuador',
 'Falkland Islands',
 'French Guiana',
 'Great Colombia',
 'Guyana',
 'Paraguay',
 'Peru',
 'South Georgia and the South Sandwich Islands',
 'Suriname',
 'Uruguay',
 'Venezuela']
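
The lookup behind this dictionary can be sketched on a toy table. The codes and names below are hypothetical; the only assumption carried over from the code above is that an aggregate region stores its members as a JSON-encoded list of codes:

```python
import json

import pandas as pd

# Hypothetical miniature of a regions table.
toy_regions = pd.DataFrame(
    {
        "code": ["REG_X", "AAA", "BBB", "CCC"],
        "name": ["Region X", "Country A", "Country B", "Country C"],
        "members": ['["AAA", "CCC"]', None, None, None],
    }
)

# Step 1: decode the JSON list of member codes of the region.
member_codes = json.loads(toy_regions[toy_regions["name"] == "Region X"]["members"].item())

# Step 2: translate member codes into sorted, unique country names.
member_names = sorted(set(toy_regions[toy_regions["code"].isin(member_codes)]["name"]))
print(member_names)
```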

In [11]:
# For convenience, create a dictionary mapping machine-readable to human-readable column names; it will be used for plotting.
PLOTTED_COLUMNS = {
    "country": "Region",
    "year": "Year",
    "agricultural_land": "Agricultural land",
    "food_supply": "Food supply",
    "population": "Population",
}

# Idem for y-axis labels.
PLOTTED_LABELS = {
    "Population": "Population / people",
    "Agricultural land": "Agricultural land / hectares",
    "Food supply": "Food supply / kilocalories per day",
}


# Create convenient functions for adding regions to the main data table, and to plot indicators for regions.
def create_regions(tb, regions):
    """Add new definitions of continents, corrected for changes in historical regions and changes in data coverage.

    """
    regions_to_add = [region for region in REGIONS if f"{region} (corrected)" in regions]
    tables_corrected = [
        tb[tb["country"].isin(regions[f"{region} (corrected)"])]
        .groupby("year", as_index=False)
        .agg({"food_supply": "sum", "population": "sum", "agricultural_land": "sum"})
        .assign(**{"country": f"{region} (corrected)"})
        for region in regions_to_add
    ]
    tb_corrected = pr.concat([tb] + tables_corrected, ignore_index=True)
    # Remove duplicates (in case the function is used multiple times).
    tb_corrected = tb_corrected.drop_duplicates(subset=["country", "year"], keep="last").reset_index(drop=True)

    return tb_corrected


def plot_indicators_for_regions(tb, indicators=None, regions=None):
    # If regions is not specified, assume all regions in the table.
    if regions is None:
        regions = sorted(set(tb["country"]))
    if indicators is None:
        indicators = [indicator for indicator in PLOTTED_COLUMNS.values() if indicator not in ["Region", "Year"]]

    for column in indicators:
        px.line(
            tb[tb["country"].isin(regions)].rename(columns=PLOTTED_COLUMNS),
            x="Year",
            y=column,
            color="Region",
            markers=True,
            title=column,
            range_y=[0, None],
            # NOTE: "labels" expects a dictionary mapping column names to displayed labels.
            labels={column: PLOTTED_LABELS[column]},
            color_discrete_sequence=["rgba(239, 85, 59, 0.8)", "rgba(0, 204, 150, 0.8)", "rgba(99, 110, 250, 0.8)"],
        ).show()
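
Since `create_regions` is called once per continent on the same table, it relies on dropping duplicates (keeping the last occurrence) so that repeated calls do not accumulate copies of an aggregate. A minimal sketch of that pattern, using plain pandas and a hypothetical helper `add_aggregate` with made-up values:

```python
import pandas as pd

base = pd.DataFrame({"country": ["A", "B"], "year": [2000, 2000], "value": [1.0, 2.0]})


def add_aggregate(df):
    # Sum only the original members, append the aggregate, and drop any older copy of it.
    aggregate = (
        df[df["country"].isin(["A", "B"])]
        .groupby("year", as_index=False)
        .agg({"value": "sum"})
        .assign(country="A+B")
    )
    combined = pd.concat([df, aggregate], ignore_index=True)
    return combined.drop_duplicates(subset=["country", "year"], keep="last").reset_index(drop=True)


once = add_aggregate(base)
twice = add_aggregate(once)
print(twice)
```

Calling the helper a second time leaves the table with the same three rows, because the newer copy of the aggregate replaces the older one.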

Note that FAOSTAT doesn't have data for North America.
Instead, they have data for Northern America, Central America, South America, and Caribbean.
For convenience, to be able to compare the original and corrected regions, we create a "North America (FAO)" by adding up Northern America, Central America, and Caribbean.

In [12]:
tb_north_america_fao = (
    tb[tb["country"].isin(["Northern America (FAO)", "Central America (FAO)", "Caribbean (FAO)"])]
    .groupby("year", as_index=False)
    .agg({"population": "sum", "agricultural_land": "sum", "food_supply": "sum"})
    .assign(**{"country": "North America (FAO)"})
)
tb = pr.concat([tb, tb_north_america_fao], ignore_index=True)
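
The aggregation used here (and in `create_regions`) is a group-by-year sum over the selected members. A small sketch in plain pandas, with hypothetical sub-regions and values:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Sub-region 1", "Sub-region 2", "Sub-region 1", "Sub-region 2", "Elsewhere"],
        "year": [2000, 2000, 2001, 2001, 2000],
        "food_supply": [10.0, 5.0, 11.0, 6.0, 100.0],
    }
)

# Select the members, add them up for each year, and label the result as a new region.
aggregate = (
    df[df["country"].isin(["Sub-region 1", "Sub-region 2"])]
    .groupby("year", as_index=False)
    .agg({"food_supply": "sum"})
    .assign(country="Combined region")
)
print(aggregate)
```

Keep in mind that pandas' `sum` skips missing values by default, so a member with no data in a given year silently lowers the total for that year. This is the mechanism behind the spurious jumps addressed in the following sections.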

#### Corrections for Europe

If we plot food supply and agricultural land for Europe, using FAOSTAT data directly, we see the following time series.

In [13]:
plot_indicators_for_regions(tb=tb, indicators=["Food supply", "Agricultural land"], regions=["Europe (FAO)"])

There, we see an abrupt decline between 1990 and 1992. This coincides with the dissolution of the USSR, which affects the data through two main factors:
1. Food supplies genuinely declined due to economic collapse, disrupted trade, and reduced purchasing power.
2. Part of the data simply shifted from Europe to Asia, since some successor states are classified as Asian.

Only the first factor reflects a real change in people's lives; the second is merely a statistical artifact.

Following [OWID's definitions of current world regions](https://ourworldindata.org/world-region-map-definitions), the USSR split into 7 European countries, and 8 Asian countries.
But, to be able to properly visualize the impact of the first factor, we will modify the definitions of Europe and Asia, so that the 8 Asian successor countries stay as part of Europe.

In [14]:
# Create a list of countries that will be assigned to "Europe (corrected)".
# For now, add all European countries, plus USSR Asia successors.
# NOTE: To be able to replicate FAO's Europe data, we need to add 'Belgium-Luxembourg (FAO)'.
regions["Europe"] = sorted(set(regions["Europe"]) | set(["Belgium-Luxembourg (FAO)"]))
regions["Europe (corrected)"] = sorted(set(regions["Europe"]) | set(regions["USSR Asia"]))

In 1992, the agricultural land of the 8 USSR Asian successors is suddenly removed from Europe's agricultural land.
Indeed, we see a decrease of 773 M ha - 498 M ha = 275 M ha.
Conversely, the same year there is an abrupt increase in Asia's agricultural land, of about 1612 M ha - 1304 M ha = 308 M ha.

However, if we compare the two abrupt changes, we notice that Europe's decrease is smaller than Asia's increase; there are an additional 308 M ha - 275 M ha = 33 M ha that are added to Asia.
Could this be a real increase in agricultural land that year?

In [15]:
land_asia_increase = tb[(tb["country"] == "Asia (FAO)") & (tb["year"] == 1992)]["agricultural_land"].item() - tb[(tb["country"] == "Asia (FAO)") & (tb["year"] == 1991)]["agricultural_land"].item()
land_europe_decrease = tb[(tb["country"] == "Europe (FAO)") & (tb["year"] == 1991)]["agricultural_land"].item() - tb[(tb["country"] == "Europe (FAO)") & (tb["year"] == 1992)]["agricultural_land"].item()
missing_area = land_asia_increase - land_europe_decrease
land_world_pre_1992 = tb[(tb["country"] == "World") & (tb["year"]<1992)]["agricultural_land"].diff()
land_world_pre_1992_increase_mean = land_world_pre_1992.iloc[-10:].mean() * 1e-6
land_world_pre_1992_increase_std = land_world_pre_1992.iloc[-10:].std() * 1e-6
land_world_1992_increase = (tb[(tb["country"] == "World") & (tb["year"] == 1992)]["agricultural_land"].item() - tb[(tb["country"] == "World") & (tb["year"] == 1991)]["agricultural_land"].item()) * 1e-6
plot_indicators_for_regions(tb, regions=["World", "Asia (FAO)", "Europe (FAO)", "North America (FAO)", "South America (FAO)", "Africa (FAO)", "Oceania (FAO)"], indicators=["Agricultural land"])

In [16]:
Markdown(f"The world's agricultural land area in the 10 years prior to 1992 was growing at an average pace of ({land_world_pre_1992_increase_mean:.0f} +/- {land_world_pre_1992_increase_std:.0f})M ha per year. Yet, in 1992, the world's agricultural land increased by around {land_world_1992_increase:.0f}M ha. This annual growth is {(land_world_1992_increase - land_world_pre_1992_increase_mean) / land_world_pre_1992_increase_std:.1f} standard deviations above the mean growth in the previous 10 years.")

The world's agricultural land area in the 10 years prior to 1992 was growing at an average pace of (14 +/- 9)M ha per year. Yet, in 1992, the world's agricultural land increased by around 39M ha. This annual growth is 2.7 standard deviations above the mean growth in the previous 10 years.
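
The comparison above amounts to a z-score of one year's increase against the increases of the preceding decade. A sketch with the standard library, using hypothetical yearly totals (not FAOSTAT values):

```python
from statistics import mean, stdev

# Hypothetical yearly totals (in M ha) for 11 consecutive years, followed by the year under scrutiny.
previous_totals = [4700, 4712, 4730, 4741, 4760, 4770, 4790, 4801, 4815, 4832, 4840]
suspect_total = 4879

# Year-on-year increases over the window (10 values), analogous to `.diff()` in the code above.
increases = [later - earlier for earlier, later in zip(previous_totals, previous_totals[1:])]
baseline_mean = mean(increases)
baseline_std = stdev(increases)  # Sample standard deviation, like the pandas default for `.std()`.

# How many standard deviations above the usual yearly increase is the suspect year?
suspect_increase = suspect_total - previous_totals[-1]
z_score = (suspect_increase - baseline_mean) / baseline_std
print(f"Increase of {suspect_increase} M ha, z-score {z_score:.1f}")
```

A large z-score does not prove a data issue by itself, but it flags a year that deserves a closer look.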

In [17]:
# Find out which country could explain that mismatch.
tb_land_first_year = tb.groupby("country", as_index=False).agg({"year": "min"})
countries_starting_in_1992 = tb_land_first_year[tb_land_first_year["year"]==1992]["country"]
check = tb[(tb["country"].isin(countries_starting_in_1992)) & (tb["year"]==1992)].reset_index(drop=True)
check["diff"] = abs(check["agricultural_land"] - missing_area)
check.format(short_name="Countries that could explain the missing area").sort_values("diff")[["agricultural_land"]].head()

| country | year | agricultural_land |
|---|---|---|
| Turkmenistan | 1992 | 35350000.0 |
| Uzbekistan | 1992 | 27724000.0 |
| Ukraine | 1992 | 41929000.0 |
| Kyrgyzstan | 1992 | 10088000.0 |
| Belarus | 1992 | 9391000.0 |


After some investigation, a plausible explanation is that some country's data was introduced precisely in that year.
So we select the countries whose first year of data is 1992, and sort them by the absolute difference between their agricultural land in 1992 and the missing 33M ha.
The country at the top of that list is Turkmenistan, with an agricultural land area of 35M ha.
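
The ranking can be reproduced by hand from the table above. This sketch uses the rounded 33M ha figure quoted in the text, whereas the notebook code uses the unrounded `missing_area`:

```python
# Agricultural land in 1992 (in hectares), taken from the table above.
candidates = {
    "Turkmenistan": 35_350_000,
    "Uzbekistan": 27_724_000,
    "Ukraine": 41_929_000,
    "Kyrgyzstan": 10_088_000,
    "Belarus": 9_391_000,
}

# Approximate mismatch between Asia's increase and Europe's decrease (33 M ha).
missing_area_approx = 33_000_000

# Sort countries by how close their agricultural land is to the missing area.
ranked = sorted(candidates, key=lambda country: abs(candidates[country] - missing_area_approx))
for country in ranked:
    print(country, abs(candidates[country] - missing_area_approx))
```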

Agricultural land data doesn't include Turkmenistan prior to 1992.
This can be noticed by plotting the USSR land area before and after 1992 (the latter being the sum of USSR successors).
When Turkmenistan is included, there is a jump of an additional ~33M ha.
In terms of food supply and population, however, it's unclear; it seems likely that Turkmenistan is indeed included.

In [18]:
tb_ussr_without_turkmenistan = tb[(tb["country"].isin(sorted(set(regions["USSR"]) - set(["Turkmenistan"]))))].drop(columns="country").groupby("year", as_index=False).sum().assign(**{"country": "USSR (without Turkmenistan)"})
tb_ussr_with_turkmenistan = tb[(tb["country"].isin(sorted(set(regions["USSR"]))))].drop(columns="country").groupby("year", as_index=False).sum().assign(**{"country": "USSR (with Turkmenistan)"})
tb_ussr = tb[(tb["country"] == "USSR")].reset_index(drop=True)
tb_ussr_options = pr.concat([tb_ussr, tb_ussr_without_turkmenistan, tb_ussr_with_turkmenistan], ignore_index=True)
plot_indicators_for_regions(tb=tb_ussr_options)

It's therefore possible that FAOSTAT's Land Use dataset omits Turkmenistan's entire agricultural area prior to 1992.
This is not necessarily a significant issue (Turkmenistan's agricultural land is ~0.8% of the world's agricultural land), but it may be worth contacting FAOSTAT, in case it's a solvable technical issue, rather than a lack of data.

Given that we don't have access to the composition of the USSR data, there is no perfect solution.
But the sudden increase in land use when including Turkmenistan has a larger effect in Europe than the sudden decrease in food supply when removing it.
And Asia is not significantly affected in either case.
So we will remove Turkmenistan from the corrected series of Europe.

In [19]:
# Remove Turkmenistan from the corrected composition of Europe.
regions["Europe (corrected)"] = sorted(set(regions["Europe (corrected)"]) - set(["Turkmenistan"]))

These are the resulting corrected series for food supply and land use for Europe.

In [20]:
# Add Europe (corrected) to the main data table.
tb = create_regions(tb=tb, regions=regions)
plot_indicators_for_regions(tb, indicators=None, regions=["Europe (FAO)", "Europe (corrected)"])

#### Corrections for Asia

We need to be consistent with our previous choices regarding the USSR and Europe.
So, we will remove all USSR successors from Asia (since they are now included in Europe).
Additionally, we remove Turkmenistan from Asia too, because, as we discussed in the previous section, it was probably not included in the USSR data, and hence including it now could cause a spurious (although small) increase in agricultural land.
In other words, since it seems probable that Turkmenistan was missing in the data before 1992, we will not include it in the data from 1992 onwards.

Additionally, there are other data coverage issues in the Food Balances dataset:
* Brunei does not have data from 2010 onwards.
* North Korea does not have data from 2018 onwards.
* Syria does not have data prior to 2010.
* Bahrain, Qatar, and Bhutan do not have data prior to 2019.
* Oman has data only between 1990 and 2021.

To avoid spurious jumps in the data, we remove all these countries from Asia.
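
The corrected compositions of Europe and Asia are plain set operations. In the sketch below, the USSR successors and the limited-coverage countries are the ones named in this document, while `europe_subset` and `asia_subset` are small illustrative subsets, not the full memberships:

```python
# USSR successor countries classified as Asian.
ussr_asia = {
    "Armenia", "Azerbaijan", "Georgia", "Kazakhstan",
    "Kyrgyzstan", "Tajikistan", "Turkmenistan", "Uzbekistan",
}
# Asian countries with limited data coverage in the Food Balances dataset.
limited_coverage = {"Bahrain", "Bhutan", "Brunei", "North Korea", "Oman", "Qatar", "Syria"}

# Illustrative subsets only (assumption: the real lists come from the regions dictionary).
europe_subset = {"France", "Germany", "Ukraine"}
asia_subset = {"China", "India", "Japan"} | ussr_asia | limited_coverage

# Europe keeps the Asian USSR successors, except Turkmenistan.
europe_corrected = sorted((europe_subset | ussr_asia) - {"Turkmenistan"})
# Asia loses the USSR successors and the countries with limited coverage.
asia_corrected = sorted(asia_subset - ussr_asia - limited_coverage)

print(europe_corrected)
print(asia_corrected)
```

With these definitions, no country is counted in both corrected regions, and Turkmenistan appears in neither.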

In [21]:
# For "Asia (corrected)", add all Asian countries, and remove USSR Asia successors (which are kept in Europe).
# Remove other countries with limited data coverage.
regions["Asia (corrected)"] = sorted(
    set(regions["Asia"])
    - set(regions["USSR Asia"])
    - set(["Bahrain", "Bhutan", "Brunei", "North Korea", "Oman", "Qatar", "Syria"])
)
# Add Asia (corrected) to the main data table.
tb = create_regions(tb=tb, regions=regions)
plot_indicators_for_regions(tb=tb, regions=["Asia (FAO)", "Asia (corrected)"])

#### Corrections for Oceania

There are no changes in historical regions in Oceania, but there are some small data coverage issues in the Food Balances dataset:
* Papua New Guinea does not have data prior to 2010.
* Marshall Islands, Micronesia (country), Nauru, Tonga, and Tuvalu do not have data prior to 2019.

To avoid spurious jumps in the data, we remove all these countries from Oceania.
It is unfortunate that the historical FAOSTAT Food Balances dataset does not contain data for Papua New Guinea; but, if we keep it in the data, the additional 7.6 million people in Oceania cause a significant spurious jump in both food supply and population.
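
One way to spot such coverage issues is to compare each country's first and last year of data with the span of the whole series. A sketch in plain pandas, with hypothetical countries and values:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Country A", "Country A", "Country A", "Country B", "Country B"],
        "year": [1961, 2010, 2011, 2010, 2011],
        "food_supply": [1.0, 2.0, 2.1, 0.5, 0.6],
    }
)

# First and last year with data for each country.
coverage = df.groupby("country", as_index=False).agg(first_year=("year", "min"), last_year=("year", "max"))

# Countries that enter the series late or leave it early can cause spurious jumps in regional totals.
partial = coverage[(coverage["first_year"] > df["year"].min()) | (coverage["last_year"] < df["year"].max())]
print(partial)
```

This check only catches gaps at the start or end of a series; gaps in the middle would need a separate count of years per country.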

In [22]:
# For "Oceania (corrected)", remove all countries that are added after 2010 (namely Papua New Guinea, and other small islands that are added in 2019 to food supply data).
assert tb[tb["country"].isin(["Papua New Guinea"])]["year"].min() == 2010
assert (
    tb[tb["country"].isin(["Marshall Islands", "Micronesia (country)", "Nauru", "Tonga", "Tuvalu"])]["year"].min()
    == 2019
)
regions["Oceania (corrected)"] = sorted(
    set(regions["Oceania"])
    - set(["Marshall Islands", "Micronesia (country)", "Nauru", "Papua New Guinea", "Tonga", "Tuvalu"])
)
# Add Oceania (corrected) to the main data table.
tb = create_regions(tb=tb, regions=regions)
plot_indicators_for_regions(tb=tb, regions=["Oceania (FAO)", "Oceania (corrected)"])

Note that, even after correcting for data coverage issues, there is still a noticeable increase from 2009 to 2010 in food supply.
This change can actually be observed in individual countries like Australia or Kiribati, and it is likely due to changes in methodology between the historical and the current Food Balances dataset.

In [23]:
# Optionally, inspect Australia's food supply, to see a significant increase from 2009 to 2010.
# plot_indicators_for_regions(tb=tb, indicators=["Food supply"], regions=["Australia"])

#### Corrections for Africa

For Africa, there are various data coverage issues:
* Burundi, Comoros, Democratic Republic of Congo, Libya, Seychelles, and Somalia have data only from 2010 onwards.
* In 2011, Sudan (former) split into Sudan and South Sudan. However, between 2012 and 2018, the Food Balances dataset only has data for Sudan (not South Sudan).

To avoid spurious jumps in the data, we remove the data from Burundi, Comoros, Democratic Republic of Congo, Libya, Seychelles, and Somalia.

We could also exclude Sudan (former), Sudan, and South Sudan, because South Sudan only has data from 2019 onwards.
However, this would cause a significant data loss, so it may be better to keep them, even though this creates an abrupt dip in agricultural land between 2012 and 2018.
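
The dip can be illustrated with a toy example, in which a former state is replaced by two successors but one of them has no data for several years. All names and values below are hypothetical:

```python
import pandas as pd

# A former state until 2011, one successor from 2012, and another successor only from 2019.
rows = (
    [("Former state", year, 10.0) for year in range(2010, 2012)]
    + [("Successor 1", year, 7.0) for year in range(2012, 2021)]
    + [("Successor 2", year, 3.0) for year in range(2019, 2021)]
)
df = pd.DataFrame(rows, columns=["country", "year", "agricultural_land"])

# Regional total per year: the years without data for "Successor 2" show a lower total.
total = df.groupby("year")["agricultural_land"].sum()
dip_years = total[total < total.max()].index.tolist()
print(dip_years)
```

The total drops from 10 to 7 between 2012 and 2018 and recovers in 2019, although nothing changed on the ground.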

In [24]:
# For "Africa (corrected)":
# - From 2009 to 2010, we gain data for Burundi, Comoros, Democratic Republic of Congo, Libya, Seychelles, and Somalia. These countries didn't have data in FBSH, but do have in FBS.
# - In 2011, data for Sudan (former) ends, but in 2012 we only have data for Sudan (referring to North Sudan). Unfortunately, data for South Sudan in FBS starts in 2019 (hence we are missing data for South Sudan between 2012 and 2018). This causes an abrupt decrease during those years.
assert (
    tb[
        tb["country"].isin(["Burundi", "Comoros", "Democratic Republic of Congo", "Libya", "Seychelles", "Somalia"])
    ]["year"].min()
    == 2010
)
assert tb[tb["country"] == "Sudan (former)"]["year"].max() == 2011
assert tb[tb["country"] == "Sudan"]["year"].min() == 2012
assert tb[tb["country"] == "South Sudan"]["year"].min() == 2019
regions["Africa (corrected)"] = sorted(
    set(regions["Africa"])
    - set(
        [
            "Burundi",
            "Comoros",
            "Democratic Republic of Congo",
            "Libya",
            "Seychelles",
            "Somalia",
            # "Sudan",
            # "South Sudan",
            # "Sudan (former)",
        ]
    )
)
# Add Africa (corrected) to the main data table.
tb = create_regions(tb=tb, regions=regions)
plot_indicators_for_regions(tb=tb, regions=["Africa (FAO)", "Africa (corrected)"])

#### Corrections for North America

For North America, there is one change in a historical region: the Netherlands Antilles, which was dissolved in 2010.
Additionally, there is one minor change in data coverage: Bermuda does not have data from 2010 onwards.
To avoid spurious jumps in the data, we remove both from North America, although the effect of these changes is negligible.

In [25]:
# For "North America (corrected)":
regions["North America (corrected)"] = sorted(set(regions["North America"]) - {"Bermuda", "Netherlands Antilles"})

# Add North America (corrected) to the main data table.
tb = create_regions(tb=tb, regions=regions)
plot_indicators_for_regions(tb=tb, regions=["North America (FAO)", "North America (corrected)"])

#### Corrections for South America

No corrections were necessary for South America.

In [26]:
# South America doesn't need any corrections, but we include it here for convenience.
regions["South America (corrected)"] = regions["South America"]

# For consistency (even though there were no corrections) we add South America (corrected) to the main data table.
tb = create_regions(tb=tb, regions=regions)

## Conclusions

In this document we have imported data on food supply and agricultural land from FAOSTAT.
We identified important data issues that affected Africa, Asia, Europe, and Oceania.
North America was only minimally affected, and we found no issues affecting South America.

All issues were related to changes in (1) methodology, (2) historical regions, and (3) data coverage.

To fix those issues, the main modifications we applied to the data are as follows: 
* We keep only country-years for which all indicators (food supply and agricultural land) have data.
  * This removes 37 countries, namely American Samoa, Andorra, Aruba, British Virgin Islands, Cayman Islands, Channel Islands, Cook Islands, Equatorial Guinea, Eritrea, Falkland Islands, Faroe Islands, French Guiana, Greenland, Guadeloupe, Guam, Isle of Man, Liechtenstein, Macao, Martinique, Mayotte, Montserrat, Niue, Norfolk Island, Northern Mariana Islands, Palau, Palestine, Puerto Rico, Reunion, Saint Helena, Saint Pierre and Miquelon, San Marino, Singapore, Tokelau, Turks and Caicos Islands, United States Virgin Islands, Wallis and Futuna, and Western Sahara.
* We exclude the 8 USSR Asian successors from Asia.
  * These countries are Armenia, Azerbaijan, Georgia, Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, and Uzbekistan.
* We include the USSR Asian successors as part of Europe, except for Turkmenistan.
  * Turkmenistan is not added to Europe or Asia, since it seems likely that it was not included in the agricultural land data for the USSR prior to its dissolution.
* We exclude another 23 countries and regions that have limited data coverage.
  * These countries are: Bahrain, Bermuda, Bhutan, Brunei, Burundi, Comoros, Democratic Republic of Congo, Libya, Marshall Islands, Melanesia, Micronesia (country), Nauru, Netherlands Antilles, North Korea, Oman, Papua New Guinea, Polynesia, Qatar, Seychelles, Somalia, Syria, Tonga, and Tuvalu.
  * Note that Sudan (former), Sudan, and South Sudan were kept in the data, despite the fact that South Sudan is missing between 2012 and 2018 (which creates a spurious dip in the data); removing all three would cause a significant loss of useful data, so it seems more reasonable to keep them.

After fixing these issues, we were able to create a time series of food supply, and a time series of agricultural land, that we can use to visualize historical trends in world regions without spurious jumps.

In [31]:
Markdown(f"The resulting data can be downloaded using [this link](https://catalog.owid.io/garden/agriculture/{GARDEN_VERSION}/long_term_food_and_agriculture_trends/long_term_food_and_agriculture_trends.csv). For more details on how the dataset was created, and further sanity checks, you can visit [the full code](https://github.com/owid/etl/blob/master/etl/steps/data/garden/agriculture/{GARDEN_VERSION}/long_term_food_and_agriculture_trends.py).")

The resulting data can be downloaded using [this link](https://catalog.owid.io/garden/agriculture/2025-07-30/long_term_food_and_agriculture_trends/long_term_food_and_agriculture_trends.csv). For more details on how the dataset was created, and further sanity checks, you can visit [the full code](https://github.com/owid/etl/blob/master/etl/steps/data/garden/agriculture/2025-07-30/long_term_food_and_agriculture_trends.py).