<center><img src="MKn_Staffelter_Hof.jpeg" alt="Picture of old business"></center>
<!--Image Credit: Martin Kraft https://commons.wikimedia.org/wiki/File:MKn_Staffelter_Hof.jpg -->

Staffelter Hof Winery is Germany's oldest business, established in 862 under the Carolingian dynasty. It has continued to serve customers through dramatic changes in Europe, including the rise and fall of the Holy Roman Empire and the Ottoman Empire, as well as both world wars. What characteristics enable a business to stand the test of time?

To help answer this question, BusinessFinancing.co.uk researched the oldest company still in business in **almost** every country and compiled the results into several CSV files. This dataset has been cleaned.

Having useful information in different files is a common problem. While it's better to keep different types of data separate for data storage, you'll want all the data in one place for analysis. You'll use joining and data manipulation to work with this data and better understand the world's oldest businesses.

## The Data
`businesses` and `new_businesses`
|Column|Description|
|------|-----------|
|`business`|Name of the business (varchar)|
|`year_founded`|Year the business was founded (int)|
|`category_code`|Code for the business category (varchar)|
|`country_code`|ISO 3166-1 three-letter country code (char)|
---
`countries`
|Column|Description|
|------|-----------|
|`country_code`|ISO 3166-1 three-letter country code (varchar)|
|`country`|Name of the country (varchar)|
|`continent`|Name of the continent the country exists in (varchar)|
---
`categories`
|Column|Description|
|------|-----------|
|`category_code`|Code for the business category (varchar)|
|`category`|Description of the business category (varchar)|
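
To try the joins in this notebook without access to the PostgreSQL instance, the schema above can be approximated in an in-memory SQLite database. This is only a sketch under stated assumptions: `build_sandbox` is a hypothetical helper, the column types are loose SQLite equivalents of the documented ones, and the three sample rows are hand-copied from values that appear elsewhere in this notebook rather than loaded from the real tables.

```python
import sqlite3


def build_sandbox() -> sqlite3.Connection:
    """Create an in-memory SQLite database shaped like the documented tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE categories (category_code TEXT PRIMARY KEY, category TEXT);
        CREATE TABLE countries (country_code TEXT PRIMARY KEY, country TEXT, continent TEXT);
        CREATE TABLE businesses (business TEXT, year_founded INTEGER,
                                 category_code TEXT, country_code TEXT);
        CREATE TABLE new_businesses (business TEXT, year_founded INTEGER,
                                     category_code TEXT, country_code TEXT);
    """)
    # Illustrative sample rows only; the real tables hold many more.
    conn.executemany("INSERT INTO categories VALUES (?, ?)", [
        ("CAT4", "Cafés, Restaurants & Bars"),
        ("CAT6", "Construction"),
        ("CAT9", "Distillers, Vintners, & Breweries"),
    ])
    conn.executemany("INSERT INTO countries VALUES (?, ?, ?)", [
        ("JPN", "Japan", "Asia"),
        ("AUT", "Austria", "Europe"),
        ("DEU", "Germany", "Europe"),
    ])
    conn.executemany("INSERT INTO businesses VALUES (?, ?, ?, ?)", [
        ("Kongō Gumi", 578, "CAT6", "JPN"),
        ("St. Peter Stifts Kulinarium", 803, "CAT4", "AUT"),
        ("Staffelter Hof Winery", 862, "CAT9", "DEU"),
    ])
    return conn


sandbox = build_sandbox()

# Bring all three tables together, oldest business first.
rows = sandbox.execute("""
    SELECT b.business, b.year_founded, cat.category, c.country, c.continent
    FROM businesses AS b
    JOIN countries AS c ON b.country_code = c.country_code
    JOIN categories AS cat ON b.category_code = cat.category_code
    ORDER BY b.year_founded
""").fetchall()

for row in rows:
    print(row)
```

Keeping the lookup tables (`countries`, `categories`) separate from the fact table (`businesses`) is what makes the joins necessary; the query above is the smallest version of that pattern.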

In [1]:
# connect to the database
import os
import psycopg2
import pandas as pd
from dotenv import load_dotenv

# Load DB credentials from .env
load_dotenv()

conn = psycopg2.connect(
    host=os.getenv("DB_HOST", "localhost"),
    port=os.getenv("DB_PORT", "5438"),
    user=os.getenv("DB_USER", "postgres"),
    password=os.getenv("DB_PASS"),
    dbname=os.getenv("DB_NAME", "Oldest_Businesses_DB")
)

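# Helper: run SQL and return the result as a DataFrame.
# Note: pandas emits a UserWarning for raw psycopg2 connections, because it
# only tests SQLAlchemy connectables and sqlite3 connections.
def run_query(sql: str) -> pd.DataFrame:
    return pd.read_sql(sql, conn)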

# Test connection with categories table
df = run_query("SELECT * FROM categories")
print(f"Successfully connected! Categories table has {len(df)} rows")
display(df)


Successfully connected! Categories table has 19 rows




Unnamed: 0,index,category_code,category
0,0,CAT1,Agriculture
1,1,CAT2,Aviation & Transport
2,2,CAT3,Banking & Finance
3,3,CAT4,"Cafés, Restaurants & Bars"
4,4,CAT5,Conglomerate
5,5,CAT6,Construction
6,6,CAT7,Consumer Goods
7,7,CAT8,Defense
8,8,CAT9,"Distillers, Vintners, & Breweries"
9,9,CAT10,Energy
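
`pd.read_sql` is tested against SQLAlchemy connectables, database URI strings, and `sqlite3` connections; with other DBAPI connections, such as the psycopg2 one used here, pandas warns that the object is untested. One possible way around that is to hand pandas a SQLAlchemy engine instead. The sketch below builds only the connection URL from the same environment variables, using the standard library. `build_db_url` is a hypothetical helper, and the commented usage lines assume SQLAlchemy is installed.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_db_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a PostgreSQL URL from the DB_* variables used in this notebook."""
    # Percent-encode values that may contain characters such as '@' or '/'.
    user = quote_plus(env.get("DB_USER", "postgres"))
    password = quote_plus(env.get("DB_PASS", ""))
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5438")
    name = quote_plus(env.get("DB_NAME", "Oldest_Businesses_DB"))
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}"


# Example with a made-up password to show the encoding.
url = build_db_url({"DB_PASS": "p@ss/word"})
print(url)

# Assumed usage (requires SQLAlchemy):
# from sqlalchemy import create_engine
# engine = create_engine(build_db_url())
# df = pd.read_sql("SELECT * FROM categories", engine)
```

Encoding the password matters because an unescaped `@` or `/` would otherwise be read as part of the URL structure.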


In [2]:
# View all business categories
df_categories = run_query("SELECT * FROM categories")
display(df_categories)




Unnamed: 0,index,category_code,category
0,0,CAT1,Agriculture
1,1,CAT2,Aviation & Transport
2,2,CAT3,Banking & Finance
3,3,CAT4,"Cafés, Restaurants & Bars"
4,4,CAT5,Conglomerate
5,5,CAT6,Construction
6,6,CAT7,Consumer Goods
7,7,CAT8,Defense
8,8,CAT9,"Distillers, Vintners, & Breweries"
9,9,CAT10,Energy


In [3]:
# What is the oldest business on each continent?
query = """
WITH ranking AS (
    SELECT c.continent, c.country, b.business, b.year_founded,
        ROW_NUMBER() OVER (
            PARTITION BY c.continent
            ORDER BY b.year_founded ASC
        ) AS rn
    FROM businesses AS b
    LEFT JOIN countries AS c
        ON b.country_code = c.country_code
)
SELECT continent, country, business, year_founded
FROM ranking
WHERE rn=1
ORDER BY year_founded;
"""

df_oldest_per_continent = run_query(query)
print("Oldest business on each continent:")
display(df_oldest_per_continent)


Oldest business on each continent:




Unnamed: 0,continent,country,business,year_founded
0,Asia,Japan,Kongō Gumi,578
1,Europe,Austria,St. Peter Stifts Kulinarium,803
2,North America,Mexico,La Casa de Moneda de México,1534
3,South America,Peru,Casa Nacional de Moneda,1565
4,Africa,Mauritius,Mauritius Post,1772
5,Oceania,Australia,Australia Post,1809
6,,,Meridian Corporation,1999
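
The last row above has no continent or country. That is most likely what the `LEFT JOIN` produces when a `businesses.country_code` has no match in `countries`: the unmatched rows get NULLs, and `PARTITION BY` treats those NULLs as one more group. An anti-join is one way to list such rows. The sketch below runs on an in-memory SQLite database with invented rows (the business names and the code `XXX` are placeholders), so it illustrates the pattern rather than the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (country_code TEXT, country TEXT, continent TEXT);
    CREATE TABLE businesses (business TEXT, year_founded INTEGER, country_code TEXT);
    INSERT INTO countries VALUES
        ('JPN', 'Japan', 'Asia'),
        ('AUT', 'Austria', 'Europe');
    INSERT INTO businesses VALUES
        ('Toy Builder', 600, 'JPN'),
        ('Toy Tavern', 800, 'AUT'),
        ('Toy Orphan', 1999, 'XXX');
""")

# Anti-join: keep only businesses whose country_code is absent from countries.
unmatched = conn.execute("""
    SELECT b.business, b.country_code
    FROM businesses AS b
    LEFT JOIN countries AS c ON b.country_code = c.country_code
    WHERE c.country_code IS NULL
""").fetchall()

# Inner join: unmatched businesses drop out, so no NULL continent can appear.
matched = conn.execute("""
    SELECT c.continent, b.business
    FROM businesses AS b
    JOIN countries AS c ON b.country_code = c.country_code
    ORDER BY b.year_founded
""").fetchall()

print("Unmatched:", unmatched)
print("Matched:", matched)
```

Switching the ranking query to an inner join, or filtering on `c.continent IS NOT NULL` inside the CTE, would drop the unlabeled group. Whether the better fix is instead to correct the code in `businesses` or add the country to `countries` depends on what the anti-join reveals.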


In [4]:
# How many countries per continent lack data on the oldest businesses?
# Does including the `new_businesses` data change this?

query = """
-- Count of countries per continent with no business data (including new_businesses)
WITH all_businesses AS (
    -- UNION already removes duplicate country codes
    SELECT country_code FROM businesses
    UNION
    SELECT country_code FROM new_businesses
),
missing_countries AS (
    SELECT c.continent, c.country_code
    FROM countries c
    LEFT JOIN all_businesses a
      ON c.country_code = a.country_code
    WHERE a.country_code IS NULL
)
SELECT
    continent,
    COUNT(DISTINCT country_code) AS countries_without_businesses
FROM missing_countries
GROUP BY continent
ORDER BY continent;
"""

df_missing = run_query(query)
print("Countries without business data per continent:")
display(df_missing)
print(f"\nTotal countries without data: {df_missing['countries_without_businesses'].sum()}")


Countries without business data per continent:




Unnamed: 0,continent,countries_without_businesses
0,Africa,3
1,Asia,7
2,Europe,3
3,North America,5
4,Oceania,10
5,South America,3



Total countries without data: 31
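
The query above reports only the combined result, so on its own it does not show whether `new_businesses` changes the counts. A before-and-after comparison could look like the sketch below. It runs on an in-memory SQLite database with invented rows (all codes and names are placeholders), and `missing_by_continent` is a hypothetical helper. The SQL is standard enough that it should carry over to PostgreSQL with little or no change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (country_code TEXT, continent TEXT);
    CREATE TABLE businesses (business TEXT, country_code TEXT);
    CREATE TABLE new_businesses (business TEXT, country_code TEXT);
    INSERT INTO countries VALUES
        ('AAA', 'Asia'), ('BBB', 'Asia'), ('CCC', 'Europe'), ('DDD', 'Europe');
    INSERT INTO businesses VALUES ('Toy One', 'AAA');
    INSERT INTO new_businesses VALUES ('Toy Two', 'CCC');
""")


def missing_by_continent(conn: sqlite3.Connection, include_new: bool) -> dict:
    """Count countries per continent that have no matching business row."""
    # The source subquery is assembled from fixed strings only, never user input.
    source = "SELECT DISTINCT country_code FROM businesses"
    if include_new:
        source += " UNION SELECT country_code FROM new_businesses"
    sql = f"""
        SELECT c.continent, COUNT(*) AS countries_without_businesses
        FROM countries AS c
        LEFT JOIN ({source}) AS a ON c.country_code = a.country_code
        WHERE a.country_code IS NULL
        GROUP BY c.continent
        ORDER BY c.continent
    """
    return dict(conn.execute(sql).fetchall())


before = missing_by_continent(conn, include_new=False)
after = missing_by_continent(conn, include_new=True)

# Report only the continents whose count changed.
for continent in sorted(before):
    if before[continent] != after.get(continent, 0):
        print(continent, before[continent], "->", after.get(continent, 0))
```

In this toy data only Europe changes, because the single row in `new_businesses` belongs to a European country that `businesses` did not cover.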


In [5]:
# Which business categories are best suited to last over the course of centuries?
# Oldest founding year per continent-category

query = """
SELECT
  c.continent,
  cat.category,
  MIN(b.year_founded) AS year_founded
FROM businesses b
JOIN countries c
  ON b.country_code = c.country_code
JOIN categories cat
  ON b.category_code = cat.category_code
WHERE b.year_founded IS NOT NULL
GROUP BY c.continent, cat.category
ORDER BY year_founded
LIMIT 20;
"""

df_categories_longevity = run_query(query)
print("Oldest business categories by continent (Top 20):")
display(df_categories_longevity)

# Summary statistics: number of businesses and earliest founding year per category (all continents combined)
print("\nCategory distribution across continents:")
summary = run_query("""
SELECT cat.category, COUNT(*) as count, MIN(year_founded) as oldest
FROM businesses b
JOIN categories cat ON b.category_code = cat.category_code
WHERE b.year_founded IS NOT NULL
GROUP BY cat.category
ORDER BY oldest;
""")
display(summary)


Oldest business categories by continent (Top 20):




Unnamed: 0,continent,category,year_founded
0,Asia,Construction,578
1,Europe,"Cafés, Restaurants & Bars",803
2,Europe,"Distillers, Vintners, & Breweries",862
3,Europe,Manufacturing & Production,864
4,Asia,"Cafés, Restaurants & Bars",1153
5,Europe,Agriculture,1218
6,Europe,Tourism & Hotels,1230
7,Europe,Mining,1248
8,Europe,Medical,1422
9,Europe,Postal Service,1520



Category distribution across continents:




Unnamed: 0,category,count,oldest
0,Construction,2,578
1,"Cafés, Restaurants & Bars",6,803
2,"Distillers, Vintners, & Breweries",22,862
3,Manufacturing & Production,15,864
4,Agriculture,6,1218
5,Tourism & Hotels,4,1230
6,Mining,3,1248
7,Medical,1,1422
8,Postal Service,16,1520
9,Banking & Finance,37,1565
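
The two columns in this summary point in different directions: the category with the earliest founding year is not the one that appears most often. As a small illustration, the sketch below re-sorts only the ten rows displayed above using plain Python. `summary_rows` is a hand-copied list, not a query result, and the full result may contain more categories than are shown here.

```python
# (category, count, oldest) copied from the ten rows displayed above.
summary_rows = [
    ("Construction", 2, 578),
    ("Cafés, Restaurants & Bars", 6, 803),
    ("Distillers, Vintners, & Breweries", 22, 862),
    ("Manufacturing & Production", 15, 864),
    ("Agriculture", 6, 1218),
    ("Tourism & Hotels", 4, 1230),
    ("Mining", 3, 1248),
    ("Medical", 1, 1422),
    ("Postal Service", 16, 1520),
    ("Banking & Finance", 37, 1565),
]

# Rank once by how many oldest businesses fall in the category,
# and once by the earliest founding year.
by_count = sorted(summary_rows, key=lambda row: row[1], reverse=True)
by_oldest = sorted(summary_rows, key=lambda row: row[2])

print("Most frequent:", [name for name, _, _ in by_count[:3]])
print("Earliest founded:", [name for name, _, _ in by_oldest[:3]])
```

Reading both rankings side by side helps separate "a single very old survivor" from "a category that repeatedly produces a country's oldest business".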
