<center><img src="MKn_Staffelter_Hof.jpeg" alt="Picture of old business"></center>
<!--Image Credit: Martin Kraft https://commons.wikimedia.org/wiki/File:MKn_Staffelter_Hof.jpg -->

Staffelter Hof Winery is Germany's oldest business, established in 862 under the Carolingian dynasty. It has continued to serve customers through dramatic changes in Europe, including the rise and fall of the Holy Roman Empire and the Ottoman Empire, and both world wars. What characteristics enable a business to stand the test of time?

To help answer this question, BusinessFinancing.co.uk researched the oldest company still in business in **almost** every country and compiled the results into several CSV files. This dataset has been cleaned.

Having useful information in different files is a common problem. While it's better to keep different types of data separate for data storage, you'll want all the data in one place for analysis. You'll use joining and data manipulation to work with this data and better understand the world's oldest businesses.

## The Data
`data/businesses.csv` and `data/new_businesses.csv`
|Column|Description|
|------|-----------|
|`business`|Name of the business (varchar)|
|`year_founded`|Year the business was founded (int)|
|`category_code`|Code for the business category (varchar)|
|`country_code`|ISO 3166-1 three-letter country code (char)|

`data/countries.csv`
|Column|Description|
|------|-----------|
|`country_code`|ISO 3166-1 three-letter country code (varchar)|
|`country`|Name of the country (varchar)|
|`continent`|Name of the continent the country exists in (varchar)|

`data/categories.csv`
|Column|Description|
|------|-----------|
|`category_code`|Code for the business category (varchar)|
|`category`|Description of the business category (varchar)|

Understand the factors that help a business be timeless by answering these questions:

- What is the oldest business on each continent? Save your answer as a DataFrame called oldest_business_continent with four columns: continent, country, business, and year_founded in any order.

- How many countries per continent lack data on the oldest businesses? Does including new_businesses change this? Count the number of countries per continent missing business data, including new_businesses, and store the results in a DataFrame named count_missing with columns for the continent and the count.

- Which business categories are best suited to last many years, and on what continent are they? Create a DataFrame called oldest_by_continent_category that stores the oldest founding year for each continent and category combination. It should contain three columns: continent, category, and year_founded, in that order.

```python
# Import necessary libraries
import pandas as pd

# Load the data
businesses = pd.read_csv("data/businesses.csv")
new_businesses = pd.read_csv("data/new_businesses.csv")
countries = pd.read_csv("data/countries.csv")
categories = pd.read_csv("data/categories.csv")
```

```python
businesses.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 163 entries, 0 to 162
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   business       163 non-null    object
 1   year_founded   163 non-null    int64
 2   category_code  163 non-null    object
 3   country_code   163 non-null    object
dtypes: int64(1), object(3)
memory usage: 5.2+ KB
```


```python
countries.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   country_code  195 non-null    object
 1   country       195 non-null    object
 2   continent     195 non-null    object
dtypes: object(3)
memory usage: 4.7+ KB
```


```python
categories.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   category_code  19 non-null     object
 1   category       19 non-null     object
dtypes: object(2)
memory usage: 436.0+ bytes
```


```python
# Stack businesses and new_businesses row-wise into one table
concated_df = pd.concat([businesses, new_businesses], axis=0)
```
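`pd.concat` keeps each input's original row labels by default, so stacking `businesses` (labels 0 to 162) with `new_businesses` repeats labels starting again at 0. The `pd.merge` call that follows builds a fresh index, so this does not affect the results here, but passing `ignore_index=True` avoids duplicate labels if the stacked frame is ever used directly. A minimal sketch, using two hypothetical toy frames in place of the real data:

```python
import pandas as pd

# Hypothetical toy frames standing in for businesses and new_businesses
old = pd.DataFrame({"business": ["A", "B"], "country_code": ["DEU", "JPN"]})
new = pd.DataFrame({"business": ["C"], "country_code": ["PER"]})

# Default: original labels are kept, so label 0 appears twice
stacked = pd.concat([old, new])
print(stacked.index.tolist())        # [0, 1, 0]

# ignore_index=True renumbers the rows from 0
stacked_clean = pd.concat([old, new], ignore_index=True)
print(stacked_clean.index.tolist())  # [0, 1, 2]
```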


```python
# Attach country name and continent to each business (inner join on country_code)
merged_df = pd.merge(concated_df, countries, on='country_code')
merged_df
```

| | business | year_founded | category_code | country_code | country | continent |
|---|---|---|---|---|---|---|
| 0 | Hamoud Boualem | 1878 | CAT11 | DZA | Algeria | Africa |
| 1 | Communauté Électrique du Bénin | 1968 | CAT10 | BEN | Benin | Africa |
| 2 | Botswana Meat Commission | 1965 | CAT1 | BWA | Botswana | Africa |
| 3 | Air Burkina | 1967 | CAT2 | BFA | Burkina Faso | Africa |
| 4 | Brarudi | 1955 | CAT9 | BDI | Burundi | Africa |
| ... | ... | ... | ... | ... | ... | ... |
| 160 | Australia Post | 1809 | CAT16 | AUS | Australia | Oceania |
| 161 | Bank of New Zealand | 1861 | CAT3 | NZL | New Zealand | Oceania |
| 162 | European Trust Company | 1991 | CAT3 | VUT | Vanuatu | Oceania |
| 163 | Fiji Times | 1869 | CAT13 | FJI | Fiji | Oceania |
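`pd.merge` defaults to an inner join (`how="inner"`), so a business whose `country_code` has no match in `countries` is dropped without warning. Comparing `len(concated_df)` with `len(merged_df)` is a quick way to confirm nothing was lost, and pandas' `validate="many_to_one"` argument raises `pandas.errors.MergeError` if a key repeats in the right-hand table. A sketch on hypothetical toy frames:

```python
import pandas as pd

# Hypothetical toy data: "XXX" has no match in the countries table
biz = pd.DataFrame({"business": ["A", "B", "C"],
                    "country_code": ["DEU", "JPN", "XXX"]})
countries_toy = pd.DataFrame({"country_code": ["DEU", "JPN"],
                              "country": ["Germany", "Japan"]})

# Inner join (the default): the unmatched row disappears silently
inner = pd.merge(biz, countries_toy, on="country_code", validate="many_to_one")
print(len(biz), len(inner))  # 3 2

# validate="many_to_one" raises if a country_code repeats on the right side
dup_countries = pd.concat([countries_toy, countries_toy.iloc[[0]]])
try:
    pd.merge(biz, dup_countries, on="country_code", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)  # True
```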


```python
merged_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 165 entries, 0 to 164
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   business       165 non-null    object
 1   year_founded   165 non-null    int64
 2   category_code  165 non-null    object
 3   country_code   165 non-null    object
 4   country        165 non-null    object
 5   continent      165 non-null    object
dtypes: int64(1), object(5)
memory usage: 7.9+ KB
```


### Question 1

- What is the oldest business on each continent? Save your answer as a DataFrame called oldest_business_continent with four columns: continent, country, business, and year_founded in any order.

```python
# Earliest founding year per continent
continent_grouped_df = merged_df.groupby('continent').agg(
    year=('year_founded', 'min')
)
continent_grouped_df.sort_values('year')
```

| continent | year |
|---|---|
| Asia | 578 |
| Europe | 803 |
| North America | 1534 |
| South America | 1565 |
| Africa | 1772 |
| Oceania | 1809 |
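The cell above finds the earliest founding year per continent but not which business or country it belongs to, so it does not yet produce `oldest_business_continent`. One option, sketched here on hypothetical toy data, is `groupby(...)["year_founded"].idxmin()`, which returns the row label of each group's minimum; `.loc` then pulls the full rows. Note that `idxmin` keeps only the first row when two businesses tie for the oldest year, whereas merging the per-continent minimum back onto the data (the approach in the Solution section) keeps every tied row.

```python
import pandas as pd

# Hypothetical toy stand-in for merged_df (unique RangeIndex, as merge produces)
df = pd.DataFrame({
    "business": ["Old Asia Co", "New Asia Co", "Old Euro Co", "New Euro Co"],
    "year_founded": [578, 1900, 803, 1950],
    "country": ["Japan", "India", "Austria", "France"],
    "continent": ["Asia", "Asia", "Europe", "Europe"],
})

# Row label of the earliest founding year within each continent
oldest_idx = df.groupby("continent")["year_founded"].idxmin()

# Pull those rows and keep the four requested columns
oldest_business_continent = df.loc[
    oldest_idx, ["continent", "country", "business", "year_founded"]
].reset_index(drop=True)
print(oldest_business_continent)
```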


### Question 2

- How many countries per continent lack data on the oldest businesses? Does including new_businesses change this? Count the number of countries per continent missing business data, including new_businesses, and store the results in a DataFrame named count_missing with columns for the continent and the count.

```python
all_businesses = pd.concat([new_businesses, businesses])

# Outer merge keeps every country; indicator=True adds a _merge column
new_all_countries = all_businesses.merge(countries, on="country_code", how="outer", indicator=True)
new_all_countries
```

| | business | year_founded | category_code | country_code | country | continent | _merge |
|---|---|---|---|---|---|---|---|
| 0 | Spinzar Cotton Company | 1930.0 | CAT1 | AFG | Afghanistan | Asia | both |
| 1 | | | | AGO | Angola | Africa | right_only |
| 2 | ALBtelecom | 1912.0 | CAT18 | ALB | Albania | Europe | both |
| 3 | Andbank | 1930.0 | CAT3 | AND | Andorra | Europe | both |
| 4 | Liwa Chemicals | 1939.0 | CAT12 | ARE | United Arab Emirates | Asia | both |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 190 | Meridian Corporation | 1999.0 | CAT13 | XK | Kosovo | Europe | both |
| 191 | Yemenia Airways | 1962.0 | CAT2 | YEM | Yemen | Asia | both |
| 192 | Premier FMCG | 1820.0 | CAT12 | ZAF | South Africa | Africa | both |
| 193 | ZamPost | 1896.0 | CAT16 | ZMB | Zambia | Africa | both |


```python
new_all_countries.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   business       165 non-null    object
 1   year_founded   165 non-null    float64
 2   category_code  165 non-null    object
 3   country_code   195 non-null    object
 4   country        195 non-null    object
 5   continent      195 non-null    object
 6   _merge         195 non-null    category
dtypes: category(1), float64(1), object(5)
memory usage: 9.6+ KB
```
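The outer merge adds a row for every country with no business and fills `business`, `year_founded`, and `category_code` with `NaN`. Because `int64` cannot hold `NaN`, pandas upcasts `year_founded` to `float64`, which is why the years above print as `1930.0`. If whole-number years are preferred, converting to pandas' nullable `Int64` dtype is one option. The toy sketch below (hypothetical data) also uses `value_counts` on the `_merge` indicator as a quick summary of matched versus unmatched rows.

```python
import pandas as pd

# Hypothetical toy data: Angola (AGO) has no business row
biz = pd.DataFrame({"business": ["A"], "year_founded": [1930],
                    "country_code": ["AFG"]})
countries_toy = pd.DataFrame({"country_code": ["AFG", "AGO"],
                              "country": ["Afghanistan", "Angola"]})

merged = biz.merge(countries_toy, on="country_code", how="outer", indicator=True)
print(merged["year_founded"].dtype)  # float64, because AGO gets NaN

# Nullable integer dtype keeps whole-number years and shows <NA> for gaps
merged["year_founded"] = merged["year_founded"].astype("Int64")
print(merged["year_founded"].dtype)  # Int64

# Rows matched on both sides vs. present on only one side
print(merged["_merge"].value_counts())
```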


```python
# Keep rows that did not match on both sides, i.e. countries without a business
new_missing_countries = new_all_countries[new_all_countries["_merge"] != "both"]
new_missing_countries
```

| | business | year_founded | category_code | country_code | country | continent | _merge |
|---|---|---|---|---|---|---|---|
| 1 | | | | AGO | Angola | Africa | right_only |
| 7 | | | | ATG | Antigua and Barbuda | North America | right_only |
| 18 | | | | BHS | Bahamas | North America | right_only |
| 50 | | | | ECU | Ecuador | South America | right_only |
| 59 | | | | FSM | Micronesia, Federated States of | Oceania | right_only |
| 63 | | | | GHA | Ghana | Africa | right_only |
| 65 | | | | GMB | Gambia | Africa | right_only |
| 69 | | | | GRD | Grenada | North America | right_only |
| 79 | | | | IRN | Iran, Islamic Republic of | Asia | right_only |
| 89 | | | | KGZ | Kyrgyzstan | Asia | right_only |


```python
count_missing = new_missing_countries.groupby("continent").agg({"country": "count"})
count_missing.columns = ["count_missing"]
count_missing
```

| continent | count_missing |
|---|---|
| Africa | 3 |
| Asia | 7 |
| Europe | 2 |
| North America | 5 |
| Oceania | 10 |
| South America | 3 |
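The question also asks whether adding `new_businesses` changes the count. A sketch on hypothetical toy data: wrap the outer-merge-and-count steps in a small helper (the name `count_missing_by_continent` is made up here) and run it with and without the new rows. Filtering on `== "right_only"` selects exactly the countries no business points to, and `groupby(...).size().reset_index(name=...)` returns the continent as a regular column instead of the index. Continents with nothing missing simply do not appear in the result.

```python
import pandas as pd

def count_missing_by_continent(biz: pd.DataFrame, countries: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: number of countries per continent with no business row."""
    merged = biz.merge(countries, on="country_code", how="outer", indicator=True)
    # right_only rows are countries that no business points to
    missing = merged[merged["_merge"] == "right_only"]
    return missing.groupby("continent").size().reset_index(name="count_missing")

# Hypothetical toy data
countries_toy = pd.DataFrame({
    "country_code": ["AGO", "GHA", "ECU"],
    "continent": ["Africa", "Africa", "South America"],
})
businesses_toy = pd.DataFrame({"business": ["X"], "country_code": ["ECU"]})
new_businesses_toy = pd.DataFrame({"business": ["Y"], "country_code": ["GHA"]})

before = count_missing_by_continent(businesses_toy, countries_toy)
after = count_missing_by_continent(
    pd.concat([businesses_toy, new_businesses_toy], ignore_index=True), countries_toy
)
print(before)
print(after)
```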


### Question 3

- Which business categories are best suited to last many years, and on what continent are they? Create a DataFrame called oldest_by_continent_category that stores the oldest founding year for each continent and category combination. It should contain three columns: continent, category, and year_founded, in that order.

```python
category_df = pd.merge(merged_df, categories, on='category_code')
category_df
```

| | business | year_founded | category_code | country_code | country | continent | category |
|---|---|---|---|---|---|---|---|
| 0 | Hamoud Boualem | 1878 | CAT11 | DZA | Algeria | Africa | Food & Beverages |
| 1 | Communauté Électrique du Bénin | 1968 | CAT10 | BEN | Benin | Africa | Energy |
| 2 | Botswana Meat Commission | 1965 | CAT1 | BWA | Botswana | Africa | Agriculture |
| 3 | Air Burkina | 1967 | CAT2 | BFA | Burkina Faso | Africa | Aviation & Transport |
| 4 | Brarudi | 1955 | CAT9 | BDI | Burundi | Africa | Distillers, Vintners, & Breweries |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 160 | Australia Post | 1809 | CAT16 | AUS | Australia | Oceania | Postal Service |
| 161 | Bank of New Zealand | 1861 | CAT3 | NZL | New Zealand | Oceania | Banking & Finance |
| 162 | European Trust Company | 1991 | CAT3 | VUT | Vanuatu | Oceania | Banking & Finance |
| 163 | Fiji Times | 1869 | CAT13 | FJI | Fiji | Oceania | Media |


```python
oldest_by_continent_category = category_df.groupby(["continent", "category"]).agg(
    year=('year_founded', 'min')
)
oldest_by_continent_category
```

| continent | category | year |
|---|---|---|
| Africa | Agriculture | 1947 |
| Africa | Aviation & Transport | 1854 |
| Africa | Banking & Finance | 1892 |
| Africa | Distillers, Vintners, & Breweries | 1933 |
| Africa | Energy | 1968 |
| Africa | Food & Beverages | 1878 |
| Africa | Manufacturing & Production | 1820 |
| Africa | Media | 1943 |
| Africa | Mining | 1962 |
| Africa | Postal Service | 1772 |
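The task asks for three columns named `continent`, `category`, and `year_founded`, in that order, but the cell above names the aggregate `year` and leaves `continent` and `category` in the index. A minimal adjustment, sketched on hypothetical toy data, is to name the aggregate `year_founded` and call `reset_index()`.

```python
import pandas as pd

# Hypothetical toy stand-in for category_df
df = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Asia"],
    "category": ["Postal Service", "Postal Service", "Media", "Construction"],
    "year_founded": [1772, 1850, 1943, 578],
})

oldest_by_continent_category = (
    df.groupby(["continent", "category"])
      .agg(year_founded=("year_founded", "min"))
      .reset_index()  # move continent and category out of the index into columns
)
print(oldest_by_continent_category)
```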


# Solution

```python
# Import necessary libraries
import pandas as pd

# Load the data
businesses = pd.read_csv("data/businesses.csv")
new_businesses = pd.read_csv("data/new_businesses.csv")
countries = pd.read_csv("data/countries.csv")
categories = pd.read_csv("data/categories.csv")

# What is the oldest business on every continent?

# Start by merging the businesses and countries datasets into one
businesses_countries = businesses.merge(countries, on="country_code")

# Create a new DataFrame that lists only the continent and oldest year_founded
continent = businesses_countries.groupby("continent").agg({"year_founded": "min"})

# Merge this continent DataFrame with businesses_countries
merged_continent = continent.merge(businesses_countries, on=["continent", "year_founded"])

# Subset the merged DataFrame so that only the four columns of interest are included,
# saving it as oldest_business_continent
oldest_business_continent = merged_continent[["continent", "country", "business", "year_founded"]]

# View the result
print(oldest_business_continent)

# How many countries per continent lack data on the oldest businesses?
# Does including the `new_businesses` data change this?

# Add the data in new_businesses to the existing businesses
all_businesses = pd.concat([new_businesses, businesses])

# Perform a new merge between the businesses and the countries data. Use additional
# parameters this time to perform an outer merge and create an indicator column to
# better see the missing values. An outer merge combines two DataFrames on a key
# column and keeps all rows from both DataFrames.
new_all_countries = all_businesses.merge(countries, on="country_code", how="outer", indicator=True)

# Filter to find countries with missing business data
new_missing_countries = new_all_countries[new_all_countries["_merge"] != "both"]

# Group by continent and create a "count_missing" column
count_missing = new_missing_countries.groupby("continent").agg({"country": "count"})
count_missing.columns = ["count_missing"]

# View the results
print(count_missing)

# Which business categories are best suited to last over the course of centuries?

# Start by merging the businesses and categories data into one DataFrame
businesses_categories = businesses.merge(categories, on="category_code")

# Merge all businesses, countries, and categories together
businesses_categories_countries = businesses_categories.merge(countries, on="country_code")

# Create the oldest by continent and category DataFrame
oldest_by_continent_category = businesses_categories_countries.groupby(["continent", "category"]).agg({"year_founded": "min"})
oldest_by_continent_category.head()
```

```
       continent    country                     business  year_founded
0         Africa  Mauritius               Mauritius Post          1772
1           Asia      Japan                   Kongō Gumi           578
2         Europe    Austria  St. Peter Stifts Kulinarium           803
3  North America     Mexico  La Casa de Moneda de México          1534
4        Oceania  Australia               Australia Post          1809
5  South America       Peru      Casa Nacional de Moneda          1565
               count_missing
continent
Africa                     3
Asia                       7
Europe                     2
North America              5
Oceania                   10
South America              3
```


| continent | category | year_founded |
|---|---|---|
| Africa | Agriculture | 1947 |
| Africa | Aviation & Transport | 1854 |
| Africa | Banking & Finance | 1892 |
| Africa | Distillers, Vintners, & Breweries | 1933 |
| Africa | Energy | 1968 |
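To read this result as "which categories last longest, and where", it can help to pivot the `(continent, category)` index into a category-by-continent grid. A sketch on hypothetical toy data using `Series.unstack`; combinations with no business become `NaN`, so the grid's values are floats.

```python
import pandas as pd

# Hypothetical toy stand-in for the solution's oldest_by_continent_category
idx = pd.MultiIndex.from_tuples(
    [("Africa", "Postal Service"), ("Asia", "Construction"), ("Europe", "Postal Service")],
    names=["continent", "category"],
)
oldest = pd.DataFrame({"year_founded": [1772, 578, 1490]}, index=idx)

# One row per category, one column per continent
grid = oldest["year_founded"].unstack("continent")

# Earliest year per category across all continents, oldest categories first
print(grid.min(axis=1).sort_values())
print(grid)
```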
