# Northwind project

Northwind is a classic sample database that models a small international wholesale company, Northwind Traders. It contains realistic data on customers, suppliers, employees, products, and orders, allowing you to analyze the full sales process from product sourcing to order fulfillment.

### Question 1 

For their annual review of the company pricing strategy, the Product Team wants to 
look at the products currently offered in a specific price range ($20 to $50). 

To help them, provide a list of products with the following information:
1. their name
2. their unit price

Filtered on the following conditions:

1. their unit price is between 20 and 50
2. they are not discontinued

Finally, order the results by unit price in descending order (highest first).

In [31]:
import pandas as pd 

products = pd.read_csv('./data/Northwind_dataset/products.csv', encoding='latin1')

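# 1) Filter on price range (between() includes both bounds) AND not discontinued
df_res = products[
    products['unitPrice'].between(20, 50) & (products['discontinued'] == 0)
]

# 2) Keep the requested columns, sorted by unit price (highest first)
df_res = (
    df_res[['productName', 'unitPrice']]
    .sort_values(by='unitPrice', ascending=False)
    .reset_index(drop=True)
)

# Preview the first 5 rows; df_res itself holds the full list
df_res.head()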

| | productName | unitPrice |
|---|---|---|
| 0 | Tarte au sucre | 49.3 |
| 1 | Ipoh Coffee | 46.0 |
| 2 | Schoggi Schokolade | 43.9 |
| 3 | Vegie-spread | 43.9 |
| 4 | Northwoods Cranberry Sauce | 40.0 |
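
The filter logic above can be checked without the CSV file. The following is a minimal sketch on a hypothetical toy DataFrame (the rows are made up and are not taken from `products.csv`); it only assumes the same three column names. It illustrates that `Series.between` includes both bounds by default, so products priced at exactly 20 or 50 are kept.

```python
import pandas as pd

# Hypothetical toy rows (not from products.csv), chosen to sit on and around the bounds
toy = pd.DataFrame({
    "productName": ["A", "B", "C", "D", "E"],
    "unitPrice": [19.99, 20.0, 35.5, 50.0, 50.01],
    "discontinued": [0, 0, 1, 0, 0],
})

# Same conditions as the exercise: price in [20, 50] and not discontinued
mask = toy["unitPrice"].between(20, 50) & (toy["discontinued"] == 0)

filtered = (
    toy.loc[mask, ["productName", "unitPrice"]]
    .sort_values(by="unitPrice", ascending=False)
    .reset_index(drop=True)
)

# A and E fall outside the range, C is discontinued; B and D sit on the bounds and are kept
print(filtered)
```

If the price range were meant to exclude the bounds, `between` accepts an `inclusive` argument (for example `inclusive="neither"`) in recent pandas versions.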



### Question 2

The Logistics Team wants to review their performance for the year 2015 
in order to identify the countries where they did not perform well. They asked you to 
provide a list of countries with the following information:

1. their average number of days between the order date and the shipping date (rounded to 2 decimals)
2. their total number of orders (based on the order date)

Filtered on the following conditions:

1. the year of the order date is 2015
2. their average number of days between the order date and the shipping date is greater than or equal to 5
3. their total number of orders is greater than 10

Finally, order the results by country name in ascending order (A to Z).


In [71]:
import pandas as pd

orders = pd.read_csv("./data/Northwind_dataset/orders.csv", encoding="latin1")
customers = pd.read_csv("./data/Northwind_dataset/customers.csv", encoding="latin1")

# Merge orders with customer country
df = orders[["customerID", "orderDate", "shippedDate"]].merge(
    customers[["customerID", "country"]],
    on="customerID",
    how="inner"
)

# 1) Calculate shipping time in days
df["orderDate"] = pd.to_datetime(df["orderDate"])
df["shippedDate"] = pd.to_datetime(df["shippedDate"])
df["shipping_time"] = (df["shippedDate"] - df["orderDate"]).dt.days

# 2) Keep only orders placed in 2015
df = df[df["orderDate"].dt.year == 2015]

# 3) Aggregate by country
result1 = (
    df.groupby("country")
      .agg(
          average_shipping_time=("shipping_time", "mean"),
          total_orders=("customerID", "count")
      )
)

# 4) Apply filters on the unrounded average: avg days >= 5, total orders > 10
result1 = result1[
    (result1["average_shipping_time"] >= 5)
    & (result1["total_orders"] > 10)
]

# 5) Format avg shipping time to 2 decimals (done after filtering so rounding cannot affect the >= 5 test)
result1 = result1.round({"average_shipping_time": 2})

# 6) Sort by country ascending
result1 = result1.sort_index()

result1


| country | average_shipping_time | total_orders |
|---|---|---|
| Austria | 5.89 | 11 |
| Brazil | 8.12 | 28 |
| France | 9.43 | 23 |
| Germany | 5.38 | 34 |
| Spain | 7.83 | 12 |
| Sweden | 13.29 | 14 |
| UK | 6.25 | 16 |
| USA | 7.89 | 39 |
| Venezuela | 8.73 | 18 |
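
One detail worth checking in the aggregation is how orders without a shipping date are treated. The sketch below uses a hypothetical toy DataFrame (made-up rows, not from `orders.csv`) and an extra illustrative column, `shipped_orders`, that is not part of the exercise. Assuming some orders may have a missing `shippedDate`, it shows that `mean` skips those orders while a count on a non-null column still includes them.

```python
import pandas as pd

# Hypothetical toy orders (not from orders.csv); one order has not shipped yet
toy = pd.DataFrame({
    "country": ["X", "X", "X", "Y", "Y"],
    "orderDate": ["2015-01-01", "2015-01-10", "2015-02-01", "2015-03-01", "2014-12-31"],
    "shippedDate": ["2015-01-05", "2015-01-18", None, "2015-03-03", "2015-01-04"],
})
toy["orderDate"] = pd.to_datetime(toy["orderDate"])
toy["shippedDate"] = pd.to_datetime(toy["shippedDate"])

# Missing shipping dates become NaT, so the difference in days is NaN for that order
toy["shipping_time"] = (toy["shippedDate"] - toy["orderDate"]).dt.days

# Filter on the order date year, as in the exercise
toy_2015 = toy[toy["orderDate"].dt.year == 2015]

summary = toy_2015.groupby("country").agg(
    average_shipping_time=("shipping_time", "mean"),  # NaN values are skipped
    total_orders=("orderDate", "count"),              # every 2015 order is counted
    shipped_orders=("shipping_time", "count"),        # only orders with a shipping date
)
print(summary)
```

In this toy data, country X has three orders in 2015 but only two of them contribute to the average. Whether unshipped orders should count toward `total_orders` depends on how the Logistics Team defines the metric; the exercise wording ("based on the order date") suggests that they should.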
