# Northwind project

Northwind is a classic sample database that models a small international wholesale company, Northwind Traders. It contains realistic data on customers, suppliers, employees, products, and orders, allowing you to analyze the full sales process from product sourcing to order fulfillment.

Link to the questions:

https://github.com/ndleah/northwind/blob/main/queries/01_query.sql

## Question 1

For their annual review of the company's pricing strategy, the Product Team wants to
look at the products currently offered in a specific price range ($20 to $50).

To help them, they asked you to provide a list of products with the following information:
1. their name
2. their unit price

Filtered on the following conditions:

1. their unit price is between 20 and 50
2. they are not discontinued

Finally, order the results by unit price in descending order (highest first).

In [0]:
import pandas as pd

products = pd.read_csv('./data/Northwind_dataset/products.csv')

# 1) Filter on price range (inclusive on both ends, like SQL BETWEEN) AND not discontinued
mask = products['unitprice'].between(20, 50) & (products['discontinued'] == 0)

# 2) Keep name and price, sort by price (highest first).
#    No .head() here: the team asked for the full list, not only the first 5 rows.
df_res = (
    products.loc[mask, ['productname', 'unitprice']]
    .sort_values(by='unitprice', ascending=False)
    .reset_index(drop=True)
)
df_res
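`Series.between` keeps both boundaries by default, which matches SQL's `BETWEEN`. The toy table below is hypothetical (made-up product names and prices). It only checks that products priced exactly $20 or $50 are kept and that a discontinued product inside the range is dropped by the combined mask.

```python
import pandas as pd

# Hypothetical toy rows; not taken from the Northwind data
toy = pd.DataFrame({
    "productname": ["A", "B", "C", "D", "E"],
    "unitprice": [19.99, 20.0, 35.5, 50.0, 50.01],
    "discontinued": [0, 0, 1, 0, 0],
})

# between() is inclusive on both ends by default, so 20.0 and 50.0 pass
mask = toy["unitprice"].between(20, 50) & (toy["discontinued"] == 0)

result = (
    toy.loc[mask, ["productname", "unitprice"]]
    .sort_values(by="unitprice", ascending=False)
    .reset_index(drop=True)
)
print(result)
```

Product C falls inside the range but is discontinued, so only D ($50.00) and B ($20.00) remain.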


## Question 2

The Logistics Team wants to review their performance for the year 1998
to identify the countries where they did not perform well. They asked you to
provide them with a list of countries with the following information:

1. their average number of days between the order date and the shipping date (formatted to have only 2 decimals)
2. their total number of orders (based on the order date)

Filtered on the following conditions:

1. the year of the order date is 1998
2. their average number of days between the order date and the shipping date is greater than or equal to 5 days
3. their total number of orders is greater than 10

Finally, order the results by country name in ascending (alphabetical) order.


In [0]:
import pandas as pd

orders = pd.read_csv("./data/Northwind_dataset/orders.csv")
customers = pd.read_csv("./data/Northwind_dataset/customers.csv")

# Merge orders with the customer's country
df = orders[["orderid", "customerid", "orderdate", "shippeddate"]].merge(
    customers[["customerid", "country"]],
    on="customerid",
    how="inner"
)

# 1) Calculate shipping time in days (NaN for orders without a shipped date)
df["orderdate"] = pd.to_datetime(df["orderdate"])
df["shippeddate"] = pd.to_datetime(df["shippeddate"])
df["shipping_time"] = (df["shippeddate"] - df["orderdate"]).dt.days

# 2) Keep only orders placed in 1998
df = df[df["orderdate"].dt.year == 1998]

# 3) Aggregate by country (mean() ignores NaN shipping times)
result1 = (
    df.groupby("country")
      .agg(
          average_shipping_time=("shipping_time", "mean"),
          total_orders=("orderid", "count")
      )
)

# 4) Apply filters on the unrounded average: avg days >= 5, total orders > 10
result1 = result1[
    (result1["average_shipping_time"] >= 5)
    & (result1["total_orders"] > 10)
]

# 5) Format avg shipping time to 2 decimals (returns a new frame, no SettingWithCopyWarning)
result1 = result1.round({"average_shipping_time": 2})

# 6) Sort by country ascending
result1 = result1.sort_index()

result1
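Two details in this calculation are easy to miss. First, an order that has not shipped has no shipped date, so its day difference is `NaN`, and `mean()` skips it while the order still counts toward the total. Second, applying the `>= 5` threshold to an already-rounded average can let a value such as 4.996 pass as 5.00. The sketch below uses hypothetical toy data (the countries "X" and "Y" and all dates are made up) to show both.

```python
import pandas as pd

# Hypothetical toy orders; countries and dates are made up
df = pd.DataFrame({
    "country": ["X", "X", "X", "Y", "Y"],
    "orderdate": pd.to_datetime(["1998-01-01", "1998-01-10", "1998-02-01",
                                 "1998-03-01", "1998-03-05"]),
    "shippeddate": pd.to_datetime(["1998-01-06", "1998-01-15", None,
                                   "1998-03-03", "1998-03-09"]),
})

# Datetime subtraction gives Timedeltas; .dt.days turns them into day counts (NaN if unshipped)
df["shipping_time"] = (df["shippeddate"] - df["orderdate"]).dt.days

summary = df.groupby("country").agg(
    average_shipping_time=("shipping_time", "mean"),  # NaN rows are skipped
    total_orders=("orderdate", "count"),              # every order counts, shipped or not
)
print(summary)

# Thresholding a rounded average can flip the result near the cut-off
avg = 4.996
print(avg >= 5, round(avg, 2) >= 5)  # False True
```

Country X averages 5.0 days over its two shipped orders but still reports 3 orders in total.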


## Question 3
The HR Team wants to know, for each employee, their age on the date
they joined the company and who they currently report to. Provide them with
a list of all employees with the following information:

1. their full name (first name and last name combined in a single field)
2. their job title
3. their age at the time they were hired
4. their manager's full name (first name and last name combined in a single field)
5. their manager's job title

Finally, order the results by employee age and then by employee full name, in ascending order (lowest first).
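The manager lookup is a self-join. Each employee's `reportsto` value is matched against another row's `employeeid` in the same table. A left join keeps the employee who reports to nobody (the top of the hierarchy), whose manager fields come back empty. The names and titles below are hypothetical toy values.

```python
import pandas as pd

# Hypothetical toy employees; reportsto is missing for the person at the top
emp = pd.DataFrame({
    "employeeid": [1, 2, 3],
    "fullname": ["Ann Lee", "Bob Roy", "Cy Dee"],
    "title": ["VP Sales", "Sales Rep", "Sales Rep"],
    "reportsto": [None, 1, 1],
})

# Prefix the manager-side columns so they don't collide with the employee-side ones
managers = emp[["employeeid", "fullname", "title"]].add_prefix("manager_")

# Left join: employees without a manager are kept, with NaN manager fields
joined = emp.merge(managers, left_on="reportsto",
                   right_on="manager_employeeid", how="left")
print(joined[["fullname", "manager_fullname", "manager_title"]])
```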

In [0]:
import pandas as pd

employee = pd.read_csv('./data/Northwind_dataset/employees.csv')
employee["fullname"] = employee["firstname"] + " " + employee["lastname"]

# First, self-join the employee table to attach each employee's manager.
# A left join keeps employees with no manager (empty reportsto).
df_merge = employee.merge(employee[["employeeid", "fullname", "title"]].add_prefix("manager_"),
                          left_on="reportsto",
                          right_on="manager_employeeid",
                          how="left")

# Then, convert the birthdate and hiredate columns to datetime
df_merge['birthdate'] = pd.to_datetime(df_merge['birthdate'])
df_merge['hiredate'] = pd.to_datetime(df_merge['hiredate'])

# 1) Calculate the age of each employee at the time of hire:
#    difference in years, minus one if the birthday had not yet come that year
hire, birth = df_merge['hiredate'], df_merge['birthdate']
had_birthday = (hire.dt.month > birth.dt.month) | (
    (hire.dt.month == birth.dt.month) & (hire.dt.day >= birth.dt.day)
)
df_merge['age_at_hire'] = hire.dt.year - birth.dt.year - (~had_birthday).astype(int)

# 2) Keep the requested columns and sort by age, then by full name
df = (df_merge[["employeeid", "fullname", "title", "age_at_hire", "manager_fullname", "manager_title"]]
        .sort_values(by=['age_at_hire', 'fullname'], ascending=True))

display(df)

In [0]:
employee.columns
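Subtracting the birth year from the hire year overstates the age by one whenever someone was hired before their birthday in the hire year. One compact way to detect this case is to compare zero-padded "MMDD" strings, which sort in calendar order. This is an alternative to comparing month and day separately. The dates below are hypothetical.

```python
import pandas as pd

# Hypothetical dates: the same birth date, one hire before and one after the birthday
birth = pd.to_datetime(pd.Series(["1960-08-30", "1960-08-30"]))
hire = pd.to_datetime(pd.Series(["1992-05-01", "1992-09-15"]))

# Naive age: difference of calendar years only
naive_age = hire.dt.year - birth.dt.year

# "MMDD" strings compare in calendar order, so this is True
# when the hire date falls before the birthday in the hire year
before_birthday = hire.dt.strftime("%m%d") < birth.dt.strftime("%m%d")

# Subtract one year if the birthday had not yet come
age_at_hire = naive_age - before_birthday.astype(int)
print(naive_age.tolist(), age_at_hire.tolist())  # [32, 32] [31, 32]
```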


## Question 4
The Logistics Team wants to review their overall performance over 1997-1998
to identify the months in which they performed well. They asked you to provide them with a list of:

1. the year/month as a single field in a date format (e.g. “1996-01-01” for January 1996)
2. the total number of orders
3. the total freight (formatted to have no decimals)

Filtered on the following conditions:

1. the order date is between 1997 and 1998
2. the total number of orders is greater than 35

Finally, order the results by total freight in descending order (highest first).
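To get a year/month field in date format, you can floor each date to the first day of its month. One way to do this is to convert the date to a monthly `Period` and back to a timestamp. The sketch below checks this on hypothetical dates.

```python
import pandas as pd

# Hypothetical order dates
dates = pd.to_datetime(pd.Series(["1997-01-15", "1997-01-31", "1998-02-03"]))

# to_period("M") drops the day; to_timestamp() returns the start of that month
year_month = dates.dt.to_period("M").dt.to_timestamp()
print(year_month.dt.strftime("%Y-%m-%d").tolist())
# ['1997-01-01', '1997-01-01', '1998-02-01']
```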


In [0]:
import pandas as pd

orders = pd.read_csv("./data/Northwind_dataset/orders.csv")

# Date condition: keep orders placed in 1997 or 1998
# (.copy() avoids a SettingWithCopyWarning when adding a column below)
orders["orderdate"] = pd.to_datetime(orders["orderdate"])
orders = orders[orders["orderdate"].dt.year.between(1997, 1998)].copy()

# Build a year/month field floored to the first of the month, e.g. 1997-01-01
orders["year_month"] = orders["orderdate"].dt.to_period("M").dt.to_timestamp().dt.date

# Total orders and total freight per month.
# Freight is stored once per order in the orders table, so aggregate it directly.
# Merging with the order details first would repeat each order's freight once per
# product line and inflate the total.
df = (
    orders.groupby("year_month")
    .agg(
        total_orders=("orderid", "nunique"),
        total_freights=("freight", "sum"))
    .reset_index()
)

# Order-count condition
df = df[df["total_orders"] > 35]

# Format total_freights to have no decimals (integer), then sort by freight (highest first)
df = (
    df.assign(total_freights=df["total_freights"].round(0).astype(int))
    .sort_values(by="total_freights", ascending=False)
    .reset_index(drop=True)
)
display(df)
len(df)
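In the Northwind schema, freight is recorded once per order, while order details have one row per product line. Joining the two tables before summing therefore counts each order's freight once per line. Here is a toy illustration with made-up order IDs and values.

```python
import pandas as pd

# Hypothetical orders: freight is stored once per order
orders = pd.DataFrame({"orderid": [1, 2], "freight": [10.0, 5.0]})

# Hypothetical order details: order 1 has three product lines, order 2 has one
details = pd.DataFrame({"orderid": [1, 1, 1, 2], "productid": [11, 12, 13, 11]})

# Correct total: sum over the orders table only
freight_from_orders = orders["freight"].sum()

# Inflated total: the join repeats order 1's freight three times
freight_after_join = orders.merge(details, on="orderid")["freight"].sum()

print(freight_from_orders, freight_after_join)  # 15.0 35.0
```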