In [1]:
import pandas as pd


You are a Product Analyst working on **Nike**'s marketing performance team. Your team wants to evaluate the effectiveness of celebrity product collaborations by analyzing sales data. You will investigate the performance of celebrity product drops to inform future marketing strategies.

In [2]:
# Load the CSV file into a DataFrame
fct_sales = pd.read_csv('fct_sales.csv')

# Display the DataFrame
print("Sales Data:")
print(fct_sales)


Sales Data:
    sale_id   sale_date  product_id  sale_amount  celebrity_id
0         1  2025-01-10         901          NaN           101
1         2  2025-01-15         901      1500.00           101
2         3  2025-02-03         902      2000.50           102
3         4  2025-03-12         903      2500.75           103
4         5  2025-03-20         904          NaN           104
5         6  2025-02-28         901      1000.00           101
6         7  2025-03-25         902       300.00           102
7         8  2025-03-30         905      1800.00           105
8         9  2025-01-20         903      1200.00           103
9        10  2025-02-05         906       500.00           106
10       11  2025-03-01         907      2200.00           107
11       12  2025-02-15         908      1300.00           101
12       13  2025-03-15         909          NaN           102
13       14  2025-01-25         910       900.00           108
14       15  2025-02-20         905       7
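Without `parse_dates`, `read_csv` leaves `sale_date` as plain strings. The Q1 filters in the following cells therefore depend on ISO-8601 strings (`YYYY-MM-DD`) sorting in chronological order. Here is a minimal sketch of a more explicit approach, assuming the same column layout. It parses the column with `parse_dates` and builds Q1 as a half-open window, `>= 2025-01-01` and `< 2025-04-01`, which still works if timestamps ever carry a time of day. The inline CSV stands in for `fct_sales.csv` and reuses a few rows from the output above. The names `sample_csv`, `sales` and `in_q1` are illustrative.

```python
import io

import pandas as pd

# A few rows copied from the sales output above, standing in for fct_sales.csv
sample_csv = """sale_id,sale_date,product_id,sale_amount,celebrity_id
1,2025-01-10,901,,101
2,2025-01-15,901,1500.00,101
5,2025-03-20,904,,104
8,2025-03-30,905,1800.00,105
"""

# parse_dates converts sale_date to a datetime64 column instead of strings
sales = pd.read_csv(io.StringIO(sample_csv), parse_dates=['sale_date'])

# Half-open Q1 2025 window: includes all of March 31 even if times are present
q1_start = pd.Timestamp('2025-01-01')
q1_end = pd.Timestamp('2025-04-01')
in_q1 = (sales['sale_date'] >= q1_start) & (sales['sale_date'] < q1_end)

print(sales.dtypes)
print(sales.loc[in_q1, ['sale_id', 'sale_date', 'sale_amount']])
```

Empty `sale_amount` fields are read as `NaN`, so the missing-value checks in the questions below apply unchanged.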

### Question 1 of 3

For Q1 2025 (January 1st through March 31st, 2025), can you identify all records of celebrity collaborations from the sales data where the `sale_amount` is missing? This will help us flag incomplete records that could impact the analysis of Nike's product performance.

In [3]:
# Filter for Q1 2025 and missing sale_amount
# (sale_date is read as YYYY-MM-DD strings, so string comparison follows date order)
q1_2025_missing_sales = fct_sales[
    (fct_sales['sale_date'] >= '2025-01-01') &
    (fct_sales['sale_date'] <= '2025-03-31') &
    (fct_sales['celebrity_id'].notnull()) &
    (fct_sales['sale_amount'].isnull())
]

print("Q1 2025 Celebrity Collaborations with Missing Sale Amount:")
print(q1_2025_missing_sales)


Q1 2025 Celebrity Collaborations with Missing Sale Amount:
    sale_id   sale_date  product_id  sale_amount  celebrity_id
0         1  2025-01-10         901          NaN           101
4         5  2025-03-20         904          NaN           104
12       13  2025-03-15         909          NaN           102
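Filtering returns only the incomplete rows. If the team also wants to know how much of each celebrity's data is affected, one option is to keep every row and add a boolean flag. The flag can then be aggregated per celebrity. The following is a sketch under that assumption. The column names `missing_amount`, `missing_count` and `missing_share` are hypothetical, and the rows are a subset copied from the sales output (celebrities 101, 102 and 104).

```python
import numpy as np
import pandas as pd

# Subset of rows copied from the sales output above
sales = pd.DataFrame({
    'sale_id':      [1, 2, 6, 12, 3, 7, 13, 5],
    'celebrity_id': [101, 101, 101, 101, 102, 102, 102, 104],
    'sale_amount':  [np.nan, 1500.00, 1000.00, 1300.00, 2000.50, 300.00, np.nan, np.nan],
})

# Flag incomplete records instead of filtering the complete ones out
sales['missing_amount'] = sales['sale_amount'].isna()

# Per celebrity: number and share of records with a missing sale_amount
completeness = sales.groupby('celebrity_id')['missing_amount'].agg(['sum', 'mean'])
completeness.columns = ['missing_count', 'missing_share']
print(completeness)
```

On this subset, each celebrity has one missing record. That record is a quarter of celebrity 101's rows and all of celebrity 104's rows, so a share is more informative than a raw count when deciding which collaborations to treat with caution.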


### Question 2 of 3

For Q1 2025 (January 1st through March 31st, 2025), can you list the unique combinations of `celebrity_id` and `product_id` from the sales table? This will ensure that each collaboration is accurately accounted for in the analysis of Nike's marketing performance.

In [4]:
# List unique combinations of celebrity_id and product_id for Q1 2025
q1_2025_collaborations = fct_sales[
    (fct_sales['sale_date'] >= '2025-01-01') &
    (fct_sales['sale_date'] <= '2025-03-31') &
    (fct_sales['celebrity_id'].notnull())
][['celebrity_id', 'product_id']].drop_duplicates()

print("Unique celebrity_id and product_id combinations for Q1 2025:")
print(q1_2025_collaborations)


Unique celebrity_id and product_id combinations for Q1 2025:
    celebrity_id  product_id
0            101         901
2            102         902
3            103         903
4            104         904
7            105         905
9            106         906
10           107         907
11           101         908
12           102         909
13           108         910
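`drop_duplicates` keeps the first row of each pair by default. As a result, the index labels in the result point to each pair's first row in file order (for example, label 0 for 101/901). If the team also needs to know how many sales rows sit behind each collaboration, `groupby(...).size()` returns the same unique pairs together with a count. The following is a sketch on a subset of the rows shown earlier. The `n_sales` column name is illustrative.

```python
import pandas as pd

# celebrity_id/product_id pairs copied from the sales output above (subset)
sales = pd.DataFrame({
    'sale_id':      [1, 2, 3, 6, 7, 12, 13],
    'celebrity_id': [101, 101, 102, 101, 102, 101, 102],
    'product_id':   [901, 901, 902, 901, 902, 908, 909],
})

# One row per unique pair, plus the number of sales rows behind it
pair_counts = (
    sales.groupby(['celebrity_id', 'product_id'])
    .size()
    .reset_index(name='n_sales')
)
print(pair_counts)

# Same pairs via drop_duplicates; the index keeps each pair's first row
unique_pairs = sales[['celebrity_id', 'product_id']].drop_duplicates()
print(unique_pairs)
```

Both approaches yield four pairs here. The grouped version also shows that 101/901 has three sales rows, while 101/908 and 102/909 each have only one.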


### Question 3 of 3

For Q1 2025 (January 1st through March 31st, 2025), can you rank the unique celebrity collaborations based on their total sales amounts and list the top 3 collaborations in descending order? This will help recommend the most successful partnerships for Nike's future product drop strategies.

In [5]:
# Filter for relevant sales data for Q1 2025
q1_2025_sales = fct_sales[
    (fct_sales['sale_date'] >= '2025-01-01') &
    (fct_sales['sale_date'] <= '2025-03-31') &
    (fct_sales['celebrity_id'].notnull()) &
    (fct_sales['sale_amount'].notnull())
]

# Group by celebrity_id and product_id, summing the sale_amount
collab_sales = (
    q1_2025_sales
    .groupby(['celebrity_id', 'product_id'], as_index=False)['sale_amount']
    .sum()
    .rename(columns={'sale_amount': 'total_sales'})
)

# Rank by total_sales (descending); method='dense' gives tied totals the same rank
# and does not skip the next rank
collab_sales['rank'] = collab_sales['total_sales'].rank(method='dense', ascending=False)

# Keep ranks 1 to 3; ties can return more than 3 rows
top3_with_ties = (
    collab_sales[collab_sales['rank'] <= 3]
    .sort_values(['rank', 'total_sales'], ascending=[True, False])
)

print("Top 3 celebrity collaborations by total sales amount for Q1 2025 (with possible ties):")
print(top3_with_ties)


Top 3 celebrity collaborations by total sales amount for Q1 2025 (with possible ties):
   celebrity_id  product_id  total_sales  rank
2           102         902      3800.50   1.0
3           103         903      3700.75   2.0
0           101         901      2500.00   3.0
4           105         905      2500.00   3.0
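The question asks for the top 3, but the dense rank returns four rows because 101/901 and 105/905 tie at 2500.00. How to treat that tie is a business decision rather than a coding one. The following sketch uses the four totals from the output above to compare two explicit policies. The first keeps every row tied with the third-largest total via `DataFrame.nlargest(..., keep='all')`. The second returns exactly three rows using a stated tiebreaker. The tiebreaker used here (lower `celebrity_id` wins) is purely hypothetical.

```python
import pandas as pd

# The four top-ranked Q1 2025 totals from the output above
collab_sales = pd.DataFrame({
    'celebrity_id': [102, 103, 101, 105],
    'product_id':   [902, 903, 901, 905],
    'total_sales':  [3800.50, 3700.75, 2500.00, 2500.00],
})

# Policy 1: keep every row tied with the 3rd-largest total (4 rows here)
top3_all_ties = collab_sales.nlargest(3, 'total_sales', keep='all')

# Policy 2: exactly 3 rows, ties broken by a hypothetical rule
# (lower celebrity_id wins)
top3_strict = (
    collab_sales
    .sort_values(['total_sales', 'celebrity_id'], ascending=[False, True])
    .head(3)
)

print(top3_all_ties)
print(top3_strict)
```

For these totals, `rank(method='min')` would select the same four rows as the dense rank. The two methods differ only when a tie sits above the cutoff: `min` skips the following rank, while `dense` does not. For example, totals of 10, 10, 9 and 8 get dense ranks 1, 1, 2 and 3 but min ranks 1, 1, 3 and 4.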
