The main task is to generate actionable insights from this pet food customer orders dataset, focusing on how customers order and reorder wet food, the characteristics of their pets, and their orders.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed,
# as defined by the kaggle/python Docker image

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O

import os
# List the input files available to this notebook
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
import seaborn as sns  # statistical plots
import matplotlib.pyplot as plt  # plotting
%matplotlib inline

In [None]:
# First, open the CSV file as a pandas DataFrame
df = pd.read_csv('../input/pet-food-customer-orders-online/pet_food_customer_orders.csv')
# display the DataFrame
df

The dataset contains NaNs in only certain columns. There may also be repeated entries with uneven data across columns.

In [None]:
df.isna().sum()
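To check the note about repeated entries, DataFrame.duplicated flags rows that repeat an earlier row, either across all columns or on a chosen subset of columns. The sketch below uses a tiny made-up frame that reuses a few of the dataset's column names, not the real data; treating customer_id plus pet_id as the key for "the same entry" is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up values; only the column names come from the dataset
toy = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "pet_id": [10, 10, 20, 21, 30],
    "wet_food_order_number": [1.0, np.nan, np.nan, 2.0, np.nan],
})

# NaNs per column, as in df.isna().sum()
nan_counts = toy.isna().sum()

# Rows identical to an earlier row in every column
full_dupes = toy.duplicated().sum()

# Rows that repeat an earlier (customer_id, pet_id) pair, whatever the other columns hold
key_dupes = toy.duplicated(subset=["customer_id", "pet_id"]).sum()

print(nan_counts.to_dict())
print(full_dupes, key_dupes)
```

A row that repeats the key but differs elsewhere (here the second row) matches the "repeated entries with uneven data" pattern; on the notebook's data the same call would be df.duplicated(subset=['customer_id', 'pet_id']).sum().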

In [None]:
# .copy() makes wetfood an independent frame, so later column edits
# do not raise SettingWithCopyWarning
wetfood = df[['customer_id', 'wet_food_order_number', 'pet_order_number',
              'orders_since_first_wet_trays_order', 'pet_has_active_subscription',
              'pet_food_tier', 'neutered', 'gender', 'pet_breed_size',
              'ate_wet_food_pre_tails', 'order_payment_date', 'wet_trays',
              'wet_tray_size', 'total_wet_food_updates']].copy()
wetfood

In [None]:
sns.countplot(x=wetfood['total_wet_food_updates'])

In [None]:
wetfood['gender'].value_counts()

Most pets are male, but only by a small margin over females:

In [None]:
sns.countplot(x=wetfood['gender'])

In [None]:
wetfood['neutered'].value_counts()

Most pets are neutered:

In [None]:
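# value_counts() sorts by frequency, so these labels assume the most
# common value (neutered) comes first
mylabels = 'neutered', 'not neutered'
mycolors = 'green', 'gray'
plt.pie(wetfood['neutered'].value_counts(),
        labels=mylabels, autopct='%1.1f%%',
        colors=mycolors)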

Most pets that customers buy wet food for are small breeds.
The bigger the dog, the less likely customers are to buy wet food.

In [None]:
# breed size is categorical, so a bar chart fits better than a line
wetfood['pet_breed_size'].value_counts().plot(kind='bar', color='darkcyan')
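Raw counts per breed size mix two effects: how many pets of each size are in the data, and how often their owners buy wet food. Comparing the share of rows with a wet food order per breed size separates the two. A minimal sketch on made-up data; the breed-size labels are hypothetical placeholders, and treating a non-null wet_food_order_number as "bought wet food" is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up rows; the breed-size labels are hypothetical
toy = pd.DataFrame({
    "pet_breed_size": ["small", "small", "small", "small",
                       "medium", "medium", "large", "large"],
    "wet_food_order_number": [1.0, 2.0, np.nan, 1.0, 1.0, np.nan, np.nan, np.nan],
})

# Assumption: a non-null wet_food_order_number means the row includes wet food
toy["bought_wet"] = toy["wet_food_order_number"].notna()

# Share of rows with wet food per breed size, listed in small -> large order
rate = (toy.groupby("pet_breed_size")["bought_wet"].mean()
           .reindex(["small", "medium", "large"]))
print(rate)
```

If the rate, and not only the count, falls as breed size grows, that supports the claim that owners of bigger dogs are less likely to buy wet food.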

In [None]:
wetfood['wet_trays'].value_counts()

In [None]:
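# Sort by tray count so each bar's height lines up with its own x position
# (value_counts() and unique() return values in different orders)
counts = wetfood['wet_trays'].value_counts().sort_index()

# Create vertical bars
plt.bar(counts.index, counts.values, color='green')

plt.xlim(0, 100)

# Show graphic
plt.show()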

Most customers are subscribed and in the super premium pet food tier.

In [None]:
# labels assume the most common value (subscribed) comes first in value_counts()
mylabels2 = 'subscribed', 'not subscribed'
plt.pie(wetfood['pet_has_active_subscription'].value_counts(), labels=mylabels2, autopct='%1.1f%%')

In [None]:
sns.countplot(x=wetfood['pet_food_tier'])

In [None]:
wetfood['ate_wet_food_pre_tails'].value_counts()

In [None]:
sns.countplot(x=wetfood['wet_tray_size'])

In [None]:
# Keep only the date part of the order_payment_date timestamp, as a 'YYYY-MM-DD' string
wetfood['order_date'] = pd.to_datetime(wetfood['order_payment_date']).dt.strftime('%Y-%m-%d')
wetfood = wetfood.drop(columns='order_payment_date')

Frequency of orders per date:

In [None]:
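# sort_index() puts the dates in chronological order; value_counts() alone orders by frequency
wetfood['order_date'].value_counts().sort_index().plot(color='tomato')
plt.xticks(rotation=90)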

In [None]:
wetfood.describe()

In [None]:
wetfood.corr(numeric_only=True)  # correlation between the numeric columns

In [None]:
sns.heatmap(wetfood.corr(numeric_only=True))

Correlation between wet_food_order_number and orders_since_first_wet_trays_order:

In [None]:
#correlating 2 columns
wetfood['wet_food_order_number'].corr(wetfood['orders_since_first_wet_trays_order'])

The coefficient above indicates whether the wet food order number and the number of orders since the first wet tray order tend to move together.
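Series.corr defaults to the Pearson coefficient, r = cov(x, y) / (std(x) * std(y)), and drops pairs where either value is NaN. The sketch below recomputes it by hand on made-up numbers to show what the single value measures.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the two columns
x = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan])
y = pd.Series([0.0, 1.0, 1.0, 3.0, 5.0])

# pandas: pairwise-complete Pearson correlation (the pair with NaN is dropped)
r_pandas = x.corr(y)

# By hand: keep rows where both values exist, then cov / (std_x * std_y)
mask = x.notna() & y.notna()
xs, ys = x[mask].to_numpy(), y[mask].to_numpy()
r_manual = np.cov(xs, ys)[0, 1] / (xs.std(ddof=1) * ys.std(ddof=1))

print(r_pandas, r_manual)
```

A value near 1 means the two columns rise together; it does not by itself explain why.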

In [None]:
# Rows without a wet food order have NaN here; treat them as 0
wetfood['wet_food_order_number'] = wetfood['wet_food_order_number'].fillna(value=0)

In [None]:
wetbuyers= wetfood[wetfood.wet_food_order_number >=1]
wetbuyers

In [None]:
wetbuyers.describe(include='all')

Of the 49042 order entries, 12788 have a wet food order number of at least 1. Most of these belong to subscribed customers who own a neutered, male, small-breed pet. Jan 26, 2020 had the highest total of wet food orders.
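The 12788 figure counts rows of wetbuyers, and one customer can contribute several rows. Counting distinct customer_id values gives the number of customers instead. A sketch on made-up data:

```python
import pandas as pd

# Made-up wet food rows: customer 1 appears three times
toy_wetbuyers = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 3],
    "wet_food_order_number": [1, 2, 3, 1, 1],
})

n_rows = len(toy_wetbuyers)                           # order entries
n_customers = toy_wetbuyers["customer_id"].nunique()  # distinct customers
print(n_rows, n_customers)
```

On the notebook's data, wetbuyers['customer_id'].nunique() would give the customer-level count.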

In [None]:
wetbuyers.groupby(wetbuyers.order_date)['wet_food_order_number'].sum().plot(figsize=(12, 6), color='orange')
plt.ylabel('Wet food orders')
plt.title("Total Wet Food Orders Per Date", fontsize=20)
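Summing wet_food_order_number per date counts an order numbered 3 three times. If that column is a running counter of each pet's wet food orders (an assumption, not confirmed here), counting rows per date gives the number of wet food orders instead, and the peak date can differ. A made-up example:

```python
import pandas as pd

# Made-up rows: two first orders on one day, one fifth order on the next
toy = pd.DataFrame({
    "order_date": ["2020-01-25", "2020-01-25", "2020-01-26"],
    "wet_food_order_number": [1, 1, 5],
})

summed = toy.groupby("order_date")["wet_food_order_number"].sum()
counted = toy.groupby("order_date").size()  # rows (orders) per date
print(summed.to_dict())
print(counted.to_dict())
```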

In [None]:
#those that reorder
reorderers= wetbuyers[wetbuyers.orders_since_first_wet_trays_order >=1]
reorderers.describe(include="all")

Customers who reordered wet food at least once after their first purchase have the same demographics as those who ordered wet food at least once.
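A single reorder rate can summarize this step: the share of wet food customers who also appear among the reorderers. A sketch on made-up IDs that counts distinct customers rather than rows; the toy frames only mirror the shape of wetbuyers and reorderers.

```python
import pandas as pd

# Made-up subset mirroring wetbuyers
toy_wetbuyers = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4],
    "orders_since_first_wet_trays_order": [0, 1, 0, 2, 0],
})
# Same filter as reorderers
toy_reorderers = toy_wetbuyers[toy_wetbuyers["orders_since_first_wet_trays_order"] >= 1]

# Distinct reordering customers / distinct wet food customers
reorder_rate = (toy_reorderers["customer_id"].nunique()
                / toy_wetbuyers["customer_id"].nunique())
print(reorder_rate)
```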

In [None]:
reorderers.groupby(reorderers.order_date)['orders_since_first_wet_trays_order'].sum().plot(figsize=(12, 6), color='cadetblue')
plt.ylabel('orders since 1st wet order')
plt.title("Total orders since 1st wet order per date", fontsize=20)

Are there certain traits of the pets, customers, or their orders that affect how likely customers are to purchase wet food and to reorder it?
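One way to approach this question is a row-normalized cross-tabulation: for each value of a trait, the share of rows with and without wet food. The sketch uses made-up data with pet_has_active_subscription as the trait; bought_wet is a hypothetical helper column built from wet_food_order_number.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "pet_has_active_subscription": [True, True, True, False, False],
    "wet_food_order_number": [1.0, 2.0, np.nan, np.nan, 1.0],
})
# Hypothetical helper: True when the row has a wet food order number
toy["bought_wet"] = toy["wet_food_order_number"].notna()

# Each row sums to 1: share of wet / non-wet rows per subscription status
table = pd.crosstab(toy["pet_has_active_subscription"], toy["bought_wet"],
                    normalize="index")
print(table)
```

Swapping in neutered, gender, or pet_breed_size repeats the comparison for other traits.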

In [None]:
# grouped by customer: number of non-null entries in each column per customer
df.groupby(['customer_id']).count()

In [None]:
df.groupby(['customer_id']).count().mean()

In [None]:
#change customer ids from numeric to string
df["customer_id"]=df["customer_id"].astype(str)

In [None]:
grouped_clients = df.groupby("customer_id")['wet_food_order_number'].sum().to_frame()

In [None]:
# keep only clients with at least one wet food order
groupWetBuyers = grouped_clients[grouped_clients.wet_food_order_number >= 1]
groupWetBuyers.reset_index()

In [None]:
groupWetBuyers.describe()

Some clients appear in several rows of the original dataframe. After grouping the rows by customer id, there are 11168 clients. On average, each client has 4 to 5 rows with a pet id, about 1 row with a wet food order number, about 1 with orders since their first wet order, 1 to 2 listing favorite flavors, and 2 listing health issues. After grouping by customer id, 3787 clients in total ordered wet food.

Grouping by client id identifies the clients who buy wet food most often. Among the 3787 clients who bought wet food, the highest summed wet_food_order_number for a single client is 252.
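To list the clients who buy wet food most, the per-customer sums can be ranked with Series.nlargest. A sketch on made-up totals shaped like grouped_clients:

```python
import pandas as pd

# Made-up per-customer totals, indexed by customer_id like grouped_clients
toy_grouped = pd.DataFrame(
    {"wet_food_order_number": [0, 40, 3, 17, 1]},
    index=pd.Index(["a", "b", "c", "d", "e"], name="customer_id"),
)

# Same filter as groupWetBuyers, then the three largest totals
wet_only = toy_grouped[toy_grouped["wet_food_order_number"] >= 1]
top3 = wet_only["wet_food_order_number"].nlargest(3)
print(top3)
```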

In [None]:
df.dtypes

In [None]:
#change data type for pet ID to string
df["pet_id"]=df["pet_id"].astype(str)

In [None]:
# select pet-level attributes; .copy() lets later cells modify pets
# without SettingWithCopyWarning
pets = df[['customer_id', 'pet_id', 'gender', 'neutered', 'pet_breed_size',
           'wet_food_order_number', 'pet_life_stage_at_order', 'pet_allergen_list',
           'pet_health_issue_list',
           'pet_has_active_subscription', 'orders_since_first_wet_trays_order',
           'ate_wet_food_pre_tails', 'pet_food_tier', 'wet_food_textures_in_order'
          ]].copy()
pets

In [None]:
pets.describe(include='all')

In [None]:
pets2 = pets.groupby(['customer_id'])['pet_id'].apply(', '.join).reset_index()
pets2

Of the 49042 entries in the original dataframe, there are 11168 unique owners listed.
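Because pets2 joins one pet_id per row, a pet that appears in several orders is listed several times. Joining only the distinct IDs, and counting them with nunique, gives each customer's number of pets. A sketch on made-up IDs, already strings as after the earlier astype(str):

```python
import pandas as pd

toy_pets = pd.DataFrame({
    "customer_id": ["1", "1", "1", "2"],
    "pet_id": ["10", "10", "11", "20"],
})

per_customer = toy_pets.groupby("customer_id")["pet_id"].agg(
    pet_ids=lambda s: ", ".join(s.unique()),  # distinct IDs in first-seen order
    n_pets="nunique",                         # number of distinct pets
).reset_index()
print(per_customer)
```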

In [None]:
# one comma-separated string of each customer's pet breed sizes
grouped_df1 = pets.groupby("customer_id")

grouped_lists1 = grouped_df1["pet_breed_size"].agg(lambda column: ", ".join(column))
grouped_lists1 = grouped_lists1.reset_index(name="pet_breed_size")

In [None]:
grouped_df2 = pets.groupby("customer_id")

grouped_lists2 = grouped_df2["gender"].agg(lambda column: ", ".join(column))
grouped_lists2 = grouped_lists2.reset_index(name="gender")

In [None]:
pets["neutered"]=pets["neutered"].astype(str)

In [None]:
grouped_df3 = pets.groupby("customer_id")

grouped_lists3 = grouped_df3["neutered"].agg(lambda column: ", ".join(column))
grouped_lists3 = grouped_lists3.reset_index(name="neutered")


In [None]:
grouped_df4 = pets.groupby("customer_id")

grouped_lists4 = grouped_df4["pet_life_stage_at_order"].agg(lambda column: ", ".join(column))
grouped_lists4 = grouped_lists4.reset_index(name="pet_life_stage_at_order")


In [None]:
# replace NaNs with the string 'None' so ", ".join works on every column
pets = pets.astype(object).replace(np.nan, 'None')

In [None]:
grouped_df5 = pets.groupby("customer_id")

grouped_lists5 = grouped_df5["pet_health_issue_list"].agg(lambda column: ", ".join(column))
grouped_lists5 = grouped_lists5.reset_index(name="pet_health_issue_list")


In [None]:
grouped_df6 = pets.groupby("customer_id")

grouped_lists6 = grouped_df6["pet_allergen_list"].agg(lambda column: ", ".join(column))
grouped_lists6 = grouped_lists6.reset_index(name="pet_allergen_list")

In [None]:
grouped_df7 = pets.groupby("customer_id")

grouped_lists7 = grouped_df7["pet_food_tier"].agg(lambda column: ", ".join(column))
grouped_lists7 = grouped_lists7.reset_index(name="pet_food_tier")


In [None]:
grouped_df8 = pets.groupby("customer_id")

grouped_lists8 = grouped_df8["wet_food_textures_in_order"].agg(lambda column: ", ".join(column))
grouped_lists8 = grouped_lists8.reset_index(name="wet_food_textures_in_order")

In [None]:
pets['pet_has_active_subscription']=pets['pet_has_active_subscription'].astype(str)

grouped_df9 = pets.groupby("customer_id")

grouped_lists9 = grouped_df9["pet_has_active_subscription"].agg(lambda column: ", ".join(column))
grouped_lists9 = grouped_lists9.reset_index(name="pet_has_active_subscription")

In [None]:
pets['ate_wet_food_pre_tails']=pets['ate_wet_food_pre_tails'].astype(str)

# new names, so the subscription list in grouped_lists9 is not overwritten
grouped_df10 = pets.groupby("customer_id")

grouped_lists10 = grouped_df10["ate_wet_food_pre_tails"].agg(lambda column: ", ".join(column))
grouped_lists10 = grouped_lists10.reset_index(name="ate_wet_food_pre_tails")

In [None]:
# turn the 'None' placeholders back into 0 and restore a numeric dtype for summing
pets['wet_food_order_number']= pets['wet_food_order_number'].replace("None", 0).astype(float)
pets['orders_since_first_wet_trays_order']= pets['orders_since_first_wet_trays_order'].replace("None", 0).astype(float)

In [None]:
numerics1 = pets.groupby('customer_id')['wet_food_order_number'].sum().reset_index()
numerics2 = pets.groupby('customer_id')['orders_since_first_wet_trays_order'].sum().reset_index()

In [None]:
a=pd.merge(pets2, grouped_lists1, on='customer_id')
a=pd.merge(a, grouped_lists2, on='customer_id')
a=pd.merge(a, grouped_lists3, on='customer_id')
a=pd.merge(a, grouped_lists4, on='customer_id')
a=pd.merge(a, grouped_lists5, on='customer_id')
a=pd.merge(a, grouped_lists6, on='customer_id')
a=pd.merge(a, grouped_lists7, on='customer_id')
a=pd.merge(a, grouped_lists8, on='customer_id')
a=pd.merge(a, grouped_lists9, on='customer_id')
a=pd.merge(a, grouped_lists10, on='customer_id')

Dataframe combining all entries per customer_id:

In [None]:
a=pd.merge(a, numerics1, on='customer_id')
a=pd.merge(a, numerics2, on='customer_id')
a
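The separate join-and-merge cells can also be collapsed into one groupby().agg call that applies the same comma-join to every text column and sum to the numeric ones. A sketch on a made-up frame with a subset of the columns; it assumes NaNs were already replaced and the text columns cast to str, as done above, and join_values is a hypothetical helper name.

```python
import pandas as pd

toy = pd.DataFrame({
    "customer_id": ["1", "1", "2"],
    "gender": ["male", "male", "female"],
    "neutered": ["True", "True", "False"],
    "wet_food_order_number": [1.0, 2.0, 0.0],
})

def join_values(column):
    """Comma-join a group's string values, like the lambdas in the cells above."""
    return ", ".join(column)

text_cols = ["gender", "neutered"]
agg_spec = {col: join_values for col in text_cols}
agg_spec["wet_food_order_number"] = "sum"  # numeric columns are summed

combined = toy.groupby("customer_id").agg(agg_spec).reset_index()
print(combined)
```

A single aggregation spec keeps every per-customer column in one place, so no intermediate variable can be overwritten by accident.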

In [None]:
a.describe(include='all')

Of the 11168 customer entries, most have ordered wet food more than once.
Their pets are mostly small-breed, mature, neutered males.
Most of these pets don't seem to have allergies or known health issues.
These clients are mostly in the superpremium tier and have placed 3 to 4 orders.
