Instacart Basket Analysis.

Instacart, an online grocery store that operates through an app.
Instacart already has very good sales, but they want to uncover more information about their sales patterns.
My task was to perform an initial data and exploratory analysis of some of their data in order to derive insights and suggest strategies for better segmentation based on the provided criteria.

Developed skills:

Exploratory Analysis using open-source data sets.
Provided business insights and strategy suggestions for marketing and segmentation.
Data set analysis to detect patterns
Data wrangling
Data merging
Deriving variables
Grouping data
Aggregating data
Python, creating scripts to perform data wrangling and merge data frames.
Python, Data visualization.
Reporting in Excel

A list of Jupyter notebooks and functions for Instacard tasks.

Or Refer to the short video Jupyter Notebook with the code.

Some code and visualization options are shown here, if You want to see the full report look this file

Table of Contents

Data importing
Data checks and merging dataframes
PII privacy checks
Assign regions

#creating region flags with loc
df.loc[df['state'].isin(northeast), 'region'] = 'Northeast'
df.loc[df['state'].isin(midwest), 'region'] = 'Midwest'
df.loc[df['state'].isin(south), 'region'] = 'South'
df.loc[df['state'].isin(west), 'region'] = 'West'

Flag and exclude customers with less than 5 orders

# creating activity flag using loc
df.loc[df['max_order']< 5, 'activity_flag'] = 'Low Activity'
df.loc[df['max_order'] >= 5, 'activity_flag'] = 'High Activity'.

Creating customer profile
6a. Age Profile

# creating age group profile
df_high_activity.loc[df_high_activity['age']<=30, 'age_profile']= 'Young Adult'
df_high_activity.loc[(df_high_activity['age']>30) & (df_high_activity['age']<65), 'age_profile']= "Mid Adult"
df_high_activity.loc[df_high_activity['age']>=65, 'age_profile']= 'Senior'
df_high_activity['age_profile'].value_counts()

6b. Parental Age Profile

# create parental age profile for different groups
df_high_activity.loc[(df_high_activity['age']<30) & (df_high_activity['no_of_dependents']>=1), 'parental_age_profile']= 'Young Parent'
df_high_activity.loc[(df_high_activity['age']>=30) & (df_high_activity['age']<40) & (df_high_activity['no_of_dependents']>=1), 'parental_age_profile']= 'Mid Parent'
df_high_activity.loc[(df_high_activity['age']>=40) & (df_high_activity['no_of_dependents']>=1), 'parental_age_profile']= 'Older Parent'
df_high_activity['parental_age_profile'].value_counts()

6c. Single Parent Profile (Dependants Profile)

crosstabfam= pd.crosstab(df_high_activity['marital_status'], df_high_activity['no_of_dependents'], dropna=False)
crosstabfam

# create parental-relationship status flag for different groups
df_high_activity.loc[(df_high_activity['marital_status'] == 'single') & (df_high_activity['no_of_dependents']>=1), 'parental_relationship_status']= 'Single Parent'
df_high_activity.loc[(df_high_activity['marital_status'] == 'divorved/widowed') & (df_high_activity['no_of_dependents']>=1), 'parental_relationship_status']= 'Single Parent'
df_high_activity.loc[(df_high_activity['marital_status'] == 'living with parents and siblings') & (df_high_activity['no_of_dependents']>=1), 'parental_relationship_status']= 'Single Parent'
df_high_activity.loc[(df_high_activity['marital_status'] =='married') & (df_high_activity['no_of_dependents']>=1), 'parental_relationship_status']= 'Nuclear Family'
df_high_activity['parental_relationship_status'].value_counts()

6d. Relationship-Age Profile

# create a relationship-age flag
crosstab_ra= pd.crosstab(df_high_activity['age_profile'], df_high_activity['marital_status'], dropna=False)
crosstab_ra

df_high_activity.loc[(df_high_activity['age_profile']=='Mid Adult') & (df_high_activity['marital_status'] == 'divorced/widowed'), 'relationship_age_status']= 'Single Mid Adult'
df_high_activity.loc[(df_high_activity['age_profile']=='Mid Adult') & (df_high_activity['marital_status'] == 'single'), 'relationship_age_status']= 'Single Mid Adult'
df_high_activity.loc[(df_high_activity['age_profile']=='Mid Adult') & (df_high_activity['marital_status'] == 'living with parents and siblings'), 'relationship_age_status']= 'Single Mid Adult'
df_high_activity.loc[(df_high_activity['age_profile']=='Mid Adult') & (df_high_activity['marital_status']=='married'), 'relationship_age_status']= 'Married Mid Adult'
df_high_activity.loc[(df_high_activity['age_profile']=='Senior') & (df_high_activity['marital_status'] == 'divorced/widowed'), 'relationship_age_status']= 'Single Senior'
df_high_activity.loc[(df_high_activity['age_profile']=='Senior') & (df_high_activity['marital_status'] == 'single'), 'relationship_age_status']= 'Single Senior'
df_high_activity.loc[(df_high_activity['age_profile']=='Senior') & (df_high_activity['marital_status'] == 'living with parents and siblings'), 'relationship_age_status']= 'Single Senior'
df_high_activity.loc[(df_high_activity['age_profile']=='Senior') & (df_high_activity['marital_status']=='married'), 'relationship_age_status']= 'Married Senior'
df_high_activity.loc[(df_high_activity['age_profile']=='Young Adult') & (df_high_activity['marital_status'] == 'divorced/widowed'), 'relationship_age_status']= 'Single Young Adult'
df_high_activity.loc[(df_high_activity['age_profile']=='Young Adult') & (df_high_activity['marital_status'] == 'single'), 'relationship_age_status']= 'Single Young Adult'
df_high_activity.loc[(df_high_activity['age_profile']=='Young Adult') & (df_high_activity['marital_status'] == 'living with parents and siblings'), 'relationship_age_status']= 'Single Young Adult'
df_high_activity.loc[(df_high_activity['age_profile']=='Young Adult') & (df_high_activity['marital_status']=='married'), 'relationship_age_status']= 'Married Young Adult'

6e. Income Profile.

# creating Income profile
df_high_activity.loc[df_high_activity['income']<=50000, 'income_profile']= 'Low Class'
df_high_activity.loc[(df_high_activity['income']>51000) & (df_high_activity['income']<=150000), 'income_profile']= "Mid Class"
df_high_activity.loc[df_high_activity['income']>=151000, 'income_profile']= 'High Class'
df_high_activity['income_profile'].value_counts()

6f. Gender Profile
6g. Deparments Profiles - Alcohol, Baby items, International items 7. Creating an appropriate visualization to show the distribution of profiles
7a. Age Profile
7b. Parental Age Profile. 7c. Single Parent Profile (Dependants Profile). 7d. Relationship-Age Profile
7e. Income Profile
7f. Gender Profile
7g. Deparments Profiles - Alcohol, Baby items, International items

df_high_activity['alcohol_flag']= [1 if x== 'alcohol' else 0 for x in df_high_activity['department']]
df_high_activity['alcohol_flag']= df_high_activity.groupby('user_id')['alcohol_flag'].transform(np.max)
df_high_activity['alcohol_flag'].value_counts(dropna=False)

Aggregating the max, mean, and min variables on a customer-profile level for usage frequency and expenditure
8a. Age Profile

# aggregate for age profile expenditure
df_unique_customers.groupby('age_profile').agg({'prices': ['mean', 'min', 'max']})

8b. Parental Age Profile
8c. Single Parent Profile (Dependants Profile)
8d. Income Profile
8e. Relationship-Age Profile
8f. Gender Profile
8g. Deparments Profiles - Alcohol, Baby items, International items. 9. Comparison Visualizations customer profiles with regions and departments
9a. Region Distribution

# Total customers across regions
bar_total_customers=df_unique_customers['region'].value_counts(dropna=False).sort_values(ascending=True).plot.barh(color='teal', fontsize=15)
plt.title('Total customers in each region', fontsize=18)
plt.xlabel('Number of customers', fontsize=15)
plt.legend(fontsize=15)

Output
9b. Order Frequency Distribution
9c. Price Range Distribution

# Grouped bar chart of price range products by regions
bar_ppr_region=ppr_region.plot.bar(color=['teal', 'goldenrod', 'darkkhaki'], fontsize=12)
plt.title('Distribution price ranges in US regions', fontsize=20)
plt.legend(ncol=3, fontsize=12)

Output

9d. Distribution Departments by Regions

# create crosstab for Departments and Regions
orders_region=pd.crosstab(index=df_unique_customers['department'], columns=df_unique_customers['region'], normalize='index')
dept_region

# create stacked bar chart for Departments by Regions
dept_region_stacked=dept_region.plot(kind='bar',
                                    stacked=True,
                                    color=['darkseagreen', 'olivedrab', 'teal', 'powderblue', 'bisque', 'orange'],
                                    figsize=(12,10))
plt.title('Distribution of Departments by Region', fontsize = 20)
plt.legend(loc='upper center', ncol=4)

Output
.

Basic Visualisation
10a. Departments

# Distribution by Departments
department_bar=df_high_activity['department'].value_counts(dropna=False).sort_values(ascending=True).plot.barh(color=['olivedrab'], fontsize=12)
plt.title('Distribution by Deparments', fontsize=20)
plt.xlabel('The Amount of Products Purchased', fontsize=15)

Output
10b. Product Price Ranges

#create pie chart for product price range distribution
price_pie=df_high_activity['price_range_loc'].value_counts().plot.pie(shadow=True,
                                                            startangle=90,
                                                            autopct=('%1.f%%'),
                                                            colors=['steelblue', 'Sienna', 'paleturquoise'],
                                                            label='')

Output

10c. Customer Loyalty
10d. Customer Spending
10e. Order Frequency
11. Exporting Final Dataset to_pickle.

Other Received pictures during the analysis.

bar_chart_orders_day_of_week.
bar_chart_parental_age_distribution.
bar_chart_parental_age_region_stacked
bar_ppr_region
bar_relationship_age_distribution
bar_relationship_single_parent_distribution
bar_total_customers
gender_distribution_pie
histogram_income
hist_prices
income_distribution_hist
income_distribution_of_customres
income_distribution
line_dow_prices
line_fam_status_age
line_hour_of_day
pei_alcohol_distribution
pie_baby_distribution
pie_customer_spending
pie_international_distribution
pie_loyalty_to_brand
pie_order_frequency
pie_pet_distribution
price_range_region_stacked
splot_income
splot_prices
stacked_age_order_freq
stacked_department_region
stacked_family_order_freq
stacked_family_stat_price_range
stacked_parental_relationship_region
stacked_order_frequency_region

Recommendations:

After reviewing all the charts above, we should focus on showing ads to 3 main groups: young adults, young parents, and people from the upper economic class. As stated earlier, you adults and parents may be struggling to afford using Instacart, so we can create a discount program for students or people with dependents. In addition, we can show more ads that target people from the upper classes because they have the money to afford more expensive products and can increase the company's revenue. They already have the funds, we just need to create an easy environment to keep our customers coming back to us and not to our competitors.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
01 Project Management		01 Project Management
03 Scripts		03 Scripts
04 Analysis		04 Analysis
05 Sent to client		05 Sent to client
.gitattributes		.gitattributes
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

01 Project Management

01 Project Management

03 Scripts

03 Scripts

04 Analysis

04 Analysis

05 Sent to client

05 Sent to client

.gitattributes

.gitattributes

README.md

README.md

Repository files navigation

Instacart Basket Analysis.

Developed skills:

A list of Jupyter notebooks and functions for Instacard tasks.

Other Received pictures during the analysis.

Recommendations:

About

Languages

halinakryvanos/Instacart-Grocery-Basket-Analysis-CareerFoundry-Practice

Folders and files

Latest commit

History

Repository files navigation

Instacart Basket Analysis.

Developed skills:

A list of Jupyter notebooks and functions for Instacard tasks.

Other Received pictures during the analysis.

Recommendations:

About

Resources

Stars

Watchers

Forks

Languages