# Conduct the demographic analysis on the dataset and visualise the output
by Smahi

## Scope

- Answer the following questions for each demographic category:

1. Age Distribution:
    - Identify the age groups that are most prevalent.  
    - Understand if there are any outliers or specific age ranges with higher concentrations.
    
    
2. Gender Distribution:
    - Determine the gender distribution of customers.   
    - Understand if there are gender-related trends or preferences in online food ordering.
    
    
3. Marital Status Distribution:
    - Identify the proportion of customers who are single, married, or in other marital statuses.
    - Explore if there are any correlations between marital status and ordering behavior.
    
    
4. Occupation Distribution:
    - Understand the most common occupations among customers.
    - Identify any patterns in online food ordering behavior based on occupation.
    
    
5. Monthly Income Distribution:
    - Analyze the distribution of customers based on their monthly income levels.
    - Identify potential customer segments with different spending capacities.
    
    
6. Educational Qualifications Distribution:
    - Explore the educational background of customers.
    - Check if there are correlations between educational qualifications and online food ordering.
    
    
7. Family Size Distribution:
    - Understand the distribution of family sizes among your customers.
    - Explore if family size has any impact on the frequency or quantity of online food orders.
    
    
These insights can help tailor marketing strategies, improve customer targeting, and make informed business decisions. For example, if you observe that a particular age group or occupation is predominant, you might adjust your advertising channels or promotions accordingly. Additionally, understanding income levels and family sizes can guide pricing strategies or the development of family-sized meal options.

## Summary
- Age group `20-25` contributes the most orders.
- **Males** place more orders online than females.
- `69%` of the orders are placed by **single people**.
- Most orders are placed by `Students`, followed by `Employees`, `Self Employeed`, and then `Housewives`.
- `53.35%` of the orders are placed by students, a majority.
- Employees account for a further `30.41%` of orders.
- The company could build a marketing strategy focused on students, since they place more than half of the orders.
- People with **No income** place the most online orders. This is consistent with the occupation result: most orders come from students, and most students have no income.
- Approximately `6%` of people who order food online hold a **Ph.D.**
- `90%` of orders are placed by people with a **Graduate** or **Post Graduate** degree.
- **Family sizes of 2 and 3** combined account for `56.18%` of orders.
- **Family sizes of 4 and 5** combined account for about `30%` of orders.

## Imports

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt

In [2]:
# Load the dataset
df = pd.read_csv('C:/Users/SMAHI/Desktop/Online-food-delivery/Data/clean_data.csv') 

In [3]:
# Preview
df.head()

| | Age | Gender | Marital_status | Occupation | Monthly_income | Education | Family_size | latitude | longitude | Pin_code | Output | Feedback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | Female | Single | Student | No Income | Post Graduate | 4 | 12.9766 | 77.5993 | 560001 | Yes | Positive |
| 1 | 24 | Female | Single | Student | Below Rs.10000 | Graduate | 3 | 12.977 | 77.5773 | 560009 | Yes | Positive |
| 2 | 22 | Male | Single | Student | Below Rs.10000 | Post Graduate | 3 | 12.9551 | 77.6593 | 560017 | Yes | Negative |
| 3 | 22 | Female | Single | Student | No Income | Graduate | 6 | 12.9473 | 77.5616 | 560019 | Yes | Positive |
| 4 | 22 | Male | Single | Student | Below Rs.10000 | Post Graduate | 4 | 12.985 | 77.5533 | 560010 | Yes | Positive |


In [4]:
# Shape
df.shape

(388, 12)

## Demographic Analysis
### a. Age distribution

In [5]:
# Get the ages in order
sorted(df.Age.unique())

[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]

In [6]:
age_chart = alt.Chart(df).mark_bar().encode(
    alt.X('Age:Q', bin=True, axis=alt.Axis(format='d')),
    alt.Y('count():Q', title='Number of orders'),
    tooltip=['Age','count():Q']
).properties(
    title='Age Distribution',
    width=500
)

In [7]:
age_chart

#### Plot Insights
- Customers aged 22 to 25 place more orders than any other age group.
- The distribution is right-skewed.
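The skew and the concentration in one age band can also be checked numerically rather than read off the chart. The sketch below is a minimal example, assuming a hypothetical helper `age_profile` and illustrative band edges; the ages in `sample_ages` are made up for demonstration and are not the notebook's data.

```python
import pandas as pd

def age_profile(ages: pd.Series, bins=(17, 21, 25, 29, 33)):
    """Return (skewness, percentage share of rows per age band).

    The band edges are an assumption chosen for illustration.
    """
    skewness = ages.skew()  # a positive value suggests a longer right tail
    bands = pd.cut(ages, bins=list(bins))  # right-closed bands such as (21, 25]
    shares = bands.value_counts(normalize=True, sort=False).mul(100).round(2)
    return skewness, shares

# Illustrative ages only, not taken from clean_data.csv
sample_ages = pd.Series([20, 21, 22, 22, 23, 23, 23, 24, 25, 30, 33])
skewness, shares = age_profile(sample_ages)
print(round(skewness, 2))
print(shares)
```

In the notebook itself, the call would presumably be `age_profile(df["Age"])`.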

### b. Gender distribution

In [8]:
gender_chart = alt.Chart(df).mark_bar().encode(
    alt.X('Gender:N', axis = alt.Axis(labelAngle = 0)),
    alt.Y('count():Q', title='Number of orders'),
    color = 'Gender:N',
    tooltip=['Gender','count():Q']
).properties(
    title='Gender Distribution',
    width=400,  # Adjust the width as needed
    height=300
)

In [9]:
gender_chart

In [10]:
# Female percentage
(166/(166+222))*100

42.78350515463917

In [11]:
# Male percentage
100 - (166/(166+222))*100

57.21649484536083

In [12]:
# Difference
57.21649484536083 - 42.78350515463917

14.432989690721655

#### Plot Insights
- Males place more orders than females.
- Males account for `57.22%` of orders and females for `42.78%`, a gap of about `14.43` percentage points (`222 males` versus `166 females`).
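The gap between the two shares is measured in percentage points, which is different from how many more males there are relative to females. A short sketch using the counts reported above makes the distinction explicit; the variable names are illustrative.

```python
import pandas as pd

# Counts reported above; in the notebook they could come from
# df["Gender"].value_counts() instead of being typed in by hand.
counts = pd.Series({"Male": 222, "Female": 166})

shares = counts / counts.sum() * 100              # share of all orders, in percent
gap_points = shares["Male"] - shares["Female"]    # difference between the two shares
relative_excess = (counts["Male"] / counts["Female"] - 1) * 100  # males relative to females

print(shares.round(2).to_dict())
print(round(gap_points, 2))
print(round(relative_excess, 2))
```

The first figure compares shares of the whole sample, while the second compares the male count directly against the female count.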

### c. Marital status distribution

In [13]:

marital_chart = alt.Chart(df).mark_bar().encode(
    alt.X('Marital_status:N',axis = alt.Axis(labelAngle = 0)),
    alt.Y('count():Q', title='Number of orders'),
    tooltip=['Marital_status','count():Q']
).properties(
    title='Marital Status Distribution',
    width = 500,
    height = 400
)

In [14]:
marital_chart

In [15]:
# Share of people who did not disclose their marital status
12/(12+108+268)

0.030927835051546393

#### Plot Insights
- `108` married people ordered food online.
- `268` of the people who ordered food were single.
- `69%` of the orders are placed by single people.
- The ratio of single to married customers is roughly `5:2`.
- Approximately `3%` of people did not disclose their marital status.
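The counts and percentages quoted in these insights can be produced in one step rather than typed in by hand. The following is a minimal sketch, assuming a hypothetical helper `share_table`; the column is rebuilt from the counts reported above, and the label `"Prefer not to say"` for the undisclosed group is an assumption.

```python
import pandas as pd

def share_table(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count and percentage share of each category in `column`."""
    counts = df[column].value_counts()
    percent = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": percent})

# Rebuild the marital status column from the counts reported above.
# "Prefer not to say" is an assumed label for the undisclosed group.
marital = pd.DataFrame({
    "Marital_status": ["Single"] * 268 + ["Married"] * 108 + ["Prefer not to say"] * 12
})
table = share_table(marital, "Marital_status")
print(table)
```

The same helper could be applied to any other categorical column of the notebook's `df`, for example `share_table(df, "Occupation")`.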

### d. Occupation distribution

In [16]:

occupation_chart = alt.Chart(df).mark_bar().encode(
    alt.X('Occupation:N',axis = alt.Axis(labelAngle = 0)),
    alt.Y('count():Q',title='Number of orders'),
    tooltip=['Occupation','count():Q']
).properties(
    title='Occupation Distribution',
    width = 500,
    height = 400
)

In [17]:
occupation_chart

#### Plot Insights
- Most orders are placed by `Students`, followed by `Employees`, `Self Employeed`, and then `Housewives`.
- `53.35%` of the orders are placed by students, a majority.
- Employees account for a further `30.41%` of orders.

### e. Monthly income distribution

In [18]:

income_chart = alt.Chart(df).mark_bar().encode(
    alt.X('Monthly_income:N',axis = alt.Axis(labelAngle = 0),
          sort=['No Income','Below Rs.10000','10001 to 25000','25001 to 50000','More than 50000']),
    alt.Y('count():Q',title='Number of orders'),
    tooltip=['Monthly_income','count():Q']
).properties(
    title='Monthly Income Distribution',
    width = 500,
    height = 400
)

In [19]:
income_chart

In [20]:
188/388

0.4845360824742268

#### Plot Insights
- People with `No income` place the most online orders, accounting for `48.45%` of the total.
- The second-largest group earns between `Rs.25-50K` a month, closely followed by people who earn more than `50K`.
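When the income shares are tabulated in pandas, the brackets come back sorted by frequency rather than from lowest to highest. The sketch below keeps them in the same order as the chart's `sort` list; `income_shares` is a hypothetical helper and `sample_income` is illustrative data, not the notebook's.

```python
import pandas as pd

# Bracket order copied from the chart's sort list
INCOME_ORDER = [
    "No Income",
    "Below Rs.10000",
    "10001 to 25000",
    "25001 to 50000",
    "More than 50000",
]

def income_shares(income: pd.Series) -> pd.Series:
    """Percentage share per bracket, listed from lowest to highest income."""
    shares = income.value_counts(normalize=True).mul(100).round(2)
    # reindex enforces the bracket order and shows empty brackets as 0
    return shares.reindex(INCOME_ORDER, fill_value=0.0)

# Illustrative values only, not taken from clean_data.csv
sample_income = pd.Series(
    ["No Income"] * 5
    + ["More than 50000"] * 2
    + ["Below Rs.10000"]
    + ["25001 to 50000"] * 2
)
result = income_shares(sample_income)
print(result)
```

Applied to the notebook's data, the call would presumably be `income_shares(df["Monthly_income"])`.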

### f. Educational qualifications distribution

In [21]:

edu_chart = alt.Chart(df).mark_bar().encode(
    alt.X('Education:N',axis = alt.Axis(labelAngle = 0)),
    alt.Y('count():Q',title='Number of orders'),
    tooltip=['Education','count():Q']
).properties(
    title='Educational Qualifications Distribution',
    width = 500,
    height = 400
)

In [22]:
edu_chart

In [23]:
(177+174)/388

0.904639175257732

#### Plot Insights
- Most people who order food online have either a `Graduate` or a `Post Graduate` degree.
- Approximately `6%` of people who order food online hold a Ph.D.
- `90%` of orders are placed by people with a Graduate or Post Graduate degree.
- Only 2 of the 388 respondents are uneducated.

### g. Family size distribution

In [24]:

family_chart = alt.Chart(df).mark_bar().encode(
    alt.X('Family_size:N',axis = alt.Axis(labelAngle = 0)),
    alt.Y('count():Q',title='Number of orders'),
    tooltip=['Family_size','count():Q']
).properties(
    title='Family Size Distribution',
    width = 500,
    height = 400
)

In [25]:
family_chart

#### Plot Insights
- Families with 2 to 3 members order food online the most.
- Families with only one member or with six members order food online less often.
- Family sizes of 2 and 3 combined account for `56.18%` of orders.
- Family sizes of 4 and 5 combined account for about `30%` of orders.
- Family sizes of 1 and 6 account for about `6%` and `7.5%` of orders, respectively.
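Combined shares like the ones above can be computed by testing membership in a group of values and taking the mean of the resulting booleans. This is a minimal sketch with a hypothetical helper `combined_share`; the family sizes in `sample_sizes` are illustrative and are not the notebook's data.

```python
import pandas as pd

def combined_share(values: pd.Series, group) -> float:
    """Percentage of rows whose value is one of the entries in `group`."""
    # isin gives a boolean Series; its mean is the fraction of True values
    return round(float(values.isin(group).mean()) * 100, 2)

# Illustrative family sizes only, not taken from clean_data.csv
sample_sizes = pd.Series([1, 2, 2, 3, 3, 3, 4, 5, 6, 2])

print(combined_share(sample_sizes, [2, 3]))
print(combined_share(sample_sizes, [4, 5]))
```

In the notebook, `combined_share(df["Family_size"], [2, 3])` would presumably reproduce the figure quoted for family sizes 2 and 3.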