# Bike buyers

In [59]:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("bike_buyers_clean.csv")


In [4]:
df.head()

Unnamed: 0,ID,Marital Status,Gender,Income,Children,Education,Occupation,Home Owner,Cars,Commute Distance,Region,Age,Purchased Bike
0,12496,Married,Female,40000,1,Bachelors,Skilled Manual,Yes,0,0-1 Miles,Europe,42,No
1,24107,Married,Male,30000,3,Partial College,Clerical,Yes,1,0-1 Miles,Europe,43,No
2,14177,Married,Male,80000,5,Partial College,Professional,No,2,2-5 Miles,Europe,60,No
3,24381,Single,Male,70000,0,Bachelors,Professional,Yes,1,5-10 Miles,Pacific,41,Yes
4,25597,Single,Male,30000,0,Bachelors,Clerical,No,0,0-1 Miles,Europe,36,Yes


## What we are interested in

- Marital status of people with motorcycles; we want to show that single people ride motorcycles more
- See by gender, education, and region who buys motorcycles more
    - The assumption is that more educated people ride motorcycles less
- See how many people have children - the assumption is that people with children do not want to take risks
- See how many people have cars - the assumption is that people with more cars do not want to pay for additional insurance and registration
    

In [16]:
df_bracni_status = df.groupby("Marital Status")

In [17]:
df_bracni_status["Marital Status"].value_counts()

Marital Status
Married    539
Single     461
Name: count, dtype: int64

In [None]:
married = df_bracni_status["Marital Status"].value_counts()["Married"]
single = df_bracni_status["Marital Status"].value_counts()["Single"]
print(married/(married + single)*100)

53.900000000000006


In [None]:
df_bracni_status["Purchased Bike"].value_counts()

Marital Status  Purchased Bike
Married         No                307
                Yes               232
Single          Yes               249
                No                212
Name: count, dtype: int64

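**Conclusion**: Single people buy more often than married people (249 of 461, about 54%, vs. 232 of 539, about 43%), a moderate rather than dramatic gap.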
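
Because the two groups differ in size, shares are easier to compare than raw counts. A minimal sketch, assuming only the counts printed above: the small `counts` frame is rebuilt by hand for illustration and is not read from the CSV.

```python
import pandas as pd

# Counts copied from the value_counts() output above (not re-read from the CSV).
counts = pd.DataFrame(
    {"No": [307, 212], "Yes": [232, 249]},
    index=pd.Index(["Married", "Single"], name="Marital Status"),
)

# Divide each row by its row total to get the share of buyers per marital status.
rates = counts.div(counts.sum(axis=1), axis=0) * 100
print(rates.round(1))
```

On the full data, `pd.crosstab(df["Marital Status"], df["Purchased Bike"], normalize="index")` should give the same shares as fractions.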

In [22]:
df_sex = df.groupby("Gender")
df_edu = df.groupby("Education")
df_region = df.groupby("Region")

In [23]:
df_sex["Purchased Bike"].value_counts()

Gender  Purchased Bike
Female  No                252
        Yes               239
Male    No                267
        Yes               242
Name: count, dtype: int64

In [24]:
df_edu["Purchased Bike"].value_counts()

Education            Purchased Bike
Bachelors            Yes               169
                     No                137
Graduate Degree      Yes                94
                     No                 80
High School          No                100
                     Yes                79
Partial College      No                146
                     Yes               119
Partial High School  No                 56
                     Yes                20
Name: count, dtype: int64

In [28]:
df_region["Purchased Bike"].value_counts()

Region         Purchased Bike
Europe         No                152
               Yes               148
North America  No                288
               Yes               220
Pacific        Yes               113
               No                 79
Name: count, dtype: int64

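**Conclusion**: In absolute numbers, slightly more men than women bought motorcycles (242 vs. 239), although the purchase rate is marginally higher among women (about 49% vs. about 48%). More educated people buy more, not less, which contradicts the assumption above: about 55% of Bachelors and 54% of Graduate Degree holders bought, against about 26% for Partial High School. The most motorcycles were sold in North America (220), but that region has the lowest purchase rate (about 43%), while Pacific has the highest (about 59%).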
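
The same comparison can be repeated for any grouping column with a small helper. This is a sketch rather than part of the original analysis: `purchase_rate` is a hypothetical helper name, and the `toy` frame is expanded from the Region counts printed above so the example runs without the CSV.

```python
import pandas as pd

def purchase_rate(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of rows with Purchased Bike == "Yes" within each value of `column`."""
    bought = frame["Purchased Bike"].eq("Yes")  # boolean Series: True for buyers
    return bought.groupby(frame[column]).mean().mul(100).round(1)

# Region counts copied from the output above (assumption: used only to build toy rows).
region_counts = {
    ("Europe", "No"): 152, ("Europe", "Yes"): 148,
    ("North America", "No"): 288, ("North America", "Yes"): 220,
    ("Pacific", "No"): 79, ("Pacific", "Yes"): 113,
}
rows = [
    {"Region": region, "Purchased Bike": answer}
    for (region, answer), n in region_counts.items()
    for _ in range(n)
]
toy = pd.DataFrame(rows)

region_rates = purchase_rate(toy, "Region")
print(region_rates)
```

With the real data loaded, `purchase_rate(df, "Education")` or `purchase_rate(df, "Gender")` would apply the same calculation to the other two columns.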

In [31]:
df_children = df.groupby(["Children"])
df_children["Purchased Bike"].value_counts()

Children  Purchased Bike
0         Yes               140
          No                137
1         Yes                98
          No                 72
2         No                112
          Yes                98
3         Yes                73
          No                 62
4         No                 73
          Yes                54
5         No                 63
          Yes                18
Name: count, dtype: int64

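**Conclusion**: The hypothesis broadly holds: people with more children buy motorcycles less often. The purchase rate falls to about 43% with four children and about 22% with five, although it does not decline steadily between zero and three children (it ranges from about 47% to about 58%).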
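
To read the trend off directly, the stacked counts can be pivoted into one row per number of children. A minimal sketch, assuming the counts printed above; the `counts` Series is typed in by hand to mimic the shape of the `value_counts()` result, and `rate_pct` is an illustrative column name.

```python
import pandas as pd

# Counts copied from the output above, in the same (Children, Purchased Bike) shape.
counts = pd.Series(
    [140, 137, 98, 72, 112, 98, 73, 62, 73, 54, 63, 18],
    index=pd.MultiIndex.from_tuples(
        [(0, "Yes"), (0, "No"), (1, "Yes"), (1, "No"), (2, "No"), (2, "Yes"),
         (3, "Yes"), (3, "No"), (4, "No"), (4, "Yes"), (5, "No"), (5, "Yes")],
        names=["Children", "Purchased Bike"],
    ),
)

# Move the Yes/No level into columns: one row per number of children.
table = counts.unstack("Purchased Bike")

# Share of buyers in each row, in percent.
table["rate_pct"] = (table["Yes"] / (table["Yes"] + table["No"]) * 100).round(1)
print(table)
```

On the notebook's own objects, `df_children["Purchased Bike"].value_counts().unstack()` should produce the same Yes/No table.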

In [32]:
df_cars = df.groupby(["Cars"])
df_cars["Purchased Bike"].value_counts()

Cars  Purchased Bike
0     Yes               150
      No                 93
1     Yes               152
      No                115
2     No                220
      Yes               125
3     No                 52
      Yes                33
4     No                 39
      Yes                21
Name: count, dtype: int64

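**Conclusion**: The hypothesis holds: people with more cars buy motorcycles less often. About 62% of people with no car and about 57% of people with one car bought, against roughly 35-39% of people with two or more cars.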

In [50]:
df_two = df.groupby(["Purchased Bike", "Gender"])

In [46]:
df_two["Income"].agg(["count", "mean", "min", "max"])

Purchased Bike,Gender,count,mean,min,max
No,Female,252,53650.793651,10000,130000
No,Male,267,56029.962547,10000,170000
Yes,Female,239,55648.535565,10000,170000
Yes,Male,242,59338.842975,10000,160000


In [45]:
df_two["Age"].agg(["count", "mean", "min", "max"])

Purchased Bike,Gender,count,mean,min,max
No,Female,252,45.444444,25,89
No,Male,267,45.258427,26,73
Yes,Female,239,42.736402,25,72
Yes,Male,242,43.140496,25,78


In [49]:
df_two["Home Owner"].value_counts()

Purchased Bike  Gender  Home Owner
No              Female  Yes           171
                        No             81
                Male    Yes           189
                        No             78
Yes             Female  Yes           163
                        No             76
                Male    Yes           162
                        No             80
Name: count, dtype: int64

In [51]:
df_3 = df.groupby(["Purchased Bike", "Gender", "Home Owner"])

In [None]:
df_3["Income"].agg(["count", "mean", "min", "max"])

Purchased Bike,Gender,Home Owner,count,mean,min,max
No,Female,No,81,50740.740741,10000,130000
No,Female,Yes,171,55029.239766,10000,130000
No,Male,No,78,54102.564103,10000,170000
No,Male,Yes,189,56825.396825,10000,170000
Yes,Female,No,76,52368.421053,10000,170000
Yes,Female,Yes,163,57177.91411,10000,150000
Yes,Male,No,80,64000.0,10000,160000
Yes,Male,Yes,162,57037.037037,10000,160000


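**Conclusion**: Most buyers already own a home (325 of 481, about 68%), split almost evenly between women (163) and men (162). Home ownership is similarly common among non-buyers (360 of 519, about 69%), so it does not separate the two groups.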
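
The home-owner shares can be checked by collapsing the gender level of the three-way counts. A sketch under the assumption that the `Home Owner` counts printed earlier are correct; the `counts` Series is rebuilt by hand and `owner_share` is an illustrative name.

```python
import pandas as pd

# Counts copied from the (Purchased Bike, Gender, Home Owner) output above.
counts = pd.Series(
    [171, 81, 189, 78, 163, 76, 162, 80],
    index=pd.MultiIndex.from_product(
        [["No", "Yes"], ["Female", "Male"], ["Yes", "No"]],
        names=["Purchased Bike", "Gender", "Home Owner"],
    ),
)

# Sum over gender, then put Home Owner Yes/No into columns.
by_owner = (
    counts.groupby(level=["Purchased Bike", "Home Owner"]).sum().unstack("Home Owner")
)

# Share of home owners among non-buyers ("No") and buyers ("Yes"), in percent.
owner_share = (by_owner["Yes"] / by_owner.sum(axis=1) * 100).round(1)
print(owner_share)
```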