#### Introduction

This notebook explores whether the opening bid (openbid) and the auction type make a difference to price, number of bids, bid time and bidder rate when selling different items.

If you have any ideas or opinions on the results or the code (which still has plenty of room for improvement), I'll be glad to hear your comments. :)

In [None]:
%%time

import numpy as np
import pandas as pd

from IPython.display import display
import matplotlib.pyplot as plt
%matplotlib inline

input_dir = "../input/"

In [None]:
%%time

auctions = pd.read_csv(input_dir + 'auction.csv')
display(auctions.head(5))

Because the original data is organized by bids rather than by auctions, I first want to know how many auctions there are for each auction type and item.

In [None]:
%%time

count_type_item = pd.get_dummies(auctions.drop_duplicates(subset='auctionid'), columns=['item'])
count_type_item = count_type_item[['openbid', 'price', 'auction_type', 'item_Cartier wristwatch', 'item_Palm Pilot M515 PDA', 'item_Xbox game console']].groupby(by=['auction_type'])[['item_Cartier wristwatch', 'item_Palm Pilot M515 PDA', 'item_Xbox game console']].sum()

print(count_type_item.sum(axis=0))
display(count_type_item)
count_type_item.plot.bar()
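As a side note, the same per-type, per-item count could probably be written more compactly with `pd.crosstab`. The sketch below runs on a small made-up table (`toy_bids` and its values are hypothetical, only the column names come from the dataset), so it illustrates the idea rather than reproducing the numbers above.

```python
import pandas as pd

# Toy bid-level data (hypothetical values) with the same column names as auction.csv.
toy_bids = pd.DataFrame({
    "auctionid": [1, 1, 2, 3, 3, 3, 4],
    "item": ["Xbox game console", "Xbox game console", "Xbox game console",
             "Cartier wristwatch", "Cartier wristwatch", "Cartier wristwatch",
             "Palm Pilot M515 PDA"],
    "auction_type": ["3 day auction", "3 day auction", "7 day auction",
                     "7 day auction", "7 day auction", "7 day auction",
                     "5 day auction"],
})

# Keep one row per auction, then count auctions per (auction_type, item).
per_auction = toy_bids.drop_duplicates(subset="auctionid")
counts = pd.crosstab(per_auction["auction_type"], per_auction["item"])
print(counts)
```

Calling `counts.plot.bar()` on the result should give the same kind of grouped bar chart as the cell above.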

As we can see, this is a small dataset containing around 600 auctions. Interestingly, sellers in general prefer 7 day auctions over 3 day auctions, and 3 day auctions over 5 day auctions. Maybe they think that the longer the auction runs, the higher the price will be. I'm not sure why sellers prefer 3 day auctions to 5 day auctions, but the difference between these two auction types is apparent for the Palm Pilot M515 PDA and the Xbox game console. Sellers of Cartier wristwatches, however, show no such preference.

# Watch, PDA and Xbox

Here I just fill the NA in bidder rate with zero.

In [None]:
%%time

# Assign the result back instead of calling fillna(inplace=True) on a column selection.
auctions['bidderrate'] = auctions['bidderrate'].fillna(0)
print(auctions.isnull().sum())

To observe the impact of openbid and auction types, we need to group the bids that belong to the same auction. For each auction, we are interested in the following indices:

1. openbid
2. price
3. auction_type
4. count: how many bids in this auction
5. max of bid time: when the auction ended
6. mean of bidder rate in the auction

In [None]:
%%time

pda = auctions[auctions.item == 'Palm Pilot M515 PDA'].drop(['bidder', 'item'], axis = 1)

pda_first = pda[['auctionid', 'openbid', 'price', 'auction_type']].drop_duplicates(subset='auctionid')

grouped = pda.groupby('auctionid')
pda_count = grouped.price.agg(['count']).reset_index()
# Named aggregation sets the column names directly, so they do not depend on how np.min/np.max are labelled.
pda_bid = grouped['bid'].agg(bid_min='min', bid_max='max', bid_mean='mean').reset_index()
pda_bidtime = grouped['bidtime'].agg(bidtime_min='min', bidtime_max='max', bidtime_mean='mean').reset_index()
pda_bidderrate = grouped['bidderrate'].agg(bidderrate_min='min', bidderrate_max='max', bidderrate_mean='mean').reset_index()

pda_all = pda_count.merge(pda_first, how='left', on='auctionid')
pda_all = pda_all.merge(pda_bid, how='left', on='auctionid')
pda_all = pda_all.merge(pda_bidtime, how='left', on='auctionid')
pda_all = pda_all.merge(pda_bidderrate, how='left', on='auctionid')

del pda, pda_first, pda_count, pda_bid, pda_bidtime, pda_bidderrate
display(pda_all.head(5))
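Since the same grouping is repeated for each item, it could be wrapped in a small function. The following is only a sketch: `summarize_item` is a hypothetical helper name, the toy values are made up, and it keeps just the columns used in the later plots instead of every min/max/mean column.

```python
import pandas as pd


def summarize_item(bids: pd.DataFrame, item: str) -> pd.DataFrame:
    """Return one row per auction for `item` (hypothetical helper)."""
    item_bids = bids[bids["item"] == item]
    return (
        item_bids.groupby("auctionid")
        .agg(
            count=("bid", "count"),                  # number of bids in the auction
            openbid=("openbid", "first"),            # constant within an auction
            price=("price", "first"),                # constant within an auction
            auction_type=("auction_type", "first"),
            bidtime_max=("bidtime", "max"),          # time of the last bid
            bidderrate_mean=("bidderrate", "mean"),
        )
        .reset_index()
    )


# Toy data with hypothetical values, only to show the call.
toy_bids = pd.DataFrame({
    "auctionid": [1, 1, 1, 2, 2],
    "item": ["Palm Pilot M515 PDA"] * 5,
    "auction_type": ["3 day auction"] * 3 + ["7 day auction"] * 2,
    "openbid": [10.0, 10.0, 10.0, 50.0, 50.0],
    "price": [120.0, 120.0, 120.0, 130.0, 130.0],
    "bid": [20.0, 80.0, 120.0, 60.0, 130.0],
    "bidtime": [0.5, 2.1, 2.9, 3.0, 6.8],
    "bidderrate": [0.0, 10.0, 20.0, 5.0, 15.0],
})
pda_summary = summarize_item(toy_bids, "Palm Pilot M515 PDA")
print(pda_summary)
```

With a helper like this, the watch and Xbox tables would each take a single call on the real `auctions` frame.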

We then calculate the mean of each index per auction type and plot the data. We use scatter plots because the variance is too large for the means alone to be informative. Since the dataset is small, we simply put every data point in the scatter plots.

In [None]:
%%time

pda_type = pda_all.groupby('auction_type')[['count', 'openbid', 'price', 'bidtime_max', 'bidderrate_mean']].mean().reset_index()
display(pda_type)

auction_types = ['3 day auction', '5 day auction', '7 day auction']
for auction_type in auction_types:
    print('Number of', auction_type, 'is', pda_all[pda_all.auction_type == auction_type].shape[0])

y_index = ['price', 'count', 'bidtime_max', 'bidderrate_mean']
for y in y_index:
    ax = pda_all[pda_all.auction_type == '3 day auction'].plot.scatter(x='openbid', y=y, color='DarkBlue', label='3 day auction')
    pda_all[pda_all.auction_type == '5 day auction'].plot.scatter(x='openbid', y=y, color='DarkGreen', label='5 day auction', ax=ax)
    pda_all[pda_all.auction_type == '7 day auction'].plot.scatter(x='openbid', y=y, color='Red', label='7 day auction', ax=ax)
    if y == 'price':
        ax.set_ylim(0, 500)

From the table of means, we might conclude that when the auction period is longer, the number of bids increases and the price is higher. But the scatter plots show that the variance is too large to jump to such a conclusion. Therefore, we summarize our findings mainly from the scatter plots:

##### __Price__

The price of the PDA seems to have no relationship with openbid or auction type. One possible reason is that the "Palm Pilot M515 PDA" is a specific electronic device whose price everyone can easily find on the Internet. Bidders, therefore, would not bid too high or too low.

##### __Number of Bids__

As we can see, the higher the openbid, the smaller the number of bids. But there is no clear pattern across the different auction types.
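To put a number on this visual impression, one option is a rank (Spearman) correlation between openbid and the number of bids. The sketch below uses a made-up summary table (`toy_summary` and its values are hypothetical), so the resulting coefficient says nothing about the real data; on the notebook's data the same two lines would be applied to `pda_all`.

```python
import pandas as pd

# Toy per-auction summary (hypothetical values), shaped like pda_all.
toy_summary = pd.DataFrame({
    "openbid": [1.0, 10.0, 50.0, 100.0, 150.0, 200.0],
    "count": [30, 22, 15, 9, 4, 2],
})

# Spearman correlation is the Pearson correlation of the ranks,
# so it only measures whether the relationship is monotonic.
rho = toy_summary["openbid"].rank().corr(toy_summary["count"].rank())
print(f"Spearman rho between openbid and number of bids: {rho:.2f}")
```

A rank correlation is used here because the scatter plot suggests a decreasing but not necessarily linear relationship.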

##### __bidtime_max__

The maximum bid time is directly related to the auction period. But some auctions received their last bid well before the deadline.

##### __bidderrate_mean__
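One way to look for such early-ending auctions is to compare the last bid time with the nominal duration. This is a sketch under two assumptions that are mine, not the dataset's documentation: that `bidtime` is measured in days since the auction started, and that "more than one day left" is a reasonable cutoff for "early". The toy values are hypothetical.

```python
import pandas as pd

# Toy per-auction summary (hypothetical values), shaped like pda_all.
toy_summary = pd.DataFrame({
    "auction_type": ["3 day auction", "5 day auction", "7 day auction", "7 day auction"],
    "bidtime_max": [2.99, 4.95, 6.98, 3.50],
})

# Pull the leading number out of strings such as "7 day auction".
toy_summary["duration_days"] = (
    toy_summary["auction_type"].str.extract(r"(\d+)", expand=False).astype(int)
)

# Days remaining when the last bid arrived (assumes bidtime is in days).
toy_summary["days_left_at_last_bid"] = (
    toy_summary["duration_days"] - toy_summary["bidtime_max"]
)

# Arbitrary cutoff: last bid more than one day before the deadline.
early = toy_summary[toy_summary["days_left_at_last_bid"] > 1.0]
print(early)
```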

We think there is no clear pattern.

Although we mentioned that the table of means may not be representative, it does show one clear pattern: when the auction period is longer, sellers tend to start with a lower openbid.

We then repeat the same steps for the Cartier wristwatch and the Xbox game console.

In [None]:
%%time

watch = auctions[auctions.item == 'Cartier wristwatch'].drop(['bidder', 'item'], axis = 1)

watch_first = watch[['auctionid', 'openbid', 'price', 'auction_type']].drop_duplicates(subset='auctionid')

grouped = watch.groupby('auctionid')
watch_count = grouped.price.agg(['count']).reset_index()
# Named aggregation sets the column names directly, so they do not depend on how np.min/np.max are labelled.
watch_bid = grouped['bid'].agg(bid_min='min', bid_max='max', bid_mean='mean').reset_index()
watch_bidtime = grouped['bidtime'].agg(bidtime_min='min', bidtime_max='max', bidtime_mean='mean').reset_index()
watch_bidderrate = grouped['bidderrate'].agg(bidderrate_min='min', bidderrate_max='max', bidderrate_mean='mean').reset_index()

watch_all = watch_count.merge(watch_first, how='left', on='auctionid')
watch_all = watch_all.merge(watch_bid, how='left', on='auctionid')
watch_all = watch_all.merge(watch_bidtime, how='left', on='auctionid')
watch_all = watch_all.merge(watch_bidderrate, how='left', on='auctionid')

del watch, watch_first, watch_count, watch_bid, watch_bidtime, watch_bidderrate
display(watch_all.head(5))

In [None]:
%%time

watch_type = watch_all.groupby('auction_type')[['count', 'openbid', 'price', 'bidtime_max', 'bidderrate_mean']].mean().reset_index()
display(watch_type.head(10))

auction_types = ['3 day auction', '5 day auction', '7 day auction']
for auction_type in auction_types:
    print('Number of', auction_type, 'is', watch_all[watch_all.auction_type == auction_type].shape[0])

y_index = ['price', 'count', 'bidtime_max', 'bidderrate_mean']
for y in y_index:
    ax = watch_all[watch_all.auction_type == '3 day auction'].plot.scatter(x='openbid', y=y, color='DarkBlue', label='3 day auction')
    watch_all[watch_all.auction_type == '5 day auction'].plot.scatter(x='openbid', y=y, color='DarkGreen', label='5 day auction', ax=ax)
    watch_all[watch_all.auction_type == '7 day auction'].plot.scatter(x='openbid', y=y, color='Red', label='7 day auction', ax=ax)

Here we only describe the patterns that differ from those of the PDA.

##### __Price__

There is still no obvious difference between the auction types. But one thing is confusing: in the scatter plot, a higher openbid tends to come with a higher price. This is partly a mechanical result, because the price can never be lower than the openbid if the auction succeeded. To truly measure the benefit of a higher openbid, we would need to know whether a higher openbid also keeps a comparable success rate, which is beyond the scope of this dataset.
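The floor effect described here can be made explicit by looking at the premium over the opening bid instead of the raw price. The sketch below uses made-up numbers (`toy_summary` is hypothetical) and assumes every row is a completed auction; it is meant to show the transformation, not a result.

```python
import pandas as pd

# Toy per-auction summary (hypothetical values), shaped like watch_all.
toy_summary = pd.DataFrame({
    "openbid": [1.0, 50.0, 300.0, 900.0],
    "price": [450.0, 520.0, 610.0, 905.0],
})

# A completed auction should close at or above its opening bid.
assert (toy_summary["price"] >= toy_summary["openbid"]).all()

# The premium removes the mechanical floor that openbid puts under price.
toy_summary["premium"] = toy_summary["price"] - toy_summary["openbid"]
print(toy_summary)
```

Plotting `premium` against `openbid` would show how much bidders added beyond the floor, although it still cannot account for auctions that never received a bid.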

##### __bidtime_max__

Some 7 day auctions for the Cartier wristwatch received their last bid far earlier than the deadline.

The most interesting finding, from the table, is that a longer period does not lead to a lower openbid. Why is the pattern for watches different from that for PDAs? We propose a conjecture that a previous kernel has also mentioned: there might be some fake items. The openbid of fake items is lower than that of authentic ones, which does not contradict our intuition. However, this raises another question: why do fake items tend to be sold in 3 day auctions? Maybe the sellers don't want bidders to have enough time to compare, or a short auction makes the items harder for eBay to detect. This is an interesting direction that requires more verification.

In [None]:
%%time

xbox = auctions[auctions.item == 'Xbox game console'].drop(['bidder', 'item'], axis = 1)
display(xbox.head(5))

xbox_first = xbox[['auctionid', 'openbid', 'price', 'auction_type']].drop_duplicates(subset='auctionid')

grouped = xbox.groupby('auctionid')
xbox_count = grouped.price.agg(['count']).reset_index()
# Named aggregation sets the column names directly, so they do not depend on how np.min/np.max are labelled.
xbox_bid = grouped['bid'].agg(bid_min='min', bid_max='max', bid_mean='mean').reset_index()
xbox_bidtime = grouped['bidtime'].agg(bidtime_min='min', bidtime_max='max', bidtime_mean='mean').reset_index()
xbox_bidderrate = grouped['bidderrate'].agg(bidderrate_min='min', bidderrate_max='max', bidderrate_mean='mean').reset_index()

xbox_all = xbox_count.merge(xbox_first, how='left', on='auctionid')
xbox_all = xbox_all.merge(xbox_bid, how='left', on='auctionid')
xbox_all = xbox_all.merge(xbox_bidtime, how='left', on='auctionid')
xbox_all = xbox_all.merge(xbox_bidderrate, how='left', on='auctionid')

del xbox, xbox_first, xbox_count, xbox_bid, xbox_bidtime, xbox_bidderrate
display(xbox_all.head(5))

In [None]:
%%time

xbox_type = xbox_all.groupby('auction_type')[['count', 'openbid', 'price', 'bidtime_max', 'bidderrate_mean']].mean().reset_index()

display(xbox_type.head(10))

In [None]:
%%time

auction_types = ['3 day auction', '5 day auction', '7 day auction']
for auction_type in auction_types:
    print('Number of', auction_type, 'is', xbox_all[xbox_all.auction_type == auction_type].shape[0])

y_index = ['price', 'count', 'bidtime_max', 'bidderrate_mean']
for y in y_index:
    ax = xbox_all[xbox_all.auction_type == '3 day auction'].plot.scatter(x='openbid', y=y, color='DarkBlue', label='3 day auction')
    xbox_all[xbox_all.auction_type == '5 day auction'].plot.scatter(x='openbid', y=y, color='DarkGreen', label='5 day auction', ax=ax)
    xbox_all[xbox_all.auction_type == '7 day auction'].plot.scatter(x='openbid', y=y, color='Red', label='7 day auction', ax=ax)

##### __bidtime_max__

Almost all auctions received their last bid in the final minutes. Is this a characteristic of game-related devices?

As with the Cartier wristwatch, a longer period does not lead to a lower openbid. Are there also fake Xbox consoles? We are not sure, but we know there are OEM consoles and aftermarket ones. Maybe they cause the difference.

#### Conclusion

Here we summarize our findings:

##### __Impact of Openbid__

The price, bidtime_max and bidderrate_mean seem to have no relationship with openbid. However, a lower openbid does attract more bids.

##### __Impact of Auction Types__

The price, number of bids and bidderrate_mean seem to have no relationship with auction type. However, the auctions usually end near the deadline.

##### __Impact of Different Items__

Long-period auctions tend to have a lower openbid. But the Cartier wristwatch and the Xbox console do not follow this pattern in 3 day auctions. Our conjecture is that there are fake watches and aftermarket Xbox consoles. If this is true, there are some implications:

1. For bidders: if you do not have enough time to check authenticity, you had better bid on items in 5-day or 7-day auctions.
2. For sellers: although a 7-day auction does not guarantee a higher price, using it may signal that your item is authentic if bidders know there are more authentic items in 7-day auctions (under the assumption that sellers of fake items still use 3-day auctions).
3. For the platform: to detect fake items, you can start from 3-day auctions. But if you put the protection of bidders first, you should make sure there are no fake items in 7-day auctions.
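The openbid comparison across items and auction types that underlies these points could be put in a single table. The sketch below is an illustration on made-up numbers (`toy_auctions` is hypothetical and has one row per auction); it uses the median rather than the mean, which is my choice to limit the influence of a few very high opening bids.

```python
import pandas as pd

# Toy per-auction data (hypothetical values), one row per auction.
toy_auctions = pd.DataFrame({
    "item": ["Xbox game console", "Xbox game console", "Xbox game console",
             "Palm Pilot M515 PDA", "Palm Pilot M515 PDA", "Palm Pilot M515 PDA"],
    "auction_type": ["3 day auction", "3 day auction", "7 day auction",
                     "3 day auction", "7 day auction", "7 day auction"],
    "openbid": [10.0, 30.0, 50.0, 100.0, 20.0, 40.0],
})

# Median opening bid for every (item, auction_type) combination.
openbid_table = toy_auctions.pivot_table(
    index="item", columns="auction_type", values="openbid", aggfunc="median"
)
print(openbid_table)
```

On the real data, the three per-item summary tables would first need to be concatenated with an `item` column added back.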