# Alibaba Ad Display Click Dataset
The Alibaba Ad Display Click Dataset originates from the real-world traffic logs of the Taobao Marketplace recommender system. Headquartered in Hangzhou, Zhejiang, People's Republic of China, Taobao Marketplace facilitates consumer-to-consumer (C2C) retail for small businesses and entrepreneurs.    

{numref}`ali_display_summary`: Alibaba Ad Display / Click Summary

```{table} Dataset Summary
:name: ali_display_summary

|        Users        |    User Profiles    |     Interactions     |            Advertising Campaigns           |       Behaviors       |
|:-------------------:|:-------------------:|:--------------------:|:------------------------------------------:|:---------------------:|
|          1,140,000  |          1,061,768  |          26,557,961  |                                   846,811  |          723,268,134  |
```
The advertising and click interactions summarized in {numref}`ali_display_summary` represent the raw ad impressions served to approximately 1.14 million randomly selected users over the eight days from May 6, 2017, through May 13, 2017. User profiles were obtained for 1.06 million of the users to whom ads were served during that eight-day period. In total, some 26.5 million interactions were captured.

The dataset also includes 22 additional days of user behaviors: page views, favorite tagging, shopping cart activity, and product purchases. The resulting user behavior log exceeds 700 million actions.

## Entity Relationship Diagram (Raw)
The entity relationship diagram in {ref}`alibaba_dataset_raw_erd` presents the raw data as entities, and specifies the attributes and relationships among them. 

```{figure} ../figures/alibaba_dataset_raw_erd.png
:name: alibaba_dataset_raw_erd
:alt: Alibaba Dataset Raw ERD
Alibaba Dataset Entity Relationship Diagram (Raw)
``` 
Note: All data have been de-identified and desensitized in accordance with relevant privacy regulations. 

## Entity Definitions     
The Alibaba Ad Display / Click Dataset comprises four files containing:
- Raw samples of advertising impressions,      
- User profiles,     
- Ad campaigns, and    
- User behaviors, i.e., actions such as page views, favorite tagging, adding an item to the shopping cart, and purchasing items.

The attributes are described below.

### Raw Sample     
Over 26 million user/advertisement interactions are collectively defined by:     
- **user** (int): A de-identified identifier for a user. (Composite Primary Key)     
- **time_stamp** (timestamp): The timestamp when the interaction occurred. (Composite Primary Key)     
- **adgroup_id** (int): A desensitized advertising unit identifier.     
- **pid** (varchar): Definition unspecified.     
- **noclk** (int): Binary indicator of no click. 1 if the ad was not clicked, 0 if it was clicked.
- **clk** (int): Binary indicator of the occurrence of a click. 1 if the ad was clicked, 0 if it was not.
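Because `clk` and `noclk` encode the same event from opposite directions, a quick sanity check is to confirm they sum to 1 on every row before computing a click-through rate. The following is a minimal sketch using only the Python standard library. The inline rows, the integer `time_stamp` values, and the `pid` strings are made-up placeholders, and the helper names are hypothetical.

```python
import csv
import io

# Hypothetical rows in the documented column order; these are NOT real data.
SAMPLE = """user,time_stamp,adgroup_id,pid,noclk,clk
1,1494032110,101,pid_a,1,0
1,1494032170,102,pid_a,0,1
2,1494100000,101,pid_b,1,0
2,1494100500,103,pid_b,1,0
"""

def load_raw_sample(text):
    """Parse raw-sample CSV text into a list of dicts with integer flags."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["noclk"] = int(rec["noclk"])
        rec["clk"] = int(rec["clk"])
        rows.append(rec)
    return rows

def flags_are_complementary(rows):
    """Return True if exactly one of clk / noclk is set on every row."""
    return all(r["clk"] + r["noclk"] == 1 for r in rows)

def click_through_rate(rows):
    """Fraction of impressions that resulted in a click."""
    return sum(r["clk"] for r in rows) / len(rows)

rows = load_raw_sample(SAMPLE)
flags_ok = flags_are_complementary(rows)
ctr = click_through_rate(rows)
print(flags_ok, ctr)
```

If the check passes on the real file, one of the two columns is redundant and could be dropped in a cleaned version of the table.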

### Advertising Campaign     
About 847,000 advertising campaigns connect customers and items. Each is described by six features:
- **adgroup_id** (int): A desensitized advertising unit identifier. (Primary Key)     
- **cate_id** (int): A product's desensitized commodity category id.
- **campaign_id** (int): A desensitized advertising plan identifier.     
- **customer** (int): A desensitized customer segment identifier.     
- **brand** (float): A desensitized brand to which the product belongs.     
- **price** (float): The price for the product. Currency not specified.     
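Since `adgroup_id` is the primary key of the campaign file and also appears on every raw sample, it can serve as the join key between the two. The sketch below illustrates the idea with a plain dictionary lookup that aggregates impressions and clicks per `cate_id`. All records are invented for illustration, and `clicks_by_category` is a hypothetical helper, not part of the dataset tooling.

```python
from collections import defaultdict

# Hypothetical campaign records keyed by adgroup_id (made-up values).
ADS = {
    101: {"cate_id": 6261, "campaign_id": 1, "customer": 10, "brand": 77.0, "price": 19.9},
    102: {"cate_id": 6261, "campaign_id": 2, "customer": 11, "brand": 77.0, "price": 25.0},
    103: {"cate_id": 4520, "campaign_id": 3, "customer": 12, "brand": 88.0, "price": 5.5},
}

# Hypothetical impressions as (user, adgroup_id, clk) tuples.
# The last one references an adgroup_id with no campaign record.
IMPRESSIONS = [(1, 101, 0), (1, 102, 1), (2, 101, 0), (2, 103, 1), (3, 999, 0)]

def clicks_by_category(impressions, ads):
    """Aggregate impressions and clicks per cate_id via an adgroup_id lookup.

    Returns the per-category totals and the number of impressions whose
    adgroup_id had no matching campaign record.
    """
    stats = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    unmatched = 0
    for _user, adgroup_id, clk in impressions:
        ad = ads.get(adgroup_id)
        if ad is None:
            unmatched += 1
            continue
        entry = stats[ad["cate_id"]]
        entry["impressions"] += 1
        entry["clicks"] += clk
    return dict(stats), unmatched

category_stats, unmatched = clicks_by_category(IMPRESSIONS, ADS)
print(category_stats, unmatched)
```

Counting unmatched keys separately, rather than silently dropping them, makes it easier to spot referential gaps between the two files.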

### User Profile     
Some 1.06 million user profiles are defined by the following nine attributes:     
- **userid** (int): A de-identified identifier for a user. (Primary Key)     
- **cms_segid** (int): A micro-group identifier.     
- **cms_group_id** (int): Unspecified.
- **final_gender_code** (int): 1 for male, 2 for female.
- **age_level** (int): Unspecified.
- **pvalue_level** (float): 1.0: low-grade, 2.0: mid-grade, 3.0: high-grade.
- **shopping_level** (int): 1: shallow user, 2: moderate user, 3: deep user.
- **occupation** (int): 1 if the user is a college student, 0 if not.
- **new_user_class_level** (float): Unspecified.     
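The coded attributes above can be translated into readable labels with simple lookup tables. The sketch below covers only the attributes whose codes are documented here; `decode_profile` and the example record are hypothetical, and mapping missing or undocumented codes to `"unknown"` is an assumption rather than a rule stated by the dataset.

```python
# Lookup tables built from the code definitions listed above.
GENDER = {1: "male", 2: "female"}
PVALUE_LEVEL = {1.0: "low-grade", 2.0: "mid-grade", 3.0: "high-grade"}
SHOPPING_LEVEL = {1: "shallow user", 2: "moderate user", 3: "deep user"}
OCCUPATION = {1: "college student", 0: "not a college student"}

def decode_profile(profile):
    """Map documented integer/float codes to labels.

    Missing or undocumented codes become "unknown" (an assumption).
    """
    return {
        "userid": profile["userid"],
        "gender": GENDER.get(profile.get("final_gender_code"), "unknown"),
        "pvalue_level": PVALUE_LEVEL.get(profile.get("pvalue_level"), "unknown"),
        "shopping_level": SHOPPING_LEVEL.get(profile.get("shopping_level"), "unknown"),
        "occupation": OCCUPATION.get(profile.get("occupation"), "unknown"),
    }

# Hypothetical profile with a missing pvalue_level.
example = {
    "userid": 1,
    "final_gender_code": 2,
    "pvalue_level": None,
    "shopping_level": 3,
    "occupation": 0,
}
decoded = decode_profile(example)
print(decoded)
```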

### User Behavior     
The behavior file contains over 700 million actions, and has the following five attributes:     
- **user** (int): A de-identified identifier for a user. (Composite Primary Key)     
- **time_stamp** (timestamp): The timestamp when the interaction occurred. (Composite Primary Key)     
- **btag** (varchar): Tag describing one of the following four behaviors:     
    - pv: Page view     
    - fav: Favorite (like)
    - cart: Add to shopping cart     
    - buy: Purchase conversion     
- **cate** (int): A product's desensitized commodity category id.
- **brand** (float): A desensitized brand to which the product belongs.     
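As an illustration of working with the `btag` attribute, the sketch below tallies actions by behavior type and rejects any tag outside the four documented values. The tuples are invented placeholders in the documented column order, and `count_behaviors` is a hypothetical helper.

```python
from collections import Counter

# The four documented behavior tags.
BTAG_LABELS = {"pv": "page view", "fav": "favorite", "cart": "add to cart", "buy": "purchase"}

# Hypothetical rows as (user, time_stamp, btag, cate, brand); NOT real data.
BEHAVIORS = [
    (1, 1493000000, "pv", 6261, 77.0),
    (1, 1493000060, "pv", 6261, 77.0),
    (1, 1493000120, "cart", 6261, 77.0),
    (2, 1493000200, "pv", 4520, 88.0),
    (2, 1493000300, "fav", 4520, 88.0),
    (1, 1493000400, "buy", 6261, 77.0),
]

def count_behaviors(behaviors):
    """Count actions per btag, raising on any undocumented tag."""
    counts = Counter()
    for _user, _ts, btag, _cate, _brand in behaviors:
        if btag not in BTAG_LABELS:
            raise ValueError(f"unexpected btag: {btag!r}")
        counts[btag] += 1
    return counts

behavior_counts = count_behaviors(BEHAVIORS)
purchases_per_view = behavior_counts["buy"] / behavior_counts["pv"]
print(behavior_counts, purchases_per_view)
```

For the full log of more than 700 million rows, the same counting logic would need to run over the file in a streaming or chunked fashion rather than on an in-memory list.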

The original dataset may be obtained from the [Alibaba Cloud Tianchi website](https://tianchi.aliyun.com/dataset/dataDetail?dataId=56&userId=1).

## Alibaba Database
The raw schema shown in {ref}`alibaba_dataset_raw_erd` was normalized and its field names were standardized. The implicit 'item' entity was made explicit by combining its defining attributes, 'category_id' and 'brand'. Those attributes were removed from the Ad entity and replaced by a foreign key reference to the 'item' entity. The final version of the database design is presented in {ref}`alibaba_database_design`, below:

```{figure} ../figures/alibaba_database_design.png
:name: alibaba_database_design
:alt: Alibaba Database Design
Alibaba Ad Display Click Database Design
``` 
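The item extraction described in this section can be sketched with an in-memory SQLite database. This is an illustration under stated assumptions, not the project's actual schema: the table names `ad_raw`, `item`, and `ad`, the surrogate key `item_id`, and all inserted values are hypothetical, since the final column names appear only in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw campaign table using the attribute names from the raw file.
CREATE TABLE ad_raw (
    adgroup_id  INTEGER PRIMARY KEY,
    cate_id     INTEGER,
    campaign_id INTEGER,
    customer    INTEGER,
    brand       REAL,
    price       REAL
);
-- Made-up rows: ads 101 and 102 share the same category and brand.
INSERT INTO ad_raw VALUES
    (101, 6261, 1, 10, 77.0, 19.9),
    (102, 6261, 2, 11, 77.0, 25.0),
    (103, 4520, 3, 12, 88.0, 5.5);

-- Explicit item entity: one row per distinct (category, brand) pair.
CREATE TABLE item (
    item_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    category_id INTEGER NOT NULL,
    brand       REAL,
    UNIQUE (category_id, brand)
);
INSERT INTO item (category_id, brand)
SELECT DISTINCT cate_id, brand FROM ad_raw ORDER BY cate_id, brand;

-- Normalized ad table: category and brand replaced by a foreign key.
CREATE TABLE ad (
    adgroup_id  INTEGER PRIMARY KEY,
    campaign_id INTEGER,
    customer    INTEGER,
    price       REAL,
    item_id     INTEGER NOT NULL REFERENCES item (item_id)
);
-- IS (rather than =) also matches rows where brand is NULL on both sides.
INSERT INTO ad
SELECT r.adgroup_id, r.campaign_id, r.customer, r.price, i.item_id
FROM ad_raw AS r
JOIN item AS i ON i.category_id = r.cate_id AND i.brand IS r.brand;
""")

item_count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
ad_items = dict(conn.execute("SELECT adgroup_id, item_id FROM ad"))
print(item_count, ad_items)
```

In this toy example, the two ads that share a category and brand end up pointing at the same `item` row, which is the effect of the normalization described above.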