# Alibaba Ad Display Click Dataset
The Alibaba Ad Display Click Dataset originates from the real-world traffic logs of the Taobao Marketplace recommender system. Headquartered in Hangzhou, Zhejiang, People's Republic of China, Taobao Marketplace facilitates consumer-to-consumer (C2C) retail for small businesses and entrepreneurs.    

{numref}`ali_display_summary`: Alibaba Ad Display / Click Summary

```{table} Dataset Summary
:name: ali_display_summary

|        Users        |              User Profiles              |                 Interactions                 |    Behaviors    |
|:-------------------:|:---------------------------------------:|:---------------------------------------:|:---------------:|
|          1,140,000  |                              1,060,000  |                             26,000,000  |    700,000,000  |
```
The advertising/click interactions summarized in {numref}`ali_display_summary` represent the ad impressions served to approximately 1.14 million randomly selected users over the eight days from May 6, 2017, through May 13, 2017. User profiles were obtained for 1.06 million of the users to whom ads were served during this eight-day period. In total, over 26 million interactions were captured.
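
As a quick check on the collection window, the inclusive day count between the two dates stated above can be computed with the Python standard library. This is a minimal sketch; the variable names are illustrative and not part of the dataset.

```python
from datetime import date

# Start and end dates of the ad display window, as stated in the text.
window_start = date(2017, 5, 6)
window_end = date(2017, 5, 13)

# Both endpoints are included, so add one to the difference in days.
window_days = (window_end - window_start).days + 1
print(window_days)  # prints 8
```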

The dataset also includes 22 additional days of user behaviors: page views, favorite tagging, shopping cart activity, and product purchases. Collected from several Taobao departments, the user behavior log exceeds 700 million interactions.

## Entity Relationship Diagram
The entity relationship diagram in {ref}`alibaba_dataset_erd` presents the data as objects, illuminating the relationships among the objects, as well as the attributes that define them. 

{ref}`alibaba_dataset_erd`: Alibaba Ad Display / Click Data Model

```{figure} ../figures/alibaba_dataset_erd.png
:name: alibaba_dataset_erd
:alt: Alibaba Dataset ERD
Alibaba Dataset Entity Relationship Diagram
``` 
Note: All data have been de-identified and desensitized in accordance with relevant privacy regulations. 

## Entity Definitions
**Raw Sample**
Each of the approximately 26 million user/advertisement interactions is defined by:   
- **user_id** (int): A de-identified identifier for a user.   (Composite Primary Key)
- **time_stamp** (timestamp): The timestamp when the interaction occurred. (Composite Primary Key)   
- **adgroup_id** (int): A desensitized advertising unit identifier.   
- **scenario** (varchar): Definition unspecified.  
- **no_click** (int): Binary indicator that the ad was not clicked: 1 if there was no click, 0 if there was a click.
- **click** (int): Binary indicator of the occurrence of a click: 1 if yes, 0 if no.
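
The field list above can be sketched as a typed record with two sanity checks: that the click flags are complementary and that the composite key is unique. This is an illustrative sketch, not the dataset's own tooling. The class and function names are hypothetical, the rows are synthetic, and treating `time_stamp` as an integer Unix epoch value is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSample:
    user_id: int
    time_stamp: int  # assumption: stored as Unix epoch seconds
    adgroup_id: int
    scenario: str
    no_click: int
    click: int


def is_consistent(row):
    """True when click and no_click are complementary binary flags."""
    return row.click in (0, 1) and row.no_click == 1 - row.click


def duplicate_keys(rows):
    """Return the (user_id, time_stamp) pairs that occur more than once."""
    seen, dupes = set(), set()
    for row in rows:
        key = (row.user_id, row.time_stamp)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


# Synthetic rows for illustration only; not taken from the dataset.
rows = [
    RawSample(1, 1494032110, 100, "scenario_a", 1, 0),
    RawSample(1, 1494032200, 101, "scenario_a", 0, 1),
    RawSample(2, 1494032110, 100, "scenario_b", 1, 0),
]

all_consistent = all(is_consistent(r) for r in rows)
dupes = duplicate_keys(rows)
```

Running the same checks over the full file would show whether (user_id, time_stamp) really behaves as a unique key in practice.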

**Ad**
Each ad record connects a product, a customer, and a campaign. The six features are:
- **adgroup_id** (int):  A desensitized advertising unit identifier. (Primary Key)  
- **category_id** (int): A product's desensitized commodity category id.   
- **campaign_id** (int): A desensitized advertising plan identifier.  
- **customer_id** (int): A desensitized customer segment identifier.  
- **brand** (float): A desensitized brand to which the product belongs.    
- **price** (float): The price for the product. Currency not specified.  
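
Because `adgroup_id` is the primary key of the Ad entity and also appears in each raw sample, the two can be joined to aggregate clicks by any ad attribute. The following is a minimal sketch of such a join, computing a click rate per campaign. The function name is hypothetical and the records are synthetic.

```python
from collections import defaultdict

# Synthetic records for illustration only; not taken from the dataset.
ads = [
    {"adgroup_id": 100, "category_id": 7, "campaign_id": 11,
     "customer_id": 3, "brand": 42.0, "price": 19.9},
    {"adgroup_id": 101, "category_id": 8, "campaign_id": 12,
     "customer_id": 4, "brand": 43.0, "price": 5.0},
]
samples = [
    {"user_id": 1, "adgroup_id": 100, "click": 0},
    {"user_id": 2, "adgroup_id": 100, "click": 1},
    {"user_id": 3, "adgroup_id": 101, "click": 0},
]

# Index ads by their primary key for constant-time lookup.
ad_by_id = {ad["adgroup_id"]: ad for ad in ads}


def click_rate_by_campaign(samples, ad_by_id):
    """Join each sample to its ad and return clicks / impressions per campaign."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for sample in samples:
        ad = ad_by_id.get(sample["adgroup_id"])
        if ad is None:  # sample refers to an ad missing from the Ad records
            continue
        impressions[ad["campaign_id"]] += 1
        clicks[ad["campaign_id"]] += sample["click"]
    return {c: clicks[c] / impressions[c] for c in impressions}


rates = click_rate_by_campaign(samples, ad_by_id)
```

At the scale of the real files, a dataframe or database join would likely be more practical; the dictionary lookup here only illustrates the key relationship.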

**User**
The user file contains some 1.06 million user profiles. The nine features captured for each user are:
- **user_id** (int): A de-identified identifier for a user.   (Primary Key)
- **cms_segid** (int): A micro-group identifier.  
- **cms_group_id** (int): Unspecified.   
- **gender_code** (int): 1 for male, 2 for female.  
- **age_level** (int): Unspecified.   
- **consumption_level** (float): 1.0: low-grade, 2.0: mid-grade, 3.0: high-grade.  
- **shopping_level** (int): 1: shallow user, 2: moderate user, 3: deep user.  
- **student** (int): 1 if the user is a college student, 0 if not.     
- **city_level** (float): Unspecified.  
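
The coded fields above can be translated into readable labels with simple lookup tables. The sketch below uses only the mappings documented in the list; the function name is hypothetical, the profile is synthetic, and mapping missing or undocumented codes to "unknown" is an assumption of this example rather than a rule of the dataset.

```python
# Code-to-label mappings taken from the field definitions above.
GENDER = {1: "male", 2: "female"}
CONSUMPTION_LEVEL = {1.0: "low-grade", 2.0: "mid-grade", 3.0: "high-grade"}
SHOPPING_LEVEL = {1: "shallow user", 2: "moderate user", 3: "deep user"}
STUDENT = {1: "college student", 0: "not a college student"}

FIELD_LABELS = {
    "gender_code": GENDER,
    "consumption_level": CONSUMPTION_LEVEL,
    "shopping_level": SHOPPING_LEVEL,
    "student": STUDENT,
}


def decode_profile(profile):
    """Return a copy of profile with documented codes replaced by labels.

    Fields without a documented mapping pass through unchanged. Codes that
    are missing or undocumented become "unknown" (an assumption of this
    sketch).
    """
    decoded = dict(profile)
    for field, labels in FIELD_LABELS.items():
        if field in decoded:
            decoded[field] = labels.get(decoded[field], "unknown")
    return decoded


# Synthetic profile for illustration only; not taken from the dataset.
example = {"user_id": 1, "cms_segid": 0, "cms_group_id": 5, "gender_code": 2,
           "age_level": 3, "consumption_level": None, "shopping_level": 3,
           "student": 0, "city_level": 2.0}
decoded = decode_profile(example)
```

Fields whose meaning is unspecified, such as `age_level` and `city_level`, are deliberately left as raw codes.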

**Behavior**
The behavior file contains over 700 million events, each with five attributes:   
- **user_id** (int): A de-identified identifier for a user. (Composite Primary Key)
- **timestamp** (timestamp): The timestamp when the interaction occurred. (Composite Primary Key)
- **btag** (varchar): Tag describing one of the following four behaviors:
    1. pv: Page view
    2. fav: Favorite (like)
    3. cart: Add to shopping cart
    4. buy: Purchase conversion
- **category_id** (int): A product's desensitized commodity category id.
- **brand** (float): A desensitized brand to which the product belongs.
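
A common first step with a log like this is tallying events by `btag`. The sketch below counts events per tag and rejects any tag outside the four documented values. The function name is hypothetical, the events are synthetic, and the integer timestamps are an assumed representation.

```python
from collections import Counter

# The four btag values documented above.
KNOWN_BTAGS = {"pv", "fav", "cart", "buy"}

# Synthetic events for illustration only; not taken from the dataset.
events = [
    {"user_id": 1, "timestamp": 1493000000, "btag": "pv", "category_id": 7, "brand": 42.0},
    {"user_id": 1, "timestamp": 1493000060, "btag": "pv", "category_id": 7, "brand": 42.0},
    {"user_id": 1, "timestamp": 1493000120, "btag": "cart", "category_id": 7, "brand": 42.0},
    {"user_id": 2, "timestamp": 1493000180, "btag": "buy", "category_id": 8, "brand": 43.0},
]


def count_behaviors(events):
    """Count events per btag, raising on tags outside the documented set."""
    counts = Counter()
    for event in events:
        tag = event["btag"]
        if tag not in KNOWN_BTAGS:
            raise ValueError(f"unexpected btag: {tag!r}")
        counts[tag] += 1
    return counts


behavior_counts = count_behaviors(events)
```

Since `Counter` returns zero for absent keys, `behavior_counts["fav"]` is safe to read even when no such events occur.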

The original dataset may be obtained from the [Alibaba Cloud Tianchi](https://tianchi.aliyun.com/dataset/dataDetail?dataId=56&userId=1) website.