Skip to content
Branch: master
Find file History
Permalink
Type Name Latest commit message Commit time
..
Failed to load latest commit information.
README.md Update on paperdoll raw dataset Jan 30, 2019
chictopia.sql.gz
usage.ipynb Update on paperdoll raw dataset Jan 30, 2019

README.md

PaperDoll raw dataset

Web-crawled data from Chictopia in Fall 2012.

The use of this dataset is subject to copyright law. We release this dataset under Japanese legislation in effect from Jan 2019, but the actual law enforcement is subject to the rule of each country.

Contents

chictopia.sql.gz
usage.ipynb
README.md

Metadata are stored in chictopia.sql.gz, and images are stored in separate LMDB as encoded binaries.

Downloading metadata

Use git to clone this repository. Note Git LFS is used to host metadata file.

git clone https://github.com/kyamagu/paperdoll

Then navigate to data/chictopia dir.

Downloading image data

Download the photo data (~40GB) at either of the following URL:

wget http://vision.is.tohoku.ac.jp/chictopia2/photos.lmdb.tar

or:

wget https://s3-ap-northeast-1.amazonaws.com/kyamagu-public/chictopia2/photos.lmdb.tar

MD5SUM is: a404789687fc3906d7d843942c802f53.

Then extract the LMDB content:

tar xf photos.lmdb.tar

Prerequisites

The dataset requires SQLite3 and LMDB.

First reconstruct the metadata SQLite3 file. The following command will reconstruct the SQLite3 database file.

gunzip -c chictopia.sql.gz | sqlite3 chictopia.sqlite3

Once the file is reconstructed, check the notebook demo for how to handle the data structure.

Photo data

LMDB stores raw PNG binary. The key is photo id and the value is binary. See the notebook demo for how to open.

Metadata structure

Here is a list of major structures in the SQLite3 tables. Most external relations are represented a name + _id field. For example, User-Post relation is stored as user_id field in posts table.

User-Posts-Photos relationships

A user has many posts. A post has many photos (up to 5 / post).

user {
  name,
  type,
  [
    post {
      title,
      description,
      [photo {}, photo {}, ...]
    },
    post {
      title,
      description,
      [photo {}, photo {}, ...]
    }
    ...
  ]
}

Example: retrieving posts of user 1234.

SELECT * FROM posts WHERE id = 1234

Post-Tag relationships

Chictopia has a fairly complicated tagging strcture. First, there are different kinds of tags.

  • Color: color of the item.
  • Style: general style of the post.
  • Occasion: occasion of the outfit.
  • Clothing: type of the item.
  • Brand: brand of the item.
  • Trend: free-form tag.

Only trend tags are stored in the trends table. Other tags are stored in the tags table.

Example: retrieving all clothing tags.

SELECT * FROM tags WHERE type = 'Clothing'

These tags are associated with a post through complex relationships. It is recommended to create a new (temporary) table from existing records to retrieve desired structured tags.

post {
  style_id,
  occasion_id,
  [
    garment {
      clothing_id,
      color_id,
      brand_id,
      trend_id
    },
    garment {
      clothing_id,
      color_id,
      brand_id,
      trend_id
    },
    ...
  ],
  [
    tagging { trend_id },
    tagging { trend_id },
    ...
  ],
  [
    color_tagging { color_id },
    color_tagging { color_id },
    ...
  ]
}

Example: retrieving 100 garment-lists by post id.

SELECT
  post_id,
  "(" || GROUP_CONCAT(name) || ")"
FROM (
  SELECT
    post_id,
    "('" ||
    replace(replace(replace(coalesce(lower(colors.name),""), "%2B", '+'), "%27", "'"), "'", "''") || "','" ||
    replace(replace(replace(coalesce(lower(brands.name),""), "%2B", '+'), "%27", "'"), "'", "''") || "','" ||
    replace(replace(replace(coalesce(lower(trends.name),""), "%2B", '+'), "%27", "'"), "'", "''") || "','" ||
    coalesce(clothings.name,"") ||
    "')" AS name
  FROM garments
  LEFT OUTER JOIN tags AS colors
    ON garments.color_id = colors.id
  LEFT OUTER JOIN tags AS brands
    ON garments.brand_id = brands.id
  LEFT OUTER JOIN tags AS clothings
    ON garments.clothing_id = clothings.id
  LEFT OUTER JOIN trends AS trends
    ON garments.trend_id = trends.id
  ) AS garment_tuples
GROUP BY post_id
LIMIT 100

User-user

There are two networks: friends and fans. Both are stored as edges in the table.

friendship, fanship {
  user_id,
  to_id
}

Schema

Following tables exist in the SQLite3 database.

Users

CREATE TABLE "users" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "name" varchar(255) NOT NULL,
  "type" varchar(255) DEFAULT 'User' NOT NULL,
  "street_id" integer,
  "views" integer DEFAULT 0 NOT NULL,
  "chic_points" integer DEFAULT 0 NOT NULL,
  "status" integer DEFAULT 0 NOT NULL,
  "description" text,
  "bookmarks_count" integer DEFAULT 0 NOT NULL,
  "comments_count" integer DEFAULT 0 NOT NULL,
  "friends_count" integer DEFAULT 0 NOT NULL,
  "fans_count" integer DEFAULT 0 NOT NULL,
  "favorite_chictopians_count" integer DEFAULT 0 NOT NULL,
  "created_at" datetime NOT NULL,
  "updated_at" datetime NOT NULL
  );

User information. The status indicates a crawler status in integer. {-1: error, 2: success}.

  • belongs to street
  • has many posts
  • has many comments
  • has many bookmarks
  • has many friendships (user_id, to_id)
  • has many fanships (user_id, to_id)

Posts

CREATE TABLE "posts" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "chictopia_id" integer NOT NULL,
  "user_id" integer,
  "type" varchar(255) DEFAULT 'Post' NOT NULL,
  "status" integer DEFAULT 0 NOT NULL,
  "votes" integer DEFAULT 0 NOT NULL,
  "title" varchar(255),
  "date" date,
  "description" text,
  "style_id" integer,
  "occasion_id" integer,
  "season_id" integer,
  "comments_count" integer DEFAULT 0 NOT NULL,
  "bookmarks_count" integer DEFAULT 0 NOT NULL,
  "created_at" datetime NOT NULL,
  "updated_at" datetime NOT NULL);

Posts in Chictopia. The status indicates a crawler status in integer. {-1: error, 2: success}.

  • belongs to user
  • belongs to tag (style_id, occasion_id)
  • has many photos
  • has many comments
  • has many bookmarks
  • has many garments
  • has many taggings
  • has many color taggings

Streets

CREATE TABLE "streets" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "name" varchar(255) NOT NULL,
  "chictopia_id" integer NOT NULL,
  "users_count" integer DEFAULT 0 NOT NULL
  );

Location names of the user.

  • belongs to user

Garment

CREATE TABLE "garments" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "clothing_id" integer,
  "color_id" integer,
  "brand_id" integer,
  "trend_id" integer,
  "store_id" integer,
  "post_id" integer NOT NULL
);

Structured tags for posts.

  • belongs to post
  • belongs to tag (clothing_id, color_id, brand_id)
  • belongs to trend
  • belongs to store

Tags

CREATE TABLE "tags" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "name" varchar(255) NOT NULL,
  "type" varchar(255) NOT NULL,
  "garments_count" integer DEFAULT 0 NOT NULL
  );

Tag names. Depending on the type, it has posts or garments.

  • has many posts
  • has many garments
  • has many color taggings

Trends

CREATE TABLE "trends" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "name" varchar(255) NOT NULL,
  "chictopia_id" integer NOT NULL,
  "taggings_count" integer DEFAULT 0 NOT NULL,
  "garments_count" integer DEFAULT 0 NOT NULL
  );

Free-form text tags.

  • has many taggings
  • has many posts

Fanships

CREATE TABLE "fanships" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "user_id" integer NOT NULL,
  "to_id" integer NOT NULL,
  "created_at" datetime,
  "updated_at" datetime
  );

User fanship relationships.

  • belongs to user (user_id, to_id)

Friendships

CREATE TABLE "friendships" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "user_id" integer NOT NULL,
  "to_id" integer NOT NULL,
  "created_at" datetime,
  "updated_at" datetime
  );

User friendship relationships.

  • belongs to user (user_id, to_id)

Bookmarks

CREATE TABLE "bookmarks" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "user_id" integer,
  "post_id" integer
  );

User bookmarking.

  • belongs to post
  • belongs to user

Stores

CREATE TABLE "stores" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "uri" varchar(255) NOT NULL,
  "name" varchar(255) NOT NULL,
  "garments_count" integer DEFAULT 0 NOT NULL
  );

Store names found in Chictopia.

  • belongs to post

Comments

CREATE TABLE "comments" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "user_id" integer NOT NULL,
  "post_id" integer NOT NULL,
  "text" text,
  "date" date,
  "comment_id" integer,
  "created_at" datetime NOT NULL,
  "updated_at" datetime NOT NULL
  );

User comments. The comment_id tracks reply to another comment.

  • belongs to post
  • belongs to user
  • belongs to comment

Color taggings

CREATE TABLE "color_taggings" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "post_id" integer,
  "color_id" integer
  );

Many-to-many relationships for colors and posts.

  • belongs to post
  • belongs to tag (color_id)

Taggings

CREATE TABLE "taggings" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "post_id" integer,
  "trend_id" integer,
  "created_at" datetime NOT NULL,
  "updated_at" datetime NOT NULL
  );

Many-to-many relationships for trends and posts.

  • belongs to post
  • belongs to trend

Photos

CREATE TABLE "photos" (
  "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  "post_id" integer,
  "path" varchar(255) NOT NULL,
  "status" integer DEFAULT 0 NOT NULL,
  "file_file_name" varchar(255),
  "file_file_size" integer,
  "file_content_type" varchar(255),
  "file_updated_at" datetime,
  "width" integer,
  "height" integer,
  "created_at" datetime NOT NULL,
  "updated_at" datetime NOT NULL
  );

Photos attached to a post. The status indicates a crawler status in integer. {-1: error, 2: success}.

  • belongs to post
You can’t perform that action at this time.