# hicetnunc.xyz open dataset and parsers

[Hic et nunc](http://hicetnunc.xyz) is a new eco-friendly [NFT](https://en.wikipedia.org/wiki/Non-fungible_token) marketplace built on top of the [Tezos](https://en.wikipedia.org/wiki/Tezos) blockchain.

It is especially popular in the generative graphics and data viz community, so I've decided to share the data and all the scripts that I made for the https://hashquine.github.io/hicetnunc rating.

The dataset is published under the [CC BY](https://creativecommons.org/licenses/by/2.0/) license, so it is even possible to sell NFTs that use this data (or modified scripts), as long as the following phrase appears somewhere in the token description: `based on @hashquine dataset`.

Since the hic et nunc servers are already under extreme load due to quick growth, I've reorganized the code so that all data is taken from the Tezos blockchain and IPFS **without any calls** to the [hicetnunc.xyz](http://hicetnunc.xyz) website or API.

## Data sources

* Blockchain transactions from the [TzStats API](https://tzstats.com/docs/api#tezos-api) ([better-call.dev](https://better-call.dev) was not used, so as not to interfere with the hicetnunc backend).
* [IPFS](https://ru.wikipedia.org/wiki/IPFS) files via [cloudflare-ipfs.com](https://cloudflare-ipfs.com/) and [ipfs.io](https://ipfs.io/), depending on mime type (the same sources as the hicetnunc frontend).
* Wallet address owner metadata (name, Twitter etc.) from [api.tzkt.io](https://api.tzkt.io/#operation/Accounts_GetMetadata) (the same source as the hicetnunc frontend).
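
As a rough illustration of the IPFS source above, the sketch below turns an `ipfs://` URI into a gateway URL. It assumes that files are referenced as `ipfs://<CID>` URIs and that both gateways serve content under the common path-style `/ipfs/<CID>` layout. The name `ipfs_gateway_url` is hypothetical, and the rule for choosing a gateway by mime type is not reproduced here, so the gateway is left as a parameter.

```python
# Gateways named in the list above; the default choice is arbitrary.
IPFS_GATEWAYS = ("https://cloudflare-ipfs.com", "https://ipfs.io")


def ipfs_gateway_url(uri, gateway=IPFS_GATEWAYS[1]):
    """Convert an ipfs:// URI into an HTTP gateway URL (hypothetical helper)."""
    prefix = "ipfs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an ipfs:// URI: {uri!r}")
    # Assumption: path-style gateway layout, <gateway>/ipfs/<CID>.
    return f"{gateway}/ipfs/{uri[len(prefix):]}"
```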

## What data is available

* Money data: list of all purchases, prices and commissions.
* Raw files of all NFTs, with their previews and thumbnails, although 3D files and interactive SVG/HTML files are not yet processed properly.
* Author metadata [verified via tzkt.io](https://github.com/hicetnunc2000/hicetnunc/blob/main/FAQ.md#how-to-get-verified), such as the Twitter account address.
* Token transfers: list of changes of token owners, including burns and direct transfers.
* All metadata available for tokens.
* Swaps and mints.

Data not available:
* Everything connected with [hDAO tokens](https://github.com/hicetnunc2000/hicetnunc/blob/main/FAQ.md#what-are-those-little-circles-on-each-post-hdao-what-is-that) and [hDAO feed](https://www.hicetnunc.xyz/hdao). Although all related transactions are already being collected, they are not analysed yet.
* Twitter statistics like the number of followers.
* Direct money transfers between users, when NFT tokens are not transferred in the same transaction.

## Dataset schema

The goal was to simplify data analysis and visualization with a wide range of existing tools, so there are lots of redundant fields, which contain precalculated aggregations and different representations of the same data.

All files have two equivalent versions: JSON and CSV.
* JSON files are dictionaries of dictionaries: they hold the same rows as the CSV files, indexed by the `*_id` field.
* CSV files have commas as delimiters.
* Field values are either numbers or strings; empty values are represented by `-1` or `""`.
* All identifiers are strings.
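
A minimal sketch of reading both versions with the Python standard library is shown below. The helper names and the `token_id` key in the usage note are assumptions made for illustration (the text above only says rows are indexed by a `*_id` field). Note that `csv.DictReader` returns every value as a string, so an empty number appears as `"-1"` in CSV rows.

```python
import csv
import json
from pathlib import Path


def load_json_table(path):
    """Return {row_id: row_dict} from the JSON version of a dataset file."""
    return json.loads(Path(path).read_text("utf-8"))


def load_csv_table(path, id_field):
    """Return {row_id: row_dict} from the CSV version, keyed by id_field."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return {row[id_field]: row for row in csv.DictReader(f)}


def is_empty(value):
    """True for the empty markers described above (-1 or "")."""
    # "-1" covers CSV rows, where every value is read as a string.
    return value in (-1, "-1", "")
```

For example, `load_csv_table('./dataset/tokens.csv', 'token_id')` would build the same kind of mapping as `load_json_table('./dataset/tokens.json')`, assuming the identifier column is called `token_id`.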

Any field that references some event in the blockchain (for example, mint time) has four representations:
* `mint_iso_date` &mdash; string with UTC date and time: `"2021-03-01T15:00:00Z"`,
* `mint_stamp` &mdash; integer Unix timestamp in seconds: `1614610800`,
* `mint_hash` &mdash; string with transaction hash, where event occurred: `"oom5Ju6X9nYpBCi..."`,
* `mint_row_id` &mdash; integer with global unique operation id (internal to TzStats) with that event: `42181049`
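
The first two representations carry the same information, so one can be derived from the other. The sketch below shows the conversion with the standard `datetime` module; the function names are hypothetical, and the ISO format is assumed to always match the `"2021-03-01T15:00:00Z"` pattern shown above.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed fixed format of *_iso_date fields


def iso_to_stamp(iso_date):
    """Convert a *_iso_date string (UTC) to a *_stamp integer in seconds."""
    dt = datetime.strptime(iso_date, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def stamp_to_iso(stamp):
    """Convert a *_stamp integer in seconds to a *_iso_date string (UTC)."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc).strftime(ISO_FORMAT)


print(iso_to_stamp("2021-03-01T15:00:00Z"))  # 1614610800
print(stamp_to_iso(1614610800))              # 2021-03-01T15:00:00Z
```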

Any field that references a set of values (like the set of prices of sold works) has the following aggregations:
* `sold_count` &mdash; values count (excl. zeros),
* `sold_zero_count` &mdash; number of zeros,
* `sold_price_min` &mdash; minimum value (excl. zeros),
* `sold_price_max` &mdash; maximum value,
* `sold_price_sum` &mdash; sum of values,
* `sold_price_avg` &mdash; average value (sum divided by count excl. zeros).
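
To make these definitions concrete, here is a small sketch that computes the six aggregations from a plain list of prices. The function name `aggregate_prices` is hypothetical, and using `-1` for the minimum, maximum and average when there are no non-zero values is an assumption based on the empty-value convention described earlier, not something confirmed by the dataset scripts.

```python
def aggregate_prices(prices, prefix="sold"):
    """Compute the aggregation fields described above for a list of prices."""
    non_zero = [p for p in prices if p != 0]
    count = len(non_zero)
    total = sum(non_zero)  # zeros do not change the sum
    return {
        f"{prefix}_count": count,
        f"{prefix}_zero_count": len(prices) - count,
        # Assumption: -1 marks "no value" when nothing non-zero was sold.
        f"{prefix}_price_min": min(non_zero) if non_zero else -1,
        f"{prefix}_price_max": max(non_zero) if non_zero else -1,
        f"{prefix}_price_sum": total,
        f"{prefix}_price_avg": total / count if count else -1,
    }


print(aggregate_prices([0, 2, 4, 0, 6]))
# {'sold_count': 3, 'sold_zero_count': 2, 'sold_price_min': 2,
#  'sold_price_max': 6, 'sold_price_sum': 12, 'sold_price_avg': 4.0}
```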


## Make readme.md

In [6]:
import json
from pathlib import Path

nb_json = json.loads(Path('./210329_dataset_schema.ipynb').read_text('utf-8'))

assert nb_json['cells'][0]['cell_type'] == 'markdown'
readme_intro = ''.join(nb_json['cells'][0]['source'])

Path('../README.md').write_text(readme_intro + f'''

### [tokens.json](./dataset/tokens.json) and [tokens.csv](./dataset/tokens.csv) &mdash; all NFT tokens

A potentially confusing fact is that in hicetnunc each NFT can have multiple identical instances, which are fungible.
In this document the term "token" refers to the set of all those instances.

The following invariants hold:
<pre>mint_count = author_owns_count + available_count + other_own_count + burn_count
author_sent_count <= other_own_count</pre>

{db_fields_schema_to_md(tokens_db_schema)}

### [addrs.json](./dataset/addrs.json) and [addrs.csv](./dataset/addrs.csv) &mdash; all hicetnunc users

All users who have ever created or owned an NFT token.

{db_fields_schema_to_md(address_db_schema)}

### [sells.json](./dataset/sells.json) and [sells.csv](./dataset/sells.csv) &mdash; all purchases via swaps

The following invariant holds:
<pre>price * count = total_royalties + total_comission + total_seller_income</pre>

{db_fields_schema_to_md(sells_db_schema)}

### [transfers.json](./dataset/transfers.json) and [transfers.csv](./dataset/transfers.csv) &mdash; all token transfers

{db_fields_schema_to_md(transfers_db_schema)}

### [swaps.json](./dataset/swaps.json) and [swaps.csv](./dataset/swaps.csv) &mdash; all swaps ever created

{db_fields_schema_to_md(swaps_db_schema)}

''', 'utf-8')

26088
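
The invariants written into the README template above can be checked row by row once the JSON files are loaded. The following is only a sketch: the function names are hypothetical, the field names are copied as spelled in the template (including `total_comission`), and the `tolerance` parameter is an assumption added in case the money fields are stored as floats.

```python
def check_token_invariants(token):
    """Check the two invariants stated for rows of tokens.json."""
    accounted = (
        token["author_owns_count"]
        + token["available_count"]
        + token["other_own_count"]
        + token["burn_count"]
    )
    return (
        token["mint_count"] == accounted
        and token["author_sent_count"] <= token["other_own_count"]
    )


def check_sell_invariant(sell, tolerance=0):
    """Check price * count against the three income parts of a sells.json row."""
    parts = (
        sell["total_royalties"]
        + sell["total_comission"]  # spelled as in the template
        + sell["total_seller_income"]
    )
    return abs(sell["price"] * sell["count"] - parts) <= tolerance
```

Applied to every row, for example with `all(check_token_invariants(t) for t in tokens.values())`, this gives a quick sanity check after regenerating the dataset.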

In [8]:
datasets_fields = {
    'tokens': tokens_db_schema,
    'addrs': address_db_schema,
    'sells': sells_db_schema,
    'transfers': transfers_db_schema,
    'swaps': swaps_db_schema,
}

import json

dataset_dir = Path('../dataset')
(dataset_dir / 'fields_list.json').write_text(
    json.dumps(datasets_fields, ensure_ascii=False, indent=4),
    'utf-8',
)

18458