Memecoin Launch Trace Dataset

This repository contains the dataset and code for the submission: "MELT: A Behavioral Trace Dataset for High-Risk Memecoin Launch Detection"

Environment

Python 3.9
Install packages using:

pip install -r requirements.txt

Part 1: Feature Generation

Step 1: Download Dataset

Because the raw transaction data is very large (>1 TB), we provide only the parsed transaction datasets on Google Drive:

pre_migration_tx.zip — Pre-migration (bonding curve) transactions. Required for feature generation. Download and unzip under MELT/data/tx/ (so the parsed transactions sit under data/tx/...).
bundle.zip — Bundle trace data. Required for feature generation. Download and unzip under MELT/data/ (expands into data/bundle/).
post_migration_tx.zip — Post-migration (Raydium DEX) transactions. Optional, very large, not used by feature generation. Only download if you want to do your own post-migration analysis.

You can skip Step 2 and download our pre-generated feature.pkl directly. Place it at MELT/data/feat/feature.pkl and proceed to Part 2. In this case neither pre_migration_tx.zip nor bundle.zip is needed.
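
Before running anything, it can help to confirm which of the two routes above is ready. The following is a minimal sketch, not part of the repository: the helper name `check_layout` is hypothetical, and the only paths it relies on are the ones named in the download instructions (`data/tx/`, `data/bundle/`, `data/feat/feature.pkl`).

```python
from pathlib import Path

# Paths come from the download instructions; the helper itself is hypothetical.
FEATURE_GEN_INPUTS = ("data/tx", "data/bundle")
PREGENERATED_FEATURES = "data/feat/feature.pkl"


def check_layout(root="MELT"):
    """Report whether feature generation or the pre-generated features can be used."""
    root = Path(root)
    # Directories that feat_gen.py is documented to need but that are absent.
    missing_inputs = [p for p in FEATURE_GEN_INPUTS if not (root / p).is_dir()]
    return {
        "can_generate_features": not missing_inputs,
        "missing_inputs": missing_inputs,
        "has_pregenerated_features": (root / PREGENERATED_FEATURES).is_file(),
    }


if __name__ == "__main__":
    print(check_layout("MELT"))
```

If `has_pregenerated_features` is true, Step 2 can be skipped regardless of the other two fields.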

Step 2: Feature Generation

cd MELT/src
python feat_gen.py

This generates data/feat/feature.pkl from the pre-migration transactions, bundle trace data, and contextual information.

Part 2: High-risk Launch Detection

Step 1: Train a model

cd MELT/src
python train.py --model rf

--model accepts any of rf, xgb, lgbm, lr, and mlp, or one of the time-series models tcn, lstm, gru, and transformer. Prediction CSVs are written to MELT/results/{model}_pred_*.csv.

Common flags:

| Flag | Default | Applies to |
| --- | --- | --- |
| `--model` | `xgb` | all |
| `--epochs` | `20` | DNN models |
| `--batch_size` | `256` | DNN models |
| `--lr` | `1e-3` | DNN models |
| `--seed` | `42` | all (Python `random`, numpy, torch, sklearn `random_state`, DataLoader shuffle) |
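
For readers who want to reproduce the effect of `--seed` in their own scripts, here is a minimal sketch of seeding the generators listed in the table. The helper name `seed_everything` is hypothetical and this is not necessarily how `train.py` does it; numpy and torch are seeded only if they are installed.

```python
import random


def seed_everything(seed=42):
    """Seed Python's RNG, plus numpy and torch when they are available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed; nothing to seed
    # sklearn estimators take the seed explicitly, e.g. random_state=seed.
    # A DataLoader shuffle can be pinned by passing
    # generator=torch.Generator().manual_seed(seed) to the DataLoader.
    return seed


if __name__ == "__main__":
    seed_everything(42)
    print(random.random())
```

Calling the helper twice with the same seed makes subsequent `random.random()` draws repeat, which is a quick way to check that seeding took effect.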

train.py only reports AUPRC (threshold-free) and dumps per-run prediction probabilities to results/. Threshold-based metrics (precision / recall / F1) and ensembling are done in the next step.
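
To make "threshold-free" concrete, the sketch below computes AUPRC as average precision, which is one common estimator of the area under the precision-recall curve. This is an assumption: `train.py` may use a different estimator or a library routine. The function name and the toy labels and scores are illustrative only.

```python
def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest predicted score.
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score so ties form one step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Each step contributes (gain in recall) x (precision at that step).
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

No cutoff is chosen anywhere: the score depends only on how the positives are ranked relative to the negatives.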

Step 2: Evaluate predictions

evaluate.py reads a prediction CSV from Step 1 and prints AUPRC plus a classification_report at one or more probability thresholds.

# single threshold
python evaluate.py --csv lgbm_pred_0.559999.csv --thresholds 0.5

# multi-threshold sweep
python evaluate.py --csv lgbm_pred_0.559999.csv --thresholds 0.3 0.4 0.5 0.6

| Flag | Default | Role |
| --- | --- | --- |
| `--csv` | (none) | prediction CSV (relative paths resolve against `results/`) |
| `--thresholds` | `[0.49]` | one or more probability cutoffs; one `classification_report` per threshold |
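
The threshold sweep can be pictured with the following sketch, which turns probabilities into hard labels at each cutoff and computes precision, recall, and F1 for the positive class. It is illustrative only: the function name is hypothetical, the inputs are plain lists rather than the repository's CSV format, and treating `probability >= threshold` as positive is an assumption about `evaluate.py`.

```python
def threshold_metrics(y_true, y_prob, threshold):
    """Precision, recall, and F1 for the positive class at one cutoff."""
    # Assumption: a sample is predicted positive when its probability >= threshold.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    probs = [0.1, 0.4, 0.35, 0.8]
    for cutoff in (0.3, 0.5):
        print(cutoff, threshold_metrics(labels, probs, cutoff))
```

Lowering the cutoff typically trades precision for recall, which is why a sweep over several thresholds is more informative than a single one.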

`src/`

| File | Role |
| --- | --- |
| `feat_gen.py` | Generates `data/feat/feature.pkl` from parsed transactions, bundle traces, and contextual info. |
| `dataset.py` | Data loading & preprocessing. Reads `feature.pkl` + label CSV, merges by `mint_address`, splits & scales. Exposes `load_dataset()`, `TSDataset`, `ts_collate`. |
| `model.py` | All model definitions and factories (sklearn baselines + MLP / TS deep models). |
| `train.py` | Training entry point. Argparse-driven; reports AUPRC and writes per-run prediction CSVs to `results/`. |
| `evaluate.py` | Evaluates prediction CSVs from `train.py` at one or more thresholds; supports weighted ensembling of multiple CSVs. |
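
The weighted ensembling mentioned for `evaluate.py` is not specified further here. One plausible reading, sketched below under that assumption, is a weighted arithmetic mean of the per-sample probabilities from several prediction files. The function name is hypothetical, and the inputs are plain lists aligned by sample rather than the repository's CSV format.

```python
def weighted_ensemble(prob_lists, weights):
    """Weighted arithmetic mean of several models' per-sample probabilities."""
    if not prob_lists or len(prob_lists) != len(weights):
        raise ValueError("need one weight per list of probabilities")
    n = len(prob_lists[0])
    if any(len(probs) != n for probs in prob_lists):
        raise ValueError("all probability lists must cover the same samples")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the weight total so the result stays a probability.
    return [
        sum(w * probs[i] for probs, w in zip(prob_lists, weights)) / total
        for i in range(n)
    ]


if __name__ == "__main__":
    model_a = [0.2, 0.8]
    model_b = [0.6, 0.4]
    print(weighted_ensemble([model_a, model_b], [1, 3]))
```

The averaged probabilities can then be scored exactly like a single model's output, at any threshold.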

License

This project is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. See LICENSE for the full text.

You are free to use, share, and adapt the material for non-commercial purposes, provided you give appropriate credit (please cite our paper). Commercial use requires separate permission from the authors.
