<a href="https://colab.research.google.com/github/sisinflab/DnD4Rec-tutorial/blob/main/Hands_on_part_1_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Standard Practices for Data Processing and Multimodal Feature Extraction in Recommendation with DataRec and Ducho (D&D4Rec) (1st Hands-On Session)

⭐ **The 19th ACM Conference on Recommender Systems** ⭐

*Prague (Czech Republic), September 26th, 2025*


<div>
  <img src="http://github.com/sisinflab/DnD4Rec-tutorial/blob/main/images/DD4Rec-logo.jpg?raw=true" alt="d&d4Rec" width="108">
  <img src="https://recsys.acm.org/wp-content/uploads/2024/10/RecSys2025_website_header.jpg" alt="SisInfLab" width="600">
  <img src="https://recsys.acm.org/wp-content/uploads/2024/10/RecSys2025_logo_transparent.png" alt="recsys" width="200">
</div>

👩 Speaker: [Angela Di Fazio](https://sisinflab.poliba.it/people/angela-di-fazio/)

If you use this code for your experiments, please cite our work 🙏

![GitHub Repo stars](https://img.shields.io/github/stars/sisinflab/DataRec)

 <img src="https://github.com/sisinflab/DnD4Rec-tutorial/blob/main/images/datarec_architecture_nobg.png?raw=true"  width="500">



```
@inproceedings{DBLP:conf/sigir/MancinoBF0MPN25,
  author       = {Alberto Carlo Maria Mancino and
                  Salvatore Bufi and
                  Angela Di Fazio and
                  Antonio Ferrara and
                  Daniele Malitesta and
                  Claudio Pomo and
                  Tommaso Di Noia},
  title        = {DataRec: {A} Python Library for Standardized and Reproducible Data
                  Management in Recommender Systems},
  booktitle    = {{SIGIR}},
  pages        = {3478--3487},
  publisher    = {{ACM}},
  year         = {2025}
}
```

# Introduction

In this hands-on session, we will demonstrate how to use DataRec to manage and process recommendation datasets.

The goal is to provide a practical overview of the main modules, enabling you to apply them in your own research experiments.

## Setup

In this section, we will configure our working environment. We will begin by cloning the repository.

In [4]:
!git clone https://github.com/sisinflab/DataRec.git

Cloning into 'DataRec'...
remote: Enumerating objects: 601, done.
remote: Counting objects: 100% (141/141), done.
remote: Compressing objects: 100% (123/123), done.
remote: Total 601 (delta 63), reused 40 (delta 18), pack-reused 460 (from 1)
Receiving objects: 100% (601/601), 4.34 MiB | 16.20 MiB/s, done.
Resolving deltas: 100% (198/198), done.


In [5]:
%cd DataRec

/content/DataRec


Next, we will install the dependencies.

In [6]:
%pip install -r requirements.txt

Collecting py7zr (from -r requirements.txt (line 9))
  Downloading py7zr-1.0.0-py3-none-any.whl.metadata (17 kB)
Collecting appdirs (from -r requirements.txt (line 11))
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting texttable (from py7zr->-r requirements.txt (line 9))
  Downloading texttable-1.7.0-py2.py3-none-any.whl.metadata (9.8 kB)
Collecting pyzstd>=0.16.1 (from py7zr->-r requirements.txt (line 9))
  Downloading pyzstd-0.17.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.5 kB)
Collecting pyppmd<1.3.0,>=1.1.0 (from py7zr->-r requirements.txt (line 9))
  Downloading pyppmd-1.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.4 kB)
Collecting pybcj<1.1.0,>=1.0.0 (from py7zr->-r requirements.txt (line 9))
  Downloading pybcj-1.0.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.7 kB)
Collecting multivolumefile>=0.2.3 (from py7zr->-r requirements.txt (line 9))
  Downloading multivolum

Finally, we can check if everything is working properly.

In [7]:
try:
    import datarec
    print("DataRec imported successfully")
except ImportError:
    print("DataRec import failed")

DataRec imported successfully




## I/O Module - Loading Data

In this section, we will explore how to load datasets in DataRec.

### Built-in Datasets

DataRec includes several commonly used recommendation datasets.

| Dataset Name               | Source |
|----------------------------|--------|
| Alibaba iFashion           | https://drive.google.com/drive/folders/1xFdx5xuNXHGsUVG2VIohFTXf9S7G5veq |
| Amazon Baby                | https://amazon-reviews-2023.github.io |
| Amazon Beauty              | https://amazon-reviews-2023.github.io |
| Amazon Books               | https://amazon-reviews-2023.github.io/ |
| Amazon Clothing            | https://amazon-reviews-2023.github.io/ |
| Amazon Music               | https://amazon-reviews-2023.github.io |
| Amazon Office              | https://amazon-reviews-2023.github.io |
| Amazon Sports and Outdoors | https://amazon-reviews-2023.github.io/ |
| Amazon Toys and Games      | https://amazon-reviews-2023.github.io/ |
| Amazon Video Games         | https://amazon-reviews-2023.github.io/ |
| Ciao                       | https://guoguibing.github.io/librec/datasets.html |
| Epinions                   | https://snap.stanford.edu/data/soc-Epinions1.html |
| Gowalla                    | https://snap.stanford.edu/data/loc-gowalla.html |
| LastFM                     | https://grouplens.org/datasets/hetrec-2011/ |
| MovieLens                  | https://grouplens.org/datasets/movielens/ |
| Tmall                      | https://tianchi.aliyun.com/dataset/53?t=1716541860503 |
| Yelp                       | https://www.yelp.com/dataset |


Built-in datasets can be loaded with just a single line of code.

In [8]:
from datarec.datasets import AmazonOffice


data = AmazonOffice(version='2023').prepare_and_load()
print(data)


Raw files folder missing. Folder created at '/root/.cache/datarec/AmazonOffice/raw'
Downloading data file from https://mcauleylab.ucsd.edu/public_datasets/data/amazon_2023/benchmark/0core/rating_only/Office_Products.csv.gz


100%|██████████| 328M/328M [01:42<00:00, 3.20Mbyte/s]


File downloaded successfully and saved at '/root/.cache/datarec/AmazonOffice/raw/Office_Products.csv.gz'
Checksum verified.
Decompress: '/root/.cache/datarec/AmazonOffice/raw/Office_Products.csv.gz'
File decompressed: '/root/.cache/datarec/AmazonOffice/raw/Office_Products.csv'
                               user_id     item_id  rating      timestamp
0         AFKZENTNBQ7A7V7UXW5JJI6UGRYQ  B079HQK8HG     5.0  1534443278981
1         AFKZENTNBQ7A7V7UXW5JJI6UGRYQ  B0BNNDBH81     5.0  1534444231358
2         AFKZENTNBQ7A7V7UXW5JJI6UGRYQ  B0C3R84FLG     1.0  1561156591217
3         AFKZENTNBQ7A7V7UXW5JJI6UGRYQ  B098K24779     5.0  1589933892398
4         AFKZENTNBQ7A7V7UXW5JJI6UGRYQ  B07ZPB8T4P     1.0  1611642164889
...                                ...         ...     ...            ...
12689344  AFD7M22ZYOV6HZ6GNVZZQUZH4R4A  B0030INLF0     1.0  1447080556000
12689345  AGYDA2QY4QVZB5TR6TUO2PVXWORQ  B07HT28KGD     3.0  1617820460972
12689346  AHOKC2PQ4PQ3CPVLLMMIVTM5DJGQ  B0030INLF0     4

#### Versioning

When a dataset has multiple versions, the `version` argument selects which one to download and load. Here we switch from the 2023 release of Amazon Office to the 2014 one.

In [9]:
from datarec.datasets import AmazonOffice

data = AmazonOffice(version='2014').prepare_and_load()
print(data)

Downloading data file from http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/ratings_Office_Products.csv


100%|██████████| 50.6M/50.6M [00:03<00:00, 13.2Mbyte/s]


File downloaded successfully and saved at '/root/.cache/datarec/AmazonOffice/raw/ratings_Office_Products.csv'
Checksum verified.
                user_id     item_id  rating   timestamp
0        A2UESEUCI73CBO  0078800242     5.0  1374192000
1        A3BBNK2R5TUYGV  0113000316     5.0  1359417600
2         A5J78T14FJ5DU  0113000316     3.0  1318723200
3        A2P462UH5L6T57  043928631X     5.0  1356912000
4        A2E0X1MWNRTQF4  0439340039     1.0  1379721600
...                 ...         ...     ...         ...
1243181  A23UIUEO9PVK1G  B00LQZE2KM     5.0  1405728000
1243182  A3W0IG7LIY99CK  B00LSYQC7C     5.0  1405296000
1243183   AX0OWSOOJA1ZJ  BT008G9O8G     5.0  1392854400
1243184  A17R4GQHT3RW2R  BT008G9O9A     5.0  1365897600
1243185  A12TWUP92WJJHB  BT008G9O9A     3.0  1294963200

[1243186 rows x 4 columns]
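
The two versions printed above appear to store timestamps on different scales: the 2023 values have 13 digits, which is typical of milliseconds since the Unix epoch, while the 2014 values have 10 digits, which is typical of seconds. The sketch below is a hedged illustration using only the standard library, not a DataRec feature; the helper `to_datetime_utc` and its `1e11` threshold are assumptions made for this example.

```python
from datetime import datetime, timezone

# Heuristic threshold (an assumption for this sketch): epoch values above
# 1e11 are treated as milliseconds, smaller ones as seconds.
MILLISECOND_THRESHOLD = 1e11


def to_datetime_utc(timestamp):
    """Convert an epoch timestamp in seconds or milliseconds to a UTC datetime."""
    seconds = timestamp / 1000 if timestamp > MILLISECOND_THRESHOLD else timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# First timestamp printed for each version in the outputs above.
ts_2023 = to_datetime_utc(1534443278981)
ts_2014 = to_datetime_utc(1374192000)

print(ts_2023.date(), ts_2014.date())
```

Checking the scale before any time-based filtering or splitting helps avoid mixing units across dataset versions.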


#### Dataset Info

Each DataRec object exposes basic statistics of its dataset, such as the number of users, items, and interactions, along with the metrics listed in `data.metrics`.

In [10]:
print(f'{data.dataset_name} {data.version_name} has:\n\n'
      f'{data.n_users} users\n'
      f'{data.n_items} items\n'
      f'{data.transactions} ratings\n')

print(f'=== {data.dataset_name} {data.version_name} metrics === \n')
for metric in data.metrics:
    print(metric, getattr(data, metric))


AmazonOffice 2014 has:

909314 users
130006 items
1243186 ratings

=== AmazonOffice 2014 metrics === 

transactions 1243186
space_size 343.8259383525333
space_size_log 2.5363386369189533
shape 6.99440025844961
shape_log 0.8447504814279222
density 1.0516199996182244e-05
density_log -4.97814116311223
gini_item 0.7849995387371369
gini_user 0.23947608424872227
ratings_per_user 1.367169096703669
ratings_per_item 9.562527883328462


### Custom Datasets

DataRec can also handle custom datasets.

In [11]:
import pandas as pd
from datarec.io import read_tabular
from datarec.data.dataset import DataRec


df = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3"],
    "item": ["i1", "i2", "i2", "i3"],
    "rating": [5, 3, 4, 2],
    "timestamp": [1694300000, 1694300500, 1694310000, 1694320000],
})

df_path = '../dummy.csv'
df.to_csv(df_path, index=False)


raw = read_tabular(
    filepath=df_path,
    sep=",",
    user_col="user",
    item_col="item",
    rating_col="rating",
    timestamp_col="timestamp"
)

dummy_dr = DataRec(rawdata=raw, dataset_name='dummy', version_name='1')

print(f'==== {dummy_dr.dataset_name} {dummy_dr.version_name} ===\n')
print(dummy_dr)

print(f'\n{dummy_dr.dataset_name} {dummy_dr.version_name} has:\n\n'
      f'{dummy_dr.n_users} users\n'
      f'{dummy_dr.n_items} items\n'
      f'{dummy_dr.transactions} ratings\n')

print(f'==== {dummy_dr.dataset_name} {dummy_dr.version_name} metrics ===\n')

for metric in dummy_dr.metrics:
    print(metric, getattr(dummy_dr, metric))

==== dummy 1 ===

  user_id item_id  rating   timestamp
0      u1      i1       5  1694300000
1      u1      i2       3  1694300500
2      u2      i2       4  1694310000
3      u3      i3       2  1694320000

dummy 1 has:

3 users
3 items
4 ratings

==== dummy 1 metrics ===

transactions 4
space_size 0.003
space_size_log -2.5228787452803374
shape 1.0
shape_log 0.0
density 0.4444444444444444
density_log -0.35218251811136253
gini_item 0.16666666666666666
gini_user 0.16666666666666666
ratings_per_user 1.3333333333333333
ratings_per_item 1.3333333333333333
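
To make the metrics above more concrete, the following sketch recomputes several of them for the same four dummy interactions using only the standard library. The formulas are inferred from the printed values (for example, `space_size` as the square root of users times items, divided by 1000) and are assumptions, not DataRec's actual implementation; the helper names `gini` and `basic_metrics` are hypothetical.

```python
import math
from collections import Counter

# Same toy interactions as the dummy dataset above, as (user, item) pairs.
interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i3")]


def gini(counts):
    """Gini coefficient of non-negative counts (0 means perfectly even)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    weighted = sum((2 * rank - n - 1) * x for rank, x in enumerate(xs, start=1))
    return weighted / (n * total)


def basic_metrics(pairs):
    """Compute simple dataset characteristics from (user, item) pairs."""
    user_counts = Counter(user for user, _ in pairs)
    item_counts = Counter(item for _, item in pairs)
    n_users, n_items, n = len(user_counts), len(item_counts), len(pairs)
    return {
        "transactions": n,
        "space_size": math.sqrt(n_users * n_items) / 1000,
        "shape": n_users / n_items,
        "density": n / (n_users * n_items),
        "gini_item": gini(item_counts.values()),
        "gini_user": gini(user_counts.values()),
        "ratings_per_user": n / n_users,
        "ratings_per_item": n / n_items,
    }


metrics = basic_metrics(interactions)
for name, value in metrics.items():
    print(name, value)
```

The `_log` variants printed by DataRec are consistent with the base-10 logarithm of the corresponding metric.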


## Processing Module

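We now move to the data preprocessing phase. DataRec provides several processing operations; here we first filter out duplicated interactions and then apply an iterative 5-core filter to the Amazon Office 2014 data loaded above.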

In [12]:
from datarec.processing.rating import FilterOutDuplicatedInteractions

filter_duplicate = FilterOutDuplicatedInteractions()
data_no_duplicate = filter_duplicate.run(data)
print(data_no_duplicate)

print(f"Filtered dataset: {data_no_duplicate.n_users} users, {data_no_duplicate.n_items} items, {data_no_duplicate.transactions} interactions.")

Running filter-out duplicated interactions with strategy first
Filtering DataRec: AmazonOffice
                       user_id     item_id  rating   timestamp
883414   A00001483M88NBD66LEP0  B004WPCQKG     1.0  1353283200
60717     A0001028APITAYQ44NF3  B00006IDP4     5.0  1396310400
1085080  A0002382258OFJJ2UYNTR  B0090684TE     5.0  1358380800
254198   A00031441FXF9AOR9AJK2  B000IV32JW     1.0  1389139200
826647   A00034361UHTFM5E7KU8Q  B004HDY822     5.0  1356739200
...                        ...         ...     ...         ...
1000212          AZZYW4YOE1B6E  B0071EZEEK     5.0  1388534400
1194737          AZZYW4YOE1B6E  B00DG6EGKK     5.0  1404172800
1105634          AZZZ3LGTCGUZF  B009KBGU7M     5.0  1361491200
1105872          AZZZ3LGTCGUZF  B009KBLYEG     5.0  1361491200
922063           AZZZZXVAOWWME  B005D5M12M     1.0  1362182400

[1243186 rows x 4 columns]
Filtered dataset: 909314 users, 130006 items, 1243186 interactions.
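
In the output above the number of interactions is unchanged (1243186 before and after), so this dataset version apparently contains no duplicated user-item pairs. To show what such a filter does when duplicates exist, here is a minimal sketch on toy data. It assumes that the "first" strategy keeps the first row encountered for each (user, item) pair; the function name `keep_first_interaction` is hypothetical and this is not DataRec's implementation.

```python
def keep_first_interaction(rows):
    """Keep only the first row seen for each (user, item) pair."""
    seen = set()
    kept = []
    for user, item, rating in rows:
        if (user, item) not in seen:
            seen.add((user, item))
            kept.append((user, item, rating))
    return kept


# Toy rows (hypothetical data): u1 rated i1 twice.
rows = [("u1", "i1", 5.0), ("u1", "i1", 3.0), ("u2", "i1", 4.0)]
deduplicated = keep_first_interaction(rows)
print(deduplicated)
```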


In [13]:
from datarec.processing.kcore import UserItemIterativeKCore

k_core_filter = UserItemIterativeKCore(cores=5)
data_filtered = k_core_filter.run(data_no_duplicate)
print(data_filtered)

print(f"Dataset after iterative k-Core: {data_filtered.n_users} users, {data_filtered.n_items} items, {data_filtered.transactions} interactions.")

                       user_id     item_id  rating   timestamp
87024    A00473363TJ8YSZ3YAGG9  B00007E7D2     2.0  1387843200
800137   A00473363TJ8YSZ3YAGG9  B004APM26Q     5.0  1357430400
1004543  A00473363TJ8YSZ3YAGG9  B0073W70BK     4.0  1357430400
1037945  A00473363TJ8YSZ3YAGG9  B007ZYF266     4.0  1387843200
1188324  A00473363TJ8YSZ3YAGG9  B00D51XMLU     4.0  1387843200
...                        ...         ...     ...         ...
309918           AZZD30PYJVGI7  B000SAF07K     4.0  1368748800
334095           AZZD30PYJVGI7  B000VKUXHY     5.0  1370563200
364414           AZZD30PYJVGI7  B00125Q75Y     5.0  1359417600
832889           AZZD30PYJVGI7  B004I9CZ2U     5.0  1364774400
882217           AZZD30PYJVGI7  B004W7IOV4     5.0  1359417600

[53258 rows x 4 columns]
Dataset after iterative k-Core: 4905 users, 2420 items, 53258 interactions.
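
The filter above shrinks the dataset from 909314 users to 4905. A minimal sketch of the general iterative k-core idea follows, assuming the usual definition: users and items with fewer than `k` interactions are removed repeatedly until none remain. The function `iterative_k_core` is a hypothetical standard-library illustration, not DataRec's code.

```python
from collections import Counter


def iterative_k_core(pairs, k):
    """Repeatedly drop interactions of users or items with fewer than k rows."""
    pairs = list(pairs)
    while True:
        user_counts = Counter(user for user, _ in pairs)
        item_counts = Counter(item for _, item in pairs)
        kept = [
            (user, item)
            for user, item in pairs
            if user_counts[user] >= k and item_counts[item] >= k
        ]
        if len(kept) == len(pairs):  # nothing removed: the k-core is stable
            return kept
        pairs = kept


# Toy data: i3 has a single interaction, and removing it pushes u3 below k.
pairs = [
    ("u1", "i1"), ("u1", "i2"),
    ("u2", "i1"), ("u2", "i2"),
    ("u3", "i2"), ("u3", "i3"),
]
core = iterative_k_core(pairs, k=2)
print(core)
```

The toy data shows why iteration is needed: a single pass removes only `("u3", "i3")`, after which `u3` no longer meets the threshold and a second pass is required.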


## Splitting Module

After preprocessing, we need to split the dataset into training, validation, and test sets.

In [14]:
from datarec.splitters import RandomHoldOut

splitter = RandomHoldOut(test_ratio=0.2, val_ratio=0.1, seed=42)
split_result = splitter.run(data_filtered)

train_data = split_result['train']
val_data = split_result['val']
test_data = split_result['test']

print(f'=== Train ===\n\n'
      f'{train_data}\n\n'
      f'=== Validation ===\n\n'
      f'{val_data}\n\n'
      f'=== Test ===\n\n'
      f'{test_data}\n\n')

print(f"Train set has: {train_data.n_users} users, {train_data.n_items} items, {train_data.transactions} interactions.")
print(f"Validation set has: {val_data.n_users} users, {val_data.n_items} items, {val_data.transactions} interactions.")
print(f"Test set has: {test_data.n_users} users, {test_data.n_items} items, {test_data.transactions} interactions.")

=== Train ===

                user_id     item_id  rating   timestamp
1177236  A17CP110C6E9KF  B00CLV8ZIU     3.0  1384387200
469690    AMS2CPERWN7JV  B001GBKTGM     5.0  1229904000
722610   A3H9FJL67HJA3D  B003O3EYTI     4.0  1404086400
874679   A2974R9BTPZPOJ  B004TQ0O66     5.0  1313452800
154849   A1FK6IQ111SJDR  B0006SV7Q2     5.0  1376006400
...                 ...         ...     ...         ...
1225446   ACJ9N7ED37HXS  B00G2UD2P2     5.0  1393977600
278227   A13E849LQCS1BN  B000MFHX3U     4.0  1335484800
844599   A1PI8VBCXXSGC7  B004M23XH4     4.0  1299110400
906072    AOTY596BG2YX7  B0052YQCGK     5.0  1382745600
1043939  A3T1LD0C65QCWK  B0085IPZGS     3.0  1349308800

[38345 rows x 4 columns]

=== Validation ===

                user_id     item_id  rating   timestamp
27985    A3M174IC0VXOS2  B0000538AC     5.0  1290816000
570316   A1V31KX83H4M18  B002E9E358     5.0  1277769600
765196   A1XKS19IPXNWLK  B0040FFNXK     5.0  1404691200
1150880  A1BJOHHLG0D965  B00B80SOWM     5.
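
The 38345 training rows printed above are close to 53258 × 0.8 × 0.9, which suggests (but does not confirm) that the validation ratio is applied to what remains after the test split. The sketch below illustrates a seeded random hold-out under that assumption; `random_hold_out` is a hypothetical helper built on the standard library, not DataRec's `RandomHoldOut`.

```python
import random


def random_hold_out(rows, test_ratio, val_ratio, seed):
    """Shuffle rows with a fixed seed, then carve out test and validation sets.

    Assumption for this sketch: the validation ratio is applied to the rows
    left after removing the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_ratio)
    test, remainder = shuffled[:n_test], shuffled[n_test:]

    n_val = round(len(remainder) * val_ratio)
    val, train = remainder[:n_val], remainder[n_val:]
    return {"train": train, "val": val, "test": test}


rows = list(range(100))  # stand-in for 100 interaction rows
split = random_hold_out(rows, test_ratio=0.2, val_ratio=0.1, seed=42)
print({name: len(part) for name, part in split.items()})
```

Fixing the seed makes the partition reproducible: calling the function again with the same arguments returns the same three sets.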

## I/O Module - Exporting Results

Once processed, datasets can be exported in various formats (tabular, framework-compatible, etc.).

In [15]:
from datarec.io import write_tabular

file_path = '../reviews_Office_Products_5.tsv'
write_tabular(data_filtered.to_rawdata(), path=file_path, sep='\t', header=False, timestamp=False)

A dataset has been stored at '../reviews_Office_Products_5.tsv'


In [16]:
%cat ../reviews_Office_Products_5.tsv

Streaming output truncated to the last 5000 lines.
AM6X6BHEO5U19	B000SAF07K	4.0
AM6X6BHEO5U19	B0076BXEVI	5.0
AM78WFHEBDBGM	B00006I58N	5.0
AM78WFHEBDBGM	B00006RSP1	5.0
AM78WFHEBDBGM	B0000AQNKL	5.0
AM78WFHEBDBGM	B0002ABB6K	5.0
AM78WFHEBDBGM	B000GFV00I	1.0
AM78WFHEBDBGM	B000J0B9OW	5.0
AM78WFHEBDBGM	B000VXO4L2	4.0
AM78WFHEBDBGM	B0010Z3LGO	1.0
AM78WFHEBDBGM	B001RU18D0	1.0
AM78WFHEBDBGM	B002AMW2ZC	5.0
AM78WFHEBDBGM	B002FB63EO	1.0
AM78WFHEBDBGM	B004I2GFNC	5.0
AM78WFHEBDBGM	B004TGHIS8	1.0
AM78WFHEBDBGM	B007P8DKZ2	5.0
AM78WFHEBDBGM	B00CPXCVIO	5.0
AM8J1UVGVYAL4	B00006IFKU	4.0
AM8J1UVGVYAL4	B0009F3P3U	4.0
AM8J1UVGVYAL4	B000ZHB2HS	4.0
AM8J1UVGVYAL4	B001CE3AQY	5.0
AM8J1UVGVYAL4	B001JQLHS8	3.0
AM8J1UVGVYAL4	B001PME0VM	4.0
AM8J1UVGVYAL4	B00347A85S	5.0
AM8J1UVGVYAL4	B00347A8GC	5.0
AM8J1UVGVYAL4	B004DM8VKW	5.0
AM8J1UVGVYAL4	B004YGBIVQ	3.0
AM8J1UVGVYAL4	B0052S3ACK	5.0
AM8J1UVGVYAL4	B00AQ2BZZQ	3.0
AM8J1UVGVYAL4	B00BMLQSNO	5.0
AM8J1UVGVYAL4	B00COHAJXA	3.0
AM8J1UVGVYAL4	B00E3KP7HO	3.0
AM8J1UVG
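
The exported file shown above has three tab-separated columns and no header row, so column names must be supplied when reading it back. A minimal sketch with the standard `csv` module follows; the column names `user_id`, `item_id`, and `rating` follow the DataRec printouts, and the two sample rows are copied from the output above.

```python
import csv
import io

# First two rows of the exported file shown above (tab-separated, no header).
exported = "AM6X6BHEO5U19\tB000SAF07K\t4.0\nAM6X6BHEO5U19\tB0076BXEVI\t5.0\n"

# Column names are supplied by the reader because the file has no header row.
reader = csv.DictReader(
    io.StringIO(exported),
    fieldnames=["user_id", "item_id", "rating"],
    delimiter="\t",
)
interactions = [
    {"user_id": r["user_id"], "item_id": r["item_id"], "rating": float(r["rating"])}
    for r in reader
]
print(interactions)
```

To read the real file instead, replace `io.StringIO(exported)` with an open file handle, for example `open(file_path, newline="")`.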

Datasets can also be exported for a specific recommendation framework. Here we export the training, validation, and test splits for the Elliot framework.

In [17]:
from datarec.io import FrameworkExporter

elliot_path = '../elliot'
exporter = FrameworkExporter(output_path=elliot_path)
exporter.to_elliot(train_data=train_data, test_data=test_data, val_data=val_data)

A dataset has been stored at '/content/elliot/train.tsv'
A dataset has been stored at '/content/elliot/test.tsv'
A dataset has been stored at '/content/elliot/validation.tsv'
If you are going to use Elliot don't forget to cite the paper!
Paper: 'Elliot: a Comprehensive and Rigorous Framework for Reproducible Recommender Systems Evaluation'
DOI: https://doi.org/10.1145/3404835.3463245
Bib text from dblp.org:
 
            @inproceedings{DBLP:conf/sigir/AnelliBFMMPDN21,
              author       = {Vito Walter Anelli and
                              Alejandro Bellog{'{\i}}n and
                              Antonio Ferrara and
                              Daniele Malitesta and
                              Felice Antonio Merra and
                              Claudio Pomo and
                              Francesco Maria Donini and
                              Tommaso Di Noia},
              editor       = {Fernando Diaz and
                              Chirag Shah and
            

DataRec also provides a basic configuration file for the Elliot framework.

In [18]:
%cd ../elliot
%cat datarec_config.yml

/content/elliot

experiment:
  dataset: datarec2elliot
  data_config:
    strategy: dataset
    dataset_path: /content/elliot/elliot
  splitting:
    strategy: fixed
    train_path: /content/elliot/train.tsv
    validation_path: /content/elliot/validation.tsv
    test_path: /content/elliot/test.tsv
  models:
    ItemKNN:
      meta:
        hyper_opt_alg: grid
        save_recs: True
      neighbors: [50, 100]
      similarity: cosine
  evaluation:
    simple_metrics: [nDCG]
  top_k: 10

### Automatic Pipeline

DataRec allows you to record all operations performed on a dataset into a pipeline, which can later be re-executed.

In [19]:
print(f"Pipeline length: {len(train_data.pipeline.steps)}\n")

config_filepath = "../experiment_config.yaml"
print(f"Saving pipeline configuration in '{config_filepath}'...")
train_data.save_pipeline(config_filepath)
print(f"Pipeline saved in '{config_filepath}'")

Pipeline length: 5

Saving pipeline configuration in '../experiment_config.yaml'...
Saving pipeline to ../experiment_config.yaml
Pipeline correctly saved to ../experiment_config.yaml
Pipeline saved in '../experiment_config.yaml'


In [20]:
%cat ../experiment_config.yaml

pipeline:
- name: load
  operation: AmazonOffice
  params:
    version: '2014'
- name: process
  operation: FilterOutDuplicatedInteractions
  params:
    keep: first
    random_seed: 42
- name: process
  operation: UserItemIterativeKCore
  params:
    cores: 5
- name: split
  operation: RandomHoldOut
  params:
    seed: 42
    test_ratio: 0.2
    val_ratio: 0.1
- name: export
  operation: Elliot
  params:
    item: true
    output_path: ../elliot
    rating: true
    timestamp: false
    user: true
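
The saved configuration is an ordered list of steps, each with a `name`, an `operation`, and `params`. The sketch below illustrates the general record-and-replay idea with a hypothetical registry of toy operations. It does not reproduce DataRec's `Pipeline` class, and every function name in it is invented for this example.

```python
# Hypothetical toy operations standing in for real processing steps.
def drop_duplicates(rows):
    return list(dict.fromkeys(rows))


def min_rating(rows, threshold):
    return [row for row in rows if row[2] >= threshold]


# Registry mapping operation names (as stored in a config) to callables.
OPERATIONS = {"drop_duplicates": drop_duplicates, "min_rating": min_rating}

# Recorded steps, mirroring the name/operation/params layout of the YAML above.
recorded_steps = [
    {"name": "process", "operation": "drop_duplicates", "params": {}},
    {"name": "process", "operation": "min_rating", "params": {"threshold": 4}},
]


def replay(rows, steps):
    """Apply each recorded step in order, looking operations up by name."""
    for step in steps:
        operation = OPERATIONS[step["operation"]]
        rows = operation(rows, **step["params"])
    return rows


rows = [("u1", "i1", 5), ("u1", "i1", 5), ("u2", "i2", 3)]
result = replay(rows, recorded_steps)
print(result)
```

Storing parameters such as seeds alongside each step is what allows a replay to produce the same output as the original run.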


We can now re-run the saved pipeline from the configuration file. The previously exported files are renamed first, so that the reproduced ones can be compared against them.

In [21]:
import os
from datarec.pipeline.pipeline import Pipeline

os.rename('train.tsv', 'original_train.tsv')
os.rename('test.tsv', 'original_test.tsv')
os.rename('validation.tsv', 'original_validation.tsv')

print(f"Loading pipeline from '{config_filepath}'...")
repro_pipeline = Pipeline.from_yaml(config_filepath)
print("Loading completed.")
print(f"Number of pipeline steps to reproduce: {len(repro_pipeline.steps)}")

repro_pipeline.apply()

Loading pipeline from '../experiment_config.yaml'...
Loading completed.
Number of pipeline steps to reproduce: 5


 --- Reproducing Pipeline --- 


Pipeline step load.
Loading dataset: <class 'datarec.datasets.amazon_office.amz_office.AmazonOffice'>.
Checksum verified.
Pipeline step process.
Applying <class 'datarec.processing.rating.FilterOutDuplicatedInteractions'>.
Running filter-out duplicated interactions with strategy first
Filtering DataRec: AmazonOffice
Pipeline step process.
Applying <class 'datarec.processing.kcore.UserItemIterativeKCore'>.
Pipeline step split.
Applying <class 'datarec.splitters.uniform.hold_out.RandomHoldOut'>.
Pipeline step export.
Exporting dataset for Elliot.
A dataset has been stored at '/content/elliot/train.tsv'
A dataset has been stored at '/content/elliot/test.tsv'
A dataset has been stored at '/content/elliot/validation.tsv'
If you are going to use Elliot don't forget to cite the paper!
Paper: 'Elliot: a Comprehensive and Rigorous Framework for Repr

Finally, we check that the reproduced training file is byte-for-byte identical to the original one.

In [22]:
original = "original_train.tsv"
reproduced = "train.tsv"

with open(original, "rb") as fo, open(reproduced, "rb") as fr:
    identical = fo.read() == fr.read()

print("Files are identical:", identical)

Files are identical: True
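
Reading both files fully into memory, as in the cell above, works well for small exports. For larger files, comparing digests computed in chunks is a common alternative. The sketch below uses SHA-256 on two temporary files that stand in for the original and reproduced exports; the helper name `file_digest` is an assumption made for this example.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Two temporary files with the same content stand in for the original and
# reproduced exports.
with tempfile.TemporaryDirectory() as tmp_dir:
    original = os.path.join(tmp_dir, "original_train.tsv")
    reproduced = os.path.join(tmp_dir, "train.tsv")
    for path in (original, reproduced):
        with open(path, "w") as handle:
            handle.write("u1\ti1\t5.0\n")
    identical = file_digest(original) == file_digest(reproduced)

print("Files are identical:", identical)
```

The same check can be repeated for the test and validation files.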


## Resources

### DataRec

[DataRec Repository](https://github.com/sisinflab/DataRec)

[DataRec Documentation Website](https://sisinflab.github.io/DataRec/)

[DataRec Original Paper](https://dl.acm.org/doi/10.1145/3726302.3730320)

### Tutorial

[D&D4Rec Repository](https://github.com/sisinflab/DnD4Rec-tutorial)

[D&D4Rec Website](https://sites.google.com/view/dd4rec-tutorial/home)


## Authors

- Alberto Carlo Maria Mancino (alberto.mancino@poliba.it)
- Salvatore Bufi (salvatore.bufi@poliba.it)
- Angela Di Fazio (angela.difazio@poliba.it)
- Daniele Malitesta (daniele.malitesta@centralesupelec.fr)
- Antonio Ferrara (antonio.ferrara@poliba.it)
- Claudio Pomo (claudio.pomo@poliba.it)
- Tommaso Di Noia (tommaso.dinoia@poliba.it)