# Adding a Public Dataset to the Library

## What you will learn in this tutorial

- how to add a new dataset to the pymovements dataset library so that it can be accessed easily
- how to test whether the dataset integration works properly

**This tutorial is offered on three levels. Pick the one that corresponds to your prior knowledge:**

- [**Basic**](#basic): You have never used pymovements before and don't have any programming experience  
  → You will learn how to create an issue that will allow pymovements maintainers to add your dataset

- [**Intermediate**](#intermediate): You have some experience with programming, but are not familiar with Git  
  → You will learn how to create a pull request with a draft of your dataset definition file

- [**Advanced**](#advanced): You are proficient in Python and familiar with Git  
  → You will learn how to add and test your dataset definition on your local machine

## Prerequisites: hosting your dataset

**pymovements does not _host_ datasets**; it only provides an interface for downloading and reading them. Therefore, you will need to upload your dataset somewhere so that pymovements can download it.

- Your data must be openly available and downloadable from a simple link without requiring additional steps like logging in. We recommend [OSF](https://osf.io/) for hosting your files, but other platforms like Zenodo or GitHub will also work.
- Your data must be stored in one of the supported formats: CSV, ASC (EyeLink), IPC/Feather.
- Your dataset may consist of multiple files, including ZIP files containing nested folders.
- Trial information (e.g., trial or participant IDs) may be stored as additional columns in the data files, in the filenames, or as messages in ASC files.
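Before uploading, it can help to check that every file has one of the supported formats and that any IDs stored in the filenames follow a single, consistent pattern. The Python sketch below does both for a hypothetical naming scheme `sub-<participant ID>_trial-<trial ID>`. The helper `describe_file` and the mapping of file extensions to formats are illustrative assumptions, not part of pymovements; check the pymovements documentation for the exact extensions it accepts.

```python
import re
from pathlib import PurePath

# Assumed mapping of file extensions to the supported formats listed above
# (illustrative only; not taken from pymovements).
SUPPORTED_FORMATS = {
    ".csv": "CSV",
    ".asc": "ASC (EyeLink)",
    ".feather": "IPC/Feather",
    ".ipc": "IPC/Feather",
}

# Hypothetical naming scheme: sub-<participant ID>_trial-<trial ID>.<extension>
FILENAME_PATTERN = re.compile(r"sub-(?P<subject_id>\d+)_trial-(?P<trial_id>\d+)")


def describe_file(path):
    """Return the format and the IDs encoded in the filename, or None if either is missing."""
    file = PurePath(path)
    file_format = SUPPORTED_FORMATS.get(file.suffix.lower())
    match = FILENAME_PATTERN.fullmatch(file.stem)
    if file_format is None or match is None:
        return None
    # Convert the captured ID strings (e.g. "03") to integers.
    ids = {key: int(value) for key, value in match.groupdict().items()}
    return {"format": file_format, **ids}


print(describe_file("data/sub-03_trial-12.csv"))  # {'format': 'CSV', 'subject_id': 3, 'trial_id': 12}
print(describe_file("data/sub-03_trial-12.xlsx"))  # None: unsupported format
```

Files that return `None` either use an unsupported format or break the naming scheme, so they are worth fixing (or documenting) before you share the download links.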

## Basic

In order to add your dataset, we will need some information on where and in what format you stored your data, as well as some metadata about your data collection. Specifically, we need:

- [ ] Links to your data files (containing sample-based data, event-based data, and/or aggregated measures)
- [ ] Information on where/how participant IDs, trial IDs, and related data are stored (within the data files, or in the filename)
- [ ] Information on the screen you used to present the stimuli:
  - [ ] Screen size in centimeters
  - [ ] Screen resolution in pixels
  - [ ] Eye-to-screen distance
- [ ] Information on the eye-tracker you used:
  - [ ] Model and manufacturer
  - [ ] Sampling rate
  - [ ] The location of the origin (0, 0) of the gaze coordinates recorded by the eye-tracker (e.g., top left of the screen, center of the screen)
- [ ] Any paper(s) you would like to be referenced by users of your dataset
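One reason the screen details are needed: together with the origin of the gaze coordinates, they allow gaze positions in pixels to be converted into degrees of visual angle. The Python sketch below illustrates this conversion under simplifying assumptions: the origin is at the top left of the screen, the eye is level with the screen center, and the helper `pix_to_deg` and the example screen values are hypothetical. It is not the pymovements implementation.

```python
import math


def pix_to_deg(x_px, y_px, screen_width_px, screen_height_px,
               screen_width_cm, screen_height_cm, distance_cm):
    """Convert a gaze position in pixels to degrees of visual angle.

    Assumes the origin (0, 0) is at the top left of the screen and that
    the eye is level with the screen center.
    """
    # Move the origin from the top left corner to the screen center.
    x_centered_px = x_px - screen_width_px / 2
    y_centered_px = y_px - screen_height_px / 2

    # Convert pixels to centimeters using the physical screen size.
    x_cm = x_centered_px * screen_width_cm / screen_width_px
    y_cm = y_centered_px * screen_height_cm / screen_height_px

    # Horizontal and vertical visual angle relative to the screen center.
    x_deg = math.degrees(math.atan2(x_cm, distance_cm))
    y_deg = math.degrees(math.atan2(y_cm, distance_cm))
    return x_deg, y_deg


# Hypothetical setup: 1280 x 1024 px screen, 38 x 30 cm, viewed from 68 cm.
screen = dict(screen_width_px=1280, screen_height_px=1024,
              screen_width_cm=38.0, screen_height_cm=30.0, distance_cm=68.0)

print(pix_to_deg(640, 512, **screen))   # screen center -> (0.0, 0.0)
print(pix_to_deg(1280, 512, **screen))  # right edge, vertically centered
```

If the eye-tracker reports coordinates relative to the screen center instead, the centering step must be skipped, which is why the origin has to be stated explicitly.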

Once you have all the information, you can [create an issue in the pymovements repository on GitHub](https://github.com/pymovements/pymovements/issues/new?template=DATASET.md). You need a GitHub account to do this. If you prefer not to create a GitHub account, please send the information above to pymovements@python.org instead.

After receiving your information, we will start working on including your dataset. We will likely need some additional information from you, so please keep an eye on the GitHub issue. Once the integration is complete, your dataset will be available in the next release of pymovements. This process may take several weeks.

## Intermediate



## Advanced

