# RevChem

Have you ever wanted to educate young minds with visual media? Are you a science instructor?
Well, look no further: the RevChem project was made with you in mind!

## Purpose
`RevChem` is the software library that processes data exported from RealEye and Tobii eye-tracking systems.
We are sharing our tools in the spirit of transparency, scientific reproducibility, and friendship, to support your work with the timeseries aspects of video data that may inform and empower you as an educator or curriculum designer.

## What's inside
- `RevChem/`: The portable source code library you would use to run your own analyses. This is what gets installed when you install this project.
- `nbs/`: The directory of notebooks behind this project. Some export code; others simply demonstrate the work we are trying to do.


## Initial Purpose

RevChem is a Python software library.
It's designed for processing and analyzing eye-tracking data obtained from RealEye and Tobii eye-tracking systems.
The primary goal of this project is to support chemistry educators and curriculum designers.
It helps them understand how students engage with visual media, such as instructional videos and chemical representations.
By providing tools to analyze gaze patterns, fixations, and areas of interest (AOIs), RevChem aims to offer insights into student attention and cognitive processes during learning.

This project is shared in the spirit of transparency, scientific reproducibility, and collaboration.
We hope it empowers educators to make data-informed decisions for curriculum improvement.

## Code Structure

The RevChem project is organized as follows:

*   **`RevChem/`**: This directory contains the core Python library.
    It includes modules for common utility functions (`common.py`), processing RealEye data (`realeye.py`), and handling Tobii data including alignment with RealEye data (`tobii.py`).
    This is the package that gets installed when you use the project.
*   **`nbs/`**: This directory houses Jupyter notebooks that are central to the project's development and documentation.
    These notebooks are used for:
    *   Literate programming via `nbdev` to develop the `RevChem` library.
    *   Data exploration and initial analysis of eye-tracking datasets.
    *   Documenting the data processing workflow, from raw data ingestion to cleaned, analyzable datasets.
    *   Showcasing key analyses, findings, and addressing challenges like timeseries synchronization.
    *   Key notebooks include explorations of Tobii and RealEye data, alignment strategies, and control trial analyses.
    *   `03_narrative.ipynb` is intended for capturing research questions and their interpretations thus far.
*   **`_proc/`**: Contains additional Jupyter notebooks used for specific, often earlier, data processing tasks and experiments.
*   **`main.py`**: A basic script, currently serving as a placeholder and not a primary functional entry point.
*   **`settings.ini`**: Configuration file for the `nbdev` project, containing metadata and build settings.
*   **`pyproject.toml` and `setup.py`**: Define project dependencies and packaging information.
    (Note: this project uses both, though `pyproject.toml` is the more modern standard for declaring dependencies.)

## Usage of nbdev

This project utilizes `nbdev` for literate programming.
The core library code in the `RevChem/` directory is primarily developed and exported from the Jupyter notebooks located in the `nbs/` directory.
Code cells marked with `#| export` in the notebooks are automatically transferred to the corresponding Python modules in the `RevChem` library during the build process.
This approach allows for clear documentation and code development within a single environment.
(Note: This can sometimes lead to perceived code duplication between notebooks and the library, which is a characteristic of the `nbdev` workflow).
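
As a rough illustration of the export mechanism described above, the sketch below shows two notebook cells flattened into one listing. It assumes a notebook that targets `common.py`; the `clamp` helper is hypothetical and is not part of RevChem.

```python
# --- notebook cell 1 ---
#| default_exp common
# The directive above names the module that exported cells are written to.

# --- notebook cell 2 ---
#| export
def clamp(value: float, low: float, high: float) -> float:
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(value, high))

# --- notebook cell 3 ---
# No `#| export` directive here, so this cell stays in the notebook
# and acts as documentation or a lightweight test.
assert clamp(1.5, 0.0, 1.0) == 1.0
```

Running `nbdev_export` from the repository root regenerates the library modules from the notebooks.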

## Data

RevChem is designed to work with eye-tracking data, including:
*   Gaze points (X, Y coordinates)
*   Timestamps
*   Fixations
*   Areas of Interest (AOIs)
*   Video interaction data

The data is sourced from two main eye-tracking platforms:
*   **RealEye**: Exports data as a raw string within CSV files.
    This string typically contains a list of 6-tuples: `(gaze_X, gaze_Y, time_ms_since_start, scroll_Y_offset, mouse_X, mouse_Y)`.
    A 7th element may be present if a mouse click occurred.
*   **Tobii**: Exports data as Tab-Separated Value (TSV) files, often per participant per stimulus, containing detailed columnar data.
    Participant-level files can have different schemas than aggregated exports.
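
To make the RealEye tuple layout concrete, here is a minimal parsing sketch. It assumes the raw string is a valid JSON array of arrays, which may not match the actual export syntax; `GazeSample` and `parse_raw_gaze` are illustrative names, not RevChem's API.

```python
import json
from typing import NamedTuple, Optional


class GazeSample(NamedTuple):
    gaze_x: float
    gaze_y: float
    time_ms: float   # milliseconds since the start of the recording
    scroll_y: float
    mouse_x: float
    mouse_y: float
    click: Optional[object] = None  # 7th element, present only on a mouse click


def parse_raw_gaze(raw: str) -> list[GazeSample]:
    """Parse a raw gaze string into samples (assumes JSON array-of-arrays)."""
    samples = []
    for row in json.loads(raw):
        if len(row) not in (6, 7):
            raise ValueError(f"expected 6 or 7 elements, got {len(row)}: {row!r}")
        samples.append(GazeSample(*row))
    return samples


# Made-up example data: the second sample carries a click marker.
raw = "[[512, 300, 0, 0, 40, 50], [515, 302, 33, 0, 40, 50, 1]]"
samples = parse_raw_gaze(raw)
```

Validating the element count up front keeps malformed rows from silently shifting fields into the wrong columns.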

A significant part of the RevChem library is its functionality for parsing these varied formats.
Crucially, it also aligns and synchronizes the timeseries data from RealEye and Tobii systems.
These systems operate at different sampling frequencies (RealEye at ~30 Hz, Tobii at higher frequencies such as 120 Hz or 250 Hz).
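
One way to picture the alignment problem is nearest-timestamp matching between a slow and a fast series. The sketch below is not RevChem's implementation; it assumes both series are already expressed in milliseconds on a shared clock, that the fast series is sorted and non-empty, and all function names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional


def nearest_index(sorted_times: list[float], t: float) -> int:
    """Return the index of the element in `sorted_times` closest to `t`."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    # Pick the closer neighbour; ties go to the earlier sample.
    return i if sorted_times[i] - t < t - sorted_times[i - 1] else i - 1


def align_nearest(
    slow_ms: list[float], fast_ms: list[float], tolerance_ms: float
) -> list[tuple[float, Optional[int]]]:
    """Pair each slow timestamp with the index of the nearest fast timestamp.

    The index is None when the nearest fast sample is farther than `tolerance_ms`.
    """
    pairs = []
    for t in slow_ms:
        j = nearest_index(fast_ms, t)
        pairs.append((t, j if abs(fast_ms[j] - t) <= tolerance_ms else None))
    return pairs


# Synthetic timestamps: roughly 30 Hz versus exactly 120 Hz.
realeye_ms = [0.0, 33.0, 67.0, 100.0]
tobii_ms = [i * 1000 / 120 for i in range(13)]
pairs = align_nearest(realeye_ms, tobii_ms, tolerance_ms=5.0)
```

In a dataframe workflow, an as-of join (for example polars' `join_asof` with `strategy="nearest"`) expresses the same idea.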

## Scientific Contributions

The RevChem project aims to make the following scientific contributions to the Chemistry Education community:

*   **Enhanced Data Analysis Tools**: Provides robust tools for processing complex eye-tracking datasets from multiple platforms.
    It addresses challenges such as data parsing, timeseries alignment, and synchronization.
    This enables more sophisticated analysis of how students visually engage with chemistry-related materials.
*   **Transparency and Reproducibility**: By sharing the codebase and documenting the methodologies within Jupyter notebooks, RevChem promotes transparency and reproducibility.
    This is crucial for eye-tracking research within chemistry education.
*   **Insights into Student Learning**: Facilitates deeper investigation into student attention patterns, cognitive load, and problem-solving strategies.
    This applies when students interact with visual stimuli (e.g., chemical animations, diagrams, textual explanations).
*   **Curriculum and Instructional Design**: Empowers educators and curriculum designers with data-driven insights to:
    *   Identify effective (and ineffective) visual elements in instructional materials.
    *   Understand how different representations of chemical concepts impact student focus and comprehension.
    *   Refine instructional strategies and visual media to better support student learning.
*   **Methodological Advancement**: Contributes to the methodology of using eye-tracking in educational research.
    It provides practical solutions for common data processing hurdles and showcases analytical approaches.
    The project specifically addresses the challenge of reconciling data from different eye-tracking systems (webcam-based like RealEye and hardware-based like Tobii).

Key findings and narratives explored in the project's notebooks include the intricacies of timestamp interpretation between RealEye and Tobii.
For example, one key finding is that RealEye's `test_created_at` may mark the *end* of a segment rather than its start.
The impact of data export formats on analysis pipelines and the validation of data quality through control trials are also documented.
These explorations highlight the careful considerations needed when working with multi-modal, high-frequency data in educational contexts.
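
To show why the interpretation of `test_created_at` matters, the sketch below converts per-sample offsets to absolute times under both readings. It is an illustration only: the values are made up, `absolute_times` is a hypothetical helper, and it assumes the last sample's offset approximates the recording duration.

```python
from datetime import datetime, timedelta, timezone


def absolute_times(
    created_at: datetime, offsets_ms: list[float], created_at_is_end: bool
) -> list[datetime]:
    """Convert offsets (ms since recording start) to absolute datetimes.

    If `created_at_is_end` is True, the recording is assumed to have ended at
    `created_at`, so the start is back-computed from the largest offset.
    """
    if created_at_is_end:
        start = created_at - timedelta(milliseconds=max(offsets_ms))
    else:
        start = created_at
    return [start + timedelta(milliseconds=ms) for ms in offsets_ms]


# Made-up values for illustration.
created_at = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
offsets_ms = [0, 10_000, 30_000]

as_start = absolute_times(created_at, offsets_ms, created_at_is_end=False)
as_end = absolute_times(created_at, offsets_ms, created_at_is_end=True)

# The two readings disagree by the full recording length.
shift = as_start[0] - as_end[0]
```

For a 30-second recording the two interpretations place every sample 30 seconds apart, which is far larger than the sampling interval of either system and would misalign the two data streams.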

## Citation

If you use RevChem in your research, please cite it as follows:

```bibtex
@misc{RevChem_2024,
  author       = {Stephen J. Fox, [Other Authors, if any]},
  title        = {RevChem: A Python library for processing and analyzing eye-tracking data in chemistry education},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/stephenjfox/RevChem}}
}
```
(Please update with actual publication details or a more formal citation if/when available.)

## How to Get Set Up Locally

1.  **Clone the Repository:**
    ```bash
    git clone https://github.com/stephenjfox/RevChem.git
    cd RevChem
    ```

2.  **Set up Environment & Install Dependencies:**
    It's recommended to use a virtual environment (e.g., `venv` or `conda`).
    The project uses `uv` for fast package management, but `pip` can also be used.
    Dependencies are listed in `pyproject.toml`.

    Using `uv` (recommended):
    ```bash
    # Install uv if you haven't already: https://github.com/astral-sh/uv
    uv venv
    source .venv/bin/activate
    uv pip install -e . # Editable install including dependencies
    ```

    Using `pip` and `venv`:
    ```bash
    python -m venv .venv
    source .venv/bin/activate
    pip install -e . # Editable install including dependencies
    ```
    Key dependencies include `polars`, `nbdev`, `jupyter`, `plotnine`, `seaborn`, `fastcore`, and `json5`.

3.  **Using the Library and Notebooks:**
    *   You can import modules from `RevChem` in Python scripts or other notebooks (e.g., `from RevChem.common import group_by`).
    *   The Jupyter notebooks in the `nbs/` directory can be run using Jupyter Lab or Jupyter Notebook.
        This allows exploration of the data processing and analysis steps.
        ```bash
        jupyter lab
        ```
