This repository contains scripts used to scrape and process data from resh.edu.ru. More information and the dataset can be found here.
A version of the dataset prepared for causal language modeling is expected in a few days; for now, only the raw version is available.