This repo contains a Python script that takes a RELION STAR file (listing particle images stored across several MRC stack files), combines the referenced images into a single `.mrcs` stack file, and writes a re-indexed copy of the STAR file whose image-name column (`rlnImageName`) points into the new stack.

How it works:
- Parsing the STAR file: The script reads RELION's STAR format, which is organized into data blocks (`data_...`) containing `loop_` tables: a list of column labels such as `_rlnImageName #1`, followed by whitespace-separated rows of values.
- Extracting image references: It parses the image column (default: `rlnImageName`), whose entries have the form "000123@filename.mrc", i.e. a 1-based image index inside a given stack file.
- Loading MRC files: Using the `mrcfile` library, it loads each referenced image in the order the particles appear in the STAR file.
- Creating a combined stack: It concatenates all images into a single numpy array and saves it as a new MRC stack.
- Updating references: It rewrites the image column with new indices pointing to the combined stack.
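The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's actual code: it swaps pandas for a small hand-rolled STAR reader, assumes whitespace-separated values without quoted strings, and the helper names (`read_star_loop`, `plan_combined_stack`, `build_stack`, `write_stack`) and default output name `combined.mrcs` are hypothetical. The parsing helpers need only the standard library; the last two need `numpy` and `mrcfile`.

```python
import os
import re

# RELION image reference: 1-based index, "@", then the stack path,
# e.g. "000123@Extract/job005/mic001.mrcs".
REF_PATTERN = re.compile(r"^(\d+)@(.+)$")


def read_star_loop(lines, required_label="rlnImageName"):
    """Return (labels, rows) of the first loop_ table that declares `required_label`.

    Simplified: assumes whitespace-separated values without quoted strings.
    Tables without the label (e.g. a RELION 3.1+ optics table) are skipped.
    """
    labels, rows, in_loop = [], [], False
    for raw in lines:
        line = raw.strip()
        if line == "loop_" or line.startswith("data_"):
            if required_label in labels:
                break  # the wanted table is complete
            labels, rows, in_loop = [], [], line == "loop_"
        elif in_loop and not rows and line.startswith("_"):
            labels.append(line.split()[0][1:])  # "_rlnImageName #2" -> "rlnImageName"
        elif in_loop and line and not line.startswith(("#", "_")):
            rows.append(line.split())
        elif in_loop and rows and not line:
            in_loop = False  # a blank line ends the table
    if required_label not in labels:
        raise ValueError(f"no loop_ table declares _{required_label}")
    return labels, rows


def plan_combined_stack(labels, rows, image_col="rlnImageName", out_name="combined.mrcs"):
    """Return (load_plan, new_refs) for the rows, in STAR-file order.

    load_plan holds (path, zero_based_index) pairs; new_refs holds the
    rewritten references into the combined stack.
    """
    col = labels.index(image_col)
    load_plan, new_refs = [], []
    for new_index, row in enumerate(rows, start=1):
        match = REF_PATTERN.match(row[col])
        if match is None:
            raise ValueError(f"unexpected image reference: {row[col]!r}")
        load_plan.append((match.group(2), int(match.group(1)) - 1))  # 1-based -> 0-based
        new_refs.append(f"{new_index:06d}@{out_name}")
    return load_plan, new_refs


def build_stack(load_plan, input_dir="."):
    """Load all planned images into one (n, ny, nx) float32 array (needs numpy, mrcfile)."""
    import mrcfile
    import numpy as np

    images, src, src_path = [], None, None
    try:
        for path, index in load_plan:
            if path != src_path:  # reopen only when the source file changes
                if src is not None:
                    src.close()
                src = mrcfile.mmap(os.path.join(input_dir, path), mode="r")
                src_path = path
            data = src.data
            # A single-image file holds a 2D array; a stack holds (n, ny, nx).
            # np.array copies the image out of the memory map before it is closed.
            images.append(np.array(data if data.ndim == 2 else data[index], dtype=np.float32))
        return np.stack(images)
    finally:
        if src is not None:
            src.close()


def write_stack(stack, out_path):
    """Save the array as an MRC file flagged as a stack of 2D images."""
    import mrcfile

    with mrcfile.new(out_path, overwrite=True) as mrc:
        mrc.set_data(stack)
        mrc.set_image_stack()  # header space group 0: image stack, not a volume
```

For example, reading `particles.star` with `read_star_loop`, passing the result to `plan_combined_stack`, and then calling `write_stack(build_stack(plan, "./mrc_files/"), "combined.mrcs")` produces the stack and the new image-column values. Note that this sketch does not copy the pixel size from the source files into the new header.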

Example usage:

`python3 combine_mrc_stacks.py --star_file particles.star --input_dir ./mrc_files/ --output_stack combined.mrc --output_star updated.star`

Design choices:

- Using pandas for data handling: Provides a convenient table structure for reading and editing the STAR file's columns.
- Regular expressions for parsing: A simple pattern splits each `index@filename` reference into its numeric index and its file path, regardless of zero padding or directory prefixes.
- mrcfile library: A well-maintained library dedicated to MRC files that keeps the header (dimensions, data mode) consistent with the data it writes.
- Preserving the original STAR file format: Rather than writing a minimal STAR file from scratch, the script attempts to keep the original structure and update only the image column.
- In-memory processing: All images are loaded into memory before the output stack is written, so peak memory use is at least the size of the combined stack. For very large datasets, you may need to modify this to process images in batches.
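The format-preserving rewrite of the STAR file can be sketched as follows. This is a hedged illustration of the idea, not the script's implementation: `rewrite_image_column` is a hypothetical helper that copies every line of the original file unchanged except the data rows of the `loop_` table that declares the image column, where only that one field is replaced. It assumes whitespace-separated values without quoted strings and that the new references are supplied in row order.

```python
def rewrite_image_column(lines, new_refs, image_col="rlnImageName"):
    """Copy STAR-file lines, replacing only the `image_col` field of table rows.

    Data blocks, comments, other tables and column order are left untouched.
    """
    refs = iter(new_refs)
    out, labels, col, rows, in_loop = [], [], None, 0, False
    for line in lines:
        stripped = line.strip()
        if stripped == "loop_" or stripped.startswith("data_"):
            in_loop, labels, col, rows = stripped == "loop_", [], None, 0
        elif in_loop and rows == 0 and stripped.startswith("_"):
            labels.append(stripped.split()[0][1:])  # "_rlnImageName #2" -> "rlnImageName"
            if labels[-1] == image_col:
                col = len(labels) - 1
        elif col is not None and stripped and not stripped.startswith(("#", "_")):
            fields = stripped.split()
            new_ref = next(refs, None)
            if new_ref is None:
                raise ValueError("fewer new references than table rows")
            fields[col] = new_ref
            line = " ".join(fields) + ("\n" if line.endswith("\n") else "")
            rows += 1
        elif rows and (not stripped or stripped.startswith("_")):
            in_loop, col = False, None  # blank line or key/value pair ends the table
        out.append(line)
    if next(refs, None) is not None:
        raise ValueError("more new references than table rows")
    return out
```

Rewritten rows are re-joined with single spaces, so their column alignment changes; readers that split on whitespace, such as RELION's, should treat them identically. Every other line, including an optics table, is copied byte for byte.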
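To gauge when batching becomes necessary: a float32 stack needs `n_particles × box × box × 4` bytes, so 100,000 particles with a 256-pixel box come to about 26 GB. One possible alternative, sketched below under assumptions (it is not what the script currently does), writes each image straight into a memory-mapped output file created with `mrcfile.new_mmap`, so only one image is held in memory at a time. `stack_size_bytes` and `write_stack_streaming` are hypothetical names; the load plan is assumed to be a list of `(path, zero_based_index)` pairs in output order, and all images are assumed to share the same square box size.

```python
import os


def stack_size_bytes(n_particles, box_size, bytes_per_pixel=4):
    """Bytes needed to hold a stack of square images (float32 by default)."""
    return n_particles * box_size * box_size * bytes_per_pixel


def write_stack_streaming(load_plan, out_path, box_size, input_dir="."):
    """Copy images one at a time into a memory-mapped MRC stack.

    load_plan: sequence of (path, zero_based_index) pairs in output order.
    Requires the mrcfile package (which depends on numpy).
    """
    import mrcfile

    shape = (len(load_plan), box_size, box_size)
    # mrc_mode=2 stores 32-bit floats; the file is created at full size up front.
    with mrcfile.new_mmap(out_path, shape, mrc_mode=2, overwrite=True) as out:
        out.set_image_stack()  # mark as a stack of 2D images, not a volume
        src, src_path = None, None
        try:
            for i, (path, index) in enumerate(load_plan):
                if path != src_path:  # reopen only when the source file changes
                    if src is not None:
                        src.close()
                    src = mrcfile.mmap(os.path.join(input_dir, path), mode="r")
                    src_path = path
                data = src.data
                # A single-image file is 2D; a stack is (n, ny, nx).
                out.data[i] = data if data.ndim == 2 else data[index]
        finally:
            if src is not None:
                src.close()
        out.update_header_stats()  # refresh min/max/mean/rms from the written data
```

For example, `write_stack_streaming(plan, "combined.mrcs", box_size=256, input_dir="./mrc_files/")`. Keeping at most one source file open at a time also avoids running into open-file limits when particles come from thousands of per-micrograph stacks.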