# Statistical project – correlations across events in speedcubing

In this project, my goal is to investigate the correlations between different events in speedcubing. The data is sourced from the WCA (World Cube Association) database, which includes results from official competitions.

## Introduction to speedcubing

Have you ever solved a Rubik's Cube? If so, you probably found it quite a challenge. For some, simply solving it is not enough: they strive to solve it faster, learn new methods and algorithms, and find more efficient solutions.

There are many types of twisty puzzles, not just the classic 3x3x3 cube, but also 2x2x2, 4x4x4, 5x5x5, and more.

The people who solve these puzzles are called speedcubers. They compete in official competitions organized by the [World Cube Association](https://www.worldcubeassociation.org) (WCA), which oversees all official speedcubing competitions and maintains the official records.

## Competition rules

When registering for a competition, competitors can choose which events they want to enter. The events include:
- 3x3 Cube
- 2x2 Cube
- 4x4 Cube
- 5x5 Cube
- 6x6 Cube
- 7x7 Cube
- Pyraminx
- Skewb
- Megaminx
- Clock
- Square-1
- 3x3 One-handed
- 3x3 Blindfolded
- 3x3 Fewest Moves
- 3x3 Multi-Blind
- 4x4 Blindfolded
- 5x5 Blindfolded

More on these events later.

Not every competition has to include all events. The organizers can omit events due to time, capacity or other limitations.

For most events, each round consists of 5 attempts, and each round gives the competitor two results: a single and an average. The single is the fastest of the five attempts. The average is the mean of the middle three attempts, after the fastest and the slowest are excluded.
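
The single and the average of 5 can be computed as in the Python sketch below. It assumes times are given in seconds and represents a DNF ("did not finish") attempt as `math.inf`; the function names are hypothetical, and WCA's rounding of the final average is ignored. Because the fastest and the slowest attempts are dropped, one DNF simply counts as the slowest attempt, while two or more DNFs leave an infinite value among the middle three and make the whole average DNF.

```python
import math

DNF = math.inf  # a "did not finish" attempt, treated as infinitely slow


def single(times: list[float]) -> float:
    """Best (fastest) attempt of the round; DNF only if every attempt is DNF."""
    return min(times)


def average_of_5(times: list[float]) -> float:
    """Mean of the middle three attempts after dropping the fastest and slowest."""
    if len(times) != 5:
        raise ValueError("an average of 5 needs exactly five attempts")
    middle = sorted(times)[1:4]  # drop the fastest (index 0) and slowest (index 4)
    return sum(middle) / 3  # stays math.inf if a DNF is among the middle three
```

For the hypothetical round `[12.34, 10.50, 11.20, 13.90, 11.80]`, the single is 10.50 and the average is (11.20 + 11.80 + 12.34) / 3 = 11.78.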

Before starting an attempt, the competitor has 15 seconds to inspect the puzzle. During this time, they can plan their first moves but cannot turn the puzzle. After inspection, the competitor places both hands on the timer's sensors; the timer starts when they lift their hands, and the solve begins. After finishing, they stop the timer by placing both hands back on the sensors.

Here's a video showing a competition solve: [World Record [former] - 4.73 seconds - Feliks Zemdegs](https://www.youtube.com/watch?v=R07JiT0PlcE).

## Glossary

Let's go over some key terms to understand the rest of this paper:

- **3x3, 4x4, ..., NxN**: Notation for cubes with N layers, shorthand for "NxN Rubik's Cube." For example, a 3x3 has 3 layers, a 4x4 has 4, and so on.
- **Edge**: A piece with two colors, located between the corners on the cube. On a 3x3, there are 12 edge pieces.
- **Corner**: A piece with three colors, located at the corners of the cube. On a 3x3, there are 8 corner pieces.
- **Center**: A piece with one color, located in the center of each face. On a 3x3, centers are fixed; on larger cubes, centers must be solved. 2x2 does not have any centers.
- **Algorithm**: A sequence of moves designed to achieve a specific result, such as swapping or rotating pieces.
- **Method**: A structured approach or set of steps for solving a puzzle.
- **Orientation**: The process of turning pieces so their colored stickers face the correct direction, usually referring to the last layer.
- **Permutation**: The process of moving pieces to their correct locations, often after orientation is complete.
- **Scramble**: A random sequence of moves applied to a solved puzzle to mix it up before solving.
- **Solve**: The process of returning a scrambled puzzle to its solved state, where each face is a single color.
- **Inspection**: The 15-second period before a solve during which competitors can examine the puzzle and plan their solution, but cannot turn the puzzle.
- **Last Layer**: The final layer to be solved, typically the top face of the cube.
- **One-looking**: A technique where a solver plans the entire solution during the inspection time, allowing them to execute the solution without pausing during the solve.
- **BLD**: Shorthand for blindfolded.

## Puzzle types

The original Rubik's Cube is a 3x3, but there are many other types of puzzles, and some are not even cubes. Let's go through the main puzzle types and their most common solving methods.

### 3x3

This is the original puzzle invented by Ernő Rubik. It has 6 colors, most commonly: white, yellow, blue, green, red, and orange. The goal is to arrange the cube so that each face is a single color.

<img src="https://upload.wikimedia.org/wikipedia/commons/e/e5/Rubiks_cube_scrambled.jpg" width="300"/>

The most common method used by speedcubers is CFOP—an abbreviation for Cross, F2L, OLL, PLL. The method consists of four steps:
1. Cross – solving the cross on the first layer
2. F2L – solving the first two layers
3. OLL – orienting the last layer (making all pieces on the top face the same color)
4. PLL – permuting the last layer (moving the pieces on the top layer to their correct positions)

The first two steps are mostly intuitive, while the last two rely on algorithms.

### 2x2

Smaller than the classic cube, this puzzle is much easier to solve; the world record single is under 1 second. There are several methods for solving the 2x2:
- Layer by layer – solve the first layer intuitively, then solve the second layer with algorithms similar to 3x3 OLL and PLL
- Ortega – solve a face (one side of the same color), orient the second layer, then finish the cube with a single algorithm
- EG – solve a face, then finish the cube with a single algorithm (this method uses 126 algorithms)

Top solvers use *one-looking* (see glossary), which allows them to solve this puzzle very quickly.

<img src="https://cuboss.com/wp-content/uploads/2021/04/2x2-speedcube-2x2-Rubiks-cube.jpg" width="300"/>

### 4x4

The 4x4 cube, also known as the Rubik's Revenge, introduces additional complexity compared to the 3x3 due to the lack of fixed center pieces and the presence of more edge and center pieces. This means centers must be solved first, and edge pieces must be paired before the cube can be solved like a 3x3.

The most common method for solving the 4x4 is **Yau**, which is a variant of the **reduction** method. The Yau method typically involves:
1. Solving two opposite centers
2. Pairing three cross edges
3. Solving the remaining centers
4. Pairing the remaining edges
5. Solving the cube as a 3x3

The standard reduction method involves solving all centers first, then pairing all edges, and finally solving the cube as a 3x3. The Yau method is generally faster and more efficient.

A unique challenge with the 4x4 is **parity errors**—situations that cannot occur on a 3x3. The two main parities are:
- **PLL parity**: When only two corner or two edge pieces are swapped.
- **OLL parity**: When only a single edge (technically two edges, but since they are paired, we consider them to be a single edge in the 3x3 stage) is incorrectly oriented.

These parities occur because the 4x4 has an even number of layers, which allows for move combinations not possible on odd-layered cubes. Handling the parity requires a special algorithm.

The reason parity occurs is mathematically very interesting, but it is out of scope for this paper.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/52/Rubiks_revenge_solved.jpg/500px-Rubiks_revenge_solved.jpg" width="300"/>

### 5x5

More pieces mean more work and more time. The common method is **reduction**, consisting of these steps:
- Solving centers: center pieces have a single color, and the goal is to group them so that all centers on each face match
- Pairing edges: edges have two colors, and pairing means putting matching edge pieces together
- 3x3 stage: once centers and edges are solved, the puzzle is effectively reduced to a 3x3 (hence the name) and can be solved using only outer-layer turns

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Professors_cube.jpg/500px-Professors_cube.jpg" width="300"/>

### 6x6

The common method is **reduction**, involving solving centers, pairing edges, and the 3x3 stage. Sounds familiar? That's right, this method is exactly the same as the method for *5x5*.

There are some differences, mainly **parity**—the same thing that happens on 4x4. But the key parts of the method are the same.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/82/V-Cube_6_solved.jpg/500px-V-Cube_6_solved.jpg" width="300"/>

### 7x7

As you might expect, the method used is **reduction**, just like on the *5x5* and *6x6*. Once you know how to solve the *5x5*, you can solve a cube of any size, even a *15x15*, with the same method (even-layered cubes additionally need parity algorithms, as on the 4x4 and 6x6).

The use of the same method will be important later, when we discuss the correlations across these events.

For both 6x6 and 7x7, the standard format is not an average of 5 but a mean of 3 solves. This reduces the time required for the event, as larger cubes take significantly longer to solve.
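
A mean of 3 differs from an average of 5 in that no attempt is dropped, so every solve counts and a single DNF makes the whole mean DNF. A minimal Python sketch, assuming times in seconds and a DNF represented as `math.inf` (the function name is hypothetical):

```python
import math

DNF = math.inf  # a "did not finish" attempt, treated as infinitely slow


def mean_of_3(times: list[float]) -> float:
    """Arithmetic mean of all three attempts; nothing is discarded."""
    if len(times) != 3:
        raise ValueError("a mean of 3 needs exactly three attempts")
    return sum(times) / 3  # any DNF (math.inf) makes the result math.inf
```

For example, hypothetical 7x7 solves of 90, 100, and 110 seconds give a mean of 100 seconds.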

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9c/V-Cube_7_solved.jpg/480px-V-Cube_7_solved.jpg" width="300"/>

### Megaminx

The Megaminx is a dodecahedron-shaped puzzle with 12 faces, each a different color. The solving process is similar to the 3x3: the puzzle is solved layer by layer, finishing with the last layer. The last layer requires more algorithms due to the increased number of pieces. More pieces mean more work, but the extra faces also give more freedom in moving pieces around, allowing efficient solutions that are not possible on a regular 3x3.

<img src="https://upload.wikimedia.org/wikipedia/commons/7/74/Megaminx.jpg" width="300"/>

### Skewb

The Skewb is a cube-shaped puzzle, but it turns around its corners rather than its faces. It has 8 corners and 6 center pieces. The solution is relatively simple, often requiring only a few algorithms. Most methods involve solving the top and bottom layers, then the remaining centers. *One-looking* is commonly used by top solvers.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Skewb.jpg/500px-Skewb.jpg" width="300"/>

### Pyraminx

The Pyraminx is a tetrahedron-shaped puzzle with 4 faces. It has 4 tips, 4 centers, and 6 edges. *One-looking* is commonly used by top solvers thanks to the small number of pieces.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Pyraminx_blue_green_cubemeister_com.jpg/500px-Pyraminx_blue_green_cubemeister_com.jpg" width="300"/>

### Square-1

Square-1 is a cube-shaped puzzle that changes shape as it is scrambled, making it a "shape-shifter". It is the only shape-shifting event in WCA competitions. It has 8 corners and 8 edges, but the pieces are not all the same size. The solution involves restoring the cube shape first, then solving the pieces. The puzzle requires unique algorithms due to its unusual mechanics and parity issues. The methods are mostly algorithm-based, with very little room for intuitive solving.

<img src="https://upload.wikimedia.org/wikipedia/commons/d/d8/Square-1_solved.jpg" width="300"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/7/7c/Square-1_scrambled.jpg" width="300"/>

### 3x3 One Handed

The 3x3 One-Handed event is similar to the regular 3x3 event, but competitors must solve the cube using only one hand. The methods used are very similar to standard 3x3 solving.

Interesting fact: most right-handed cubers use their left hand for one-handed solving.

### Blindfolded events

Blindfolded events include 3x3 Blindfolded, 4x4 Blindfolded, 5x5 Blindfolded, and Multi-Blind. In these events, competitors first memorize the puzzle, then put on a blindfold and solve it. There is no separate inspection period: memorization is part of the attempt and counts toward the total solve time.

Methods used for blindfolded solving require many more moves than standard speedsolving methods. They essentially solve the cube piece by piece, without disturbing pieces that are not currently being solved.

Multi-Blind is about solving multiple differently scrambled 3x3 cubes blindfolded: first memorizing all of them, then solving them. The competitor can choose how many cubes to attempt.

### FMC (Fewest Moves Challenge)

In FMC, competitors are given a scramble and have 1 hour to find a solution using as few moves as possible. It's the only event where time is not the goal.

## Goal and methodology of this statistical project

The goal is to measure the correlation between how good a person is at different speedcubing events.
 
To do this, we need a way to quantify how "good" someone is at an event.

### Measuring how "good" a speedcuber is at some event

There are two main approaches I came up with to quantify this. Let's go over them and list their properties, advantages and disadvantages.

1. **Relative Ranking (Percentile)**
   - There are world, continental, and country rankings for each event. For simplicity, the world ranking will be used, since it provides the most data.
   - Comparing absolute ranks makes little sense, since events differ greatly in popularity. Unsurprisingly, 3x3 is the event almost every competitor has participated in, whereas events like 5x5 Blindfolded are far less popular, both because of their difficulty and because few competitions include them.
   - For these reasons, percentiles will be used.
2. **Performance Ratio (Relative to World Record)**
   - For each competitor, their best time divided by the world record time shows how many times slower than the record they are. This is a simple and intuitive metric, and it turns out to be quite effective and consistent across events.
   - This method is less affected by the popularity of the event, as it compares the competitor's performance directly to the best in the world.
   - Since FMC (3x3 Fewest Moves) is not a timed event, this method makes little sense for it.
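
Both metrics take only a line each to compute. In the Python sketch below, the percentile is assumed to be 100 × world rank ÷ number of competitors ranked in the event, so lower is better; the text does not spell out this formula, and the function names are hypothetical.

```python
def world_percentile(world_rank: int, ranked_competitors: int) -> float:
    """World rank as a percentage of everyone ranked in the event (lower is better)."""
    if not 1 <= world_rank <= ranked_competitors:
        raise ValueError("rank must be between 1 and the number of ranked competitors")
    return 100.0 * world_rank / ranked_competitors


def world_record_ratio(result: float, world_record: float) -> float:
    """How many times slower than the world record a result is (1.0 means WR pace)."""
    if world_record <= 0:
        raise ValueError("the world record must be a positive time")
    return result / world_record
```

For instance, a world rank of about 15,000 out of almost 250,000 ranked competitors gives `world_percentile(15_000, 250_000) == 6.0`, roughly the top 6%, and a hypothetical 12.00-second average against a hypothetical 4.00-second record gives a ratio of 3.0.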
  
For both approaches, averages of 5 will be compared, since they give a more balanced view of a competitor's ability. A single solve depends more on luck, while an average also reflects how consistent the competitor is.

If an average of 5 cannot be used (either due to lack of data or because the event uses a different format), the best single will be used instead.

As I am unsure which approach is better, I will calculate correlations using both.
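
The text does not say which correlation coefficient will be used. As an assumption, the Python sketch below uses Pearson's coefficient and pairs up only competitors who have a result in both events; the function names and the input format (a dictionary mapping a WCA ID to the competitor's percentile or world-record ratio in one event) are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / math.sqrt(var_x * var_y)


def event_correlation(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Correlate two events over the competitors who have a score in both."""
    shared = sorted(scores_a.keys() & scores_b.keys())  # WCA IDs present in both events
    return pearson([scores_a[p] for p in shared], [scores_b[p] for p in shared])
```

One consequence of this design choice: as long as both metrics are computed from the same result, the percentile and the world-record ratio both increase with the competitor's time, so a rank-based coefficient such as Spearman's would give the same value for both approaches. Pearson's coefficient measures linear association, so with it the two approaches can produce different values.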


## My hypothesis

As I am a speedcuber myself, I feel I can make some educated guesses about the correlation between different events.
A bit of context: I competed for the first time in Bratislava in 2016. My best 3x3 average is 11.68 seconds, which ranks me 94th (out of 703) in the Czech Republic and about 15,000th (out of almost 250,000 competitors) worldwide. You can also have a look at my [WCA profile](https://www.worldcubeassociation.org/persons/2016AMBR02).

### What correlations I expect

- 5x5, 6x6, and 7x7 are solved using the same method, and improving times on them means improving very similar skills. Therefore, I expect those to be highly correlated.
  - These events are more "stamina-based", since a single solve typically takes minutes rather than seconds. This also adds to their similarity.
- 3x3 and 4x4 are very popular events that many cubers practice together. While their methods differ somewhat, I expect them to be fairly strongly correlated as well.
- 2x2, Pyraminx, and Skewb are all fast-paced, short events. Top solvers use one-looking in all three. Their methods don't share many similarities, but many cubers I know practice them together. I expect decent correlation there.
- Square-1 is a puzzle very different from all the others. I expect its correlation with other events to be quite low, but not zero, since cubers who are more willing to practice in general will also practice Square-1, improving their times.
- I expect 3x3 Blindfolded, 4x4 Blindfolded, and 5x5 Blindfolded to correlate weakly with the non-blindfolded events but highly with each other. Once a cuber finds blindfolded solving fun, they are probably willing to practice the other blindfolded events too. There are also many cubers who have good results in other events but do not compete in blindfolded solving because they find it uninteresting.
- 3x3 Fewest Moves won't correlate much with anything. It is a thing of its own, partly because it is the only untimed event.