# Typing Speeds

How can you improve your typing speed?

The file `typing-speeds.csv` contains typing speed data from >168,000 people typing 15 sentences each. The data was collected via an online typing test published at a free typing speed assessment webpage.

In [8]:
# FOR GOOGLE COLAB ONLY.
# Uncomment and run the code below. A dialog will appear to upload files.
# Upload 'typing-speeds.csv'.

# from google.colab import files
# uploaded = files.upload()

In [1]:
import pandas as pd

df = pd.read_csv('typing-speeds.csv')
df

Unnamed: 0,PARTICIPANT_ID,AGE,HAS_TAKEN_TYPING_COURSE,COUNTRY,LAYOUT,NATIVE_LANGUAGE,FINGERS,KEYBOARD_TYPE,ERROR_RATE,AVG_WPM_15,ROR
0,3,30,0,US,qwerty,en,1-2,full,0.511945,61.9483,0.2288
1,5,27,0,MY,qwerty,en,7-8,laptop,0.871080,72.8871,0.3675
2,7,13,0,AU,qwerty,en,7-8,laptop,6.685633,24.1809,0.0667
3,23,21,0,IN,qwerty,en,3-4,full,2.130493,24.7112,0.0413
4,24,21,0,PH,qwerty,tl,7-8,laptop,1.893287,45.3364,0.2678
...,...,...,...,...,...,...,...,...,...,...,...
168589,517932,20,0,US,qwerty,en,9-10,laptop,8.731466,24.9125,0.1842
168590,517936,25,0,PL,qwerty,pl,9-10,laptop,0.000000,66.2946,0.0639
168591,517943,38,1,US,qwerty,en,9-10,laptop,0.147929,75.6713,0.2021
168592,517944,28,0,GB,qwerty,en,9-10,laptop,0.278552,91.7083,0.5133




| **Variable**             | **Description**                                                                 |
|--------------------------|---------------------------------------------------------------------------------|
| `PARTICIPANT_ID`         | Unique ID of the participant                                                   |
| `AGE`                    | Age of the participant                                                         |
| `HAS_TAKEN_TYPING_COURSE`| Whether the participant has taken a typing course (1 = Yes, 0 = No)            |
| `COUNTRY`                | Country of the participant                                                     |
| `LAYOUT`      		   | Keyboard layout used (QWERTY, AZERTY, or QWERTZ)                               |
| `NATIVE_LANGUAGE`        | Native language of the participant                                             |
| `FINGERS`                | Number of fingers used for typing (options: 1-2, 3-4, 5-6, 7-8, 9-10)          |
| `KEYBOARD_TYPE`          | Type of keyboard used (Full/desktop, laptop, small physical, or touch)         |
| `ERROR_RATE(%)`          | Uncorrected error rate (as a percentage)                                       |
| `AVG_WPM_15`             | Words per minute averaged over 15 typed sentences                              |
| `ROR`                    | Rollover ratio                                                                 |


### Project Ideas
- Remove unnecessary columns, such as PARTICIPANT_ID, to streamline the dataset.

- Rename columns (e.g `AVG_WPM_15` to `wpm`, `ROR` to `ror`, `HAS_TAKEN_TYPING_COURSE` to `course`) for brevity and clarity during analysis.

Finger Count Analysis
- Compare typing speeds across groups using different numbers of fingers, excluding the "10+" category for simplicity.

- Control for consistency by first filtering to similar `AGE`, `KEYBOARD_LAYOUT`, `NATIVE_LANGUAGE`, `KEYBOARD_TYPE`, and `HAS_TAKEN_TYPING_COURSE` values.

- Exclude participants with high error rates (ERROR_RATE > 3%) to focus on reliable data.

- Drop columns after filtering if they now only have a single value.

Rollover Ratio Analysis
- The Rollover Ratio (`ROR`) represents the proportion of keypresses where a new key is pressed before releasing the previous one.

- Compare typing speeds between participants with `ROR` ≤ 20% and those with `ROR` > 80%, keeping `AGE`, `KEYBOARD_TYPE`, `FINGERS`, and other variables constant.

Influence of Typing Course
- Compare typing speeds between participants with a typing course (`HAS_TAKEN_TYPING_COURSE` = 1) and without (`HAS_TAKEN_TYPING_COURSE` = 0), holding other variables such as `KEYBOARD_TYPE`, `AGE` range, and `FINGER_COUNT` constant.


In [80]:
# YOUR CODE HERE (add additional cells as needed)

# Drop unnecessary columns and rename the rest.
typing_df = df.drop(columns=['PARTICIPANT_ID', 'COUNTRY'])

# Rename columns to lowercase and more descriptive names.
rename_map = {
    'AGE': 'age',
    'HAS_TAKEN_TYPING_COURSE': 'course',
    'LAYOUT': 'layout',
    'NATIVE_LANGUAGE': 'native_language',
    'FINGERS': 'fingers',
    'KEYBOARD_TYPE': 'keyboard_type',
    'ERROR_RATE': 'error_rate',
    'AVG_WPM_15': 'wpm',
    'ROR': 'ror',
}

# Rename the columns using the map
typing_df = typing_df.rename(columns=rename_map)

#Exclude participants who used more than 10 fingers.
typing_df = typing_df[typing_df['fingers'] != '10+']

# Calculate the average WPM for each number of fingers used.
fingers_comparison = typing_df.groupby('fingers', as_index=False)['wpm'].mean().sort_values(by='wpm', ascending=False)

# Filter the DataFrame for consistent control group participants.
control_consistent = typing_df.query('error_rate < 3 and layout == "qwerty" and native_language == "en" and keyboard_type == "full"')[['age', 'layout', 'native_language', 'keyboard_type', 'course', 'error_rate']]

# Remove columns with only one unique value.
control_consistent = control_consistent.loc[:, control_consistent.nunique() > 1]

# Filter participants with ROR <= 0.20 or > 0.80 and create a new column for ROR group.
filtered_ror = typing_df[
    (typing_df['ror'] <= 0.20) | 
    (typing_df['ror'] > 0.80)
].copy()

# Create a new column 'ror_group' to categorize ROR values.
filtered_ror['ror_group'] = filtered_ror['ror'].apply(lambda x: 'low_ror' if x <= 0.20 else 'high_ror')

# Calculate the average WPM for each ROR group.
comparison = filtered_ror.groupby('ror_group', as_index=False).agg({'wpm': 'mean'})

# Calculate the average WPM for participants who took a typing course vs those who did not.
speed_with_course = typing_df.groupby('course', as_index=False)['wpm'].mean().sort_values(by='wpm', ascending=False)

# Map the course values to 'Yes' and 'No'.
speed_with_course['course'] = speed_with_course['course'].map({1: 'Yes', 0: 'No'})

print(f"These are the average WPM scores by number of fingers used:\n{fingers_comparison}")
print(f"These are the consistent control group participants:\n{control_consistent}")
print(f"These are the average WPM scores by ROR group:\n{comparison}")
print(f"These are the average WPM scores by typing course participation:\n{speed_with_course}")




These are the average WPM scores by number of fingers used:
  fingers        wpm
4    9-10  57.379572
3     7-8  50.057909
2     5-6  45.731789
1     3-4  41.004952
0     1-2  40.280812
These are the consistent control group participants:
        age  course  error_rate
0        30       0    0.511945
3        21       0    2.130493
8        22       0    2.194357
15       18       0    1.183432
19       27       0    0.759878
...     ...     ...         ...
168576   15       0    1.610306
168577   19       0    0.925926
168580   38       1    0.270898
168587   22       0    2.549575
168588   27       0    2.686567

[56243 rows x 3 columns]
These are the average WPM scores by ROR group:
  ror_group        wpm
0  high_ror  98.214091
1   low_ror  37.356032
These are the average WPM scores by typing course participation:
  course        wpm
1    Yes  54.349929
0     No  49.008138
