## Data Access

The data file is encrypted and must be decrypted before its contents can be read. Decryption requires a key, which is passed to the `read_encrypted()` function. Although you can paste the key directly into the code (as in Case 1), avoid doing so; instead, enter the key through the `getpass()` function (as in Case 2). When you enter the key via `getpass()`, what you type is masked or not displayed at all. The simplest approach is to copy the key, paste it at the prompt, and press Enter.

> When you have been able to access the data, please do not save it to file. Also, remember not to share the data or the key with anyone.

In [1]:
import utils.read_encrypted as ure
import getpass

### Case 1

In [None]:
# df = ure.read_encrypted("data/bece-encrypted.zip", b'paste-key-here')

### Case 2

In [2]:
df = ure.read_encrypted(
    "data/bece-encrypted.zip", 
    bytes(getpass.getpass("Enter key: "), "utf-8")
)
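
The Case 2 pattern can be wrapped in a small helper so that the prompt, whitespace cleanup, and byte conversion live in one place. This is only a sketch: `read_key` and its `ask` parameter are hypothetical names that are not part of `utils.read_encrypted`, and stripping whitespace assumes that a valid key never starts or ends with spaces.

```python
import getpass


def read_key(prompt="Enter key: ", ask=getpass.getpass):
    """Prompt for the key without echoing it and return it as bytes.

    `ask` defaults to getpass.getpass; it is a parameter only so the
    helper can be exercised without an interactive prompt.
    """
    # Copy-and-paste often drags along a trailing newline or space.
    key = ask(prompt).strip()
    if not key:
        raise ValueError("No key was entered.")
    # read_encrypted() is called with a bytes key in Case 2, so encode here.
    return key.encode("utf-8")
```

With such a helper, the call in Case 2 would read `ure.read_encrypted("data/bece-encrypted.zip", read_key())`, and the key still never appears in the notebook.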

In [3]:
df.shape

(519537, 26)

In [4]:
df.head()

Unnamed: 0,index_no,gender,jhs_code,jhs_district,jhs_region,jhs_type,shs_code,shs_name,options,english,...,best_elective_1_name,best_elective_1_score,best_elective_1_aggregate,best_elective_2_name,best_elective_2_score,best_elective_2_aggregate,count_of_ones,aggregate,raw_score,dtrack
0,22441,Female,15096,LA DADE-KOTOPON MUNICIPAL,Greater Accra Region,PRIVATE,30107.0,"Wesley Girls Senior High, Cape Coast",A,87.0,...,B.D.T./PRE-TECH.,85.0,1.0,FRENCH,86.0,1.0,9.0,6.0,570.0,GOLD
1,257396,Male,13403,SUNYANI WEST,Brong-Ahafo,PRIVATE,60106.0,"St. James Sem & Senior High, Abesim",A,91.0,...,B.D.T./PRE-TECH.,87.0,1.0,FRENCH,89.0,1.0,9.0,6.0,567.0,GREEN
2,401850,Male,13403,SUNYANI WEST,Brong-Ahafo,PRIVATE,60106.0,"St. James Sem & Senior High, Abesim",A,88.0,...,B.D.T./PRE-TECH.,85.0,1.0,FRENCH,90.0,1.0,9.0,6.0,566.0,GREEN
3,351070,Female,6349,AWUTU-SENYA EAST MUNICIPAL,Central Region,PRIVATE,30107.0,"Wesley Girls Senior High, Cape Coast",A,82.0,...,B.D.T./PRE-TECH.,90.0,1.0,FRENCH,91.0,1.0,9.0,6.0,564.0,GREEN
4,230932,Female,8325,ACCRA METROPOLITAN,Greater Accra Region,PRIVATE,30107.0,"Wesley Girls Senior High, Cape Coast",A,88.0,...,FRENCH,93.0,1.0,REL.& MORAL EDUC.,89.0,1.0,9.0,6.0,562.0,GOLD


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 519537 entries, 0 to 519536
Data columns (total 26 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   index_no                   519537 non-null  int64  
 1   gender                     519537 non-null  object 
 2   jhs_code                   519537 non-null  int64  
 3   jhs_district               519415 non-null  object 
 4   jhs_region                 519417 non-null  object 
 5   jhs_type                   519415 non-null  object 
 6   shs_code                   459967 non-null  float64
 7   shs_name                   459967 non-null  object 
 8   options                    459967 non-null  object 
 9   english                    519477 non-null  float64
 10  maths                      519477 non-null  float64
 11  socstudies                 519477 non-null  float64
 12  rme                        519477 non-null  float64
 13  intscience                 51

In [6]:
df.dtypes

index_no                       int64
gender                        object
jhs_code                       int64
jhs_district                  object
jhs_region                    object
jhs_type                      object
shs_code                     float64
shs_name                      object
options                       object
english                      float64
maths                        float64
socstudies                   float64
rme                          float64
intscience                   float64
ict                          float64
french                       float64
best_elective_1_name          object
best_elective_1_score        float64
best_elective_1_aggregate    float64
best_elective_2_name          object
best_elective_2_score        float64
best_elective_2_aggregate    float64
count_of_ones                float64
aggregate                    float64
raw_score                    float64
dtrack                        object
dtype: object

In [7]:
df.isnull().sum()

index_no                         0
gender                           0
jhs_code                         0
jhs_district                   122
jhs_region                     120
jhs_type                       122
shs_code                     59570
shs_name                     59570
options                      59570
english                         60
maths                           60
socstudies                      60
rme                             60
intscience                      60
ict                             60
french                          60
best_elective_1_name          5906
best_elective_1_score         1332
best_elective_1_aggregate     1332
best_elective_2_name          5990
best_elective_2_score         1332
best_elective_2_aggregate     1332
count_of_ones                   54
aggregate                       54
raw_score                       54
dtrack                         965
dtype: int64

In [4]:
df['dtrack'].value_counts()

SINGLE           191112
GREEN            178426
GOLD             148832
Gold                 81
NOT AVAILABLE        70
Green                51
Name: dtrack, dtype: int64

In [5]:
df['dtrack'].str.title().value_counts(dropna=False)

Single           191112
Green            178477
Gold             148913
NaN                 965
Not Available        70
Name: dtrack, dtype: int64

In [6]:
df['dtrack'] = df['dtrack'].str.title()
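
As a quick sanity check on the title-casing step, the merged counts can be reproduced by hand from the raw `value_counts()` output above. The sketch below uses plain Python with the counts copied from that output; `raw_counts` and `merged` are illustrative names.

```python
from collections import Counter

# Raw label counts copied from the df['dtrack'].value_counts() output.
raw_counts = {
    "SINGLE": 191112,
    "GREEN": 178426,
    "GOLD": 148832,
    "Gold": 81,
    "NOT AVAILABLE": 70,
    "Green": 51,
}

# Title-case every label and add up the counts that collapse together.
merged = Counter()
for label, count in raw_counts.items():
    merged[label.title()] += count

print(dict(merged))
```

The totals for `Gold` (148832 + 81) and `Green` (178426 + 51) agree with the title-cased output, which suggests the only duplication in this column comes from letter case.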

In [7]:
df.jhs_region.value_counts(dropna=False)

Ashanti Region          104368
Greater Accra Region     91115
Central Region           55706
Western Region           52531
Brong-Ahafo              49046
Eastern Region           48258
Northern                 43903
Volta                    37716
Upper East               22031
Upper West               13297
WESTERN                    380
ASHANTI                    331
Unknown                    236
NaN                        120
GR. ACCRA                  101
EASTERN                     91
B.A.                        87
CENTRAL                     59
VOLTA                       56
NORTHERN                    45
U. EAST                     38
U. WEST                     18
Greater Accra                3
U. West                      1
Name: jhs_region, dtype: int64

In [9]:
df['jhs_region'].str.title().value_counts()

Ashanti Region          104368
Greater Accra Region     91115
Central Region           55706
Western Region           52531
Brong-Ahafo              49046
Eastern Region           48258
Northern                 43948
Volta                    37772
Upper East               22031
Upper West               13297
Western                    380
Ashanti                    331
Unknown                    236
Gr. Accra                  101
Eastern                     91
B.A.                        87
Central                     59
U. East                     38
U. West                     19
Greater Accra                3
Name: jhs_region, dtype: int64

In [10]:
df['jhs_region'].str.title().str.replace(" Region", "").str.replace("Gr.", "Greater", regex=False).str.replace("B.A.", "Brong-Ahafo", regex=False).str.replace("U.", "Upper", regex=False).value_counts()

Ashanti          104699
Greater Accra     91219
Central           55765
Western           52911
Brong-Ahafo       49133
Eastern           48349
Northern          43948
Volta             37772
Upper East        22069
Upper West        13316
Unknown             236
Name: jhs_region, dtype: int64

In [24]:
(
    df['jhs_region']
    .str.title()
    # .str.strip()
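    .str.replace(" Region", "", regex=False)          # drop the " Region" suffix
    .str.replace("Gr.", "Greater", regex=False)       # "Gr. Accra" -> "Greater Accra"
    .str.replace("B.A.", "Brong-Ahafo", regex=False)  # expand the abbreviation
    .str.replace("U.", "Upper", regex=False)          # "U. East" -> "Upper East"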
    .value_counts()
)

Ashanti          104699
Greater Accra     91219
Central           55765
Western           52911
Brong-Ahafo       49133
Eastern           48349
Northern          43948
Volta             37772
Upper East        22069
Upper West        13316
Unknown             236
Name: jhs_region, dtype: int64
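
The same clean-up can be expressed as a plain function that handles one label at a time, which makes it easy to test on a handful of raw values before touching the full column. This is a sketch under the assumption that the abbreviations seen in the output above (`Gr.`, `B.A.`, `U.`) are the only ones present; `normalize_region` is a hypothetical name.

```python
def normalize_region(name):
    """Return a canonical region name for one raw label."""
    cleaned = name.strip().title()            # "GR. ACCRA" -> "Gr. Accra"
    cleaned = cleaned.replace(" Region", "")  # "Central Region" -> "Central"
    # Expand the abbreviations; plain (non-regex) replacement, as in the chain.
    for short, full in (("Gr.", "Greater"), ("B.A.", "Brong-Ahafo"), ("U.", "Upper")):
        cleaned = cleaned.replace(short, full)
    return cleaned


# A few raw labels taken from the value_counts() output.
for raw in ["GR. ACCRA", "B.A.", "U. WEST", "Western Region", "Unknown"]:
    print(raw, "->", normalize_region(raw))
```

Note that the chain above only displays the cleaned counts. To keep the result, it would have to be assigned back, for example with `df['jhs_region'] = df['jhs_region'].map(normalize_region, na_action="ignore")`.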

In [12]:
df['best_elective_1_name'].value_counts()

REL.& MORAL EDUC.     127723
INFO. & COMM.TECH.     93492
B.D.T./PRE-TECH.       71844
FRENCH                 45503
B.D.T./HOME ECONS.     42336
ASANTE TWI             38325
AKWAPIM TWI            19449
DAGBANI                17348
FANTE                  15959
EWE                    14014
DAGAARE                 8369
GA                      6167
DANGME                  5949
GONJA                   4526
KASEM                   2042
B.D.T./VISUAL ARTS       576
REL.& MORAL EDUC           5
ICT                        2
RME                        1
Best 1                     1
Name: best_elective_1_name, dtype: int64

In [11]:
df['best_elective_2_name'].value_counts()

REL.& MORAL EDUC.     128328
INFO. & COMM.TECH.    118402
B.D.T./HOME ECONS.     62930
B.D.T./PRE-TECH.       55012
ASANTE TWI             47218
FRENCH                 35440
FANTE                  18137
AKWAPIM TWI            15431
EWE                     9093
GA                      7487
DAGBANI                 7094
DAGAARE                 2941
DANGME                  2539
GONJA                   2200
B.D.T./VISUAL ARTS       927
KASEM                    364
B.D.T/PRE-TECH             1
BDT/PRE-TECH               1
BDT/HOME ECONS             1
Best 2                     1
Name: best_elective_2_name, dtype: int64

In [58]:
(
    df['best_elective_2_name']
    .str.replace('.*PRE-TECH.?', 'BDT Pre-Tech.', regex=True)
    #.str.title()
    .value_counts()
)

REL.& MORAL EDUC.     128328
INFO. & COMM.TECH.    118402
B.D.T./HOME ECONS.     62930
BDT Pre-Tech.          55014
ASANTE TWI             47218
FRENCH                 35440
FANTE                  18137
AKWAPIM TWI            15431
EWE                     9093
GA                      7487
DAGBANI                 7094
DAGAARE                 2941
DANGME                  2539
GONJA                   2200
B.D.T./VISUAL ARTS       927
KASEM                    364
Best 2                     1
BDT/HOME ECONS             1
Name: best_elective_2_name, dtype: int64
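
The pattern `.*PRE-TECH.?` does the job here, but its unescaped `.` matches any character, and the output above shows that `BDT/HOME ECONS` is still left unmerged. A tighter alternative is to anchor the pattern and escape the optional dots. The sketch below uses the standard `re` module on single strings; `PRE_TECH`, `HOME_ECONS`, and `collapse_bdt` are hypothetical names, and the patterns cover only the spellings seen in the outputs above.

```python
import re

# Optional literal dots after B, D and T, and an optional trailing dot.
PRE_TECH = re.compile(r"^B\.?D\.?T\.?/PRE-TECH\.?$")
HOME_ECONS = re.compile(r"^B\.?D\.?T\.?/HOME ECONS\.?$")


def collapse_bdt(name):
    """Merge the spelling variants of two B.D.T. electives; leave the rest."""
    if PRE_TECH.match(name):
        return "BDT Pre-Tech."
    if HOME_ECONS.match(name):
        return "BDT Home Econs."
    return name


for raw in ["B.D.T./PRE-TECH.", "B.D.T/PRE-TECH", "BDT/PRE-TECH", "BDT/HOME ECONS", "FRENCH"]:
    print(raw, "->", collapse_bdt(raw))
```

Applied to the column with `df['best_elective_2_name'].map(collapse_bdt, na_action="ignore")`, this would merge both groups of variants in one pass.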

In [13]:
elective_ = {
  'B.D.T./PRE-TECH': 'BDT Pre-Tech.', 
  'B.D.T/PRE-TECH': 'BDT Pre-Tech.', 
  'BDT/PRE-TECH': 'BDT Pre-Tech.', 
  'B.D.T./PRE-TECH.': 'BDT Pre-Tech.', 
  'B.D.T./HOME ECONS.': 'BDT Home Econs.', 
  'BDT/HOME ECONS': 'BDT Home Econs.', 
  'REL.& MORAL EDUC.': 'Rel. & Moral Educ.', 
  'REL.& MORAL EDUC': 'Rel. & Moral Educ.', 
  'RME': 'Rel. & Moral Educ.', 
  'INFO. & COMM.TECH.': 'Info. & Comm. Tech.', 
  'ICT': 'Info. & Comm. Tech.', 
  'B.D.T./VISUAL ARTS': 'BDT Visual Arts'  
}

In [14]:
df['best_elective_1_name'].map(elective_).value_counts(dropna=False)

NaN                    183558
Rel. & Moral Educ.     127729
Info. & Comm. Tech.     93494
BDT Pre-Tech.           71844
BDT Home Econs.         42336
BDT Visual Arts           576
Name: best_elective_1_name, dtype: int64

In [15]:
def mutate_elective_(x):
    # Map known spelling variants to one canonical label.
    elective_ = {
        'B.D.T./PRE-TECH': 'BDT Pre-Tech.',
        'B.D.T/PRE-TECH': 'BDT Pre-Tech.',
        'BDT/PRE-TECH': 'BDT Pre-Tech.',
        'B.D.T./PRE-TECH.': 'BDT Pre-Tech.',
        'B.D.T./HOME ECONS.': 'BDT Home Econs.',
        'BDT/HOME ECONS': 'BDT Home Econs.',
        'REL.& MORAL EDUC.': 'Rel. & Moral Educ.',
        'REL.& MORAL EDUC': 'Rel. & Moral Educ.',
        'RME': 'Rel. & Moral Educ.',
        'INFO. & COMM.TECH.': 'Info. & Comm. Tech.',
        'ICT': 'Info. & Comm. Tech.',
        'B.D.T./VISUAL ARTS': 'BDT Visual Arts',
    }
    # Names without an explicit mapping fall back to title case.
    return elective_.get(x, x.title())

In [17]:
df['best_elective_1_name'].map(mutate_elective_, na_action="ignore").value_counts(dropna=False)

Rel. & Moral Educ.     127729
Info. & Comm. Tech.     93494
BDT Pre-Tech.           71844
French                  45503
BDT Home Econs.         42336
Asante Twi              38325
Akwapim Twi             19449
Dagbani                 17348
Fante                   15959
Ewe                     14014
Dagaare                  8369
Ga                       6167
Dangme                   5949
NaN                      5906
Gonja                    4526
Kasem                    2042
BDT Visual Arts           576
Best 1                      1
Name: best_elective_1_name, dtype: int64

In [35]:
elective = {
  "BDT Pre-Tech.": ["B.D.T./PRE-TECH", "B.D.T/PRE-TECH", "BDT/PRE-TECH", "B.D.T./PRE-TECH."],
  "BDT Home Econs.": ["B.D.T./HOME ECONS.", "BDT/HOME ECONS"],
  "Rel. & Moral Educ.": ["REL.& MORAL EDUC.", "RME", "REL.& MORAL EDUC"],
  "Info. & Comm. Tech.": ["INFO. & COMM.TECH.", "ICT"],
  "BDT Visual Arts": ["B.D.T./VISUAL ARTS"]
}

In [None]:
def mutate_elective_(x):
  for key, value in elective.items():
    if x in value:
      return key
  return x.title()
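
The grouped `elective` dictionary is convenient to read and edit, but the function above scans every list on each call. One option is to invert the dictionary once into a flat variant-to-canonical lookup, so that each call becomes a single dictionary access. In this sketch `lookup` and `mutate_elective_flat` are hypothetical names; the `elective` dictionary is copied from the cell above.

```python
# Canonical label -> list of raw spellings, as defined in the notebook.
elective = {
    "BDT Pre-Tech.": ["B.D.T./PRE-TECH", "B.D.T/PRE-TECH", "BDT/PRE-TECH", "B.D.T./PRE-TECH."],
    "BDT Home Econs.": ["B.D.T./HOME ECONS.", "BDT/HOME ECONS"],
    "Rel. & Moral Educ.": ["REL.& MORAL EDUC.", "RME", "REL.& MORAL EDUC"],
    "Info. & Comm. Tech.": ["INFO. & COMM.TECH.", "ICT"],
    "BDT Visual Arts": ["B.D.T./VISUAL ARTS"],
}

# Invert once: raw spelling -> canonical label.
lookup = {
    variant: canonical
    for canonical, variants in elective.items()
    for variant in variants
}


def mutate_elective_flat(x):
    """Return the canonical label, or the title-cased name if unmapped."""
    return lookup.get(x, x.title())


print(mutate_elective_flat("RME"))
print(mutate_elective_flat("ASANTE TWI"))
```

The function can be passed to `Series.map(..., na_action="ignore")` in the same way as the other variants in this notebook.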

In [18]:
def mutate_elective(x):
    if x in ["B.D.T./PRE-TECH", "B.D.T/PRE-TECH", "BDT/PRE-TECH", "B.D.T./PRE-TECH."]:
        return "BDT Pre-Tech."
    elif x in ["B.D.T./HOME ECONS.", "BDT/HOME ECONS"]:
        return "BDT Home Econs."
    elif x in ["REL.& MORAL EDUC.", "RME", "REL.& MORAL EDUC"]:
        return "Rel. & Moral Educ."
    elif x in ["INFO. & COMM.TECH.", "ICT"]:
        return "Info. & Comm. Tech."
    elif x in ["B.D.T./VISUAL ARTS"]:
        return "BDT Visual Arts"
    else:
        return x.title()

In [19]:
df['best_elective_1_name'] = df['best_elective_1_name'].map(mutate_elective, na_action="ignore")

In [20]:
df['best_elective_2_name'] = df['best_elective_2_name'].map(mutate_elective, na_action="ignore")

In [25]:
df["jhs_type"].value_counts()

PUBLIC     393713
PRIVATE    125466
Unknown       236
Name: jhs_type, dtype: int64

In [24]:
df.head()

Unnamed: 0,index_no,gender,jhs_code,jhs_district,jhs_region,jhs_type,shs_code,shs_name,options,english,...,best_elective_1_name,best_elective_1_score,best_elective_1_aggregate,best_elective_2_name,best_elective_2_score,best_elective_2_aggregate,count_of_ones,aggregate,raw_score,dtrack
0,22441,Female,15096,LA DADE-KOTOPON MUNICIPAL,Greater Accra Region,PRIVATE,30107.0,"Wesley Girls Senior High, Cape Coast",A,87.0,...,BDT Pre-Tech.,85.0,1.0,French,86.0,1.0,9.0,6.0,570.0,Gold
1,257396,Male,13403,SUNYANI WEST,Brong-Ahafo,PRIVATE,60106.0,"St. James Sem & Senior High, Abesim",A,91.0,...,BDT Pre-Tech.,87.0,1.0,French,89.0,1.0,9.0,6.0,567.0,Green
2,401850,Male,13403,SUNYANI WEST,Brong-Ahafo,PRIVATE,60106.0,"St. James Sem & Senior High, Abesim",A,88.0,...,BDT Pre-Tech.,85.0,1.0,French,90.0,1.0,9.0,6.0,566.0,Green
3,351070,Female,6349,AWUTU-SENYA EAST MUNICIPAL,Central Region,PRIVATE,30107.0,"Wesley Girls Senior High, Cape Coast",A,82.0,...,BDT Pre-Tech.,90.0,1.0,French,91.0,1.0,9.0,6.0,564.0,Green
4,230932,Female,8325,ACCRA METROPOLITAN,Greater Accra Region,PRIVATE,30107.0,"Wesley Girls Senior High, Cape Coast",A,88.0,...,French,93.0,1.0,Rel. & Moral Educ.,89.0,1.0,9.0,6.0,562.0,Gold


In [23]:
df['jhs_district'].value_counts()

ACCRA METROPOLITAN               24066
KUMASI METROPOLITAN              23645
GA WEST MUNICIPAL                 9848
GA SOUTH MUNICIPAL                8273
SEKONDI/TAKORADI METROPOLITAN     8273
                                 ...  
Unknown                            236
Not Available                       94
Shai-Osudoku                         2
Accra Metro                          1
WA Municipal                         1
Name: jhs_district, Length: 221, dtype: int64
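
The tail of this output mixes upper-case and mixed-case labels, so some districts may be split across spellings that differ only in letter case. A simple way to find such groups before deciding on a clean-up is sketched below in plain Python. `case_variants` is a hypothetical helper, and the sample labels are illustrative rather than taken from the data.

```python
from collections import defaultdict


def case_variants(labels):
    """Group labels that are identical once case and outer spaces are ignored.

    Returns only the groups that contain more than one distinct spelling.
    """
    groups = defaultdict(set)
    for label in labels:
        groups[label.strip().casefold()].add(label)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Illustrative labels only; in the notebook one could pass
# df['jhs_district'].dropna().unique() instead.
sample = ["ACCRA METROPOLITAN", "Accra Metropolitan", "WA MUNICIPAL", "WA Municipal", "Unknown"]
print(case_variants(sample))
```

Labels such as `Accra Metro` would not be caught by a case-only comparison; whether they refer to the same district as a longer name is a judgment that needs an explicit mapping.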