## Domain:
#### Ofsted rating is a manual process dependent on an ofsted inspector physically visiting the school. 
#### Schools are rated in four categories, 1.Outstanding 2. Good 3. Satisfactory 4. Inadequate
#### Osted can perform a limited number of visits per year(appr 1200 visits every year among  primary16000 schools), so they usually inspect 
- a 'satisfactory' school every 5 years(EDA shows much more) or so.
- a 'good' school every 3 years((EDA shows much more) or so.
- an 'oustanding' school, only when they receive a complaint

## Problem : 
#### A school can have an incorrect ofsted rating for many years, until a new inspection is carried out in who knows 5/10/15 years!!!
#### Ideal solution is to have school inspections every year for each school, but this will be very costly as we need physical school inspection.

## Solution : 
#### Predict ofsted ratings of schools, so that we identify schools, that need ofsted manual inspection the most.

## What does this notebook do?
- load data for primary schools that were ofsted inspected in year 2016-2017
- perform a test-train split on 2016-2017 data.
- baseline model score to predict ofsted ratings
- perform gridsearch on 3 different models with various scorers
- display/plot the scores/metrics of all models for the test data from 2016-2017.
- identify best model, and draw conculsions.
- Bonus: 
 - test how these models perform on year 2015-2016 and 2017-2018 data.
 - display/plot the scores/metrics of all models for 2015-2016 and 2017-2018 data.
 - Testing the solution : identify schools that need ofsted manual inspection the most, for for 2015-2016 and 2017-2018.

## Conclusion : 
- To classify a school as outstanding(1) or inadequate(4) with reduced feature set
 - we can see "LogisticRegression'>_minority_class_recall_score" model works the best with i.e. {'C': 0.001, 'penalty': 'l2', 'solver': 'liblinear'}
 - it has a minority_class_recall_score of 82.55%. 
- With this model, we can identify the schools that need the ofsted physical inspection the most
 - i.e. the schools whose current and predicted ofsted differs a lot
 - e.g. a school with current_ofsted=1 and predicted_ofsted=4 needs ofsted physical inspection the most

#### Note :
- minority_class_recall_score = we have identified x% of the school that fall in category 1 or 4. i.e. 
- we have used feature redcution here, using regularization run in an earlier notebook. We noticed overall recall went up but minority_class_recall_score went down. so this is not the best model/feature
- baseline model had minority_class_recall_score: 34.13%

In [1]:
%run import_util.py
import scan_api
import util

START loading util functions
DONE loading util functions


## Load data for schools that were ofsted inspected in year 2016-2017

In [2]:
## parameters that you might want to reset in the notebook

# util.np.random.seed(42)
# util.trace=False
util.debug=True
# util.info=True
# util.multiclass=True
# util.upsample=True

# util.myscorers = [
#     util.multiclass_accuracy_score,
#     util.multiclass_recall_score,
#     util.multiclass_f1_score,
#     util.minority_class_recall_score,
#     util.poor_class_recall_score
# ]
n=2

In [3]:
# based on EDA, we found 600+ school data missing total_income, total_expenditure,free_school_meal_band . So we should drop them from the feature set
# based on regularization, we found only these features useful

# set features and target
features=['readprog', 'ptread_exp','ptread_high', 'read_average', 
          'writeprog', 'ptgps_exp', 'ptgps_high','gps_average', 
          'matprog', 'ptmat_exp', 'ptmat_high', 'ptmat_average']

target="ofsted"

all=['URN']+features+['ofsted']

In [4]:
# We have three years of school data
start=['2016','2015','2017']
end=['2017','2016','2018']

# we are loading 2016_2017 data, and mdoelling on it
util.data_directory="./data/"+start[0]+"-"+end[0]+"/"
df=util.read_file("cleanData1.csv")
df.head(n)

Unnamed: 0,URN,abscence,persitent_abscence,total_pupils,girls_perc,english_nfl,free_school_meal,is_london,total_income,total_expenditure,...,ptgps_exp,ptgps_high,gps_average,matprog,ptmat_exp,ptmat_high,ptmat_average,ofsted,pub_date,ofsted_phase
0,100000,2.5,2.8,276.0,49.3,59.3,12.0,True,8176.0,8319.0,...,92.0,50.0,109.0,3.1,92.0,38.0,107.0,1.0,2013-04-19,Primary
1,136807,3.8,6.5,92.0,44.6,59.7,13.0,False,,,...,,,,,,,,2.0,2013-07-04,Primary


In [5]:
display(df.shape)
display(df['ofsted'].value_counts())

# filter out schools which were ofsted inspected this year
df['pub_date'] = df.pub_date.apply(util.convert_to_datetime) 
df.dropna(axis=0, how='any', subset=['pub_date'],inplace=True)
start_date = datetime.strptime(start[0]+'-08-01', '%Y-%m-%d')
end_date = datetime.strptime(end[0]+'-07-30', '%Y-%m-%d')
df = df[ (df['pub_date']>start_date) & (df['pub_date']<end_date) ]

# convert boolean feature to float
df['is_london'].replace(True, 1,inplace=True)
df['is_london'].replace(False, 0,inplace=True)

display(df.shape)
display(df['ofsted'].value_counts())

(16785, 37)

2.0    11576
1.0     3063
3.0     1333
4.0      214
Name: ofsted, dtype: int64

(2006, 37)

2.0    943
3.0    736
4.0    166
1.0    161
Name: ofsted, dtype: int64

In [6]:
#  drop all rows which have NAN 

df = df[all]

display(df.shape)
display(df['ofsted'].value_counts())
df.dropna(axis=0, how='any',inplace=True)
display(df.shape)
display(df['ofsted'].value_counts())

display(df.isnull().sum())
df.head(n)

(2006, 14)

2.0    943
3.0    736
4.0    166
1.0    161
Name: ofsted, dtype: int64

(1741, 14)

2.0    803
3.0    646
4.0    150
1.0    142
Name: ofsted, dtype: int64

URN              0
readprog         0
ptread_exp       0
ptread_high      0
read_average     0
writeprog        0
ptgps_exp        0
ptgps_high       0
gps_average      0
matprog          0
ptmat_exp        0
ptmat_high       0
ptmat_average    0
ofsted           0
dtype: int64

Unnamed: 0,URN,readprog,ptread_exp,ptread_high,read_average,writeprog,ptgps_exp,ptgps_high,gps_average,matprog,ptmat_exp,ptmat_high,ptmat_average,ofsted
38,100044,1.6,69.0,12.0,104.0,1.8,96.0,62.0,110.0,4.8,96.0,27.0,107.0,2.0
83,100168,5.4,100.0,56.0,111.0,1.5,97.0,63.0,111.0,2.9,91.0,38.0,108.0,1.0


In [7]:
# if the modelling is to be done on binary label, convert multiclass to binary
if not util.multiclass:
    display(df.shape)
    display(df['ofsted'].value_counts())
    
    df['ofsted'].replace([1, 2], 1,inplace=True)
    df['ofsted'].replace([3, 4], 0,inplace=True)

    display(df.shape)
    display(df['ofsted'].value_counts())

## Perform a test-train split on 2016-2017 data.

In [8]:
X = df[features]
y = df[target].values

#scale features, by default target is not scaled
#note:stratify=y
scalerX, scalery, X_train_scaled, X_test_scaled, y_train_scaled, y_test_scaled = util.split_scale_df(X,y,stratify=y)

display(pd.DataFrame(y_train_scaled,columns=['ofsted'])['ofsted'].value_counts())

2.0    602
3.0    484
4.0    112
1.0    107
Name: ofsted, dtype: int64

In [9]:
# upsample_minority_class 
if util.upsample:
    new_df=X_train_scaled.copy()
    new_df[target]=y_train_scaled
    if util.multiclass:
        new_df = util.upsample_minority_class_multiclass(new_df)
    else:
        new_df = util.upsample_minority_class_binary(new_df)
    X_train_scaled = new_df[features]
    y_train_scaled = new_df[target].values

2.0    602
3.0    484
4.0    112
1.0    107
Name: ofsted, dtype: int64
2.0    602
4.0    560
1.0    535
3.0    484
Name: ofsted, dtype: int64


In [10]:
df.head(n)

Unnamed: 0,URN,readprog,ptread_exp,ptread_high,read_average,writeprog,ptgps_exp,ptgps_high,gps_average,matprog,ptmat_exp,ptmat_high,ptmat_average,ofsted
38,100044,1.6,69.0,12.0,104.0,1.8,96.0,62.0,110.0,4.8,96.0,27.0,107.0,2.0
83,100168,5.4,100.0,56.0,111.0,1.5,97.0,63.0,111.0,2.9,91.0,38.0,108.0,1.0


## Baseline model score to predict ofsted ratings

In [11]:
from sklearn.dummy import DummyClassifier

print("baseline metrics using DummyClassifier")
for new_strategy in ["most_frequent","stratified","uniform","prior"]:
    dummy_classifier = DummyClassifier(strategy=new_strategy)
    dummy_classifier.fit( X_train_scaled,y_train_scaled )
    print(f"{new_strategy} based score : {dummy_classifier.score(X_test_scaled, y_test_scaled)}")
    y_test_pred=dummy_classifier.predict(X_test_scaled)
    to_print=""
    for my_scorer in util.myscorers:
        scorer_str=(str(my_scorer).split(" ")[1])
        to_print=f"test {scorer_str}: {my_scorer(y_test_scaled, y_test_pred)}"+ ", " + to_print
    util.printmd(to_print)

baseline metrics using DummyClassifier
most_frequent based score : 0.4610091743119266


<span style='color:blue'>test poor_class_recall_score: 0.0, test minority_class_recall_score: 0.0, test multiclass_f1_score: 0.631083202511774, test multiclass_recall_score: 0.25, test multiclass_accuracy_score: 0.4610091743119266, </span>

stratified based score : 0.24541284403669725


<span style='color:blue'>test poor_class_recall_score: 0.2894736842105263, test minority_class_recall_score: 0.2733082706766917, test multiclass_f1_score: 0.22314855662728542, test multiclass_recall_score: 0.2599727275624107, test multiclass_accuracy_score: 0.25229357798165136, </span>

uniform based score : 0.24770642201834864


<span style='color:blue'>test poor_class_recall_score: 0.3684210526315789, test minority_class_recall_score: 0.34135338345864663, test multiclass_f1_score: 0.2433931944602081, test multiclass_recall_score: 0.29853278165008984, test multiclass_accuracy_score: 0.26605504587155965, </span>

prior based score : 0.4610091743119266


<span style='color:blue'>test poor_class_recall_score: 0.0, test minority_class_recall_score: 0.0, test multiclass_f1_score: 0.631083202511774, test multiclass_recall_score: 0.25, test multiclass_accuracy_score: 0.4610091743119266, </span>

## Perform gridsearch on 3 different models with various scorers

In [12]:
def classifier_gridsearchcv(X_train, X_test, y_train, y_test, estimator, scorer):
    reg=True
    if str(type(estimator))=="<class 'sklearn.neighbors.classification.KNeighborsClassifier'>":
        param_grid={'n_neighbors': range(3,20,2)}
        reg=False
    elif str(type(estimator))=="<class 'sklearn.ensemble.forest.RandomForestClassifier'>":
        param_grid={"n_estimators": [9, 11, 13, 15, 17, 19, 21],
                                "min_samples_leaf": [5, 10, 25],
                                "max_depth": [3, 7, 10]}
        reg=False
    else:
        param_grid=[{'penalty':['l1'],'C': np.logspace(-3, 3, 10),'solver':['liblinear']},
                    {'penalty':['l2'],'C': np.logspace(-3, 3, 10),'solver':['liblinear']}]
    
    print("start")   
    grid = df_gridsearchcv(X_train, y_train, estimator,param_grid,scorer,4,refit=scorer)
    
    
    y_train_pred = grid.best_estimator_.predict(X_train)
    print("end")
    y_test_pred = grid.best_estimator_.predict(X_test)
    
    to_print=""
    if debug:
        for my_scorer in myscorers:
            a=(str(my_scorer).split(" ")[1])
            to_print=f"train {a}: {my_scorer(y_train, y_train_pred)}"+ ", " + to_print
        print(to_print)
        if reg:
            print(f"intercept: {grid.best_estimator_.intercept_}, coef{X_train.columns} : {grid.best_estimator_.coef_}")
    
    to_print=""
    for my_scorer in myscorers:
        a=(str(my_scorer).split(" ")[1])
        to_print=f"test {a}: {my_scorer(y_test, y_test_pred)}"+ ", " + to_print
    printmd(to_print)
    return grid, y_test,y_test_pred

def df_gridsearchcv(X_train, y_train, estimator, param_grid,scoring,cv,refit):
    grid = GridSearchCV(estimator=estimator,
                    param_grid=param_grid,
                    scoring=scoring,
                    return_train_score=True,
                    cv=cv, 
                    iid=True,
                    refit=refit,
                    error_score=np.nan)

    grid.fit(X=X_train,y=y_train)
    if debug:
        print(f"best_estimator: {grid.best_estimator_}")
        print(f"best estimator score: {grid.best_score_},best params: {grid.best_params_}")  
    if trace:
        display(pd.DataFrame(grid.cv_results_)[['params','mean_test_score']])
    return grid


In [13]:
# Run knn, log_reg and random_forest on various scorers
i=0
df_plot= pd.DataFrame(columns=['model_name','scorer','scores'])
grids = {}
# trace=True
# debug=True
for estimator in [KNeighborsClassifier(),
                   LogisticRegression(multi_class='ovr',max_iter=100),
                   RandomForestClassifier()]:
    for my_scorer in util.myscorers:
        scorer_str=str(my_scorer).split(" ")[1]
        estimator_str=str(type(estimator)).split(".")[3]
        run_name=estimator_str+"_"+scorer_str
        print(run_name)
#         grid, y_test, y_test_pred = util.classifier_gridsearchcv(X_train_scaled, 
        grid, y_test, y_test_pred = classifier_gridsearchcv(X_train_scaled, 
                                                      X_test_scaled,
                                                      y_train_scaled,
                                                      y_test_scaled,
                                                      estimator,scorer=util.make_scorer(my_scorer))
        for my_scorer1 in util.myscorers:
            scorer_str1=str(my_scorer1).split(" ")[1]
            df_plot.loc[i] = [estimator_str+":"+scorer_str,
                              scorer_str1,
                              my_scorer1(y_test,y_test_pred)*100]
            i=i+1
        
        grids[run_name]= (grid, y_test, y_test_pred)

# trace=False
# debug=False

KNeighborsClassifier'>_multiclass_accuracy_score
start


ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "C:\Users\priya\dev\applications\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-089de22e9561>", line 20, in <module>
    estimator,scorer=util.make_scorer(my_scorer))
  File "<ipython-input-12-40284770d253>", line 16, in classifier_gridsearchcv
    grid = df_gridsearchcv(X_train, y_train, estimator,param_grid,scorer,4,refit=scorer)
  File "<ipython-input-12-40284770d253>", line 49, in df_gridsearchcv
    grid.fit(X=X_train,y=y_train)
  File "C:\Users\priya\dev\applications\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 697, in fit
    self.best_index_ = self.refit(results)
TypeError: __call__() missing 2 required positional arguments: 'X' and 'y_true'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\priya\dev\applications\Anaconda3\l

TypeError: __call__() missing 2 required positional arguments: 'X' and 'y_true'

## Display/plot the scores/metrics of all models for the test data from 2016-2017.

In [None]:
# display test data scores of various best models  
df1 = df_plot.pivot(index='model_name', columns='scorer', values='scores')
df1.plot(kind='bar',figsize=(20, 10))

In [None]:
# display test confusion matrix of various best models  
for gridName,gridValues in grids.items():
    print(gridName)
    grid, y_test, y_test_pred = gridValues
    display(pd.crosstab(y_test, y_test_pred, rownames=['Actual'], colnames=['Predicted'], margins=True))


In [None]:
# display test data stats of various best models  

for gridName,gridValues in grids.items():
    print(gridName)
    grid, y_test, y_test_pred = gridValues
    print(classification_report(y_test,y_test_pred))

## Identify best model, and draw conculsions.

In [None]:
# conclusion: 
# To classify a school as outstanding(1) or inadequate(4)
# we can see "LogisticRegression'>_minority_class_recall_score" model works the best with an recall of 82.55%, {'C': 0.001, 'penalty': 'l2', 'solver': 'liblinear'}

# With this model, we can identify the schools that need the ofsted physical inspection the most
# i.e. the schools whose current and predicted ofsted differs a lot
# e.g. a school with current_ofsted=1 and predicted_ofsted=4 needs ofsted physical inspection the most

# Notes:
# recal_minority_class_recall_score = we have identified x% of the school that fall in category 1 or 4. i.e. 
# we have used feature redcution here, using regularization run in an earlier notebook.

## Bonus : Test how these models perform on year 2015-2016 and 2017-2018 data.

In [None]:
# calculate test data scores for other years by running our best models on them

df_plots={}
df_school_visit_plots={}
for i in [1,2]:
    util.data_directory="./data/"+start[i]+"-"+end[i]+"/" 
    util.printmd(f"stats for {util.data_directory}")
    df1 = pd.read_csv(util.data_directory + "cleanData1.csv")

    df1['pub_date'] = df1.pub_date.apply(util.convert_to_datetime) 
    df1.dropna(axis=0, how='any', subset=['pub_date'],inplace=True)
    start_date = datetime.strptime(start[i]+'-08-01', '%Y-%m-%d')
    end_date = datetime.strptime(end[i]+'-07-30', '%Y-%m-%d')
    df1 = df1[ (df1['pub_date']>start_date) & (df1['pub_date']<end_date) ]
    
    if not util.multiclass:
        df1['ofsted'].replace([1, 2], 1,inplace=True)
        df1['ofsted'].replace([3, 4], 0,inplace=True)
            
    df1 = df1[all]
    df1.dropna(axis=0, how='any',inplace=True)
    display(df1['ofsted'].value_counts())
    display(df1.shape)

    X1 = df1[features]
    y1 = df1[target].values

    X1_test_scaled = pd.DataFrame(scalerX.transform(X1), index=X1.index, columns=X1.columns)
    y1_test_scaled = y1

    predicted_school_df={}
    count=0
    df_plot1= pd.DataFrame(columns=['model_name','scorer','scores'])
    for gridName,gridValues in grids.items():
        print(f"stats for {gridName}")
        grid, y_test, y_test_pred = gridValues
        y1_test_pred = grid.best_estimator_.predict(X1_test_scaled)
        new_df=df1[['URN',target]].copy()
        new_df['ofsted_predicted']=y1_test_pred
        predicted_school_df[gridName]=new_df
        print(f"best_estimator_.score: {grid.best_estimator_.score(X1_test_scaled, y1_test_scaled)}")
        
        to_print=""
        for my_scorer in util.myscorers:
            scorer_str=(str(my_scorer).split(" ")[1])
            to_print=f"test {scorer_str}: {my_scorer(y1_test_scaled, y1_test_pred)}"+ ", " + to_print
            df_plot1.loc[count] = [gridName,
                          scorer_str,
                          my_scorer(y1_test_scaled,y1_test_pred)*100]
            count=count+1
        
        util.printmd(to_print)   
        display(pd.crosstab(y1_test_scaled, y1_test_pred, rownames=['Actual'], colnames=['Predicted'], margins=True))
        print(classification_report(y1_test_scaled,y1_test_pred))
    df_plots[start[i]+"-"+end[i]]=df_plot1
    df_school_visit_plots[start[i]+"-"+end[i]]=predicted_school_df

##  Display/plot the scores/metrics of all models for 2015-2016 and 2017-2018 data.

In [None]:
# display test data scores for other years by running our best models on them
for df_plot1 in df_plots.values():
    df1 = df_plot1.pivot(index='model_name', columns='scorer', values='scores')
    df1.plot(kind='bar',figsize=(20, 10))

## Testing the solution : Identify schools that need ofsted manual inspection the most, for 2015-2016 and 2017-2018.

In [None]:
for year,predicted_school_df in df_school_visit_plots.items():
    no_of_schools_to_visit_values=[]
    ofsted_gap_threshold=3
    for model_name,df in predicted_school_df.items():
        df['ofsted_gap']=abs(df['ofsted_predicted']-df['ofsted'])
        no_of_schools_to_visit=(df[df['ofsted_gap']>=ofsted_gap_threshold].shape)[0]
        no_of_schools_to_visit_values=no_of_schools_to_visit_values+[no_of_schools_to_visit]

    df_visit_plot= pd.DataFrame(columns=['model_name','schools_to_visit'])
    df_visit_plot['model_name']=predicted_school_df.keys()
    df_visit_plot['schools_to_visit']=no_of_schools_to_visit_values

    print(f"schools to visit {year}")
    df_visit_plot.set_index('model_name',inplace=True)
    df_visit_plot.plot(kind='bar',figsize=(10, 5))