# Question: 
#### Is there a correlation between the airports used, visibility/ceiling, and ILS or LPV/LP?

1. Localizer Performance with Vertical Guidance (LPV) is defined as an Approach with Vertical Guidance (APV); that is, an instrument approach based on a navigation system that is not required to meet the precision approach standards of ICAO Annex 10 but that provides both course and glidepath deviation information.

2. Localizer Performance without Vertical Guidance (LP) minima are added in locations where terrain or obstructions do not allow publication of vertically guided LPV minima. Lateral sensitivity increases as an aircraft gets closer to the runway (or point in space for helicopters). LP is not a fail-down mode for LPV; LP and LPV are independent. LNAV is not a fail-down mode for LP. LP will not be published with lines of minima that contain approved vertical guidance (i.e. LNAV/VNAV or LPV).

In [83]:
# import packages
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

In [84]:
# import flights data with metar
fw = pd.read_csv('..//Datasets/NEW/FW_with_metar.csv')

In [85]:
# observe the dataset
fw.head()

Unnamed: 0,tail_number,date,aircraft,origin_code,origin,destination_code,destination,departure,dep_UTC_time,arrival,arr_UTC_time,duration,distance_mi,airport_origin,city_origin,origin_state,country_origin,origin_Latitude,origin_Longitude,airport_destination,city_destination,destination_state,country_destination,destination_Latitude,destination_Longitude,Owner,METAR_origin,METAR_time_origin,report_type_origin,temperature_origin,dew_point_origin,wind_origin,wind_peak_origin,visibility_origin,pressure_origin,press_sea_level_origin,sky_origin,remarks_origin,METAR_destination,METAR_time_destination,report_type_destination,temperature_destination,dew_point_destination,wind_destination,wind_peak_destination,visibility_destination,pressure_destination,press_sea_level_destination,sky_destination,remarks_destination
0,N100KB,2020-11-29,BE9L,KSTP,St Paul Holman Fld (KSTP),KMOT,Minot Intl (KMOT),07:46PM CST,01:46:00,09:53PM CST,03:53:00,2:07,453.428407,St Paul Downtown Holman Field,St Paul - MN,MN,US,44.934502,-93.059998,Minot International Airport,Minot - ND,ND,US,48.259399,-101.279999,"EXECUTIVE AIR TAXI CORPBISMARCK, ND, US(Corpor...",METAR KSTP 290153Z AUTO 13006KT 10SM CLR 01/M0...,,,,,,,,,,,,METAR KMOT 290354Z 29023G34KT 10SM SCT031 SCT0...,,,,,,,,,,,
1,N100KB,2020-12-06,BE9L,KMOT,Minot Intl (KMOT),KMOT,Minot Intl (KMOT),12:07PM CST,18:07:00,12:21PM CST,18:21:00,0:13,0.0,Minot International Airport,Minot - ND,ND,US,48.259399,-101.279999,Minot International Airport,Minot - ND,ND,US,48.259399,-101.279999,"EXECUTIVE AIR TAXI CORPBISMARCK, ND, US(Corpor...",METAR KMOT 061854Z 27004KT 10SM CLR 09/M04 A30...,18:54:00,"routine report, cycle 19 (automatic report)",48.9 F,25.0 F,W at 4 knots,missing,10 miles,1018.0 mb,1019.4 mb,clear,Automated station (type 2),METAR KMOT 061854Z 27004KT 10SM CLR 09/M04 A30...,18:54:00,"routine report, cycle 19 (automatic report)",48.9 F,25.0 F,W at 4 knots,missing,10 miles,1018.0 mb,1019.4 mb,clear,Automated station (type 2)
2,N100KB,2020-12-07,BE9L,KMOT,Minot Intl (KMOT),KFSD,Joe Foss Fld (KFSD),09:54PM CST,03:54:00,11:24PM CST,05:24:00,1:29,389.97921,Minot International Airport,Minot - ND,ND,US,48.259399,-101.279999,Joe Foss Field Airport,Sioux Falls - SD,SD,US,43.582001,-96.741898,"EXECUTIVE AIR TAXI CORPBISMARCK, ND, US(Corpor...",METAR KMOT 070354Z 24012KT 10SM CLR 02/M04 A30...,03:54:00,"routine report, cycle 4 (automatic report)",35.1 F,24.1 F,WSW at 12 knots,missing,10 miles,1016.6 mb,1018.5 mb,clear,Automated station (type 2),METAR KFSD 070556Z 00000KT 10SM CLR M04/M06 A3...,05:56:00,"routine report, cycle 6 (automatic report)",25.0 F,21.9 F,calm,missing,10 miles,1017.3 mb,1019.0 mb,clear,Automated station (type 2); 3-hr pressure chan...
3,N100KB,2020-12-08,BE9L,KRST,Rochester Intl (KRST),KMOT,Minot Intl (KMOT),06:28PM CST,00:28:00,09:26PM CST,03:26:00,2:57,517.521438,Rochester International Airport,Rochester - MN,MN,US,43.908298,-92.5,Minot International Airport,Minot - ND,ND,US,48.259399,-101.279999,"EXECUTIVE AIR TAXI CORPBISMARCK, ND, US(Corpor...",METAR KRST 080054Z 21005KT 6SM BR OVC007 M02/M...,00:54:00,"routine report, cycle 1 (automatic report)",28.9 F,26.1 F,SSW at 5 knots,missing,6 miles,1018.0 mb,1019.5 mb,overcast at 700 feet,Automated station (type 2),METAR KMOT 080354Z 26015KT 10SM CLR 06/M03 A29...,03:54:00,"routine report, cycle 4 (automatic report)",42.1 F,26.1 F,W at 15 knots,missing,10 miles,1014.2 mb,1015.9 mb,clear,Automated station (type 2)
4,N100KB,2020-12-08,BE9L,KMOT,Minot Intl (KMOT),KDVL,Devils Lake Rgnl (KDVL),05:57AM CST,11:57:00,06:25AM CST,12:25:00,0:27,110.009068,Minot International Airport,Minot - ND,ND,US,48.259399,-101.279999,Devils Lake Regional Airport,Devils Lake - ND,ND,US,48.114201,-98.908798,"EXECUTIVE AIR TAXI CORPBISMARCK, ND, US(Corpor...",METAR KMOT 081154Z AUTO 23008KT 10SM CLR 03/M0...,11:54:00,"routine report, cycle 12 (automatic report)",37.9 F,27.0 F,SW at 8 knots,missing,10 miles,1012.5 mb,1014.0 mb,clear,Automated station (type 2); 3-hr pressure chan...,METAR KDVL 081256Z AUTO 27008KT 10SM CLR 01/M0...,12:56:00,"routine report, cycle 13 (automatic report)",33.1 F,27.0 F,W at 8 knots,missing,10 miles,1011.9 mb,970.4 mb,clear,Automated station (type 2)


In [86]:
fw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23160 entries, 0 to 23159
Data columns (total 50 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   tail_number                  23160 non-null  object 
 1   date                         23160 non-null  object 
 2   aircraft                     22966 non-null  object 
 3   origin_code                  23142 non-null  object 
 4   origin                       23160 non-null  object 
 5   destination_code             23121 non-null  object 
 6   destination                  23159 non-null  object 
 7   departure                    23160 non-null  object 
 8   dep_UTC_time                 23148 non-null  object 
 9   arrival                      22730 non-null  object 
 10  arr_UTC_time                 22717 non-null  object 
 11  duration                     22680 non-null  object 
 12  distance_mi                  23151 non-null  float64
 13  airport_origin  

In [87]:
# create a subset of data with visibility
fw_sub = fw[['origin_code', 'origin', 'destination_code', 'destination', 'visibility_origin', 'visibility_destination']]

In [88]:
# drop records where the origin and destination are the same airport
fw_sub = fw_sub.loc[fw_sub.origin_code != fw_sub.destination_code]

In [89]:
fw_sub.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 22413 entries, 0 to 23159
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   origin_code             22395 non-null  object
 1   origin                  22413 non-null  object
 2   destination_code        22374 non-null  object
 3   destination             22412 non-null  object
 4   visibility_origin       18009 non-null  object
 5   visibility_destination  17589 non-null  object
dtypes: object(6)
memory usage: 1.2+ MB


In [90]:
# check for null values
fw_sub.isna().sum()

origin_code                 18
origin                       0
destination_code            39
destination                  1
visibility_origin         4404
visibility_destination    4824
dtype: int64

In [91]:
# drop null values and reset index
fw_sub = fw_sub.dropna().reset_index(drop=True)

In [92]:
fw_sub.shape

(16459, 6)

In [93]:
# restructure data: stack origin and destination records into one long table
airport_code = []
location = []
visibility = []

for i in range(len(fw_sub)):
    airport_code += [fw_sub.origin_code.iloc[i]] 
    location += [fw_sub.origin.iloc[i]] 
    visibility += [fw_sub.visibility_origin.iloc[i]]
    
    airport_code += [fw_sub.destination_code.iloc[i]] 
    location += [fw_sub.destination.iloc[i]] 
    visibility += [fw_sub.visibility_destination.iloc[i]]

fw = pd.DataFrame(list(zip(airport_code, location, visibility)),
                 columns=['airport_code', 'location', 'visibility'])
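
The row-by-row loop above can also be written as two column selections stacked with `pd.concat`. The following is a minimal sketch on a toy frame (`fw_sub_demo` and its values are illustrative, not taken from the dataset). Note that the row order differs from the loop (all origins first, then all destinations), which should not matter once duplicates are dropped and the rows are grouped by airport.

```python
import pandas as pd

# Toy stand-in for fw_sub; the values are illustrative only.
fw_sub_demo = pd.DataFrame({
    'origin_code': ['KSTP', 'KMOT'],
    'origin': ['St Paul Holman Fld (KSTP)', 'Minot Intl (KMOT)'],
    'visibility_origin': ['10 miles', '6 miles'],
    'destination_code': ['KMOT', 'KFSD'],
    'destination': ['Minot Intl (KMOT)', 'Joe Foss Fld (KFSD)'],
    'visibility_destination': ['10 miles', '1/2 miles'],
})

new_names = ['airport_code', 'location', 'visibility']

# Select each half and give both the same column names.
origin_part = fw_sub_demo[['origin_code', 'origin', 'visibility_origin']].copy()
origin_part.columns = new_names
dest_part = fw_sub_demo[['destination_code', 'destination', 'visibility_destination']].copy()
dest_part.columns = new_names

# Stack the two halves into one long table with a fresh index.
long_demo = pd.concat([origin_part, dest_part], ignore_index=True)
print(long_demo)
```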

In [94]:
# drop the duplicates
fw = fw.drop_duplicates()

In [95]:
# delete 'miles' word from visibility column
fw.loc[:,'visibility'] = fw.visibility.str.rsplit(' ', n=1, expand=True)[0]

In [96]:
# find the fractional (non-decimal) values in visibility
fw[fw.visibility.str.contains('/', na=False)].visibility.unique()

array(['2 1/2', '1 1/2', '1/4', '1/2', '1 3/4', '3/4', 'less than 1/4',
       '1 1/4', '1/8', '3/8', '1/16'], dtype=object)

In [97]:
# convert fractional visibility values to decimals
# ('missing' becomes NaN and 'less than 1/4' is treated as 0.25)
fw = fw.replace(regex={'missing':np.nan, 'less than ':''})
fw = fw.replace(regex={'2 1/2': '2.5', 
                      '1 1/2': '1.5', '1 3/4': '1.75', '1 1/4': '1.25',
                      '1/4': '0.25', '1/2': '0.5', '3/4': '0.75', 
                      '1/8': '0.125', '3/8':'0.375', '1/16':'0.0625'})
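
A lookup table of fractions has to list every value that occurs in the data. As a hedged alternative, the conversion could be done by a small parser built on the standard library's `fractions.Fraction`. The helper name `visibility_to_float` is hypothetical, and like the cell above it treats "less than 1/4" as 0.25.

```python
from fractions import Fraction
import math


def visibility_to_float(text):
    """Convert strings such as '2 1/2', '1/4' or 'less than 1/4' to a float."""
    # Pass through missing values as NaN.
    if text is None or (isinstance(text, float) and math.isnan(text)):
        return math.nan
    cleaned = str(text).replace('less than', '').strip()
    if cleaned == '' or cleaned == 'missing':
        return math.nan
    # Add the whole-number part and the fraction part, e.g. '2 1/2' -> 2 + 1/2.
    return float(sum(Fraction(part) for part in cleaned.split()))


print(visibility_to_float('2 1/2'))
print(visibility_to_float('less than 1/4'))
```

Applied to the notebook's data, this would look something like `fw['visibility'].map(visibility_to_float)`, and it only touches the visibility column rather than every string column in the frame.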

In [98]:
# convert visibility to numeric
fw.visibility = pd.to_numeric(fw.visibility)

In [99]:
# drop null values and reset index
fw = fw.dropna().reset_index(drop=True)

In [100]:
fw.head()

Unnamed: 0,airport_code,location,visibility
0,KMOT,Minot Intl (KMOT),10.0
1,KFSD,Joe Foss Fld (KFSD),10.0
2,KRST,Rochester Intl (KRST),6.0
3,KDVL,Devils Lake Rgnl (KDVL),10.0
4,US-0571,Williston Basin International Airport (KXWA),10.0


In [103]:
fw = fw.groupby(['airport_code', 'location']).agg(['count','mean', 'median', 'max', 'min', 'std'])\
       .rename({'count':'viz_count','mean':'avg_visibility', 'median':'median_visibility', 'max':'max_visibility',
               'min':'min_visibility', 'std': 'std_visibility'}, axis=1)

In [104]:
# drop multiindex columns name
fw.columns = fw.columns.droplevel()

# reset indexes
fw = fw.reset_index()
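
The aggregate, rename, and drop-level steps above can be combined using pandas named aggregation, which produces flat column names directly. This is a sketch on toy data (`demo` is illustrative), not a change to the notebook's results.

```python
import pandas as pd

# Toy stand-in for the long airport/visibility table.
demo = pd.DataFrame({
    'airport_code': ['KMOT', 'KMOT', 'KFSD'],
    'location': ['Minot Intl (KMOT)', 'Minot Intl (KMOT)', 'Joe Foss Fld (KFSD)'],
    'visibility': [10.0, 6.0, 0.25],
})

# Each keyword names an output column; its value is the aggregation to apply.
summary = (
    demo.groupby(['airport_code', 'location'])['visibility']
        .agg(viz_count='count', avg_visibility='mean', median_visibility='median',
             max_visibility='max', min_visibility='min', std_visibility='std')
        .reset_index()
)
print(summary)
```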

In [105]:
fw.head()

Unnamed: 0,airport_code,location,viz_count,avg_visibility,median_visibility,max_visibility,min_visibility,std_visibility
0,BBG,Branson (KBBG),2,8.0,8.0,10.0,6.0,2.828427
1,BDH,Willmar Municipal (KBDH),2,7.0,7.0,10.0,4.0,4.242641
2,BOK,Brookings (KBOK),5,6.05,7.0,10.0,0.25,3.709784
3,CVH,Hollister Muni (KCVH),1,10.0,10.0,10.0,10.0,
4,K06D,Rolla Muni (06D),3,9.0,9.0,10.0,8.0,1.0


In [106]:
# import airport code data
airport_code =  pd.read_csv('..//Datasets/Airports/airport-codes.csv')

In [107]:
# check for duplicates in ident column
airport_code.ident.duplicated().sum()

0

In [108]:
# merge local code 
fw = fw.merge(airport_code[['ident','local_code']], how='left', left_on='airport_code', right_on='ident').drop('ident', axis=1)

In [109]:
fw.head()

Unnamed: 0,airport_code,location,viz_count,avg_visibility,median_visibility,max_visibility,min_visibility,std_visibility,local_code
0,BBG,Branson (KBBG),2,8.0,8.0,10.0,6.0,2.828427,BBG
1,BDH,Willmar Municipal (KBDH),2,7.0,7.0,10.0,4.0,4.242641,BDH
2,BOK,Brookings (KBOK),5,6.05,7.0,10.0,0.25,3.709784,BOK
3,CVH,Hollister Muni (KCVH),1,10.0,10.0,10.0,10.0,,CVH
4,K06D,Rolla Muni (06D),3,9.0,9.0,10.0,8.0,1.0,06D


In [110]:
# load airports information
airports = pd.read_csv('..//Datasets/Airports/FAA_Airport_Data.csv')

In [111]:
airports.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4772 entries, 0 to 4771
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   FAA ID               4772 non-null   object 
 1   AIRPORT NAME         4772 non-null   object 
 2   LOCATION             4772 non-null   object 
 3   ST                   4772 non-null   object 
 4   PROCEDURE NAME       4772 non-null   object 
 5   LPV or LP            4772 non-null   object 
 6   DA OR MDA            4772 non-null   int64  
 7   VIS                  4772 non-null   object 
 8   HAT                  4772 non-null   int64  
 9   GPA                  4048 non-null   float64
 10  TCH                  4048 non-null   float64
 11  ILS to Rwy (Y/N)     4772 non-null   object 
 12  Rwy End Cntr         4728 non-null   float64
 13  ILS @ Arpt (Y/N)     4772 non-null   object 
 14  ILS Arpt Cntr (Y/N)  2217 non-null   object 
 15  ILS CAT II or III    187 non-null    o

In [112]:
airports.head()

Unnamed: 0,FAA ID,AIRPORT NAME,LOCATION,ST,PROCEDURE NAME,LPV or LP,DA OR MDA,VIS,HAT,GPA,TCH,ILS to Rwy (Y/N),Rwy End Cntr,ILS @ Arpt (Y/N),ILS Arpt Cntr (Y/N),ILS CAT II or III,WAAS CHANNEL #
0,06N,RANDALL,MIDDLETOWN,NY,RNAV (GPS) RWY 26,LP,1020,1,497,,,N,1.0,N,N,,40030
1,AST,ASTORIA RGNL,ASTORIA,OR,RNAV (GPS) RWY 08,LP,420,1,406,,,N,1.0,Y,Y,,40032
2,C62,KENDALLVILLE MUNI,KENDALLVILLE,IN,RNAV (GPS) RWY 10,LP,1336,1,361,,,N,1.0,N,N,,40035
3,K62,GENE SNYDER,FALMOUTH,KY,RNAV (GPS) RWY 21,LP,1340,1,455,,,N,1.0,N,N,,40036
4,ORS,ORCAS ISLAND,EASTSOUND,WA,RNAV (GPS) RWY 16,LP,340,1,305,,,N,1.0,N,N,,40038


In [113]:
# check for null values
airports.isna().sum()

FAA ID                    0
AIRPORT NAME              0
LOCATION                  0
ST                        0
PROCEDURE NAME            0
LPV or LP                 0
DA OR MDA                 0
VIS                       0
HAT                       0
GPA                     724
TCH                     724
ILS to Rwy (Y/N)          0
Rwy End Cntr             44
ILS @ Arpt (Y/N)          0
ILS Arpt Cntr (Y/N)    2555
ILS CAT II or III      4585
WAAS CHANNEL #            0
dtype: int64

In [114]:
# drop columns that have a lot of null values
airports = airports.drop(['GPA','TCH', 'ILS Arpt Cntr (Y/N)', 'ILS CAT II or III'], axis=1)

In [115]:
# keep one row per airport (the first procedure listed for each FAA ID)
airports = airports.drop_duplicates(subset='FAA ID')

In [116]:
airports.head()

Unnamed: 0,FAA ID,AIRPORT NAME,LOCATION,ST,PROCEDURE NAME,LPV or LP,DA OR MDA,VIS,HAT,ILS to Rwy (Y/N),Rwy End Cntr,ILS @ Arpt (Y/N),WAAS CHANNEL #
0,06N,RANDALL,MIDDLETOWN,NY,RNAV (GPS) RWY 26,LP,1020,1,497,N,1.0,N,40030
1,AST,ASTORIA RGNL,ASTORIA,OR,RNAV (GPS) RWY 08,LP,420,1,406,N,1.0,Y,40032
2,C62,KENDALLVILLE MUNI,KENDALLVILLE,IN,RNAV (GPS) RWY 10,LP,1336,1,361,N,1.0,N,40035
3,K62,GENE SNYDER,FALMOUTH,KY,RNAV (GPS) RWY 21,LP,1340,1,455,N,1.0,N,40036
4,ORS,ORCAS ISLAND,EASTSOUND,WA,RNAV (GPS) RWY 16,LP,340,1,305,N,1.0,N,40038


In [117]:
airports[airports.VIS.str.contains('/', na=False)].VIS.unique()

array(['1 1/4', ' 3/4', '1 1/2', ' 1/2', '1 3/4', '1 1/8', ' 7/8',
       '1 3/8', '2 1/2', '1 5/8', '2 1/4', '1 7/8', ' 5/8', ' 3/5',
       ' 2/3', '2 3/4'], dtype=object)

In [118]:
# convert fractional VIS values to decimals; mixed numbers are listed before bare fractions
# so that, for example, '1 7/8' is not partly rewritten by the ' 7/8' pattern first
airports = airports.replace(regex={'2 1/2': '2.5', '1 1/8': '1.125', '1 7/8': '1.875', '1 3/8': '1.375', '2 3/4':'2.75',
                      '1 1/2': '1.5', '1 3/4': '1.75', '1 1/4': '1.25', '1 5/8': '1.625', '2 1/4': '2.25',
                      ' 7/8': '0.875', ' 3/5':'0.6', ' 5/8':'0.625', ' 2/3': '0.67',
                      '1/4': '0.25', '1/2': '0.5', '3/4': '0.75',
                      '1/8': '0.125', '3/8':'0.375', '1/16':'0.0625'})

In [119]:
airports.shape

(2215, 13)

In [120]:
airports['FAA ID'].isin(fw.local_code).sum()

888

In [121]:
# merge by local code
fw = fw.merge(airports, how='inner', left_on='local_code', right_on='FAA ID')
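
The check before this merge counts 888 matching FAA IDs, while the merged frame reported afterwards has 890 rows. One plausible reason is that some `local_code` values appear under more than one `airport_code`. The sketch below, on toy frames with made-up codes (`KXYZ` is hypothetical), shows how `duplicated`, `validate`, and `indicator` could be used to inspect this kind of mismatch.

```python
import pandas as pd

# Toy frames; the codes are illustrative and 'KXYZ' is hypothetical.
left = pd.DataFrame({'airport_code': ['KMOT', 'MOT', 'KFSD', 'KXYZ'],
                     'local_code': ['MOT', 'MOT', 'FSD', 'XYZ']})
right = pd.DataFrame({'FAA ID': ['MOT', 'FSD', 'AST'],
                      'LPV or LP': ['LPV', 'LPV', 'LP']})

# Keys that occur more than once on the left yield more than one merged row each.
dup_keys = left.loc[left['local_code'].duplicated(keep=False), 'local_code'].unique()

# validate raises if the right-hand keys are not unique;
# indicator adds a '_merge' column showing where each row came from.
checked = left.merge(right, how='left', left_on='local_code', right_on='FAA ID',
                     validate='many_to_one', indicator=True)
print(dup_keys)
print(checked['_merge'].value_counts())
```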

In [122]:
fw.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 890 entries, 0 to 889
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   airport_code       890 non-null    object 
 1   location           890 non-null    object 
 2   viz_count          890 non-null    int64  
 3   avg_visibility     890 non-null    float64
 4   median_visibility  890 non-null    float64
 5   max_visibility     890 non-null    float64
 6   min_visibility     890 non-null    float64
 7   std_visibility     515 non-null    float64
 8   local_code         890 non-null    object 
 9   FAA ID             890 non-null    object 
 10  AIRPORT NAME       890 non-null    object 
 11  LOCATION           890 non-null    object 
 12  ST                 890 non-null    object 
 13  PROCEDURE NAME     890 non-null    object 
 14  LPV or LP          890 non-null    object 
 15  DA OR MDA          890 non-null    int64  
 16  VIS                890 non

In [123]:
fw.isna().sum()

airport_code           0
location               0
viz_count              0
avg_visibility         0
median_visibility      0
max_visibility         0
min_visibility         0
std_visibility       375
local_code             0
FAA ID                 0
AIRPORT NAME           0
LOCATION               0
ST                     0
PROCEDURE NAME         0
LPV or LP              0
DA OR MDA              0
VIS                    0
HAT                    0
ILS to Rwy (Y/N)       0
Rwy End Cntr           3
ILS @ Arpt (Y/N)       0
WAAS CHANNEL #         0
dtype: int64

In [125]:
# sort by count to observe more diverse airports
fw = fw.sort_values('viz_count' , ascending=False)

# reset indexes
fw = fw.reset_index(drop=True)

In [126]:
# fw = fw.dropna(subset=fw.columns[9:])

In [127]:
fw.shape

(890, 22)

In [128]:
fw.head()

Unnamed: 0,airport_code,location,viz_count,avg_visibility,median_visibility,max_visibility,min_visibility,std_visibility,local_code,FAA ID,AIRPORT NAME,LOCATION,ST,PROCEDURE NAME,LPV or LP,DA OR MDA,VIS,HAT,ILS to Rwy (Y/N),Rwy End Cntr,ILS @ Arpt (Y/N),WAAS CHANNEL #
0,KSLC,Salt Lake City Intl (KSLC),18,3.534722,2.25,10.0,0.125,3.200358,SLC,SLC,SALT LAKE CITY INTL,SALT LAKE CITY,UT,RNAV (GPS) RWY 35,LP,4560,2400,336,N,1.0,Y,77722
1,KFSD,Joe Foss Fld (KFSD),18,3.541667,2.25,10.0,0.25,3.19265,FSD,FSD,JOE FOSS FIELD,SIOUX FALLS,SD,RNAV (GPS) RWY 33,LPV,1729,1,305,N,1.0,Y,40135
2,KCEC,Jack Mc Namara Fld (KCEC),17,3.735294,2.5,10.0,0.25,3.180102,CEC,CEC,JACK MC NAMARA FIELD,CRESCENT CITY,CA,RNAV (GPS) RWY 36,LPV,301,1,250,N,1.0,Y,72927
3,KGRR,Gerald R Ford Intl (KGRR),16,3.859375,2.75,10.0,0.25,3.241616,GRR,GRR,GERALD R FORD INTL,GRAND RAPIDS,MI,RNAV (GPS) RWY 08R,LPV,994,2400,200,Y,1.0,Y,40113
4,KIDA,Idaho Falls Rgnl (KIDA),16,3.875,2.75,10.0,0.25,3.230067,IDA,IDA,IDAHO FALLS RGNL,IDAHO FALLS,ID,RNAV (GPS) Y RWY 21,LPV,4935,2400,200,Y,1.0,Y,40111


In [129]:
fw.drop(['local_code','FAA ID', 'location'], axis=1).to_csv('../Datasets/NEW/FW_airport_analysis.csv', index=False)

In [141]:
# encode ILS availability at the airport as 1 (Y) / 0 (N), alongside average visibility
ils_air = fw[['ILS @ Arpt (Y/N)', 'avg_visibility']].replace(regex={"Y":1, "N":0})

In [185]:
# keep only airports whose average visibility is below 10 miles
ils_air = ils_air.loc[ils_air.avg_visibility < 10]
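
Since the question is about correlation, a direct measure may be a useful complement to a classifier. Pearson's r between a 0/1 variable and a continuous one is the point-biserial correlation, which `np.corrcoef` can compute. The arrays below are illustrative values only, not results from this dataset.

```python
import numpy as np

# Illustrative values only; in the notebook these would be the two columns of ils_air.
has_ils = np.array([1, 0, 1, 1, 0, 0, 1, 0])
avg_vis = np.array([3.5, 8.0, 4.0, 2.5, 9.0, 7.5, 5.0, 6.0])

# Off-diagonal entry of the 2x2 correlation matrix is r between the two variables.
r = np.corrcoef(has_ils, avg_vis)[0, 1]
print(round(r, 3))
```

With the real data, the columns may need `.astype(float)` first, because the regex-based encoding can leave the ILS column with an object dtype.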

In [188]:
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score, recall_score

In [189]:
# Split X and Y into training and testing datasets
train_X, test_X, train_Y, test_Y = train_test_split(ils_air['avg_visibility'], ils_air['ILS @ Arpt (Y/N)'], test_size=0.25, random_state=42)

# Ensure training dataset has only 75% of original X data
print(train_X.shape[0] / ils_air['avg_visibility'].shape[0])

# Ensure testing dataset has only 25% of original X data
print(test_Y.shape[0] / ils_air['ILS @ Arpt (Y/N)'].shape[0])

0.7490494296577946
0.2509505703422053


In [190]:
# Initialize decision tree classifier
mytree = tree.DecisionTreeClassifier()

# Fit the decision tree on training data
mytree.fit(pd.DataFrame(train_X), train_Y)

# Predict ILS labels on testing data
pred_test_Y = mytree.predict(pd.DataFrame(test_X))

# Calculate accuracy score on testing data
test_accuracy = accuracy_score(test_Y, pred_test_Y)

# Print test accuracy
print('Test accuracy:', round(test_accuracy, 4))

Test accuracy: 0.5985
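
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent training label; the helper name `majority_baseline_accuracy` and the label lists are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


# Illustrative labels only (1 = ILS at airport, 0 = no ILS).
demo_train = [1, 1, 0, 1, 0, 1]
demo_test = [1, 0, 1, 1]
print(majority_baseline_accuracy(demo_train, demo_test))
```

In the notebook this could be called as `majority_baseline_accuracy(list(train_Y), list(test_Y))`; if the tree does not clearly beat that number, visibility alone is probably a weak predictor of ILS availability.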


In [191]:
# candidate values for max_depth
depth_list = list(range(2, 15))

# column 0 holds the depth, column 1 will hold the accuracy for that depth
depth_tuning = np.zeros((len(depth_list), 2))
depth_tuning[:, 0] = depth_list

In [197]:
# Run a for loop over the range of depth list length
for index in range(0, len(depth_list)):
  # Initialize and fit decision tree with the `max_depth` candidate
  mytree = tree.DecisionTreeClassifier(max_depth=depth_list[index])
  mytree.fit(pd.DataFrame(train_X), train_Y)
  # Predict ILS labels on the testing data
  pred_test_Y = mytree.predict(pd.DataFrame(test_X))
  # Calculate the accuracy score
  depth_tuning[index,1] = accuracy_score(test_Y, pred_test_Y)

# Name the columns and print the array as pandas DataFrame
col_names = ['Max_Depth','accuracy_score']
print(pd.DataFrame(depth_tuning, columns=col_names))

    Max_Depth  accuracy_score
0         2.0        0.606061
1         3.0        0.613636
2         4.0        0.606061
3         5.0        0.590909
4         6.0        0.590909
5         7.0        0.613636
6         8.0        0.606061
7         9.0        0.598485
8        10.0        0.598485
9        11.0        0.598485
10       12.0        0.598485
11       13.0        0.598485
12       14.0        0.598485
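
The loop above picks `max_depth` by its accuracy on the test set, so a final score reported on that same test set may be optimistic. A common alternative is to tune with cross-validation on the training data only. The following is a minimal sketch on synthetic stand-in data (`X_demo` and `y_demo` are generated here, not drawn from the notebook).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
# Synthetic stand-in data: one visibility-like feature and a noisy binary label.
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = ((X_demo[:, 0] + rng.normal(0, 3, size=200)) < 5).astype(int)

cv_scores = {}
for depth in range(2, 8):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Mean accuracy over 5 cross-validation folds
    cv_scores[depth] = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy').mean()

# Depth with the highest mean cross-validated accuracy
best_depth = max(cv_scores, key=cv_scores.get)
print(cv_scores, best_depth)
```

In the notebook, `pd.DataFrame(train_X)` and `train_Y` would take the place of the synthetic arrays, leaving the test set untouched until the final evaluation.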


In [198]:
# Initialize decision tree classifier
mytree = tree.DecisionTreeClassifier(max_depth=3)

# Fit the decision tree on training data
mytree.fit(pd.DataFrame(train_X), train_Y)

# Predict ILS labels on testing data
pred_test_Y = mytree.predict(pd.DataFrame(test_X))

# Calculate accuracy score on testing data
test_accuracy = accuracy_score(test_Y, pred_test_Y)

# Print test accuracy
print('Test accuracy:', round(test_accuracy, 4))

Test accuracy: 0.6136
