# Text Mining using NLTK 

<img src="image.jpg" width="600"/>

### Introduction and Background
This dataset comes from a telecommunications company and contains customers' personal data and complaint records. The company wants to predict whether a customer will cancel or continue the current subscription based on his/her personal and complaint records. This is a classification problem in which the complaint data (text) is incorporated into a classification model alongside the customer's personal information.

### Implementation
Our goal is to accurately determine whether the customer cancels the subscription or not.
- First, we extract useful information from the "Comments.csv" text data, which stores the complaint information from these customers. We need to transform the semi-structured data into a usable numeric matrix so that it can be combined with the customers' personal data.
- Second, we preprocess the customer data, which may also contain categorical data points.
- Combine the transformed text data and the numerical personal data by matching the Customer ID in both datasets.
- Feature Selection using **Filter** and **Wrapper** methods
- Model Development & Evaluation : **Decision Tree, Random Forest & Gradient Boosting**

#### References: <br>
- [Scikit-learn Library](https://scikit-learn.org/stable/index.html) <br />
- [Pandas Library : DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) <br />
- [Natural Language Toolkit Library](https://www.nltk.org/) <br>
- [Filter Method : SelectKbest Feature Selection](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html)<br>
- [Wrapper Method : SelectFromModel Feature Selection](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html)

#### Notebook Index: 
- [Data Import](#DataImport) <br />
- [Natural Language Processing](#nlp) <br />
    - [Stemming](#stem) <br />
    - [Count Vectorizer](#count) <br />
    - [TF-IDF Transformer](#tdidf) <br />
- [Data Preprocessing : Customer Data](#customer) <br />
- [Feature Selection & Model Development](#feature) <br />
    - [SelectKBest Filter Method](#filter) <br />
    - [SelectFromModel Wrapper Method](#wrapper) <br />
- [Model Evaluation](#evaluate) <br />

In [1]:
#Libraries
# Data Manipulation
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sn

# Imbalanced Data
from imblearn.over_sampling import RandomOverSampler,SMOTE
from imblearn.under_sampling import RandomUnderSampler,NearMiss

# Machine Learning 
import sklearn
from sklearn import metrics
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix,roc_auc_score,roc_curve,mean_squared_error
from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier
from sklearn.svm import LinearSVC,SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Hyper Parameter Tuning
from sklearn.model_selection import RandomizedSearchCV,GridSearchCV

# Feature Selection
from sklearn.feature_selection import SelectKBest,SelectFromModel

# Stacking
from vecstack import stacking

import warnings
warnings.filterwarnings("ignore")

# Text Mining
import nltk

#Tokenizer 
from nltk.tokenize import word_tokenize

#Stemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

#sklearn TFIDF and vectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer

In [3]:
# Function to evaluate the model with/without feature selection

def Evaluate_model_fs(model,XTrain,YTrain,selectedfeatures=None):
    
    if selectedfeatures:
        XTrain1=XTrain[selectedfeatures]
    else:
        XTrain1=XTrain
    #Train_Test_Split
    xtrain,xtest,ytrain,ytest=train_test_split(XTrain1,YTrain,test_size=0.2)
    model.fit(xtrain,ytrain)
    ypred=model.predict(xtest)
    
    print("Accuracy Score:",accuracy_score(ytest,ypred))
    print("Classification Report :\n",classification_report(ytest,ypred))
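
The helper above draws a fresh random split on every call, so two models (or two runs of the same model) are scored on different test rows. A minimal sketch of a reproducible variant follows. The function name `evaluate_model_reproducible`, the `random_state` default, and the use of `stratify` are assumptions for illustration and are not part of the original notebook; the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_model_reproducible(model, X, y, test_size=0.2, random_state=42):
    """Hypothetical variant of Evaluate_model_fs with a fixed, stratified split."""
    # stratify=y keeps the class ratio the same in the train and test parts
    x_tr, x_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    model.fit(x_tr, y_tr)
    return accuracy_score(y_te, model.predict(x_te))


# Synthetic stand-in data (not the telecom dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
score_a = evaluate_model_reproducible(DecisionTreeClassifier(random_state=0), X_demo, y_demo)
score_b = evaluate_model_reproducible(DecisionTreeClassifier(random_state=0), X_demo, y_demo)
print(score_a == score_b)  # compares two runs that share the split and the model seed
```

Fixing the split makes accuracy differences between feature sets easier to attribute to the features rather than to the random partition.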
    

## Data Import <a id='DataImport'> </a>

In [4]:
Comments_df=pd.read_csv('Comments.csv')
Customers_df=pd.read_csv('Customers.csv')

Comments_df.head()

Unnamed: 0,ID,Comments
0,1309,Does not like the way the phone works. It is t...
1,3556,Wanted to know the nearest store location. Wan...
2,2230,Wants to know how to do text messaging. Referr...
3,2312,Asked how to disable call waiting. referred hi...
4,3327,Needs help learning how to use the phone. I su...


In [5]:
Customers_df.head()

Unnamed: 0,ID,Sex,Status,Children,Est_Income,Car_Owner,Usage,Age,RatePlan,LongDistance,International,Local,Dropped,Paymethod,LocalBilltype,LongDistanceBilltype,TARGET
0,1,F,S,1,38000.0,N,229.64,24.393333,3,23.56,0.0,206.08,0,CC,Budget,Intnl_discount,Cancelled
1,6,M,M,2,29616.0,N,75.29,49.426667,2,29.78,0.0,45.5,0,CH,FreeLocal,Standard,Current
2,8,M,M,0,19732.8,N,47.25,50.673333,3,24.81,0.0,22.44,0,CC,FreeLocal,Standard,Current
3,11,M,S,2,96.33,N,59.01,56.473333,1,26.13,0.0,32.88,1,CC,Budget,Standard,Current
4,14,F,M,2,52004.8,N,28.14,25.14,1,5.03,0.0,23.11,0,CH,Budget,Intnl_discount,Cancelled


## Natural Language Processing <a id='nlp'> </a>
#### Getting value from comments dataframe
Tokenize --> Stemmer --> Join back --> Vectorize --> TF-IDF Transformer

**TOKENIZE** : Break sentences into words
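
As a small illustration of what tokenizing produces, the sketch below splits the visible part of the first comment into word and punctuation tokens. It uses `wordpunct_tokenize` as a stand-in because that function is purely regex-based, whereas `word_tokenize` relies on NLTK's Punkt tokenizer data, which usually has to be fetched once with `nltk.download` (the exact resource name depends on the NLTK version). The two tokenizers can differ on contractions and similar cases.

```python
from nltk.tokenize import wordpunct_tokenize

# Visible part of the first comment in Comments.csv
comment = "Does not like the way the phone works."

# wordpunct_tokenize splits on the regex \w+|[^\w\s]+ and needs no downloaded model
tokens = wordpunct_tokenize(comment)
print(tokens)
```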

In [6]:
Comments_df['CommentsTokenize']=Comments_df['Comments'].apply(word_tokenize)

In [7]:
Comments_df.head()

Unnamed: 0,ID,Comments,CommentsTokenize
0,1309,Does not like the way the phone works. It is t...,"[Does, not, like, the, way, the, phone, works,..."
1,3556,Wanted to know the nearest store location. Wan...,"[Wanted, to, know, the, nearest, store, locati..."
2,2230,Wants to know how to do text messaging. Referr...,"[Wants, to, know, how, to, do, text, messaging..."
3,2312,Asked how to disable call waiting. referred hi...,"[Asked, how, to, disable, call, waiting, ., re..."
4,3327,Needs help learning how to use the phone. I su...,"[Needs, help, learning, how, to, use, the, pho..."


**STEM** : Reduce each word to its word stem <a id='stem'> </a>

In [8]:
StemmedComments_Snowball=pd.DataFrame();
stemmer=SnowballStemmer('english')
StemmedComments_Snowball['CommentsStemed']=Comments_df['CommentsTokenize'].apply(lambda x:[stemmer.stem(y) for y in x])
StemmedComments_Snowball.head()

Unnamed: 0,CommentsStemed
0,"[doe, not, like, the, way, the, phone, work, ...."
1,"[want, to, know, the, nearest, store, locat, ...."
2,"[want, to, know, how, to, do, text, messag, .,..."
3,"[ask, how, to, disabl, call, wait, ., refer, h..."
4,"[need, help, learn, how, to, use, the, phone, ..."


In [25]:
StemmedComments_Porter=pd.DataFrame();
stemmer=PorterStemmer()
StemmedComments_Porter['CommentsStemed']=Comments_df['CommentsTokenize'].apply(lambda x:[stemmer.stem(y) for y in x])
StemmedComments_Porter.head()

Unnamed: 0,CommentsStemed
0,"[doe, not, like, the, way, the, phone, work, ...."
1,"[want, to, know, the, nearest, store, locat, ...."
2,"[want, to, know, how, to, do, text, messag, .,..."
3,"[ask, how, to, disabl, call, wait, ., refer, h..."
4,"[need, help, learn, how, to, use, the, phone, ..."


In [26]:
StemmedComments_Lancaster=pd.DataFrame();
stemmer=LancasterStemmer()
StemmedComments_Lancaster['CommentsStemed']=Comments_df['CommentsTokenize'].apply(lambda x:[stemmer.stem(y) for y in x])
StemmedComments_Lancaster.head()

Unnamed: 0,CommentsStemed
0,"[doe, not, lik, the, way, the, phon, work, ., ..."
1,"[want, to, know, the, nearest, stor, loc, ., w..."
2,"[want, to, know, how, to, do, text, mess, ., r..."
3,"[ask, how, to, dis, cal, wait, ., refer, him, ..."
4,"[nee, help, learn, how, to, us, the, phon, ., ..."


The Lancaster stemmer is too aggressive for this data: it truncates words such as 'need' to 'nee' and 'phone' to 'phon'. <br> The Snowball and Porter stemmers produce identical, more readable stems on the rows shown, so we use the Snowball stemmer for the rest of the analysis.
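
The comparison can also be made word by word. The sketch below runs the three stemmers on a few words that occur in the comments; the word list and the `comparison` dictionary are chosen for illustration only.

```python
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

stemmers = {
    "snowball": SnowballStemmer("english"),
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
}
words = ["need", "phone", "location", "disable"]  # words seen in the comments

# Map each word to its stem under every stemmer
comparison = {
    word: {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
    for word in words
}
for word, stems in comparison.items():
    print(word, stems)
```

A side-by-side table like this makes it easier to judge how much meaning each stemmer discards before committing to one.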

**Join Stemmed Words**

In [9]:
StemmedComments_Snowball['CommentsStemed']=StemmedComments_Snowball['CommentsStemed'].apply(lambda x:" ".join(x))

In [10]:
StemmedComments_Snowball.head()

Unnamed: 0,CommentsStemed
0,doe not like the way the phone work . it is to...
1,want to know the nearest store locat . want to...
2,want to know how to do text messag . refer him...
3,ask how to disabl call wait . refer him to web...
4,need help learn how to use the phone . i sugge...


**COUNT VECTORIZER** : Removes stop words for the specified language and builds a document-term matrix of counts for the remaining words <a id='count'> </a>

In [11]:
CountVectorizer_Snowball=CountVectorizer(stop_words='english',lowercase=False)
TD_Counts=CountVectorizer_Snowball.fit_transform(StemmedComments_Snowball['CommentsStemed'])
CountVectorizer_Snowball.get_feature_names()

['3399',
 '3g',
 'abysm',
 'access',
 'accessori',
 'adapt',
 'add',
 'addit',
 'additon',
 'address',
 'adit',
 'adress',
 'advertis',
 'afraid',
 'alway',
 'angel',
 'angri',
 'ani',
 'anoth',
 'anyth',
 'anytim',
 'area',
 'asap',
 'ask',
 'bad',
 'basic',
 'bateri',
 'batteri',
 'becaus',
 'believ',
 'better',
 'bigger',
 'book',
 'bought',
 'brain',
 'bring',
 'built',
 'busi',
 'button',
 'buy',
 'cancel',
 'cancer',
 'car',
 'care',
 'carrier',
 'caus',
 'cc',
 'cell',
 'certain',
 'chang',
 'charg',
 'charger',
 'check',
 'chip',
 'citi',
 'claim',
 'cleariti',
 'cold',
 'comapr',
 'compani',
 'compar',
 'competit',
 'complain',
 'complaint',
 'concept',
 'connect',
 'consisit',
 'consist',
 'constan',
 'contact',
 'continu',
 'contract',
 'correct',
 'cost',
 'coupl',
 'cover',
 'coverag',
 'creat',
 'credit',
 'cstmer',
 'cstmr',
 'current',
 'cust',
 'custom',
 'customr',
 'date',
 'day',
 'dead',
 'decent',
 'defect',
 'deo',
 'did',
 'die',
 'differ',
 'difficult',
 'digit

In [12]:
print(pd.DataFrame(TD_Counts.toarray(),columns=CountVectorizer_Snowball.get_feature_names()))

      3399  3g  abysm  access  accessori  adapt  add  addit  additon  address  \
0        0   0      0       0          0      0    0      0        0        0   
1        0   0      0       0          1      0    0      0        0        0   
2        0   0      0       0          0      0    0      0        0        0   
3        0   0      0       0          0      0    0      0        0        0   
4        0   0      0       0          0      0    0      0        0        0   
...    ...  ..    ...     ...        ...    ...  ...    ...      ...      ...   
2065     0   0      0       0          0      0    0      0        0        0   
2066     0   0      0       0          0      0    0      0        0        0   
2067     0   0      0       0          0      0    0      0        0        0   
2068     0   0      0       0          0      0    0      0        0        1   
2069     0   0      0       0          0      0    0      0        0        0   

      ...  wish  wll  wold 
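
The counting step can be reproduced on a toy corpus to see how stop words disappear and how the vocabulary becomes the column set. This is a sketch on made-up stemmed comments, not rows from Comments.csv. It also guards against one version difference: `get_feature_names()` was removed in newer scikit-learn releases in favour of `get_feature_names_out()`, so the example checks which one is available.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stemmed comments (illustrative, not rows from Comments.csv)
docs = [
    "phone doe not work",
    "want to know the nearest store locat",
    "batteri die and the phone doe not charg",
]

vectorizer = CountVectorizer(stop_words="english", lowercase=False)
counts = vectorizer.fit_transform(docs)

# Prefer get_feature_names_out(); fall back to the older method if needed
if hasattr(vectorizer, "get_feature_names_out"):
    vocab = list(vectorizer.get_feature_names_out())
else:
    vocab = list(vectorizer.get_feature_names())

print(vocab)
print(counts.toarray())
```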

**TRANSFORMER / TF-IDF** : Reweights the counts in the document-term matrix so that terms that are frequent in a comment but rare across all comments receive higher weights <a id='tdidf'> </a>

In [13]:
Transformer=TfidfTransformer()
TD_Transform=Transformer.fit_transform(TD_Counts)
# Transformer.get_feature_names_out()
TD_Transform=pd.DataFrame(TD_Transform.toarray(),columns=CountVectorizer_Snowball.get_feature_names())
TD_Transform=pd.concat([Comments_df["ID"],TD_Transform],axis=1)
TD_Transform.head()

Unnamed: 0,ID,3399,3g,abysm,access,accessori,adapt,add,addit,additon,...,wish,wll,wold,work,wors,worst,wrong,xvyx,year,york
0,1309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.209678,0.0,0.0,0.0,0.0,0.0,0.0
1,3556,0.0,0.0,0.0,0.0,0.27568,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2230,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2312,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,3327,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [14]:
TD_Transform.shape

(2070, 355)
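
To make the weighting concrete, the sketch below computes TF-IDF by hand for a tiny count matrix, assuming the default `TfidfTransformer` settings (smoothed IDF, L2 row normalisation, raw term counts). The vocabulary and counts are invented for illustration.

```python
import math

# Toy term counts per document (rows) for a 3-term vocabulary; illustrative only
vocab = ["phone", "batteri", "store"]
counts = [
    [2, 1, 0],
    [1, 0, 0],
    [1, 0, 3],
]

n_docs = len(counts)
# Document frequency: number of documents in which each term occurs
df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
# Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]

tfidf = []
for row in counts:
    weighted = [tf * w for tf, w in zip(row, idf)]
    norm = math.sqrt(sum(v * v for v in weighted))
    # L2-normalise each row so every document vector has unit length
    tfidf.append([v / norm if norm else 0.0 for v in weighted])

for row in tfidf:
    print([round(v, 3) for v in row])
```

A term that occurs in every document (here `phone`) gets the minimum IDF of 1, while rarer terms are scaled up before normalisation.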

### Data Preprocessing Customer Data  <a id='customer'> </a>
Working on Categorical Customer Data: Using get_dummies()

In [16]:
Customers_df.head()

Unnamed: 0,ID,Sex,Status,Children,Est_Income,Car_Owner,Usage,Age,RatePlan,LongDistance,International,Local,Dropped,Paymethod,LocalBilltype,LongDistanceBilltype,TARGET
0,1,F,S,1,38000.0,N,229.64,24.393333,3,23.56,0.0,206.08,0,CC,Budget,Intnl_discount,Cancelled
1,6,M,M,2,29616.0,N,75.29,49.426667,2,29.78,0.0,45.5,0,CH,FreeLocal,Standard,Current
2,8,M,M,0,19732.8,N,47.25,50.673333,3,24.81,0.0,22.44,0,CC,FreeLocal,Standard,Current
3,11,M,S,2,96.33,N,59.01,56.473333,1,26.13,0.0,32.88,1,CC,Budget,Standard,Current
4,14,F,M,2,52004.8,N,28.14,25.14,1,5.03,0.0,23.11,0,CH,Budget,Intnl_discount,Cancelled


In [17]:
XTrain=Customers_df.drop('TARGET',axis=1).copy()
YTrain=Customers_df['TARGET'].copy()

# Categorical Data in Customers_df
categorical_cols=XTrain.select_dtypes(include='object').columns.tolist()
categorical_cols

['Sex',
 'Status',
 'Car_Owner',
 'Paymethod',
 'LocalBilltype',
 'LongDistanceBilltype']

In [18]:
XTrain

Unnamed: 0,ID,Sex,Status,Children,Est_Income,Car_Owner,Usage,Age,RatePlan,LongDistance,International,Local,Dropped,Paymethod,LocalBilltype,LongDistanceBilltype
0,1,F,S,1,38000.00,N,229.64,24.393333,3,23.56,0.00,206.08,0,CC,Budget,Intnl_discount
1,6,M,M,2,29616.00,N,75.29,49.426667,2,29.78,0.00,45.50,0,CH,FreeLocal,Standard
2,8,M,M,0,19732.80,N,47.25,50.673333,3,24.81,0.00,22.44,0,CC,FreeLocal,Standard
3,11,M,S,2,96.33,N,59.01,56.473333,1,26.13,0.00,32.88,1,CC,Budget,Standard
4,14,F,M,2,52004.80,N,28.14,25.140000,1,5.03,0.00,23.11,0,CH,Budget,Intnl_discount
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2065,3821,F,S,0,78851.30,N,29.04,48.373333,4,0.37,0.00,28.66,0,CC,FreeLocal,Standard
2066,3822,F,S,1,17540.70,Y,36.20,62.786667,1,22.17,0.57,13.45,0,Auto,Budget,Standard
2067,3823,F,M,0,83891.90,Y,74.40,61.020000,4,28.92,0.00,45.47,0,CH,Budget,Standard
2068,3824,F,M,2,28220.80,N,38.95,38.766667,4,26.49,0.00,12.46,0,CC,FreeLocal,Standard


In [19]:
pd.set_option('display.max_columns', None)
XTrain=pd.get_dummies(XTrain,columns=categorical_cols)
XTrain

Unnamed: 0,ID,Children,Est_Income,Usage,Age,RatePlan,LongDistance,International,Local,Dropped,Sex_F,Sex_M,Status_D,Status_M,Status_S,Car_Owner_N,Car_Owner_Y,Paymethod_Auto,Paymethod_CC,Paymethod_CH,LocalBilltype_Budget,LocalBilltype_FreeLocal,LongDistanceBilltype_Intnl_discount,LongDistanceBilltype_Standard
0,1,1,38000.00,229.64,24.393333,3,23.56,0.00,206.08,0,1,0,0,0,1,1,0,0,1,0,1,0,1,0
1,6,2,29616.00,75.29,49.426667,2,29.78,0.00,45.50,0,0,1,0,1,0,1,0,0,0,1,0,1,0,1
2,8,0,19732.80,47.25,50.673333,3,24.81,0.00,22.44,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1
3,11,2,96.33,59.01,56.473333,1,26.13,0.00,32.88,1,0,1,0,0,1,1,0,0,1,0,1,0,0,1
4,14,2,52004.80,28.14,25.140000,1,5.03,0.00,23.11,0,1,0,0,1,0,1,0,0,0,1,1,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2065,3821,0,78851.30,29.04,48.373333,4,0.37,0.00,28.66,0,1,0,0,0,1,1,0,0,1,0,0,1,0,1
2066,3822,1,17540.70,36.20,62.786667,1,22.17,0.57,13.45,0,1,0,0,0,1,0,1,1,0,0,1,0,0,1
2067,3823,0,83891.90,74.40,61.020000,4,28.92,0.00,45.47,0,1,0,0,1,0,0,1,0,0,1,1,0,0,1
2068,3824,2,28220.80,38.95,38.766667,4,26.49,0.00,12.46,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1
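
The one-hot encoding step can be checked on a three-row toy frame. This is a sketch with two of the categorical columns and illustrative values. The `dtype=int` argument is an addition not used in the notebook: it requests 0/1 integer columns explicitly, since some pandas versions return booleans by default.

```python
import pandas as pd

# Toy frame with two of the categorical columns (values are illustrative)
toy = pd.DataFrame(
    {
        "ID": [1, 6, 8],
        "Sex": ["F", "M", "M"],
        "Paymethod": ["CC", "CH", "CC"],
    }
)

# dtype=int requests 0/1 columns; some pandas versions default to booleans
encoded = pd.get_dummies(toy, columns=["Sex", "Paymethod"], dtype=int)
print(encoded)
```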


## Merge Customer Data and Customer Comments

In [20]:
XTrain=XTrain.merge(TD_Transform,on='ID')
XTrain.drop('ID',axis=1,inplace=True)
YTrain.value_counts()

Current      1266
Cancelled     804
Name: TARGET, dtype: int64
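
Because `YTrain` is taken from `Customers_df` before the merge, the features and labels stay aligned only if the merge keeps every customer row in its original order. A defensive sketch, under the assumption that each ID should appear once in both tables, is to carry `TARGET` through the merge and split afterwards. The frames and TF-IDF values below are toy stand-ins.

```python
import pandas as pd

# Toy stand-ins for the encoded customer table and the TF-IDF table
customers = pd.DataFrame(
    {"ID": [1, 6, 8], "Usage": [229.64, 75.29, 47.25],
     "TARGET": ["Cancelled", "Current", "Current"]}
)
text_features = pd.DataFrame(
    {"ID": [8, 1, 6], "phone": [0.0, 0.33, 0.0], "batteri": [0.47, 0.0, 0.0]}
)

# validate="one_to_one" raises if an ID is duplicated on either side
merged = customers.merge(text_features, on="ID", how="inner", validate="one_to_one")

# Split features and target only after merging, so rows cannot drift apart
X = merged.drop(columns=["ID", "TARGET"])
y = merged["TARGET"]
print(X.shape, y.shape)
```

Comparing `len(merged)` with `len(customers)` afterwards shows whether any customer had no matching comment row.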

In [24]:
pd.set_option('display.max_columns', None)
XTrain

Unnamed: 0,Children,Est_Income,Usage,Age,RatePlan,LongDistance,International,Local,Dropped,Sex_F,Sex_M,Status_D,Status_M,Status_S,Car_Owner_N,Car_Owner_Y,Paymethod_Auto,Paymethod_CC,Paymethod_CH,LocalBilltype_Budget,LocalBilltype_FreeLocal,LongDistanceBilltype_Intnl_discount,LongDistanceBilltype_Standard,3399,3g,abysm,access,accessori,adapt,add,addit,additon,address,adit,adress,advertis,afraid,alway,angel,angri,ani,anoth,anyth,anytim,area,asap,ask,bad,basic,bateri,batteri,becaus,believ,better,bigger,book,bought,brain,bring,built,busi,button,buy,cancel,cancer,car,care,carrier,caus,cc,cell,certain,chang,charg,charger,check,chip,citi,claim,cleariti,cold,comapr,compani,compar,competit,complain,complaint,concept,connect,consisit,consist,constan,contact,continu,contract,correct,cost,coupl,cover,coverag,creat,credit,cstmer,cstmr,current,cust,custom,customr,date,day,dead,decent,defect,deo,did,die,differ,difficult,digiti,direct,disabl,doe,don,dont,drop,dure,easier,effect,encount,end,enemi,equip,everytim,everywher,evrey,exact,expect,expir,explain,facepl,fals,famili,featur,fed,figur,fine,fix,forev,forward,friend,function,furthermor,futur,gave,goat,good,great,gsm,handset,happi,hard,hate,hear,heard,help,higher,highway,hochi,hole,home,hope,horribl,hous,implement,improv,inadequ,includ,info,inform,ing,internet,intersect,issu,june,just,kid,kno,know,lame,later,lctn,learn,leroy,like,line,list,local,locat,locatn,long,los,lost,lot,love,major,make,manag,mani,manual,market,mean,messag,metropolitian,minut,misl,mistak,model,momma,mr,napeleon,near,nearest,need,network,new,news,notic,number,numer,offer,old,om,open,option,ori,ot,outbound,pass,pay,pda,peopl,perform,person,phone,piec,plan,pleas,point,polici,poor,possibl,probabl,problem,proper,provid,provis,purpos,rate,rater,realiz,realli,reason,receiv,recept,recption,reenter,refer,relat,rep,replac,respect,result,rid,right,ring,roam,roll,rubbish,rude,said,sale,say,screen,self,send,servic,shitti,shut,sign,signal,signific,simm,simpli,sinc,site,slow,sold,someon,sometim,soon,speak,speed,start,static,stole,store,stuff,stupid,substant,subtract,suck,suggest,supervisor,support,sure,surpris,suspect,suspend,switch,teach,technic,tell,terribl,test,text,think,thought,ticket,till,time,tire,today,toilet,told,tone,tower,transeff,transf,transfer,travel,tri,trust,turn,uncomfort,understand,unhappi,unlimit,unreli,unwil,upset,usag,use,useless,valu,veri,vm,wa,wait,want,wast,way,weak,web,websit,week,whi,wife,wish,wll,wold,work,wors,worst,wrong,xvyx,year,york
0,1,38000.00,229.64,24.393333,3,23.56,0.00,206.08,0,1,0,0,0,1,1,0,0,1,0,1,0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.344388,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.336819,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.32972,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.472239,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.327212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.472239,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.325802,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
1,2,29616.00,75.29,49.426667,2,29.78,0.00,45.50,0,0,1,0,1,0,1,0,0,0,1,0,1,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.320855,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.310995,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.243227,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.271105,0.0,0.0,0.0,0.0,0.0,0.347885,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.348322,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
2,0,19732.80,47.25,50.673333,3,24.81,0.00,22.44,0,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.320855,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.310995,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.243227,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.271105,0.0,0.0,0.0,0.0,0.0,0.347885,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.348322,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
3,2,96.33,59.01,56.473333,1,26.13,0.00,32.88,1,0,1,0,0,1,1,0,0,1,0,1,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.320855,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.310995,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.243227,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.271105,0.0,0.0,0.0,0.0,0.0,0.347885,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.348322,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
4,2,52004.80,28.14,25.140000,1,5.03,0.00,23.11,0,1,0,0,1,0,1,0,0,0,1,1,0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.320855,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.310995,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.243227,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.271105,0.0,0.0,0.0,0.0,0.0,0.347885,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.37653,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.348322,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2065,0,78851.30,29.04,48.373333,4,0.37,0.00,28.66,0,1,0,0,0,1,1,0,0,1,0,0,1,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.466708,0.000000,0.0,0.0,0.0,0.443664,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.369504,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.264422,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.453766,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.214868,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.356121,0.0,0.0,0.0,0.0,0.0,0.0
2066,1,17540.70,36.20,62.786667,1,22.17,0.57,13.45,0,1,0,0,0,1,0,1,1,0,0,1,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.466708,0.000000,0.0,0.0,0.0,0.443664,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.369504,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.264422,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.453766,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.214868,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.356121,0.0,0.0,0.0,0.0,0.0,0.0
2067,0,83891.90,74.40,61.020000,4,28.92,0.00,45.47,0,1,0,0,1,0,0,1,0,0,1,1,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.466708,0.000000,0.0,0.0,0.0,0.443664,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.369504,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.264422,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.453766,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.214868,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.356121,0.0,0.0,0.0,0.0,0.0,0.0
2068,2,28220.80,38.95,38.766667,4,26.49,0.00,12.46,0,1,0,0,1,0,1,0,0,1,0,0,1,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.466708,0.000000,0.0,0.0,0.0,0.443664,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.369504,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.264422,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.453766,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.214868,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.356121,0.0,0.0,0.0,0.0,0.0,0.0


## Feature Selection & Model Development  <a id='feature'> </a>
#### SelectKBest (Filter Method)  <a id='filter'> </a>

In [34]:
XTrain.shape

(2070, 377)

#### No Feature Selection (Baseline: All 377 Features)

In [35]:
#Random Forest Classifier
model01=RandomForestClassifier()
Evaluate_model_fs(model01,XTrain,YTrain)

Accuracy Score: 0.8792270531400966
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.83      0.87      0.85       163
     Current       0.91      0.88      0.90       251

    accuracy                           0.88       414
   macro avg       0.87      0.88      0.87       414
weighted avg       0.88      0.88      0.88       414



In [36]:
#Gradient Boosting Classifier
model02=GradientBoostingClassifier()
Evaluate_model_fs(model02,XTrain,YTrain)

Accuracy Score: 0.8743961352657005
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.84      0.86      0.85       166
     Current       0.90      0.89      0.89       248

    accuracy                           0.87       414
   macro avg       0.87      0.87      0.87       414
weighted avg       0.88      0.87      0.87       414



In [37]:
#Decision Tree Classifier
model03=DecisionTreeClassifier()
Evaluate_model_fs(model03,XTrain,YTrain)

Accuracy Score: 0.8792270531400966
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.80      0.90      0.85       153
     Current       0.93      0.87      0.90       261

    accuracy                           0.88       414
   macro avg       0.87      0.88      0.87       414
weighted avg       0.89      0.88      0.88       414
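
`Evaluate_model_fs` is defined earlier in the notebook and is not shown in this section. The support of 414 rows in each report is 20% of the 2,070 rows in `XTrain`, which suggests an 80/20 hold-out split inside the helper, and the class supports change from run to run (163/251, 166/248, 153/261), which suggests that the split is not seeded. The sketch below is a hypothetical stand-in, not the notebook's actual function: the name `evaluate_model_sketch`, its parameters, and the synthetic data are all assumptions. It shows how a fixed `random_state` would make every model see the same test rows, so that differences of one or two accuracy points are easier to interpret.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split


def evaluate_model_sketch(model, X, y, features=None, test_size=0.2, random_state=42):
    """Hypothetical stand-in for Evaluate_model_fs: split, fit, report."""
    if features is not None:
        X = X[features]  # restrict to the selected columns
    # A seeded, stratified split keeps the test rows identical across calls
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    acc = accuracy_score(y_te, y_pred)
    print("Accuracy Score:", acc)
    print("Classification Report :")
    print(classification_report(y_te, y_pred))
    return acc


# Synthetic stand-in data; the notebook's XTrain/YTrain are not reproduced here
X_arr, y_arr = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(20)])
y_demo = pd.Series(y_arr)

# Two identical calls give identical scores because both the split and the model are seeded
acc_a = evaluate_model_sketch(RandomForestClassifier(random_state=0), X_demo, y_demo)
acc_b = evaluate_model_sketch(RandomForestClassifier(random_state=0), X_demo, y_demo)
```

With an unseeded split, part of the gap between two reported accuracies can come from the split itself rather than from the model or the feature set.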



#### K Best 25 Features

In [38]:
# SelectKBest with the default score function (f_classif): keep the 25 highest-scoring features
FilterKBest=SelectKBest(k=25)
Output1=FilterKBest.fit_transform(XTrain,YTrain)

print(FilterKBest.get_feature_names_out())
BestFeatures=FilterKBest.get_feature_names_out().tolist()

['Children' 'Sex_M' 'Status_M' 'Status_S' 'accessori' 'asap' 'better'
 'buy' 'expect' 'featur' 'forward' 'learn' 'locat' 'nearest' 'point'
 'rate' 'rep' 'signific' 'store' 'suggest' 'support' 'teach' 'technic'
 'use' 'work']


In [39]:
model10=RandomForestClassifier()
Evaluate_model_fs(model10,XTrain,YTrain,BestFeatures)

Accuracy Score: 0.8043478260869565
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.69      0.79      0.74       145
     Current       0.88      0.81      0.84       269

    accuracy                           0.80       414
   macro avg       0.79      0.80      0.79       414
weighted avg       0.81      0.80      0.81       414



In [40]:
model11=GradientBoostingClassifier()
Evaluate_model_fs(model11,XTrain,YTrain,BestFeatures)

Accuracy Score: 0.821256038647343
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.73      0.87      0.80       165
     Current       0.90      0.79      0.84       249

    accuracy                           0.82       414
   macro avg       0.82      0.83      0.82       414
weighted avg       0.83      0.82      0.82       414



In [41]:
model12=DecisionTreeClassifier()
Evaluate_model_fs(model12,XTrain,YTrain,BestFeatures)

Accuracy Score: 0.8236714975845411
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.73      0.86      0.79       161
     Current       0.90      0.80      0.85       253

    accuracy                           0.82       414
   macro avg       0.82      0.83      0.82       414
weighted avg       0.84      0.82      0.83       414



#### K Best 50 Features

In [42]:
# SelectKBest with the default score function (f_classif): keep the 50 highest-scoring features
FilterKBest=SelectKBest(k=50)
Output1=FilterKBest.fit_transform(XTrain,YTrain)

print(FilterKBest.get_feature_names_out())
BestFeatures=FilterKBest.get_feature_names_out().tolist()

['Children' 'Est_Income' 'Usage' 'LongDistance' 'International' 'Sex_F'
 'Sex_M' 'Status_M' 'Status_S' 'Paymethod_Auto' 'accessori' 'addit' 'asap'
 'batteri' 'becaus' 'better' 'buy' 'cancel' 'coupl' 'dead' 'disabl'
 'expect' 'famili' 'featur' 'forward' 'help' 'info' 'know' 'learn' 'locat'
 'manag' 'nearest' 'need' 'phone' 'point' 'rate' 'rep' 'said' 'signific'
 'store' 'suggest' 'support' 'suspend' 'teach' 'technic' 'told' 'transfer'
 'use' 'want' 'work']


In [43]:
model20=RandomForestClassifier()
Evaluate_model_fs(model20,XTrain,YTrain,BestFeatures)

Accuracy Score: 0.8623188405797102
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.80      0.84      0.82       152
     Current       0.91      0.87      0.89       262

    accuracy                           0.86       414
   macro avg       0.85      0.86      0.85       414
weighted avg       0.86      0.86      0.86       414



In [44]:
model21=GradientBoostingClassifier()
Evaluate_model_fs(model21,XTrain,YTrain,BestFeatures)

Accuracy Score: 0.8768115942028986
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.81      0.89      0.85       159
     Current       0.93      0.87      0.90       255

    accuracy                           0.88       414
   macro avg       0.87      0.88      0.87       414
weighted avg       0.88      0.88      0.88       414



In [45]:
model22=DecisionTreeClassifier()
Evaluate_model_fs(model22,XTrain,YTrain,BestFeatures)

Accuracy Score: 0.8768115942028986
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.84      0.84      0.84       159
     Current       0.90      0.90      0.90       255

    accuracy                           0.88       414
   macro avg       0.87      0.87      0.87       414
weighted avg       0.88      0.88      0.88       414
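
In the cells above, `SelectKBest` is fitted on all of `XTrain` before `Evaluate_model_fs` is called. If that helper holds out a test set internally (an assumption, since its body is not shown here), the selection step has already seen the test rows, which can make the scores slightly optimistic. One common way to avoid this is to put the selector and the classifier in a single `Pipeline` and cross-validate the whole thing, so the selector is refitted on each training fold only. The sketch below uses synthetic data and arbitrary settings (`k=10`, 5 folds) purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption); not the notebook's XTrain/YTrain
X, y = make_classification(
    n_samples=400, n_features=40, n_informative=6, random_state=0
)

# Selection and classification are fitted together on each training fold
pipe = Pipeline([
    ("select", SelectKBest(k=10)),
    ("clf", GradientBoostingClassifier(random_state=0)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")

print("Mean accuracy:", cv_scores.mean())
print("Std deviation:", cv_scores.std())
```

Reporting the mean and spread over folds also gives a sense of how much of a one-point difference between two configurations is noise.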



#### Select from Model (Wrapper Method)  <a id='wrapper'> </a>

In [46]:
# SelectFromModel: keep at most 10 features, ranked by the fitted random forest's feature importances
rfc=RandomForestClassifier()
rfc.fit(XTrain,YTrain)

WrapperMethod=SelectFromModel(rfc,max_features=10)
XTrain_new=WrapperMethod.transform(XTrain)
XTrain_new.shape
z=WrapperMethod.get_support()

allfeatures=XTrain.columns.tolist()

print(WrapperMethod.get_feature_names_out(allfeatures),'\n',len(WrapperMethod.get_feature_names_out(allfeatures)))
BestFeatures=WrapperMethod.get_feature_names_out(allfeatures).tolist()

Evaluate_model_fs(rfc,XTrain,YTrain,BestFeatures)

['Children' 'Est_Income' 'Usage' 'Age' 'RatePlan' 'LongDistance' 'Local'
 'Sex_F' 'Status_M' 'Status_S'] 
 10
Accuracy Score: 0.8840579710144928
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.87      0.83      0.85       164
     Current       0.89      0.92      0.91       250

    accuracy                           0.88       414
   macro avg       0.88      0.87      0.88       414
weighted avg       0.88      0.88      0.88       414



In [47]:
# SelectFromModel: keep at most 10 features, ranked by the fitted gradient boosting model's feature importances
gbc=GradientBoostingClassifier()
gbc.fit(XTrain,YTrain)

WrapperMethod=SelectFromModel(gbc,max_features=10)
XTrain_new=WrapperMethod.transform(XTrain)
XTrain_new.shape
z=WrapperMethod.get_support()

allfeatures=XTrain.columns.tolist()

print(WrapperMethod.get_feature_names_out(allfeatures),'\n',len(WrapperMethod.get_feature_names_out(allfeatures)))
BestFeatures=WrapperMethod.get_feature_names_out(allfeatures).tolist()

Evaluate_model_fs(gbc,XTrain,YTrain,BestFeatures)

['Children' 'Est_Income' 'Age' 'LongDistance' 'Sex_F' 'Sex_M' 'Status_S'
 'Paymethod_Auto' 'store' 'support'] 
 10
Accuracy Score: 0.8864734299516909
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.84      0.83      0.84       144
     Current       0.91      0.92      0.91       270

    accuracy                           0.89       414
   macro avg       0.88      0.87      0.87       414
weighted avg       0.89      0.89      0.89       414



In [48]:
# SelectFromModel: keep at most 10 features, ranked by the fitted decision tree's feature importances
dtc=DecisionTreeClassifier()
dtc.fit(XTrain,YTrain)

WrapperMethod=SelectFromModel(dtc,max_features=10)
XTrain_new=WrapperMethod.transform(XTrain)
XTrain_new.shape
z=WrapperMethod.get_support()

allfeatures=XTrain.columns.tolist()

print(WrapperMethod.get_feature_names_out(allfeatures),'\n',len(WrapperMethod.get_feature_names_out(allfeatures)))
BestFeatures=WrapperMethod.get_feature_names_out(allfeatures).tolist()

Evaluate_model_fs(dtc,XTrain,YTrain,BestFeatures)

['Children' 'Est_Income' 'Age' 'RatePlan' 'LongDistance' 'Status_M'
 'Paymethod_Auto' 'new' 'store' 'technic'] 
 10
Accuracy Score: 0.8429951690821256
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.78      0.83      0.80       161
     Current       0.89      0.85      0.87       253

    accuracy                           0.84       414
   macro avg       0.83      0.84      0.84       414
weighted avg       0.85      0.84      0.84       414



#### Select from Model: 25 Features

In [49]:
# SelectFromModel: keep at most 25 features, ranked by the fitted random forest's feature importances
rfc1=RandomForestClassifier()
rfc1.fit(XTrain,YTrain)

WrapperMethod=SelectFromModel(rfc1,max_features=25)
XTrain_new=WrapperMethod.transform(XTrain)
XTrain_new.shape
z=WrapperMethod.get_support()

allfeatures=XTrain.columns.tolist()

print(WrapperMethod.get_feature_names_out(allfeatures),'\n',len(WrapperMethod.get_feature_names_out(allfeatures)))
BestFeatures=WrapperMethod.get_feature_names_out(allfeatures).tolist()

Evaluate_model_fs(rfc1,XTrain,YTrain,BestFeatures)

['Children' 'Est_Income' 'Usage' 'Age' 'RatePlan' 'LongDistance'
 'International' 'Local' 'Sex_F' 'Sex_M' 'Status_M' 'Status_S'
 'Car_Owner_N' 'Paymethod_Auto' 'Paymethod_CC' 'addit' 'expect' 'learn'
 'new' 'phone' 'said' 'store' 'support' 'use' 'want'] 
 25
Accuracy Score: 0.9227053140096618
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.87      0.93      0.90       157
     Current       0.96      0.92      0.94       257

    accuracy                           0.92       414
   macro avg       0.91      0.92      0.92       414
weighted avg       0.92      0.92      0.92       414



In [50]:
gbc1=GradientBoostingClassifier()
gbc1.fit(XTrain,YTrain)

WrapperMethod=SelectFromModel(gbc1,max_features=25)
XTrain_new=WrapperMethod.transform(XTrain)
XTrain_new.shape
z=WrapperMethod.get_support()

allfeatures=XTrain.columns.tolist()

print(WrapperMethod.get_feature_names_out(allfeatures),'\n',len(WrapperMethod.get_feature_names_out(allfeatures)))
BestFeatures=WrapperMethod.get_feature_names_out(allfeatures).tolist()

Evaluate_model_fs(gbc1,XTrain,YTrain,BestFeatures)

['Children' 'Est_Income' 'Usage' 'Age' 'RatePlan' 'LongDistance'
 'International' 'Local' 'Sex_F' 'Sex_M' 'Status_M' 'Status_S'
 'Paymethod_Auto' 'asap' 'know' 'number' 'pda' 'point' 'said' 'store'
 'support' 'technic' 'tri' 'use' 'want'] 
 25
Accuracy Score: 0.8961352657004831
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.83      0.87      0.85       142
     Current       0.93      0.91      0.92       272

    accuracy                           0.90       414
   macro avg       0.88      0.89      0.89       414
weighted avg       0.90      0.90      0.90       414



In [51]:
dtc1=DecisionTreeClassifier()
dtc1.fit(XTrain,YTrain)

WrapperMethod=SelectFromModel(dtc1,max_features=25)
XTrain_new=WrapperMethod.transform(XTrain)
XTrain_new.shape
z=WrapperMethod.get_support()

allfeatures=XTrain.columns.tolist()

print(WrapperMethod.get_feature_names_out(allfeatures),'\n',len(WrapperMethod.get_feature_names_out(allfeatures)))
BestFeatures=WrapperMethod.get_feature_names_out(allfeatures).tolist()

Evaluate_model_fs(dtc1,XTrain,YTrain,BestFeatures)

['Children' 'Est_Income' 'Usage' 'Age' 'RatePlan' 'LongDistance' 'Local'
 'Dropped' 'Sex_F' 'Status_D' 'Status_S' 'Car_Owner_Y' 'Paymethod_Auto'
 'LocalBilltype_FreeLocal' 'LongDistanceBilltype_Standard' 'asap' 'help'
 'learn' 'minut' 'need' 'new' 'proper' 'said' 'store' 'support'] 
 25
Accuracy Score: 0.8840579710144928
Classification Report :
               precision    recall  f1-score   support

   Cancelled       0.84      0.87      0.86       165
     Current       0.91      0.89      0.90       249

    accuracy                           0.88       414
   macro avg       0.88      0.88      0.88       414
weighted avg       0.89      0.88      0.88       414



### Model Evaluation  <a id='evaluate'> </a>

Accuracy scores in this table are truncated (not rounded) to two decimal places from the outputs above.

| Model Name | Feature Selection Method | No. of Features | Accuracy Score |
| :---: | :---: | :---: | :---: |
| RandomForestClassifier | None | 377 (all) | 0.87 |
| GradientBoostingClassifier | None | 377 (all) | 0.87 |
| DecisionTreeClassifier | None | 377 (all) | 0.87 |
| | | | |
| RandomForestClassifier | SelectKBest | 25 | 0.80 |
| GradientBoostingClassifier | SelectKBest | 25 | 0.82 |
| DecisionTreeClassifier | SelectKBest | 25 | 0.82 |
| RandomForestClassifier | SelectKBest | 50 | 0.86 |
| GradientBoostingClassifier | SelectKBest | 50 | 0.87 |
| DecisionTreeClassifier | SelectKBest | 50 | 0.87 |
| | | | |
| RandomForestClassifier | SelectFromModel | 10 | 0.88 |
| GradientBoostingClassifier | SelectFromModel | 10 | 0.88 |
| DecisionTreeClassifier | SelectFromModel | 10 | 0.84 |
| RandomForestClassifier | SelectFromModel | 25 | 0.92 |
| GradientBoostingClassifier | SelectFromModel | 25 | 0.89 |
| DecisionTreeClassifier | SelectFromModel | 25 | 0.88 |

In [49]:
model=['RandomForest','GradientBoosting','DecisionTree','RandomForest','GradientBoosting','DecisionTree','RandomForest','GradientBoosting','DecisionTree','RandomForest','GradientBoosting','DecisionTree','RandomForest','GradientBoosting','DecisionTree']
fsmethod=['NoFS','NoFS','NoFS','SelectKBest','SelectKBest','SelectKBest','SelectKBest','SelectKBest','SelectKBest','SelectFromModel','SelectFromModel','SelectFromModel','SelectFromModel','SelectFromModel','SelectFromModel']
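# Number of features used in each run (377 = no feature selection)
maxk=[377,377,377,25,25,25,50,50,50,10,10,10,25,25,25]
# Accuracy in percent, truncated to whole numbers
score=[87,87,87,80,82,82,86,87,87,88,88,84,92,89,88]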

In [50]:
evaluation=pd.DataFrame(columns=['Model','FeatureSelection','Max Features','Score'])

In [51]:
evaluation['Model']=model
evaluation['FeatureSelection']=fsmethod
evaluation['Max Features']=maxk
evaluation['Score']=score

In [52]:
evaluation

Unnamed: 0,Model,FeatureSelection,Max Features,Score
0,RandomForest,NoFS,377,87
1,GradientBoosting,NoFS,377,87
2,DecisionTree,NoFS,377,87
3,RandomForest,SelectKBest,25,80
4,GradientBoosting,SelectKBest,25,82
5,DecisionTree,SelectKBest,25,82
6,RandomForest,SelectKBest,50,86
7,GradientBoosting,SelectKBest,50,87
8,DecisionTree,SelectKBest,50,87
9,RandomForest,SelectFromModel,10,88
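
The four parallel lists above must stay aligned by position, which is easy to break when a run is added or removed. As an alternative sketch, the same 15 results can be written once per configuration and expanded into records. The names `runs`, `records`, `evaluation_alt`, and `summary` are introduced here for illustration only; the values are copied from the lists above.

```python
import pandas as pd

models = ["RandomForest", "GradientBoosting", "DecisionTree"]

# (feature selection method, number of features, scores in the order of `models`)
runs = [
    ("NoFS", 377, [87, 87, 87]),
    ("SelectKBest", 25, [80, 82, 82]),
    ("SelectKBest", 50, [86, 87, 87]),
    ("SelectFromModel", 10, [88, 88, 84]),
    ("SelectFromModel", 25, [92, 89, 88]),
]

# Expand each configuration into one record per model
records = [
    {"Model": m, "FeatureSelection": fs, "Max Features": k, "Score": s}
    for fs, k, scores in runs
    for m, s in zip(models, scores)
]
evaluation_alt = pd.DataFrame.from_records(records)

# Mean score per feature selection method, across models and feature counts
summary = evaluation_alt.groupby("FeatureSelection")["Score"].mean()

print(evaluation_alt)
print(summary)
```

The grouped summary gives a quick numeric companion to the scatter plot, although averaging over different feature counts hides the jump between 10 and 25 features for `SelectFromModel`.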


In [94]:
# Comparing Feature Selection Methods
import plotly.express as px

# One marker per run: colour encodes the model, symbol encodes the feature selection method
fig=px.scatter(evaluation,x=evaluation.index,y='Score',color='Model',symbol='FeatureSelection',
               title='Model Evaluation',color_discrete_sequence=px.colors.qualitative.Dark24)
fig.show()