# Table of Contents <a name="toc"></a>

### Part I: [Research Question and Variables](#pt1)

   - A. [Description of Research Question](#a)
    
   - B. [Description of Data Set Variables](#b)
 
### Part II: [Data-Cleaning Plan](#pt2)

   - C1. [Detection Methods](#c1)
     
   - C2. [Justification for Detection Methods](#c2)
     
   - C3. [Justification for Program Language](#c3)
     
   - C4. [Detection Code](#c4)
     
### Part III: [Data Cleaning Treatment](#pt3)

   - D1. [Discussion of Findings](#d1)
   
   - D2. [Treatment Methods and Justification](#d2)
   
   - D3. [Summary of Treatment](#d3)
   
   - D4. [Treatment Code](#d4)
   
   - D5. [Clean Dataset](#d5)
   
   - D6. [Limitations](#d6)
   
   - D7. [Implications](#d7)
   
   - E1. [Variables and PCA Loadings](#e1)
   
   - E2. [PCs Selection](#e2)
   
   - E3. [Benefits](#e3)

### Part IV: [Supporting Documents](#pt4)

   - F. [Video Demonstration of Code Functionality](#f)
   
   - G. [Acknowledge Sources](#g)   

#### Overview of all Variables in Raw Data Set 
| Original Name | New Name | Data Type | Variable Type | Description | # Missing | # Unique | Example or Range |
|:---:|:---:|:---:|:---:|:---|---:|---:|---:|
Unnamed|n/a|int64|UID - not for analysis|DELETED COL - Duplication of CaseOrder column|0|10000|1 - 10000|
CaseOrder|case_order|int64|UID - not for analysis|Index to preserve original order of raw data|0|10000|1 - 10000|
Customer_id|cust_id|object|UID - not for analysis|Unique ID per customer|0|10000|"K409198, S120509, D90850"|
Interaction|interaction_id|object|UID - not for analysis|Unique ID per customer transaction|0|10000|aa90260b-4141-4a24-8e36-b04ce1f4f77b|
City|city|object|categorical - nominal|City of customer residence|0|6058|"Point Baker, West Branch..."|
State|state|object|categorical - nominal|State of customer residence|0|52|"AK, MI, OR, CA..."|
County|county|object|categorical - nominal|County of customer residence|0|1620|"Ogemaw, Yamhill, San Diego..."|
Zip|zip|int64|categorical - ordinal|Zip code of customer residence|0|8583|601 - 99929|
Lat|latitude|float64|numeric - continuous|GPS Latitude of customer residence|0|8563|17.96612 - 70.64066|
Lng|longitude|float64|numeric - continuous|GPS Longitude of customer residence|0|8655|-171.68815 - -65.66785|
Population|population|int64|numeric - discrete|Population within a one-mile radius of customer residence|0|5933|0 - 111850|
Area|area|object|categorical - nominal|Area type around customer residence|0|3|"rural, urban, suburban"|
Timezone|timezone|object|categorical - ordinal|Time Zone of customer residence|0|25|"Pacific/Honolulu, America/New_York"|
Job|job|object|categorical - nominal|Job of customer|0|639|"Comptroller, Paramedic"|
Children|children|float64|numeric - discrete|Number of children in customer household|2495|12|0 - 10|
Age|age|float64|numeric - discrete|Age of customer|2475|73|18 - 89|
Education|education|object|categorical - ordinal|Highest degree earned as reported by customer|0|12|"Bachelor's Degree, 9th Grade to 12th Grade"|
Employment|employment|object|categorical - nominal|Employment status as reported by customer|0|5|"Student, Part Time, Full Time, Retired, Unemployed"|
Income|income|float64|numeric - continuous|Annual income as reported by customer|2490|7507|740.66 - 258900.7|
Marital|marital|object|categorical - nominal|Marital status as reported by customer|0|5|"Never Married, Married, Separated, Divorced, Widowed"|
Gender|gender|object|categorical - nominal|Gender as reported by customer|0|3|"Male, Female, Prefer not to answer"|
Churn|churn|object|categorical - nominal|Whether customer discontinued service within last month|0|2|"Yes, No"|
Outage_sec_perweek|outage_sec_wk|float64|numeric - continuous|Average # of seconds/week of system outages in customer's area|0|9993|-1.348571 - 47.04928|
Email|email_contact_yr|int64|numeric - discrete|Number of emails sent to customer |0|23|1 - 23|
Contacts|support_reqs_total|int64|numeric - discrete|Number of times customer contacted tech support|0|8|0 - 7|
Yearly_equip_failure|equip_failure_yr|int64|numeric - discrete|Number of times annually customer's equipment failed|0|6|0 - 6|
Techie|cust_is_techie|object|categorical - nominal|Whether customer considers themselves technically inclined|2477|3|"Yes, No, NA"|
Contract|contract_term|object|categorical - nominal|Contract term of the customer|0|3|"Month-to-month, One year, Two year"|
Port_modem|portable_modem|object|categorical - nominal|Whether customer has a portable modem|0|2|"Yes, No"|
Tablet|tablet|object|categorical - nominal|Whether customer owns a tablet device|0|2|"Yes, No"|
InternetService|internet_service|object|categorical - nominal|Customer's internet service provider|0|3|"DSL, Fiber Optic, None"|
Phone|phone_service|object|categorical - nominal|Whether the customer has a phone service|1026|3|"Yes, No, NA"|
Multiple|multi_ph_lines|object|categorical - nominal|Whether the customer has multiple phone lines|0|2|"Yes, No"|
OnlineSecurity|online_security|object|categorical - nominal|Whether the customer has an online security add-on|0|2|"Yes, No"|
OnlineBackup|online_backup|object|categorical - nominal|Whether the customer has an online backup add-on|0|2|"Yes, No"|
DeviceProtection|device_protection|object|categorical - nominal|Whether the customer has device protection add-on|0|2|"Yes, No"|
TechSupport|tech_support|object|categorical - nominal|Whether the customer has a technical support add-on|991|3|"Yes, No, NA"|
StreamingTV|streaming_tv|object|categorical - nominal|Whether the customer has streaming TV|0|2|"Yes, No"|
StreamingMovies|streaming_movies|object|categorical - nominal|Whether the customer has streaming movies|0|2|"Yes, No"|
PaperlessBilling|paperless_bill|object|categorical - nominal|Whether the customer has paperless billing|0|2|"Yes, No"|
PaymentMethod|pay_method|object|categorical - nominal|Customer's payment method|0|4|"Electronic Check, Mailed Check, Bank Transfer(automatic), Credit Card (automatic)"|
Tenure|tenure|float64|numeric - continuous|Number of months customer has stayed with the provider|931|9066|1.00025934 - 71.99928|
MonthlyCharge|monthly_charge|float64|numeric - continuous|Monthly amount charged to the customer |0|9984|77.50523 - 315.8786|
Bandwidth_GB_Year|bandwidth_gb_yr|float64|numeric - continuous|Average amount of data used annually by customer (in GB)|1021|8973|155.5067148 - 7158.982|
item1|timely_resp|int64|categorical - ordinal|"Timely response on cust survey (1=most important/8=least important)"|0|7|1 - 7|
item2|timely_fix|int64|categorical - ordinal|"Timely fixes on cust survey (1=most important/8=least important)"|0|7|1 - 7|
item3|timely_replace|int64|categorical - ordinal|"Timely replacements on cust survey (1=most important/8=least important)"|0|8|1 - 8|
item4|reliability|int64|categorical - ordinal|"Reliability on cust survey (1=most important/8=least important)"|0|7|1 - 7|
item5|options|int64|categorical - ordinal|"Options on cust survey (1=most important/8=least important)"|0|7|1 - 7|
item6|respectful|int64|categorical - ordinal|"Respectful response on cust survey (1=most important/8=least important)"|0|8|1 - 8|
item7|courteous|int64|categorical - ordinal|"Courteous exchange on cust survey (1=most important/8=least important)"|0|7|1 - 7|
item8|active_listening|int64|categorical - ordinal|"Evidence of active listening on cust survey (1=most important/8=least important)"|0|8|1 - 8|
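
The overview table can be reproduced programmatically. The following is a minimal sketch, assuming the data is loaded into a pandas DataFrame; `RENAME_MAP` (showing only a few of the renames listed above), `summarize`, and the four-row sample are hypothetical and for illustration only. The #Unique counts in the table appear to count a missing value as one unique value (for example, `Children` ranges over 0 - 10, which is 11 values, plus NA gives 12), so the sketch uses `nunique(dropna=False)` to match.

```python
import pandas as pd

# Hypothetical subset of the Original Name -> New Name mapping from the table.
RENAME_MAP = {
    "CaseOrder": "case_order",
    "Children": "children",
    "Techie": "cust_is_techie",
    "Outage_sec_perweek": "outage_sec_wk",
}


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, unique count.

    Missing values are counted as one unique value (dropna=False).
    """
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(dropna=False),
    })


# Tiny made-up sample, NOT the real data set.
raw = pd.DataFrame({
    "CaseOrder": [1, 2, 3, 4],
    "Children": [0.0, None, 2.0, 2.0],
    "Techie": ["Yes", None, "No", "No"],
    "Outage_sec_perweek": [6.97, 12.01, -1.35, 10.75],
})

clean = raw.rename(columns=RENAME_MAP)
summary = summarize(clean)
print(summary)
```

On the real data set, the same call after renaming all columns should yield the #Missing and #Unique columns shown in the table, provided the file is read so that empty cells and `NA` strings are parsed as missing values.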