# Data Exploration
- This notebook performs exploratory data analysis on the dataset.
- To expand on the analysis, attach this notebook to a cluster with runtime version **13.3.x-cpu-ml-scala2.12**,
edit [the options of pandas-profiling](https://pandas-profiling.ydata.ai/docs/master/rtd/pages/advanced_usage.html), and rerun it.
- Explore completed trials in the [MLflow experiment](#mlflow/experiments/1074920023763113).

In [0]:
import mlflow
import os
import uuid
import shutil
import pandas as pd
import databricks.automl_runtime

# Download input data from mlflow into a pandas DataFrame
# Create temporary directory to download data
temp_dir = os.path.join(os.environ["SPARK_LOCAL_DIRS"], "tmp", str(uuid.uuid4())[:8])
os.makedirs(temp_dir)

# Download the artifact and read it
training_data_path = mlflow.artifacts.download_artifacts(run_id="9abb3639f67f4629bfd4f1f3f38eb0ec", artifact_path="data", dst_path=temp_dir)
df = pd.read_parquet(os.path.join(training_data_path, "training_data"))

# Delete the temporary data
shutil.rmtree(temp_dir)

target_col = "Churn"

# Drop columns created by AutoML before pandas-profiling
df = df.drop(['_automl_split_col_0000'], axis=1)


## Semantic Type Detection Alerts

For details about the definition of the semantic types and how to override the detection, see
[Databricks documentation on semantic type detection](https://docs.databricks.com/applications/machine-learning/automl.html#semantic-type-detection).

- Semantic type `categorical` detected for column `SeniorCitizen`. Training notebooks will encode features based on categorical transformations.
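
As a rough illustration of what treating a 0/1 flag as categorical can look like on the pandas side, the sketch below casts the column to the `category` dtype and one-hot encodes it. The `toy` DataFrame and its values are made up for illustration, and this is not necessarily how the AutoML training notebooks implement their categorical transformations.

```python
import pandas as pd

# Toy stand-in for the training data (values are illustrative only).
toy = pd.DataFrame({"SeniorCitizen": [0, 1, 0, 0, 1], "Tenure": [1, 34, 2, 45, 8]})

# Treat the 0/1 flag as a categorical feature rather than a number.
toy["SeniorCitizen"] = toy["SeniorCitizen"].astype("category")

# One-hot encode the categorical column; Tenure passes through unchanged.
encoded = pd.get_dummies(toy, columns=["SeniorCitizen"])
print(encoded.columns.tolist())
```

With only two levels, a single indicator column would carry the same information; whether to drop one level depends on the downstream model.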

## Profiling Results

In [0]:
from ydata_profiling import ProfileReport
df_profile = ProfileReport(df,
                           correlations={
                               "auto": {"calculate": True},
                               "pearson": {"calculate": True},
                               "spearman": {"calculate": True},
                               "kendall": {"calculate": True},
                               "phi_k": {"calculate": True},
                               "cramers": {"calculate": True},
                           }, title="Profiling Report", progress_bar=False, infer_dtypes=False)
profile_html = df_profile.to_html()

displayHTML(profile_html)

Statistic,Value
Number of variables,10
Number of observations,7043
Missing cells,11
Missing cells (%),< 0.1%
Duplicate rows,29
Duplicate rows (%),0.4%
Total size in memory,522.8 KiB
Average record size in memory,76.0 B

Variable type,Count
Text,7
Numeric,3

Alert,Type
Dataset has 29 (0.4%) duplicate rows,Duplicates
Tenure is highly overall correlated with Contract and 1 other field,High correlation
TotalCharges is highly overall correlated with Tenure and 2 other fields,High correlation
InternetService is highly overall correlated with Contract and 1 other field,High correlation
Contract is highly overall correlated with Tenure and 2 other fields,High correlation
SeniorCitizen has 5901 (83.8%) zeros,Zeros
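
The zeros alert and the missing-cell count can be spot-checked directly in pandas. The sketch below runs on a made-up `toy` DataFrame rather than the real data. On the actual `df`, the same expressions would be expected to give roughly 5901 / 7043 ≈ 83.8% zeros for `SeniorCitizen`, assuming the report counts zeros the same way.

```python
import pandas as pd

# Toy stand-in for the profiled data (values are illustrative only).
toy = pd.DataFrame({
    "SeniorCitizen": [0, 0, 1, 0, 0],
    "TotalCharges": [29.85, None, 1949.4, 301.9, None],
})

# Share of rows where the column equals exactly zero.
zero_share = float((toy["SeniorCitizen"] == 0).mean())

# Total number of missing cells across all columns.
missing_cells = int(toy.isna().sum().sum())

print(f"zeros: {zero_share:.1%}, missing cells: {missing_cells}")
```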

Field,Value
Analysis started,2024-12-22 21:56:03.781122
Analysis finished,2024-12-22 21:56:16.503170
Duration,12.72 seconds
Software version,ydata-profiling v4.2.0
Download configuration,config.json

### Gender

Statistic,Value
Distinct,2
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Max length,6.0
Median length,4.0
Mean length,4.990487
Min length,4.0

Statistic,Value
Total characters,35148
Distinct characters,6
Distinct categories,2
Distinct scripts,1
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Row,Value
1st row,Female
2nd row,Female
3rd row,Male
4th row,Female
5th row,Male

Value,Count,Frequency (%)
male,3555,50.5%
female,3488,49.5%

Value,Count,Frequency (%)
e,10531,30.0%
a,7043,20.0%
l,7043,20.0%
M,3555,10.1%
F,3488,9.9%
m,3488,9.9%

Value,Count,Frequency (%)
Lowercase Letter,28105,80.0%
Uppercase Letter,7043,20.0%

Value,Count,Frequency (%)
e,10531,37.5%
a,7043,25.1%
l,7043,25.1%
m,3488,12.4%

Value,Count,Frequency (%)
M,3555,50.5%
F,3488,49.5%

Value,Count,Frequency (%)
Latin,35148,100.0%

Value,Count,Frequency (%)
e,10531,30.0%
a,7043,20.0%
l,7043,20.0%
M,3555,10.1%
F,3488,9.9%
m,3488,9.9%

Value,Count,Frequency (%)
ASCII,35148,100.0%

Value,Count,Frequency (%)
e,10531,30.0%
a,7043,20.0%
l,7043,20.0%
M,3555,10.1%
F,3488,9.9%
m,3488,9.9%

### SeniorCitizen

Statistic,Value
Distinct,2
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,0.16214681

Statistic,Value
Minimum,0
Maximum,1
Zeros,5901
Zeros (%),83.8%
Negative,0
Negative (%),0.0%
Memory size,27.6 KiB

Statistic,Value
Minimum,0
5-th percentile,0
Q1,0
median,0
Q3,0
95-th percentile,1
Maximum,1
Range,1
Interquartile range (IQR),0

Statistic,Value
Standard deviation,0.36861161
Coefficient of variation (CV),2.2733201
Kurtosis,1.3625959
Mean,0.16214681
Median Absolute Deviation (MAD),0
Skewness,1.8336327
Sum,1142
Variance,0.13587452
Monotonicity,Not monotonic
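
For a 0/1 column, the descriptive statistics above follow from just the count of ones and the number of rows. The sketch below reproduces the mean, variance, standard deviation and coefficient of variation from the reported sum (1142) and row count (7043). It assumes the report uses the sample variance (denominator n - 1), which is consistent with the figures shown.

```python
import math

n = 7043      # number of observations
ones = 1142   # sum of the 0/1 column

# Mean of a binary column is the share of ones.
mean = ones / n

# Sample variance of a binary column: p * (1 - p) * n / (n - 1).
variance = mean * (1 - mean) * n / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(round(mean, 8), round(variance, 8), round(std, 8), round(cv, 7))
```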

Value,Count,Frequency (%)
0,5901,83.8%
1,1142,16.2%

Value,Count,Frequency (%)
0,5901,83.8%
1,1142,16.2%

Value,Count,Frequency (%)
1,1142,16.2%
0,5901,83.8%

### Partner

Statistic,Value
Distinct,2
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Max length,3.0
Median length,2.0
Mean length,2.4830328
Min length,2.0

Statistic,Value
Total characters,17488
Distinct characters,5
Distinct categories,2
Distinct scripts,1
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Row,Value
1st row,Yes
2nd row,No
3rd row,No
4th row,No
5th row,No

Value,Count,Frequency (%)
no,3641,51.7%
yes,3402,48.3%

Value,Count,Frequency (%)
N,3641,20.8%
o,3641,20.8%
Y,3402,19.5%
e,3402,19.5%
s,3402,19.5%

Value,Count,Frequency (%)
Lowercase Letter,10445,59.7%
Uppercase Letter,7043,40.3%

Value,Count,Frequency (%)
o,3641,34.9%
e,3402,32.6%
s,3402,32.6%

Value,Count,Frequency (%)
N,3641,51.7%
Y,3402,48.3%

Value,Count,Frequency (%)
Latin,17488,100.0%

Value,Count,Frequency (%)
N,3641,20.8%
o,3641,20.8%
Y,3402,19.5%
e,3402,19.5%
s,3402,19.5%

Value,Count,Frequency (%)
ASCII,17488,100.0%

Value,Count,Frequency (%)
N,3641,20.8%
o,3641,20.8%
Y,3402,19.5%
e,3402,19.5%
s,3402,19.5%

### Tenure

Statistic,Value
Distinct,73
Distinct (%),1.0%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,32.371149

Statistic,Value
Minimum,0
Maximum,72
Zeros,11
Zeros (%),0.2%
Negative,0
Negative (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Minimum,0
5-th percentile,1
Q1,9
median,29
Q3,55
95-th percentile,72
Maximum,72
Range,72
Interquartile range (IQR),46

Statistic,Value
Standard deviation,24.559481
Coefficient of variation (CV),0.75868426
Kurtosis,-1.3873716
Mean,32.371149
Median Absolute Deviation (MAD),22
Skewness,0.23953975
Sum,227990
Variance,603.16811
Monotonicity,Not monotonic

Value,Count,Frequency (%)
1,613,8.7%
72,362,5.1%
2,238,3.4%
3,200,2.8%
4,176,2.5%
71,170,2.4%
5,133,1.9%
7,131,1.9%
8,123,1.7%
70,119,1.7%

Value,Count,Frequency (%)
0,11,0.2%
1,613,8.7%
2,238,3.4%
3,200,2.8%
4,176,2.5%
5,133,1.9%
6,110,1.6%
7,131,1.9%
8,123,1.7%
9,119,1.7%

Value,Count,Frequency (%)
72,362,5.1%
71,170,2.4%
70,119,1.7%
69,95,1.3%
68,100,1.4%
67,98,1.4%
66,89,1.3%
65,76,1.1%
64,80,1.1%
63,72,1.0%

### InternetService

Statistic,Value
Distinct,3
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Max length,11.0
Median length,3.0
Mean length,6.3000142
Min length,2.0

Statistic,Value
Total characters,44371
Distinct characters,14
Distinct categories,3
Distinct scripts,2
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Row,Value
1st row,DSL
2nd row,Fiber optic
3rd row,Fiber optic
4th row,DSL
5th row,No

Value,Count,Frequency (%)
fiber,3096,30.5%
optic,3096,30.5%
dsl,2421,23.9%
no,1526,15.1%

Value,Count,Frequency (%)
i,6192,14.0%
o,4622,10.4%
F,3096,7.0%
b,3096,7.0%
e,3096,7.0%
r,3096,7.0%
(space),3096,7.0%
p,3096,7.0%
t,3096,7.0%
c,3096,7.0%

Value,Count,Frequency (%)
Lowercase Letter,29390,66.2%
Uppercase Letter,11885,26.8%
Space Separator,3096,7.0%

Value,Count,Frequency (%)
i,6192,21.1%
o,4622,15.7%
b,3096,10.5%
e,3096,10.5%
r,3096,10.5%
p,3096,10.5%
t,3096,10.5%
c,3096,10.5%

Value,Count,Frequency (%)
F,3096,26.0%
D,2421,20.4%
S,2421,20.4%
L,2421,20.4%
N,1526,12.8%

Value,Count,Frequency (%)
(space),3096,100.0%

Value,Count,Frequency (%)
Latin,41275,93.0%
Common,3096,7.0%

Value,Count,Frequency (%)
i,6192,15.0%
o,4622,11.2%
F,3096,7.5%
b,3096,7.5%
e,3096,7.5%
r,3096,7.5%
p,3096,7.5%
t,3096,7.5%
c,3096,7.5%
D,2421,5.9%

Value,Count,Frequency (%)
(space),3096,100.0%

Value,Count,Frequency (%)
ASCII,44371,100.0%

Value,Count,Frequency (%)
i,6192,14.0%
o,4622,10.4%
F,3096,7.0%
b,3096,7.0%
e,3096,7.0%
r,3096,7.0%
(space),3096,7.0%
p,3096,7.0%
t,3096,7.0%
c,3096,7.0%

### Contract

Statistic,Value
Distinct,3
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Max length,14.0
Median length,14.0
Mean length,11.30115
Min length,8.0

Statistic,Value
Total characters,79594
Distinct characters,15
Distinct categories,4
Distinct scripts,2
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Row,Value
1st row,Month-to-month
2nd row,Month-to-month
3rd row,Month-to-month
4th row,Month-to-month
5th row,Two year

Value,Count,Frequency (%)
month-to-month,3875,37.9%
year,3168,31.0%
two,1695,16.6%
one,1473,14.4%

Value,Count,Frequency (%)
o,13320,16.7%
t,11625,14.6%
n,9223,11.6%
h,7750,9.7%
-,7750,9.7%
e,4641,5.8%
M,3875,4.9%
m,3875,4.9%
(space),3168,4.0%
y,3168,4.0%

Value,Count,Frequency (%)
Lowercase Letter,61633,77.4%
Dash Punctuation,7750,9.7%
Uppercase Letter,7043,8.8%
Space Separator,3168,4.0%

Value,Count,Frequency (%)
o,13320,21.6%
t,11625,18.9%
n,9223,15.0%
h,7750,12.6%
e,4641,7.5%
m,3875,6.3%
y,3168,5.1%
a,3168,5.1%
r,3168,5.1%
w,1695,2.8%

Value,Count,Frequency (%)
M,3875,55.0%
T,1695,24.1%
O,1473,20.9%

Value,Count,Frequency (%)
-,7750,100.0%

Value,Count,Frequency (%)
(space),3168,100.0%

Value,Count,Frequency (%)
Latin,68676,86.3%
Common,10918,13.7%

Value,Count,Frequency (%)
o,13320,19.4%
t,11625,16.9%
n,9223,13.4%
h,7750,11.3%
e,4641,6.8%
M,3875,5.6%
m,3875,5.6%
y,3168,4.6%
a,3168,4.6%
r,3168,4.6%

Value,Count,Frequency (%)
-,7750,71.0%
(space),3168,29.0%

Value,Count,Frequency (%)
ASCII,79594,100.0%

Value,Count,Frequency (%)
o,13320,16.7%
t,11625,14.6%
n,9223,11.6%
h,7750,9.7%
-,7750,9.7%
e,4641,5.8%
M,3875,4.9%
m,3875,4.9%
(space),3168,4.0%
y,3168,4.0%

### PaperlessBilling

Statistic,Value
Distinct,2
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Max length,3.0
Median length,3.0
Mean length,2.5922192
Min length,2.0

Statistic,Value
Total characters,18257
Distinct characters,5
Distinct categories,2
Distinct scripts,1
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Row,Value
1st row,Yes
2nd row,Yes
3rd row,Yes
4th row,No
5th row,No

Value,Count,Frequency (%)
yes,4171,59.2%
no,2872,40.8%

Value,Count,Frequency (%)
Y,4171,22.8%
e,4171,22.8%
s,4171,22.8%
N,2872,15.7%
o,2872,15.7%

Value,Count,Frequency (%)
Lowercase Letter,11214,61.4%
Uppercase Letter,7043,38.6%

Value,Count,Frequency (%)
e,4171,37.2%
s,4171,37.2%
o,2872,25.6%

Value,Count,Frequency (%)
Y,4171,59.2%
N,2872,40.8%

Value,Count,Frequency (%)
Latin,18257,100.0%

Value,Count,Frequency (%)
Y,4171,22.8%
e,4171,22.8%
s,4171,22.8%
N,2872,15.7%
o,2872,15.7%

Value,Count,Frequency (%)
ASCII,18257,100.0%

Value,Count,Frequency (%)
Y,4171,22.8%
e,4171,22.8%
s,4171,22.8%
N,2872,15.7%
o,2872,15.7%

### PaymentMethod

Statistic,Value
Distinct,4
Distinct (%),0.1%
Missing,0
Missing (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Max length,25.0
Median length,23.0
Mean length,18.570212
Min length,12.0

Statistic,Value
Total characters,130790
Distinct characters,23
Distinct categories,5
Distinct scripts,2
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Row,Value
1st row,Electronic check
2nd row,Electronic check
3rd row,Credit card (automatic)
4th row,Mailed check
5th row,Credit card (automatic)

Value,Count,Frequency (%)
check,3977,23.2%
automatic,3066,17.9%
electronic,2365,13.8%
mailed,1612,9.4%
bank,1544,9.0%
transfer,1544,9.0%
credit,1522,8.9%
card,1522,8.9%

Value,Count,Frequency (%)
c,17272,13.2%
a,12354,9.4%
t,11563,8.8%
e,11020,8.4%
(space),10109,7.7%
i,8565,6.5%
r,8497,6.5%
k,5521,4.2%
n,5453,4.2%
o,5431,4.2%

Value,Count,Frequency (%)
Lowercase Letter,107506,82.2%
Space Separator,10109,7.7%
Uppercase Letter,7043,5.4%
Open Punctuation,3066,2.3%
Close Punctuation,3066,2.3%

Value,Count,Frequency (%)
c,17272,16.1%
a,12354,11.5%
t,11563,10.8%
e,11020,10.3%
i,8565,8.0%
r,8497,7.9%
k,5521,5.1%
n,5453,5.1%
o,5431,5.1%
d,4656,4.3%

Value,Count,Frequency (%)
E,2365,33.6%
M,1612,22.9%
B,1544,21.9%
C,1522,21.6%

Value,Count,Frequency (%)
(space),10109,100.0%

Value,Count,Frequency (%)
(,3066,100.0%

Value,Count,Frequency (%)
),3066,100.0%

Value,Count,Frequency (%)
Latin,114549,87.6%
Common,16241,12.4%

Value,Count,Frequency (%)
c,17272,15.1%
a,12354,10.8%
t,11563,10.1%
e,11020,9.6%
i,8565,7.5%
r,8497,7.4%
k,5521,4.8%
n,5453,4.8%
o,5431,4.7%
d,4656,4.1%

Value,Count,Frequency (%)
(space),10109,62.2%
(,3066,18.9%
),3066,18.9%

Value,Count,Frequency (%)
ASCII,130790,100.0%

Value,Count,Frequency (%)
c,17272,13.2%
a,12354,9.4%
t,11563,8.8%
e,11020,8.4%
(space),10109,7.7%
i,8565,6.5%
r,8497,6.5%
k,5521,4.2%
n,5453,4.2%
o,5431,4.2%

### TotalCharges

Statistic,Value
Distinct,6530
Distinct (%),92.9%
Missing,11
Missing (%),0.2%
Infinite,0
Infinite (%),0.0%
Mean,2283.3004

Statistic,Value
Minimum,18.8
Maximum,8684.8
Zeros,0
Zeros (%),0.0%
Negative,0
Negative (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Minimum,18.8
5-th percentile,49.605
Q1,401.45
median,1397.475
Q3,3794.7375
95-th percentile,6923.59
Maximum,8684.8
Range,8666.0
Interquartile range (IQR),3393.2875

Statistic,Value
Standard deviation,2266.7714
Coefficient of variation (CV),0.99276088
Kurtosis,-0.23179876
Mean,2283.3004
Median Absolute Deviation (MAD),1222.8
Skewness,0.9616425
Sum,16056169
Variance,5138252.4
Monotonicity,Not monotonic

Value,Count,Frequency (%)
20.2,11,0.2%
19.75,9,0.1%
19.65,8,0.1%
19.9,8,0.1%
20.05,8,0.1%
19.55,7,0.1%
45.3,7,0.1%
20.15,6,0.1%
20.25,6,0.1%
19.45,6,0.1%

Value,Count,Frequency (%)
18.8,1,< 0.1%
18.85,2,< 0.1%
18.9,1,< 0.1%
19.0,1,< 0.1%
19.05,1,< 0.1%
19.1,3,< 0.1%
19.15,1,< 0.1%
19.2,4,0.1%
19.25,3,< 0.1%
19.3,4,0.1%

Value,Count,Frequency (%)
8684.8,1,< 0.1%
8672.45,1,< 0.1%
8670.1,1,< 0.1%
8594.4,1,< 0.1%
8564.75,1,< 0.1%
8547.15,1,< 0.1%
8543.25,1,< 0.1%
8529.5,1,< 0.1%
8496.7,1,< 0.1%
8477.7,1,< 0.1%

### Churn

Statistic,Value
Distinct,2
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,55.1 KiB

Statistic,Value
Max length,3.0
Median length,2.0
Mean length,2.2653699
Min length,2.0

Statistic,Value
Total characters,15955
Distinct characters,5
Distinct categories,2
Distinct scripts,1
Distinct blocks,1

Statistic,Value
Unique,0
Unique (%),0.0%

Row,Value
1st row,No
2nd row,Yes
3rd row,No
4th row,No
5th row,No

Value,Count,Frequency (%)
no,5174,73.5%
yes,1869,26.5%

Value,Count,Frequency (%)
N,5174,32.4%
o,5174,32.4%
Y,1869,11.7%
e,1869,11.7%
s,1869,11.7%

Value,Count,Frequency (%)
Lowercase Letter,8912,55.9%
Uppercase Letter,7043,44.1%

Value,Count,Frequency (%)
o,5174,58.1%
e,1869,21.0%
s,1869,21.0%

Value,Count,Frequency (%)
N,5174,73.5%
Y,1869,26.5%

Value,Count,Frequency (%)
Latin,15955,100.0%

Value,Count,Frequency (%)
N,5174,32.4%
o,5174,32.4%
Y,1869,11.7%
e,1869,11.7%
s,1869,11.7%

Value,Count,Frequency (%)
ASCII,15955,100.0%

Value,Count,Frequency (%)
N,5174,32.4%
o,5174,32.4%
Y,1869,11.7%
e,1869,11.7%
s,1869,11.7%

Variable,SeniorCitizen,Tenure,TotalCharges
SeniorCitizen,1.0,0.019,0.107
Tenure,0.019,1.0,0.889
TotalCharges,0.107,0.889,1.0

Variable,SeniorCitizen,Tenure,TotalCharges
SeniorCitizen,1.0,0.017,0.102
Tenure,0.017,1.0,0.826
TotalCharges,0.102,0.826,1.0

Variable,SeniorCitizen,Tenure,TotalCharges
SeniorCitizen,1.0,0.019,0.107
Tenure,0.019,1.0,0.889
TotalCharges,0.107,0.889,1.0

Variable,SeniorCitizen,Tenure,TotalCharges
SeniorCitizen,1.0,0.015,0.088
Tenure,0.015,1.0,0.734
TotalCharges,0.088,0.734,1.0
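
The numeric correlation matrices above are not labelled with their coefficient type in this export. As a hedged sketch, pairwise coefficients of the kinds requested in the `correlations` config (Pearson, Spearman) can be computed for numeric columns with pandas `DataFrame.corr`. The `toy` data below is invented so that the relationship is monotonic but not linear, which makes the two coefficients differ.

```python
import pandas as pd

# Toy numeric data: TotalCharges grows with Tenure, but not linearly.
toy = pd.DataFrame({
    "Tenure": [1, 2, 3, 4, 5],
    "TotalCharges": [1.0, 4.0, 9.0, 16.0, 25.0],
})

# Pearson measures linear association; Spearman measures rank (monotonic) association.
pearson = toy.corr(method="pearson")
spearman = toy.corr(method="spearman")

pearson_r = float(pearson.loc["Tenure", "TotalCharges"])
spearman_r = float(spearman.loc["Tenure", "TotalCharges"])
print(pearson_r, spearman_r)
```

On the real data, running `df[["SeniorCitizen", "Tenure", "TotalCharges"]].corr(method=...)` for each method would be one way to match each matrix to its coefficient.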

Variable,Gender,SeniorCitizen,Partner,Tenure,InternetService,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn
Gender,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
SeniorCitizen,0.0,1.0,0.017,0.029,0.161,0.086,0.242,0.293,0.148,0.233
Partner,0.0,0.017,1.0,0.492,0.0,0.18,0.013,0.243,0.425,0.233
Tenure,0.0,0.029,0.492,1.0,0.019,0.665,0.0,0.375,0.842,0.474
InternetService,0.0,0.161,0.0,0.019,1.0,0.505,0.231,0.324,0.508,0.196
Contract,0.0,0.086,0.18,0.665,0.505,1.0,0.107,0.277,0.508,0.252
PaperlessBilling,0.0,0.242,0.013,0.0,0.231,0.107,1.0,0.37,0.205,0.296
PaymentMethod,0.0,0.293,0.243,0.375,0.324,0.277,0.37,1.0,0.35,0.449
TotalCharges,0.0,0.148,0.425,0.842,0.508,0.508,0.205,0.35,1.0,0.281
Churn,0.0,0.233,0.233,0.474,0.196,0.252,0.296,0.449,0.281,1.0

Index,Gender,SeniorCitizen,Partner,Tenure,InternetService,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn
0,Female,0,Yes,1.0,DSL,Month-to-month,Yes,Electronic check,29.85,No
1,Female,0,No,2.0,Fiber optic,Month-to-month,Yes,Electronic check,151.65,Yes
2,Male,0,No,22.0,Fiber optic,Month-to-month,Yes,Credit card (automatic),1949.4,No
3,Female,0,No,10.0,DSL,Month-to-month,No,Mailed check,301.9,No
4,Male,0,No,16.0,No,Two year,No,Credit card (automatic),326.8,No
5,Male,0,Yes,58.0,Fiber optic,One year,No,Credit card (automatic),5681.1,No
6,Male,0,No,25.0,Fiber optic,Month-to-month,Yes,Electronic check,2686.05,No
7,Female,0,No,52.0,No,One year,No,Mailed check,1022.95,No
8,Female,0,No,21.0,Fiber optic,Month-to-month,Yes,Electronic check,1862.9,No
9,Male,1,No,1.0,DSL,Month-to-month,Yes,Electronic check,39.65,Yes

Index,Gender,SeniorCitizen,Partner,Tenure,InternetService,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn
7033,Male,0,Yes,1.0,Fiber optic,Month-to-month,Yes,Mailed check,70.65,Yes
7034,Male,0,No,72.0,Fiber optic,One year,Yes,Electronic check,7544.3,No
7035,Male,0,No,13.0,DSL,Month-to-month,No,Mailed check,931.55,No
7036,Female,0,Yes,68.0,DSL,Two year,No,Bank transfer (automatic),4326.25,No
7037,Male,0,No,38.0,Fiber optic,Month-to-month,Yes,Credit card (automatic),2625.25,No
7038,Female,0,No,67.0,Fiber optic,Month-to-month,Yes,Credit card (automatic),6886.25,Yes
7039,Male,0,Yes,24.0,DSL,One year,Yes,Mailed check,1990.5,No
7040,Female,0,Yes,72.0,Fiber optic,One year,Yes,Credit card (automatic),7362.9,No
7041,Female,0,Yes,11.0,DSL,Month-to-month,Yes,Electronic check,346.45,No
7042,Male,1,Yes,4.0,Fiber optic,Month-to-month,Yes,Mailed check,306.6,Yes

Index,Gender,SeniorCitizen,Partner,Tenure,InternetService,Contract,PaperlessBilling,PaymentMethod,TotalCharges,Churn,# duplicates
19,Male,0,No,1.0,No,Month-to-month,No,Mailed check,20.05,No,4
21,Male,0,No,1.0,No,Month-to-month,No,Mailed check,20.2,No,3
28,Male,0,Yes,0.0,No,Two year,No,Mailed check,,No,3
0,Female,0,No,1.0,DSL,Month-to-month,No,Mailed check,45.95,Yes,2
1,Female,0,No,1.0,Fiber optic,Month-to-month,Yes,Electronic check,69.2,Yes,2
2,Female,0,No,1.0,Fiber optic,Month-to-month,Yes,Electronic check,70.1,Yes,2
3,Female,0,No,1.0,Fiber optic,Month-to-month,Yes,Mailed check,70.15,Yes,2
4,Female,0,No,1.0,No,Month-to-month,No,Mailed check,19.55,No,2
5,Female,0,No,1.0,No,Month-to-month,No,Mailed check,19.65,No,2
6,Female,0,No,1.0,No,Month-to-month,No,Mailed check,19.9,No,2
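
A table shaped like the one above, with one row per distinct duplicated record and a `# duplicates` count, can be approximated in pandas by keeping every row that has an exact copy and then grouping on all columns. The sketch below uses an invented `toy` DataFrame. It is an approximation and may not match the report's internal computation in every detail.

```python
import pandas as pd

# Toy stand-in with one record repeated three times (values are illustrative only).
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Female"],
    "Tenure": [1, 1, 1, 1, 2],
    "TotalCharges": [20.05, 20.05, 45.95, 20.05, None],
})
cols = list(toy.columns)

# Keep every row that has at least one exact copy elsewhere in the frame.
dupes = toy[toy.duplicated(keep=False)]

# Count the copies of each distinct duplicated record; dropna=False keeps
# records whose duplicated values include missing cells.
dup_counts = (
    dupes.groupby(cols, dropna=False)
    .size()
    .reset_index(name="# duplicates")
    .sort_values("# duplicates", ascending=False)
)
print(dup_counts)
```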
