# Telecommunications


## Task breakdown

### Define objectives
- **Identify ineffective operators**, that is, operators with:
    - Many missed incoming calls (internal and external)
    - Long waiting times for incoming calls
    - Few outgoing calls

### Data preprocessing
- Examine missing values
- Examine duplicate values
- Remove duplicates
- Replace missing values
- Convert data types
- Check that data types match the column contents
- Check that column names are correct
- Rename columns
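
The steps above can be sketched on a toy frame that mimics the schema of the calls dataset. This is only an illustration under stated assumptions: the rows are made up, the `preprocess` helper is hypothetical, and keeping missing values as `pd.NA` through nullable dtypes is one possible choice rather than the approach the notebook commits to.

```python
import pandas as pd

# Made-up rows with the same column names as the calls dataset.
raw = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "date": ["2019-08-04 00:00:00+03:00"] * 2 + ["2019-08-05 00:00:00+03:00"] * 2,
    "direction": ["in", "in", "out", "in"],
    "internal": [False, False, True, None],
    "operator_id": [None, None, 880022.0, 880020.0],
    "is_missed_call": [True, True, False, False],
    "calls_count": [2, 2, 3, 1],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates and convert column types."""
    out = df.drop_duplicates().copy()
    # Parse the timestamp strings (they carry a +03:00 offset).
    out["date"] = pd.to_datetime(out["date"])
    # Nullable integer keeps missing operators as <NA> instead of float NaN.
    out["operator_id"] = out["operator_id"].astype("Int64")
    # Nullable boolean keeps missing values instead of an object column.
    out["internal"] = out["internal"].astype("boolean")
    return out.reset_index(drop=True)


clean = preprocess(raw)
print(clean.dtypes)
```

Nullable dtypes postpone the decision about how to fill the gaps, which is convenient while the missing values are still being studied.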

### Exploratory analysis
- Organize the dataset by operator_id
- Analyze missed incoming calls
    - Separate internal missed incoming calls from external ones
- Review waiting times for incoming calls
- Identify operators with few outgoing calls
- Plot each of the points above
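
A minimal sketch of the per-operator metrics listed above, on made-up rows. It assumes that waiting time can be approximated as `total_call_duration - call_duration`, which this notebook does not state explicitly. The names `wait_time`, `call_type`, and `per_operator` are introduced here for illustration only.

```python
import pandas as pd

# Made-up rows using the column names of the calls dataset.
calls = pd.DataFrame({
    "operator_id": [10, 10, 10, 20, 20],
    "direction": ["in", "in", "out", "in", "out"],
    "internal": [False, True, False, False, False],
    "is_missed_call": [True, False, False, False, False],
    "calls_count": [2, 1, 5, 3, 1],
    "call_duration": [0, 60, 300, 90, 20],
    "total_call_duration": [10, 75, 350, 120, 30],
})

# Assumption: waiting time = total duration minus talk time.
calls["wait_time"] = calls["total_call_duration"] - calls["call_duration"]
calls["call_type"] = calls["internal"].map({True: "internal", False: "external"})

incoming = calls[calls["direction"] == "in"]
outgoing = calls[calls["direction"] == "out"]
missed = incoming[incoming["is_missed_call"]]

# Missed incoming calls, split into internal and external.
missed_by_type = (
    missed.groupby(["operator_id", "call_type"])["calls_count"]
    .sum()
    .unstack(fill_value=0)
)

in_by_operator = incoming.groupby("operator_id")
per_operator = pd.DataFrame({
    "missed_in": missed.groupby("operator_id")["calls_count"].sum(),
    "wait_per_in_call": in_by_operator["wait_time"].sum()
    / in_by_operator["calls_count"].sum(),
    "out_calls": outgoing.groupby("operator_id")["calls_count"].sum(),
}).fillna({"missed_in": 0})  # operators without missed calls get 0

print(per_operator)
```

Each column of `per_operator` can then be plotted on its own, for example with `per_operator["missed_in"].plot(kind="hist")`.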

### Conclusions
- Identify operators with:
    - A high number of missed incoming calls
    - Long waiting times
    - Few outgoing calls
- Analyze the relationship between good operators and bad operators
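
One possible way to turn these criteria into a label is to compare each operator with the quartiles of the whole group. The sketch below uses made-up metrics. The 75th and 25th percentile cut-offs and the "at least two flags" rule are assumptions chosen for illustration, not thresholds taken from this analysis.

```python
import pandas as pd

# Made-up per-operator metrics; the column names are hypothetical.
metrics = pd.DataFrame(
    {
        "missed_in": [0, 1, 2, 12],
        "wait_per_in_call": [5.0, 8.0, 9.0, 40.0],
        "out_calls": [50, 40, 45, 2],
    },
    index=pd.Index([101, 102, 103, 104], name="operator_id"),
)

# Assumed cut-offs: top quartile is "high", bottom quartile is "low".
high_missed = metrics["missed_in"] > metrics["missed_in"].quantile(0.75)
high_wait = metrics["wait_per_in_call"] > metrics["wait_per_in_call"].quantile(0.75)
low_out = metrics["out_calls"] < metrics["out_calls"].quantile(0.25)

# Count how many criteria each operator meets.
metrics["flags"] = (
    high_missed.astype(int) + high_wait.astype(int) + low_out.astype(int)
)
# Assumed rule: two or more flags mark an operator as ineffective.
metrics["ineffective"] = metrics["flags"] >= 2

print(metrics[metrics["ineffective"]])
```

Relative cut-offs always flag someone, even in a uniformly good team, so fixed business thresholds may be preferable once they are known.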

## Preprocessing

In [25]:
# Load libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [26]:
# Load datasets
clients = pd.read_csv('datasets/telecom_clients.csv')
data = pd.read_csv('datasets/telecom_dataset_new.csv')

### Clients

In [27]:
clients.head()

Unnamed: 0,user_id,tariff_plan,date_start
0,166713,A,2019-08-15
1,166901,A,2019-08-23
2,168527,A,2019-10-29
3,167097,A,2019-09-01
4,168193,A,2019-10-16


In [28]:
clients.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 732 entries, 0 to 731
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   user_id      732 non-null    int64 
 1   tariff_plan  732 non-null    object
 2   date_start   732 non-null    object
dtypes: int64(1), object(2)
memory usage: 17.3+ KB


In [29]:
clients.describe()

Unnamed: 0,user_id
count,732.0
mean,167431.927596
std,633.810383
min,166373.0
25%,166900.75
50%,167432.0
75%,167973.0
max,168606.0


In [30]:
clients.user_id.nunique()

732

In [31]:
clients.tariff_plan.value_counts()

tariff_plan
C    395
B    261
A     76
Name: count, dtype: int64

In [32]:
clients.isnull().sum()

user_id        0
tariff_plan    0
date_start     0
dtype: int64

In [33]:
clients.duplicated().sum()

np.int64(0)

In [34]:
clients.date_start = pd.to_datetime(clients.date_start)
clients.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 732 entries, 0 to 731
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   user_id      732 non-null    int64         
 1   tariff_plan  732 non-null    object        
 2   date_start   732 non-null    datetime64[ns]
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 17.3+ KB


### Data

In [35]:
data.head()

Unnamed: 0,user_id,date,direction,internal,operator_id,is_missed_call,calls_count,call_duration,total_call_duration
0,166377,2019-08-04 00:00:00+03:00,in,False,,True,2,0,4
1,166377,2019-08-05 00:00:00+03:00,out,True,880022.0,True,3,0,5
2,166377,2019-08-05 00:00:00+03:00,out,True,880020.0,True,1,0,1
3,166377,2019-08-05 00:00:00+03:00,out,True,880020.0,False,1,10,18
4,166377,2019-08-05 00:00:00+03:00,out,False,880022.0,True,3,0,25


In [36]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53902 entries, 0 to 53901
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   user_id              53902 non-null  int64  
 1   date                 53902 non-null  object 
 2   direction            53902 non-null  object 
 3   internal             53785 non-null  object 
 4   operator_id          45730 non-null  float64
 5   is_missed_call       53902 non-null  bool   
 6   calls_count          53902 non-null  int64  
 7   call_duration        53902 non-null  int64  
 8   total_call_duration  53902 non-null  int64  
dtypes: bool(1), float64(1), int64(4), object(3)
memory usage: 3.3+ MB


In [39]:
data[['calls_count', 'call_duration', 'total_call_duration']].describe()

Unnamed: 0,calls_count,call_duration,total_call_duration
count,53902.0,53902.0,53902.0
mean,16.451245,866.684427,1157.133297
std,62.91717,3731.791202,4403.468763
min,1.0,0.0,0.0
25%,1.0,0.0,47.0
50%,4.0,38.0,210.0
75%,12.0,572.0,902.0
max,4817.0,144395.0,166155.0


In [None]:
# Missing values
data.isnull().sum()

user_id                   0
date                      0
direction                 0
internal                117
operator_id            8172
is_missed_call            0
calls_count               0
call_duration             0
total_call_duration       0
dtype: int64

In [96]:
# Inspect the rows with a missing operator_id
operator_null = data[data['operator_id'].isnull()]
operator_null.sample(10)

Unnamed: 0,user_id,date,direction,internal,operator_id,is_missed_call,calls_count,call_duration,total_call_duration
24695,167112,2019-10-15 00:00:00+03:00,in,False,,True,3,0,101
12128,166717,2019-10-14 00:00:00+03:00,in,False,,True,1,0,0
3579,166485,2019-11-21 00:00:00+03:00,in,False,,True,6,0,86
1800,166406,2019-08-07 00:00:00+03:00,in,False,,True,1,0,4
46435,168091,2019-11-28 00:00:00+03:00,in,False,,True,14,0,542
50110,168252,2019-10-25 00:00:00+03:00,in,True,,False,2,145,159
28,166377,2019-08-12 00:00:00+03:00,in,False,,True,2,0,34
51525,168307,2019-11-26 00:00:00+03:00,in,False,,True,1,0,2
17680,166941,2019-09-07 00:00:00+03:00,in,False,,True,5,0,94
29089,167199,2019-10-29 00:00:00+03:00,in,False,,True,1,0,11
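
In the sample above, most rows without an `operator_id` are missed incoming calls. A random sample of ten rows is not conclusive, so one way to check the pattern on the full data is to compute the share of missing operators per call direction and outcome. The sketch below runs on made-up rows; `outcome` and `operator_missing` are hypothetical helper columns.

```python
import pandas as pd

# Made-up rows using the column names of the calls dataset.
toy = pd.DataFrame({
    "direction": ["in", "in", "in", "out", "out"],
    "is_missed_call": [True, True, False, False, True],
    "operator_id": [None, None, 880020.0, 880022.0, 880022.0],
})

# Readable labels instead of raw booleans.
toy["outcome"] = toy["is_missed_call"].map({True: "missed", False: "answered"})
toy["operator_missing"] = toy["operator_id"].isnull()

# Mean of a boolean column = share of rows where it is True.
share = toy.groupby(["direction", "outcome"])["operator_missing"].mean()
print(share)
```

Running the same grouping on `data` would show whether missing operators concentrate in one group, which in turn informs whether those rows can be attributed to any operator at all.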


In [104]:
# Inspect the rows with a missing internal value
internal_null = data[data['internal'].isnull()]
internal_null.sample(5)

Unnamed: 0,user_id,date,direction,internal,operator_id,is_missed_call,calls_count,call_duration,total_call_duration
41462,167870,2019-11-06 00:00:00+03:00,in,,936110.0,False,1,31,45
24494,167110,2019-09-23 00:00:00+03:00,in,,,True,1,0,12
30048,167272,2019-11-25 00:00:00+03:00,in,,,True,1,0,4
38099,167650,2019-10-17 00:00:00+03:00,in,,921318.0,False,1,109,116
51367,168291,2019-11-28 00:00:00+03:00,out,,,True,3,0,93


In [90]:
print(f'Percentage of nulls in operator_id: {(operator_null.shape[0] / data.shape[0]*100):.2f}%')
print(f'Percentage of nulls in internal: {(internal_null.shape[0] / data.shape[0]*100):.2f}%')


Percentage of nulls in operator_id: 15.16%
Percentage of nulls in internal: 0.22%
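
The same percentages can be obtained for every column at once, because the mean of a boolean mask equals the fraction of `True` values. A small sketch on a made-up frame (`toy` and `null_pct` are illustrative names):

```python
import pandas as pd

# Made-up frame with some missing values.
toy = pd.DataFrame({
    "internal": [True, None, False, False],
    "operator_id": [1.0, None, None, 4.0],
    "direction": ["in", "in", "out", "out"],
})

# isnull() gives a boolean mask; its column-wise mean is the null fraction.
null_pct = toy.isnull().mean().mul(100).round(2)
print(null_pct)
```

Applied to the real dataset, this would be `data.isnull().mean().mul(100).round(2)`.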


In [105]:
data.internal.value_counts()

internal
False    47621
True      6164
Name: count, dtype: int64

In [106]:
data.direction.value_counts()

direction
out    31917
in     21985
Name: count, dtype: int64