### This Jupyter notebook contains a practical assignment on the topic "Apply function in pandas"

**Author: Umidjon Sattorov, Machine Learning Engineer**

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Telecommunication company customer churn dataset
df_tel = pd.read_csv('./Telco_Cusomer_Churn.csv')
df_tel.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


* This dataset is from a telecommunication company that provides phone and internet services. Each row represents a customer record, including demographic information, subscription details, and billing history.

* The goal of such datasets is often to analyze customer churn — whether a customer leaves the service or stays.

| Column               | Description (English)                                             | Description (Русский)                                         | Description (O‘zbek)                                   |
| -------------------- | ----------------------------------------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------ |
| **customerID**       | Unique ID assigned to each customer                               | Уникальный идентификатор клиента                              | Har bir mijoz uchun berilgan noyob ID                  |
| **gender**           | Customer’s gender: Male / Female                                  | Пол клиента: Мужчина / Женщина                                | Mijozning jinsi: Erkak / Ayol                          |
| **SeniorCitizen**    | Whether the customer is a senior citizen (1 = Yes, 0 = No)        | Является ли клиент пожилым (1 = Да, 0 = Нет)                  | Mijoz keksami (1 = Ha, 0 = Yo‘q)                       |
| **Partner**          | Whether the customer has a partner (Yes/No)                       | Есть ли у клиента партнёр (Да/Нет)                            | Mijozning turmush o‘rtog‘i bormi (Ha/Yo‘q)             |
| **Dependents**       | Whether the customer has dependents (children, etc.)              | Есть ли иждивенцы (дети и т.д.)                               | Mijozning boqimandasi bormi (farzand va h.k.)          |
| **tenure**           | Number of months the customer has stayed with the company         | Количество месяцев сотрудничества с компанией                 | Mijoz kompaniya bilan bo‘lgan oylar soni               |
| **PhoneService**     | Does the customer have phone service? (Yes/No)                    | Есть ли у клиента телефонная связь? (Да/Нет)                  | Mijozda telefon xizmati bormi? (Ha/Yo‘q)               |
| **MultipleLines**    | Whether the customer has multiple phone lines                     | Есть ли несколько телефонных линий                            | Mijozda bir nechta telefon liniyalari bormi            |
| **InternetService**  | Type of internet service: DSL / Fiber optic / No                  | Тип интернета: DSL / Оптоволокно / Нет                        | Internet xizmati turi: DSL / Optik tolali / Yo‘q       |
| **OnlineSecurity**   | Whether the customer has online security add-on                   | Есть ли онлайн-безопасность                                   | Mijozda onlayn xavfsizlik xizmati bormi                |
| **DeviceProtection** | Whether the customer has device protection service                | Есть ли защита устройств                                      | Qurilma himoya xizmati bormi                           |
| **TechSupport**      | Whether the customer has technical support service                | Есть ли техническая поддержка                                 | Texnik yordam xizmati bormi                            |
| **StreamingTV**      | Whether the customer has streaming TV service                     | Есть ли потоковое телевидение                                 | Mijozda onlayn TV xizmati bormi                        |
| **StreamingMovies**  | Whether the customer has streaming movies service                 | Есть ли потоковое кино                                        | Mijozda onlayn kino xizmati bormi                      |
| **Contract**         | Type of contract: Month-to-month / One year / Two year            | Тип контракта: помесячный / годовой / двухгодичный            | Shartnoma turi: Oylik / Bir yillik / Ikki yillik       |
| **PaperlessBilling** | Whether the customer uses paperless billing                       | Использует ли клиент электронный счёт                         | Qog‘ozsiz hisob-kitobdan foydalanadimi                 |
| **PaymentMethod**    | Customer’s payment method (e.g., Electronic check, Bank transfer) | Способ оплаты (например, электронный чек, банковский перевод) | To‘lov usuli (masalan, elektron chek, bank o‘tkazmasi) |
| **MonthlyCharges**   | Amount charged to the customer monthly                            | Ежемесячная сумма оплаты                                      | Mijozning oylik to‘lovi                                |
| **TotalCharges**     | Total amount charged during the entire tenure                     | Общая сумма оплат за всё время                                | Mijoz to‘lagan jami summa                              |
| **Churn**            | Whether the customer left the company (Yes/No)                    | Покинул ли клиент компанию (Да/Нет)                           | Mijoz kompaniyani tark etganmi (Ha/Yo‘q)               |


___

### **Tasks**

#### 1. When working with datasets, a machine learning engineer has to be able to convert string and other non-numerical variables to numerical form. This encoding step is a common part of feature engineering. For example, the dataset above has columns such as "OnlineSecurity" and "DeviceProtection" that hold values like Yes or No. We can convert them to numerical form by mapping every Yes to 1 and every No to 0. In the following task, you have to convert these values to 1 and 0.

Convert the Yes and No values in the columns "OnlineSecurity" and "DeviceProtection" to 1 and 0.

* Yes -> 1
* No -> 0

In [9]:
# Your code goes here
def num(value):
    # 'Yes' becomes 1; every other value (including 'No') becomes 0
    if value == 'Yes':
        return 1
    else:
        return 0


df_tel['OnlineSecurity'] = df_tel['OnlineSecurity'].apply(num)
df_tel

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,0,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,1,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,1,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,1,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,0,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,1,...,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,0,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,1,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,0,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes


In [10]:
# Your code goes here
def num(value):
    # 'Yes' becomes 1; every other value (including 'No') becomes 0
    if value == 'Yes':
        return 1
    else:
        return 0


df_tel['DeviceProtection'] = df_tel['DeviceProtection'].apply(num)
df_tel

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,0,...,0,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,1,...,1,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,1,...,0,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,1,...,1,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,0,...,0,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,1,...,1,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,0,...,1,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,1,...,0,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,0,...,0,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes
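
The two cells above repeat the same function, so one helper can serve both columns. The sketch below is only an illustration on a small made-up frame (`toy` and `yes_no_to_int` are hypothetical names, not part of the assignment). It also assumes that these columns may hold a third value such as "No internet service", which the `else` branch would turn into 0 as well.

```python
import pandas as pd


def yes_no_to_int(value):
    """Return 1 for 'Yes' and 0 for any other value."""
    return 1 if value == "Yes" else 0


# Toy frame standing in for df_tel (made-up rows, for illustration only)
toy = pd.DataFrame({
    "OnlineSecurity": ["Yes", "No", "No internet service"],
    "DeviceProtection": ["No", "Yes", "No internet service"],
})

for column in ["OnlineSecurity", "DeviceProtection"]:
    # Look at the distinct values first, so nothing is encoded by surprise
    print(toy[column].value_counts())
    toy[column] = toy[column].apply(yes_no_to_int)

print(toy)
```

Note that running such a cell a second time on an already encoded column would set every value to 0, because the integers 1 and 0 are not equal to the string 'Yes'.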


#### 2. Because this dataset is primarily focused on whether the company's customers are satisfied or not, we can build a new feature from the existing ones that indicates customer loyalty, that is, how long the customer has been using the telecom company's services. It is calculated by dividing "TotalCharges" by "MonthlyCharges". Create a new feature called loyalty that contains the result of this division.

In [20]:
# Your code goes here
df_tel['loyalty']= df_tel['TotalCharges'] / df_tel['MonthlyCharges'] 
df_tel

TypeError: unsupported operand type(s) for /: 'str' and 'float'
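
The TypeError above suggests that "TotalCharges" was read as text rather than as numbers, so the division cannot be carried out. One possible fix, shown here as a sketch on a small made-up frame (`toy` is a hypothetical name), is to convert the column with `pd.to_numeric` before dividing. This assumes that some entries are blank or otherwise not parseable; with `errors="coerce"` those entries become NaN instead of raising an error.

```python
import pandas as pd

# Toy frame standing in for df_tel (made-up rows, for illustration only).
# TotalCharges is stored as strings, and one entry is blank.
toy = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 20.0],
    "TotalCharges": ["29.85", "1889.5", " "],
})

# Convert the text column to numbers; unparseable entries become NaN
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Now the division works element by element
toy["loyalty"] = toy["TotalCharges"] / toy["MonthlyCharges"]
print(toy)
```

Rows where the conversion produced NaN also get NaN in loyalty, so they still need to be handled (for example dropped or filled) before modelling.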

#### 3. The feature gender also contains non-numerical values, so it is better to encode them as numerical values.

* "Male" -> 1
* "Female" -> 0

The reverse mapping is also acceptable; it is your choice.

In [15]:
# Your code goes here
def gen(gender):
    if gender == 'Female':
        return 0
    else:
        return 1
    
df_tel['gender']= df_tel['gender'].apply(gen)
df_tel


Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,0,0,Yes,No,1,No,No phone service,DSL,0,...,0,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,1,0,No,No,34,Yes,No,DSL,1,...,1,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,1,0,No,No,2,Yes,No,DSL,1,...,0,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,1,0,No,No,45,No,No phone service,DSL,1,...,1,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,9237-HQITU,0,0,No,No,2,Yes,No,Fiber optic,0,...,0,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,1,0,Yes,Yes,24,Yes,Yes,DSL,1,...,1,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,2234-XADUH,0,0,Yes,Yes,72,Yes,Yes,Fiber optic,0,...,1,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,4801-JZAZL,0,0,Yes,Yes,11,No,No phone service,DSL,1,...,0,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,8361-LTMKD,1,1,Yes,No,4,Yes,Yes,Fiber optic,0,...,0,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes


#### 4. The column MultipleLines indicates whether the customer has more than one line of service. If we look at the unique values of this column, we can see that No refers to a customer with one line of service, Yes to a customer with more than one, and No phone service to a customer without any phone line. We can encode these values with the following map:

* Yes -> 2
* No -> 1
* No phone service -> 0

In the following code cell, implement this encoding using the apply function.

In [8]:
df_tel['MultipleLines'].value_counts()

MultipleLines
No                  3390
Yes                 2971
No phone service     682
Name: count, dtype: int64

In [16]:
# Your code goes here
def num1(MultipleLines):
    if MultipleLines == 'No':
        return 1
    elif MultipleLines == 'Yes':
        return 2
    else:
        return 0


df_tel['MultipleLines'] = df_tel['MultipleLines'].apply(num1)
df_tel


Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,0,0,Yes,No,1,No,0,DSL,0,...,0,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,1,0,No,No,34,Yes,1,DSL,1,...,1,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,1,0,No,No,2,Yes,1,DSL,1,...,0,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,1,0,No,No,45,No,0,DSL,1,...,1,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,9237-HQITU,0,0,No,No,2,Yes,1,Fiber optic,0,...,0,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,1,0,Yes,Yes,24,Yes,2,DSL,1,...,1,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,2234-XADUH,0,0,Yes,Yes,72,Yes,2,Fiber optic,0,...,1,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,4801-JZAZL,0,0,Yes,Yes,11,No,0,DSL,1,...,0,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,8361-LTMKD,1,1,Yes,No,4,Yes,2,Fiber optic,0,...,0,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes
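
As an alternative to the if/elif chain, the same map can be kept in a dictionary and looked up inside the function passed to apply. This is only a sketch on a made-up Series (`LINE_CODES`, `encode_lines`, and `sample` are hypothetical names). One design difference: an unexpected value raises a KeyError here instead of silently becoming 0.

```python
import pandas as pd

# The encoding from the task, written as a lookup table
LINE_CODES = {"No phone service": 0, "No": 1, "Yes": 2}


def encode_lines(value):
    """Look up the code for one value; unknown values raise KeyError."""
    return LINE_CODES[value]


# Made-up values standing in for df_tel['MultipleLines']
sample = pd.Series(["No", "Yes", "No phone service", "No"])
encoded = sample.apply(encode_lines)
print(encoded.tolist())
```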


#### 5. Convert the values of the column "InternetService" as follows:

* DSL -> D
* Fiber optic -> F
* No -> N

That is, keep just the first letter of each value.

In [18]:
# Your code goes here
df_tel['InternetService'].value_counts()

InternetService
Fiber optic    3096
DSL            2421
No             1526
Name: count, dtype: int64
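
The cell above only inspects the values; the conversion itself is not shown. A minimal sketch of one way to do it with apply follows, using a made-up Series (`first_letter` and `sample` are hypothetical names). It assumes every value is a non-empty string, as the counts above suggest.

```python
import pandas as pd


def first_letter(value):
    """Return the first character of a non-empty string."""
    return value[0]


# Made-up values standing in for df_tel['InternetService']
sample = pd.Series(["DSL", "Fiber optic", "No", "DSL"])
short = sample.apply(first_letter)
print(short.tolist())
```

On the real data the same call would be assigned back to the column, for example `df_tel['InternetService'] = df_tel['InternetService'].apply(first_letter)`.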