# Tariffs recommendation

Here is customers behavior data who have already changed their tariffs.

We need to build a classification model, which can select the right tariff. 

Data is already prepared.

Build the model with the highest possible *accuracy* (min 0.75).

No train-valid-test.
Use all data.

In [1]:
#library import
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
import numpy as np

In [2]:
#check data
users_behavior = pd.read_csv('users_behavior.csv')
display(users_behavior.head())

Unnamed: 0,calls,minutes,messages,mb_used,is_ultra
0,40.0,311.9,83.0,19915.42,0
1,85.0,516.75,56.0,22696.96,0
2,77.0,467.66,86.0,21060.45,0
3,106.0,745.53,81.0,8437.39,1
4,66.0,418.74,1.0,14502.75,0


In [3]:
users_behavior.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3214 entries, 0 to 3213
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   calls     3214 non-null   float64
 1   minutes   3214 non-null   float64
 2   messages  3214 non-null   float64
 3   mb_used   3214 non-null   float64
 4   is_ultra  3214 non-null   int64  
dtypes: float64(4), int64(1)
memory usage: 125.7 KB


#### Сonclusion

Data is already prepared. So, we don't need to do anything(duplicates, nulls etc)
3214 entries and 5 columns in table. Data types are normal.
We have:
- сalls - number of calls(count)
- minutes - total duration of calls in minutes
- messages -  number of spent sms 
- mb_used - spent MB
- is_ultra - tariff name. What tariff did customer use during the month ("Ultra" - 1, "Smart" - 0).

Target is ***is_ultra***.

In [4]:
#features&target
features = users_behavior.drop(['is_ultra'], axis=1)
target = users_behavior['is_ultra']

print(features.shape)
print(target.shape)

(3214, 4)
(3214,)


##### Сonclusion

We don't need train, valid and test, so here is features&target for all.

In [5]:
#depth params Decision Tree Regressor
for depth in range(1, 11):
    model =  DecisionTreeRegressor(random_state=12345, max_depth=depth)
    model.fit(features, target)
    model.predict(features)
    accuracy = model.score(features, target)
    print("max_depth =", depth, ": ", end='') 
    print('accuracy:', accuracy)

max_depth = 1 : accuracy: 0.1245422356649214
max_depth = 2 : accuracy: 0.2104165265003096
max_depth = 3 : accuracy: 0.26269514027349605
max_depth = 4 : accuracy: 0.29608888251375887
max_depth = 5 : accuracy: 0.33559267409465265
max_depth = 6 : accuracy: 0.37426716194879983
max_depth = 7 : accuracy: 0.41237224934443245
max_depth = 8 : accuracy: 0.4603412436004965
max_depth = 9 : accuracy: 0.4997908217573446
max_depth = 10 : accuracy: 0.5402448111128757


#### Сonclusion
max_depth 10 is enough. Result is 0.54. Of course, we can use bigger number, but shouldn't. Too big tree is not good

In [6]:
#depth params Random Forest Regressor
for depth in range(5, 16):
    model =  RandomForestRegressor(random_state=12345, max_depth=depth)
    model.fit(features, target)
    accuracy = model.score(features, target)
    print("max_depth =", depth, ": ", end='') 
    print('accuracy:', accuracy)

max_depth = 5 : accuracy: 0.362275481004361
max_depth = 6 : accuracy: 0.4022621192324852
max_depth = 7 : accuracy: 0.44337619786499416
max_depth = 8 : accuracy: 0.4855772254180293
max_depth = 9 : accuracy: 0.5269517929532881
max_depth = 10 : accuracy: 0.5688037743126255
max_depth = 11 : accuracy: 0.6098133807967852
max_depth = 12 : accuracy: 0.6526963399874277
max_depth = 13 : accuracy: 0.69617774724513
max_depth = 14 : accuracy: 0.7382066765544004
max_depth = 15 : accuracy: 0.7764075569687485


In [7]:
#n_estimators params Random Forest Regressor
for est in range(1, 11):
    model =  RandomForestRegressor(random_state=12345, n_estimators = est)
    model.fit(features, target)
    accuracy = model.score(features, target)
    print("max_est =", est, ": ", end='')
    print('accuracy:', accuracy)

max_est = 1 : accuracy: 0.5418117887650786
max_est = 2 : accuracy: 0.7057641199417917
max_est = 3 : accuracy: 0.7799330316645905
max_est = 4 : accuracy: 0.8149131886325388
max_est = 5 : accuracy: 0.8334127206436612
max_est = 6 : accuracy: 0.843204318200048
max_est = 7 : accuracy: 0.8491327856336733
max_est = 8 : accuracy: 0.8546890122815767
max_est = 9 : accuracy: 0.8581501490954562
max_est = 10 : accuracy: 0.8648125197842013


In [8]:
#Random Forest Regressor
model = RandomForestRegressor(random_state=12345, max_depth = 15, n_estimators= 10)
model.fit(features, target)
model.predict(features)
model.score(features, target)

0.7612999939364278

#### Сonclusion

I think, we can choose this model, because accuracy is 0.76.

But we need to chek Linear Regression before.

In [9]:
#Linear Regression
model = LinearRegression()
model.fit(features, target)
model.predict(features)
model.score(features, target)

0.08668370138262516

#### Сonclusion

No. Too small accuracy.

# Final conclusion

So, we should choose Random Forest Regressor with accuracy 0.76.

Linear Regression has only 0.09 and Decision Tree Regressor has 0.54.

My be, we can check Decision Tree Regressor more. But now, we don't need it.