# Precision Phone Predictions

You have recently been brought on as a business analyst at a startup developing new cell phones. The company is designing three new devices, and the team needs to price the hardware appropriately based on its technical specifications. Someone on the team has already grouped the prices of devices currently on the market into a set of classes. They were also _very_ kind and have already cleaned your data. However, they need your help to develop a model that will allow the team to _predict_ a price range for the new prototypes.

Your goals for this investigation will be to:
- Create a model for predicting the price range based on the technical features of the devices
- Determine which features have the highest weighting in determining the price of devices
- Make predictions for the new prototypes' price ranges

The classes for the price ranges of the phone correlate to:

- 0: 0 - 600
- 1: 601 - 1200 
- 2: 1201 - 1800
- 3: 1800+

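The class boundaries above can also be written as code. The following is a minimal sketch, assuming raw prices were available in a pandas Series (the dataset in this notebook only contains the class labels). `price_to_class` is a hypothetical helper, and the choice to place a price of exactly 1800 in class 2 is an assumption, since the list above does not say which class owns that boundary.

```python
import numpy as np
import pandas as pd


def price_to_class(prices: pd.Series) -> pd.Series:
    """Map raw prices to the four price_range classes (hypothetical helper)."""
    # Right-inclusive bins: [0, 600], (600, 1200], (1200, 1800], (1800, inf)
    bins = [0, 600, 1200, 1800, np.inf]
    classes = pd.cut(prices, bins=bins, labels=[0, 1, 2, 3], include_lowest=True)
    return classes.astype(int)


example_prices = pd.Series([0, 600, 601, 1200, 1500, 1800, 2500])
example_classes = price_to_class(example_prices)
print(example_classes.tolist())
```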
In the process of finding an appropriate model, we will create two: a random forest classifier, as well as a support vector classifier.

## Random Forest Model 

### Data Preprocessing

To get started, we'll need to input the data which we will use to train and test our model, as well as conduct the appropriate separation of data for our features and targets.

- Import `data.csv` using `pd.read_csv()`, and preview your data using `.head()`
- Create the `y` target variable by selecting the "price_range" column
- Create the `X` features variable by using `.drop()` and passing the "price_range" column as the `columns` argument
- Split your data into training and testing sets using `train_test_split()`, assigning the results to `X_train`, `X_test`, `y_train`, and `y_test`

In [1]:
# Libraries Needed
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

In [2]:
# TODO: Read in data.csv using .read_csv() and store in a variable df
df = pd.read_csv("data.csv")
# TODO: preview your df using .head()
df.head()

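|   | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | ... | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi | price_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1 | 0 | 7 | 0.6 | 188 | 2 | ... | 20 | 756 | 2549 | 9 | 7 | 19 | 0 | 0 | 1 | 1 |
| 1 | 1021 | 1 | 0.5 | 1 | 0 | 1 | 53 | 0.7 | 136 | 3 | ... | 905 | 1988 | 2631 | 17 | 3 | 7 | 1 | 1 | 0 | 2 |
| 2 | 563 | 1 | 0.5 | 1 | 2 | 1 | 41 | 0.9 | 145 | 5 | ... | 1263 | 1716 | 2603 | 11 | 2 | 9 | 1 | 1 | 0 | 2 |
| 3 | 615 | 1 | 2.5 | 0 | 0 | 0 | 10 | 0.8 | 131 | 6 | ... | 1216 | 1786 | 2769 | 16 | 8 | 11 | 1 | 0 | 0 | 2 |
| 4 | 1821 | 1 | 1.2 | 0 | 13 | 1 | 44 | 0.6 | 141 | 2 | ... | 1208 | 1212 | 1411 | 8 | 2 | 15 | 1 | 1 | 0 | 1 |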


In [3]:
# TODO: Create `y` by selecting the "price_range" column of df
y_data = df["price_range"]
# TODO: Create `X` by dropping the "price_range" column of df
x_data = df.drop(columns=["price_range"])
x_data.head()

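|   | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1 | 0 | 7 | 0.6 | 188 | 2 | 2 | 20 | 756 | 2549 | 9 | 7 | 19 | 0 | 0 | 1 |
| 1 | 1021 | 1 | 0.5 | 1 | 0 | 1 | 53 | 0.7 | 136 | 3 | 6 | 905 | 1988 | 2631 | 17 | 3 | 7 | 1 | 1 | 0 |
| 2 | 563 | 1 | 0.5 | 1 | 2 | 1 | 41 | 0.9 | 145 | 5 | 6 | 1263 | 1716 | 2603 | 11 | 2 | 9 | 1 | 1 | 0 |
| 3 | 615 | 1 | 2.5 | 0 | 0 | 0 | 10 | 0.8 | 131 | 6 | 9 | 1216 | 1786 | 2769 | 16 | 8 | 11 | 1 | 0 | 0 |
| 4 | 1821 | 1 | 1.2 | 0 | 13 | 1 | 44 | 0.6 | 141 | 2 | 14 | 1208 | 1212 | 1411 | 8 | 2 | 15 | 1 | 1 | 0 |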


In [4]:
# TODO: Use train_test_split() to split your data into training and testing datasets
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data)

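The split in this notebook relies on the defaults of `train_test_split()`: 25% of the rows go to the test set, and the shuffle is not seeded, so each run produces a different split and slightly different scores. The sketch below shows a reproducible, class-balanced variation using the `random_state` and `stratify` arguments. It is not what the notebook ran. The synthetic `X_demo` and `y_demo` (2000 rows, four equally sized classes) are stand-ins invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the phone data (assumed shape, not the real dataset)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "ram": rng.integers(256, 4000, size=2000),
    "battery_power": rng.integers(500, 2000, size=2000),
})
y_demo = pd.Series(np.repeat([0, 1, 2, 3], 500), name="price_range")

# stratify keeps the class proportions identical in both splits;
# random_state makes the shuffle repeatable across runs
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=42
)

print(len(X_tr), len(X_te))
print(y_te.value_counts().sort_index().tolist())
```

With stratification, every class contributes the same share of rows to the test set, which makes per-class scores easier to compare.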
### Model, Fit, Predict

Next, we'll put our model together!
- Instantiate a RandomForestClassifier with 5000 estimators, and store as a variable `model`
- `.fit()` your model using your training data
- Create the `y_pred` variable by using your model to predict the results for `X_test`
- Print your `classification_report()`, passing `y_test` and `y_pred` as arguments

In [5]:
# TODO: instantiate RandomForestClassifier and store it to a variable called model
model = RandomForestClassifier(n_estimators=5000)

In [6]:
# TODO: fit your model using the training data
model.fit(x_train, y_train)

RandomForestClassifier(n_estimators=5000)

In [7]:
# TODO: predict the values for X_test
y_pred = model.predict(x_test)
y_pred

array([0, 0, 0, 3, 1, 0, 1, 1, 0, 1, 3, 1, 2, 2, 1, 0, 2, 2, 2, 3, 2, 2,
       1, 1, 2, 2, 3, 3, 2, 0, 0, 2, 1, 0, 1, 2, 1, 0, 2, 1, 1, 1, 3, 3,
       0, 2, 2, 0, 0, 1, 1, 3, 0, 1, 0, 3, 3, 0, 3, 1, 0, 1, 2, 0, 3, 3,
       0, 0, 1, 2, 3, 1, 1, 2, 0, 0, 2, 2, 2, 2, 2, 2, 1, 0, 1, 3, 1, 0,
       0, 3, 2, 0, 3, 1, 1, 1, 3, 0, 3, 0, 1, 3, 0, 0, 2, 0, 1, 2, 2, 3,
       3, 1, 2, 1, 0, 0, 0, 0, 3, 2, 2, 1, 2, 1, 1, 2, 2, 3, 2, 0, 3, 1,
       3, 3, 1, 0, 1, 3, 1, 0, 1, 1, 3, 1, 0, 3, 2, 2, 0, 3, 3, 0, 2, 0,
       0, 3, 0, 0, 0, 2, 1, 0, 1, 2, 1, 2, 2, 1, 1, 3, 2, 2, 0, 0, 3, 3,
       2, 2, 3, 0, 1, 0, 2, 0, 3, 1, 2, 3, 3, 2, 3, 0, 1, 1, 1, 0, 0, 2,
       1, 2, 0, 3, 1, 2, 3, 2, 0, 2, 3, 2, 3, 0, 3, 1, 2, 2, 1, 2, 2, 1,
       0, 0, 0, 2, 1, 2, 1, 1, 1, 1, 1, 3, 0, 1, 3, 3, 3, 3, 0, 2, 3, 2,
       1, 0, 3, 2, 1, 3, 0, 0, 0, 2, 1, 2, 1, 3, 3, 3, 1, 0, 3, 3, 3, 0,
       1, 0, 2, 1, 2, 3, 3, 3, 1, 2, 2, 0, 3, 1, 2, 3, 1, 1, 3, 2, 3, 0,
       2, 3, 0, 3, 3, 0, 0, 2, 1, 2, 1, 2, 1, 1, 3,

In [8]:
# TODO: print the classification report using y_test and y_pred
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.91      0.91      0.91       128
           1       0.83      0.82      0.82       129
           2       0.81      0.84      0.83       116
           3       0.94      0.91      0.92       127

    accuracy                           0.87       500
   macro avg       0.87      0.87      0.87       500
weighted avg       0.87      0.87      0.87       500


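To make the report easier to read, the sketch below recomputes accuracy, precision, and recall by hand from a confusion matrix. The labels in `y_true_demo` and `y_pred_demo` are made up for illustration and are unrelated to the phone data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels (invented): two samples per class, two mistakes in total
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred_demo = np.array([0, 0, 1, 2, 2, 2, 3, 1])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)

# Accuracy: correctly classified samples (the diagonal) over all samples
accuracy = np.trace(cm) / cm.sum()

# Precision: of the samples predicted as class k, the share that really are k
precision = np.diag(cm) / cm.sum(axis=0)

# Recall: of the samples that really are class k, the share predicted as k
recall = np.diag(cm) / cm.sum(axis=1)

print(accuracy)          # 6 of 8 correct
print(precision.round(2))
print(recall.round(2))
```

The `support` column in the report is the number of true samples per class, which is the row sum of the same matrix.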

What is the overall accuracy of your model? If it is very high (above 80%) or very low (below 20%) do you think there is a reason for that level of accuracy?
> The overall accuracy of the model is 0.87, which is high (above 80%). I believe the accuracy is this high because the data was already cleaned and the technical features carry a strong signal about the price range.

### Feature importances
How important is each one of our features? We can use this analysis to help the engineering team to determine which features are important to design well for our prototypes.

- Create the `feature_importances` variable by using Python's `zip()` function on the `X_train` columns and the `.feature_importances_` property of your `model`.
- Create `list_importances` by casting `feature_importances` as a list using the `list()` function. Warning: **Do not simply place `feature_importances` in a set of brackets []. That will not work in this case**
- Make a DataFrame `importance_df` using `pd.DataFrame` on `list_importances`, and make sure to assign the column names "feature" and "importance"
- Sort `importance_df` using `.sort_values()` on the "importance" column, setting the `ascending` argument to False and the `inplace` argument to True
- Preview the top 5 most important features using `.head()`
- Preview the 5 least important features using `.tail()`

In [9]:
# TODO: zip() the X_train's columns and the model's .feature_importances_ together
feature_importances = zip(x_train.columns, model.feature_importances_)
# TODO: cast feature_importances as a list using the list() function, and store the result as list_importances
list_importances = list(feature_importances)
# TODO: cast list_importances as a dataframe, making sure to set the columns to be "feature" and "importance", store the result as importance_df
importance_df = pd.DataFrame(list_importances, columns=["feature", "importance"])
# TODO: sort the values of importance_df by the "importance" column in descending order. Either set the argument inplace to true, or reassign importance_df back to itself
importance_df.sort_values("importance", ascending=False, inplace=True)
# TODO: preview importance_df using .head()
importance_df.head(5)

|    | feature | importance |
|----|---------|------------|
| 13 | ram | 0.476 |
| 0 | battery_power | 0.072972 |
| 12 | px_width | 0.056434 |
| 11 | px_height | 0.054101 |
| 8 | mobile_wt | 0.041036 |


In [10]:
# TODO: preview importance_df using .tail()
importance_df.tail(5)

|    | feature | importance |
|----|---------|------------|
| 3 | dual_sim | 0.007141 |
| 19 | wifi | 0.006819 |
| 5 | four_g | 0.006659 |
| 1 | blue | 0.006513 |
| 17 | three_g | 0.005846 |

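The zip-and-DataFrame steps above can also be collapsed into a single labeled `pd.Series`. The following is a minimal sketch on synthetic data, not the notebook's dataset. The three columns and the rule that the class depends only on `ram` are assumptions chosen so that the ranking is easy to verify.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features (invented): only "ram" carries information about the class
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "ram": rng.integers(256, 4000, size=400),
    "wifi": rng.integers(0, 2, size=400),
    "blue": rng.integers(0, 2, size=400),
})
y_demo = pd.cut(
    X_demo["ram"], bins=[0, 1000, 2000, 3000, 4000], labels=[0, 1, 2, 3]
).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# Label each importance with its column name and sort in one step
importances = pd.Series(
    forest.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)

print(importances)
```

The importances of a fitted random forest are normalized to sum to 1, so each value can be read as a share of the total impurity reduction.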

What were the five most important features to your model?
> ram, battery_power, px_width, px_height, mobile_wt

Why do you think those are the most important? Do any of those surprise you? 
> I expected talk time to be important as well, but it didn't turn out to be in my findings.

What were the five least impactful features to the model?
> dual_sim, wifi, four_g, blue, three_g

Why do you think those are _less_ important than other features? Are any of them surprising? Hint: consider how a lack of product differentiation makes it difficult to add additional value to a product.
> I was surprised that touch_screen was among the less important features.

Of the two sets of five most and least important features, are any of them _overwhelmingly_ more or less important than others in the same category?
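> Yes. Among the most important features, ram is overwhelmingly dominant, with an importance of 0.476 against 0.073 for battery_power in second place. The five least important features are all close together (roughly 0.006 to 0.007); three_g is the lowest, likely because 3G is slow and adds little value.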

Based on what you have learned about the most and least important features, what would you recommend to the engineering team to focus on in their designs?
> I would recommend that the engineering team concentrate on providing the best RAM together with good battery power.

If your engineering team were to completely remove the five features that had the least importance, would you be happy with the product? Why or why not?
> Yes, I would be happy because I would still have good features like RAM and a good battery.

Given your previous answer, what does that tell you about business decisions made _purely_ by arguments with data?
> I think data can support a business decision by showing what is most important based on the findings.

## A Second Opinion: SVC

Now that we have created a model using the Random Forest, and have gathered which features are most important, let's build a second model to see if we can get one that is even more accurate!

### Model, Fit, Predict II: Now with more prediction!

In our next steps, we will create our SVC model to compare with our Random Forest. 

- Instantiate a new model `svc` using SVC() from sklearn
- Fit the model using our `X_train` and `y_train` from before
- Create an array `svc_results` using `.predict()` on our `X_test` data 
- Print the `classification_report()` using `y_test` and our `svc_results`

In [11]:
# Provided Code -- Do NOT Edit!
from sklearn.svm import SVC

In [12]:
# TODO: instantiate SVC() and store to a variable svc
svc = SVC()

In [13]:
# TODO: fit svc using the training data
svc.fit(x_train, y_train)

SVC()

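`SVC()` with its default RBF kernel works on distances between samples, so features with large numeric ranges (such as `ram`) can dominate features with small ranges (such as `m_dep`). A common variation, which this notebook does not use, is to standardize the features in a pipeline before the classifier. The sketch below shows the pattern on invented data. Whether scaling raises or lowers accuracy on the phone data is not established here, so it is worth trying both.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Invented data: a large-scale noise feature and a small-scale informative one
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(0, 1000, 200),  # large range, unrelated to the class
    rng.normal(0, 1, 200),     # small range, determines the class
])
y_demo = (X_demo[:, 1] > 0).astype(int)

# The scaler is fitted on the training data only, then applied before SVC
scaled_svc = make_pipeline(StandardScaler(), SVC())
scaled_svc.fit(X_demo, y_demo)

train_accuracy = scaled_svc.score(X_demo, y_demo)
print(train_accuracy)
```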
In [14]:
# TODO: make predictions on the test feature data using svc, and store the result to svc_results
svc_results = svc.predict(x_test) 
svc_results

array([0, 0, 0, 3, 1, 0, 1, 1, 0, 1, 3, 1, 1, 2, 1, 0, 2, 2, 2, 3, 2, 1,
       1, 1, 2, 2, 3, 2, 2, 0, 0, 2, 1, 0, 1, 2, 1, 0, 2, 1, 1, 1, 3, 2,
       0, 1, 2, 0, 0, 1, 1, 3, 0, 0, 0, 3, 3, 0, 3, 1, 0, 1, 2, 0, 3, 3,
       0, 0, 1, 2, 3, 1, 1, 2, 1, 0, 2, 2, 2, 2, 2, 1, 1, 0, 0, 3, 1, 0,
       0, 3, 3, 0, 3, 1, 1, 2, 3, 0, 3, 0, 1, 3, 0, 0, 3, 0, 1, 2, 3, 3,
       3, 1, 2, 1, 0, 0, 0, 0, 3, 2, 2, 1, 2, 1, 1, 1, 2, 3, 2, 1, 3, 1,
       3, 3, 1, 0, 1, 3, 1, 0, 1, 1, 3, 0, 0, 3, 2, 2, 0, 3, 3, 0, 2, 0,
       0, 3, 1, 0, 0, 2, 1, 0, 1, 2, 1, 2, 2, 1, 0, 3, 2, 2, 0, 0, 3, 3,
       2, 2, 3, 1, 1, 0, 2, 0, 3, 1, 2, 3, 3, 2, 3, 0, 1, 1, 1, 0, 0, 2,
       1, 2, 1, 3, 1, 2, 3, 2, 0, 2, 3, 2, 3, 0, 3, 1, 3, 2, 1, 2, 2, 1,
       0, 0, 0, 2, 1, 1, 1, 1, 1, 1, 1, 3, 0, 1, 3, 3, 3, 3, 0, 3, 3, 2,
       1, 0, 3, 2, 2, 3, 0, 0, 0, 2, 1, 2, 1, 3, 3, 3, 1, 0, 3, 3, 3, 0,
       1, 0, 2, 1, 2, 3, 3, 3, 1, 1, 2, 1, 3, 1, 1, 3, 1, 1, 3, 2, 3, 0,
       2, 3, 0, 3, 3, 0, 0, 2, 1, 2, 1, 2, 1, 1, 3,

In [15]:
# TODO: print the classification report using y_test and svc_results
print(classification_report(y_test, svc_results))

              precision    recall  f1-score   support

           0       0.96      0.95      0.96       128
           1       0.90      0.95      0.93       129
           2       0.96      0.91      0.94       116
           3       0.98      0.98      0.98       127

    accuracy                           0.95       500
   macro avg       0.95      0.95      0.95       500
weighted avg       0.95      0.95      0.95       500



How accurate is your SVC model?
> The SVC model is about 0.95 accurate, which is well above 80%.

Is it "better" or "worse" than your Random Forest model?
> Based on the results, it is better than the Random Forest model: 0.95 accuracy compared with 0.87.

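A comparison based on a single train/test split can shift from run to run. One hedged way to check that a gap between two models is stable is k-fold cross-validation with `cross_val_score()`. The sketch below uses a synthetic four-class dataset from `make_classification()` as a stand-in, so its scores say nothing about the phone data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic four-class problem (stand-in for the real features and target)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=5, n_classes=4, random_state=0
)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svc": SVC(),
}

# Each model is trained and scored on 5 different train/validation splits
cv_scores = {}
for name, estimator in candidates.items():
    cv_scores[name] = cross_val_score(estimator, X_demo, y_demo, cv=5)
    print(f"{name}: mean={cv_scores[name].mean():.3f} std={cv_scores[name].std():.3f}")
```

If the difference in mean accuracy is small relative to the standard deviation across folds, the two models are hard to rank on accuracy alone.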
## Perceptively Predicting Prototypes


Finally, let's make some predictions about our prototypes!

- Import the `prototypes.csv` dataset from the data folder and store as `prototypes_df`
- Preview `prototypes_df` using `.head()`
- Use your trained Random Forest and SVC models to make predictions, and store them in `forest_prototypes` and `svc_prototypes`, respectively.
- Print `forest_prototypes` and `svc_prototypes`

In [16]:
# TODO: import the prototype data using .read_csv()
prototypes_df = pd.read_csv("prototypes.csv")
# TODO: preview the data using .head()
prototypes_df.head()

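|   | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1859 | 0 | 0.5 | 1 | 3 | 0 | 22 | 0.7 | 164 | 1 | 7 | 1004 | 1654 | 1067 | 17 | 1 | 10 | 1 | 0 | 0 |
| 1 | 503 | 0 | 1.2 | 1 | 5 | 1 | 8 | 0.4 | 111 | 3 | 13 | 201 | 1245 | 2583 | 11 | 0 | 12 | 1 | 0 | 0 |
| 2 | 1195 | 1 | 2.8 | 0 | 1 | 1 | 20 | 0.8 | 110 | 2 | 14 | 1580 | 1652 | 504 | 9 | 3 | 12 | 1 | 1 | 0 |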


In [17]:
# TODO: use the `model` from before to predict price ranges on your prototype data and store to a variable forest_prototypes
forest_prototypes = model.predict(prototypes_df)
# TODO: use the `svc` model from before to predict price ranges on your prototype data and store to a variable svc_prototypes
svc_prototypes = svc.predict(prototypes_df)

In [18]:
# TODO: print forest_prototypes
forest_prototypes

array([1, 2, 0])

In [19]:
# TODO: print svc_prototypes
svc_prototypes

array([1, 1, 0])

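To read the two prediction arrays side by side, the sketch below puts them in one DataFrame, translates each class back to its price range from the list at the top of this notebook, and flags the rows where the models disagree. The arrays are copied from the outputs above. The names `price_labels`, `forest_demo`, `svc_demo`, and `comparison` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Class-to-range mapping taken from the introduction of this notebook
price_labels = {0: "0 - 600", 1: "601 - 1200", 2: "1201 - 1800", 3: "1800+"}

# Predictions copied from the two output cells above
forest_demo = np.array([1, 2, 0])
svc_demo = np.array([1, 1, 0])

comparison = pd.DataFrame({"forest_class": forest_demo, "svc_class": svc_demo})
comparison["forest_range"] = comparison["forest_class"].map(price_labels)
comparison["svc_range"] = comparison["svc_class"].map(price_labels)

# True where both models assign the prototype to the same price class
comparison["models_agree"] = comparison["forest_class"] == comparison["svc_class"]

print(comparison)
```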
Do your models agree on pricing for all three phones?
> No. The models agree on the first phone (class 1) and the third phone (class 0), but they disagree on the second phone: the Random Forest predicts class 2, while the SVC predicts class 1.

What would be your recommendation to the team that will create pricing for this phone?
> I would recommend that they consider all of the top five features and keep the pricing between 600 and 1200 dollars so that the company can make the prototypes more profitable.