# Classifying price range for jewelry - Modeling 

## Capstone Project One: Springboard Data Science Career track 

### Notebook by Rupal Gandhre




### Introduction: 
The jewelry industry stands to benefit from data and advanced analytics, as many other parts of the retail sector already do. These days much of the selling happens through e-commerce websites. Even I have bought jewelry online!

#### Goal:
The goal of this project is to classify the price range of a piece of jewelry based on its features. The features include:
1. Metal of jewelry (18K Gold, 14K Gold, Sterling Silver)
2. Type of Stone (Diamond or Gemstones)
3. Color of the Stone
4. Cut of the Stone 
5. Carat weight of the Stone

This model may help a client estimate a price range for a piece of custom jewelry.


####  The Data:

The data was web-scraped from one of the leading jewelry brands using BeautifulSoup. I am thankful to the web developers for not implementing a script to block my nuisance of an IP address.




#### Import the necessary libraries and the data

## Applying Best Model

In this notebook, we take our best model for jewelry price, refit it on the full processed dataset, and use it to predict price categories for a separate test file.

In [1]:
import pandas as pd
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import __version__ as sklearn_version
from sklearn.model_selection import cross_validate

import warnings
warnings.filterwarnings('ignore')  # "error", "ignore", "always", "default", "module" or "once"

### Load Model

In [2]:
# This isn't exactly production-grade, but a quick check for development
# These checks can save some head-scratching in development when moving from
# one python environment to another, for example
expected_model_version = '1.0'
model_path = '../models/gemstones_rings_pricing_model.pkl'

if os.path.exists(model_path):
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    if model.version != expected_model_version:
        print("Expected model version doesn't match version loaded")
    if model.sklearn_version != sklearn_version:
        print("Warning: model created under different sklearn version")
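else:
    # Later cells need `model`, so stop here rather than hit a NameError further down
    raise FileNotFoundError(f"Expected model not found at {model_path}")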
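
The loading cell reads `model.version`, `model.sklearn_version` and, later, `model.X_columns`. None of these are attributes that scikit-learn sets on its own, so they were presumably attached to the estimator before it was pickled. The cell that saved the model is not part of this notebook, so the following is only a minimal sketch of how that could look. The column names are hypothetical placeholders, and the round trip goes through an in-memory pickle rather than the real model file.

```python
import pickle

from sklearn import __version__ as sklearn_version
from sklearn.linear_model import LogisticRegression

# Assumption: the saved estimator resembles the one shown later in this notebook
model_to_save = LogisticRegression(C=100, solver='liblinear')

# Attach metadata as plain attributes; pickle stores them with the estimator
model_to_save.version = '1.0'
model_to_save.sklearn_version = sklearn_version
model_to_save.X_columns = ['feature_a', 'feature_b']  # hypothetical column names

# Round-trip through pickle in memory to show the attributes survive
restored = pickle.loads(pickle.dumps(model_to_save))
print(restored.version, restored.X_columns)
```

Storing the feature list on the model keeps the column order used at training time next to the estimator, which is what lets the later cells select `df[model.X_columns]` without redefining the list.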

In [3]:
filename = "/Users/rupalgandhre/SpringBoard/DataScience_Capstone2/data/processed/Preprocessing_Gemstone_Rings.csv"
df = pd.read_csv(filename)

In [4]:
# Drop the raw (pre-encoding) columns, keeping the target and the one-hot features
df_gemstones = df.drop(columns=['Description', 'Discount_Price', 'Metal',
                                'Metal_Color', 'Stones', 'Jewelry_Type',
                                'Product_Carat', 'Stone1_Desc', 'Price',
                                'Stone1_Stone', 'Stone1_Carat', 'Stone1_Color',
                                'Stone1_Cut'])

In [5]:
df_gemstones.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 534 entries, 0 to 533
Data columns (total 60 columns):
 #   Column                                Non-Null Count  Dtype 
---  ------                                --------------  ----- 
 0   Price_Category                        534 non-null    object
 1   Stone1_Stone_Alexandrite              534 non-null    int64 
 2   Stone1_Stone_Amethyst                 534 non-null    int64 
 3   Stone1_Stone_Aquamarine               534 non-null    int64 
 4   Stone1_Stone_Citrine                  534 non-null    int64 
 5   Stone1_Stone_Emerald                  534 non-null    int64 
 6   Stone1_Stone_Garnet                   534 non-null    int64 
 7   Stone1_Stone_Jade                     534 non-null    int64 
 8   Stone1_Stone_Lapis-Lazuli             534 non-null    int64 
 9   Stone1_Stone_Malachite                534 non-null    int64 
 10  Stone1_Stone_Morganite                534 non-null    int64 
 11  Stone1_Stone_Multi-Color        

In [6]:
model.X_columns

['Stone1_Stone_Alexandrite',
 'Stone1_Stone_Amethyst',
 'Stone1_Stone_Aquamarine',
 'Stone1_Stone_Citrine',
 'Stone1_Stone_Emerald',
 'Stone1_Stone_Garnet',
 'Stone1_Stone_Jade',
 'Stone1_Stone_Lapis-Lazuli',
 'Stone1_Stone_Malachite',
 'Stone1_Stone_Morganite',
 'Stone1_Stone_Multi-Color',
 'Stone1_Stone_Multi-Sapphire',
 'Stone1_Stone_Onyx',
 'Stone1_Stone_Opal',
 'Stone1_Stone_Quartz',
 'Stone1_Stone_Ruby',
 'Stone1_Stone_Sapphire',
 'Stone1_Stone_Tanzanite',
 'Stone1_Stone_Topaz',
 'Stone1_Stone_Turquoise',
 'stone1_carat_under_0.5',
 'stone1_carat_above_0.5_and_under_1.0',
 'stone1_carat_above_1.0_and_under_1.5',
 'stone1_carat_above_1.5_and_under_2.0',
 'stone1_carat_above_2.0_and_under_2.5',
 'stone1_carat_above_2.5_and_under_3.0',
 'stone1_carat_above_3.0_and_under_3.5',
 'stone1_carat_above_3.5_and_under_4.0',
 'stone1_carat_above_4.0_and_under_4.5',
 'stone1_carat_above_4.5',
 'Stone1_Color_Blue',
 'Stone1_Color_Black',
 'Stone1_Color_Bluish-Green',
 'Stone1_Color_White',
 'S

In [7]:
X = df_gemstones[model.X_columns]
y = df_gemstones.Price_Category

In [8]:
len(X), len(y)

(534, 534)
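
Selecting `df_gemstones[model.X_columns]` raises a `KeyError` if any expected column is absent, which can happen when a one-hot category never occurs in a particular file. A small check run beforehand makes the mismatch easier to read. The helper name `missing_columns` and the tiny demo frame are hypothetical, used here only for illustration.

```python
import pandas as pd


def missing_columns(frame, expected_columns):
    """Return the expected columns that are absent from frame, in order."""
    return [col for col in expected_columns if col not in frame.columns]


# Hypothetical frame with one of the two expected feature columns
demo = pd.DataFrame({'Price_Category': ['under_2000'],
                     'Stone1_Stone_Ruby': [1]})
expected = ['Stone1_Stone_Ruby', 'Stone1_Color_Blue']

print(missing_columns(demo, expected))
```

In the notebook itself the same helper could be called as `missing_columns(df_gemstones, model.X_columns)` before building `X`.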

In [9]:
model.fit(X,y)

LogisticRegression(C=100, solver='liblinear')

In [10]:
filename = "/Users/rupalgandhre/SpringBoard/DataScience_Capstone2/data/processed/Test_Gemstone_Rings.csv"
df_test = pd.read_csv(filename)

In [11]:
X_test = df_test[model.X_columns]
y_test = df_test.Price_Category

In [12]:
model_pred = model.predict(X_test)
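
Comparing the true and predicted labels by eye becomes impractical at 376 rows, so one option is to summarize them with an accuracy score and a confusion matrix. The sketch below uses a handful of made-up labels in the same format as `Price_Category`, not the notebook's actual results. In the notebook, the same two calls would take `y_test` and `model_pred`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only; not taken from the real test set
y_true_demo = ['under_2000', 'under_2000',
               'above_2000_under_3000', 'above_3000_under_4000']
y_pred_demo = ['under_2000', 'above_2000_under_3000',
               'above_2000_under_3000', 'above_3000_under_4000']

# Fix the label order so matrix rows and columns follow the price order
labels = ['under_2000', 'above_2000_under_3000', 'above_3000_under_4000']

acc = accuracy_score(y_true_demo, y_pred_demo)
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

print(acc)  # fraction of exact category matches
print(cm)   # rows are true categories, columns are predicted categories
```

Passing `labels` explicitly matters here because the category names do not sort alphabetically into price order.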

In [13]:
y_test

0      above_6000_under_8000
1      above_4000_under_5000
2                 under_2000
3                 under_2000
4                 under_2000
               ...          
371               under_2000
372               under_2000
373    above_3000_under_4000
374               under_2000
375               under_2000
Name: Price_Category, Length: 376, dtype: object

In [14]:
model_pred

array(['above_5000_under_6000', 'above_5000_under_6000', 'under_2000',
       'under_2000', 'under_2000', 'under_2000', 'above_3000_under_4000',
       'above_2000_under_3000', 'above_2000_under_3000',
       'above_3000_under_4000', 'above_2000_under_3000',
       'above_3000_under_4000', 'above_2000_under_3000',
       'above_2000_under_3000', 'above_2000_under_3000',
       'above_2000_under_3000', 'above_2000_under_3000',
       'above_2000_under_3000', 'above_2000_under_3000',
       'above_2000_under_3000', 'above_2000_under_3000', 'under_2000',
       'above_2000_under_3000', 'above_2000_under_3000', 'under_2000',
       'above_3000_under_4000', 'under_2000', 'above_6000_under_8000',
       'above_3000_under_4000', 'above_3000_under_4000',
       'above_6000_under_8000', 'above_3000_under_4000',
       'above_3000_under_4000', 'above_3000_under_4000', 'under_2000',
       'above_6000_under_8000', 'above_3000_under_4000',
       'above_6000_under_8000', 'under_2000', 'above_3000_