# Libraries

### Data Handling
- Pandas
- Regex
- Numpy

### Data Enriching
- Requests
- BeautifulSoup

### Data Exploration
- Missingno

### Data Visualization
- Matplotlib
- Seaborn
- Plotly

# Starting point

# Diamonds 4Cs

- **Cut** is determined by how a diamond’s facets interact with light. It is influenced by three factors:
    - Precision of cut: How the size and angles relate to the different parts of the stone
    - Symmetry: How precisely the various facets of a diamond align and intersect
    - Polish: The details and placement of the facet shapes as well as the outside finish of the diamond
    
    According to those parameters, the Gemological Institute of America ([GIA](https://4cs.gia.edu/en-us/diamond-cut/)) grades each diamond's cut, in increasing order of quality, as Poor, Fair, Good, Very Good, or Excellent. In our sample, we treat Premium and Ideal as two levels of excellence, with Ideal being the highest
    
    
- **Clarity** is a measure of the purity and rarity of the stone, graded by the visibility of inclusions (internal flaws) and blemishes (external imperfections) under 10-power magnification. A stone is graded as flawless (FL) if neither is visible at that magnification. Below flawless, diamonds are ranked from best to worst as
    - IF: Internally Flawless
    - VVS1 VVS2: Very, Very Slightly Included
    - VS1 VS2: Very Slightly Included
    - SI1 SI2: Slightly Included
    - I1 I2 I3: Imperfect
    

- **Color** refers to the natural tint inherent in white diamonds. Colorless stones are the most valued
    - DEF: Colorless
    - GHIJ: Near colorless
    - KLM: Faint yellow
    - NOPQR: Very light yellow
    - STUVWXYZ: Light yellow
    
    
- **Carat** denotes the weight of a diamond. One carat equals 0.20 grams. It is important to note that carat weight does not necessarily determine size. The majority of diamonds used in fine jewelry weigh one carat or less
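
The grades above are ordinal, so it can help to store them as ordered categories before comparing or sorting stones. The following is a minimal sketch on a made-up three-row sample. The label lists (`CUT_ORDER`, `COLOR_ORDER`, `CLARITY_ORDER`) are assumptions about how the grades are spelled in the data file and should be checked against the actual columns.

```python
import pandas as pd

# Grade orders from worst to best. The exact labels are assumed here and
# should be verified against the values present in the real dataframe.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Tiny illustrative sample, not taken from the real file.
sample = pd.DataFrame({
    "carat": [0.23, 1.00, 2.50],
    "cut": ["Ideal", "Premium", "Fair"],
    "color": ["E", "J", "D"],
    "clarity": ["SI2", "IF", "VS1"],
})

# Convert each grade column to an ordered categorical so that
# comparisons, sorting, and min/max respect the quality order.
for column, order in [("cut", CUT_ORDER), ("color", COLOR_ORDER), ("clarity", CLARITY_ORDER)]:
    sample[column] = pd.Categorical(sample[column], categories=order, ordered=True)

# One carat equals 0.20 grams.
sample["weight_g"] = sample["carat"] * 0.20

best_cut = sample["cut"].max()
print(sample.sort_values("clarity", ascending=False))
```

With ordered categories, `sample["cut"] >= "Premium"` also works as a filter for the top cut grades.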


# Diamonds measures

![](diamondmeasures.png)

- **Depth** refers to the height of the diamond, measured from the culet to the table, divided by its average girdle diameter. It indicates how well proportioned the diamond is

    - A diamond with a lower depth percentage usually appears larger due to its increased width, but often creates a dark appearance as the diamond doesn’t reflect light as well
    - A diamond with too high of a depth percentage loses light out the bottom of the diamond, making it appear dull
    
    Depth = height/width


- **Table**: the width of the diamond's table expressed as a percentage of its average diameter

     - If the table percentage is too low, light gets trapped inside the diamond and leaks out the sides of the diamond (instead of reflecting back through the table). 
     - If the table percentage is too high, light doesn’t reflect off of the diamond’s crown angles and facets leaving the diamond looking dull. 
     
     Table = table facet width/width


- **x** : Length of the Diamond in mm


- **y** : Width of the Diamond in mm


- **z**: Height of the Diamond in mm


- **Price** : the Price of the Diamond
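
As a worked example of the depth definition above, the sketch below derives a depth percentage from `x`, `y`, and `z`. The measurements are hypothetical, and it assumes that "width" in `Depth = height/width` means the average of length and width (the average girdle diameter).

```python
import pandas as pd

# Hypothetical measurements in mm, for illustration only.
stones = pd.DataFrame({
    "x": [4.00, 6.00],   # length
    "y": [4.00, 6.20],   # width
    "z": [2.48, 3.66],   # height
})

# Average girdle diameter, taken here as the mean of length and width.
avg_diameter = (stones["x"] + stones["y"]) / 2

# Depth percentage = height / average diameter * 100.
stones["depth_pct"] = (stones["z"] / avg_diameter * 100).round(1)

print(stones)
```

The table percentage follows the same pattern, with the table facet width in the numerator instead of `z`.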


## Let's see our data analysis & conclusions!

![Let's see our diamonds conclusions](maxresdefault.jpg)

In [1]:
import pandas as pd
import numpy as np

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import set_matplotlib_formats
import cufflinks as cf
import plotly.express as px
import matplotlib.gridspec as gridspec
import plotly.graph_objects as go
import plotly.figure_factory as ff

diamonds=pd.read_csv(r'/home/carmencuadrado/Ironhack/ih_datamadpt0420_project_m2/diamonds_result.csv')

# Data Analysis


## Data Exploration

Our initial exploratory analysis made us aware of

- **The small size of our dataframe**, which has volume (over 40k entries) but little variety (only 10 attributes) and no variability, as the data is static, integrated, and consistent
- **The dataframe's cleaning needs**, as some measures took 0 as a value
- **The dataframe's information potential**, as it offers key information about the diamonds that would lead us to make inferences and draw business conclusions
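
The zero-valued measures mentioned above can be located and handled along these lines. This is only a sketch on invented rows, and it assumes the affected columns are the physical dimensions `x`, `y`, and `z`. Whether to drop or impute such rows is a judgment call that the real data should drive.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; a dimension of 0 mm cannot be a real measurement.
raw = pd.DataFrame({
    "carat": [0.30, 1.01, 0.71],
    "x": [4.3, 0.0, 5.7],
    "y": [4.3, 6.4, 5.7],
    "z": [2.7, 4.0, 0.0],
})

dimension_cols = ["x", "y", "z"]  # assumed to be the columns with zeros

# Count zero entries per dimension column.
zero_counts = (raw[dimension_cols] == 0).sum()
print(zero_counts)

# Option A: mark zeros as missing so they can be imputed later.
as_missing = raw.copy()
as_missing[dimension_cols] = as_missing[dimension_cols].replace(0, np.nan)

# Option B: keep only rows where every dimension is non-zero.
cleaned = raw[(raw[dimension_cols] != 0).all(axis=1)]
```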


## Variables-price correlation

Representing the correlation between the different variables and price, we deduced that

- **Price-carat correlation was the strongest one** (0.92), even though diamonds with low carat could reach high prices and vice versa. We therefore explored the relation among variables further and realised that
- **Price-color/clarity correlation was relevant**, as these two variables were responsible for distorting the price-carat relation
- **Price-cut correlation was negligible**. Price increases slightly as cut quality does; however, this trend is broken in the extreme cut classes (poor and excellent)
- **Price-depth/table correlation was irrelevant**, as depth and table have no influence on price
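
A comparison like the one above can be reproduced with a Pearson correlation of each numeric variable against `price`. The sketch below uses a small made-up frame, so its numbers are not the project's results. On the real data, the same two lines would be applied to the numeric columns of `diamonds`.

```python
import pandas as pd

# Toy values for illustration only; not the project's data.
toy = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0, 1.5, 2.0],
    "depth": [61.5, 62.8, 60.9, 62.1, 61.0, 62.4],
    "table": [55.0, 58.0, 57.0, 56.0, 59.0, 57.0],
    "price": [400, 1100, 2300, 4800, 9500, 15000],
})

# Pearson correlation of every variable with price, strongest first.
price_corr = toy.corr()["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

Cut, color, and clarity are categorical, so they would need an ordinal encoding before they can enter a correlation like this one.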


## Data Manipulation

Digging into the dataframe, we considered that it would be interesting to find out which

- **Diamond shapes were present in the dataframe**. We relied on the diamonds.pro guides on diamond proportions. We found that price is higher for fancy shapes than for traditional ones; consequently, fancy-shaped diamonds should be at the center of the commercial strategy for these diamonds
- **Potential clients could be willing to buy these diamonds**, making clear which ones are our priority
    - #1 Tiffany & Co, for the potential amount sold at a profitable level
    - #2 Cartier and Harry Winston, as these are the most profitable clients
    - #3 Other brands, as diamonds with potential to be sold to other brands made up the greatest part of the diamonds sample
    - #4 Piaget, as we cannot define the strategy for Piaget without first making an offer to Cartier and Harry Winston; Piaget would be our second option for selling the rest of the top-quality diamonds
    
- **Ranking could better fit this sample with regard to the quality of the diamonds**. We observed that diamonds do not always have the price their quality suggests; for that reason, we built a quality ranking and a price ranking, which do not match at the moment
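
One possible way to compare a quality ranking with a price ranking is sketched below. The scoring scheme (an equal-weight sum of ordinal cut, color, and clarity codes) and the grade labels are assumptions made for illustration, not necessarily the method used in this project. The helper `grade_codes` is hypothetical, and the rows are invented.

```python
import pandas as pd

# Assumed grade orders, worst to best; verify against the real columns.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Invented rows for illustration.
stones = pd.DataFrame({
    "cut": ["Ideal", "Fair", "Premium"],
    "color": ["D", "J", "G"],
    "clarity": ["IF", "I1", "VS2"],
    "price": [3000, 5000, 4000],
})


def grade_codes(series, order):
    """Map grade labels to integers, with 0 as the worst grade."""
    codes = pd.Categorical(series, categories=order, ordered=True).codes
    return pd.Series(codes.astype(int), index=series.index)


# Equal-weight quality score (an assumption; weights could differ).
stones["quality_score"] = (
    grade_codes(stones["cut"], CUT_ORDER)
    + grade_codes(stones["color"], COLOR_ORDER)
    + grade_codes(stones["clarity"], CLARITY_ORDER)
)

# Rank 1 = best quality / highest price.
stones["quality_rank"] = stones["quality_score"].rank(ascending=False, method="min").astype(int)
stones["price_rank"] = stones["price"].rank(ascending=False, method="min").astype(int)

# Positive gap: the stone is priced lower than its quality rank suggests.
stones["rank_gap"] = stones["price_rank"] - stones["quality_rank"]
print(stones)
```

A non-zero `rank_gap` flags stones whose price position and quality position disagree, which is the mismatch described above.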


# Conclusions

In [2]:
diamonds.columns

Index(['index', 'carat', 'cut', 'color', 'clarity', 'depth', 'table', 'price',
       'carat_scale', 'L/Wratio', 'shape', 'Potential Client', 'volume'],
      dtype='object')

We have seen the direct relation between cut and price, and also between clarity and price, which is sometimes broken by a great or dreadful cut. Conversely, color quality seems to be negatively related to price, as the average price paid for every type of cut is higher when the color is yellow-like


In [4]:
fig = px.scatter(diamonds.query("carat>3"), x="price", y="cut", size="table", color="color",
           hover_name="Potential Client", log_x=True, size_max=60)
fig.show()

Potential selling

In [5]:
from IPython.display import HTML
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')