# Finding the best chocolate bars

Now let's move on to the competition challenge.

## 📖 Background
You work at a specialty foods import company that wants to expand into gourmet chocolate bars. Your boss needs your team to research this market to inform your initial approach to potential suppliers.

After finding valuable chocolate bar ratings online, you need to explore whether the highest-rated bars share characteristics (e.g., cacao percentage or bean country of origin) that could help you narrow your search for suppliers.

## 💾 The data

#### Your team created a file with the following information ([source](https://flavorsofcacao.com)):
- "id" - id number of the review
- "manufacturer" - Name of the bar manufacturer
- "company_location" - Location of the manufacturer
- "year_reviewed" - From 2006 to 2021
- "bean_origin" - Country of origin of the cacao beans
- "bar_name" - Name of the chocolate bar
- "cocoa_percent" - Cocoa content of the bar (%)
- "num_ingredients" - Number of ingredients
- "ingredients" - B (Beans), S (Sugar), S* (Sweetener other than sugar or beet sugar), C (Cocoa Butter), V (Vanilla), L (Lecithin), Sa (Salt)
- "review" - Summary of most memorable characteristics of the chocolate bar
- "rating" - 1.0-1.9 Unpleasant, 2.0-2.9 Disappointing, 3.0-3.49 Recommended, 3.5-3.9 Highly Recommended, 4.0-5.0 Outstanding

***Acknowledgments**: Brady Brelinski, Manhattan Chocolate Society*

## 💪 Challenge
Create a report to summarize your research. Include:

1. What is the average rating by country of origin?
2. How many bars were reviewed for each of those countries?
3. Create plots to visualize findings for questions 1 and 2.
4. Is the cacao bean's origin an indicator of quality? 
5. [Optional] How does cocoa content relate to rating? What is the average cocoa content for bars with higher ratings (above 3.5)?
6. [Optional 2] Your research indicates that some consumers want to avoid bars with lecithin. Compare the average rating of bars with and without lecithin (L in the ingredients).
7. Summarize your findings.

## Importing modules

In [6]:
# Install the dependencies first if needed: pip install pandas numpy matplotlib

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


In [7]:
chocolate = pd.read_csv("chocolate_bars.csv")

In [8]:
chocolate

|  | id | manufacturer | company_location | year_reviewed | bean_origin | bar_name | cocoa_percent | num_ingredients | ingredients | review | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2454 | 5150 | U.S.A. | 2019 | Tanzania | Kokoa Kamili, batch 1 | 76.0 | 3.0 | B,S,C | rich cocoa, fatty, bready | 3.25 |
| 1 | 2458 | 5150 | U.S.A. | 2019 | Dominican Republic | Zorzal, batch 1 | 76.0 | 3.0 | B,S,C | cocoa, vegetal, savory | 3.50 |
| 2 | 2454 | 5150 | U.S.A. | 2019 | Madagascar | Bejofo Estate, batch 1 | 76.0 | 3.0 | B,S,C | cocoa, blackberry, full body | 3.75 |
| 3 | 2542 | 5150 | U.S.A. | 2021 | Fiji | Matasawalevu, batch 1 | 68.0 | 3.0 | B,S,C | chewy, off, rubbery | 3.00 |
| 4 | 2546 | 5150 | U.S.A. | 2021 | Venezuela | Sur del Lago, batch 1 | 72.0 | 3.0 | B,S,C | fatty, earthy, moss, nutty,chalky | 3.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2525 | 1205 | Zotter | Austria | 2014 | Blend | Raw | 80.0 | 4.0 | B,S*,C,Sa | waxy, cloying, vegetal | 2.75 |
| 2526 | 1996 | Zotter | Austria | 2017 | Colombia | APROCAFA, Acandi | 75.0 | 3.0 | B,S,C | strong nutty, marshmallow | 3.75 |
| 2527 | 2036 | Zotter | Austria | 2018 | Blend | Dry Aged, 30 yr Anniversary bar | 75.0 | 3.0 | B,S,C | fatty, earthy, cocoa | 3.00 |
| 2528 | 2170 | Zotter | Austria | 2018 | Congo | Mountains of the Moon | 70.0 | 3.0 | B,S,C | fatty, mild nuts, mild fruit | 3.25 |


In [9]:
chocolate['company_location'].describe()

count       2530
unique        67
top       U.S.A.
freq        1136
Name: company_location, dtype: object

In [10]:
chocolate['company_location']

0        U.S.A.
1        U.S.A.
2        U.S.A.
3        U.S.A.
4        U.S.A.
         ...   
2525    Austria
2526    Austria
2527    Austria
2528    Austria
2529    Austria
Name: company_location, Length: 2530, dtype: object

In [11]:
chocolate['company_location'].value_counts()

company_location
U.S.A.                   1136
Canada                    177
France                    176
U.K.                      133
Italy                      78
                         ... 
St.Vincent-Grenadines       1
Martinique                  1
Ghana                       1
Wales                       1
Suriname                    1
Name: count, Length: 67, dtype: int64

In [12]:
choco_vcounts = chocolate['company_location'].value_counts()

# Print every company location with its number of reviews

for key, value in choco_vcounts.items():
    print(key, value)

U.S.A. 1136
Canada 177
France 176
U.K. 133
Italy 78
Belgium 63
Ecuador 58
Australia 53
Switzerland 44
Germany 42
Spain 36
Venezuela 31
Japan 31
Denmark 31
Austria 30
Colombia 29
New Zealand 27
Hungary 26
Brazil 25
Peru 23
Madagascar 17
Vietnam 16
Singapore 15
Amsterdam 12
Scotland 11
South Korea 11
Dominican Republic 11
Taiwan 10
Nicaragua 10
Mexico 10
Guatemala 10
Argentina 9
Israel 9
Costa Rica 9
Netherlands 8
Lithuania 8
Poland 8
Honduras 6
Sweden 6
Thailand 5
Ireland 5
Philippines 5
U.A.E. 5
Fiji 4
Vanuatu 4
Sao Tome & Principe 4
South Africa 4
Sao Tome 4
Iceland 4
Puerto Rico 4
Malaysia 3
Norway 3
Czech Republic 3
Grenada 3
Portugal 3
St. Lucia 3
Russia 3
El Salvador 3
Finland 2
India 2
Bolivia 2
Chile 2
St.Vincent-Grenadines 1
Martinique 1
Ghana 1
Wales 1
Suriname 1
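
A few labels in this list appear to overlap: `Amsterdam` is a city in the `Netherlands`, `Sao Tome` looks like a shorter form of `Sao Tome & Principe`, and `Scotland` and `Wales` are part of the `U.K.`. If per-country counts matter, one option is to merge such labels before counting. The sketch below is only an illustration: the `LOCATION_ALIASES` mapping and the `normalize_locations` helper are assumptions made here, not part of the dataset, and the sample values are toy data.

```python
import pandas as pd

# Hypothetical mapping: an assumption about which labels refer to the same country.
LOCATION_ALIASES = {
    "Amsterdam": "Netherlands",
    "Sao Tome": "Sao Tome & Principe",
    "Scotland": "U.K.",
    "Wales": "U.K.",
}


def normalize_locations(locations: pd.Series) -> pd.Series:
    """Return a copy in which aliased labels are replaced by one canonical name."""
    return locations.replace(LOCATION_ALIASES)


# Toy values for illustration only (not rows from chocolate_bars.csv).
sample = pd.Series(["Amsterdam", "Netherlands", "Wales", "U.S.A."])
merged_counts = normalize_locations(sample).value_counts()
print(merged_counts)
```

On the notebook's data, the equivalent call would be `normalize_locations(chocolate['company_location']).value_counts()`. Whether to fold Scotland and Wales into the U.K. is a judgment call that depends on how suppliers will be grouped.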


In [13]:
# Shape of the DataFrame as (rows, columns)

chocolate.shape

(2530, 11)

In [14]:
# Check how pandas interpreted each column's data type using the dtypes attribute

chocolate.dtypes

id                    int64
manufacturer         object
company_location     object
year_reviewed         int64
bean_origin          object
bar_name             object
cocoa_percent       float64
num_ingredients     float64
ingredients          object
review               object
rating              float64
dtype: object
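
The `ingredients` column is stored as a plain string of comma-separated codes, so answering the lecithin question from the challenge means checking each row for the code `L`. The sketch below shows one way to do that, assuming the codes are always comma-separated as in the rows displayed earlier. `has_ingredient` is a hypothetical helper and the DataFrame is toy data, not rows from the file. Splitting on commas rather than using a substring test keeps `S` from matching `S*` or `Sa`. Rows without an ingredient list are dropped first (the `float64` dtype of `num_ingredients` hints that some values may be missing, though that is not verified here).

```python
import pandas as pd


def has_ingredient(ingredients: pd.Series, code: str) -> pd.Series:
    """True where the comma-separated code list contains `code` as a whole entry."""
    return ingredients.apply(
        lambda codes: code in [c.strip() for c in codes.split(",")]
    )


# Toy data for illustration only (made-up rows, not results from the file).
toy = pd.DataFrame({
    "ingredients": ["B,S,C", "B,S,C,L", "B,S,C,V,L", None, "B,S*,C,Sa"],
    "rating": [3.5, 3.0, 2.75, 3.25, 3.0],
})

# Drop rows whose ingredient list is missing so they are not counted as "without".
known = toy.dropna(subset=["ingredients"]).copy()
known["lecithin"] = has_ingredient(known["ingredients"], "L").map(
    {True: "with", False: "without"}
)
lecithin_means = known.groupby("lecithin")["rating"].mean()
print(lecithin_means)
```

Applied to the notebook's data, `toy` would be replaced by `chocolate`. A difference in means alone says nothing about significance, so the group sizes are worth reporting next to it.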

#### "rating" - 1.0-1.9 Unpleasant, 2.0-2.9 Disappointing, 3.0-3.49 Recommended, 3.5-3.9 Highly Recommended, 4.0-5.0 Outstanding


In [15]:
# Map each numeric rating to its descriptive category.
# Upper bounds are exclusive, so a value that falls between two listed ranges
# (e.g. 2.95) is still assigned to the lower category instead of the last one.

def rating_distribution(r_distribution):
    if r_distribution < 2.0:
        return 'Unpleasant'
    elif r_distribution < 3.0:
        return 'Disappointing'
    elif r_distribution < 3.5:
        return 'Recommended'
    elif r_distribution < 4.0:
        return 'Highly Recommended'
    else:
        return 'Outstanding'
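
The same grouping can also be expressed without a row-by-row function. The sketch below uses `pd.cut` with bin edges taken from the rating scale in the data description. `RATING_EDGES`, `RATING_LABELS`, and `label_ratings` are names introduced here for illustration, and the sample ratings are toy values.

```python
import pandas as pd

# Edges follow the rating scale; with right=False each bin includes its lower
# edge and excludes its upper edge: [1, 2), [2, 3), [3, 3.5), [3.5, 4), [4, inf).
RATING_EDGES = [1.0, 2.0, 3.0, 3.5, 4.0, float("inf")]
RATING_LABELS = [
    "Unpleasant",
    "Disappointing",
    "Recommended",
    "Highly Recommended",
    "Outstanding",
]


def label_ratings(ratings: pd.Series) -> pd.Series:
    """Return a categorical Series with one label per rating."""
    return pd.cut(ratings, bins=RATING_EDGES, labels=RATING_LABELS, right=False)


# Toy ratings for illustration only.
sample = pd.Series([1.5, 2.75, 3.25, 3.5, 4.0])
labels = label_ratings(sample)
print(labels.tolist())
```

One difference to keep in mind: ratings below 1.0 or missing ratings come back as `NaN` here rather than being assigned a category, which makes unexpected values easier to spot.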

In [16]:
# Apply the function to every value in the rating column

rating_choco = chocolate['rating'].apply(rating_distribution)

In [17]:
rating_choco

0              Recommended
1       Highly Recommended
2       Highly Recommended
3              Recommended
4              Recommended
               ...        
2525         Disappointing
2526    Highly Recommended
2527           Recommended
2528           Recommended
2529    Highly Recommended
Name: rating, Length: 2530, dtype: object

In [18]:
# Store the labels in a new column of the DataFrame

chocolate['rating_choco'] = rating_choco

In [19]:
chocolate['rating_choco']

0              Recommended
1       Highly Recommended
2       Highly Recommended
3              Recommended
4              Recommended
               ...        
2525         Disappointing
2526    Highly Recommended
2527           Recommended
2528           Recommended
2529    Highly Recommended
Name: rating_choco, Length: 2530, dtype: object

In [20]:
chocolate.head()


# Alternative: print every rating category with its count
# rating_distribution_vcounts = chocolate['rating_choco'].value_counts()
# for k, v in rating_distribution_vcounts.items():
#     print(k, v)

|  | id | manufacturer | company_location | year_reviewed | bean_origin | bar_name | cocoa_percent | num_ingredients | ingredients | review | rating | rating_choco |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2454 | 5150 | U.S.A. | 2019 | Tanzania | Kokoa Kamili, batch 1 | 76.0 | 3.0 | B,S,C | rich cocoa, fatty, bready | 3.25 | Recommended |
| 1 | 2458 | 5150 | U.S.A. | 2019 | Dominican Republic | Zorzal, batch 1 | 76.0 | 3.0 | B,S,C | cocoa, vegetal, savory | 3.5 | Highly Recommended |
| 2 | 2454 | 5150 | U.S.A. | 2019 | Madagascar | Bejofo Estate, batch 1 | 76.0 | 3.0 | B,S,C | cocoa, blackberry, full body | 3.75 | Highly Recommended |
| 3 | 2542 | 5150 | U.S.A. | 2021 | Fiji | Matasawalevu, batch 1 | 68.0 | 3.0 | B,S,C | chewy, off, rubbery | 3.0 | Recommended |
| 4 | 2546 | 5150 | U.S.A. | 2021 | Venezuela | Sur del Lago, batch 1 | 72.0 | 3.0 | B,S,C | fatty, earthy, moss, nutty,chalky | 3.0 | Recommended |


In [21]:
#checking distribution

chocolate['rating_choco'].value_counts()

rating_choco
Recommended           987
Highly Recommended    865
Disappointing         549
Outstanding           112
Unpleasant             17
Name: count, dtype: int64

In [22]:
chocolate['rating_choco'].describe()

count            2530
unique              5
top       Recommended
freq              987
Name: rating_choco, dtype: object

In [23]:
# What is the average rating across all bars?

chocolate['rating'].mean()

3.1963438735177867
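
The optional cocoa question from the challenge can be approached in a similar way. The sketch below is a minimal example, assuming the `cocoa_percent` and `rating` columns described above: it compares the mean cocoa content of bars rated above a threshold with the overall mean and reports the Pearson correlation between the two columns. `cocoa_summary` is a hypothetical helper and the DataFrame holds toy values, not results from the file.

```python
import pandas as pd


def cocoa_summary(df: pd.DataFrame, threshold: float = 3.5) -> dict:
    """Summarize how cocoa content relates to rating."""
    high_rated = df[df["rating"] > threshold]
    return {
        "mean_cocoa_high_rated": high_rated["cocoa_percent"].mean(),
        "mean_cocoa_all": df["cocoa_percent"].mean(),
        # Series.corr defaults to the Pearson correlation coefficient.
        "pearson_r": df["cocoa_percent"].corr(df["rating"]),
    }


# Toy data for illustration only (made-up values).
toy = pd.DataFrame({
    "cocoa_percent": [70.0, 72.0, 85.0, 100.0],
    "rating": [3.75, 4.0, 3.0, 2.5],
})
summary = cocoa_summary(toy)
print(summary)
```

On the notebook's data this would be `cocoa_summary(chocolate)`. A single correlation coefficient can hide a non-linear pattern, so a scatter plot of the two columns is a useful companion.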

In [25]:
chocolate[["bean_origin", "rating"]].groupby("bean_origin").mean()

| bean_origin | rating |
|---|---|
| Australia | 3.250000 |
| Belize | 3.233553 |
| Blend | 3.038462 |
| Bolivia | 3.181250 |
| Brazil | 3.262821 |
| ... | ... |
| U.S.A. | 3.242424 |
| Uganda | 3.065789 |
| Vanuatu | 3.115385 |
| Venezuela | 3.231225 |
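
Challenge questions 1 and 2 ask for the average rating and the number of reviews per origin, and both can be computed in one pass. The sketch below uses named aggregation on the grouped `rating` column. `origin_summary`, the output column names `avg_rating` and `num_reviews`, and the `min_reviews` filter are choices made here for illustration, and the DataFrame is toy data.

```python
import pandas as pd


def origin_summary(df: pd.DataFrame, min_reviews: int = 1) -> pd.DataFrame:
    """Average rating and review count per bean origin, best-rated first."""
    summary = (
        df.groupby("bean_origin")["rating"]
        .agg(avg_rating="mean", num_reviews="count")
        .sort_values("avg_rating", ascending=False)
    )
    # Origins with very few reviews give unstable averages, so allow filtering.
    return summary[summary["num_reviews"] >= min_reviews]


# Toy data for illustration only (made-up rows).
toy = pd.DataFrame({
    "bean_origin": ["Peru", "Peru", "Fiji", "Peru", "Fiji", "Ghana"],
    "rating": [3.0, 3.5, 3.0, 4.0, 3.5, 2.5],
})
summary = origin_summary(toy, min_reviews=2)
print(summary)
```

Keeping the count next to the mean matters for question 4: an origin with a high average from only a handful of bars is weaker evidence of quality than one with a similar average across many reviews.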


In [None]:
#plot a graph
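
One possible way to fill in this cell is a pair of horizontal bar charts, one for the average rating and one for the number of reviews per origin. The sketch below assumes a summary table indexed by bean origin with columns named `avg_rating` and `num_reviews`. Those names, the `plot_origin_summary` helper, and the `top_n` cutoff are assumptions for illustration, and the numbers are toy values rather than results from the file.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_origin_summary(summary: pd.DataFrame, top_n: int = 10):
    """Plot average rating and review count for the most-reviewed origins."""
    # Keep the origins with the most reviews, then order the bars by rating.
    top = summary.nlargest(top_n, "num_reviews").sort_values("avg_rating")

    fig, (ax_rating, ax_count) = plt.subplots(1, 2, figsize=(12, 5), sharey=True)
    ax_rating.barh(top.index, top["avg_rating"])
    ax_rating.set_xlabel("Average rating")
    ax_rating.set_title("Average rating by bean origin")

    ax_count.barh(top.index, top["num_reviews"])
    ax_count.set_xlabel("Number of bars reviewed")
    ax_count.set_title("Reviews per bean origin")

    fig.tight_layout()
    return fig


# Toy summary for illustration only (made-up numbers).
toy_summary = pd.DataFrame(
    {"avg_rating": [3.5, 3.25, 2.5], "num_reviews": [3, 2, 1]},
    index=pd.Index(["Peru", "Fiji", "Ghana"], name="bean_origin"),
)
fig = plot_origin_summary(toy_summary, top_n=2)
```

Sharing the y-axis keeps the two panels aligned, so each origin's rating can be read against its review count on the same row.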

