# Report

In [1]:
# The code was removed by Watson Studio for sharing.

#### Preparation and imports

In [2]:
!conda install -c conda-forge spacy
!python -m spacy download en

import requests
import pickle
import json
import pandas as pd
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

import spacy
from html import unescape

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    cymem:         1.31.2-py35_0         conda-forge
    dill:          0.2.8.2-py35_0        conda-forge
    msgpack-numpy: 0.4.1-py_0            conda-forge
    murmurhash:    0.28.0-py35hfc679d8_0 conda-forge
    plac:          0.9.6-py_1            conda-forge
    preshed:       1.0.1-py35hfc679d8_0  conda-forge
    regex:         2017.11.09-py35_0     conda-forge
    spacy:         2.0.11-py35hf8a1672_1 conda-forge
    termcolor:     1.1.0-py_2            conda-forge
    thinc:         6.10.3-py35hf8a1672_1 conda-forge
    ujson:         1.35-py35h470a237_1   conda-forge

cymem-1.31.2-p 100% |################################| Time: 0:00:00 218.33 kB/s
dill-0.2.8.2-p 100% |################################| Time: 0:00:00 479.57 kB/s
murmurhash-0.2 100% |################################| T

## Description

The owner of a restaurant (or a similar venue, e.g. a café) asked me to help her understand what her customers think about the venue and, if possible, find ways to improve the business.

I would like to approach this by analysing the tips customers wrote about the venue, to see whether common, recurring ideas can be extracted.
By doing this, I hope to gain some insight into how to improve the business and better serve the customers.

NOTE:

Since a venue usually has only a small number of tips, it would probably be more efficient to just read them.
I'm doing this analysis mostly as an exercise. One could imagine a similar analysis of social network posts, where reading might not be an option because of the large volume.

## Data extraction

I am planning to use the Foursquare API to get the list of tips, extract the words from each tip, perform some pre-processing (e.g. eliminating stop words like "and", "but", etc.), convert the extracted words into features for each tip, put them in a dataframe and run a clustering algorithm on it.

I would like to use a coffee shop in New York named Little Collins: https://foursquare.com/v/little-collins/51c9b6e1498e263056040a69
It seems to be popular and it has a lot of tips (400 at the moment).

### Get tips for venue
    - GET https://api.foursquare.com/v2/venues/VENUE_ID/tips
    - documentation: https://developer.foursquare.com/docs/api/venues/tips

### Get data from Foursquare

In [4]:
VENUE_ID = "51c9b6e1498e263056040a69"  # Link: https://foursquare.com/v/little-collins/51c9b6e1498e263056040a69


PATTERN = "https://api.foursquare.com/v2/{{}}?limit=500&oauth_token={}&v={}".format(TOKEN, VERSION)
TIPS_URL = PATTERN.format("venues/{}/tips".format(VENUE_ID))


def get_venue_tips(url):
    # request the tips and return the 'tips' part of the JSON response
    response = requests.get(url).json()
    return response['response']['tips']


tips = get_venue_tips(TIPS_URL)
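
The request above can also be sketched with the standard library doing the query-string escaping, together with a small check of whether the response contains every tip the venue reports. This is only a sketch: the helper names, the token, and the version string are placeholders, and it assumes that `count` in the response holds the venue's total number of tips.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.foursquare.com/v2"


def build_tips_url(venue_id, token, version, limit=500):
    """Build the tips URL for one venue; urlencode escapes the query values."""
    query = urlencode([("limit", limit), ("oauth_token", token), ("v", version)])
    return "{}/venues/{}/tips?{}".format(API_ROOT, venue_id, query)


def missing_tips(tips):
    """How many tips the venue reports beyond those actually returned
    (assumes 'count' is the venue's total number of tips)."""
    return tips["count"] - len(tips["items"])


# Placeholder credentials, for illustration only.
url = build_tips_url("51c9b6e1498e263056040a69", "MY_TOKEN", "20180604")
print(url)

# A hand-made response fragment: 5 tips reported, 3 returned.
fake_tips = {"count": 5, "items": [{"id": "a"}, {"id": "b"}, {"id": "c"}]}
print(missing_tips(fake_tips))
```

A non-zero result from `missing_tips` would suggest that the analysis runs on a subset of the tips only.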

#### The data looks like this:

In [23]:
len(tips['items'])

200

In [90]:
# The response is a dictionary with the following keys
display(tips.keys())

# We are interested in the 'items' which is a list of dictionaries with the following keys
display(tips['items'][0].keys())

# I am only going to use the `id` and the `text` fields
display(tips['items'][0]['id'], tips['items'][0]['text'])

dict_keys(['count', 'items'])

dict_keys(['text', 'likes', 'authorInteractionType', 'user', 'disagreeCount', 'logView', 'canonicalUrl', 'todo', 'like', 'createdAt', 'type', 'photourl', 'photo', 'agreeCount', 'id', 'lastUpvoteTimestamp', 'lastVoteText'])

'571402e0498e27874973d636'

"A little taste of Melbourne in the heart of New York City. Great coffee (made the way it should be!) and classic Australian food to match. Can't go wrong with Vegemite on toast!"

Save the data for later usage:

In [25]:
project.save_data('capstone.tips.p', pickle.dumps(tips), overwrite=True)

{'asset_id': '04f58cc3-e071-45aa-8cf3-72bf32e61d38',
 'bucket_name': 'courseracapstone-donotdelete-pr-duvaojegpzermh',
 'file_name': 'capstone.tips.p',
 'message': 'File capstone.tips.p has been written successfully to the associated OS'}

## Methodology - Clustering text documents using k-means

The tips have different lengths, which could be a problem when comparing two tips.
Once I get the clusters, I will list the N most used words in each cluster, hoping to be able to interpret the results and extract some ideas.
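
One way to see why differing lengths need not be a problem: when TF-IDF rows are scaled to unit length (L2 normalization, which as far as I know is the default of scikit-learn's `TfidfVectorizer`), a tip that repeats the same words more often ends up with the same vector. The sketch below is a simplified pure-Python re-implementation for illustration, not the library's code; the function names and the example tips are made up.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return one {term: weight} dict per document, scaled to unit length."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # term count times smoothed idf: ln((1 + n) / (1 + df)) + 1
        weights = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                   for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()})
    return rows


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


short_tip = "great flat white"
long_tip = "great flat white great flat white great flat white"
other_tip = "crowded tiny place"
rows = tfidf_rows([short_tip, long_tip, other_tip])
print(round(cosine(rows[0], rows[1]), 6))  # same wording, different length
print(round(cosine(rows[0], rows[2]), 6))  # no shared words
```

The short and the long tip come out as (numerically) the same direction, while tips without shared words have zero similarity.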

I have found a useful example of k-means clustering of text documents: http://scikit-learn.org/stable/auto_examples/text/document_clustering.html

### Clustering

In the following section I convert the contents of the tips to vectors that can be passed to the k-means algorithm.
I am using the `spacy` natural language processing library to perform the lemmatization:

```
Lemmatization:	Assigning the base forms of words. For example, the lemma of "was" is "be", and the lemma of "rats" is "rat".
```

This should allow the vectorizer to treat slightly different forms of the same word as a single feature.
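
The effect can be illustrated with a toy lookup lemmatizer. The table below is hand-written and hypothetical; a real library such as `spacy` relies on far larger data and proper tokenization.

```python
# Hypothetical, hand-written lookup table, for illustration only.
LEMMA_LOOKUP = {
    "was": "be",
    "is": "be",
    "rats": "rat",
    "whites": "white",
    "lattes": "latte",
}


def toy_lemmatize(text):
    """Lowercase, split on whitespace and map each word to its base form."""
    return [LEMMA_LOOKUP.get(word, word) for word in text.lower().split()]


print(toy_lemmatize("Best flat whites"))
print(toy_lemmatize("The flat white was perfect"))
```

After this step, "whites" and "white" contribute to the same feature instead of two separate ones.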

I am passing the generated data to the MiniBatchKMeans algorithm, which is a faster variant of k-means (http://scikit-learn.org/stable/modules/clustering.html#mini-batch-kmeans):

```
The MiniBatchKMeans is a variant of the KMeans algorithm which uses mini-batches to reduce the computation time, while still attempting to optimise the same objective function. Mini-batches are subsets of the input data, randomly sampled in each training iteration. 
```
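
The quoted description can be illustrated with a toy implementation. This is a minimal sketch of the mini-batch idea (assign each sampled point to its nearest center, then move that center with a step size that shrinks as it sees more samples), not scikit-learn's actual code; the function names, the sample points, and the parameter values are made up.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mini_batch_kmeans(points, init_centers, batch_size=4, n_iter=50, seed=0):
    """Toy mini-batch k-means; returns the final list of centers."""
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)  # samples assigned to each center so far
    for _ in range(n_iter):
        # a mini-batch is a random subset of the input data
        batch = rng.sample(points, batch_size)
        # 1) assign every sample of the batch to its nearest center
        nearest = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in batch
        ]
        # 2) move each center toward its samples with a shrinking step size
        for p, k in zip(batch, nearest):
            counts[k] += 1
            eta = 1.0 / counts[k]
            centers[k] = [(1 - eta) * c + eta * x for c, x in zip(centers[k], p)]
    return centers


# Two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centers = mini_batch_kmeans(points, init_centers=[(0, 0), (10, 10)])
print(centers)
```

Because each iteration only touches `batch_size` points, the cost per step stays small even when the data set is large.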

### Sentiment analysis

For the sentiment analysis part I am using the `textblob` library to extract the sentiment from each tip.

The sentiment, according to the documentation (https://textblob.readthedocs.io/en/dev/api_reference.html#textblob.blob.TextBlob.sentiment), is:

```
Return a tuple of form (polarity, subjectivity ) where polarity is a float within the range [-1.0, 1.0] and subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.
```

After extracting the sentiments, I am going to compute the average polarity score over all the tips and over the objective tips only.
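
The two averages can be sketched in plain Python as follows. The scores are made up for illustration; the cut-off of 0.5 for calling a tip "objective" is the one used in the sentiment cell of this notebook, and the function names are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def polarity_summary(sentiments, max_subjectivity=0.5):
    """sentiments: list of (polarity, subjectivity) pairs, one per tip."""
    objective = [p for p, s in sentiments if s <= max_subjectivity]
    return {
        "all": mean(p for p, _ in sentiments),
        "objective": mean(objective) if objective else None,
        "objective_count": len(objective),
    }


# Made-up scores for four tips, for illustration only.
example = [(0.75, 1.0), (0.5, 0.5), (-0.25, 0.25), (0.0, 0.75)]
print(polarity_summary(example))
```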

In [3]:
tips = pickle.load(project.get_file('capstone.tips.p'))

#### Extract the contents of the tips and save them in a dataframe

In [4]:
tips_content = []

for item in tips['items']:
    tips_content.append((item['id'], item['text']))

data = pd.DataFrame(tips_content, columns=('Id', 'Text'))
data.head()

Unnamed: 0,Id,Text
0,571402e0498e27874973d636,A little taste of Melbourne in the heart of Ne...
1,538747ad11d2268955cb3f23,"The best coffee and bites in Midtown, dare I s..."
2,56c003cc498e9f9d799fb644,They serve counter culture beans! Great pour-o...
3,53e2e877498eefdb1b26af19,1st of the now many Aussie coffee houses that ...
4,55d67810498e24523636a3a6,An Australian coffee oasis amidst the midtown ...


#### Load the spacy model and add helper function for the vectorization

In [5]:
# create a spaCy tokenizer
nlp = spacy.load('en')
lemmatizer = spacy.lang.en.English()


# convert HTML entities to plain characters and
# set everything to lowercase
def my_preprocessor(doc):
    return unescape(doc).lower()


# tokenize the doc and lemmatize its tokens
def my_tokenizer(doc):
    tokens = lemmatizer(doc)
    return [token.lemma_ for token in tokens]

#### Create vectorizer and fit it to the data

In [10]:
vectorizer = TfidfVectorizer(
    max_df=0.5, max_features=100000,
    min_df=2, stop_words=ENGLISH_STOP_WORDS,
    preprocessor=my_preprocessor,
    tokenizer=my_tokenizer,
    use_idf=True)

In [13]:
X = vectorizer.fit_transform(data['Text'])
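
The `min_df=2` and `max_df=0.5` settings deserve a short illustration. As I understand the scikit-learn documentation, an integer `min_df` drops terms that occur in fewer than that many documents, and a float `max_df` drops terms that occur in more than that share of the documents. The sketch below mimics this filter in plain Python on made-up, already tokenized tips; it is not the library's implementation.

```python
from collections import Counter


def kept_terms(tokenized_docs, min_df=2, max_df=0.5):
    """Terms whose document frequency lies within [min_df, max_df * n_docs]."""
    n_docs = len(tokenized_docs)
    # document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return sorted(t for t, n in df.items() if min_df <= n <= max_df * n_docs)


docs = [
    ["coffee", "flat", "white"],
    ["coffee", "avocado", "toast"],
    ["coffee", "flat", "white", "cookie"],
    ["coffee", "crowded"],
]
print(kept_terms(docs))
```

Here "coffee" is dropped because it appears in every tip, and words seen in a single tip are dropped as too rare, which leaves the terms that can actually tell tips apart.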

#### Apply MiniBatch K-means to the data obtained after vectorization

In [43]:
km = MiniBatchKMeans(
    n_clusters=20, init='k-means++', n_init=1,
    init_size=1000, batch_size=10, verbose=True)

In [44]:
km.fit(X)

Init 1/1 with method: k-means++
Inertia for init 1/1: 136.237490
Minibatch iteration 1/2000: mean batch inertia: 0.818748, ewa inertia: 0.818748 
Minibatch iteration 2/2000: mean batch inertia: 0.861918, ewa inertia: 0.823043 
Minibatch iteration 3/2000: mean batch inertia: 0.786946, ewa inertia: 0.819451 
Minibatch iteration 4/2000: mean batch inertia: 0.671047, ewa inertia: 0.804685 
Minibatch iteration 5/2000: mean batch inertia: 0.868364, ewa inertia: 0.811021 
Minibatch iteration 6/2000: mean batch inertia: 0.671635, ewa inertia: 0.797152 
Minibatch iteration 7/2000: mean batch inertia: 0.780987, ewa inertia: 0.795543 
Minibatch iteration 8/2000: mean batch inertia: 0.849813, ewa inertia: 0.800943 
Minibatch iteration 9/2000: mean batch inertia: 0.820330, ewa inertia: 0.802872 
Minibatch iteration 10/2000: mean batch inertia: 0.791899, ewa inertia: 0.801780 
Minibatch iteration 11/2000: mean batch inertia: 0.866772, ewa inertia: 0.808247 
Minibatch iteration 12/2000: mean batch in

MiniBatchKMeans(batch_size=10, compute_labels=True, init='k-means++',
        init_size=1000, max_iter=100, max_no_improvement=10, n_clusters=20,
        n_init=1, random_state=None, reassignment_ratio=0.01, tol=0.0,
        verbose=True)

#### Merging the labels with the data and saving

In [46]:
labeled_data = data.copy()
labeled_data['Label'] = km.labels_

project.save_data('capstone.labeled_data.p', labeled_data.to_json(), overwrite=True)

{'asset_id': '51a875b8-3a69-44e6-87a8-63b9868a7d82',
 'bucket_name': 'courseracapstone-donotdelete-pr-duvaojegpzermh',
 'file_name': 'capstone.labeled_data.p',
 'message': 'File capstone.labeled_data.p has been written successfully to the associated OS'}

In [21]:
labeled_data = pd.read_json(project.get_file('capstone.labeled_data.p'))

## Results

#### Ranking the clusters by number of members

In [51]:
top = labeled_data[['Id', 'Label']].groupby(by='Label').count()['Id'].sort_values(ascending=False)
display(top)

Label
12    29
13    18
18    18
4     18
0     17
11    10
10    10
2     10
7      9
17     8
1      8
5      7
16     7
15     6
9      5
6      5
8      5
14     5
3      4
19     1
Name: Id, dtype: int64

In [78]:
# Helper function that shows the most frequent words in a cluster and prints the tips in that cluster

def words_top_per_label(label):

    words = []

    for item in labeled_data[labeled_data['Label'] == label]['Text']:
        for c in '(){}[]:;,.!?-_0123456789"\'/\\=+&':
            item = item.replace(c, '').lower()
        words += item.split(' ')

    s = pd.Series(words)

    display(s[~s.index.isin(vectorizer.stop_words_)].value_counts().where(lambda x : x > 1).dropna())
    display(labeled_data[labeled_data['Label'] == label]['Text'].to_dict())
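
Note that the filter in `words_top_per_label` compares the Series index (integer positions) with `vectorizer.stop_words_`, so in practice nothing is removed: words such as "the" and "and" still lead the counts shown in the outputs of this notebook, and the empty string appears because of doubled spaces. A possible alternative is sketched below in plain Python. The small stop-word set is only illustrative (the `ENGLISH_STOP_WORDS` list imported at the top could be passed instead), and the function name is hypothetical.

```python
import re
from collections import Counter

# A small illustrative stop-word set, not the full scikit-learn list.
STOP_WORDS = {"the", "and", "a", "is", "in", "of", "to", "for", "with", "but"}


def top_words(texts, stop_words=STOP_WORDS, min_count=2):
    """Count words over all texts, skipping stop words and rare words."""
    counts = Counter()
    for text in texts:
        # keep runs of ASCII letters only; this drops digits, punctuation
        # and empty strings, and splits contractions such as "can't"
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return [(w, n) for w, n in counts.most_common() if n >= min_count]


example_tips = [
    "The flat white is the best!",
    "Best flat white in NYC.",
    "Great coffee and a cookie",
]
print(top_words(example_tips))
```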

### Analyzing the clusters

In [60]:
words_top_per_label(top.index[0])

the          28.0
and          15.0
a            14.0
coffee       12.0
is           10.0
for           9.0
good          9.0
great         9.0
sandwich      7.0
i             7.0
to            7.0
              7.0
you           6.0
love          6.0
in            6.0
well          5.0
was           5.0
with          4.0
if            4.0
brew          4.0
of            4.0
service       4.0
me            4.0
it            4.0
are           4.0
cookie        3.0
so            3.0
pretty        3.0
but           3.0
cold          3.0
             ... 
healthy       2.0
get           2.0
had           2.0
ask           2.0
be            2.0
pick          2.0
an            2.0
when          2.0
midtown       2.0
melbourne     2.0
avocado       2.0
find          2.0
brunch        2.0
lattes        2.0
made          2.0
small         2.0
that          2.0
even          2.0
drinks        2.0
up            2.0
counter       2.0
this          2.0
on            2.0
have          2.0
food      

{23: 'Avocado toast ftw!!  Properly brewed coffee is refreshing.  They brew counter culture coffee, an excellent roaster that is hard to find in ny.  Try the Idido if the have it - herbal and floral!',
 34: 'If you are in the neighborhood, LITTLE COLLINS is a must do for a healthy and tasty breakfast.Excellent coffee and pastries!',
 42: 'Avo smash is popular but the best thing on the me I is the cured salmon sandwich. Great cold brew and the cookie that comes with the hot drinks is delicious!',
 44: "Best mocha I've had in be city! And their complimentary hazelnut cookie served when you dine-in is worth sitting yourself down to truly enjoy your beverage!",
 69: 'Well made macchiato and lattes! Cute little green cups accompanied by a small biscuit on the saucer.',
 70: 'A lovely taste of Melbourne in a mundane part of midtown... great food, coffee and atmosphere... And they have Anzac bikkies!! Love!',
 73: 'No words for the PBJ sandwich!!!!! So good.',
 77: 'Great vibe and an awesome 

#### It seems that the sandwich preferences are spread across multiple choices.

In [63]:
words_top_per_label(top.index[3])

flat         16.0
white        15.0
the          14.0
a             9.0
              8.0
in            7.0
coffee        7.0
and           6.0
best          6.0
get           6.0
is            5.0
but           4.0
for           4.0
with          4.0
excellent     3.0
you           3.0
breakfast     3.0
melbourne     3.0
small         3.0
i             2.0
whites        2.0
to            2.0
smash         2.0
amazing       2.0
not           2.0
strong        2.0
culture       2.0
delicious     2.0
chicken       2.0
perfect       2.0
city          2.0
miss          2.0
too           2.0
of            2.0
dtype: float64

{7: 'Best flat white you’ll get in NYC. Makes me miss my life in Sydney where I was surrounded by amazing coffee.',
 36: 'top 3 flat white in the city 👍🏼👍🏼 too small for working - even eating is a bit uncomfortable due to small tables. but get FW to go and savor it while walking around the city!',
 67: 'The chicken + squash + arugula salad is a lunch I get at least once a week. Absolutely lovely with a hefty amount of dark and white chicken meat 👌🏼',
 86: 'The flat white is obviously amazing, but come for breakfast and impress whoever you bring with. Outstanding everything.',
 97: 'The best flat white. The best coffee shop near Bloomingdales!',
 129: 'The flat white is the best! Just like Melbourne!',
 132: 'Excellent flat white (the real thing, not half a latte as in other places)',
 134: 'Flat white and the aesthetic are both perfect.',
 145: 'If you miss having coffee in Australia, you will be SO happy here. Best flat whites.',
 152: 'Excellent coffee. The flat white is delicious! S

#### The flat white coffee is mentioned a lot. It seems to be a coffee speciality from Australia.

In [64]:
words_top_per_label(top.index[4])

and          14.0
coffee       10.0
really       10.0
the          10.0
is            9.0
good          7.0
              6.0
but           6.0
great         6.0
super         5.0
crowded       5.0
a             4.0
white         4.0
flat          4.0
friendly      4.0
with          4.0
all           3.0
smash         3.0
place         3.0
tiny          3.0
amazing       3.0
it            3.0
get           3.0
toast         3.0
an            2.0
so            2.0
very          2.0
sandwich      2.0
awesome       2.0
cake          2.0
food          2.0
go            2.0
service       2.0
away          2.0
small         2.0
seat          2.0
gets          2.0
shop          2.0
avocado       2.0
delicious     2.0
fast          2.0
its           2.0
as            2.0
well          2.0
for           2.0
breakfast     2.0
too           2.0
bread         2.0
in            2.0
dtype: float64

{9: 'Unique selections, great coffee and good ambience in general❤️ such a tiny place but worth it. Their macchiato with the PBJ toast is the best combination ❤️',
 10: 'A great little coffee bar with very good breakfast and lunch options. The flat white is really good and can confirm all other reviews - The avocado-feta sandwich is awesome.',
 37: "Get the olive oil cake! It's perfect - crunchy top and delicious, moist, flavourful cake. Or the banana bread, you really can't go wrong. Super friendly service as well.",
 47: 'The Smash was an amazing breakfast. Really good coffee as well, and they serve Australian flat white and piccolo.',
 64: 'Epic Smash toast. Must get. Love this place! Tiny, 5 tables but really fresh and flavorful!',
 66: 'Flat white is my favorite. All the food is delicious. Place gets super crowded and aggressive which is annoying. Try & go off peak times.',
 75: "Awesome avocado smash and coffee, but small and very crowded. Great for take away, but don't count on 

#### We can see that some customers find the venue very crowded at peak times, but they consider the service friendly.

In [65]:
words_top_per_label(top.index[5])

sweet           11.0
a               10.0
and             10.0
uncle            9.0
fred             8.0
the              8.0
ricotta          5.0
bread            5.0
banana           5.0
get              5.0
coffee           4.0
white            4.0
honey            4.0
flat             4.0
with             4.0
menu             3.0
great            3.0
toasted          3.0
                 2.0
on               2.0
to               2.0
latte            2.0
berries          2.0
was              2.0
strawberries     2.0
too              2.0
is               2.0
but              2.0
for              2.0
recommend        2.0
try              2.0
dtype: float64

{14: 'The Sweet Uncle Fred, get it. The combination of banana bread, ricotta, honey, and berries is heavenly. They make a great flat white too, much better than your average coffee.',
 25: 'Small space and no lunch menu available on weekends. Get the sweet uncle fred: toasted banana bread with ricotta, strawberries, honey, and toasted almonds.',
 31: 'Just opened today! Delicious menu and friendly staff. Try the Sweet Uncle Fred--toasted banana bread with ricotta, strawberries, honey, and toasted almonds--and a piccolo latte. Happiness.',
 38: 'This place is tiny, but serve a mean flat white and have a great Aussie cafe menu. The Sweet Uncle Fred (banana bread with ricotta and berries) was excellent.',
 65: "Great food. I recommend the 'Sweet Uncle Fred' for afternoon coffee break. Chai latte was good too although a little on the sweet side.",
 92: 'Avocado Toast & Sweet Uncle Fred are my favorites! Would recommend getting here early to get a seat.',
 105: "Get a flat white and a Sweet

#### Another popular option seems to be `The Sweet Uncle Fred`, a `combination of banana bread, ricotta, honey, and berries`.

In [66]:
words_top_per_label(top.index[6])

and           12.0
toast         11.0
avocado       10.0
the            8.0
coffee         5.0
is             4.0
are            4.0
great          4.0
a              4.0
good           3.0
food           3.0
for            3.0
australian     3.0
too            3.0
it             2.0
cafe           2.0
this           2.0
in             2.0
midtown        2.0
my             2.0
well           2.0
white          2.0
nice           2.0
got            2.0
flat           2.0
as             2.0
seeds          2.0
with           2.0
pumpkin        2.0
options        2.0
back           2.0
dtype: float64

{4: 'An Australian coffee oasis amidst the midtown culinary desert. Get a flat white and avocado toast with pumpkin seeds and pepper.',
 12: 'This is the best Australian specialty coffee in NYC hands down! Try their PB&J toast and avocado toast too! Staff are very friendly and skilled despite crazy crowds around 2pm.',
 15: 'The food is great here! I got avocado toast and my sister got Spanish omelette and both are delicious. Coffee is good too. Will come back and absolutely recommend.',
 49: "Far too many good options for just one visit, I'll definitely be coming back. The flat white, Vegemite toast, avocado toast, and avocado omelet were all amazing",
 103: 'My first stop from the LGA airport for avocado toast and a latte. ☕️ Well worth it. Great food, pastries, and service.',
 110: "These folks know coffee. They are also nice about it if you aren't as knowledgeable. Good pour over options.",
 117: 'Avocado \U0001f951 toast with poached egg and pepperoncino.... mouthwatering. Great c

In [67]:
words_top_per_label(top.index[7])

the          16.0
coffee       11.0
a             8.0
is            8.0
with          5.0
aussie        4.0
avo           4.0
of            3.0
toast         3.0
in            3.0
milk          3.0
for           3.0
flat          3.0
to            3.0
up            2.0
that          2.0
by            2.0
white         2.0
iced          2.0
little        2.0
just          2.0
st            2.0
great         2.0
but           2.0
too           2.0
get           2.0
and           2.0
your          2.0
breakfast     2.0
dtype: float64

{3: '1st of the now many Aussie coffee houses that pairs amazing espresso drinks with great food that goes beyond just your typical pastries. The "Little Collins st" sign was given by the Melbourne mayor!',
 11: "The most delicious coffee I've ever had. Don't be fooled by the smaller sizes- the coffee is strong! The avo toast is also worth a mention- especially with an egg on top.",
 20: 'A must for breakfast takeout. Grabbing a table is impossible. The "Pick Me Up" is my goto breakfast sandwich. But add the "Sweet Uncle Fred" to whatever you order...it will make your day.',
 21: 'This Aussie spot is known mostly for its flat whites and avo toast, but give the iced latte a chance, too. It’s smooth with just the right amount of milk.',
 51: 'A great little Australian inspired cafe in the heart of manhattan. Awesome avo-toast. Check the sun dried coffee.',
 60: 'Quality coffee in a friendly, hip atmosphere. Welcome alternative to the corporate, midtown world. Get the iced coffee w almond

#### From the two clusters above we can deduce that another popular choice is the avocado toast.

In [68]:
words_top_per_label(top.index[8])

the          8.0
a            7.0
with         7.0
piccolo      6.0
latte        5.0
and          5.0
olive        4.0
it           4.0
oil          4.0
coffee       4.0
for          3.0
cake         3.0
delicious    3.0
midtown      3.0
to           3.0
cortado      3.0
small        2.0
bread        2.0
gem          2.0
is           2.0
in           2.0
smash        2.0
they         2.0
of           2.0
dtype: float64

{16: 'What to Order: A piccolo latte, a traditional Australian drink with a double shot of espresso and a small amount of steamed milk. It’s similar to a cortado.',
 58: 'Midtown gem. Ask for a cortado and pair it with the banana bread. They can toast it for you.',
 68: 'Uses rotating counter culture coffee. Piccolo with olive oil cake makes for the perfect breakfast. Avocado feta smash is great too.',
 71: 'I agree with all the Smash comments. Simple, delicious, healthy start to day. Accompany that with an expertly prepared piccolo latte',
 96: 'Lovely small coffee shop with delicious coffee. Try the olive oil bread- it will change your world.',
 98: 'The Spanish tortilla is delicious. Follow it with. Piccolo Latte',
 102: 'Sharp joint in Midtown East. Get a piccolo latte (cortado), and eat the cookie they put on the plate.',
 131: "Excellent piccolo latte and olive oil cake. Didn't know this gem existed in midtown",
 196: 'Coffee and the olive oil spice cake'}

#### It looks like for this cluster, the `piccolo latte` is often paired with the `olive oil cake`.

#### No useful information could be extracted from the remaining clusters.

### Sentiment analysis [extra]

I am going to use textblob to perform a sentiment analysis.
I am going to extract the sentiments from all the tips.

In [14]:
!pip install textblob

Collecting textblob
  Downloading https://files.pythonhosted.org/packages/11/18/7f55c8be6d68ddc4036ffda5382ca51e23a1075987f708b9123712091af1/textblob-0.15.1-py2.py3-none-any.whl (631kB)
    100% |████████████████████████████████| 634kB 1.4MB/s ta 0:00:01
Requirement not upgraded as not directly required: nltk>=3.1 in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from textblob)
Requirement not upgraded as not directly required: six in /opt/conda/envs/DSX-Python35/lib/python3.5/site-packages (from nltk>=3.1->textblob)
Installing collected packages: textblob
Successfully installed textblob-0.15.1


In [45]:
from textblob import TextBlob

sentiments = []

for item in tips_content:
    sentiment = TextBlob(item[1]).sentiment
    sentiments.append((sentiment.polarity, sentiment.subjectivity))

sentiments_df = pd.concat([labeled_data.copy(), pd.DataFrame(sentiments, columns=['Polarity', 'Subjectivity'])], axis=1)

filtered_sentiments = sentiments_df[sentiments_df['Subjectivity'] <= 0.5]['Polarity']
print("Polarity score: ", sentiments_df['Polarity'].mean())
print("Objective polarity score: ", filtered_sentiments.mean())
print("Objective sentiments used: ", filtered_sentiments.count())

Polarity score:  0.35531281103982
Objective polarity score:  0.21514148285576853
Objective sentiments used:  70


Let's look at some of the negative sentiments.

In [85]:
negative_df = sentiments_df[(sentiments_df['Polarity'] < -0.1)].sort_values('Polarity')

for index, row in negative_df[['Polarity', 'Text']].iterrows():
    print("[", row[0], "] ", row[1])

[ -0.6 ]  Breakfast smash: avocado + feta mash, chili flakes, pepitas on hearty toast. Cold brew on the side. Yes.
[ -0.4 ]  For some reason they keep the sugar behind the counter. You need to ask for it.
[ -0.2916666666666667 ]  Avocado Benedict - either Bacon or Salmon - is to die for. You can have a hard time looking for a table or a space in the counter though.
[ -0.19166666666666665 ]  The chicken + squash + arugula salad is a lunch I get at least once a week. Absolutely lovely with a hefty amount of dark and white chicken meat 👌🏼
[ -0.175 ]  Call me crazy, but if I'm in an Aussie cafe I'd really like at least one person serving to be Australian. #dissapointed
[ -0.16666666666666666 ]  Gem of a coffee shop in corporate midtown east - they serve some serious automated pourover coffee
[ -0.15833333333333333 ]  A must for breakfast takeout. Grabbing a table is impossible. The "Pick Me Up" is my goto breakfast sandwich. But add the "Sweet Uncle Fred" to whatever you order...it will ma

## Discussion / Observations

- Sandwich preferences are spread across multiple choices.
- The flat white is mentioned a lot. It seems to be a coffee speciality from Australia.
- Some customers find the venue very crowded at peak times, but they consider the service friendly.
- Another popular option is `The Sweet Uncle Fred`, a `combination of banana bread, ricotta, honey, and berries`.
- Two of the clusters suggest that another popular choice is the avocado toast.
- In one cluster, the `piccolo latte` is often paired with the `olive oil cake`.

From the sentiment analysis we can see that the average polarity of the objective tips is about 0.22 on a scale from -1.0 to 1.0, i.e. mildly positive (about 0.36 when all tips are included).
The sentiment extraction is not always right, as can be seen in the first example of the negative sentiments:

```
Breakfast smash: avocado + feta mash, chili flakes, pepitas on hearty toast. Cold brew on the side. Yes.
```

Maybe it was confused by the words "smash" and/or "cold".
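
A possible explanation, assuming the analyzer averages per-word scores from a lexicon without looking at context (an assumption, not something verified here): a single negatively scored word is enough to make a neutral description look negative. The lexicon and its scores below are made up for illustration and are not taken from textblob.

```python
# Made-up scores for illustration only; they are NOT taken from textblob.
TOY_LEXICON = {"cold": -0.5, "hearty": 0.25, "great": 0.75, "delicious": 1.0}


def toy_polarity(text):
    """Average the scores of the words found in the lexicon (0.0 if none)."""
    words = [w.strip(".,:;!?+").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("Cold brew on the side. Yes."))
print(toy_polarity("Great coffee, delicious cake."))
```

In this toy model "cold" in "cold brew" counts as negative even though it only names a drink.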

But there are some valid points, like "eating is a bit uncomfortable due to small tables".

## Conclusions

Just by looking at the clusters we can describe the venue as follows:

Midtown Australian coffee shop with very friendly service, crowded at peak times.
It is preferred mostly for breakfast, but also for lunch.

The avocado toast seems to be one of the most popular choices, and The Sweet Uncle Fred is also appreciated.

A popular combination is `piccolo latte and olive oil cake`.

In general, the shop is already doing great and I could not find many negative points about it, except that it is crowded at peak hours.
It could be a good idea to make special offers out of the popular combinations, or to pair popular choices with less popular ones in order to promote the latter.

While the tips are not the best data source for drawing conclusions about the top choices and combinations (the order receipts would serve better), they can still be used as confirmation that some combinations are indeed appreciated and not just ordered together.