# Feedback Summarisation

The summarisation here uses the extractive summarisation technique, to be specific Weighted Word Occurrences and TF-IDF. The BERT summarisation library is used just to have slight comparison with deep learning technique.

In [1]:
import pandas as pd
import numpy as np
from nltk.corpus import stopwords, wordnet
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk import pos_tag
from nltk.stem import WordNetLemmatizer
from rouge import Rouge 
import re
from summarizer import Summarizer
stopwords = stopwords.words('english')

In [2]:
df = pd.read_csv('../Dataset/trustpilot_reviews.csv')

In [3]:
df.head(2)

Unnamed: 0,datePublished,headline,reviewBody,itemReviewed_name,reviewRating_ratingValue,head_word_count,body_word_count,total_word_count,headline_clean,reviewBody_clean,lemmatized_headline,lemmatized_reviewBody,lemmatized_reviewBody_original,lemmatized_feedback,overall_rating
0,2021-01-15T13:50:34+00:00,When speaking to one advisors on…,When speaking to one of the advisors on the ph...,Avado,3,6,153,159,speaking one advisors,speaking one advisors phone felt answers given...,speak one advisor,speak one advisor phone felt answer give quest...,When speak to one of the advisor on the phone ...,speak one advisor speak one advisor phone felt...,0
1,2021-01-15T10:45:08+00:00,CIPD Level 5 Diploma,I enrolled today to study my CIPD Level 5 Dipl...,Avado,5,4,64,68,cipd level diploma,enrolled today study cipd level diploma hr man...,cipd level diploma,enrol today study cipd level diploma hr manage...,I enrol today to study my CIPD Level 5 Diploma...,cipd level diploma enrol today study cipd leve...,1


In [4]:
# prepare the data by splitting into good and bad reviews, then combining the 
df_good = df[df['overall_rating']==1]
df_bad = df[df['overall_rating']==0]

In [5]:
# prepare the data by splitting into each company
companies = list(set(df['itemReviewed_name']))
df_companies = {'df_' + str(company):df[df['itemReviewed_name']==company] for company in companies}

In [6]:
df_companies['df_Avado'].head(2)

Unnamed: 0,datePublished,headline,reviewBody,itemReviewed_name,reviewRating_ratingValue,head_word_count,body_word_count,total_word_count,headline_clean,reviewBody_clean,lemmatized_headline,lemmatized_reviewBody,lemmatized_reviewBody_original,lemmatized_feedback,overall_rating
0,2021-01-15T13:50:34+00:00,When speaking to one advisors on…,When speaking to one of the advisors on the ph...,Avado,3,6,153,159,speaking one advisors,speaking one advisors phone felt answers given...,speak one advisor,speak one advisor phone felt answer give quest...,When speak to one of the advisor on the phone ...,speak one advisor speak one advisor phone felt...,0
1,2021-01-15T10:45:08+00:00,CIPD Level 5 Diploma,I enrolled today to study my CIPD Level 5 Dipl...,Avado,5,4,64,68,cipd level diploma,enrolled today study cipd level diploma hr man...,cipd level diploma,enrol today study cipd level diploma hr manage...,I enrol today to study my CIPD Level 5 Diploma...,cipd level diploma enrol today study cipd leve...,1


In [7]:
# prepare the data by splitting into each company and further by good and bad reviews
for company in companies:
    name = 'df_{0}'.format(company)
    df_companies['{0}_good'.format(name)] = df_companies[name][df_companies[name]['overall_rating']==1]
    df_companies['{0}_bad'.format(name)] = df_companies[name][df_companies[name]['overall_rating']==0]

In [8]:
df_companies['df_Avado_good'].head(2)

Unnamed: 0,datePublished,headline,reviewBody,itemReviewed_name,reviewRating_ratingValue,head_word_count,body_word_count,total_word_count,headline_clean,reviewBody_clean,lemmatized_headline,lemmatized_reviewBody,lemmatized_reviewBody_original,lemmatized_feedback,overall_rating
1,2021-01-15T10:45:08+00:00,CIPD Level 5 Diploma,I enrolled today to study my CIPD Level 5 Dipl...,Avado,5,4,64,68,cipd level diploma,enrolled today study cipd level diploma hr man...,cipd level diploma,enrol today study cipd level diploma hr manage...,I enrol today to study my CIPD Level 5 Diploma...,cipd level diploma enrol today study cipd leve...,1
2,2021-01-12T23:13:19+00:00,Great!,I'm enjoying the course that I'm doing. The st...,Avado,5,1,105,106,great,enjoying course study material great tutors re...,great,enjoy course study material great tutor really...,I'm enjoy the course that I'm doing. The study...,great enjoy course study material great tutor ...,1


In [9]:
df_companies['df_Avado_bad'].head(2)

Unnamed: 0,datePublished,headline,reviewBody,itemReviewed_name,reviewRating_ratingValue,head_word_count,body_word_count,total_word_count,headline_clean,reviewBody_clean,lemmatized_headline,lemmatized_reviewBody,lemmatized_reviewBody_original,lemmatized_feedback,overall_rating
0,2021-01-15T13:50:34+00:00,When speaking to one advisors on…,When speaking to one of the advisors on the ph...,Avado,3,6,153,159,speaking one advisors,speaking one advisors phone felt answers given...,speak one advisor,speak one advisor phone felt answer give quest...,When speak to one of the advisor on the phone ...,speak one advisor speak one advisor phone felt...,0
3,2021-01-08T20:26:16+00:00,Very disappointed with the service,Very disappointed with the service I have rece...,Avado,2,5,291,296,disappointed service,disappointed service received bought paid aat ...,disappointed service,disappointed service receive bought paid aat l...,Very disappointed with the service I have rece...,disappointed service disappointed service rece...,0


In [10]:
# Function: lemmatize the word based on the pos tag
def get_simple_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN

lemmatizer = WordNetLemmatizer()
def lemmatize_word(word):
    pos = pos_tag([word.strip()])
    return lemmatizer.lemmatize(word.strip(),get_simple_pos(pos[0][1]))

def lemmatize_sentence(text):
    final_text = []
    for i in text.split():
        pos = pos_tag([i.strip()])
        word = lemmatizer.lemmatize(i.strip(),get_simple_pos(pos[0][1]))
        final_text.append(word)
    return " ".join(final_text)

## 1. Weighted Word Occurence Method

Count the occurence for each word in the document, take the highest occurence and divide each word's occurence with the highest occurence to get the weighted word. The score is calculated by summing the occurence scrore for each word in the sentence.

In [11]:
def weighted_occurence_summarisation(combined_cleaned_feedback, original_feedback_sentences, n_sentence, is_lemmatized):
    '''
    combined_cleaned_feedback: a string of all cleaned feedback combined (with or without lemmatization)
    original_feedback_sentences: an array of original feedback sentences
    n_sentence: n number of sentences to be taken to form the summary
    is_lemmatized: whether to lemmatize words in calculating the word occurrences and the score
    '''
    # calculate word occurences from the combined feedback text
    word_occurences = {}
    for word in word_tokenize(combined_cleaned_feedback):
        if word not in word_occurences.keys():
            word_occurences[word] = 1
        else:
            word_occurences[word] += 1
    
    # calculate weighted word occurence
    if len(word_occurences.values()) == 0:
        return 'N/A'
    highest_occurence = max(word_occurences.values())

    for word in word_occurences.keys():
        word_occurences[word] = (word_occurences[word]/highest_occurence)
    
    # iterates each sentence and give it a score
    sentence_scores = {}
    for sent in original_feedback_sentences:
        for word in word_tokenize(sent.lower()):
            if is_lemmatized:
                word = lemmatize_word(word)
            if word in word_occurences.keys():
                if len(sent.split(' ')) < 30: # to filter out sentences
                    if sent not in sentence_scores.keys():
                        sentence_scores[sent] = word_occurences[word]
                    else:
                        sentence_scores[sent] += word_occurences[word]
    
    # sort based on the score value
    sorted_sentences = dict(sorted(sentence_scores.items(), key=lambda item: item[1], reverse=True))
    
    # take n number of sentences
    summary = []
    n = 0
    for key, value in sorted_sentences.items():
        if n == n_sentence: break
        if value <= 5: # filter out too high score to avoid short generic feedback, such as Thank you, Wonderful, etc
            summary.append(key)
            n += 1
            
    return ' '.join(summary)

### 1.1. Weighted Word from Whole Dataset by Sentiment (no lemmatization)

In [38]:
good_feedback_combined = ' '.join(str(body) for body in df_good['reviewBody_clean'].to_list())
reference = ' '.join(str(body).replace('\n','') for body in df_good['reviewBody'].to_list())
sentences = sent_tokenize(reference)
summary = weighted_occurence_summarisation(good_feedback_combined, sentences, 7, False)

# recall: how much the words in the reference appeared in the summaries
# precision: how much the words in the summary appeared in the reference
# f1 score: 2*(precision*recall)/(precision+recall)

rouge = Rouge()
score_rouge = rouge.get_scores(summary, reference)

print(score_rouge[0]['rouge-1'])
print(summary)

{'f': 0.0004809778440702294, 'p': 1.0, 'r': 0.00024054677327420764}
Really excited to work on with this course Good Course & Helpful staff.Very easy and stress-free sign up, all information provided to make easy to set up and login. Really helpful, easy to enrol, changed my course before I started, this was no hassle at all... Great course, really informative, loads of help. Keep up the good work! so far so good everything about the learn direct is perfect Great experience,great start up and materials.really happy Lovely course and very helpful customer service. Highly recommend. honest and easy to understand process - would recommend Alison was really helpful with booking the course Great learning and experience and online set up! Thoroughly enjoyed doing my course via Avado The course material was excellent, the portal looked very professional and I had great tutor support throughout my course.


In [39]:
bad_feedback_combined = ' '.join(str(body) for body in df_bad['reviewBody_clean'].to_list())
reference = ' '.join(str(body).replace('\n','') for body in df_bad['reviewBody'].to_list())
sentences = sent_tokenize(reference)
summary = weighted_occurence_summarisation(bad_feedback_combined, sentences, 7, False)

# recall: how much the words in the reference appeared in the summaries
# precision: how much the words in the summary appeared in the reference
# f1 score: 2*(precision*recall)/(precision+recall)

rouge = Rouge()
score_rouge = rouge.get_scores(summary, reference)

print(score_rouge[0]['rouge-1'])
print(summary)

{'f': 0.0018506572292130937, 'p': 1.0, 'r': 0.0009261856499441642}
INCORRECT COURSE MATERIAL - The course would often provide guidelines that were needed to complete assignments for the course then assessors would fail the assignment for using their guidelines.5. The course highlights once you get a free nail kit with the course after paying at least 50% for the course which is fine by me. Daughter found course information on this company's website to be in-correct and misleading; customer service advisor also gave in-correct course information. This is the first time I have studied through a distant learning course and the materials you receive are enough to get you through the course. It was a really good course and the instructor was flexible with being able to complete the required course work on time. Biggest mistake I made was going with this course provider , I had to pay about £600 for my course plus pay for exam fees every time . Material, course content, late exam I liked the

### 1.2. Weighted Word from Each Company (no lemmatization)

In [40]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0}:'.format(company))
    df_name = 'df_{0}'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name]['reviewBody_clean'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = weighted_occurence_summarisation(feedback_combined, sentences, 7, False)
    
    # recall: how much the words in the reference appeared in the summaries
    # precision: how much the words in the summary appeared in the reference
    # f1 score: 2*(precision*recall)/(precision+recall)

    rouge = Rouge()
    score_rouge = rouge.get_scores(summary, reference)
    
    print(score_rouge[0]['rouge-1'])
    print(summary, '\n')

Summary for BPP:
{'f': 0.0967494231898788, 'p': 1.0, 'r': 0.05083378160301237}
To anyone doing ACA exams and thinking of studying with BPP, especially towards the later or more harder ACA exams, I would strongly advise you to not use BPP. Which takes even more time and there is always a risk you may never get this second feedback due to how poor BPP is run operationally. There were also many other parts of the course that were overlooked, this has happened with other courses too at BPP I have been on at BPP. BPP give false hope to students (inc. letting students with a 2.2 onto the course, who have almost no chance of getting a TC at any City firms). To anyone reading this review I was hoping it would give you some insight into how my experience studying for my ACA exams at BPP has been. 7 phone calls waiting on hold for a long time for them to finally send my study material, by which time I had two lectures without any material. The library facilities are very poor and the tutors unre

{'f': 0.003996619961483935, 'p': 1.0, 'r': 0.0020023112392590302}
Thoroughly enjoyed doing my course via Avado The course material was excellent, the portal looked very professional and I had great tutor support throughout my course. honest and easy to understand process - would recommend Alison was really helpful with booking the course Great learning and experience and online set up! The course advisors at Avado have been extremely helpful, and all of the course information sent through so far has been very informative, clear and easy to understand. So easy to start a course, well explained, course is easy to work your way through Very helpful and informative. Excellent communication and efficient service Great customer service David was a great explaining the course and making the enrolment process really clear and easy. INCORRECT COURSE MATERIAL - The course would often provide guidelines that were needed to complete assignments for the course then assessors would fail the assignme

### 1.3. Weighted Word from Each Company and Sentiment (no lemmatization)

In [42]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0} - good feedback:'.format(company))
    df_name_good = 'df_{0}_good'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name_good]['reviewBody_clean'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_good]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = weighted_occurence_summarisation(feedback_combined, sentences, 7, False)
    rouge = Rouge()
    if len(reference) > 0:
        score_rouge = rouge.get_scores(summary, reference)
        print(score_rouge[0]['rouge-1'])       
    print(summary, '\n')
    
    print('Summary for {0} - bad feedback:'.format(company))
    df_name_bad = 'df_{0}_bad'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name_bad]['reviewBody_clean'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_bad]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = weighted_occurence_summarisation(feedback_combined, sentences, 7, False)
    rouge = Rouge()
    if len(reference) > 0:
        score_rouge = rouge.get_scores(summary, reference)
        print(score_rouge[0]['rouge-1'])
    print(summary, '\n')

Summary for BPP - good feedback:
N/A 

Summary for BPP - bad feedback:
{'f': 0.0967494231898788, 'p': 1.0, 'r': 0.05083378160301237}
To anyone doing ACA exams and thinking of studying with BPP, especially towards the later or more harder ACA exams, I would strongly advise you to not use BPP. Which takes even more time and there is always a risk you may never get this second feedback due to how poor BPP is run operationally. There were also many other parts of the course that were overlooked, this has happened with other courses too at BPP I have been on at BPP. BPP give false hope to students (inc. letting students with a 2.2 onto the course, who have almost no chance of getting a TC at any City firms). To anyone reading this review I was hoping it would give you some insight into how my experience studying for my ACA exams at BPP has been. 7 phone calls waiting on hold for a long time for them to finally send my study material, by which time I had two lectures without any material. Th

{'f': 0.23420387323525216, 'p': 0.9928571428571429, 'r': 0.13276026743075453}
upon upgrade the  some courses they will denied you to get the certificate with argument (you did not pass the test). Unfortunately if YOU HAVE TAKEN THE TEST OR RECIEVED  A DIGITAL  / ORIGINAL PAPER CERTIFICATE you can not claim back the money!!! I previously did such courses in other platforms, my experience, I really regret signing up the course. They are refusing to refund this amount because I continued a free course which I had started prior to subscribing. Paid for a course did it and passed test. It's like learning the definition of the word "finances"... good for you now you know what it is, but you cannot find a job in the finances. Go to any market broker and tell them that you would like a job because you know what the definition of their job is. 

Summary for QA Ltd - good feedback:
{'f': 0.3707165078834154, 'p': 1.0, 'r': 0.22753346080305928}
Value for Money!I have completed multiple AXELOS Foun

{'f': 0.0037885498197749506, 'p': 1.0, 'r': 0.0018978700163844893}
Really helpful, easy to enrol, changed my course before I started, this was no hassle at all... Great course, really informative, loads of help. I did my course a while back, very efficient service, everyone was helpful and got through my course very quickly Had to cancel my course due to financial problems. so far so good everything about the learn direct is perfect Great experience,great start up and materials.really happy Lovely course and very helpful customer service. Looking forward to completing my course. Thank you. Only just started the course but very easy to assess the course website is very easy to use Really enjoying my experience so far! It’s great to know that someone from learn direct are always available to answer any questions Brilliant course very informative and great to study and do the course . 

Summary for learndirect Limited - bad feedback:
{'f': 0.00950953528186205, 'p': 1.0, 'r': 0.00477748351

{'f': 0.1871559615963939, 'p': 0.9935064935064936, 'r': 0.10330857528696827}
I enquired about workshop and exam days prior to booking the course and paying my huge costs as I knew that I would be getting married. I would still like to attend the March group and feel that I should not have to suffer due to the poor service of the consultants representing your company. Unfortunately, their 3 weeks period of waiting to receive assessment results is really frustrating especially as other providers are able to mark assessment within just a few hours! I purchased the level 5 diploma and after I completed the course, I was issued with the certificate. The system also isn’t the best and I had issues from time to time with it and had to call up and request support with errors. The worst experience in a course, read before enrol, please! I've got so far as to sign up for a course and that process has been satisfactory. 

Summary for Deloitte - good feedback:
{'f': 0.999999995, 'p': 1.0, 'r': 1.0

### 1.4. Weighted Word from Whole Dataset by Sentiment (with lemmatization)

In [48]:
good_feedback_combined = ' '.join(str(body) for body in df_good['lemmatized_reviewBody'].to_list())
reference = ' '.join(str(body).replace('\n','') for body in df_good['reviewBody'].to_list())
sentences = sent_tokenize(reference)
summary = weighted_occurence_summarisation(good_feedback_combined, sentences, 7, True)

# recall: how much the words in the reference appeared in the summaries
# precision: how much the words in the summary appeared in the reference
# f1 score: 2*(precision*recall)/(precision+recall)

rouge = Rouge()
score_rouge = rouge.get_scores(summary, reference)

print(score_rouge[0]['rouge-1'])
print(summary)

{'f': 0.0004426475157027678, 'p': 1.0, 'r': 0.00022137275511466934}
Excellent customer service. Interesting courses available Great choice of courses and very easy to enrol An easy to follow well thought out course and very informative. Really excited to work on with this course Good Course & Helpful staff.Very easy and stress-free sign up, all information provided to make easy to set up and login. I learned a lot from their copywriting course, creative course, marketing course and UI/UX design course. Really helpful, easy to enrol, changed my course before I started, this was no hassle at all... Great course, really informative, loads of help. Keep up the good work! Thoroughly enjoyed doing my course via Avado The course material was excellent, the portal looked very professional and I had great tutor support throughout my course.


In [49]:
bad_feedback_combined = ' '.join(str(body) for body in df_bad['lemmatized_reviewBody'].to_list())
reference = ' '.join(str(body).replace('\n','') for body in df_bad['reviewBody'].to_list())
sentences = sent_tokenize(reference)
summary = weighted_occurence_summarisation(bad_feedback_combined, sentences, 7, True)

# recall: how much the words in the reference appeared in the summaries
# precision: how much the words in the summary appeared in the reference
# f1 score: 2*(precision*recall)/(precision+recall)

rouge = Rouge()
score_rouge = rouge.get_scores(summary, reference)

print(score_rouge[0]['rouge-1'])
print(summary)

{'f': 0.0018295262084558224, 'p': 1.0, 'r': 0.0009156006710876595}
INCORRECT COURSE MATERIAL - The course would often provide guidelines that were needed to complete assignments for the course then assessors would fail the assignment for using their guidelines.5. Some Courses are good, some are really bad.I enrolled in some courses with my college email id to get premium courses free. I know a lot of work goes into creating courses, but without considered course delivery and implementation the course is little more than chunked information. The course highlights once you get a free nail kit with the course after paying at least 50% for the course which is fine by me. I took one course 3 years ago and didn't like it, I took another course recently but they are just getting worse. Firebrand do offer good courses and accelerated training, I have had 3 courses with Firebrand over the course of my apprenticeship. This is the first time I have studied through a distant learning course and th

### 1.5. Weighted Word from Each Company (with lemmatization)

In [46]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0}:'.format(company))
    df_name = 'df_{0}'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name]['lemmatized_reviewBody'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = weighted_occurence_summarisation(feedback_combined, sentences, 7, True)
    
    # recall: how much the words in the reference appeared in the summaries
    # precision: how much the words in the summary appeared in the reference
    # f1 score: 2*(precision*recall)/(precision+recall)

    rouge = Rouge()
    score_rouge = rouge.get_scores(summary, reference)
    
    print(score_rouge[0]['rouge-1'])
    print(summary, '\n')

Summary for BPP:
{'f': 0.09626215986200023, 'p': 1.0, 'r': 0.05056481979558903}
BPP give false hope to students (inc. letting students with a 2.2 onto the course, who have almost no chance of getting a TC at any City firms). Which takes even more time and there is always a risk you may never get this second feedback due to how poor BPP is run operationally. For students booking themselves onto their courses, look out for their terms and conditions if you do book courses with them as they can be quite unfavourable. There were also many other parts of the course that were overlooked, this has happened with other courses too at BPP I have been on at BPP. To anyone reading this review I was hoping it would give you some insight into how my experience studying for my ACA exams at BPP has been. 7 phone calls waiting on hold for a long time for them to finally send my study material, by which time I had two lectures without any material. I paid for Audit and Assurance course material (ACCA) o

{'f': 0.00390544703193649, 'p': 1.0, 'r': 0.001956544125218824}
honest and easy to understand process - would recommend Alison was really helpful with booking the course Great learning and experience and online set up! So easy to start a course, well explained, course is easy to work your way through Very helpful and informative. INCORRECT COURSE MATERIAL - The course would often provide guidelines that were needed to complete assignments for the course then assessors would fail the assignment for using their guidelines.5. Excellent communication and efficient service Great customer service David was a great explaining the course and making the enrolment process really clear and easy. The course advisors at Avado have been extremely helpful, and all of the course information sent through so far has been very informative, clear and easy to understand. Really easy process, with very helpful account managers I didn’t start the course  yet, so I cannot say anything about the course. This c

### 1.6. Weighted Word from Each Company and Sentiment (with lemmatization)

In [47]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0} - good feedback:'.format(company))
    df_name_good = 'df_{0}_good'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name_good]['lemmatized_reviewBody'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_good]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = weighted_occurence_summarisation(feedback_combined, sentences, 7, True)
    rouge = Rouge()
    if len(reference) > 0:
        score_rouge = rouge.get_scores(summary, reference)
        print(score_rouge[0]['rouge-1'])
    print(summary, '\n')
    
    print('Summary for {0} - bad feedback:'.format(company))
    df_name_bad = 'df_{0}_bad'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name_bad]['lemmatized_reviewBody'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_bad]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = weighted_occurence_summarisation(feedback_combined, sentences, 7, True)
    rouge = Rouge()
    if len(reference) > 0:
        score_rouge = rouge.get_scores(summary, reference)
        print(score_rouge[0]['rouge-1'])
    print(summary, '\n')

Summary for BPP - good feedback:
N/A 

Summary for BPP - bad feedback:
{'f': 0.09626215986200023, 'p': 1.0, 'r': 0.05056481979558903}
BPP give false hope to students (inc. letting students with a 2.2 onto the course, who have almost no chance of getting a TC at any City firms). Which takes even more time and there is always a risk you may never get this second feedback due to how poor BPP is run operationally. For students booking themselves onto their courses, look out for their terms and conditions if you do book courses with them as they can be quite unfavourable. There were also many other parts of the course that were overlooked, this has happened with other courses too at BPP I have been on at BPP. To anyone reading this review I was hoping it would give you some insight into how my experience studying for my ACA exams at BPP has been. 7 phone calls waiting on hold for a long time for them to finally send my study material, by which time I had two lectures without any material. I

{'f': 0.06485025832321198, 'p': 1.0, 'r': 0.03351175238538515}
I took one course 3 years ago and didn't like it, I took another course recently but they are just getting worse. Course: DevOps I have recently completed two Udacity "nanodegree" courses: UX Design and Product Management. A fraud, the free courses are taught well but the main (PAID) courses are taught badly and don't make sense. Each course gave me a better understanding of what the jobs are about (something I couldn't find in short courses from coursera or udemy). I have done a previous IT degree  -  the course is FAR TOO EXPENSIVE and the quality of teaching for the paid courses are terrible. The refund policy: You have only two days to decide if you want to continue doing the course or not. Ultimately, as interesting as the course seems, the inability to receive help made me ask for a refund. 

Summary for FutureLearn - good feedback:
{'f': 0.2577092488562994, 'p': 1.0, 'r': 0.14791403286978508}
For those of us lucky en

{'f': 0.005307062789685593, 'p': 1.0, 'r': 0.002660591384144787}
Excellent communication and efficient service Great customer service David was a great explaining the course and making the enrolment process really clear and easy. honest and easy to understand process - would recommend Alison was really helpful with booking the course Great learning and experience and online set up! So easy to start a course, well explained, course is easy to work your way through Very helpful and informative. The course advisors at Avado have been extremely helpful, and all of the course information sent through so far has been very informative, clear and easy to understand. This course was really simple to set up, great communication, the course looks great and the site looks easy to navigate round. Can’t wait to start the course in a few weeks Quick answer, quick proposal, great service Straight forward process with a most helpful advisor supporting you along the way. Great learning systems Very easy

{'f': 0.016419919083447667, 'p': 1.0, 'r': 0.008277921020491246}
Excellent courses for my needs as developer Pluralsight is awesome, for me the video courses hits the right spot with clear concise and well presented videos on the subject. I was trainsignal user so i expect more system administration courses, currently it appears more application development courses are released I really love the courses available on Pluralsight. The only problem is that I have not enough time to watch all courses :-) High quality courses, great authors.I Recommend Pluralsight to anyone who is interested in development. Pluralsight is great learning portal to get started to any technological courses you would like to pursue and get insight on that technology. Great place to learn new stuff Glad that i am long term user of Pluralsight, It is helping me to learn .Net as well as other latest technologies. updated topics with good teachers Use it every day to learn new things I have learned a lot from the f

## 2. TF-IDF Method

The difference with the weighted word occurences method should be that in TF-IDF, highly frequent words (excluding stopwords) in many sentences are penalised. Hence, the scoring algorithm is different (using TD-IDF method) which processes per sentence rather than per document.

In [50]:
def tf_idf_summarisation(combined_cleaned_feedback, original_feedback_sentences, n_sentence, is_lemmatized, original_lemmatized_feedback_sentences):
    '''
    combined_cleaned_feedback: a string of all cleaned feedback combined (with or without lemmatization)
    original_feedback_sentences: an array of original feedback sentences
    n_sentence: n number of sentences to be taken to form the summary
    is_lemmatized: whether to lemmatize words in calculating the word occurrences and the score
    original_lemmatized_feedback_sentences: an array of original lemmatized feedback sentences (used only if is_lemmatized)
    '''
        
    # calculate inverse document frequency idf
    # IDF(w) = log(Total number of sentences / Number of sentences with word w in it)
    word_idf = {}
    for word in word_tokenize(combined_cleaned_feedback):
        if word in word_idf.keys():
            continue
        contained_sentence = 0 
                
        if is_lemmatized:
            for sent in original_lemmatized_feedback_sentences:
                if word in sent.lower():
                    contained_sentence += 1
        else:
            for sent in original_feedback_sentences:
                if word in sent.lower():
                    contained_sentence += 1
        
        if contained_sentence == 0: contained_sentence += 1 # smoothing, to avoid zero division
        
        word_idf[word] = np.log(len(original_feedback_sentences)/contained_sentence)
        
    # calculate sentence score
    sentence_scores = {}

    for sent in original_feedback_sentences:
        sent_cleaned = " ".join(re.findall("[a-zA-Z]+", sent)).lower()
        sent_cleaned = " ".join([i for i in sent_cleaned.split() if i not in stopwords])
        sentence_scores[sent] = calculate_sentence_score(sent_cleaned, word_idf, is_lemmatized)
        
    sorted_sentences = dict(sorted(sentence_scores.items(), key=lambda item: item[1], reverse=True))
    
    # take n number of sentences
    summary = []
    n = 0
    for key, value in sorted_sentences.items():
        if n == n_sentence: break
        if value <= 15: # filter out too high score to avoid short generic feedback, such as Thank you, Wonderful, etc
            summary.append(key)
            n += 1

    return ' '.join(summary)

# calculate sentence score
def calculate_sentence_score(sentence, word_idf, is_lemmatized):
    sentence_score = 0
    tokenized_sentence = word_tokenize(sentence)
    for word in tokenized_sentence:
        # calculate term frequency tf
        # TF(w) = (Number of times word w appears in a sentence) / (Total number of words in the sentence) 
        tf = sentence.count(word)/len(tokenized_sentence)

        # get the IDF score from dictionary
        if is_lemmatized:
            word = lemmatize_word(word)
        if word in word_idf.keys():
            idf = word_idf[word]
        else:
            idf = 0

        sentence_score += tf*idf
    return sentence_score

### 2.1. TF-IDF from Whole Dataset by Sentiment (no lemmatization)

In [51]:
good_feedback_combined = ' '.join(str(body) for body in df_good['reviewBody_clean'].to_list())
reference = ' '.join(str(body).replace('\n','') for body in df_good['reviewBody'].to_list())
sentences = sent_tokenize(reference)
summary = tf_idf_summarisation(good_feedback_combined, sentences, 7, False, None)

# recall: how much the words in the reference appeared in the summaries
# precision: how much the words in the summary appeared in the reference
# f1 score: 2*(precision*recall)/(precision+recall)

rouge = Rouge()
score_rouge = rouge.get_scores(summary, reference)

print(score_rouge[0]['rouge-1'])
print(summary)

{'f': 0.001198526914775933, 'p': 0.9971014492753624, 'r': 0.0005996238406255611}
The Kafka module covers the majority of components/tools used in Kafka ecosystem, the Teacher has a very good knowledge and excellent slides complemented by code examples to explain how Kafka works.The project gave me an excellent overview about kafka and related components of kafka ecosystem. !Leaders inspire other leaders to make more leaders that’s the only way! It was a wonderful education journey…It was a wonderful education journey that will satisfy my needs and meet all my expectations,In Digital Marketing Nanodegree they will teach all points and have a great practicing with live monitoring for all the sectionsThank you Udacity It was a wonderful education journey…It was a wonderful education journey that will satisfy my needs and meet all my expectations,In Digital Marketing Nanodegree they will teach all points and have a great practicing with live monitoring for all the sectionsThank you Udacity

In [52]:
bad_feedback_combined = ' '.join(str(body) for body in df_bad['reviewBody_clean'].to_list())
reference = ' '.join(str(body).replace('\n','') for body in df_bad['reviewBody'].to_list())
sentences = sent_tokenize(reference)
summary = tf_idf_summarisation(bad_feedback_combined, sentences, 7, False, None)

# recall: how much the words in the reference appeared in the summaries
# precision: how much the words in the summary appeared in the reference
# f1 score: 2*(precision*recall)/(precision+recall)

rouge = Rouge()
score_rouge = rouge.get_scores(summary, reference)

print(score_rouge[0]['rouge-1'])
print(summary)

{'f': 0.0017555641160730056, 'p': 1.0, 'r': 0.000878553245089893}
scam scam scam Most pathetic scam ever, never ever fall for such scam. Horrible Horrible horrible company. Horrid Horrid experience. Excuse after excuse! There is 1 dish choice for lunch and 1 dish choice for dinner. I believe sample questions provided by was wastage of time as exam question was no where related to those.If instructor advised to buy the sample questions book from ISC2 website then that should have been advised well before so people can buy in advance.i believe sample questions provided should be replaced witht he sample questions book from ISC as only around 35-40 candidates passed which was well below the % claimed. I am too used to XML rather than http and you should see the bizarre effects you can get:if instead of writing < i glyphicon glypohicon-world > < / i > you write  < i  glyphicon glypohicon-world / > In my instance I only got 5 distinct instances of the icon instead of the expected 1.


### 2.2. TF-IDF from Each Company (no lemmatization)

In [53]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0}:'.format(company))
    df_name = 'df_{0}'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name]['reviewBody_clean'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = tf_idf_summarisation(feedback_combined, sentences, 7, False, None)
    # recall: how much the words in the reference appeared in the summaries
    # precision: how much the words in the summary appeared in the reference
    # f1 score: 2*(precision*recall)/(precision+recall)

    rouge = Rouge()
    score_rouge = rouge.get_scores(summary, reference)

    print(score_rouge[0]['rouge-1'])
    print(summary, '\n')

Summary for BPP:
{'f': 0.26261682014857635, 'p': 1.0, 'r': 0.1511565357719204}
I then rang customer service who said that I had emailed the wrong email address and should have emailed the first address which is why it never got booked, even though I had emailed that address in the first place.There has been another occasion where i emailed in and didn't get a response after 2 weeks.I have had a number of calls to customer service where they have been rude and unhelpful, and in one case I was told an issue would be rectified, I rang again a few days later (a lovely lady answered) and she informed me that the previous call handler didn't even log an issue in the first place.I have had issues with access to my hub account, which have taken 2 weeks to sort, which meant i was 2 weeks behind on my course as i had no access to learning materials; however they refused to move me to a course which had a later start date, so I have been unable to qualify for pass assurance.On a number of times I

{'f': 0.4410517353237782, 'p': 1.0, 'r': 0.28291621327529926}
Second my certificate says .... ''The certificate and transcript do not implythe award of credit or the conferment of a qualification from Universityof Leeds and National Institute for Health Research''...the course was clearly by NIHR and University   if Leeds I was not warned or told about this .. Is there any value in this certificate?Look at this from Futurelearn;  Is FutureLearn accredited?All certificates state the minimum number of hours required per week and the length of the course, so you can claim that many hours of CPD (Continuing Professional Development) by showing your employer or organisation the certificate.The majority of learners who purchase them add them to their CV, but it's important to note that certificates do not imply the award of credit points or the conferment of a university qualification.Where a course or program is accredited this information will be included on the course  .... this is from f

{'f': 0.38126009383807047, 'p': 0.9971830985915493, 'r': 0.23568575233022637}
In my experience the revenue-model of Linkedin has changed into a no “matter what” revenue-model. Focus on skills, request skills endorsements and recommendations. Disable their app but it's re-enabled every time motorola send a "security update" - or rather an insecurity update. Also connect with many colleges with similar roles and exchange in intelligent STORIES that can a) attract comments of experts with similar roles b) most importantly when recruiters see your cv they will realise you CAN EN-LIGHT THE EXPERTS!! At the time I accepted this, the text on the website assured me that I would be contacted before any payment for this subscription was taken, after the free trial period ended.Of course I never received any notification, and they just took the money, via PayPal.LinkedIn's customer support email address, as supplied to PayPal, is unmonitored, which rang alarm bells.PayPal, unsurprisingly, found i

### 2.3. TF-IDF from Each Company and Sentiment (no lemmatization)

In [54]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0} - good feedback:'.format(company))
    df_name_good = 'df_{0}_good'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name_good]['reviewBody_clean'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_good]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = tf_idf_summarisation(feedback_combined, sentences, 7, False, None)
    rouge = Rouge()
    if len(reference) > 0:
        score_rouge = rouge.get_scores(summary, reference)
        print(score_rouge[0]['rouge-1'])
    print(summary, '\n')
    
    print('Summary for {0} - bad feedback:'.format(company))
    df_name_bad = 'df_{0}_bad'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name_bad]['reviewBody_clean'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_bad]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    summary = tf_idf_summarisation(feedback_combined, sentences, 7, False, None)
    rouge = Rouge()
    if len(reference) > 0:
        score_rouge = rouge.get_scores(summary, reference)
        print(score_rouge[0]['rouge-1'])
    print(summary, '\n')

Summary for BPP - good feedback:
 

Summary for BPP - bad feedback:
{'f': 0.26261682014857635, 'p': 1.0, 'r': 0.1511565357719204}
I then rang customer service who said that I had emailed the wrong email address and should have emailed the first address which is why it never got booked, even though I had emailed that address in the first place.There has been another occasion where i emailed in and didn't get a response after 2 weeks.I have had a number of calls to customer service where they have been rude and unhelpful, and in one case I was told an issue would be rectified, I rang again a few days later (a lovely lady answered) and she informed me that the previous call handler didn't even log an issue in the first place.I have had issues with access to my hub account, which have taken 2 weeks to sort, which meant i was 2 weeks behind on my course as i had no access to learning materials; however they refused to move me to a course which had a later start date, so I have been unable t

{'f': 0.04602699964092676, 'p': 1.0, 'r': 0.023555596772697765}

Summary for Udacity - good feedback:
{'f': 0.030759043819213232, 'p': 1.0, 'r': 0.015619746293749045}
It was a wonderful education journey…It was a wonderful education journey that will satisfy my needs and meet all my expectations,In Digital Marketing Nanodegree they will teach all points and have a great practicing with live monitoring for all the sectionsThank you Udacity It was a wonderful education journey…It was a wonderful education journey that will satisfy my needs and meet all my expectations,In Digital Marketing Nanodegree they will teach all points and have a great practicing with live monitoring for all the sectionsThank you Udacity I jumped into the nanodegree course not knowing what to expect. The Kafka module covers the majority of components/tools used in Kafka ecosystem, the Teacher has a very good knowledge and excellent slides complemented by code examples to explain how Kafka works.The project gave me

{'f': 0.6632653016894002, 'p': 1.0, 'r': 0.4961832061068702}
Went through the full process with them, applied for a role, did multiple interviews and was told that I had the role.Then weeks went by with no contact, I tried to contact both of the staff at QA dealing with the role no response via email and won't pick up the phone.Contacted head office and they can't get me through to them either or talk about the position.So I resorted to contacting the business who would have been taking me on as an apprentice who have also told me they had been cut off by QA.This isn't just days or weeks of being cut off now it's been months. I recently attended a 3 day BCS Foundation Certificate in Business Analyst provided by this company recently. I am not someone who typical writes reviews, however I do feel let down by this organisation and can really relate to the numerous poor reviews attributed to this company. Was appalled by anti Irish racist commentary and aggressive teaching style. The cust

{'f': 0.006043941024930135, 'p': 1.0, 'r': 0.0030311305297651557}
The best investment is in knowledge and investment In our knowledge will be a very good investment which will yield life long returns. I have already recommended to my friends and some of them already were already in contacted .... I was told if I put a deposit down I could change my mind within 14 days but when I came to change my mind after a few hours in was told the deposit was nonrefunadable and I'd have to cancel in writing by post which was not explained to me; I was lead to believe I could just change my mind within 14 days easier than the actual process. I have been unwell this last week and added unnecessary stress upon myself regarding my studies.David Cox contacted me to see how I was getting on and within minutes Saraya Morgan contacted me.Saraya actively listened to me; she was empathic and acknowledged my anxieties and concerns;Saraya gave me advice and guidance which has reduced my anxieties and her appro

{'f': 0.03616950248101564, 'p': 1.0, 'r': 0.018417833355987494}
if instead of writing < i glyphicon glypohicon-world > < / i > you write  < i  glyphicon glypohicon-world / > In my instance I only got 5 distinct instances of the icon instead of the expected 1. 2) On the website: when you actually load up the course details and see the course overview with the chapters and episodes, it would be very nice if, when you have already watched some episodes of a chapter, to have the most recently watched chapter episode list automatically open out and the browser scroll down to the next unwatched episode - sounds kind of minor but it's actually pretty annoying after a while. 4) Chromecast: Chromecast is supported on the iPhone (I think), please can this be done for android devices as i love the chromecast and find it really useful. The rewind and fast-forward features need to become more prominent in the interface and allow granular control so that I can move back three seconds exactly, instea

### 2.4. TF-IDF from Whole Dataset by Sentiment (with lemmatization)

In [55]:
good_feedback_combined = ' '.join(str(body) for body in df_good['lemmatized_reviewBody'].to_list())
reference = ' '.join(str(body).replace('\n','') for body in df_good['reviewBody'].to_list())
sentences = sent_tokenize(reference)
lemmatized_sentences = sent_tokenize(' '.join(str(body).replace('\n','') for body in df_good['lemmatized_reviewBody_original'].to_list()))
summary = tf_idf_summarisation(good_feedback_combined, sentences, 7, True, lemmatized_sentences)

# recall: how much the words in the reference appeared in the summaries
# precision: how much the words in the summary appeared in the reference
# f1 score: 2*(precision*recall)/(precision+recall)

rouge = Rouge()
score_rouge = rouge.get_scores(summary, reference)

print(score_rouge[0]['rouge-1'])
print(summary)

{'f': 0.0008119499436721188, 'p': 0.9957264957264957, 'r': 0.00040614056647022016}
The Kafka module covers the majority of components/tools used in Kafka ecosystem, the Teacher has a very good knowledge and excellent slides complemented by code examples to explain how Kafka works.The project gave me an excellent overview about kafka and related components of kafka ecosystem. My favorite course was Seth Godin's "Seth Godin's Freelancer Course." !Leaders inspire other leaders to make more leaders that’s the only way! It was a wonderful education journey…It was a wonderful education journey that will satisfy my needs and meet all my expectations,In Digital Marketing Nanodegree they will teach all points and have a great practicing with live monitoring for all the sectionsThank you Udacity It was a wonderful education journey…It was a wonderful education journey that will satisfy my needs and meet all my expectations,In Digital Marketing Nanodegree they will teach all points and have a gre

In [57]:
bad_feedback_combined = ' '.join(str(body) for body in df_bad['lemmatized_reviewBody'].to_list())
reference = ' '.join(str(body).replace('\n','') for body in df_bad['reviewBody'].to_list())
sentences = sent_tokenize(reference)
lemmatized_sentences = sent_tokenize(' '.join(str(body).replace('\n','') for body in df_bad['lemmatized_reviewBody_original'].to_list()))
summary = tf_idf_summarisation(bad_feedback_combined, sentences, 7, True, lemmatized_sentences)

# recall: how much the words in the reference appeared in the summaries
# precision: how much the words in the summary appeared in the reference
# f1 score: 2*(precision*recall)/(precision+recall)

rouge = Rouge()
score_rouge = rouge.get_scores(summary, reference)

print(score_rouge[0]['rouge-1'])
print(summary)

{'f': 0.0017555641160730056, 'p': 1.0, 'r': 0.000878553245089893}
scam scam scam Most pathetic scam ever, never ever fall for such scam. Horrible Horrible horrible company. Excuse after excuse! Horrid Horrid experience. There is 1 dish choice for lunch and 1 dish choice for dinner. I believe sample questions provided by was wastage of time as exam question was no where related to those.If instructor advised to buy the sample questions book from ISC2 website then that should have been advised well before so people can buy in advance.i believe sample questions provided should be replaced witht he sample questions book from ISC as only around 35-40 candidates passed which was well below the % claimed. I am too used to XML rather than http and you should see the bizarre effects you can get:if instead of writing < i glyphicon glypohicon-world > < / i > you write  < i  glyphicon glypohicon-world / > In my instance I only got 5 distinct instances of the icon instead of the expected 1.


### 2.5. TF-IDF from Each Company (with lemmatization)

In [58]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0}:'.format(company))
    df_name = 'df_{0}'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name]['lemmatized_reviewBody'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    lemmatized_sentences = sent_tokenize(' '.join(str(body).replace('\n','') for body in df_companies[df_name]['lemmatized_reviewBody_original'].to_list()))
    summary = tf_idf_summarisation(feedback_combined, sentences, 7, True, lemmatized_sentences)
    # recall: how much the words in the reference appeared in the summaries
    # precision: how much the words in the summary appeared in the reference
    # f1 score: 2*(precision*recall)/(precision+recall)

    rouge = Rouge()
    score_rouge = rouge.get_scores(summary, reference)

    print(score_rouge[0]['rouge-1'])
    print(summary, '\n')

Summary for BPP:
{'f': 0.26261682014857635, 'p': 1.0, 'r': 0.1511565357719204}
I then rang customer service who said that I had emailed the wrong email address and should have emailed the first address which is why it never got booked, even though I had emailed that address in the first place.There has been another occasion where i emailed in and didn't get a response after 2 weeks.I have had a number of calls to customer service where they have been rude and unhelpful, and in one case I was told an issue would be rectified, I rang again a few days later (a lovely lady answered) and she informed me that the previous call handler didn't even log an issue in the first place.I have had issues with access to my hub account, which have taken 2 weeks to sort, which meant i was 2 weeks behind on my course as i had no access to learning materials; however they refused to move me to a course which had a later start date, so I have been unable to qualify for pass assurance.On a number of times I

{'f': 0.372010625846964, 'p': 1.0, 'r': 0.22850924918389554}
Second my certificate says .... ''The certificate and transcript do not implythe award of credit or the conferment of a qualification from Universityof Leeds and National Institute for Health Research''...the course was clearly by NIHR and University   if Leeds I was not warned or told about this .. Is there any value in this certificate?Look at this from Futurelearn;  Is FutureLearn accredited?All certificates state the minimum number of hours required per week and the length of the course, so you can claim that many hours of CPD (Continuing Professional Development) by showing your employer or organisation the certificate.The majority of learners who purchase them add them to their CV, but it's important to note that certificates do not imply the award of credit points or the conferment of a university qualification.Where a course or program is accredited this information will be included on the course  .... this is from fu

{'f': 0.012863070411615676, 'p': 1.0, 'r': 0.006473167675923992}
I was scared at first because all my life I have only known face-to-face and classroom sessions. Don't hassle you continuously which is a breath of fresh air. I find this behavior inappropriate. My introductory webinar is tonight at 8pm, wish me luck! Can't fault it! Very smooth transaction. Whats the point of a provisional booking if it actually can't book anything? 

Summary for Deloitte:
{'f': 0.9408283973780681, 'p': 0.9875776397515528, 'r': 0.8983050847457628}
The most dishonest unprofessional company I've ever been involved with Very bad company !They won't Answer any of My E-mails, just Tell Me to Ask Scottish Power, who have NO Contact details for the Administrators other then "help" at "tonikenergy.com"The Administrators need to Transfer Customers Credits to Scottish Power NOW, People need that Credit to Pay for Winter Energy Usage !!!! Absolutely appalling experience when trying to obtain any information or inde

### 2.6. TF-IDF from Each Company and Sentiment (with lemmatization)

In [59]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0} - good feedback:'.format(company))
    df_name_good = 'df_{0}_good'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name_good]['lemmatized_reviewBody'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_good]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    lemmatized_sentences = sent_tokenize(' '.join(str(body).replace('\n','') for body in df_companies[df_name_good]['lemmatized_reviewBody_original'].to_list()))
    summary = tf_idf_summarisation(feedback_combined, sentences, 7, True, lemmatized_sentences)
    rouge = Rouge()
    if len(reference) > 0:
        score_rouge = rouge.get_scores(summary, reference)
        print(score_rouge[0]['rouge-1'])
    print(summary, '\n')
    
    print('Summary for {0} - bad feedback:'.format(company))
    df_name_bad = 'df_{0}_bad'.format(company)
    feedback_combined = ' '.join(str(body) for body in df_companies[df_name_bad]['lemmatized_reviewBody'].to_list())
    reference = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_bad]['reviewBody'].to_list())
    sentences = sent_tokenize(reference)
    lemmatized_sentences = sent_tokenize(' '.join(str(body).replace('\n','') for body in df_companies[df_name_bad]['lemmatized_reviewBody_original'].to_list()))
    summary = tf_idf_summarisation(feedback_combined, sentences, 7, True, lemmatized_sentences)
    rouge = Rouge()
    if len(reference) > 0:
        score_rouge = rouge.get_scores(summary, reference)
        print(score_rouge[0]['rouge-1'])
    print(summary, '\n')

Summary for BPP - good feedback:
 

Summary for BPP - bad feedback:
{'f': 0.26261682014857635, 'p': 1.0, 'r': 0.1511565357719204}
I then rang customer service who said that I had emailed the wrong email address and should have emailed the first address which is why it never got booked, even though I had emailed that address in the first place.There has been another occasion where i emailed in and didn't get a response after 2 weeks.I have had a number of calls to customer service where they have been rude and unhelpful, and in one case I was told an issue would be rectified, I rang again a few days later (a lovely lady answered) and she informed me that the previous call handler didn't even log an issue in the first place.I have had issues with access to my hub account, which have taken 2 weeks to sort, which meant i was 2 weeks behind on my course as i had no access to learning materials; however they refused to move me to a course which had a later start date, so I have been unable t

{'f': 0.023143392121235097, 'p': 1.0, 'r': 0.01170716796576494}
It was a wonderful education journey…It was a wonderful education journey that will satisfy my needs and meet all my expectations,In Digital Marketing Nanodegree they will teach all points and have a great practicing with live monitoring for all the sectionsThank you Udacity It was a wonderful education journey…It was a wonderful education journey that will satisfy my needs and meet all my expectations,In Digital Marketing Nanodegree they will teach all points and have a great practicing with live monitoring for all the sectionsThank you Udacity I jumped into the nanodegree course not knowing what to expect. The Kafka module covers the majority of components/tools used in Kafka ecosystem, the Teacher has a very good knowledge and excellent slides complemented by code examples to explain how Kafka works.The project gave me an excellent overview about kafka and related components of kafka ecosystem. i learned following after

{'f': 0.6155878424922826, 'p': 0.9957264957264957, 'r': 0.44550669216061184}
The course materials are better structured, more comprehensive in covering the syllabus and better quality in this presentation.The structure of the day and the week is well organised and admin around organising the actual exams is made as easy as possible with the exams being booked in advance to take place within the week of the course (unlike competitors who have moved away from this to drive down their own costs! Thank you Had a fantastic one day course led by Dragoslav, there's only so much you can learn in one day but this introduction from QA gave me a great start has inspired me to continue my web development journey. Good quality training - good trainers and well designed course materials. The trainers are both well experienced in the relevant field but also talented trainers who understand how to train a room full of individuals with differing experiences themselves and ways of Learning. Any question

{'f': 0.0015552948080563985, 'p': 0.9827586206896551, 'r': 0.0007782632441288913}
The best investment is in knowledge and investment In our knowledge will be a very good investment which will yield life long returns. Ameeen! It was a no brainier. They are distinctively illustrative and profoundly effectuating. And for that Diolch! No drama. I have already recommended to my friends and some of them already were already in contacted .... 

Summary for learndirect Limited - bad feedback:
{'f': 0.013206087049621036, 'p': 1.0, 'r': 0.006646933582593342}
Horrible Horrible horrible company. Excuse after excuse! So I enrolled and 15 days later call to explain that I have come across some financial emergencies and won’t be able to make payments, I was told you cannot cancel the course after 14 days, so I even agreed to pay for the first month even though I didn’t do anything at all, haven’t started but now I’m told whether I continue with the course or not I am still required to pay the whole £

## 3. BERT Extractive Summarisation Library

Can only handle text less than 1000000 length (number of characters).

In [31]:
def bert_summarisation(sentences):
    if len(sentences) > 1000000:
        return 'Feedback length exceeds 1000000 number of characters. It is going to cause memory allocation error.'
    model = Summarizer()
    result = model(sentences, min_length=30, num_sentences=7)
    summary = ''.join(result)
    return summary

### 3.1. BERT from Whole Dataset by Sentiment

In [22]:
sentences = ' '.join(str(body).replace('\n','') for body in df_good['reviewBody'].to_list())
bert_summarisation(sentences)

'Feedback length exceeds 1000000 number of characters. It is going to cause memory allocation error.'

In [23]:
sentences = ' '.join(str(body).replace('\n','') for body in df_bad['reviewBody'].to_list())
bert_summarisation(sentences)

'Feedback length exceeds 1000000 number of characters. It is going to cause memory allocation error.'

### 3.2. BERT from Each Company

In [24]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0}:'.format(company))
    df_name = 'df_{0}'.format(company)
    sentences = ' '.join(str(body).replace('\n','') for body in df_companies[df_name]['reviewBody'].to_list())
    summary = bert_summarisation(sentences)
    print(summary, '\n')

Summary for General Assembly:
I wasn't a student, I was a teacher here. I taught the full-stack web-development course. I'd always thought that bootcamps were a nice idea for getting a very basic level of competence, but were pretty pricey for the privilege for what you could get from an online course. So I was pleasantly surprised at how effective this teaching style was for ramping up students from basically knowing nothing about coding to being a hireable junior developer. 

Summary for Udacity:
it's excellent and better than other sites I experienced.specially great support from tutors.but for being better:1. I will nominate Udasti to anyone who need to learns marketing it going pretty good, decent information loved how it was presented and the support from the team Udacity Android Nanodegree is GoodThere are some things that are confusing when doing the project, like things not specified or vaguely defined that make you think the project needs things that weren't taught. Altogethe

Did a free course and enjoyed it - shame the good ones are difficult to find I purchased the course How to Study & Learn Effectively- Study & Learning Skills- I found the course effective and not too long. I had my exams booked and bought a practise test for mock exams, to my surprise I was not able to access it just a week before my actual exam, I asked udemy for help and they said they can't help me as its not uncommon for the instructors on the site to remove a course outside of our user agreement policies. These people have really bad systems, lock your account if you login from different IPS -- I travel all the time, what the hell?!!When I asked their customer service they were super unhelpful,  "oh it's how the system works" == not our problem, we don't care. I am learning new things in this lockdown period. It is a very cool platform where you can study everything you want for a cheap price. There are few exceptional experts in their fields on this platform but they are very rar

### 3.3. BERT from Each Company and Sentiment

In [32]:
companies = list(set(df['itemReviewed_name']))

for company in companies:
    print('Summary for {0} - good feedback:'.format(company))
    df_name_good = 'df_{0}_good'.format(company)
    sentences = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_good]['reviewBody'].to_list())
    summary = bert_summarisation(sentences)
    print(summary, '\n')
    
    print('Summary for {0} - bad feedback:'.format(company))
    df_name_bad = 'df_{0}_bad'.format(company)
    sentences = ' '.join(str(body).replace('\n','') for body in df_companies[df_name_bad]['reviewBody'].to_list())
    summary = bert_summarisation(sentences)
    print(summary, '\n')

Summary for DPG plc - good feedback:
I waited 4 months to review DPG so that it can be more insightful. Hi there, I will leave a short review due the fact that I just enrolled on the CIPD course, but so far the staff from DPG were most helpful and prompt in my requests by phone or by email. I was very close to signing up with a different provider before discovered DPG, the reason why I changed my opinion in the last minutes is because customer service is a big thing for me. I have really enjoyed my onboarding experience with DPG.The signup process was quick and easy. All my questions have been answered quickly in a professional way I am due to commence my Level 5 HRM Diploma on Monday 3rd February so have yet to comment on the learning provider itself as of yet, but I must say that the Customer Service has been absolutely exceptional. Excellent customer service and support. thank you so much i love dealing with you Friendly staff and advice, an easy process to join the course. Quick to

No email or explanation to the users. less practical more theoretical discussion which is less interesting. I have not found a way to look at my course history, beyond the 10 'most recent' courses. The training path was a bit disappointing because it was added with the blog section rather than adopt Digital Tutors training path style system. These people are scammers/manipulative... don't let them get you This building just went up in Draper it is an eye sore,  They promoted global authors. The only option is to delete your account along with payment history. But instead of rolling back and fixing the major issues, software is still in production but seriously lacking ,especially the mobile one (tablet), Offline is not offline because requires from time to time to have a connection, very handy if you are on a train,...

Summary for learndirect Limited - good feedback:
Really easy and simple to cancel. I feel really supported on my course. Due to my personal finance result I couldn't st

 

Summary for Udacity - good feedback:
it's excellent and better than other sites I experienced.specially great support from tutors.but for being better:1. it helps me a lot, obviously my skills is upgrading, I enjoyed this practical, valuable experience The program and the project are very useful . Altogether I am very happy with my decision to take the course. By the end of the course I'll have paid a lot more, because everything I pay in AWS is USD and my currency is extremely undervalued compared to USD I had very good learning experience so far for the first half of the Data Streaming Nanodegree program. The library of questions from past students and the fast response time of the mentors in the forum also helps in solving technical problems encountered during the project. I do recommend Udacity programs for the people who want to challenge themselves to be highly qualified for the labor market This is my first experience in programming, so it wasn't fun at all when it came to su

Most people do not how to get the most of LinkedIn. Focus on skills, request skills endorsements and recommendations. Even if you apply in other job-boards, i.e. indeed, cv library, always provide  your LinkedIn page, so they realise you are endorsed and recommended. and your are fully endorsed and recommended. Extremely Poor Customer Service At LinkedIn:You literally cannot get hold of a human being - either by email, feedback forms or phone. As far as I am concerned LinkedIn is housed by a set of useless, faceless bureaucrats supposedly supporting the business community. What the hell do LinkedIn think they are playing at?1 star only our of 5. Great for networking and reaching out to old work mates 

Summary for Linkedin - bad feedback:
When a paid service gets renewed you have to send a reminder message!Feels like a scam: not even a thank you email after getting a new payment from you. Very unusual way to treat paying customers. To be honest - the worst paid subscription model I hav