> # Summarize Nike Reviews

In this notebook, we import the customer reviews collected through web scraping and generate a one-page PDF report that summarizes the most relevant information about the chosen Nike product.

To achieve this, we will create a function named `summarize()` with the following functionalities:

1. Accept as a parameter the path to a CSV file created by the first notebook.

2. Create a one-page PDF file that summarizes all the reviews in the CSV.

3. The form of the summary is left open:
    - It can be text-based, visual, or a combination of both.
    - We decide what is important enough to be included in the summary.
    - We focus on the summary that would be most informative for customers.
    - The PDF is created from within the notebook, using any Python libraries we choose.


**Import Libraries**

In [1]:
#import libraries
from functions.summary_functions import *

**Create Function to summarize the reviews**

In [2]:
def summarize(path_csv_file):
    """
    Input:
    The path of the CSV file containing the reviews

    Function:
    Imports the CSV file from the given path, analyzes the reviews and creates a summary

    Output:
    A PDF file containing a summary of the product's reviews

    """
    
    #import csv
    reviews_eng = pd.read_csv(path_csv_file)
    
    """
    1. At first, we make a correction of dates, due to greek-text in months
    """
    print('1. At first, we make a correction of dates, due to greek-text in months')
    
    #correction of dates
    reviews_eng = date_correction(reviews_eng)
    
    print('Done')
    
    """
    2. Next, we will do a text analysis in order to find the frequencies of each word of all the reviews
    """
    print('2. Next, we will do a text analysis in order to find the frequencies of each word of all the reviews')
    
    #load the large English spaCy model
    nlp = spacy.load("en_core_web_lg")
    
    #find frequencies
    freq = find_freq(reviews_eng)
    
    print('Done')
    
    """
    3. After, we are going to do Aspect Mining of all the reviews using the Network Analysis method

    """
    print('3. After, we are going to do Aspect Mining of all the reviews using the Network Analysis method')
      
    #Use network analysis to group and extract semantically similar aspects
    aspects = get_aspects_from_undirected_graphs(freq,nx.algorithms.components.connected_components,0.8)
    
    print('Done')
    
    """
    4. Following, we will do Opinion Mining
    """
    print('4. Following, we will do Opinion Mining')
       
    #extract opinions
    opinions = get_opinions(reviews_eng.content, aspects)
    
    print('Done')
    
    """
    5. Next, we are going to find the positive and the negative aspects
    """
    print('5. Next, we are going to find the positive and the negative aspects')
    
    #classify aspects into positives and negatives
    positive_aspects, negative_aspects = classification_of_aspects(aspects,opinions)
    
    print('Done')
    
    """
    6. Following, we will classify the reviews into positive and negative based on rating
    """
    print('6. Following, we will classify the reviews into positive and negative based on rating')
    
    #apply function for classification
    reviews_eng['classif'] = reviews_eng['rating'].apply(classify)
    
    print('Done')
    
    """
    7. We will start the reporting process
    """
    print('7. We will start the reporting process')
        
    #add year, month and month names into reviews df and create a yearly df
    reviews_eng, reviews_eng_yearly = manipulate_df(reviews_eng)
    
    print('Done')
    
    """
    8. Plots
    """
    print('8. Plots')
    
    """
    8.1. Wordcloud
    """
    print('8.1. Wordcloud')
    
    #run function to create a wordcloud img    
    _ = wordcloud(freq)
    
    print('Done')
    
    """
    8.2. Barplot-lineplot-> Basic graph with two y axis
    
    """
    print('8.2. Barplot-lineplot-> Basic graph with two y axis')
    
    #run function for basic chart
    _ = basic_graph(reviews_eng_yearly)
    
    print('Done')
    
    """
    9. Create the pdf including all the summaries regarding reviews
    """
    print('9. Create the pdf including all the summaries regarding reviews') 
    
    _ = create_and_export_pdf(reviews_eng,positive_aspects,negative_aspects)
    
    print('Done')
    print(' ')
    
    #print() returns None, so there is nothing useful to return here
    print(colored('Summary PDF has been successfully created and exported', 'green'))
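
Step 3 passes `nx.algorithms.components.connected_components` and a threshold of `0.8` to `get_aspects_from_undirected_graphs`, whose body lives in `functions.summary_functions` and is not shown in this notebook. One plausible reading is that terms become nodes, pairs of terms whose similarity reaches the threshold become edges, and each connected component becomes one aspect group. The sketch below illustrates that idea using only the standard library. `group_terms` and `similarity` are hypothetical names, and the character-based `difflib` similarity is an assumption standing in for whatever semantic similarity the real helper uses.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in similarity in [0, 1] (assumption: the real helper likely uses word vectors)."""
    return SequenceMatcher(None, a, b).ratio()


def group_terms(terms, threshold=0.8, sim=similarity):
    """Group terms into the connected components of a thresholded similarity graph."""
    # Undirected adjacency list: an edge joins two terms whose similarity meets the threshold.
    neighbours = {term: set() for term in terms}
    for a, b in combinations(neighbours, 2):
        if sim(a, b) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Depth-first walk that collects every connected component exactly once.
    seen, groups = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(component)
    return groups


if __name__ == "__main__":
    # Hypothetical review terms, not taken from the scraped data.
    print(group_terms(["size", "sizes", "fit", "comfort"]))
```

A higher threshold yields more, smaller groups. Because components are transitive, two dissimilar terms can still end up in the same group when a chain of similar terms links them.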

**Import the reviews that we have previously downloaded**

In [3]:
#find the path where the notebook is located
current_path = os.getcwd()

#find the path where the csv file (reviews) is located
path_csv_file = glob.glob(os.path.join(current_path, '*.csv'))[0]
path_csv_file

'C:\\Users\\mkarampasis\\Desktop\\Michalis\\master\\Review Summarization\\Review Summarization Karampasis\\nike airfoce 1 07 reviews.csv'
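
The cell above takes element `[0]` of `glob.glob(...)`, which raises a bare `IndexError` when the folder holds no CSV, and `glob` returns matches in arbitrary order when there are several. A slightly more defensive variant might look like the sketch below. `find_first_csv` is a hypothetical helper, not part of `functions.summary_functions`, and sorting by file name is just one possible tie-breaking rule.

```python
from pathlib import Path


def find_first_csv(folder="."):
    """Return the alphabetically first CSV in `folder`, or raise a descriptive error."""
    matches = sorted(Path(folder).glob("*.csv"))
    if not matches:
        raise FileNotFoundError(f"No CSV file found in {Path(folder).resolve()}")
    return matches[0]


if __name__ == "__main__":
    try:
        print(find_first_csv())
    except FileNotFoundError as error:
        print(error)
```

The returned `Path` can be passed straight to `summarize()`, since `pd.read_csv` accepts path objects as well as strings.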

**Run function to summarize reviews and export a PDF report file**

In [4]:
#start time
start = datetime.now()

summarize(path_csv_file)

#end time
end = datetime.now()

#total execution time
execution_time = end-start
print('The total execution time was:',execution_time)

1. At first, we make a correction of dates, due to greek-text in months
Done
2. Next, we will do a text analysis in order to find the frequencies of each word of all the reviews
Done
3. After, we are going to do Aspect Mining of all the reviews using the Network Analysis method
Done
4. Following, we will do Opinion Mining
Done
5. Next, we are going to find the positive and the negative aspects
Done
6. Following, we will classify the reviews into positive and negative based on rating
Done
7. We will start the reporting process
Done
8. Plots
8.1. Wordcloud
Done
8.2. Barplot-lineplot-> Basic graph with two y axis
Done
9. Create the pdf including all the summaries regarding reviews
Done
 
Summary PDF has been successfully created and exported
The total execution time was: 0:00:21.601488
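
The timing above subtracts two `datetime.now()` readings, which follow the wall clock and can shift if the system time is adjusted mid-run. As an alternative sketch, `time.perf_counter()` is designed for measuring intervals. `timed` is a hypothetical wrapper introduced only for illustration.

```python
import time
from datetime import timedelta


def timed(func, *args, **kwargs):
    """Call `func` and return (result, elapsed), with elapsed as a timedelta."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = timedelta(seconds=time.perf_counter() - start)
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; in the notebook this would be timed(summarize, path_csv_file).
    total, execution_time = timed(sum, range(1_000_000))
    print('The total execution time was:', execution_time)
```

Returning a `timedelta` keeps the printed format the same as in the cell above.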
