<img src="https://github.com/skappal7/Sunil_Kappal_Portfolio/blob/main/Images/SnapChat%20Review%20Analysis.png?raw=true" alt="SnapChat Analysis" style="width: 1200px;">

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# What is this notebook about?

This tutorial-cum-analysis assumes that you are new to PyCaret and want to perform some basic Natural Language Processing with the pycaret.nlp module.

This tutorial will help you to understand:

- **Getting Data:** How to load the review data?
- **Setting up Environment:** How to set up the environment in PyCaret and perform critical text pre-processing tasks?
- **Create Model:** How to create a topic model?
- **Assign Model:** How to assign documents/text to topics using a trained model?
- **Plot Model:** How to analyze topic models and the overall corpus using various plots?
- **Save / Load Model:** How to save and load a model for future use?

# Getting Started

**Installing the PyCaret library**

In [None]:
pip install pycaret[full]

**Importing other important libraries**

In [None]:
import pandas as pd
import numpy as np

**Let's get the data!**

In [None]:
data=pd.read_csv('/kaggle/input/10k-snapchat-reviews/Snapchat_app_store_reviews.csv')

data.head()

**Let's sample the data**

In [None]:
data = data.sample(1000, random_state=786).reset_index(drop=True)
data.shape

# Let's get the NLP fired up 🚀

In [None]:
from pycaret.nlp import *

In [None]:
SnapC_1 = setup(data = data, target = 'review', session_id = 123)

Once setup() has executed successfully, it prints an information grid with the following entries:

- session_id : A pseudo-random number distributed as a seed to all functions for later reproducibility. If no session_id is passed, a random number is generated automatically and distributed to all functions. In this experiment session_id is set to 123 for later reproducibility.
- Documents : Number of documents (or samples in the dataset if a dataframe is passed).
- Vocab Size : Size of the vocabulary in the corpus after applying all text pre-processing, such as removal of stopwords, bigram/trigram extraction, lemmatization, etc.

**Notice that all text pre-processing steps are performed automatically when you execute setup().**

These steps are essential to any NLP experiment.

*Source: PyCaret*
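
To make the Documents and Vocab Size entries more concrete, here is a minimal pure-Python sketch of the kind of pre-processing described above. It is not PyCaret's actual pipeline: the `preprocess` helper, the tiny `STOPWORDS` set, and the three sample reviews are all hypothetical, and it skips bigram/trigram extraction and lemmatization entirely.

```python
import re

# Hypothetical, deliberately tiny stopword list (real pipelines use far larger ones).
STOPWORDS = {"the", "a", "is", "i", "it", "and", "to", "my", "this"}

def preprocess(docs):
    """Lowercase each document, keep alphabetic tokens, and drop stopwords."""
    cleaned = []
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        cleaned.append([t for t in tokens if t not in STOPWORDS])
    return cleaned

# Made-up reviews, used only for illustration.
reviews = [
    "I love the new filters!",
    "The app crashes and it is slow.",
    "Love this app, filters are fun",
]

corpus = preprocess(reviews)
vocab = {token for doc in corpus for token in doc}

print("Documents :", len(corpus))
print("Vocab Size:", len(vocab))
```

The vocabulary shrinks as more aggressive cleaning is applied, which is why the Vocab Size reported by setup() reflects the corpus after pre-processing rather than the raw text.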

**Convert the 'review' column of the dataset into a list**

setup() also accepts a plain list of strings, in which case the target parameter is not needed.

In [None]:
text_list = list(data['review'])
type(text_list)

In [None]:
SnapC_1_list = setup(data = text_list, session_id = 123)

# Let's Create a Topic Model

**What is a topic model?**

In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovering hidden semantic structure in a body of text.

Creating a topic model in PyCaret is simple. The create_model() function takes one mandatory parameter, the name of the model as a string, and returns a trained model object. There are 5 topic models available in PyCaret; see the docstring of create_model() for the complete list. The example below creates a Latent Dirichlet Allocation (LDA) model:

In [None]:
lda = create_model('lda')

In [None]:
print(lda)
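
For intuition about what an LDA model learns, the following toy sketch fits LDA on a handful of tokenised documents using collapsed Gibbs sampling. This is an illustrative stand-in and not how PyCaret trains its model: the `toy_lda` function, its hyperparameters (`alpha`, `beta`), and the sample documents are all assumptions made for this example.

```python
import random

def toy_lda(docs, num_topics=2, iterations=50, alpha=0.1, beta=0.01, seed=123):
    """Toy LDA via collapsed Gibbs sampling; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    # Count tables updated during sampling.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document (each row sums to 1).
    return [
        [(count + alpha) / (len(doc) + alpha * num_topics) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

# Made-up, already tokenised documents.
docs = [
    ["filter", "lens", "fun", "filter"],
    ["crash", "bug", "slow", "crash"],
    ["lens", "fun", "filter"],
    ["bug", "slow", "crash"],
]
theta = toy_lda(docs)
for row in theta:
    print([round(p, 2) for p in row])
```

Each printed row is one document's mixture over topics, which is the same kind of information a trained topic model uses when documents are assigned to topics.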

In [None]:
lda_results = assign_model(lda)
lda_results.head()
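
Assigning a document to a topic amounts to picking the topic with the largest weight for that document. The sketch below shows that step in plain Python; the `dominant_topic` helper and the weights are hypothetical, and the exact columns PyCaret adds to the dataframe are not reproduced here.

```python
def dominant_topic(weights):
    """Return the label of the heaviest topic and its share of the total weight."""
    total = sum(weights)
    shares = [w / total for w in weights]
    best = max(range(len(shares)), key=shares.__getitem__)
    return f"Topic {best}", round(shares[best], 2)

# Hypothetical per-document topic weights (one row per document).
doc_weights = [
    [7, 2, 1],
    [1, 1, 8],
]

assigned = [dominant_topic(row) for row in doc_weights]
for label, share in assigned:
    print(label, share)
```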

In [None]:
lda2 = create_model('lda', num_topics = 6, multi_core = True)

In [None]:
print(lda2)

In [None]:
lda2_results = assign_model(lda2)
lda2_results.head()

# Frequency Distribution of Reviews

In [None]:
plot_model()

# Top 100 Bigrams in Reviews

In [None]:
plot_model(plot = 'bigram')
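
A bigram is simply a pair of adjacent tokens, and a bigram plot ranks those pairs by how often they occur. As a rough sketch of the counting behind such a plot (the `top_bigrams` helper and the sample tokens are hypothetical, and PyCaret's own implementation may differ):

```python
from collections import Counter

def top_bigrams(docs, n=3):
    """Count adjacent token pairs across all documents and return the n most common."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)

# Made-up, already tokenised documents.
tokenised = [
    ["love", "new", "filters"],
    ["new", "filters", "fun"],
    ["app", "crashes"],
]

best = top_bigrams(tokenised, n=1)
print(best)
```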

# Frequency Distribution of Topic 1

In [None]:
plot_model(lda, plot = 'frequency', topic_num = 'Topic 1')

# Topic Distribution

In [None]:
plot_model(lda, plot = 'topic_distribution')

# T-distributed Stochastic Neighbor Embedding (t-SNE)

In [None]:
plot_model(lda, plot = 'tsne')

**What is t-SNE?**

t-distributed Stochastic Neighbor Embedding (t-SNE) is a nonlinear dimensionality reduction technique well suited to embedding high-dimensional data for visualization in a low-dimensional space of two or three dimensions.
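
The "t" in t-SNE refers to the Student-t kernel (one degree of freedom) that it uses to measure similarity between points in the low-dimensional map. The sketch below computes only those map similarities for a few hand-picked 2D points; it is not a full t-SNE implementation, and the function name and points are made up for illustration.

```python
def tsne_low_dim_similarities(points):
    """Pairwise Student-t similarities between map points, normalised to sum to 1."""
    n = len(points)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                kernel[i][j] = 1.0 / (1.0 + sq_dist)
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]

# Two nearby points and one far away.
map_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
q = tsne_low_dim_similarities(map_points)

# Nearby points get a larger similarity than distant ones.
print(q[0][1] > q[0][2])
```

The full algorithm moves the map points so that these similarities match the ones measured in the original high-dimensional space.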

# Uniform Manifold Approximation and Projection Plot

In [None]:
plot_model(lda, plot = 'umap')

**What is Uniform Manifold Approximation and Projection?**

UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique for dimensionality reduction. Like t-SNE and PCA, it can be used to reduce high-dimensional data to 2D or 3D projections. UMAP is built on a theoretical framework grounded in Riemannian geometry and algebraic topology.

# Let's Evaluate the Model

In [None]:
evaluate_model(lda)

# Saving the model

In [None]:
save_model(lda,'Final LDA Model 03Jun2021')

# Loading the Model

In [None]:
saved_lda = load_model('Final LDA Model 03Jun2021')

In [None]:
print(saved_lda)
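
Saving and loading a model is, at its core, a serialise-then-restore round trip. The sketch below illustrates that idea with Python's standard `pickle` module and a stand-in dictionary; it does not reproduce the file format PyCaret writes, and `toy_model` and the file name are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model object (hypothetical).
toy_model = {"name": "toy-lda", "num_topics": 4}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")

    # Save: serialise the object to disk.
    with open(path, "wb") as fh:
        pickle.dump(toy_model, fh)

    # Load: restore an equivalent object from the file.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == toy_model)
```

Only load pickled files from sources you trust, since unpickling can execute arbitrary code.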