Home

Development Setup
Running the code
Terminologies
Fawkes Config
- Top Level Configs
- Channel Related Configs
Fawkes Pipeline
CircleCI Integration
Glossary

Development Setup

Install Python (version >= 3.7)
- Use pyenv to manage Python versions
Clone the fawkes repository and change your current directory to the root of the repository
Create a virtual environment with python
- python -m venv env
Activate the virtual environment
- source env/bin/activate
- . env/bin/activate (equivalent, for shells that do not provide source)
After the virtual environment has been activated, install the required packages.
- python -m pip install -r requirements.txt

Running the code

All available actions in Fawkes are exposed through a single command-line entry point, fawkes/cli/cli.py.
To see all the available commands:
- python fawkes/cli/cli.py -h
To parse user reviews that have already been fetched:
- python fawkes/cli/cli.py parse

Terminologies

Fawkes is primarily designed around user reviews and their analysis, so the terminology revolves around reviews. That said, Fawkes can be used on any text-based data set that requires sentiment analysis, categorization or summarization.

Review

Any piece of feedback that a user leaves behind is what we call a review. The most basic form of a review is a JSON object with a message and a time_stamp.

Below is an example of what a user review might look like. Here the field content is the message and updated is the time_stamp.

{
    "updated": "2020-03-15 14:13:17",
    "rating": 5,
    "version": "7.1.0",
    "content": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still can\u00e2\u20ac\u2122t add all of my financial institutions so my budget is kind of skewed. But other that I can say I\u00e2\u20ac\u2122m more aware of my spending"
}

A review can have any number of additional fields; Fawkes does not use them.

NOTE: The additional fields are, however, preserved throughout the life cycle of a review so that no data is lost.

message - Any string
time_stamp - Any date-time or date object. Supported formats include:
- ISO Date Time Format
rating (Optional) - Number/Decimal
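
As a minimal sketch of the field rules above, the helper below pulls a message, a time_stamp and an optional rating out of a raw review. The function name extract_core_fields and its message_key / timestamp_key parameters are hypothetical and not part of Fawkes; the sketch assumes the time_stamp is an ISO date-time string.

```python
from datetime import datetime


def extract_core_fields(raw, message_key, timestamp_key):
    """Return (message, time_stamp, rating) from a raw review dict.

    Hypothetical helper for illustration; not part of the Fawkes API.
    """
    message = str(raw[message_key])
    # Assumes an ISO date-time string such as "2020-03-15 14:13:17".
    time_stamp = datetime.fromisoformat(raw[timestamp_key])
    # rating is optional, so fall back to None when it is absent.
    rating = raw.get("rating")
    return message, time_stamp, rating


raw_review = {
    "updated": "2020-03-15 14:13:17",
    "rating": 5,
    "version": "7.1.0",
    "content": "I just heard about this budgeting app.",
}
message, time_stamp, rating = extract_core_fields(raw_review, "content", "updated")
print(message, time_stamp.isoformat(), rating)
```

Passing the key names in explicitly mirrors the example above, where content plays the role of the message and updated plays the role of the time_stamp.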

Channel

A channel is any source from which user reviews can be obtained. Most channels integrated into Fawkes expose an API endpoint through which data is fetched. The currently supported channels in Fawkes are:

App Store
Play Store
- Reviews: List API
Twitter
- Twitter Search API
Salesforce
Splunk
Raw CSV files (No API required)
Raw JSON files (No API required)

Sentiment

The sentiment of a review tells us the user emotion embedded in the text. We use VADER, the sentiment analyzer that ships with the popular Natural Language Processing library NLTK, to do the sentiment analysis.

The returned sentiment output looks like this:

{
    "neg": 0.0,
    "neu": 0.928,
    "pos": 0.072,
    "compound": 0.4767
}

The compound value tells us the overall sentiment.

Compound > 0 means Positive Review 😊
Compound < 0 means Negative Review 🙁
Compound = 0 means Neutral Review 😐
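
The three rules above can be sketched as a small function. The name sentiment_label is hypothetical; the input is assumed to be a score dictionary shaped like the one shown above (in NLTK, such a dictionary is what SentimentIntensityAnalyzer.polarity_scores returns).

```python
def sentiment_label(scores):
    """Map a VADER-style score dict to a label using the compound value."""
    compound = scores["compound"]
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"


example = {"neg": 0.0, "neu": 0.928, "pos": 0.072, "compound": 0.4767}
print(sentiment_label(example))  # positive
```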

Parsed Review

Fawkes starts with a review in the raw format and then puts it through a series of transformations. The first step is to parse the user review and convert it to an object of a single class, Review.

Processed Review

After a review is parsed and converted to a Review, it is ready to be run through different algorithms like:

Sentiment analysis
Categorization
Summarization

All the algorithms run only on the parsed data.

Fawkes Config

Fawkes requires a configuration file to tell it about the different levers which can be configured. Below is the list of all configuration items.

See the Sample Mint Config File for an example.

Fawkes Pipeline

Fetching Data

python fawkes/cli/cli.py fetch

Parsing Data

python fawkes/cli/cli.py parse

Parsing converts the raw data to a single format consumed by all further steps. We call it Review. The diff below shows what happens in the parsing step:

data/raw_data/sample-mint/appstore-raw-feedback.json is the raw data
data/parsed_data/sample-mint/parsed-user-feedback.json is the parsed output

--- a/data/raw_data/sample-mint/appstore-raw-feedback.json
+++ b/data/parsed_data/sample-mint/parsed-user-feedback.json
@@ -1,8 +1,16 @@
 [
     {
-        "updated": "2020-03-15 14:13:17",
+        "message": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still cant add all of my financial institutions so my budget is kind of skewed. But other that I can say Im more aware of my spending",
+        "timestamp": "2020/03/15 14:13:17",
         "rating": 5,
-        "version": "7.1.0",
-        "content": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still can\u00e2\u20ac\u2122t add all of my financial institutions so my budget is kind of skewed. But other that I can say I\u00e2\u20ac\u2122m more aware of my spending"
+        "app_name": "sample-mint",
+        "channel_name": "appstore",
+        "channel_type": "ios",
+        "hash_id": "de848685d11742dbea77e1e5ad7b892088ada9c9",
+        "derived_insight": {
+            "sentiment": null,
+            "category": "uncategorized",
+            "extra_properties": {}
+        }
     }
 ]

Things to note:

message and timestamp (the time_stamp of the review) have been added. These are the most important fields.
Since the review came from the App Store, the channel_type key has been added.
A unique hash of (message + timestamp) has been added as hash_id.
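
The notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Fawkes parser: the function name parse_review and its parameters are hypothetical, SHA-1 is assumed only because hash_id in the diff is 40 hex characters long, and dropping non-ASCII characters is inferred from the cleaned message in the diff.

```python
import hashlib
from datetime import datetime


def parse_review(raw, app_name, channel_name, channel_type,
                 message_key="content", timestamp_key="updated"):
    """Convert one raw review dict into the parsed shape shown in the diff.

    Illustrative only: the hash function and the clean-up step are assumptions.
    """
    # Drop non-ASCII characters from the message (assumed clean-up step).
    message = raw[message_key].encode("ascii", "ignore").decode("ascii")
    # Normalise the timestamp to the "YYYY/MM/DD HH:MM:SS" form from the diff.
    parsed_time = datetime.fromisoformat(raw[timestamp_key])
    timestamp = parsed_time.strftime("%Y/%m/%d %H:%M:%S")
    # Hash of (message + timestamp); SHA-1 is an assumption.
    hash_id = hashlib.sha1((message + timestamp).encode("utf-8")).hexdigest()
    return {
        "message": message,
        "timestamp": timestamp,
        "rating": raw.get("rating"),
        "app_name": app_name,
        "channel_name": channel_name,
        "channel_type": channel_type,
        "hash_id": hash_id,
        # Filled in later by the algorithms step.
        "derived_insight": {
            "sentiment": None,
            "category": "uncategorized",
            "extra_properties": {},
        },
    }


raw = {"updated": "2020-03-15 14:13:17", "rating": 5, "content": "I can\u2019t wait"}
parsed = parse_review(raw, "sample-mint", "appstore", "ios")
print(parsed["message"], parsed["timestamp"])
```

Because the hash function and its exact input are assumptions, the hash_id this sketch produces is not expected to match the values in the diff.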

Run algorithms

python fawkes/cli/cli.py run.algo

After parsing, you can run a number of algorithms in Fawkes. The two that run by default are:

Sentiment Analysis
Categorization

The diff below shows what happens in the algorithms step:

--- a/data/parsed_data/sample-mint/parsed-user-feedback.json
+++ b/data/processed_data/sample-mint/processed-user-feedback.json
@@ -6,11 +6,25 @@
         "app_name": "sample-mint",
         "channel_name": "appstore",
         "channel_type": "ios",
-        "hash_id": "de848685d11742dbea77e1e5ad7b892088ada9c9",
+        "hash_id": "6dde3aa82726c0a9e3777623854d839184767571",
         "derived_insight": {
-            "sentiment": null,
-            "category": "uncategorized",
-            "extra_properties": {}
+            "sentiment": {
+                "neg": 0.0,
+                "neu": 0.928,
+                "pos": 0.072,
+                "compound": 0.4767
+            },
+            "category": "Application",
+            "extra_properties": {
+                "category_scores": {
+                    "User Experience": 0,
+                    "sign-in/sign-up": 0,
+                    "Notification": 0,
+                    "Application": 1,
+                    "ads": 0
+                },
+                "bug_feature": "feature"
+            }
         }
     }
 ]

Things to note:

sentiment has been added
category has been added
- The score of the review against each category is also present
The review has been classified as a bug/feature/user-experience
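
In the diff, the chosen category is the one with the highest entry in category_scores. A minimal sketch of that selection, assuming the highest score wins and that a review with no positive score stays uncategorized (the function name top_category and the tie-breaking behaviour are hypothetical, not taken from Fawkes):

```python
def top_category(category_scores, default="uncategorized"):
    """Return the highest-scoring category, or default when nothing matched."""
    if not category_scores:
        return default
    # On ties, max() keeps the first category in dictionary order.
    best = max(category_scores, key=category_scores.get)
    return best if category_scores[best] > 0 else default


scores = {
    "User Experience": 0,
    "sign-in/sign-up": 0,
    "Notification": 0,
    "Application": 1,
    "ads": 0,
}
print(top_category(scores))  # Application
```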

Configuring Categorization

Fawkes provides two variants of categorization.

Text Match
Deep Learning based Classification

Text Match

Text Match uses keywords to determine which category each review falls into. See category-keywords.json for an example of a keywords file for a generic application.

Add your own categories and their related keywords
Add the file name to the algorithm_config.category_keywords_file
Run the script:

python fawkes/cli/cli.py generate.text_match.keywords

It will generate app/category-keywords-weights.json, which holds the weight associated with each keyword for each category.

Now you are ready to categorize your reviews into your custom categories.
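
To make the idea of keyword-based matching concrete, here is a minimal, unweighted sketch. It assumes a hypothetical keywords structure (a dictionary mapping each category to a list of single-word keywords), which may differ from the actual layout of category-keywords.json, and it ignores the generated weights entirely. The function name score_categories is also hypothetical.

```python
import re


def score_categories(message, category_keywords):
    """Count keyword hits per category in a review message.

    Unweighted illustration; the keyword structure is an assumption.
    """
    # Lower-case the message and split it into simple word tokens.
    words = re.findall(r"[a-z0-9']+", message.lower())
    scores = {}
    for category, keywords in category_keywords.items():
        keyword_set = {keyword.lower() for keyword in keywords}
        scores[category] = sum(1 for word in words if word in keyword_set)
    return scores


keywords = {
    "Application": ["app", "crash"],
    "Notification": ["notification", "alert"],
    "ads": ["ads", "advert"],
}
scores = score_categories("I just heard about this budgeting app.", keywords)
print(scores)
```

A weighted variant would multiply each hit by the weight of its keyword instead of counting 1 per hit.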

Deep Learning based Classification

The problem with user reviews is that it is incredibly difficult to get labelled data. Text Match is an easy way to generate labelled data. Once we have enough labelled data, we can train a classifier with fawkes/algorithms/categorisation/lstm/trainer.py.

The lstm_classifier module can be used to train a model on this data using LSTMs. Use multi-class-text-classification-with-lstm-using-tensorflow as a reference.

To use the trained models from the above step, modify the algorithm_config.categorization_algorithm in the config file.

Storing Data

python fawkes/cli/cli.py push.elasticsearch

After parsing and running algorithms, all the data is pushed to Elasticsearch for advanced searching and indexing capabilities.
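
For readers unfamiliar with how documents reach Elasticsearch, the sketch below builds a request body in the newline-delimited JSON format that the Elasticsearch bulk API accepts. It does not show how Fawkes itself pushes data: the function name build_bulk_payload, the index name "reviews" and the use of hash_id as the document ID are all assumptions for illustration.

```python
import json


def build_bulk_payload(reviews, index_name):
    """Build an NDJSON body for the Elasticsearch bulk API.

    Each review becomes an action line followed by a source line.
    """
    lines = []
    for review in reviews:
        # Using hash_id as the document ID is an assumption; it would make
        # re-pushing the same review overwrite it instead of duplicating it.
        action = {"index": {"_index": index_name, "_id": review["hash_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(review))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


processed = [{"hash_id": "abc123", "message": "Great app", "rating": 5}]
payload = build_bulk_payload(processed, "reviews")
print(payload)
```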

Data Viz

For visualizing and running queries on the data we use Kibana.

To make onboarding easier, we are working on a pre-built dashboard that you can import
- Issue: https://github.com/intuit/fawkes/issues/7
We are also working on creating a live dashboard with sample data
- Issue: https://github.com/intuit/fawkes/issues/6

CircleCI Integration

Glossary
