Home
- Development Setup
- Running the code
- Terminologies
- Fawkes Config
- Fawkes Pipeline
- CircleCI Integration
- Glossary
- Install Python (version >=3.7)
- Use pyenv to manage Python versions
- Clone the fawkes repository and change your current directory to the root of the repository
- Create a virtual environment with Python
- Activate the virtual environment
source env/bin/activate
- After the virtual environment has been activated, install the required packages.
- All available actions in Fawkes are exposed through a single entry point, cli.py.
- To see all the available commands:
- Example of parsing the already fetched user reviews.
Fawkes is primarily designed around user reviews and their analysis, so the terminology revolves around them. That said, Fawkes can be used on any text-based data set that requires sentiment analysis, categorization or summarization.
Any piece of feedback that the user leaves behind is what we call a review. The most basic form of a review is a JSON object with a message and a time_stamp.
Below is an example of what a user review might look like. Here the field content is the message and updated is the time_stamp.
{
"updated": "2020-03-15 14:13:17",
"rating": 5,
"version": "7.1.0",
"content": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still can\u00e2\u20ac\u2122t add all of my financial institutions so my budget is kind of skewed. But other that I can say I\u00e2\u20ac\u2122m more aware of my spending"
}
A review can have any number of additional fields; Fawkes does not use them.
NOTE: The additional fields are, however, preserved throughout the life cycle of a review so that the data is not lost.
- message - Any string
- time_stamp - Any date-time or date object. Supported formats include:
- rating (Optional) - Number/Decimal
A channel is any source from which user reviews can be obtained. Most channels integrated into Fawkes expose an API endpoint through which data is fetched. The currently supported channels in Fawkes are:
- App Store
- Play Store
- Salesforce
- Splunk
- Raw CSV files (No API required)
- Raw JSON files (No API required)
The sentiment of a review tells us which emotion the user expressed in the text. We use VADER, the sentiment analyzer from the popular Natural Language Processing library NLTK, to do the sentiment analysis.
Sentiment output returned looks like this:
{
"neg": 0.0,
"neu": 0.928,
"pos": 0.072,
"compound": 0.4767
}
The compound value tells us the overall sentiment.
- Compound > 0 means Positive Review
- Compound < 0 means Negative Review
- Compound = 0 means Neutral Review
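The thresholds above can be sketched as a small helper. This is only an illustration: the function name `classify_sentiment` and the label strings are assumptions, not part of Fawkes.

```python
def classify_sentiment(scores: dict) -> str:
    """Map a VADER-style score dict to a label using its compound value."""
    compound = scores["compound"]
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"


# The sentiment output shown above.
sample = {"neg": 0.0, "neu": 0.928, "pos": 0.072, "compound": 0.4767}
print(classify_sentiment(sample))
```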
A review can be classified/bucketed/grouped into different categories based on what the user is talking about. This is one of the core elements of analyzing user reviews as it helps to get insights on what pain points the users have and the current trend of the application itself.
Fawkes starts with a review in the raw format and then puts it through a series of transformations. The first step is to parse the user review and convert it to a single class object of type Review.
After a review is parsed and converted to Review, it is ready to be run through different algorithms like:
- Sentiment analysis
- Categorization
- Summarization
All the algorithms run only on the parsed data.
Fawkes requires a configuration file that sets the options it runs with. Below is the list of all configuration items.
See the Sample Mint Config File for an example.
python fawkes/cli/cli.py fetch
python fawkes/cli/cli.py parse
Parsing converts the raw data to a single format consumed by all further steps. We call it Review. See the diff below to understand what happens in the parsing step:
- data/raw_data/sample-mint/appstore-raw-feedback.json is the raw data
- data/parsed_data/sample-mint/parsed-user-feedback.json is the parsed output
--- a/data/raw_data/sample-mint/appstore-raw-feedback.json
+++ b/data/parsed_data/sample-mint/parsed-user-feedback.json
@@ -1,8 +1,16 @@
[
{
- "updated": "2020-03-15 14:13:17",
+ "message": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still cant add all of my financial institutions so my budget is kind of skewed. But other that I can say Im more aware of my spending",
+ "timestamp": "2020/03/15 14:13:17",
"rating": 5,
- "version": "7.1.0",
- "content": "I just heard about this budgeting app. So I gave it a try. I am impressed thus far. However I still can\u00e2\u20ac\u2122t add all of my financial institutions so my budget is kind of skewed. But other that I can say I\u00e2\u20ac\u2122m more aware of my spending"
+ "app_name": "sample-mint",
+ "channel_name": "appstore",
+ "channel_type": "ios",
+ "hash_id": "de848685d11742dbea77e1e5ad7b892088ada9c9",
+ "derived_insight": {
+ "sentiment": null,
+ "category": "uncategorized",
+ "extra_properties": {}
+ }
}
]
Things to note:
- message and time_stamp have been added. This is the most important change.
- Since the review came from the App Store, the channel_type key has been added
- A unique hash of (message + timestamp) has been added
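The parsing step can be sketched roughly as follows. This is not the Fawkes parser. The function `parse_review` and its parameters are hypothetical. SHA-1 is an assumption based only on the 40-character hash_id in the diff, so the digest is not expected to match the one shown. Only rating is carried over, as in the diff.

```python
import hashlib
from datetime import datetime


def parse_review(raw, message_key, timestamp_key, app_name, channel_name, channel_type):
    """Convert one raw review dict into the parsed shape shown in the diff."""
    # Dropping non-ASCII characters mirrors the cleaned-up message in the diff.
    message = raw[message_key].encode("ascii", "ignore").decode("ascii")
    # Re-emit the timestamp in the slash-separated form of the parsed output.
    parsed_time = datetime.strptime(raw[timestamp_key], "%Y-%m-%d %H:%M:%S")
    timestamp = parsed_time.strftime("%Y/%m/%d %H:%M:%S")
    # Assumption: the hash is SHA-1 over message + timestamp.
    hash_id = hashlib.sha1((message + timestamp).encode("utf-8")).hexdigest()
    return {
        "message": message,
        "timestamp": timestamp,
        "rating": raw.get("rating"),
        "app_name": app_name,
        "channel_name": channel_name,
        "channel_type": channel_type,
        "hash_id": hash_id,
        # Placeholders that the algorithms step fills in later.
        "derived_insight": {
            "sentiment": None,
            "category": "uncategorized",
            "extra_properties": {},
        },
    }


raw_review = {
    "updated": "2020-03-15 14:13:17",
    "rating": 5,
    "version": "7.1.0",
    "content": "However I still can\u00e2\u20ac\u2122t add all of my financial institutions",
}
parsed = parse_review(raw_review, "content", "updated", "sample-mint", "appstore", "ios")
print(parsed["message"])
print(parsed["timestamp"])
```

Passing the message and timestamp keys as arguments keeps the sketch independent of any one channel's field names.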
python fawkes/cli/cli.py run.algo
After parsing, one can run a number of algorithms in Fawkes. The two that run by default are:
- Sentiment Analysis
- Categorization
See the below diff to understand what happens in the algorithms step:
--- a/data/parsed_data/sample-mint/parsed-user-feedback.json
+++ b/data/processed_data/sample-mint/processed-user-feedback.json
@@ -6,11 +6,25 @@
"app_name": "sample-mint",
"channel_name": "appstore",
"channel_type": "ios",
- "hash_id": "de848685d11742dbea77e1e5ad7b892088ada9c9",
+ "hash_id": "6dde3aa82726c0a9e3777623854d839184767571",
"derived_insight": {
- "sentiment": null,
- "category": "uncategorized",
- "extra_properties": {}
+ "sentiment": {
+ "neg": 0.0,
+ "neu": 0.928,
+ "pos": 0.072,
+ "compound": 0.4767
+ },
+ "category": "Application",
+ "extra_properties": {
+ "category_scores": {
+ "User Experience": 0,
+ "sign-in/sign-up": 0,
+ "Notification": 0,
+ "Application": 1,
+ "ads": 0
+ },
+ "bug_feature": "feature"
+ }
}
}
]
Things to note:
- sentiment has been added
- category has been added
- The score of the review against each category is also present
- The review has been classified as a bug/feature/user-experience
Fawkes provides categorization in two variants.
Text Match uses keywords to determine which category a review falls into. See category-keywords.json for an example of what a keywords file looks like for a generic application.
- Add your own categories and their related keywords
- Add the file name to algorithm_config.category_keywords_file
- Run the script:
python fawkes/cli/cli.py generate.text_match.keywords
- It will generate the app/category-keywords-weights.json which has the weight associated with each of the keywords for each category
Now you are ready to categorize your reviews into your custom categories.
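As a rough illustration of keyword-based categorization, the sketch below scores a review against weighted keywords and picks the highest-scoring category. The weights, the `KEYWORD_WEIGHTS` structure and the `categorize` function are hypothetical. The actual layout of category-keywords-weights.json and the scoring used by Fawkes may differ.

```python
import re

# Hypothetical weights, for illustration only.
KEYWORD_WEIGHTS = {
    "sign-in/sign-up": {"login": 1.0, "password": 1.0, "sign in": 1.0},
    "Notification": {"notification": 1.0, "alert": 0.5},
    "ads": {"ads": 1.0, "advert": 1.0},
}


def categorize(message, keyword_weights, default="uncategorized"):
    """Return (best_category, scores) for a review message."""
    text = message.lower()
    scores = {}
    for category, keywords in keyword_weights.items():
        score = 0.0
        for keyword, weight in keywords.items():
            # Count whole-word (or whole-phrase) occurrences of the keyword.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            score += weight * len(re.findall(pattern, text))
        scores[category] = score
    if not scores:
        return default, scores
    best = max(scores, key=scores.get)
    # Fall back to the default when no keyword matched at all.
    return (best if scores[best] > 0 else default), scores


category, scores = categorize("I cannot login, my password is rejected", KEYWORD_WEIGHTS)
print(category, scores)
```

Returning the per-category scores alongside the winner mirrors the category_scores field in the processed output shown earlier.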
The problem with user reviews is that it is incredibly difficult to get labelled data. Text Match is an easy way to generate labelled data. Once we have enough labelled data, we can use fawkes/algorithms/categorisation/lstm/trainer.py
The lstm_classifier module can be used to train an LSTM on that data. Use multi-class-text-classification-with-lstm-using-tensorflow as a reference.
To use the trained models from the above step, modify algorithm_config.categorization_algorithm in the config file.
python fawkes/cli/cli.py push.elasticsearch
After parsing and running algorithms, all the data is pushed to Elasticsearch for advanced searching and indexing capabilities.
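For orientation, the sketch below builds a request body in the newline-delimited JSON format that the Elasticsearch bulk API expects: one action line followed by one document line per review. It does not reproduce how Fawkes pushes data. The function `build_bulk_body`, the index name `fawkes-reviews` and the use of hash_id as the document id are assumptions.

```python
import json


def build_bulk_body(reviews, index_name):
    """Build an NDJSON body for the Elasticsearch bulk API."""
    lines = []
    for review in reviews:
        # Assumption: reuse hash_id as the document id so re-pushing a
        # review overwrites it instead of creating a duplicate.
        action = {"index": {"_index": index_name, "_id": review["hash_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(review))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


reviews = [
    {
        "message": "I am impressed thus far.",
        "hash_id": "6dde3aa82726c0a9e3777623854d839184767571",
        "derived_insight": {"category": "Application"},
    }
]
body = build_bulk_body(reviews, "fawkes-reviews")
print(body)
```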
For visualizing and running queries on the data we use Kibana.
- For easier onboarding of Fawkes we are working on a pre-built dashboard that you can import
- We are also working on creating a live dashboard with sample data