<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>


# Obsei - Google News sentiment analysis
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/template.ipynb" target="_parent">
<img src="https://img.shields.io/badge/-Open%20in%20Naas-success?labelColor=000000&logo="/>
</a>

**Tags:** #obsei #googlenews #sentimentanalysis

# Input

## Install package

In [1]:
!pip install git+https://github.com/lalitpagaria/obsei.git

Collecting git+https://github.com/lalitpagaria/obsei.git
  Cloning https://github.com/lalitpagaria/obsei.git to /tmp/pip-req-build-38wcxs7q
  Running command git clone --filter=blob:none -q https://github.com/lalitpagaria/obsei.git /tmp/pip-req-build-38wcxs7q
  Resolved https://github.com/lalitpagaria/obsei.git to commit e2c01d4beb36c2f25d177fe4a79bb194f9305852
  Preparing metadata (setup.py) ... done
Collecting app-store-reviews-reader==1.2
  Using cached app_store_reviews_reader-1.2-py3-none-any.whl (8.4 kB)
Collecting atlassian-python-api==3.14.1
  Using cached atlassian_python_api-3.14.1-py3-none-any.whl
Collecting blis==0.7.5
  Using cached blis-0.7.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.9 MB)
Collecting catalogue==2.0.6
  Using cached catalogue-2.0.6-py3-none-any.whl (17 kB)
Collecting certifi==2021.10.8
  Using cached certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Collecting charset-normalizer==2.0.9
  Using cached charset_normalizer-2.0.9-py3

## Import libraries

In [2]:
# Importing Google News observer
from obsei.source.google_news_source import GoogleNewsConfig, GoogleNewsSource

# Preprocess and clean the article text
from obsei.preprocessor.text_cleaner import TextCleaner, TextCleanerConfig
from obsei.preprocessor.text_cleaning_function import *

# Classification Analyzer
from obsei.analyzer.classification_analyzer import ClassificationAnalyzerConfig, ZeroShotClassificationAnalyzer
from obsei.postprocessor.inference_aggregator import InferenceAggregatorConfig
from obsei.postprocessor.inference_aggregator_function import ClassificationMaxCategories
from obsei.preprocessor.text_splitter import TextSplitterConfig


ModuleNotFoundError: No module named 'obsei'
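
A `ModuleNotFoundError` straight after a `pip install` can have several causes. One common one is that the package was installed into a different Python environment than the one the kernel runs in. The following is a minimal diagnostic sketch using only the standard library; the helper name `is_installed` is made up for this example and is not part of Obsei or Naas.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which interpreter the kernel is using and whether it can see obsei
print("Kernel interpreter:", sys.executable)
print("obsei importable:", is_installed("obsei"))
```

If this prints `False`, reinstalling with `%pip install ...` instead of `!pip install ...` may help, since the `%pip` magic targets the kernel's own environment.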

## Parameters


In [4]:
# Variables for Google News observer
QUERY = "bitcoin"
MAX_RESULTS = 10
FETCH_ARTICLES = True
LOOKUP_PERIOD = "1d"

# Variables for the classification analyzer
LABELS = ["buy", "sell", "going up", "going down"]
USE_SPLITTER_AND_AGGREGATOR = True
MAX_SPLIT_LENGTH = 300
SPLIT_STRIDE = 3
SCORE_THRESHOLD = 0.3  # Minimum probability required to take a class into consideration
MODEL_NAME = "typeform/mobilebert-uncased-mnli"  # Other models can be chosen from https://huggingface.co/models?pipeline_tag=zero-shot-classification
DEVICE = "auto"

# Model

## Configure Google News observer

In [None]:
source_config = GoogleNewsConfig(
    query=QUERY,
    max_results=MAX_RESULTS,
    fetch_article=FETCH_ARTICLES,
    lookup_period=LOOKUP_PERIOD,
)

source = GoogleNewsSource()

## Configure TextCleaner as pre-processor to clean article text

In [None]:
text_cleaner_config = TextCleanerConfig(
    cleaning_functions = [
        ToLowerCase(),
        RemoveWhiteSpaceAndEmptyToken(),
        RemovePunctuation(),
        RemoveSpecialChars(),
        DecodeUnicode(),
        RemoveStopWords(),
        RemoveWhiteSpaceAndEmptyToken(),
    ]
)

text_cleaner = TextCleaner()
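
To illustrate what a chain of cleaning functions does to a piece of text, here is a rough standard-library sketch. It is not Obsei's implementation: the function names, the whitespace tokenization and the tiny stop-word list are all assumptions made for this example. The only point it shows is that each function receives the tokens produced by the previous one, so order matters.

```python
import string

# Tiny illustrative stop-word list (an assumption; a real list is much longer)
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in"}


def to_lower_case(tokens):
    return [t.lower() for t in tokens]


def remove_punctuation(tokens):
    table = str.maketrans("", "", string.punctuation)
    return [t.translate(table) for t in tokens]


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def remove_empty_tokens(tokens):
    return [t.strip() for t in tokens if t.strip()]


def clean(text, functions):
    """Split on whitespace, then apply each cleaning function in order."""
    tokens = text.split()
    for fn in functions:
        tokens = fn(tokens)
    return " ".join(tokens)


pipeline = [to_lower_case, remove_punctuation, remove_stop_words, remove_empty_tokens]
print(clean("Bitcoin is Going UP, says the analyst!", pipeline))
# prints: bitcoin going up says analyst
```

Lowercasing runs before stop-word removal here so that "The" and "the" are treated alike, which mirrors the order used in the configuration above.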

## Configure Classification Analyzer

In [None]:
analyzer_config = ClassificationAnalyzerConfig(
    labels=LABELS,
    use_splitter_and_aggregator=USE_SPLITTER_AND_AGGREGATOR,
    splitter_config=TextSplitterConfig(
        max_split_length=MAX_SPLIT_LENGTH,
        split_stride=SPLIT_STRIDE
    ),
    aggregator_config=InferenceAggregatorConfig(
        aggregate_function=ClassificationMaxCategories(
            score_threshold=SCORE_THRESHOLD
        )
    )
)

text_analyzer = ZeroShotClassificationAnalyzer(
    model_name_or_path=MODEL_NAME,
    device=DEVICE
)
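
The splitter and aggregator exist because long articles can exceed what the model handles in one pass: the text is cut into chunks, each chunk is classified, and the per-chunk scores are merged. The sketch below shows that idea in plain Python under stated assumptions: lengths are counted in characters, `split_stride` is treated as the overlap between consecutive chunks, and the merge keeps the highest score per label above a threshold. The helpers `split_text` and `aggregate_by_max` are hypothetical, and the exact behavior of Obsei's `TextSplitterConfig` and `ClassificationMaxCategories` should be checked against the library source.

```python
def split_text(text, max_split_length, split_stride):
    """Cut text into windows of at most max_split_length characters,
    each overlapping the previous window by split_stride characters."""
    if split_stride >= max_split_length:
        raise ValueError("split_stride must be smaller than max_split_length")
    step = max_split_length - split_stride
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_split_length])
        if start + max_split_length >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def aggregate_by_max(chunk_scores, score_threshold):
    """Keep the highest score per label across chunks, dropping labels below the threshold."""
    best = {}
    for scores in chunk_scores:
        for label, score in scores.items():
            best[label] = max(score, best.get(label, 0.0))
    return {label: s for label, s in best.items() if s >= score_threshold}


print(split_text("abcdefghij", max_split_length=4, split_stride=1))
# prints: ['abcd', 'defg', 'ghij']

# Hypothetical per-chunk scores, not real model output
chunk_scores = [{"buy": 0.2, "sell": 0.1}, {"buy": 0.7, "sell": 0.25}]
print(aggregate_by_max(chunk_scores, score_threshold=0.3))
# prints: {'buy': 0.7}
```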

## Function

## Search and fetch news articles

In [None]:
source_response_list = source.lookup(source_config)

## Pre-process text to clean it

In [None]:
cleaner_response_list = text_cleaner.preprocess_input(
    input_list=source_response_list,
    config=text_cleaner_config
)

## Analyze articles to perform classification

In [None]:
analyzer_response_list = text_analyzer.analyze_input(
    source_response_list=cleaner_response_list,
    analyzer_config=analyzer_config
)

# Output

## Displaying Results


In [5]:
# Printing the results
for analyzer_response in analyzer_response_list:
    print(vars(analyzer_response))

NameError: name 'analyzer_response_list' is not defined
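
Printing `vars(...)` dumps every field of each response, which can be hard to scan. A minimal sketch for reducing a label-to-score mapping to its strongest label is shown here. The helper `top_label` and the sample dictionaries are hypothetical: they stand in for whatever score mapping the analyzer attaches to each response, whose exact field names are not shown in this notebook.

```python
def top_label(scores, threshold=0.0):
    """Return (label, score) for the highest-scoring label,
    or None if no label reaches the threshold."""
    candidates = {label: s for label, s in scores.items() if s >= threshold}
    if not candidates:
        return None
    label = max(candidates, key=candidates.get)
    return label, candidates[label]


# Hypothetical scores for two articles, not real analyzer output
sample_scores = [
    {"buy": 0.10, "sell": 0.05, "going up": 0.62, "going down": 0.23},
    {"buy": 0.12, "sell": 0.18, "going up": 0.20, "going down": 0.25},
]

for scores in sample_scores:
    print(top_label(scores, threshold=0.3))
# prints: ('going up', 0.62)
# prints: None
```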