# Twitter and local news data in Malaysia-AI Elasticsearch

Malaysia-AI runs a cloud Elasticsearch instance, protected by GitHub Auth, at https://elasticsearch.malaysiaai.ml/. We recently deployed a Tweepy streaming crawler and a local news crawler; the code is at https://github.com/malaysia-ai/crawlers. If you are a member of Malaysia-AI, you should be able to access that repository.

In [1]:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from pycookiecheat import chrome_cookies
import requests

## Elasticsearch

To access Elasticsearch from anywhere:

1. Install the dependencies:

```bash
pip3 install requests pycookiecheat elasticsearch elasticsearch-dsl
```

2. Log in to https://elasticsearch.malaysiaai.ml using Chrome.

3. Read the login cookies from Chrome with `pycookiecheat`:

In [2]:
url = 'https://elasticsearch.malaysiaai.ml'
cookies = chrome_cookies(url)

In [3]:
requests.get(url, cookies = cookies).json()

{'name': 'huseincomel-desktop',
 'cluster_name': 'elasticsearch',
 'cluster_uuid': 'WLWwNUhcTAaU7BGpsKwNkA',
 'version': {'number': '7.17.0',
  'build_flavor': 'default',
  'build_type': 'deb',
  'build_hash': 'bee86328705acaa9a6daede7140defd4d9ec56bd',
  'build_date': '2022-01-28T08:36:04.875279988Z',
  'build_snapshot': False,
  'lucene_version': '8.11.1',
  'minimum_wire_compatibility_version': '6.8.0',
  'minimum_index_compatibility_version': '6.0.0-beta1'},
 'tagline': 'You Know, for Search'}

If you are using https://jupyterhub.malaysiaai.ml/, Elasticsearch is available at http://localhost:9200.

## Twitter

Simply query https://elasticsearch.malaysiaai.ml/twitter_streaming/_search

In [4]:
url = 'https://elasticsearch.malaysiaai.ml/twitter_streaming/_search'
requests.get(url, cookies = cookies).json()

{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 10000, 'relation': 'gte'},
  'max_score': 1.0,
  'hits': [{'_index': 'twitter_streaming',
    '_type': '_doc',
    '_id': '8RHINoABg_n2UR6_b-gj',
    '_score': 1.0,
    '_source': {'datetime': '2022-04-17T09:08:21',
     'datetime_gmt8': '2022-04-17T17:08:21',
     'data_text': 'seokjin got shy after hobi praising hin https://t.co/MQSa0oGJnW',
     'body': 'RT @jinniesarchives: seokjin got shy after hobi praising hin https://t.co/MQSa0oGJnW',
     'screen_name': 'ultskivecart',
     'followers_count': 123,
     'friends_count': 85,
     'listed_count': 0,
     'favourites_count': 2846,
     'statuses_count': 2181,
     'quoted_status_text': 'NULL',
     'lang': 'in',
     'retweet': 'true',
     'retweet_text': 'seokjin got shy after hobi praising hin https://t.co/MQSa0oGJnW',
     'retweet_text_full': 'NULL',
     'retweet_count': 0,
     'retweet_detail'
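
A bare `_search` request returns only the first 10 documents, and the reported total is capped at 10,000 (hence `'relation': 'gte'` above). The sketch below builds a request body that filters tweets and asks for an exact count. The field names `data_text`, `lang` and `datetime` come from the sample hit above; the helper name `build_tweet_query`, the search term `banjir` and the assumption that `datetime` is mapped as a date field are illustrative, not taken from the crawler code.

```python
def build_tweet_query(text, lang=None, since=None, size=10):
    """Build an Elasticsearch request body for the twitter_streaming index."""
    # Full-text match on the tweet text.
    must = [{"match": {"data_text": text}}]
    filters = []
    if lang is not None:
        # `match` works whether `lang` is mapped as text or as keyword.
        filters.append({"match": {"lang": lang}})
    if since is not None:
        # Assumption: `datetime` is mapped as a date field.
        filters.append({"range": {"datetime": {"gte": since}}})
    return {
        "size": size,
        # Ask for an exact hit count instead of the default 10,000 cap.
        "track_total_hits": True,
        "query": {"bool": {"must": must, "filter": filters}},
    }


# Hypothetical example: Indonesian/Malay-tagged tweets mentioning "banjir".
body = build_tweet_query("banjir", lang="in", since="2022-04-17T00:00:00", size=5)
```

The body can then be sent with `requests.post(url, json=body, cookies=cookies).json()`, assuming the auth proxy in front of the cluster forwards POST requests.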

In [5]:
url = 'https://elasticsearch.malaysiaai.ml/twitter_streaming/_count'
requests.get(url, cookies = cookies).json()

{'count': 172000,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}
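
With this many documents, plain `from`/`size` paging stops at 10,000 results by default, so exporting the whole index needs the scroll API. The following is a minimal sketch, not the crawler's own code: `scroll_all` is a hypothetical helper, and `post` is a caller-supplied function that sends a JSON POST and returns the decoded response, so the same loop works with cookies from outside or against `http://localhost:9200` on JupyterHub.

```python
def scroll_all(post, base_url, index, page_size=1000, keep_alive="1m"):
    """Yield the `_source` of every document in `index` via the scroll API.

    `post(url, body)` must send `body` as JSON and return the response dict.
    """
    # The first request opens the scroll context and returns the first page.
    page = post(
        base_url + "/" + index + "/_search?scroll=" + keep_alive,
        {"size": page_size, "query": {"match_all": {}}},
    )
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            yield hit["_source"]
        # Later requests only need the scroll id returned with the last page.
        page = post(
            base_url + "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )


# Usage with the cookies obtained earlier (not executed here):
# post = lambda u, b: requests.post(u, json=b, cookies=cookies).json()
# for tweet in scroll_all(post, "https://elasticsearch.malaysiaai.ml", "twitter_streaming"):
#     print(tweet["data_text"])
```

The scroll context expires on its own once `keep_alive` elapses without a new request.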

## News

Simply query https://elasticsearch.malaysiaai.ml/news_v2/_search

In [6]:
url = 'https://elasticsearch.malaysiaai.ml/news_v2/_search'
requests.get(url, cookies = cookies).json()

{'took': 462,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 408, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'news_v2',
    '_type': '_doc',
    '_id': 'https://www.theborneopost.com/2022/04/17/debate-between-anwar-and-najib-on-may-12/',
    '_score': 1.0,
    '_ignored': ['extractnet.description.keyword',
     'extractnet.content.keyword'],
    '_source': {'title': 'Debate between Anwar and Najib on May 12',
     'media': '',
     'date': '8 hours ago',
     'datetime': '2022-04-17T15:17:40.338691',
     'desc': 'KUALA LUMPUR (April 17): The debate between PKR president Datuk Seri Anwar \nIbrahim and former Prime Minister Datuk Seri Najib Tun Razak will be held \non May...',
     'link': 'https://www.theborneopost.com/2022/04/17/debate-between-anwar-and-najib-on-may-12/',
     'img': 'data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==',
     'extractnet': {'conten

In [9]:
url = 'https://elasticsearch.malaysiaai.ml/news_v2/_count'
requests.get(url, cookies = cookies).json()

{'count': 408,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}}
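
News hits carry bulky fields such as `img` (a base64 data URI) and `extractnet`, so it can help to leave them out of the response. The sketch below builds a request body that matches on `title`, sorts newest first and excludes those two fields. The field names come from the sample hit above; the helper name `build_news_query` and the assumption that `datetime` is mapped as a date field are illustrative.

```python
def build_news_query(keyword, size=10):
    """Build an Elasticsearch request body for the news_v2 index."""
    return {
        "size": size,
        # Full-text match on the article title.
        "query": {"match": {"title": keyword}},
        # Newest first; assumes `datetime` is mapped as a date field.
        "sort": [{"datetime": {"order": "desc"}}],
        # Leave the bulky fields out of each returned hit.
        "_source": {"excludes": ["img", "extractnet"]},
    }


# Example keyword taken from the sample article title above.
body = build_news_query("Anwar", size=5)
```

As with the Twitter index, the body can be sent with `requests.post(url, json=body, cookies=cookies).json()`, where `url` is the `news_v2/_search` endpoint.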

## Simple alerts

We also deployed alerts on Grafana Cloud using PromQL: https://huseinzol05.grafana.net/d/BHvsNRwnz/crawler-alerts?orgId=1

You can check [prometheus.yaml](https://github.com/malaysia-ai/alerts/blob/main/prometheus.yaml).

Feel free to contact husein.zol05@gmail.com to get access to Grafana Cloud.

![](artifacts/alert.png)

If any graph drops below 1, an alert is sent to Discord:

![](artifacts/alert-discord.png)