# Get actionable Threat Intelligence from Twitter

__Notebook Version:__ 1.0 <br>
__Notebook Author:__ Antonio Formato<br>


__Python Version:__ >=Python 3.8<br>

__Data Source Required:__ None<br>

__GPU Compute Required:__ No<br>

__Packages Downloaded:__ 
- tweepy
- pandas


## Overview

Social media platforms allow users to communicate and share information. For security professionals, they can be more than a networking tool: they are also a source of valuable information on topics ranging from vulnerabilities, exploits, and malware to threat actors and anomalous cyber activity.

<p style="border: solid; padding: 5pt; color: white; background-color: Green">
This notebook lets you collect Indicators of Compromise (IOCs) from Twitter by searching for hashtags.
</p>

## Prerequisites
**Please do not run the notebook cells all at once**. Run the cells in order, and make sure each one completes successfully before moving on to the next.

## Table of Contents

1. Setup environment to scrape cybersecurity info tweets
2. Defining script logic
3. Main function
4. Organizing IOCs obtained from previous step
5. MSTICPy to invoke Microsoft Sentinel API to work with IOCs
6. Create Indicators using Microsoft Sentinel API

## 1 Setup environment to scrape cybersecurity info tweets
The notebook needs the `tweepy` and `pandas` packages; the cell below installs them.<br>

In [None]:
# install the packages needed in this notebook

%pip install tweepy
%pip install pandas

## 2 Defining script logic

Reference: [https://www.geeksforgeeks.org/extracting-tweets-containing-a-particular-hashtag-using-python/](https://www.geeksforgeeks.org/extracting-tweets-containing-a-particular-hashtag-using-python/)
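
The cells in this section read the text and the hashtags from each tweet object. The sketch below shows that logic offline, assuming tweet objects shaped like the ones tweepy returns in extended mode (`full_text`, `entities['hashtags']`, and `retweeted_status` on retweets). The helper names `tweet_text` and `hashtag_texts` and the sample tweets are made up for illustration.

```python
from types import SimpleNamespace


def tweet_text(tweet):
    """Return the full text, preferring the original tweet for retweets."""
    # Retweets carry the original tweet under `retweeted_status`;
    # ordinary tweets do not have that attribute, so fall back to the tweet itself.
    source = getattr(tweet, "retweeted_status", tweet)
    return source.full_text


def hashtag_texts(tweet):
    """Return the hashtag strings (without '#') attached to a tweet."""
    return [tag["text"] for tag in tweet.entities["hashtags"]]


# Stand-in objects (hypothetical data) that mimic the attributes used above.
original = SimpleNamespace(
    full_text="Payload hosted at evil-site[.]com #malware",
    entities={"hashtags": [{"text": "malware"}]},
)
retweet = SimpleNamespace(
    full_text="RT @someone: Payload hosted at evil-si...",
    entities={"hashtags": [{"text": "malware"}]},
    retweeted_status=original,
)

print(tweet_text(retweet))
print(hashtag_texts(retweet))
```

Using `getattr` with a default is an alternative to the `try`/`except AttributeError` pattern and behaves the same way for these two cases.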

In [None]:
# import relevant modules
import pandas as pd
import tweepy

In [None]:
# Extract tweets based on a hashtag

# Print the fields collected for one tweet
def print_tweets(n, tweet_row):
    print()
    print(f"Tweet {n}:")
    print(f"Username:{tweet_row[0]}")
    print(f"Description:{tweet_row[1]}")
    print(f"Tweet Text:{tweet_row[2]}")
    print(f"Hashtags Used:{tweet_row[3]}")

# Search Twitter and save the matching tweets to scraped_tweets.csv
def scrape(words, date_since, n_tweet):

    # DataFrame that collects one row per tweet
    db = pd.DataFrame(columns=['username',
                               'description',
                               'text',
                               'hashtags'])

    # since_id expects a tweet ID, not a date, so the start date is passed
    # through the "since:" search operator in the query instead.
    query = f"{words} since:{date_since}" if date_since else words

    # .Cursor() pages through the search results.
    # The number of tweets can be restricted using .items(number of tweets).
    # `api` is the authenticated tweepy.API object created in the Main function cell.
    tweets = tweepy.Cursor(api.search_tweets,
                           q=query, lang="en",
                           tweet_mode='extended').items(n_tweet)

    # Iterate over the tweets, keeping a running count that starts at 1
    for i, tweet in enumerate(tweets, start=1):
        username = tweet.user.screen_name
        description = tweet.user.description

        # Retweets expose the original tweet under retweeted_status;
        # ordinary tweets lack that attribute, so the except block runs
        try:
            text = tweet.retweeted_status.full_text
        except AttributeError:
            text = tweet.full_text

        # Keep only the hashtag text from each hashtag entity
        hashtext = [tag['text'] for tag in tweet.entities['hashtags']]

        # Append the extracted information to the DataFrame
        # (named `row` so it does not shadow the n_tweet parameter)
        row = [username, description, text, hashtext]
        db.loc[len(db)] = row

        # print tweet data
        print_tweets(i, row)

    # Save the collected tweets to a CSV file
    filename = 'scraped_tweets.csv'
    db.to_csv(filename)

## 3 Main function
Secrets such as the API keys below can be stored in Azure Key Vault instead of being hard-coded.<br>
Note: This could be an enhancement for future development.
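
Until a Key Vault integration exists, one simple way to keep the keys out of the notebook is to read them from environment variables. The sketch below is a swapped-in technique, not the Key Vault approach mentioned above; the variable names and the helper `load_twitter_credentials` are hypothetical.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_KEY",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four Twitter credentials from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of sending empty keys to the API.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with a fake mapping so the example runs without real secrets.
fake_env = {name: f"dummy-{index}" for index, name in enumerate(REQUIRED_VARS)}
credentials = load_twitter_credentials(fake_env)
print(sorted(credentials))
```

The returned values would replace the hard-coded strings passed to `tweepy.OAuthHandler` and `set_access_token` in the cell below.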

In [None]:
if __name__ == '__main__':

		#twitter dev account credentials
		consumer_key = "CONSUMER KEY TWITTER DEV ACCOUNT"
		consumer_secret = "CONSUMER SECRET TWITTER DEV ACCOUNT"
		access_key = "ACCESS KEY TWITTER DEV ACCOUNT"
		access_secret = "ACCESS SECRET TWITTER DEV ACCOUNT"

		auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
		auth.set_access_token(access_key, access_secret)
		api = tweepy.API(auth)

		# Enter Hashtag and initial date
		print("Enter Twitter HashTag")
		words = input()
		print("Enter Date yyyy-mm-dd")
		date_since = input()

		# number of tweets you want to extract in one run
		n_tweet = 100
		scrape(words, date_since, n_tweet)
		print('Scraping has completed!')

## 4 Organizing IOCs obtained from previous step

Organizing the collected information is a crucial step in extracting insights from Twitter data. Regular expressions (regex) are a powerful tool for searching, matching, and manipulating text. Here, a regex extracts domain names from the scraped tweets; the pattern looks for domains written in defanged form, where the dot is bracketed so that the link is not clickable. These domains can then be written as indicators to a threat intelligence table through the Microsoft Sentinel API.<br>

With the `msticpy` library, you can write the indicators to the threat intelligence table through the Microsoft Sentinel API. This stores and categorizes the information in a structured manner, making it easier to analyze and track potential security threats. The indicators could include domain names, IP addresses, file hashes, or other relevant data, and can then be used for security monitoring, threat detection, and incident response.<br>
A threat intelligence table also gives you a centralized repository that supports various security use cases, so you can quickly find the information you need to make informed decisions during a security incident.
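
As a minimal sketch of the extraction step, the example below pulls defanged domains out of a sample string and converts them back to their normal form ("refanging"). The pattern is an illustrative assumption, looser than the one used in the next cell: it accepts both `[.]` and `[.` as the defanged dot. The helper name `extract_domains` and the sample text are made up.

```python
import re

# Assumed pattern: one or more labels joined by a defanged dot ("[.]" or "[.").
DEFANGED_DOMAIN = re.compile(
    r"[a-z0-9][a-z0-9.\-]*(?:\[\.\]?[a-z0-9][a-z0-9.\-]*)+",
    re.IGNORECASE,
)


def extract_domains(text):
    """Return refanged, de-duplicated domains in order of first appearance."""
    found = []
    for match in DEFANGED_DOMAIN.findall(text):
        # Turn "[.]" and "[." back into a plain dot, then drop trailing punctuation.
        domain = match.replace("[.]", ".").replace("[.", ".")
        domain = domain.rstrip(".-").lower()
        if domain not in found:
            found.append(domain)
    return found


sample = "New #phishing C2 at mail-client[.]software and evil-site[.com today."
print(extract_domains(sample))  # ['mail-client.software', 'evil-site.com']
```

Refanging before storage is a design choice: indicators that keep the brackets are unlikely to match real DNS or proxy logs.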


In [None]:
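# import modules
import csv
import re

# Defanged domains such as "name[.tld" or "name[.]tld" (the closing bracket is optional).
# A raw string avoids invalid escape sequence warnings.
domain_pattern = re.compile(r'[a-z0-9][a-z0-9\-]{0,61}[a-z0-9]\[\.\]?[a-z0-9][a-z0-9\-]*')

# collected domains
urls = []

# open input file
with open('scraped_tweets.csv') as file:
    # skip header
    next(file)
    # loop over the lines
    for line in file:
        # extract the defanged domains and add them to the list
        urls.extend(domain_pattern.findall(line))

print(urls)

# write the results once, after all lines have been read: one domain per row
with open('malicious_domains.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Domains'])
    for url in urls:
        writer.writerow([url])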

In [None]:
import csv

# Open the CSV file
with open("malicious_domains.csv", "r") as file:
    # Create a CSV reader object
    reader = csv.reader(file)

    # Loop through the rows in the file
    for row in reader:
        # Print each row
        print(row)

## 5 MSTICPy to invoke Microsoft Sentinel API to work with IOCs
**Connect to Sentinel via the API provided by MSTICPy**

The following code imports several modules from the `msticpy` library:

1. `MicrosoftSentinel` from `msticpy.context.azure.sentinel_core` - This class provides the ability to interact with the Microsoft Sentinel security platform, including writing data to the Sentinel Workspace.
2. `widgets` from `msticpy.nbwidgets` - This module provides a collection of Jupyter Notebook widgets that can be used for interactive data exploration and visualization.
3. `data_obfus` from `msticpy.data` - This module provides functions for obscuring or masking sensitive data, such as IP addresses, URLs, and file hashes. It is imported under the alias `mask`.

`msticpy` is a Python library developed by Microsoft for security investigation and hunting in Microsoft Sentinel and other security data sources. The library provides a collection of tools and functions that simplify working with security data, making it easier to extract meaningful insights and respond to security incidents.

In [None]:
from msticpy.context.azure.sentinel_core import MicrosoftSentinel
import msticpy.nbwidgets as widgets
from msticpy.data import data_obfus as mask

**Initializing a connection to Microsoft Sentinel**

This code creates an instance of the MicrosoftSentinel class, with the following parameters:
- `sub_id`: The ID of the Azure Subscription that contains the Sentinel Workspace.
- `res_grp`: The name of the Azure Resource Group that contains the Sentinel Workspace.
- `ws_name`: The name of the Sentinel Workspace.

Once the instance is created, the `connect()` method is called to establish a connection to the specified Sentinel Workspace. The MicrosoftSentinel class provides a convenient way to interact with the Sentinel security platform, including reading and writing data to the workspace. 
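
Together, the three parameters identify one workspace in Azure Resource Manager. The sketch below composes them into the workspace's resource ID, which can help when checking that the values point at the intended workspace. The helper name `sentinel_workspace_resource_id` and the sample values are hypothetical.

```python
def sentinel_workspace_resource_id(sub_id, res_grp, ws_name):
    """Build the Azure resource ID of a Log Analytics (Sentinel) workspace."""
    return (
        f"/subscriptions/{sub_id}"
        f"/resourceGroups/{res_grp}"
        f"/providers/Microsoft.OperationalInsights/workspaces/{ws_name}"
    )


# Placeholder values for illustration only.
resource_id = sentinel_workspace_resource_id(
    sub_id="00000000-0000-0000-0000-000000000000",
    res_grp="my-resource-group",
    ws_name="my-sentinel-workspace",
)
print(resource_id)
```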

In [None]:
azs = MicrosoftSentinel(sub_id="MICROSOFT SENTINEL SUBSCRIPTION ID",
    res_grp="MICROSOFT SENTINEL RESOURCE GROUP",
    ws_name="MICROSOFT SENTINEL WORKSPACE NAME")
azs.connect()

Retrieve all indicators ingested into **Microsoft Sentinel Threat Intelligence**.<br>
The code below calls the `get_all_indicators()` method on the MicrosoftSentinel instance created in the previous cell. The method retrieves all of the indicators stored in the Sentinel Workspace associated with that instance.

In [None]:
azs.get_all_indicators()

## 6 Create Indicators using Microsoft Sentinel API

Combining a `Text` widget for input with the `create_indicator` method gives you a simple interface for writing threat intelligence data to the Sentinel Workspace. Security analysts can enter and store an indicator quickly and accurately from the notebook, without manually entering it in the Sentinel Workspace itself.<br>
<br>
After creating the `Text` widget, the code calls the `create_indicator` method on the MicrosoftSentinel instance.<br>

The call passes two parameters to `create_indicator`:

- `indicator`: The text that is entered in the text field, which represents the indicator that should be written to the Sentinel Workspace.
- `ioc_type`: The type of indicator being written, specified as "url".
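
The same call can be repeated for every extracted domain instead of typing each one into the widget. The sketch below assumes a CSV with a `Domains` header and one domain per row, and a client object exposing `create_indicator(indicator=..., ioc_type=...)` as used in this notebook. The helpers `read_domains` and `push_indicators` and the `RecordingClient` stand-in are hypothetical; the stand-in replaces the real Sentinel connection so the example runs offline.

```python
import csv
import io


def read_domains(csv_file):
    """Read domains from an open CSV file, skipping the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the 'Domains' header
    return [row[0].strip() for row in reader if row and row[0].strip()]


def push_indicators(client, domains, ioc_type="url"):
    """Call create_indicator once per domain and return the submitted values."""
    submitted = []
    for domain in domains:
        client.create_indicator(indicator=domain, ioc_type=ioc_type)
        submitted.append(domain)
    return submitted


class RecordingClient:
    """Offline stand-in for the Sentinel client; it only records the calls."""

    def __init__(self):
        self.calls = []

    def create_indicator(self, indicator, ioc_type):
        self.calls.append((indicator, ioc_type))


# In-memory CSV standing in for malicious_domains.csv.
sample_csv = io.StringIO("Domains\nmail-client.software\nevil-site.com\n")
client = RecordingClient()
submitted = push_indicators(client, read_domains(sample_csv))
print(submitted)
```

With a live connection, the `azs` instance would be passed in place of `RecordingClient()`.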

In [None]:
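# Note: this rebinds the name `widgets`, which earlier referred to msticpy.nbwidgets
import ipywidgets as widgets
from ipywidgets import Layout
from IPython.display import display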
createindicator = widgets.Text(
    value = "",
    placeholder = 'Enter the TEXT',
    description = 'Test:',
    layout = Layout(width='90%', height='40px'),
    disabled = False
)
display(createindicator)

In [None]:
azs.create_indicator(indicator=createindicator.value, ioc_type="url")

Invoke `query_indicators` to retrieve the indicator created in the previous step.

In [None]:
azs.query_indicators(keywords = "mail-client.software")

I hope you have **enjoyed** using this **Jupyter Notebook**.<br> 
I believe that it can be a valuable tool for managing and analyzing threat intelligence data.<br> 
If you found this notebook useful, I invite you to obtain the latest version from GitHub and to contribute to its development.<br>

<p style="border: solid; padding: 5pt; color: white; background-color: Green">
Your feedback and contributions can help make this notebook even more useful for the security community.
</p>