Adds reference iPython Notebooks
Based on a lot of discussions I’ve had with folks using ThreatExchange,
there’s an interest in tools that make that first sharing of data or an
initial data analysis easier.

This PR adds two ipynb files to perform these common tasks: sharing
data and making sense of the data that you’re able to see.

Happy to share more or build out notebooks that answer other questions
people are looking to solve!
hammem committed Mar 18, 2016
1 parent 89956ee commit f6baca2
Showing 3 changed files with 785 additions and 0 deletions.
274 changes: 274 additions & 0 deletions ipynb/Getting Started with Sharing.ipynb
@@ -0,0 +1,274 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with ThreatExchange Sharing \n",
"\n",
"**Purpose**\n",
" \n",
"The ThreatExchange APIs are designed to make the sharing of indicators, and the connections between them, simple. Additionally, the APIs provide flexible options for deciding whom you share with: yourself, individual members, groups, and everyone!\n",
"\n",
"**What you need**\n",
"\n",
"Before getting started, you'll need a few things installed and some data. \n",
"\n",
" - [Pytx](https://pytx.readthedocs.org/en/latest/installation.html) for ThreatExchange access\n",
" - [Pandas](http://pandas.pydata.org/) for data manipulation and analysis\n",
" - A CSV file with data suitable for sharing\n",
" \n",
"All of the Python packages mentioned above can be installed via\n",
"\n",
"```\n",
"pip install <package_name>\n",
"```\n",
"\n",
"### Set up a ThreatExchange `access_token`\n",
"\n",
"If you don't already have an `access_token` for your app, use the [Facebook Access Token Tool](https://developers.facebook.com/tools/accesstoken/) to get one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pytx.access_token import access_token\n",
"\n",
"# Specify the location of your token via one of several ways:\n",
"# https://pytx.readthedocs.org/en/latest/pytx.access_token.html\n",
"access_token()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, enable debug level logging"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from pytx.logger import setup_logger\n",
"\n",
"# Uncomment this, if you want debug logging enabled\n",
"# setup_logger(log_file=\"pytx.log\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure Privacy Settings\n",
"\n",
"This will configure the API defaults for when you share data. There are [multiple levels of privacy](https://developers.facebook.com/docs/threat-exchange/reference/privacy/) to choose from. \n",
"\n",
"The code below will publish data to a whitelist that only your appID can see, for convenient testing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pytx.access_token import get_app_id\n",
"from pytx.vocabulary import PrivacyType as pt\n",
"\n",
"# Choose the privacy level from \n",
"# https://pytx.readthedocs.org/en/latest/pytx.vocabulary.html#pytx.vocabulary.PrivacyType\n",
"privacy_type = pt.HAS_WHITELIST \n",
"\n",
"# Populate this with strings of app IDs or privacy groups. If using pt.VISIBLE, set to None\n",
"privacy_members=[str(get_app_id())] # Will also take other member or privacy group IDs as strings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define default fields for sharing\n",
"\n",
"Sometimes, your CSV data is a raw list of IPs or domains. Use this map to set default fields on the descriptors that are created. Don't worry though, if your data *does* have any of the defaults you've defined, we won't clobber it.\n",
"\n",
"In this example, our defaults are set for sharing manually curated data of malicious IP addresses from a botnet."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from pytx.vocabulary import Attack as a\n",
"from pytx.vocabulary import ReviewStatus as rs\n",
"from pytx.vocabulary import Severity as s\n",
"from pytx.vocabulary import ShareLevel as sl\n",
"from pytx.vocabulary import Status as st\n",
"from pytx.vocabulary import ThreatDescriptor as td\n",
"from pytx.vocabulary import ThreatType as tt\n",
"from pytx.vocabulary import Types as t\n",
"\n",
"# See: https://pytx.readthedocs.org/en/latest/pytx.vocabulary.html#pytx.vocabulary.ThreatDescriptor\n",
"default_fields = {\n",
"    #td.ATTACK_TYPE: a.MALWARE, # TODO uncomment when PR #120 gets added to Pytx in pip\n",
"    td.CONFIDENCE: 75,\n",
"    #td.EXPIRED_ON: '2016-02-25 00:00:00+0000',\n",
"    td.PRIVACY_TYPE: privacy_type,\n",
"    td.REVIEW_STATUS: rs.REVIEWED_MANUALLY,\n",
"    td.SHARE_LEVEL: sl.AMBER,\n",
"    td.SEVERITY: s.SEVERE,\n",
"    td.STATUS: st.MALICIOUS,\n",
"    td.THREAT_TYPE: tt.MALICIOUS_IP,\n",
"    td.TYPE: t.IP_ADDRESS,\n",
"    td.DESCRIPTION: '[example][tags] Test description'\n",
"}\n",
"\n",
"# Add in privacy members, as needed\n",
"if privacy_members is not None:\n",
"    default_fields[td.PRIVACY_MEMBERS] = ','.join(privacy_members)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Share data from a file\n",
"\n",
"Grabs the data from a local CSV file and publishes it to ThreatExchange. We interpret the columns in \n",
"the data according to [Pytx's Vocabulary](https://github.com/facebook/ThreatExchange/blob/master/pytx/pytx/vocabulary.py)\n",
"\n",
"**At a minimum**, your CSV file should have one column, named `indicator`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import csv\n",
"import pytx.errors\n",
"from pytx import ThreatDescriptor\n",
"\n",
"# The file to upload\n",
"file = 'test_share.csv'\n",
"\n",
"# Load the CSV and serially publish it\n",
"ind_count = 0\n",
"fail_count = 0\n",
"with open(file, 'rb') as csvfile:\n",
"    reader = csv.DictReader(csvfile, delimiter=',', quotechar='\"')\n",
"    for row in reader:\n",
"        try:\n",
"            # Start from the defaults, then let the row's own values override them\n",
"            fields = default_fields.copy()\n",
"            fields.update(row)\n",
"            result = ThreatDescriptor.new(params=fields)\n",
"        except Exception as e:\n",
"            # `result` is not set when the call fails, so report the exception itself\n",
"            print 'Unable to upload %s due to %s\\n' % (row.get('indicator'), e)\n",
"            fail_count = fail_count + 1\n",
"        else:\n",
"            ind_count = ind_count + 1\n",
"print \"Done publishing %d indicators with %d failures!\" % (ind_count, fail_count)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Confirm your data was shared\n",
"\n",
"Now, we do a quick search to confirm the data was published correctly to ThreatExchange."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"import pandas as pd\n",
"from pytx import ThreatDescriptor\n",
"from pytx.vocabulary import ThreatExchange as te\n",
"\n",
"# Define your search string and other params, see \n",
"# https://pytx.readthedocs.org/en/latest/pytx.common.html#pytx.common.Common.objects\n",
"# for the full list of options\n",
"search_params = {\n",
"    te.LIMIT: 100  # Illustrative cap on the number of results; adjust as needed\n",
"}\n",
"\n",
"# Search the last hour of descriptors owned by this app\n",
"now = datetime.utcnow()\n",
"time_format = '%Y-%m-%d %H:%M:%S +0000'  # %M is minutes; %m would be the month\n",
"results = ThreatDescriptor.objects(\n",
"    fields=ThreatDescriptor._default_fields,\n",
"    limit=search_params[te.LIMIT],\n",
"    owner=str(get_app_id()),\n",
"    since=(now - timedelta(hours=1)).strftime(time_format),\n",
"    until=now.strftime(time_format)\n",
")\n",
"\n",
"data_frame = pd.DataFrame([result.to_dict() for result in results])\n",
"data_frame.head(n=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Excellent, we've shared data!\n",
"\n",
"Now that we've walked through a simple example, try out the following exercises:\n",
"\n",
" - Share a list of malicious URLs with multiple members\n",
" - Share a list of malicious domain names with a privacy group"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Put your Python code here!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
32 changes: 32 additions & 0 deletions ipynb/README.md
@@ -0,0 +1,32 @@
# Using Jupyter Notebook with Facebook ThreatExchange

This part of the Facebook ThreatExchange repository contains reference notebooks for getting started doing data analysis and sharing on ThreatExchange within the iPython Notebook framework.

## Installing Jupyter Notebook

If you don't already have it installed, [this tutorial from Jupyter](https://jupyter.readthedocs.org/en/latest/install.html) is a great introduction.

## Additional Python Packages

All of the reference notebooks make heavy use of the following Python libraries to simplify common analytical tasks. We recommend installing them before using the notebooks.


- [Pandas](http://pandas.pydata.org/) for data manipulation and analysis
- [Pytx](https://pytx.readthedocs.org/en/latest/installation.html) for ThreatExchange access
- [Seaborn](https://stanford.edu/~mwaskom/software/seaborn/) for making charts pretty

All of the Python packages mentioned above can be installed via

```
pip install <package_name>
```

But, no worries, we have put the same instructions at the top of each notebook, in case you don't want to read this far :)
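
Before opening a notebook, it can help to confirm that the packages import cleanly. The following is a minimal sketch, assuming Python 3 and that the import names `pandas`, `pytx`, and `seaborn` match the packages listed above; `missing_packages` is a hypothetical helper written for this example, not part of Pytx.

```python
import importlib


def missing_packages(names):
    """Return the entries of `names` that fail to import in this environment."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Any import-time failure means the package is unusable here
            missing.append(name)
    return missing


# Assumed import names for the libraries this README lists
print(missing_packages(["pandas", "pytx", "seaborn"]))
```

Any name printed in the resulting list is a candidate for `pip install <package_name>`.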

## Using the Notebooks

Once you have the tools installed, start Jupyter from your local clone of the repository (for example, with `jupyter notebook`) or copy the `*.ipynb` files into your existing Jupyter Notebook setup.
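
The sharing notebook reads a CSV file that needs at least an `indicator` column, and it fills in any other descriptor fields from a map of defaults. The sketch below illustrates that merge on its own, with no ThreatExchange calls, assuming Python 3. The plain-string keys stand in for the `pytx.vocabulary` constants the notebook uses, the in-memory CSV stands in for a real file, and `merge_rows` is a hypothetical helper; the address comes from a documentation-only IP range.

```python
import csv
import io

# Placeholder defaults; the notebook builds these from pytx.vocabulary constants
default_fields = {"confidence": 75, "description": "Test description"}

# In-memory stand-in for a CSV file; this row supplies its own confidence
sample_csv = io.StringIO(
    "indicator,confidence\n"
    "192.0.2.1,90\n"
)


def merge_rows(csvfile, defaults):
    """Combine each CSV row with the defaults; values from the row win."""
    merged = []
    for row in csv.DictReader(csvfile):
        fields = dict(defaults)  # copy, so the defaults are never mutated
        fields.update(row)       # columns present in the CSV override defaults
        merged.append(fields)
    return merged


rows = merge_rows(sample_csv, default_fields)
print(rows[0])
```

Note that `csv.DictReader` yields every value as a string, so the overriding confidence arrives as `'90'` rather than `90`, and an empty cell would replace a default with an empty string.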

## Feedback

Please let us know if these are useful, send us PRs with changes or submit your own notebooks!
