## Open Source INTelligence (OSINT)
---

### Preliminary evaluation 
---

What are the options?
- The data has already been processed and is available for download somewhere
    - how recent is it?
    
    
- There is a Python library for it
    - how flexible is it?
    - is it updated regularly?
    
    
- There is an API for it
    - how much does it cost? 
    - what fraction of the data does it cover?
    
    
- The data has to be scraped
    - how long would it take?
    - should I do it by hand?
    - does the URL accept parameters?
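
For the last question, whether the URL accepts parameters, the following is a minimal sketch of how parameterised URLs could be generated with the standard library instead of being typed by hand. The endpoint `https://example.com/search` and the parameter names `q` and `page` are hypothetical; a real site will use its own names, which have to be read off its URLs.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://example.com/search"


def build_url(base_url, **params):
    """Return base_url with params appended as an encoded query string."""
    return f"{base_url}?{urlencode(params)}"


def page_urls(base_url, query, pages):
    """Build one URL per result page, assuming the site has a 'page' parameter."""
    return [build_url(base_url, q=query, page=n) for n in range(1, pages + 1)]


urls = page_urls(BASE_URL, "trade tariffs", 3)
for url in urls:
    print(url)
```

If the pages of a site can be enumerated this way, scraping reduces to a loop over the generated URLs.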

### Example: preprocessed data

[GDELT](https://www.gdeltproject.org/#intro)
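
For preprocessed data the key question is "how recent is it?". One rough check, sketched below, is to read the `Last-Modified` header that many web servers send along with a file and convert it into an age in days. Whether a given provider, GDELT included, sends this header is an assumption to verify; the header value used here is made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def age_in_days(last_modified, now=None):
    """Days elapsed since an HTTP Last-Modified header value."""
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # Treat a date without timezone information as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - modified).total_seconds() / 86400


# Made-up header value and reference time, for illustration only.
header = "Thu, 31 May 2018 19:58:47 GMT"
reference = datetime(2018, 6, 1, 19, 58, 47, tzinfo=timezone.utc)
print(age_in_days(header, now=reference))
```

In practice the header string would come from an HTTP response object rather than a literal.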

### Example: python library

In [4]:
from googlefinance.client import get_price_data

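# Query parameters expected by get_price_data
param = {
    'q': ".DJI",     # Stock symbol (Dow Jones Industrial Average)
    'i': "86400",    # Interval size in seconds (86400 s = 1 day)
    'x': "INDEXDJX", # Stock exchange
    'p': "1M"        # Period (1 month)
}

# The result is a DataFrame indexed by timestamp
df = get_price_data(param)
print(df)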


                         Open      High       Low     Close     Volume
2018-05-01 16:00:00  24117.29  24117.29  23808.19  24099.05  380066052
2018-05-02 16:00:00  24097.63  24185.52  23886.30  23924.98  385346919
2018-05-03 16:00:00  23836.23  23996.15  23531.31  23930.15  389239720
2018-05-04 16:00:00  23865.22  24333.35  23778.87  24262.51  329482313
2018-05-07 16:00:00  24317.66  24479.45  24263.42  24357.32  307674344
2018-05-08 16:00:00  24341.35  24412.34  24198.34  24360.21  344935025
2018-05-09 16:00:00  24399.18  24586.48  24323.87  24542.54  361584696
2018-05-10 16:00:00  24591.66  24794.99  24575.91  24739.53  304209370
2018-05-11 16:00:00  24758.64  24868.65  24717.50  24831.17  274145837
2018-05-14 16:00:00  24879.37  24994.19  24862.52  24899.41  282855588
2018-05-15 16:00:00  24809.55  24809.55  24629.39  24706.41  301903205
2018-05-16 16:00:00  24722.32  24801.19  24672.79  24768.93  280812814
2018-05-17 16:00:00  24752.40  24839.49  24639.40  24713.98  314650345
2018-

### Example: API

In [1]:
import feedparser

# Download and parse the Reuters business news RSS feed
feed = feedparser.parse('http://feeds.reuters.com/reuters/businessNews')

In [2]:
len(feed.entries)

10
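
Each entry behaves like a dictionary, so a feed can be reduced to a few fields per article. The sketch below picks a title, link and publication time from one entry. It runs on a hand-made stand-in entry rather than the live feed, and it assumes the entries carry a `title` key (missing keys simply come back as `None`). The `published_parsed` value is treated as UTC, which is how feedparser reports parsed dates.

```python
import calendar
import time
from datetime import datetime, timezone


def entry_to_record(entry):
    """Pick a few fields from one feed entry (any dict-like object)."""
    parsed = entry.get("published_parsed")
    published = None
    if parsed is not None:
        # published_parsed is a UTC struct_time, so timegm is the matching inverse.
        published = datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)
    return {
        "title": entry.get("title"),
        "link": entry.get("link"),
        "published": published,
    }


# Stand-in entry for illustration; real entries come from feed.entries.
sample = {
    "title": "Example headline",
    "link": "http://example.com/article",
    "published_parsed": time.struct_time((2018, 5, 31, 19, 58, 47, 3, 151, 0)),
}
record = entry_to_record(sample)
print(record["published"].isoformat())
```

Applied to a real feed, `[entry_to_record(e) for e in feed.entries]` would give a list of records that is easy to load into a table.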

In [3]:
from pprint import pprint

# Pretty-print the first entry to inspect the available fields
pprint(feed.entries[0])

{'feedburner_origlink': 'http://www.reuters.com/article/us-usa-trade-metals-europe/u-s-hits-allies-with-tariffs-as-risk-of-trade-war-rises-idUSKCN1IV2TN?feedType=RSS&feedName=businessNews',
 'guidislink': False,
 'id': 'http://www.reuters.com/article/us-usa-trade-metals-europe/u-s-hits-allies-with-tariffs-as-risk-of-trade-war-rises-idUSKCN1IV2TN?feedType=RSS&feedName=businessNews',
 'link': 'http://feeds.reuters.com/~r/reuters/businessNews/~3/18fl4LSAN48/u-s-hits-allies-with-tariffs-as-risk-of-trade-war-rises-idUSKCN1IV2TN',
 'links': [{'href': 'http://feeds.reuters.com/~r/reuters/businessNews/~3/18fl4LSAN48/u-s-hits-allies-with-tariffs-as-risk-of-trade-war-rises-idUSKCN1IV2TN',
            'rel': 'alternate',
            'type': 'text/html'}],
 'published': 'Thu, 31 May 2018 15:58:47 -0400',
 'published_parsed': time.struct_time(tm_year=2018, tm_mon=5, tm_mday=31, tm_hour=19, tm_min=58, tm_sec=47, tm_wday=3, tm_yday=151, tm_isdst=0),
 'summary': 'WASHINGTON/PARIS (Reuters) - The Unite