hfnews-scripts

Standalone internal scripts used on HFNews, a news aggregator and search engine built in Django.

NewsScraper

Scrapes RSS feeds and runs each linked page through Readability to extract the news article. Requires bs4, readability-lxml, lxml, requests, and feedparser.

Call fetch_articles(rss.xml), where rss.xml is the RSS feed, to print the extracted articles to stdout.
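The feed-to-article flow described above could look roughly like the sketch below. It is not the code in NewsScraper: the function name fetch_articles_sketch, its timeout parameter, and the choice to yield (title, text) pairs instead of printing are all assumptions made for illustration. It relies only on the listed dependencies (feedparser, requests, readability-lxml, bs4 with the lxml parser).

```python
def fetch_articles_sketch(feed_url, timeout=10):
    """Yield (title, text) for every entry in an RSS feed.

    Hypothetical helper, not the repository's fetch_articles.
    """
    # Third-party imports are local so the module loads even if they are missing.
    import feedparser
    import requests
    from bs4 import BeautifulSoup
    from readability import Document

    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        link = entry.get("link")
        if not link:
            # Some feed entries carry no link; there is nothing to download.
            continue

        response = requests.get(link, timeout=timeout)
        response.raise_for_status()

        # Readability isolates the main article markup from the page.
        doc = Document(response.text)
        article_html = doc.summary()

        # Strip the remaining tags to get plain text.
        text = BeautifulSoup(article_html, "lxml").get_text(separator="\n", strip=True)
        yield doc.short_title(), text
```

To mimic the print-to-stdout behaviour, loop over fetch_articles_sketch(feed_url) and print each title and text.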

Calais

Takes a body of text and runs it through the OpenCalais API to get entities (tags). Requires requests.

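# Assumes TagScraper is importable from calais.py.
from calais import TagScraper

tag = TagScraper(text)    # text: the article body to tag
tag.get_calais_json()     # query the OpenCalais API
tag.get_entities()        # parse entities from the response

# All entities under 30% relevance are filtered out:
print(tag.entities)
print(tag.crunchbase_entities)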
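The 30% relevance cutoff can be illustrated with a small standalone sketch. It is not the TagScraper implementation. It assumes the OpenCalais JSON response is a dictionary whose entity records carry "_typeGroup", "_type", "name", and "relevance" keys, and it guesses that crunchbase_entities are the Company and Person entities. The function names filter_entities and crunchbase_candidates are hypothetical.

```python
RELEVANCE_THRESHOLD = 0.30  # entities below 30% relevance are dropped


def filter_entities(calais_json, threshold=RELEVANCE_THRESHOLD):
    """Return entity records whose relevance is at or above the threshold."""
    entities = []
    for item in calais_json.values():
        # Skip metadata and non-entity records (assumed response layout).
        if not isinstance(item, dict) or item.get("_typeGroup") != "entities":
            continue
        relevance = item.get("relevance", 0.0)
        if relevance < threshold:
            continue
        entities.append(
            {"name": item.get("name"), "type": item.get("_type"), "relevance": relevance}
        )
    return entities


def crunchbase_candidates(entities):
    """Keep only the entity types a Crunchbase lookup accepts (assumption)."""
    return [e for e in entities if e["type"] in ("Company", "Person")]


# Hand-written sample in the assumed response shape, not real API output.
sample = {
    "doc": {"info": {}},
    "id/1": {"_typeGroup": "entities", "_type": "Company", "name": "Acme", "relevance": 0.82},
    "id/2": {"_typeGroup": "entities", "_type": "City", "name": "Boston", "relevance": 0.30},
    "id/3": {"_typeGroup": "entities", "_type": "Person", "name": "Jane Doe", "relevance": 0.12},
    "id/4": {"_typeGroup": "topics", "name": "Business"},
}

kept = filter_entities(sample)
candidates = crunchbase_candidates(kept)
```

In this sample, Acme and Boston survive the cutoff, and only Acme remains as a Crunchbase candidate.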

Crunchbase

Used in combination with entities retrieved using calais.py to get relevant company/person information. Requires requests.

Call fetch_info(tag_name, tag_type) to retrieve relevant information from the Crunchbase API. tag_type must be either Company or Person.
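One way to chain the two scripts is sketched below. It assumes the entities arrive as (name, type) pairs, which may not match the real shape of crunchbase_entities, and it takes fetch_info as a parameter so the sketch makes no claim about how the Crunchbase request is made. lookup_entities and VALID_TAG_TYPES are hypothetical names.

```python
VALID_TAG_TYPES = ("Company", "Person")


def lookup_entities(entities, fetch_info):
    """Call fetch_info(name, tag_type) for each supported (name, tag_type) pair.

    Returns a dict mapping (name, tag_type) to whatever fetch_info returned.
    """
    results = {}
    for name, tag_type in entities:
        if tag_type not in VALID_TAG_TYPES:
            # fetch_info only accepts Company or Person, so skip everything else.
            continue
        results[(name, tag_type)] = fetch_info(name, tag_type)
    return results
```

Passing the real fetch_info from the Crunchbase script as the second argument would perform the actual lookups.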
