Audit which email addresses spam bots can collect from your sites.
Altcoin market and project analyses
GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
Find any occupation or job title in a text or file
Structured Data Extractor. An application to extract structured data from web pages. It uses the Data Extraction Based on Partial Tree Alignment (DEPTA) method. (UPDATE: I implemented a newer algorithm: https://github.com/seagatesoft/webdext)
Extract social media links and account names from websites.
Dockerfile for Apache Kafka
Gets data from coincap.io into the CLI
Search library for the yandex.ru search engine.
A bare minimum Scrapy project template ready for Scrapinghub's Scrapy Cloud service.
Cookiecutter template for a Python package.
Convenience method for parsing HTML into an lxml ElementTree using sane character decoding
Extract text from HTML
A minimal template for Python packages
Just the facts -- web page content extraction
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.