scrapper

Script to scrape different social media platforms for the search word(s) specified by the user in the config file. The results are also stored in a MySQL database. The currently supported platforms are Pastebin, Pastie, Google, Reddit, and Twitter. The results are obtained either through a direct search query on the platform or through a REST API exposed by the platform (admittedly not an elegant way of doing it!).
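To make the flow concrete, here is a minimal sketch of the search-and-store idea: query one platform for a keyword and persist the hits in MySQL. It assumes Reddit's public JSON search endpoint, the pymysql driver, and a hypothetical "results" table; the actual implementation in khabri.py may differ.

import requests
import pymysql

def search_reddit(keyword):
    # Query Reddit's public search endpoint for the keyword.
    resp = requests.get(
        "https://www.reddit.com/search.json",
        params={"q": keyword, "limit": 25},
        headers={"User-Agent": "scrapper-sketch/0.1"},
        timeout=10,
    )
    resp.raise_for_status()
    posts = resp.json()["data"]["children"]
    return [(p["data"]["title"], p["data"]["url"]) for p in posts]

def store_results(rows, db, user, password):
    # Persist the (title, url) pairs in a hypothetical "results" table.
    conn = pymysql.connect(host="localhost", user=user, password=password, database=db)
    try:
        with conn.cursor() as cur:
            cur.executemany("INSERT INTO results (title, url) VALUES (%s, %s)", rows)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    store_results(search_reddit("data breach"), "scrapperDb", "dbUser", "dbPassword")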

In a future release, however, the focus will be on:

  1. Consuming streaming APIs exposed by the various platforms.
  2. Multithreading the requests to the various platforms.
  3. Having a dashboard (maybe measuring the social sentiment of posts on the various platforms) showing more statistical data, if possible.
  4. Having a more efficient alerting mechanism than mail (if possible).

Instructions to install the dependencies
pip install -U -r requirements.txt

The main entry point to the script is khabri.py.
Ensure that the required Python modules are installed and that config.cfg is in place before running the script. The script may print some fancy debug messages; that was just to liven up the otherwise boring debug output!
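Once the dependencies are installed and config.cfg (format below) sits in the same directory, a typical run would simply be:

python khabri.py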

The config file has to be of the form:
[display] # this section is not really implemented as of now.
debug=false

[keyword]
search_term = comma-separated search terms

[dbAccess]
db = dbName
username = dbUser
password = dbPassword

[mailerAccess]
username = Gmail account from which you want to send mail
password = password for the above (IMPORTANT - the config file should be stored securely)
to = email id (or a comma-separated list of ids) to which the alerts should be sent
subject = Trust me, over time these alerts (like anything else in this universe) will get boring, so you would want the subject to be something that at least tickles you!

[apiTokens] # holds the respective tokens for the different services. Since only Twitter requires keys for now, it contains details about Twitter alone.
twitter_consumer_key =
twitter_consumer_secret =
twitter_access_token =
twitter_access_token_secret =

Save the file as config.cfg in the same directory as khabri.py.
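For reference, here is a minimal sketch of how such a config file can be read with Python's standard configparser module. The section and option names follow the format above; the actual parsing in khabri.py may differ.

from configparser import ConfigParser

config = ConfigParser()
config.read("config.cfg")

# Comma-separated search terms from the [keyword] section.
search_terms = [t.strip() for t in config.get("keyword", "search_term").split(",")]

# Database credentials from the [dbAccess] section.
db_name = config.get("dbAccess", "db")
db_user = config.get("dbAccess", "username")
db_password = config.get("dbAccess", "password")

# Mail alert settings from the [mailerAccess] section.
mail_user = config.get("mailerAccess", "username")
mail_to = [a.strip() for a in config.get("mailerAccess", "to").split(",")]

# Twitter API tokens from the [apiTokens] section.
twitter_keys = {opt: config.get("apiTokens", opt) for opt in config.options("apiTokens")}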
