Hackernews-Scraping

Business Requirements:

  1. Scrape TheHackernews.com and store the result (Description, Image, Title, Url)
  2. Maintain two relations: one with the URL and title of each blog post, and the other with the URL and its metadata (Description, Image, Title, Author), as sketched below
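
The two relations map naturally onto the two MongoDB collections the notebook uses, 'url-title' and 'url-others' (see steps 10 and 11 below). A sketch of the document shapes; the lowercase field names and all values are illustrative, not taken from the notebook:

    # One document per post in each collection; field names and values
    # here are illustrative only.
    url_title_doc = {
        "url": "https://thehackernews.com/2021/01/example.html",
        "title": "Example Post Title",
    }

    url_others_doc = {
        "url": "https://thehackernews.com/2021/01/example.html",
        "description": "One-paragraph summary of the post.",
        "image": "https://thehackernews.com/images/example.jpg",
        "title": "Example Post Title",
        "author": "Jane Doe",
    }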

Storage Backends Supported:

  • MongoDB
  • JSON
  • MySQL (WIP)
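
For the JSON backend, the scraped records can simply be serialized to disk. A minimal sketch; the posts.json file name is made up for illustration:

    import json

    # Hypothetical scraped records, following the document shapes above.
    records = [
        {"url": "https://thehackernews.com/2021/01/example.html",
         "title": "Example Post Title"},
    ]

    # 'posts.json' is an illustrative file name, not one used by the notebook.
    with open("posts.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)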

Requirements:

  • python3
  • pip
  • Python libraries: requests, BeautifulSoup4, pymongo, jupyterlab, notebook
  • MongoDB
  • git
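
To show how these libraries fit together, here is a rough sketch of the scraping step using requests and BeautifulSoup4. The CSS class names ('body-post', 'story-link', 'home-title', 'home-desc') are assumptions about thehackernews.com's markup rather than selectors confirmed by the notebook, and may need adjusting:

    import requests
    from bs4 import BeautifulSoup

    def scrape_page(page_url):
        """Return one dict per post found on a single listing page."""
        resp = requests.get(page_url, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        posts = []
        # The class names below are guesses at the site's markup.
        for post in soup.select("div.body-post"):
            link = post.select_one("a.story-link")
            title = post.select_one("h2.home-title")
            desc = post.select_one("div.home-desc")
            img = post.select_one("img")
            posts.append({
                "url": link["href"] if link else None,
                "title": title.get_text(strip=True) if title else None,
                "description": desc.get_text(strip=True) if desc else None,
                "image": img.get("src") if img else None,
            })
        return posts

    print(scrape_page("https://thehackernews.com/"))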

To run the application on your local machine:

Clone the repository:

  1. Type the following in your terminal

    git clone https://github.com/pushp1997/Hackernews-Scraping.git

  2. Change into the repository directory

    cd ./Hackernews-Scraping

  3. Create a Python virtual environment

    python3 -m venv ./scrapeVenv

  4. Activate the virtual environment

    • On Linux / macOS : source ./scrapeVenv/bin/activate
    • On Windows (cmd) : .\scrapeVenv\Scripts\activate.bat
    • On Windows (PowerShell) : .\scrapeVenv\Scripts\Activate.ps1
  5. Install the Python requirements

    pip install -r requirements.txt

  6. Open the .ipynb file using Jupyter Notebook

    jupyter notebook "Hackernews Scraper.ipynb"

  7. Run the notebook; you will be prompted for the number of pages to scrape and for the MongoDB URI where the post data should be stored.

  8. Open a MongoDB shell connected to the same URI you provided to the notebook; a pymongo equivalent of the queries below is sketched after this list.

  9. Change the database

    use hackernews

  10. Print the documents in the 'url-title' collection

    db["url-title"].find().pretty()

  11. Print the documents in the 'url-others' collection

    db["url-others"].find().pretty()