GitHub - hamsof/ScrapingNextLevel: Scraping of our own purpose-built pages and many popular sites such as Zameen.com, The News International, GitHub.com, etc.

This is a research-oriented project supervised by Dr. Arif Butt (Professor at PUCIT). It progresses from scraping with Beautiful Soup to Selenium, then to the Scrapy tool, and finally to making our own crawler. The project was a chance to experience new challenges in scraping and automation.

Version 1:

Simple HTML page

Version 2:

HTML with CSS presenting the data of 9 books in a table. A Python script scrapes the data by looping over the contents of the table.
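The row-by-row loop can be sketched as follows. This is a minimal illustration, assuming a table with a header row; the markup, column names, and the `scrape_table` helper are hypothetical and not taken from the project's pages.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Version 2 page;
# the real column names and row count may differ.
HTML = """
<table id="books">
  <tr><th>Title</th><th>Author</th><th>Price</th></tr>
  <tr><td>Book A</td><td>Author A</td><td>500</td></tr>
  <tr><td>Book B</td><td>Author B</td><td>750</td></tr>
</table>
"""


def scrape_table(html):
    """Return one dict per data row, keyed by the header cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # The first row holds the column headers.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    books = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        books.append(dict(zip(headers, cells)))
    return books


books = scrape_table(HTML)
print(books)
```

Keying each row by its header cells keeps the script working if columns are reordered on the page.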

Version 3:

Completely new layout using the Grid and Flex models with Bootstrap for responsive design. The Python script uses Beautiful Soup to fetch data by the relevant classes and IDs.
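Selecting by class and ID might look like the sketch below. The `catalog` ID and the `book-card`, `book-title`, and `book-price` classes are made up for illustration; the project's actual names are not shown here.

```python
from bs4 import BeautifulSoup

# Hypothetical Bootstrap-style markup; class and id names are illustrative only.
HTML = """
<div id="catalog" class="row">
  <div class="col-md-4 book-card">
    <h5 class="book-title">Book A</h5>
    <span class="book-price">500</span>
  </div>
  <div class="col-md-4 book-card">
    <h5 class="book-title">Book B</h5>
    <span class="book-price">750</span>
  </div>
</div>
"""


def scrape_cards(html):
    """Return (title, price) pairs for every card inside the catalog."""
    soup = BeautifulSoup(html, "html.parser")
    catalog = soup.find(id="catalog")  # look up the container by ID
    # class_ matches an element if any one of its classes equals the value.
    cards = catalog.find_all("div", class_="book-card")
    return [
        (
            card.find(class_="book-title").get_text(strip=True),
            card.find(class_="book-price").get_text(strip=True),
        )
        for card in cards
    ]


print(scrape_cards(HTML))
```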

Version 4:

(The JS version) This version shows where Beautiful Soup fails. When JavaScript is added to the pages, BS4 does not work, because it only parses the HTML it is given and does not run scripts. It also cannot type keys or press buttons.
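The limitation can be reproduced with a small sketch. The page below is hypothetical: its book list is filled in by a script, so the HTML that Beautiful Soup parses contains only the empty container.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the book list is injected by JavaScript at load time.
HTML = """
<div id="books"></div>
<script>
  document.getElementById("books").innerHTML = "<p class='book'>Book A</p>";
</script>
"""

soup = BeautifulSoup(HTML, "html.parser")
container = soup.find(id="books")

# Beautiful Soup never runs the script, so the container stays empty
# and no element with class "book" exists in the parsed tree.
static_books = container.find_all(class_="book")
print(len(static_books), repr(container.get_text()))
```

A browser-driving tool such as Selenium runs the script first, which is why the later versions switch to it.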

Version 5:

A login page is added.

Version 6:

Scraping of zameen.com: applying filters by city, marla (size), and price, then fetching each house's address, picture, size, price, number of baths, and number of bedrooms.

Initially I applied the filters city = Lahore, min price = 50 lakh, and max price = 1 crore, and was able to get the data of 1200+ houses in a CSV file.
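For readers unfamiliar with the units, 1 lakh is 100,000 and 1 crore is 10,000,000 rupees. The sketch below applies the same bounds to records that have already been scraped and writes them to CSV. It is an assumption-laden illustration: the project applies its filters on the site itself, and the field names and sample rows here are invented.

```python
import csv
import io

LAKH = 100_000
CRORE = 10_000_000

# Hypothetical field names for one scraped listing.
FIELDS = ["city", "address", "size_marla", "price", "baths", "bedrooms"]


def filter_listings(listings, city, min_price, max_price):
    """Keep listings in `city` whose price in rupees is within the bounds."""
    return [
        item
        for item in listings
        if item["city"].lower() == city.lower()
        and min_price <= item["price"] <= max_price
    ]


def write_csv(listings, handle):
    """Write the listings to an open text handle with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(listings)


# Made-up sample rows, not real Zameen.com data.
SAMPLE = [
    {"city": "Lahore", "address": "Block A", "size_marla": 5,
     "price": 75 * LAKH, "baths": 3, "bedrooms": 3},
    {"city": "Lahore", "address": "Block B", "size_marla": 10,
     "price": 15 * CRORE // 10, "baths": 4, "bedrooms": 5},
    {"city": "Karachi", "address": "Block C", "size_marla": 5,
     "price": 60 * LAKH, "baths": 2, "bedrooms": 3},
    {"city": "Lahore", "address": "Block D", "size_marla": 3,
     "price": 50 * LAKH, "baths": 2, "bedrooms": 2},
]

selected = filter_listings(SAMPLE, "Lahore", 50 * LAKH, 1 * CRORE)
buffer = io.StringIO()  # swap for open("houses.csv", "w", newline="") on disk
write_csv(selected, buffer)
print(buffer.getvalue())
```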

Version 7:

GitHub scraping using PyGithub, and also with Selenium, to scrape users' information.
PyGithub API documentation: https://pygithub.readthedocs.io/en/latest/introduction.html

Version 1: Username, name, and email of users from Pakistan who have more than 50 repos. Data of 1200 people is stored in JSON.
Version 2: Also fetches the programming languages used by each user.
Version 3: In this version the task was to scrape users' information without using the API. I used Selenium to scrape users' information from the main page of GitHub.
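The Version 1 search could be sketched with PyGithub's `search_users` and GitHub's `location:` and `repos:` search qualifiers. The helper names, the `limit` parameter, and the output shape are assumptions made for illustration, not the project's actual script. Note that `email` is `None` for users who keep it private.

```python
import json


def build_query(location, min_repos):
    """Build a GitHub user-search query string from the two filters."""
    return f"location:{location} repos:>{min_repos}"


def collect_users(client, location="Pakistan", min_repos=50, limit=10):
    """Return basic profile fields for up to `limit` matching users.

    `client` is expected to be a PyGithub `github.Github` instance.
    """
    users = []
    for user in client.search_users(build_query(location, min_repos)):
        if len(users) >= limit:
            break
        users.append(
            {"login": user.login, "name": user.name, "email": user.email}
        )
    return users


# Usage (needs `pip install PyGithub` and a personal access token):
#   from github import Auth, Github
#   client = Github(auth=Auth.Token("YOUR_TOKEN"))
#   with open("users.json", "w") as handle:
#       json.dump(collect_users(client, limit=100), handle, indent=2)
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, and a small `limit` helps stay within the API rate limits while experimenting.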

Version 8:

Display of 100 books in an infinite-scroll style, then scraping them with Selenium.
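A common way to handle infinite scroll is to keep scrolling to the bottom until the page height stops growing, and only then collect the elements. The sketch below assumes that approach; the `scroll_to_end` helper, the pause length, and the `.book` selector in the usage comment are hypothetical.

```python
import time


def scroll_to_end(driver, pause=1.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    `driver` is expected to be a Selenium WebDriver. Returns the number
    of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to append the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded, so the end was reached
        last_height = new_height
    return max_rounds


# Usage (needs `pip install selenium` and a browser driver):
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#   driver = webdriver.Chrome()
#   driver.get("https://hamsof.github.io/ScrapingNextLevel/")
#   scroll_to_end(driver)
#   books = driver.find_elements(By.CSS_SELECTOR, ".book")  # assumed selector
#   driver.quit()
```

The `max_rounds` cap guards against pages that never stop loading.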

Version 9:

Display of 100 books in a pagination style, then scraping them with Selenium.

Version 10:

Pagination version in which a pop-up appears; the pop-up is closed through Selenium before scraping.

Version 11:

Scraping of The News International website (news website)

Type any keyword, then scrape all the news related to it.
I scraped all the news about Imran Khan and China and stored it in a CSV file.

Version 12:

Scraping of Indeed.com

Scraping jobs from Indeed.com and saving the job information in a CSV file

Version 13:

Scraping of Twitter

Type any celebrity name, then scrape any given number of that person's tweets

GitHub Pages:

https://hamsof.github.io/ScrapingNextLevel/
