
hamsof/ScrapingNextLevel


This is a research-oriented project supervised by Dr. Arif Butt (Professor at PUCIT). It moves from scraping with Beautiful Soup to Selenium, then to the Scrapy tool, and finally toward building our own crawler. The project was a chance to work through new challenges in scraping and automation.

Version 1:

Simple HTML page

Version 2:

HTML with CSS, presenting data for 9 books in a table. A Python script scrapes the data by looping over the contents of the table.
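As a rough illustration of looping over table contents, here is a minimal sketch using Beautiful Soup. The markup, column names, and book values are hypothetical stand-ins, since the actual Version 2 page is not reproduced in this README.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Version 2 page; the real column
# names and book data are not given in this README.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Author</th><th>Price</th></tr>
  <tr><td>Book A</td><td>Author A</td><td>500</td></tr>
  <tr><td>Book B</td><td>Author B</td><td>750</td></tr>
</table>
"""


def scrape_table(html):
    """Return one dict per data row, keyed by the header cell text."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # The first row holds the column headers.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    books = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        books.append(dict(zip(headers, cells)))
    return books


books = scrape_table(SAMPLE_HTML)
print(books)
```

Keying each row by its header text keeps the script independent of column order, which helps if the table layout changes later.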

Version 3:

Completely new layout using the Grid and Flex models with Bootstrap for responsive design. The Python script uses Beautiful Soup to fetch data by the relevant classes and IDs.

Version 4:

(The JS version) This version shows where Beautiful Soup fails: once JavaScript is added to the pages, BS4 no longer works, because it parses the HTML but does not execute scripts. It also cannot type keys or click buttons.
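The failure can be reproduced with a small sketch. The page below is hypothetical: its list is empty in the HTML source and would only be filled in by the script when a browser runs it. Beautiful Soup sees just the source, so it finds no items.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the list is empty in the HTML and filled in by JavaScript.
JS_PAGE = """
<html><body>
  <ul id="books"></ul>
  <script>
    document.getElementById("books").innerHTML = "<li>Book A</li>";
  </script>
</body></html>
"""

soup = BeautifulSoup(JS_PAGE, "html.parser")
book_list = soup.find(id="books")

# The script is never executed, so the list has no <li> children.
items = book_list.find_all("li")
print(len(items))  # 0
```

A browser-driving tool such as Selenium runs the script first, which is why the later versions switch to it.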

Version 5:

A login page is added.

Version 6:

Scraping of zameen.com: applying filters by city, marla (plot size), and price, then fetching each house's address, picture, size, price, number of baths, and number of bedrooms.

Initially I applied the filters city = Lahore, min price = 50 lakh, and max price = 1 crore, and was able to save data for 1200+ houses to a CSV file.
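This README does not show how the filters were applied on the site itself, so the sketch below only illustrates the price bounds in rupees (1 lakh = 100,000 and 1 crore = 10,000,000) and the CSV output step. The records, field names, and helper functions are invented for illustration.

```python
import csv
import io

LAKH = 100_000
CRORE = 10_000_000

FIELDNAMES = ["address", "city", "image_url", "size_marla", "price", "baths", "bedrooms"]

# Hypothetical records; the fields mirror those listed above, the values are invented.
houses = [
    {"address": "Sample Block A", "city": "Lahore", "image_url": "a.jpg",
     "size_marla": 5, "price": 75 * LAKH, "baths": 3, "bedrooms": 3},
    {"address": "Sample Block B", "city": "Lahore", "image_url": "b.jpg",
     "size_marla": 10, "price": 150 * LAKH, "baths": 4, "bedrooms": 5},
    {"address": "Sample Block C", "city": "Karachi", "image_url": "c.jpg",
     "size_marla": 5, "price": 60 * LAKH, "baths": 2, "bedrooms": 3},
]


def filter_houses(records, city, min_price, max_price):
    """Keep records in the given city whose price lies within the bounds."""
    return [
        r for r in records
        if r["city"] == city and min_price <= r["price"] <= max_price
    ]


def write_csv(records, fileobj):
    """Write the records to an open text file object as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(records)


selected = filter_houses(houses, "Lahore", 50 * LAKH, 1 * CRORE)
buffer = io.StringIO()  # stands in for open("houses.csv", "w", newline="")
write_csv(selected, buffer)
print(buffer.getvalue())
```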

Version 7:

GitHub scraping using PyGithub, and also with Selenium, to scrape users' information.
PyGithub API documentation: https://pygithub.readthedocs.io/en/latest/introduction.html

Version 1: Username, name, and email of users from Pakistan who have more than 50 repos. Data for 1200 people stored in JSON.
Version 2: Also fetching the programming languages used by each user.
Version 3: The task in this version was to scrape users' information without using the API. I used Selenium to scrape users' information from the main page of GitHub.
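For the API-based variants, a minimal sketch with PyGithub might look like the following. The helper names (`build_user_query`, `fetch_users`, `save_users`) and the token placeholder are assumptions, not the project's actual code, and `Auth.Token` assumes a reasonably recent PyGithub release. The network calls are left commented out.

```python
import json


def build_user_query(location, min_repos):
    """Build a GitHub user-search query such as 'location:Pakistan repos:>50'."""
    return f"location:{location} repos:>{min_repos}"


def fetch_users(token, query, limit):
    """Return up to `limit` matching users as plain dicts (needs network access)."""
    # Imported here so build_user_query can be used without PyGithub installed.
    from github import Auth, Github

    client = Github(auth=Auth.Token(token))
    users = []
    for user in client.search_users(query):
        if len(users) >= limit:
            break
        # name and email may be None when the user has not made them public.
        users.append({"login": user.login, "name": user.name, "email": user.email})
    return users


def save_users(users, path):
    """Store the collected users as a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(users, f, indent=2)


query = build_user_query("Pakistan", 50)
print(query)

# Hypothetical usage; replace YOUR_TOKEN with a personal access token:
# users = fetch_users("YOUR_TOKEN", query, limit=100)
# save_users(users, "users.json")
```

Note that the GitHub search API returns at most 1,000 results per query, so collecting more users than that would require splitting the search into several narrower queries.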

Version 8:

Display of 100 books in an infinite-scroll style, then scraping them with Selenium.
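One common way to handle infinite scroll with Selenium is to keep scrolling until the page height stops growing, and only then collect the items. The sketch below assumes that approach; the function name, the pause length, and the `.book` selector in the usage comment are hypothetical.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so the end was reached
        last_height = new_height
    return rounds


# Hypothetical usage with a real browser (PAGE_URL and ".book" are placeholders):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
# driver = webdriver.Chrome()
# driver.get(PAGE_URL)
# scroll_to_bottom(driver)
# books = driver.find_elements(By.CSS_SELECTOR, ".book")
# driver.quit()
```

The `max_rounds` cap is a safeguard so the loop still ends on a page that keeps loading content indefinitely.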

Version 9:

Display of 100 books in a pagination style, then scraping them with Selenium.

Version 10:

Pagination version with a pop-up appearing; the pop-up is closed through Selenium before scraping.

Version 11:

Scraping of The News International website (a news website)

Type any keyword and then scrape all the news related to it.
I scraped all the news about Imran Khan and China and stored it in a CSV file.

Version 12:

Scraping of Indeed.com

Scraping jobs from indeed.com and saving the job info in a CSV file

Version 13:

Scraping of Twitter

Type any celebrity name and then scrape any given number of tweets by that person

GitHub Pages:

https://hamsof.github.io/ScrapingNextLevel/

About

Scraping of our own purpose-built pages and many other popular sites such as Zameen.com, The News International, GitHub.com, etc.
