Web-Scraping

A technique employed to extract large amounts of data from websites whereby the data is extracted and saved to a local file in the computer or to a database in table (spreadsheet) format

Web scraping of http://www.fallingrain.com/world/IN/ is done in this code.

All error scenarios (e.g. if the website is down or you don’t get the expected response) has been handled properly.

APPROACH :

Request and BeatuifulSoup Library has been mainly used for implementing the code. The Requests library allows us to make use of HTTP within our Python programs in a human readable way, and the Beautiful Soup module is designed to get web scraping done quickly.

Links/URL at different level changes by just single character. For Example : Cities in Rajasthan, starting with 'J' has link : http://www.fallingrain.com/world/IN/24/a/J/ and starting with 'Ja' has link : http://www.fallingrain.com/world/IN/24/a/J/a/ Thus we add characters to the link, to get onto next-level.

Firstly, we generate links and then check whether the given link or url is active or not.

If the url is active, then we see whether the given webpage contains the required data in the form of 'City', 'Latitude', etc. or the links leading to next level.

If webpage contains required data, then the data is added to the Pandas Dataframe, so that data manipulations can been done easily.

If it contain only links to the next-level, then we generate next-level url and repeat the procedure.

Required data is tagged under 'td' tag, which is tagged under 'tr' tag, thus if soup object has 'tr' tag, then we need to add the data to our dataframe.

If we only have 'href' tag, then we need to go onto next-level to fetch required data.

Lastly pandas dataframe is exported to CSV file.

References :

https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3 https://www.youtube.com/watch?v=v5DDW5dyfyc

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
README.md		README.md
data.csv		data.csv
web_scraping.ipynb		web_scraping.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Web-Scraping

APPROACH :

References :

About

Releases

Packages

Languages

ritvikGarg-zz/Web-Scraping

Folders and files

Latest commit

History

Repository files navigation

Web-Scraping

APPROACH :

References :

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages