ria08/web-scraping-and-API

πŸ•·οΈ Web Scraping Projects Portfolio

A collection of three data scraping projects showcasing two scraping methods: HTML scraping with requests and BeautifulSoup, and API scraping through RapidAPI. Each project covers data extraction, preprocessing, and storage of the results in a structured format for analysis.


Project 1: Flipkart UHD TV Scraper

Objective: Scrape UHD TV product listings from Flipkart for basic price and feature comparison.

  • Source: flipkart.com
  • Method: HTML scraping using requests and BeautifulSoup
  • Pages Scraped: 10
  • Total Rows: 240 products
  • Columns: 7 (Name, Price, Rating, Discount, Launch Year, Operating System, Delivery Type)
  • Output: CSV file

Key Learnings:

  • Pagination handling
  • Dealing with inconsistent HTML tags and missing data
  • Headers and user-agent spoofing
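
The three learnings above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the search URL pattern, the header values, and every CSS selector (`div.product-card`, `.name`, `.price`, `.rating`) are placeholders assumed for the example, and Flipkart's real markup will differ.

```python
from urllib.parse import urlencode

# Assumed search endpoint and browser-like headers; both are illustrative.
BASE_URL = "https://www.flipkart.com/search"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-US,en;q=0.9",
}


def page_urls(query, pages):
    """Build one URL per result page (pagination handling)."""
    return [
        f"{BASE_URL}?{urlencode({'q': query, 'page': n})}"
        for n in range(1, pages + 1)
    ]


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if the tag is missing."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def scrape(query, pages=10):
    """Fetch each page and collect one dict per product card."""
    # Imported here so the pure helpers above work without these packages.
    import requests
    from bs4 import BeautifulSoup

    rows = []
    for url in page_urls(query, pages):
        resp = requests.get(url, headers=HEADERS, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for card in soup.select("div.product-card"):  # placeholder selector
            rows.append({
                "Name": text_or_none(card, ".name"),      # placeholder selector
                "Price": text_or_none(card, ".price"),    # placeholder selector
                "Rating": text_or_none(card, ".rating"),  # placeholder selector
            })
    return rows
```

Returning `None` for an absent tag keeps every row the same shape, so products with missing fields still land in the CSV as empty cells instead of aborting the run.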

📚 Project 2: Goodreads Dystopian Books Scraper

Objective: Extract book details for the dystopian genre to analyze popularity and patterns.

  • Source: goodreads.com
  • Method: HTML scraping using requests and BeautifulSoup
  • Pages Scraped: 40
  • Total Rows: 3,636 books
  • Columns: 6 (Book Title, Author, Ratings, Avg Rating, Score, Total Votes)
  • Output: CSV file

Key Learnings:

  • Extracting nested data in HTML
  • Managing large pagination without server blocking
  • Parsing numeric and textual data from strings
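
As a rough sketch of the string parsing and request pacing mentioned above: the sample text formats and the delay values are assumptions made for illustration, not taken from the project, and the function names are hypothetical.

```python
import random
import re


def parse_rating_text(text):
    """Split text like '4.25 avg rating - 1,234,567 ratings' into (float, int)."""
    avg = re.search(r"(\d+(?:\.\d+)?)\s*avg rating", text)
    count = re.search(r"([\d,]*\d)\s*ratings?", text)
    return (
        float(avg.group(1)) if avg else None,
        int(count.group(1).replace(",", "")) if count else None,
    )


def parse_score_text(text):
    """Split text like 'score: 12,345, and 125 people voted' into (int, int)."""
    score = re.search(r"score:\s*([\d,]*\d)", text)
    votes = re.search(r"([\d,]*\d)\s*(?:people|person) voted", text)
    return (
        int(score.group(1).replace(",", "")) if score else None,
        int(votes.group(1).replace(",", "")) if votes else None,
    )


def polite_delay(base=2.0, jitter=1.0):
    """Seconds to wait between page requests: a fixed base plus random jitter."""
    return base + random.uniform(0, jitter)
```

Calling `time.sleep(polite_delay())` between page requests is one simple way to spread 40 requests out; whether it is enough to avoid blocking depends on the site.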

🐦 Project 3: Twitter API Scraper

Objective: Collect tweet and user metadata using RapidAPI to experiment with API data extraction.

  • Source: RapidAPI - twitter154.p.rapidapi.com
  • Method: API scraping using requests and API key authentication
  • Total Rows: 83 tweets
  • Columns: 27 (tweet_id, creation_date, text, media_url, video_url, user, language, favorite_count, retweet_count, reply_count, quote_count, retweet, views, timestamp, video_view_count, in_reply_to_status_id, quoted_status_id, binding_values, expanded_url, retweet_tweet_id, extended_entities, conversation_id, retweet_status, quoted_status, bookmark_count, source, community_note)
  • Output: CSV or JSON
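
A minimal sketch of the API flow described above, under stated assumptions: the `/user/tweets` path and the shape of the response are hypothetical, and `flatten_tweet` and `tweets_to_csv` are illustrative helpers rather than the project's code. The `X-RapidAPI-Key` and `X-RapidAPI-Host` headers are the standard RapidAPI authentication headers.

```python
import csv
import json

API_HOST = "twitter154.p.rapidapi.com"


def rapidapi_headers(api_key):
    """Headers RapidAPI expects for key-based authentication."""
    return {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": API_HOST}


def fetch_json(api_key, path="/user/tweets", params=None):
    """Call an endpoint and return the decoded JSON (path is an assumption)."""
    import requests  # imported here so the helpers below work without it

    resp = requests.get(
        f"https://{API_HOST}{path}",
        headers=rapidapi_headers(api_key),
        params=params,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


def flatten_tweet(tweet):
    """Serialize nested dict/list values as JSON so a tweet fits one CSV row."""
    return {
        key: json.dumps(value) if isinstance(value, (dict, list)) else value
        for key, value in tweet.items()
    }


def tweets_to_csv(tweets, fileobj):
    """Write tweets to an open text file, one row each, keeping first-seen column order."""
    rows = [flatten_tweet(t) for t in tweets]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

Nested fields such as `user` or `extended_entities` do not map cleanly onto CSV cells, which is why this sketch stores them as JSON strings; writing the raw response with `json.dump` keeps the full structure instead.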
