Using Puppeteer as a tool to scrape information from the IMDb website

lacegiovanni17/IMDB_web_scrapping_puppeteer

IMDB Puppeteer WebScraping

Designed and implemented a Node.js script using Puppeteer to scrape the IMDb website for a list of movies and generate an array of movie objects with specific details.

[Image: Postman results]

About

  • 👋 Hi, I’m Chidike Henry
  • 😎 I’m a MERN fullstack engineer
  • 💻 This is the backend code for using Puppeteer as a tool to scrape certain information from the IMDb website
  • 💞️ I’m looking to collaborate on JS projects
  • 📫 How to reach me: chidike.henry@gmail.com

Introduction

The purpose of this project task is to evaluate my abilities in creating a Node.js script using Puppeteer to scrape IMDb for a list of movies and generate an array of movie objects with specific details.

Technologies Used

  • Node.js
  • Express.js
  • JavaScript
  • TypeScript
  • Puppeteer
  • Nodemon
  • Postman

Project Description: “IMDB Puppeteer Scrape”

Use Puppeteer to launch a headless browser. Navigate to IMDb's movie section (e.g., top-rated movies, popular movies, etc.). Scrape specific details for each movie (e.g., title, release year, rating, cast, etc.). Generate an array of movie objects, each containing relevant details for an individual movie. Output the array of movie objects.

Steps:

  1. Use Puppeteer to initiate a headless browser.
  2. Navigate to the IMDb website (www.imdb.com) and select a section that lists movies (e.g., top-rated movies).
  3. Identify and scrape specific details for each movie, such as title, release year, ratings, cast, etc.
  4. Create a structured array of movie objects, where each object contains details for an individual movie.
  5. Output the array of movie objects.

Example Criteria:

  • The script should be written in Node.js and utilize Puppeteer.
  • The array of movie objects should include details such as movie titles, release years, ratings, cast information, or any other relevant details available on IMDb.
  • Proper error handling should be included to manage any issues during scraping.
  • The script should be well-commented and provide clear documentation on the structure of the movie objects.
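The steps and criteria above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `Movie` fields, the `toMovie` and `scrapeMovies` helpers, and the `.title`, `.year` and `.rating` selectors are all hypothetical, and IMDb's real markup will need its own selectors. The browser is passed in through a small `BrowserLike` interface that covers only the Puppeteer methods used here (`newPage`, `goto`, `$$eval`, `close`), so the sketch can be exercised without launching Chrome.

```typescript
// One scraped movie. The field names are assumptions for illustration;
// the project's actual object shape may differ.
interface Movie {
  title: string;
  year: number | null; // null when no four-digit year was found
  rating: number | null; // null when the rating text is not numeric
}

// Untidy strings as read from the page, before cleanup.
interface RawMovie {
  title: string;
  year: string;
  rating: string;
}

// The small subset of Puppeteer's Browser and Page methods used below.
interface PageLike {
  goto(url: string, options?: { waitUntil?: "load" | "domcontentloaded" }): Promise<unknown>;
  $$eval<T>(selector: string, pageFunction: (elements: any[]) => T): Promise<T>;
}
interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Convert raw page text into a typed movie object.
function toMovie(raw: RawMovie): Movie {
  const yearMatch = raw.year.match(/\d{4}/);
  const rating = Number.parseFloat(raw.rating);
  return {
    title: raw.title.trim(),
    year: yearMatch ? Number(yearMatch[0]) : null,
    rating: Number.isNaN(rating) ? null : rating,
  };
}

// Launch a browser, open the page, read one raw record per matching row,
// and return the cleaned array of movie objects.
async function scrapeMovies(
  launch: () => Promise<BrowserLike>,
  url: string,
  rowSelector: string,
): Promise<Movie[]> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    // This callback runs inside the browser page, so it must not use outer variables.
    // ".title", ".year" and ".rating" are placeholder selectors, not IMDb's real markup.
    const rawMovies: RawMovie[] = await page.$$eval(rowSelector, (rows) =>
      rows.map((row) => ({
        title: row.querySelector(".title")?.textContent ?? "",
        year: row.querySelector(".year")?.textContent ?? "",
        rating: row.querySelector(".rating")?.textContent ?? "",
      })),
    );
    return rawMovies.map(toMovie);
  } finally {
    // Always release the browser, even if navigation or scraping throws.
    await browser.close();
  }
}
```

With Puppeteer installed, the `launch` argument could be `() => puppeteer.launch({ headless: true })`, possibly with a type cast to `BrowserLike`. Keeping the cleanup in `toMovie` separate from the page callback means the parsing rules can be checked without a browser.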

Getting Started

Mini-project Puppeteer web scraping

Prerequisites

  1. Ensure you have Node.js installed on your machine. You can download it from nodejs.org.

Installation

  1. Clone the repository: git clone <repository-url>
  2. Navigate to the project directory: cd <project-directory>
  3. Install dependencies: npm install

Running the App from your terminal

  1. From the parent directory, change to the root folder: cd IDMB-data-Scraping
  2. Run npm install to install all packages listed in the package.json file.
  3. From the root folder, start the backend server: npm run start
  4. The backend server will be running at http://localhost:3000.

Endpoints

  1. GET /imdbscrape

Usage

To retrieve the scraped movie information, make a GET request to /imdbscrape.

Use Postman to test the endpoint at http://localhost:3000/imdbscrape
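As an alternative to Postman, the same request could be made from a short script. The sketch below assumes the endpoint responds with a JSON array of movie objects, which this README implies but does not confirm. `fetchMovies` and `FetchLike` are hypothetical names, and the default relies on the global `fetch` available in Node.js 18 and later.

```typescript
// The parts of a fetch response this sketch relies on.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Request the scraped movies from a running server.
// Assumption: the endpoint returns a JSON array; adjust if the real shape differs.
async function fetchMovies(
  baseUrl: string = "http://localhost:3000",
  fetchImpl: FetchLike = (globalThis as any).fetch,
): Promise<unknown[]> {
  const response = await fetchImpl(`${baseUrl}/imdbscrape`);
  if (!response.ok) {
    throw new Error(`GET /imdbscrape failed with status ${response.status}`);
  }
  const body = await response.json();
  if (!Array.isArray(body)) {
    throw new Error("Expected a JSON array of movie objects");
  }
  return body;
}
```

With the server running, `fetchMovies().then((movies) => console.log(movies.length))` would print how many movie objects came back.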

Documentation

Access documentation here - (None for now)

Error Handling

The application provides appropriate error handling for invalid inputs and unexpected scenarios.
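One way such handling might look is to wrap the scraping call and translate failures into an HTTP-style result. This is a sketch under assumptions rather than the project's actual code: `runScrape` and `ScrapeResponse` are hypothetical names, and the choice of 504 for timeouts and 500 for everything else is illustrative. It relies on Puppeteer reporting navigation timeouts with an error whose `name` is `TimeoutError`.

```typescript
// Result the route handler would send back. Status codes are illustrative choices.
type ScrapeResponse<T> =
  | { status: 200; body: T }
  | { status: 500 | 504; body: { error: string } };

// Run a scraping function and convert any failure into an error response
// instead of letting the exception escape the request handler.
async function runScrape<T>(scrape: () => Promise<T>): Promise<ScrapeResponse<T>> {
  try {
    return { status: 200, body: await scrape() };
  } catch (err) {
    // Non-Error values can be thrown too, so normalize before inspecting.
    const error = err instanceof Error ? err : new Error(String(err));
    if (error.name === "TimeoutError") {
      return { status: 504, body: { error: "Timed out while loading the page" } };
    }
    return { status: 500, body: { error: error.message } };
  }
}
```

In an Express route, the result could then be sent with `res.status(result.status).json(result.body)`.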

Testing

The application includes comprehensive unit tests to ensure reliability and functionality. Run tests using the following command: npm test

With these instructions, developers and users will be able to quickly set up and run the IMDB Puppeteer Scrape backend for testing and development purposes.

Author

👤 Author1

Contributing

Contributions, issues, criticism and feature requests are welcome!

Show your support

Please give a ⭐️ if you like this project!

Acknowledgments

  • Hat tip to Puppeteer
  • Inspiration
  • etc
