Designed and implemented a NodeJS script that uses Puppeteer to scrape the IMDb website for a list of movies and generate an array of movie objects with specific details.
- 👋 Hi, I’m Chidike Henry
- 😎 I’m a MERN full-stack engineer
- 💻 This is backend code that uses Puppeteer as a tool to scrape movie information from the IMDb website
- 💞️ I’m looking to collaborate on JS projects
- 📫 How to reach me: chidike.henry@gmail.com
The purpose of this project is to evaluate my ability to create a Node.js script that uses Puppeteer to scrape IMDb for a list of movies and generate an array of movie objects with specific details.
- NodeJS
- ExpressJS
- JavaScript
- TypeScript
- Puppeteer
- Nodemon
- Postman
Use Puppeteer to launch a headless browser, navigate to IMDb's movie section (e.g., top-rated or popular movies), scrape specific details for each movie (e.g., title, release year, rating, cast), generate an array of movie objects containing the relevant details for each movie, and output that array.

Steps:
- Use Puppeteer to start a headless browser.
- Navigate to the IMDb website (www.imdb.com) and select a section that lists movies (e.g., top-rated movies).
- Identify and scrape specific details for each movie, such as title, release year, rating, and cast.
- Create a structured array of movie objects, where each object contains the details of one movie.
- Output the array of movie objects.

Example Criteria:
- The script should be written in Node.js and use Puppeteer.
- The movie objects should include details such as title, release year, rating, cast information, or any other relevant details available on IMDb.
- Proper error handling should be included to manage any issues during scraping.
- The script should be well commented and clearly document the structure of the movie objects.
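The criteria above ask for documented movie objects and for error handling during scraping. The sketch below shows, in TypeScript, one way to turn the raw strings pulled from a list page into typed movie objects. The `Movie` and `RawMovieRow` shapes and the `toMovies`, `parseYear` and `parseRating` helpers are hypothetical names chosen for illustration, not the repository's actual code. Rows without a title are skipped and reported rather than aborting the whole run.

```typescript
/**
 * One scraped movie. Field names are illustrative assumptions,
 * not the repository's actual type.
 */
interface Movie {
  rank: number; // 1-based position of the row on the scraped page
  title: string;
  releaseYear: number | null; // null when no 4-digit year was found
  rating: number | null; // 0-10 scale; null when missing or unparseable
  cast: string[]; // empty when no cast text was scraped
}

/** Raw text for one row, as it might be returned from the browser context. */
interface RawMovieRow {
  title?: string;
  year?: string;
  rating?: string;
  cast?: string; // comma-separated names
}

/** Extract the first 4-digit year, e.g. from "1994" or "(1994)". */
function parseYear(text: string | undefined): number | null {
  const match = text?.match(/\b(\d{4})\b/);
  return match ? Number(match[1]) : null;
}

/** Parse a rating such as "9.3"; rejects values outside 0-10. */
function parseRating(text: string | undefined): number | null {
  if (!text) return null;
  const value = Number.parseFloat(text);
  return Number.isFinite(value) && value >= 0 && value <= 10 ? value : null;
}

/** Build Movie objects, collecting an error message for each unusable row. */
function toMovies(rows: RawMovieRow[]): { movies: Movie[]; errors: string[] } {
  const movies: Movie[] = [];
  const errors: string[] = [];
  rows.forEach((row, index) => {
    const title = row.title?.trim();
    if (!title) {
      errors.push(`row ${index}: missing title`);
      return; // skip this row but keep processing the rest
    }
    movies.push({
      rank: index + 1,
      title,
      releaseYear: parseYear(row.year),
      rating: parseRating(row.rating),
      cast: (row.cast ?? "")
        .split(",")
        .map((name) => name.trim())
        .filter((name) => name.length > 0),
    });
  });
  return { movies, errors };
}
```

In a Puppeteer script, the raw rows could be collected with `page.$$eval(selector, fn)`, which runs `fn` inside the page against every element matching `selector`. The right selector depends on IMDb's current markup and is not given here.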
- Ensure you have Node.js installed on your machine. You can download it from nodejs.org.
- Clone the repository:
git clone <repository-url>
- Change into the project's root folder:
cd IDMB-data-Scraping
- Install all packages listed in the package.json file:
npm install
- From the root folder, start the backend server:
npm run start
- The backend server will be running at http://localhost:3000.
- GET /imdbscrape
To retrieve the scraped movie data, make a GET request to /imdbscrape.
You can test the endpoint with Postman at http://localhost:3000/imdbscrape.
API documentation: none is available yet.
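Because there is no API documentation yet, the response shape of GET /imdbscrape is not specified here. Assuming it returns a JSON array of movie objects with `title`, `releaseYear`, `rating` and `cast` fields (these field names are assumptions; check them against a real response in Postman), a TypeScript client could validate the body before using it. `isMovie` and `parseMovieResponse` are hypothetical helpers.

```typescript
/** Assumed shape of one movie in the /imdbscrape response (hypothetical). */
interface Movie {
  title: string;
  releaseYear: number | null;
  rating: number | null;
  cast: string[];
}

const isNumberOrNull = (x: unknown): boolean => x === null || typeof x === "number";

/** Runtime type guard: checks the fields a client relies on. */
function isMovie(value: unknown): value is Movie {
  if (typeof value !== "object" || value === null) return false;
  const m = value as Record<string, unknown>;
  return (
    typeof m.title === "string" &&
    isNumberOrNull(m.releaseYear) &&
    isNumberOrNull(m.rating) &&
    Array.isArray(m.cast) &&
    m.cast.every((name) => typeof name === "string")
  );
}

/** Parse a response body and fail loudly if it is not an array of movies. */
function parseMovieResponse(body: string): Movie[] {
  const data: unknown = JSON.parse(body); // throws SyntaxError on invalid JSON
  if (!Array.isArray(data) || !data.every(isMovie)) {
    throw new Error("Unexpected response shape from /imdbscrape");
  }
  return data;
}
```

For example, on Node.js 18 or later, which provides a global `fetch`: `const res = await fetch("http://localhost:3000/imdbscrape"); const movies = parseMovieResponse(await res.text());`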
The application provides appropriate error handling for invalid inputs and unexpected scenarios.
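The README does not describe how errors are handled. One common pattern for transient scraping failures, such as navigation timeouts or network errors, is to retry the operation a few times with a growing delay. This is a generic sketch; `withRetry`, `sleep` and their defaults are hypothetical, not part of Puppeteer or this repository.

```typescript
/** Resolve after the given number of milliseconds. */
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * Run an async operation, retrying when it throws.
 * Waits delayMs * attempt between tries (linear backoff) and rethrows the
 * last error once maxAttempts attempts have failed.
 */
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 1000,
): Promise<T> {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt follows.
      if (attempt < maxAttempts) await sleep(delayMs * attempt);
    }
  }
  throw lastError;
}
```

With Puppeteer, a navigation could be wrapped as `await withRetry(() => page.goto(url))`, where `page` is a Puppeteer `Page` and `url` is the IMDb list being scraped. An error that still escapes after the last attempt should be caught by the route handler and turned into an error response rather than crashing the server.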
The application includes comprehensive unit tests to ensure reliability and functionality. Run the tests with the following command:
npm test
With these instructions, developers and users can quickly set up and run the IMDb scraping backend for testing and development purposes.
- GitHub: [@lacegiovanni17](https://github.com/lacegiovanni17)
- Twitter: [@ChidikeC](https://twitter.com/ChidikeC)
- LinkedIn: [LinkedIn](https://www.linkedin.com/in/chidike-chizoba-25628a40/)
Contributions, issues, critiques and feature requests are welcome!
Please give a ⭐️ if you like this project!
- Hat tip to Puppeteer