Web Scraping Wikipedia for Disney Movies to create a Disney Movies dataset and then cleaning the data to perform further Data Analysis using the cleaned JSON

sinjoysaha/Disney-Movies-Wiki-WebScraper

Disney Movies Wikipedia WebScraper

About the Project

In this web scraping project, a Jupyter notebook scrapes the Wikipedia pages for Disney movies to create a Disney Movies dataset. From each page's info box we collect fields such as Title, Directed by, Produced by, Written by, Narrated by, Music by, Cinematography, Edited by, Production company, Distributed by, Release date, Running time, Country, and Language. We also use the OMDb API to add imdb, metascore, and rotten_tomatoes scores. The final dataset is saved as JSON and CSV, and intermediate results are saved with Python's pickle library.
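As a rough illustration of the OMDb step, the sketch below builds a lookup URL and pulls the three scores out of a decoded response. It assumes the OMDb query parameters `apikey` and `t` (title) and the response fields `imdbRating`, `Metascore`, and `Ratings`. The helper names `build_omdb_url` and `extract_scores` are hypothetical, and the sample response is hand-written with placeholder values, not real output. The notebook itself may do this differently.

```python
from urllib.parse import urlencode

# Assumed OMDb endpoint; an API key from OMDb is required for real requests.
OMDB_ENDPOINT = "http://www.omdbapi.com/"


def build_omdb_url(title, api_key):
    """Return the OMDb lookup URL for a movie title (hypothetical helper)."""
    return OMDB_ENDPOINT + "?" + urlencode({"apikey": api_key, "t": title})


def extract_scores(omdb_response):
    """Pull imdb, metascore and rotten_tomatoes out of a decoded OMDb response."""
    # "Ratings" is a list of {"Source": ..., "Value": ...} entries.
    ratings = {r["Source"]: r["Value"] for r in omdb_response.get("Ratings", [])}
    return {
        "imdb": omdb_response.get("imdbRating"),
        "metascore": omdb_response.get("Metascore"),
        "rotten_tomatoes": ratings.get("Rotten Tomatoes"),
    }


# Hand-written sample in the assumed response shape (placeholder values).
sample = {
    "Title": "Example Movie",
    "imdbRating": "7.5",
    "Metascore": "80",
    "Ratings": [
        {"Source": "Internet Movie Database", "Value": "7.5/10"},
        {"Source": "Rotten Tomatoes", "Value": "90%"},
    ],
}

print(build_omdb_url("Toy Story 3", "YOUR_API_KEY"))
print(extract_scores(sample))
```

In practice the URL would be fetched with Requests and the decoded JSON passed to `extract_scores`. Missing fields come back as `None` rather than raising an error.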

Tasks

  • Task 1: Scrape the info box from the Toy Story 3 Wikipedia page and save it in a Python dictionary.
  • Task 2: Scrape the info box for all Disney movies and save them in a list of Python dictionaries.
  • Task 3: Clean the data!
    • Strip out all references ([1], [2], etc.)
    • Split up long strings
    • Convert 'Running time' field to integer
    • Convert 'Budget' and 'Box office' fields to floats
    • Convert dates to datetime objects
    • Save data using Pickle
  • Task 4: Attach IMDb, Rotten Tomatoes, Metascores to dataset using OMDb API.
  • Task 5: Save final dataset as JSON and CSV files.
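The cleaning steps in Task 3 can be sketched with a few small helpers. This is a minimal standard-library version under stated assumptions, not the notebook's actual code. All function names are hypothetical. The input formats (for example `103 minutes`, `$200 million`, `June 18, 2010`) are assumed typical info box strings. Month-name parsing assumes an English locale.

```python
import re
from datetime import datetime


def strip_references(text):
    """Remove bracketed citation markers such as [1] or [23]."""
    return re.sub(r"\[\d+\]", "", text).strip()


def split_lines(text):
    """Split a newline-separated string into a list of non-empty names."""
    return [part.strip() for part in text.split("\n") if part.strip()]


def parse_running_time(text):
    """Return the first integer in a string like '103 minutes', or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


_SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
_MONEY = re.compile(
    r"\$\s*([\d,]+(?:\.\d+)?)\s*(thousand|million|billion)?", re.IGNORECASE
)


def parse_money(text):
    """Convert '$200 million' or '$1,250,000' to a float in dollars, or None."""
    match = _MONEY.search(text)
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    word = match.group(2)
    return value * _SCALE[word.lower()] if word else value


def parse_date(text, formats=("%B %d, %Y", "%d %B %Y")):
    """Try each assumed date format in turn; return a datetime or None."""
    cleaned = strip_references(text)
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None


print(strip_references("Pixar Animation Studios[1]"))
print(split_lines("First Writer\nSecond Writer"))
print(parse_running_time("103 minutes"))
print(parse_money("$200 million"))
print(parse_date("June 18, 2010[2]"))
```

Each helper returns `None` when nothing parses, so a malformed field can be spotted later without stopping the whole scrape.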

Built With

  • Jupyter Notebook
  • Beautiful Soup
  • Requests
  • Pickle
  • Pandas
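Once dates are `datetime` objects, pickle can store them directly, but JSON needs them converted to strings first. The sketch below shows one way to handle the saves, using in-memory strings so it runs without touching disk. The record, field names, and helper names are illustrative assumptions. The standard-library `csv` module stands in here for the Pandas export the project lists.

```python
import csv
import io
import json
import pickle
from datetime import datetime

# Illustrative record; real rows would come from the cleaned scrape.
movies = [
    {
        "title": "Example Movie",
        "running_time": 103,
        "release_date": datetime(2010, 6, 18),
    }
]


def serialize_date(value):
    """json.dumps fallback: turn datetime objects into ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def to_json(records):
    """Return the records as a JSON string, keeping non-ASCII titles readable."""
    return json.dumps(records, default=serialize_date, ensure_ascii=False, indent=2)


def to_csv(records):
    """Return the records as CSV text with one column per key."""
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Pickle keeps the datetime objects intact for intermediate checkpoints.
restored = pickle.loads(pickle.dumps(movies))

print(restored == movies)
print(to_json(movies))
print(to_csv(movies))
```

To write real files, `pickle.dump`, `json.dump`, and a `csv.DictWriter` over an open file handle follow the same pattern.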

Fork the Repo and Contribute

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project (click on Fork in the top-right corner)
  2. Create your Feature Branch (git checkout -b feature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature)
  5. Open a Pull Request

Contact

Sinjoy Saha