Skip to content

radhika3558/databases-mapbox-project

Repository files navigation

🎥 Film Analysis Project: Uncovering Cinematic Insights

🎯 Objective

The aim of this project was to delve deep into the realm of cinema and derive meaningful insights through the lens of data journalism. My mission was to analyze a dataset consisting of films, critics, and directors, originally sourced from a BBC article. The goal was to identify trends and narratives in film critique and production spanning the last 15 years, offering a unique perspective on the global film industry.

🛠️ Process

Data Collection

The project commenced with the complex task of data collection. Leveraging Python libraries such as BeautifulSoup and requests, I meticulously scraped a comprehensive list of movies, critics, and their affiliations from the BBC's webpage. This phase required acute attention to detail, especially due to the minimalistic layout of the source page, which presented its own set of challenges in data isolation.

Data Structuring

Following data acquisition, the next phase involved structuring the raw HTML data into a format amenable for analysis. I transformed the data into Python lists and dictionaries and subsequently into a structured pandas DataFrame. This process was critical in preparing the data for the subsequent analysis phase.

Regular Expressions

I extensively utilized regular expressions to parse and extract essential information from the dataset, including critic names, affiliations, countries, movie titles, directors, and release years.

Additional Data Enrichment

Parallel to the initial data scraping, I expanded the data collection to Wikipedia. This involved scraping director biographical details, thereby enriching the dataset with contextual information about their birthplaces and dates.

Data Visualization with Mapbox API

In an effort to bring the data to life visually, I integrated the Mapbox API for advanced geospatial data representations, allowing me to create interactive and engaging maps that vividly illustrate the global distribution and impact of film directors and critics.

🔎 Key Findings

  • 🇺🇸 The US, as expected, had the most number of critically acclaimed directors as well as the highest number of votes from critics
  • 🇨🇳 Movies directed by Chinese directors had the second-highest number of votes - 98 in total
  • 🇮🇷 A surprising finding was that Iranian movies were very popular with critics. Six directors were critically acclaimed, with Asghar Farhadi leading the pack with 31 votes

📖 Key Learnings

  • Data Scraping and Cleaning: This project significantly enhanced my skills in web scraping and the intricacies of data cleaning and preparation.
  • Regular Expressions Mastery: Through extensive use, I developed a proficiency in using regex for effective data extraction.
  • Pandas Proficiency: The process of data transformation using pandas was invaluable, improving my ability to manipulate and analyze large datasets.
  • Creative Problem-Solving: Each challenge encountered necessitated unique and creative solutions, underscoring the importance of adaptability and innovation in data journalism.

💭 Reflections and Future Directions

In retrospect, several areas could be refined in future projects:

  • Automation and Efficiency: Automating certain aspects of the scraping process would enhance efficiency.
  • Advanced Data Analysis: Implementing more sophisticated statistical methods could yield deeper insights.
  • Enhanced Visual Storytelling: Incorporating advanced data visualization techniques would further augment the storytelling aspect of the data.

🎬 Conclusion

This project represents a comprehensive endeavor in data journalism, blending technical data processing with narrative storytelling. It stands as a testament to the power of data in revealing hidden stories within the film industry.


About

Database analysis; visualization using Mapbox API

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published