A data wrangling and cleaning project that gathers data from multiple sources, including querying Twitter's API with tweepy
Libraries used:
- NumPy
- pandas
- Matplotlib
- Seaborn
- tweepy
- json
- timeit
For this project, three datasets were used. Two of them were provided directly, while the third required querying Twitter's API and writing the returned data to a .txt file. The datasets are as follows:
- WeRateDogs Twitter Archive Data: contains tweet-level information such as tweet ID, timestamp, rating numerator, rating denominator, dog name, etc.
- Tab Separated Values (TSV) file: contains image data that is filtered to identify which pictures show dogs.
- Twitter API data: additional tweet data obtained by querying Twitter's API and writing each JSON response to a .txt file.
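The API gathering step above produces a line-delimited .txt file with one tweet's JSON per line, which is then read back into a DataFrame. Below is a minimal sketch of that read-back step; the sample lines and the field names kept (`id`, `retweet_count`, `favorite_count`) are assumptions for illustration, not the project's exact file contents.

```python
import io
import json

import pandas as pd

# Hypothetical sample of the line-delimited JSON file written while
# querying the API; the real file holds one full tweet object per line.
sample_file = io.StringIO(
    '{"id": 101, "retweet_count": 8853, "favorite_count": 39467}\n'
    '{"id": 102, "retweet_count": 6514, "favorite_count": 33819}\n'
)

# Parse each line into a dict, keeping only the fields the analysis needs.
rows = []
for line in sample_file:
    tweet = json.loads(line)
    rows.append({
        "tweet_id": tweet["id"],
        "retweet_count": tweet["retweet_count"],
        "favorite_count": tweet["favorite_count"],
    })

api_df = pd.DataFrame(rows)
print(api_df.shape)  # (2, 3)
```

Reading line by line with `json.loads` keeps memory use low and tolerates a partially written file better than loading the whole file as a single JSON array.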
- The top dog breeds posted on WeRateDogs are:
- Labrador Retriever
- French Bulldog
- Chihuahua
- Pembroke
- Eskimo Dog
- The most commonly used device for tweeting is an iPhone.
- December has the highest tweet rate, followed by November.
- The day of the week has no significant effect on the tweet rate.
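The device insight above can be derived from the archive's `source` column, which stores the tweeting client as an HTML anchor tag. A minimal sketch, using hypothetical rows (the real archive has thousands):

```python
import pandas as pd

# Hypothetical rows mirroring the archive's `source` column.
archive = pd.DataFrame({
    "source": [
        '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
        '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
        '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
    ],
})

# Strip the anchor tag, keeping only the visible client name, then count.
archive["device"] = archive["source"].str.extract(r">([^<]+)<")
counts = archive["device"].value_counts()
print(counts.idxmax())  # Twitter for iPhone
```

`str.extract` with a single capture group returns the text between the tag's `>` and `</a>`, which is enough here without pulling in a full HTML parser.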
This project consists of two reports:
- Wrangle Report: This report provides detailed information about the data wrangling efforts undertaken during the project. It is framed as an internal document.
- Act Report: This report communicates the insights and visualizations derived from the wrangled data. It is framed as an external document, similar to a blog post or a magazine article.
By presenting the findings in these two reports, the audience will gain a comprehensive understanding of the data wrangling process and the insights obtained from the analysis.