This is a data wrangling project.
In this project, I followed the three data wrangling steps:
- Gather
- Assess
- Clean
For gathering, I used data from the WeRateDogs Twitter account, which posts photos of dogs and rates them with humorous comments. This required the Twitter API, accessed through the tweepy library. All of this data was gathered into a dataframe.
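A minimal sketch of this gathering step, assuming `api` is an already authenticated `tweepy.API` object and that the tweet IDs come from the archive. The function names, the `tweet_json.txt` file name and the fields kept (`retweet_count`, `favorite_count`) are illustrative assumptions, not necessarily what the project used.

```python
import json


def gather_tweets(api, tweet_ids, path="tweet_json.txt"):
    """Fetch each tweet by ID and write its raw JSON, one object per line.

    Assumes `api` is an authenticated tweepy.API object.
    Returns the IDs that could not be fetched (e.g. deleted tweets).
    """
    failed = []
    with open(path, "w", encoding="utf-8") as out:
        for tweet_id in tweet_ids:
            try:
                # tweet_mode="extended" asks for the full, untruncated text
                status = api.get_status(tweet_id, tweet_mode="extended")
            except Exception:  # the error class differs between tweepy versions
                failed.append(tweet_id)
                continue
            out.write(json.dumps(status._json) + "\n")
    return failed


def load_tweet_counts(path="tweet_json.txt"):
    """Read the JSON-lines file back, keeping the ID and engagement counts."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tweet = json.loads(line)
            rows.append({
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            })
    return rows
```

The list returned by `load_tweet_counts()` can be passed to `pandas.DataFrame` to obtain a dataframe. Writing the raw JSON to a file first means the slow, rate-limited API calls only have to run once.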
I also used the image predictions TSV file, which holds a machine learning model's predictions of the dogs in each posted image. Each image had three predictions, one of them with a higher confidence than the others. This file was downloaded programmatically.
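The sketch below downloads the predictions file and keeps the single most confident prediction per tweet. The URL is left as a parameter, and the column names (`tweet_id`, `p1` to `p3`, `p1_conf` to `p3_conf`, `p1_dog` to `p3_dog`) are assumptions about the file's layout.

```python
import csv
import io
import urllib.request


def download_text(url):
    """Download a text file programmatically (the URL is supplied by the caller)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def top_predictions(tsv_text):
    """Map each tweet_id to its highest-confidence prediction.

    Assumed columns: tweet_id, p1..p3, p1_conf..p3_conf, p1_dog..p3_dog.
    Returns {tweet_id: (label, confidence, is_dog)}.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        # Pick whichever of the three predictions has the largest confidence
        best = max((1, 2, 3), key=lambda i: float(row[f"p{i}_conf"]))
        result[row["tweet_id"]] = (
            row[f"p{best}"],
            float(row[f"p{best}_conf"]),
            row[f"p{best}_dog"] == "True",
        )
    return result
```

Usage would be `top_predictions(download_text(url))`; keeping the download and the parsing separate makes the parsing testable without a network connection.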
Finally, I used the twitter-archive-enhanced dataframe, which holds more details about the WeRateDogs tweets, including the tweet text and the dog stage.
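Dog stages in archives like this are often spread over four separate columns (`doggo`, `floofer`, `pupper`, `puppo`), each holding either the stage name or the string `"None"`; that layout is an assumption here. Collapsing them into a single `dog_stage` column is a typical tidiness fix, sketched with plain dictionaries standing in for dataframe rows.

```python
STAGES = ("doggo", "floofer", "pupper", "puppo")


def combine_stages(row):
    """Collapse the four stage columns into one value.

    Assumes each stage column holds either its own name or "None".
    Tweets tagged with several stages get them joined with a comma.
    """
    found = [s for s in STAGES if row.get(s) == s]
    return ",".join(found) if found else None


def tidy_stages(rows):
    """Return new rows with the stage columns replaced by 'dog_stage'."""
    tidy = []
    for row in rows:
        new = {k: v for k, v in row.items() if k not in STAGES}
        new["dog_stage"] = combine_stages(row)
        tidy.append(new)
    return tidy
```

Keeping multi-stage tweets as a joined value, rather than silently picking one stage, leaves that decision visible for a later cleaning step.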
Next came the cleaning of the three datasets, which involved checking each one for quality and tidiness issues. These issues are summarized in the wrangling_report.pdf file. Two methods were used in this process: the visual method, using a spreadsheet, and the programmatic method, using Python. Each step also followed a Question, Code and Observation approach.
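A minimal sketch of the programmatic side of this process, framed as questions answered by code. The column names `retweeted_status_id` and `rating_denominator` are assumptions about the archive's layout, and rows are plain dictionaries rather than a pandas dataframe to keep the example self-contained.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def assess(rows):
    """Answer a few quality questions; each count is an observation."""
    return {
        # Question: are there retweets rather than original ratings?
        "retweets": sum(1 for r in rows if r["retweeted_status_id"]),
        # Question: are any rating denominators different from 10?
        "odd_denominators": sum(1 for r in rows if r["rating_denominator"] != "10"),
    }


def drop_retweets(rows):
    """Clean: keep only original tweets (empty retweeted_status_id)."""
    return [r for r in rows if not r["retweeted_status_id"]]
```

Running `assess` again after each cleaning step confirms whether the fix worked: after `drop_retweets`, the `retweets` count should be zero.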
This continued until I had clean data ready for exploration and visualization.
The visualization report is in the act_report.pdf file, and a preview of it is shown below: